move-literals - Move string literals

Maybe you're writing a reverse engineering challenge and want to obfuscate your code. Maybe you're stuck with a huge legacy codebase in dire need of refactoring. You want to perform a set of operations on source code, but doing it manually is repetitive, time-consuming and boring. Your IDE isn't as helpful as you would like it to be. Maybe you try awk or sed and regular expressions, but after a while you need another tool in your toolbox, one that regular or context-free grammars don't provide.

Enter Parsing Expression Grammars and LPeg.

It's fast, it's elegant, it makes you never want to think about regular expressions ever again.

This repo contains an example of how PEGs can be used. The example finds all string literals in contexts where they would actually be compiled (so not inside comments or preprocessor directives) and replaces them with preprocessor definitions. Why? Well, sometimes you just want to move all your strings around. You don't have to move them; you could just as easily replace them with a rot13- or base64-encoded version and force push it to your company's master branch. Try it out!

dependencies

Lua (probably/maybe > 5.1) and LPeg

example

$ cat example.c

#include <stdio.h>

#ifdef __FOO_PLATFORM
#error \
  "unsupported platform"
#endif

int main(int argc, char *argv[]) {
  // char *x = "foobar";
  char *x = "foobarbaz";
  printf("%s\n", x);
  return 42;
}

$ ./move-literals.lua example.c > result.c
$ cat result.c

#define STRSYM_FOOBARBAZ \
   "foobarbaz"
#define STRSYM__S_N \
   "%s\n"
#include <stdio.h>

#ifdef __FOO_PLATFORM
#error \
  "unsupported platform"
#endif

int main(int argc, char *argv[]) {
  // char *x = "foobar";
  char *x = STRSYM_FOOBARBAZ;
  printf(STRSYM__S_N, x);
  return 42;
}
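
The transformation above can be approximated with a handful of LPeg patterns. What follows is a minimal sketch, not the code in move-literals.lua: the helper names (symname, move_literals), the naming scheme for the macros and the simplified handling of comments, directives and character literals are all assumptions made for illustration.

```lua
local lpeg = require "lpeg"
local P, S, C, Cs = lpeg.P, lpeg.S, lpeg.C, lpeg.Cs

-- Derive a macro name from a literal's contents (assumed naming scheme):
-- uppercase everything and turn each non-alphanumeric character into "_".
local function symname(text)
  local name = text:upper():gsub("%W", "_")
  return "STRSYM_" .. name
end

-- Pieces of C source in which a double quote does not start a compiled literal.
local escape    = P"\\" * 1                          -- backslash plus any character
local comment   = P"//" * (1 - P"\n")^0
                + P"/*" * (1 - P"*/")^0 * P"*/"
local directive = P"#" * (P"\\\n" + (1 - P"\n"))^0   -- follows "\" line continuations
local charlit   = P"'" * (escape + (1 - S"'\\\n"))^1 * P"'"
local skip      = comment + directive + charlit

-- A string literal; the capture is the text between the quotes.
local literal = P'"' * C((escape + (1 - S'"\\\n'))^0) * P'"'

-- Hypothetical helper: returns the rewritten source with #defines prepended.
local function move_literals(src)
  local defs, seen = {}, {}
  local function replace(text)
    local name = symname(text)
    if not seen[name] then   -- note: distinct literals may collide on one name
      seen[name] = true
      defs[#defs + 1] = string.format('#define %s \\\n   "%s"', name, text)
    end
    return name
  end
  -- Cs copies the input, substituting each literal with its macro name.
  local body = Cs((skip + literal / replace + 1)^0):match(src)
  if #defs == 0 then return body end
  return table.concat(defs, "\n") .. "\n" .. body
end

print(move_literals('puts("hi");'))
```

The substitution capture (Cs) leaves everything matched by skip untouched, which is why literals in comments and #error lines survive. A real tool would also need to decide what to do when two different literals map to the same macro name.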

dealing with parse errors

move-literals.lua accepts any input, which is not always what you want.

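Syntax errors can be signalled with match-time captures (Cmt) or the function capture operator '/', attached to a rule that is never reached on valid input. A function capture runs only after the whole match has succeeded, while a match-time capture runs as soon as its pattern matches and also receives the current position: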

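  -- Using the function capture operator '/': errfunc is called with the
  -- text matched by the "invalid" rule once the whole match has succeeded.
  local lpeg = require "lpeg"
  local P, V, S = lpeg.P, lpeg.V, lpeg.S
  local function errfunc(match)
    error("invalid token: "..tostring(match))
  end
  P{
    "tokens";
    space   = S" \t",
    invalid = (1-V"space")^1/errfunc,  -- anything that is not a known token
    token   = P"foo" + P"bar" + P"baz" + V"invalid",
    tokens  = (V"token" * V"space"^0)^0 * -1
  }:match("foo bar baz woops foo")  -- errors on "woops"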

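  -- Using a match-time capture (Cmt): errfunc is called with the whole
  -- subject, the position just after the match, and the matched text.
  local lpeg = require "lpeg"
  local P, Cmt, V, S = lpeg.P, lpeg.Cmt, lpeg.V, lpeg.S
  local function errfunc(subject, pos, cap)
    -- pos-#cap is the position where the invalid token starts
    error(string.format("invalid token at position %d: %s",
      pos-#cap, cap))
  end
  P{
    "tokens";
    space   = S" \t",
    invalid = Cmt((1-V"space")^1, errfunc),
    token   = P"foo" + P"bar" + P"baz" + V"invalid",
    tokens  = (V"token" * V"space"^0)^0 * -1
  }:match("foo bar baz woops foo")  -- errors on "woops"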

In more complex grammars, you may need to signal different types of syntax errors. This can be done by adding several error rules and referencing them with V in places that can only be reached on invalid input.
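
As a rough illustration of that idea, here is a sketch of a grammar for space-separated key=value pairs with three separate error rules. The grammar, the rule names (bad_key, no_equals, bad_value) and the fail helper are invented for this example and are not part of move-literals.lua.

```lua
local lpeg = require "lpeg"
local P, Cmt, V, S, R = lpeg.P, lpeg.Cmt, lpeg.V, lpeg.S, lpeg.R

local space = S" \t"

-- Hypothetical helper: build a rule that raises an error labelled with
-- `kind` as soon as `patt` matches. Level 0 keeps the message free of
-- file and line information.
local function fail(kind, patt)
  return Cmt(patt, function(subject, pos, cap)
    -- pos is the position after the match; pos-#cap is where it started
    error(string.format("%s at position %d: '%s'", kind, pos - #cap, cap), 0)
  end)
end

local grammar = P{
  "pairs";
  key       = R"az"^1,
  value     = R"09"^1,
  -- Each error rule is the last alternative at the point where it is used,
  -- so it is only tried when the valid alternative has already failed.
  bad_key   = fail("invalid key",   (1 - space)^1),
  no_equals = fail("expected '='",  (1 - space)^0),
  bad_value = fail("invalid value", (1 - space)^0),
  pair      = (V"key" + V"bad_key")
            * (P"=" + V"no_equals")
            * (V"value" + V"bad_value"),
  pairs     = space^0 * (V"pair" * space^0)^0 * -1
}

-- pcall turns the raised error into a return value.
local ok, err = pcall(lpeg.match, grammar, "a=1 b:2")
print(ok, err)
```

Note that bad_key must consume at least one character: otherwise the pair rule could match the empty string and LPeg would reject the repetition in pairs as an empty loop.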
