This is the second part of the Protecting a Python codebase series. This time we will be playing with Python Abstract Syntax Trees in order to protect the original code of a Python based project.

Python Abstract Syntax Trees

As the Python Documentation says:

The ast module helps Python applications to process trees of the Python abstract syntax grammar. The abstract syntax itself might change with each Python release; this module helps to find out programmatically what the current grammar looks like.

The built-in module provides the tools to navigate, create or modify a tree. A Python AST is tree-like representation of the python code that’s closer to the bytecode implementation than the source code itself.

Take for instance the code in a little python module named hello.py:

1
2
def hello_world(name='World'):
    print('Hello {0}'.format(name))

We can get the tree representation using the ast module:

1
2
3
4
5
import ast

with open('hello.py') as f:
    code = ''.join(f.readlines())
    root = ast.parse(code)

We can also write a node visitor to inspect the tree structure:

1
2
3
4
5
6
7
8
9
10
11
import ast

class Visitor(ast.NodeVisitor):
    def __init__(self):
        self.stack = 0

    def visit(self, node):
        print "  " * self.stack, node
        self.stack += 1
        super(Visitor, self).visit(node)
        self.stack -= 1

Applying the visitor to the parsed code will result in something like:

>>> Visitor().visit(node)
<_ast.Module object at 0x7fb5200282d0>
  <_ast.FunctionDef object at 0x7fb5200281d0>
    <_ast.arguments object at 0x7fb520028210>
      <_ast.Name object at 0x7fb520028d90>
        <_ast.Param object at 0x7fb521c46810>
      <_ast.Str object at 0x7fb520028550>
    <_ast.Print object at 0x7fb520028dd0>
      <_ast.Call object at 0x7fb520028ed0>
        <_ast.Attribute object at 0x7fb520028f10>
          <_ast.Str object at 0x7fb520028f50>
          <_ast.Load object at 0x7fb521c46590>
        <_ast.Name object at 0x7fb520028f90>
          <_ast.Load object at 0x7fb521c46590>

Each element has its own set of attributes to fulfill the operation, Module objects have a body attribute which is a list with the different elements defined in it. Call have the function name to call (in the example above it refers to the format method in String). There’s a _field property with the list of attributes supported by the different types.

Python Abstract Syntax Trees are a really powerful tool, we can build really neat tools with them, tools to gather code statistics, tools to translate Python to other languages. We will use it to write a code scrambling and unscrambling tool, that combined with an import hook will serve as a protection mechanism.

For that we need a NodeTransformer class:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
from ast import NodeTransformer


class Scrambler(NodeTransformer):
    HEADER = '# scrambled'

    def __init__(self, scramble=True):
        self.do_scramble = scramble

    def visit(self, node):
        node_out = super(Scrambler, self).visit(node)
        if hasattr(node_out, 'body') and isinstance(node_out.body, list):
            if self.do_scramble:
                node_out.body = self.scramble(node_out.body)
            else:
                node_out.body = self.unscramble(node_out.body)
        return node_out

    def scramble(self, items):
        return self._step2(self._step1(items[:]))

    def unscramble(self, items):
        return self._step1(self._step2(items[:]))

    def _step1(self, items):
        i = 0
        length = len(items)
        while (i + 1) < length:
            items[i], items[i + 1] = items[i + 1], items[i]
            i += 2
        return items

    def _step2(self, items):
        length = len(items)
        if length % 2 == 0:
            items[:length / 2], items[length / 2:] = \
                items[length / 2:], items[:length / 2]
        else:
            items[:(length - 1) / 2], items[(length + 1) / 2:] = \
                items[(length + 1) / 2:], items[:(length - 1) / 2]
        return items

This code will take a Python AST, and scramble it by swapping them items in those with a body attribute that’s of type list. The transformation applied will be similar to:

>>> original = [1, 2, 3, 4, 5, 6]
>>> after_step_1 = step1(original)
[2, 1, 4, 3, 6, 5]
>>> after_step_2 = step2(after_step_1)
[3, 6, 5, 2, 1, 4]

Which is easy to transform back, it’s enough to apply the transformations in the reverse order to get the original object.

If we apply it to the following code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
import os
import pprint


def files_tree(path=None):
    out = {}
    path = path or os.getcwd()
    files = os.listdir(path)
    for file in files:
        if file.startswith('.'):
            continue

        sub_path = os.path.join(path, file)
        if os.path.isdir(sub_path):
            value = files_tree(sub_path)
        else:
            value = sub_path
        out[file] = value
    return out


tree = files_tree()
print 'Items in the top level: {0}'.format(len(tree.keys()))
print 'Items in the second level: {0}'.format(
    sum(len(val.keys()) for val in tree.values()
        if isinstance(val, dict))
)
pprint.pprint(tree)

It will come out as:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
# scrambled

print 'Items in the second level: {0}'.format(sum((len(val.keys()) for val in tree.values() if isinstance(val, dict))))
print 'Items in the top level: {0}'.format(len(tree.keys()))
pprint.pprint(tree)

def files_tree(path=None):
    files = os.listdir(path)
    return out
    for file in files:
        out[file] = value
        if os.path.isdir(sub_path):
            value = files_tree(sub_path)
        else:
            value = sub_path
        sub_path = os.path.join(path, file)
        if file.startswith('.'):
            continue
    path = (path or os.getcwd())
    out = {}
import pprint
import os
tree = files_tree()

And this is just a simple transformation, more changes can be implemented in order to make things a bit more interesting, some options to take into account are:

  • Change parameters order in functions
  • Change decorators order in functions
  • Change arguments order in function calls
  • Change arguments order in class creations
  • Rename variable names
  • Encode strings with rot13 or similar

This scrambler with the corresponding import hook are published in the companion examples repository.

Conclusion

Python Abstract Syntax Trees is a very versatile tool that allows us to create a solution that better fits the project where it’s used.

But, as it’s mentioned in the previous article, the code can be inspected from a shell once it was imported.

If you plan digging more in Python AST, I recommend Green Tree Snakes - the missing Python AST docs docs.