Simple, hassle-free, dependency-free, AST based source code refactoring toolkit.
refactor is an end-to-end refactoring framework that is built on top
of the ‘simple but effective refactorings’ assumption. It is much easier
to write a simple script with it rather than trying to figure out what
sort of a regex you need in order to replace a pattern (if it is even
matchable with regexes).
Every refactoring rule offers a single entrypoint,
match(), where they
AST node (from the
ast module in the standard library) and
respond with either returning an action to refactor or nothing. If the
rule succeeds on the input, then the returned action will build a
replacement node and
refactor will simply replace the code segment
that belong to the input with the new version.
Here is a complete script that will replace every
42 (not the definitions) on the given list of files:
import ast from refactor import Rule, ReplacementAction, run class Replace(Rule): def match(self, node): assert isinstance(node, ast.Name) assert node.id == 'placeholder' replacement = ast.Constant(42) return ReplacementAction(node, replacement) if __name__ == "__main__": run(rules=[Replace])
If we run this on a file,
refactor will print the diff by default;
--- test_file.py +++ test_file.py @@ -1,11 +1,11 @@ def main(): - print(placeholder * 3 + 2) - print(2 + placeholder + 3) + print(42 * 3 + 2) + print(2 + 42 + 3) # some commments - placeholder # maybe other comments + 42 # maybe other comments if something: other_thing - print(placeholder) + print(42) if __name__ == "__main__": main()
CST vs AST¶
It is a common misconception that AST should not be used for source code refactorings since it doesn’t preserve any of the stylings about the code that is visible to humans. Even though this statement is partially true, it is wrong on the point of “we can’t/shouldn’t do any transformations through AST”. As explained above, we aim to tackle the smaller and simpler problems (e.g refactoring simple expressions statements) and while doing that we preserve all details about the surrounding code. And even for the stuff in the same line, we preserve as much as we can (e.g refactoring a simple name between 2 different operations won’t change any style). It is obviously possible to abuse this and do full source refactors, in that case, you will lose most of the information, which even though is not preferred, might apply to some use cases (e.g feeding the output directly to the interpreter).
We have some great CST implementations (parso,
LibCST) and even though they are pretty useful for
doing major transformations, they can’t be expected to keep up with the latest syntax updates
on the upstream python. It is also an extra layer of indirection in some cases, considering that
it is a general practice to do analysis on the AST and refactoring on the CST and for most of the
cases these would be interchangeable through
refactor. In any scenario, I’d highly recommend you
to check out these libraries (as well as some tools like Fixit)
if you are interested in doing a considerable amount of source code processing.