Interactive, language-agnostic workflow for match-and-transform code refactoring
An essential part of a software engineering workflow is a good method for
applying code transformations across a codebase: renaming variables and
functions, reordering arguments, changing import statements, etc. Personally I
have the following goals in mind when thinking about a refactoring workflow:
Language and editor-agnostic: I want to learn at least one general
method that works for any codebase and is not tied to a specific editor.
Interactive with visual feedback: I want to see a preview of
each substitution before it’s applied.
Easy to hold correctly: simple text substitutions should be really easy
to apply. Any complexity should come from the complexity of the transformation,
not from the tooling.
In these notes we describe a workflow that seems to work great for these goals.
Background: Unix tool soup
Firstly, let’s take a quick look at what we’re not going to do. One standard
way of applying transformations across files is to use a combination of standard
Unix command line tools with a carefully crafted regular expression. In its
simplest form this can look like this:
find . -type f -name 'example*' | xargs sed -i 's/bazinga_/foo_/g'
The command above uses find to find all files under the current
directory whose names start with example, then uses sed to substitute
all instances of bazinga_ with foo_. The intended goal is probably to
replace all occurrences of a member variable bazinga_ with foo_
in both example.cc and example.h. Probably.
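To make explicit what the one-liner does, here is a rough Python sketch of the same steps. The function name substitute_in_tree is made up for this illustration, and Python's regex syntax differs from sed's in places, so treat it as an approximation of the pipeline rather than an exact equivalent.

```python
import re
from pathlib import Path


def substitute_in_tree(root, name_glob, pattern, replacement):
    """Rewrite every regular file under `root` whose name matches `name_glob`.

    Hypothetical helper: returns the list of files that were modified.
    """
    changed = []
    for path in sorted(Path(root).rglob(name_glob)):
        if not path.is_file():
            continue  # mirrors `-type f`
        original = path.read_text()
        updated = re.sub(pattern, replacement, original)
        if updated != original:
            path.write_text(updated)  # in-place edit, like `sed -i`
            changed.append(path)
    return changed


# Same intent as the shell command above (left commented out because it edits files):
# substitute_in_tree(".", "example*", r"bazinga_", "foo_")
```

Like the shell version, this applies every change blindly, with no preview and no chance to skip a match.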
Uff, but that’s already pretty convoluted and we didn’t even need to use a real
regular expression in it! Is there another way?
codemod is a command line tool developed
at Facebook which replaces the Unix tool soup of find, pipes, xargs, grep and
sed with an intuitive, interactive and iterative interface.
We can install it via pip (pip3 install --user codemod), and then
codemod bazinga_ foo_
will run the given substitution (replace instances of bazinga_ with foo_) in
the current directory. Much simpler to invoke than the standard Unix approach,
but it gets better than that – codemod is interactive, meaning that it walks
us through all occurrences of bazinga_ and displays a preview of each
substitution for review. In mass substitutions, we can hit A once we are
confident that the substitution is correct to apply it automagically to all
remaining matches.
I found that replacing ad hoc chains of find, xargs, grep & sed with
codemod was an immediate quality of life improvement: the tool is pretty natural
to hold right (codemod <replace this> <with that>), unlike the
Unix tool soup, and it’s very satisfying to refactor your way through all matches
by hitting enter repeatedly.
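The value of a preview is easy to illustrate outside of codemod. The sketch below is not how codemod is implemented; it is a minimal stand-in, using only the Python standard library, that renders a substitution as a diff without writing anything. The helper name preview_substitution and the sample text are assumptions made for this example.

```python
import difflib
import re


def preview_substitution(text, pattern, replacement, filename="<text>"):
    """Return a unified diff of what re.sub would change, without applying it."""
    updated = re.sub(pattern, replacement, text)
    diff = difflib.unified_diff(
        text.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile=f"{filename} (before)",
        tofile=f"{filename} (after)",
    )
    return "".join(diff)


# Hypothetical file content, used only to show the shape of the preview.
sample = "int bazinga_ = 0;\nint other_ = 1;\n"
print(preview_substitution(sample, r"bazinga_", "foo_", "example.h"))
```

When nothing matches, the returned diff is an empty string, which is a cheap way to check that a pattern hits anything before running the real tool.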
So what about making changes that cannot be expressed as simple substitutions?
The canonical mechanism for crafting non-trivial find-and-replace operations is
using regular expressions, i.e. a language for defining patterns and
transformations. In the Unix tool soup approach we can enter the regular
expression as the argument given to sed, and codemod likewise supports
regular expressions. Can we somehow make the process of preparing the regex for
codemod easier and, again, more interactive?
regex101 is a web tool allowing us to craft regex-based
transformations via interactive experimentation, and is a great companion to
codemod. We’ll see how the process of finding the right regex works using an
example of reordering function arguments: we want to replace all instances of
RunBazinga(a, b) with RunBazinga(b, a), with a and b possibly being
We go to https://regex101.com/ and start with the settings: we should pick
“Python” as the flavor, because this is the regular expression syntax that
codemod uses internally, and “Substitution” as the mode of operation. Note that
“Python” here is just the regular expression language; it doesn’t matter what
language the codebase we want to transform is written in.
The second thing we should do is to populate the list of example strings to
test our transformation on. We add those examples before starting to write the
regular expression, so that we can get immediate interactive feedback when we
get to that. For example, we may put together a set of examples like this one:
Here we want to make sure that the first three examples are correctly
transformed, with the order of the arguments changed, while the last example
is left unchanged.
Now that we know what we want to do, time to write the first half of the regular
expression – the one that matches the instances to be transformed. We start by
typing RunBazinga in the “Regular expression” field, and immediately see it
matched in the example list below:
Then, we add \( to match the opening parenthesis. If we forget the escape \,
we will see the problem right away thanks to the immediate feedback. We can also
hover over any part of the expression to see the explanation.
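The same escaping mistake can be reproduced locally with Python's re module, which uses the regular expression flavor selected above. This is a small sketch; the helper name compiles is made up for the illustration.

```python
import re


def compiles(pattern):
    """Return True if `pattern` is a valid Python regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles(r"RunBazinga("))   # unescaped: "(" opens a group that is never closed
print(compiles(r"RunBazinga\("))  # escaped: "\(" matches a literal parenthesis
```

The first call prints False and the second prints True.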
Because our transformation will reorder two arguments, we want to capture the
value of each, so that we can then put them together in a different order. The
simplest way to capture something is the magic sequence (.*). As we want to
capture two comma-separated arguments, the resulting expression looks like this:
RunBazinga\((.*), (.*)\). Once more, we get immediate visual feedback at each
step of the way:
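What each group captures can also be checked in a Python session. The test strings below are hypothetical stand-ins for the example list, chosen so that the first three should match and the last should not.

```python
import re

# The matching expression built up in the text.
PATTERN = re.compile(r"RunBazinga\((.*), (.*)\)")

# Hypothetical test strings (assumptions for this sketch).
examples = [
    "RunBazinga(a, b);",
    "RunBazinga(x + 1, y);",
    "RunBazinga(first_arg, second_arg);",
    "RunOther(a, b);",
]

for line in examples:
    match = PATTERN.search(line)
    if match:
        print(f"{line!r}: group 1 = {match.group(1)!r}, group 2 = {match.group(2)!r}")
    else:
        print(f"{line!r}: no match")
```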
Once we have the regular expression used to match instances of the
transformation ready, we move on to write the substitution expression below.
It’s even simpler – we just need to know that \1 can be used to insert the
content matched by the first capture group, \2 for the second capture group,
and so on. In order to reconstruct each instance of RunBazinga(a, b), but with
the arguments swapped, we just write RunBazinga(\2, \1). Once again, the tool
presents a preview of the transformation below the expression.
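Because the chosen flavor is Python, the complete match-and-substitute pair can be dry-run with re.sub before touching any files. This is a minimal sketch; swap_arguments and the sample lines are names and data invented for the example.

```python
import re

PATTERN = r"RunBazinga\((.*), (.*)\)"   # matching expression from the text
REPLACEMENT = r"RunBazinga(\2, \1)"     # substitution expression from the text


def swap_arguments(line):
    """Apply the argument-swapping substitution to a single line."""
    return re.sub(PATTERN, REPLACEMENT, line)


# Hypothetical sample lines (assumptions for this sketch).
for line in ["RunBazinga(a, b);", "RunBazinga(x + 1, y);", "RunOther(a, b);"]:
    print(f"{line}  ->  {swap_arguments(line)}")
```

The first two lines come back with their arguments swapped, and the RunOther line is returned unchanged because the pattern does not match it.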
Once we are happy with the results, we simply run codemod, passing the matching
regular expression as the first argument and the substitution expression as
the second argument:
codemod 'RunBazinga\((.*), (.*)\)' 'RunBazinga(\2, \1)'
Regular expressions will often have corner cases. For example, the
RunBazinga argument reordering will fail on cases where there is a comma in
the subexpressions, e.g. RunBazinga(SubCall(1, 1), SubCall(1, 1)). We leave
finding a way to address this as an exercise for the reader ;).
If your editor setup supports managed refactorings powered by a language
server, that can be even more handy (right click on a function, “reorder
arguments”) and not prone to corner cases as noted above – by all means use
such features when available. The point of the workflow described here is to
have a method that will always be in our back pocket regardless of the
capabilities of the editor setup.
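The comma corner case mentioned above is easy to reproduce in Python, which shows exactly how the greedy (.*) groups split the nested call. The sketch only demonstrates the failure; it does not attempt a fix.

```python
import re

PATTERN = r"RunBazinga\((.*), (.*)\)"
REPLACEMENT = r"RunBazinga(\2, \1)"

nested = "RunBazinga(SubCall(1, 1), SubCall(1, 1))"

# The first greedy group runs up to the LAST ", " in the line,
# so the split lands inside the second SubCall.
print(re.search(PATTERN, nested).groups())
# ('SubCall(1, 1), SubCall(1', '1)')

print(re.sub(PATTERN, REPLACEMENT, nested))
# RunBazinga(1), SubCall(1, 1), SubCall(1)
```

The result is not even valid code, which is a good reason to step through such matches one by one in the interactive review instead of accepting them all at once.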
In these notes we saw how codemod and regex101 can be used to prepare and apply
match-and-transform refactorings in a way that’s interactive, visual, and
doesn’t actually require good mastery of regular expressions.
If you have any thoughts on this workflow or want to share your own approach to
refactoring, please leave us a comment in the section below!