Code transformations using codemod and regex101

Interactive, language-agnostic workflow for match-and-transform code refactoring

By Przemek, November 2020

An essential part of a software engineering workflow is a good method for applying code transformations across a codebase: renaming variables and functions, reordering arguments, changing import statements, etc. Personally I have the following goals in mind when thinking about a refactoring workflow:

  • language- and editor-agnostic: I want to learn at least one general method that works for any codebase and is not tied to a specific editor setup.
  • interactive with visual feedback: I want to see a preview of each substitution before it’s applied.
  • easy to hold correctly: simple text substitutions should be really easy to apply. Any complexity should come from the complexity of the transformation, not from the tooling.

In these notes we describe a workflow that seems to work great for these objectives.

Background: Unix tool soup

Firstly, let’s take a quick look at what we’re not going to do. One standard way of applying transformations across files is to use a combination of standard Unix command line tools with a carefully crafted regular expression. In its simplest form this can look like this:

find . -type f -name 'example*' | xargs sed -i 's/bazinga_/foo_/g'

The command above uses find to locate all files under the current directory whose names start with example, then uses sed to substitute all instances of bazinga_ with foo_. The intended goal is probably to replace all occurrences of a member variable bazinga_ with foo_ in both example.cc and example.h. Probably.
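For comparison, here is a rough Python sketch of what that pipeline does: walk the tree, pick the files whose names start with example, and rewrite each one in place with no preview. The helper name replace_in_tree and the file names in the demo are made up for illustration, and the sketch assumes every matching file is UTF-8 text.

```python
import tempfile
from pathlib import Path


def replace_in_tree(root, name_glob, old, new):
    """Replace every occurrence of `old` with `new` in matching files under root.

    Hypothetical helper: roughly mirrors `find | xargs sed -i`, with no preview.
    Returns the list of files that were modified.
    """
    changed = []
    for path in sorted(Path(root).rglob(name_glob)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")  # assumption: UTF-8 text files
        updated = text.replace(old, new)  # plain substring replace, like s/old/new/g
        if updated != text:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed


# Try it on a throwaway directory rather than on a real checkout.
with tempfile.TemporaryDirectory() as tmp:
    header = Path(tmp, "example.h")
    other = Path(tmp, "notes.txt")
    header.write_text("int bazinga_;\n", encoding="utf-8")
    other.write_text("bazinga_ stays here\n", encoding="utf-8")

    changed = replace_in_tree(tmp, "example*", "bazinga_", "foo_")
    changed_names = [p.name for p in changed]
    header_after = header.read_text(encoding="utf-8")
    other_after = other.read_text(encoding="utf-8")

print(changed_names)  # ['example.h']
```

Like the shell one-liner, this applies every change blindly, which is exactly the behaviour we would like to improve on.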

Uff, but that’s already pretty convoluted and we didn’t even need to use a real regular expression in it! Is there another way?

Enter codemod

codemod is a command line tool developed at Facebook which replaces the Unix tool soup of find, pipes, xargs, grep, sed with an intuitive, interactive and iterative interface.

We can install it via pip (pip3 install --user codemod), and then

codemod bazinga_ foo_

will run the given substitution (replace instances of bazinga_ with foo_) in the current directory. This is much simpler to invoke than the standard Unix approach, and it gets better than that: codemod is interactive, meaning that it walks us through all occurrences of bazinga_ and displays a preview of each substitution for review. In mass substitutions, once we are confident that the substitution is correct, we can hit A to apply it automagically to all remaining matches.
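To make the preview-before-apply idea concrete, here is a minimal sketch that renders a proposed substitution as a unified diff using Python’s standard difflib module. This is not how codemod is implemented; preview_substitution is a hypothetical helper, and the file name shown in the diff header is made up.

```python
import difflib
import re


def preview_substitution(text, pattern, replacement, filename="example.cc"):
    """Return a unified diff showing what re.sub would change in `text`.

    Hypothetical helper for illustration only; nothing is written to disk.
    """
    updated = re.sub(pattern, replacement, text)
    diff = difflib.unified_diff(
        text.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile=filename,
        tofile=filename + " (proposed)",
    )
    return "".join(diff)


source = "int bazinga_ = 0;\nint other = 1;\n"
preview = preview_substitution(source, r"bazinga_", "foo_")
print(preview)
```

The diff contains a removed line with bazinga_ and an added line with foo_, which is the kind of before/after view we want to inspect before accepting a change.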

codemod presents a preview of each match, hit enter to accept and continue

I found that replacing ad hoc chains of find, xargs, grep & sed with codemod was an immediate quality of life improvement: the tool is pretty natural to hold right (codemod <replace this> <with that>), quite unlike the Unix tool soup, and it’s very satisfying to refactor your way through all matches by hitting enter repeatedly.

Regular expressions

XKCD 208: Regular Expressions.

So what about making changes that cannot be expressed as simple substitutions? The canonical mechanism for crafting non-trivial find-and-replace operations is regular expressions, i.e. a language for defining patterns and transformations. In the Unix tool soup approach we can pass the regular expression as the argument given to sed, and codemod supports regular expressions as well. Can we somehow make the process of preparing the regex for codemod easier and, again, more interactive?

Enter regex101

Regex101 is a web tool allowing us to craft regex-based transformations via interactive experimentation, and is a great companion to codemod. We’ll see how the process of finding the right regex works using an example of reordering function arguments: we want to replace all instances of RunBazinga(a, b) with RunBazinga(b, a), with a and b possibly being non-trivial expressions.

We go to https://regex101.com/ and start with the settings. We should pick “Python” as the flavor, because this is the regular expression syntax that codemod uses internally, and “Substitution” as the mode of operation. Note that “Python” here refers only to the regular expression dialect; it doesn’t matter what language the codebase we want to transform is written in.

The second thing we should do is to populate the list of example strings to test our transformation on. We add those examples before starting to write the regular expression, so that we can get immediate interactive feedback when we get to that. For example, we may put together a set of examples like this one:

RunBazinga(foo, 42)
RunBazinga(foo, 21 * 2)
RunBazinga(foo, (21 * 2))
DispatchRobots(foo, 42)

Here we will want to make sure that the first three examples are correctly transformed, with the order of the arguments changed, while the last example remains unchanged.

Now that we know what we want to do, it’s time to write the first half of the regular expression: the one that matches the instances to be transformed. We start by typing RunBazinga in the “Regular expression” field, and immediately see it matched in the example list below.

Then, we add \( to match the opening parenthesis. If we forget the escaping backslash, the tool flags the error right away. We can also hover over any part of the expression to see an explanation.
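The same escaping rule can be checked outside of regex101 with Python’s re module, which uses the same regex flavor we selected. A small sketch, where the sample input string is made up for illustration:

```python
import re

# An unescaped "(" opens a capture group, so this pattern is incomplete.
try:
    re.compile("RunBazinga(")
    unescaped_ok = True
except re.error:
    unescaped_ok = False

# Escaping the parenthesis makes it match a literal "(".
escaped = re.compile(r"RunBazinga\(")
found = escaped.search("x = RunBazinga(foo, 42)") is not None

# re.escape adds the backslash for us.
auto_escaped = re.escape("RunBazinga(")

print(unescaped_ok, found, auto_escaped)
```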

Because our transformation will reorder two arguments, we want to capture the value of each, so that we can then put them together in a different order. The simplest way to capture something is the magic sequence (.*), a capture group that matches any run of characters on a line. As we want to capture two comma-separated arguments, the resulting expression looks like this: RunBazinga\((.*), (.*)\). Once more, we get immediate visual feedback each step of the way.

Once we have the regular expression used to match instances of the transformation ready, we move on to write the substitution expression below. It’s even simpler – we just need to know that \1 can be used to insert the content matched by the first capture group, \2 for the second capture group, and so on. In order to reconstruct each instance of RunBazinga(a, b), but with the arguments swapped, we just write RunBazinga(\2, \1). Once again, the tool presents a preview of the transformation below the expression.
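As a cross-check outside the browser, the same pattern and substitution can be run over the example list with Python’s re.sub. This is only a sketch for verifying the expression; the variable names are arbitrary.

```python
import re

PATTERN = r"RunBazinga\((.*), (.*)\)"
REPLACEMENT = r"RunBazinga(\2, \1)"

examples = [
    "RunBazinga(foo, 42)",
    "RunBazinga(foo, 21 * 2)",
    "RunBazinga(foo, (21 * 2))",
    "DispatchRobots(foo, 42)",
]

results = [re.sub(PATTERN, REPLACEMENT, line) for line in examples]
for before, after in zip(examples, results):
    print(f"{before}  ->  {after}")
```

The first three lines come out with their arguments swapped, and the DispatchRobots line is left untouched because the pattern never matches it.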

Once we are happy with the results, we simply run codemod, passing the matching regular expression as the first argument and the substitution expression as the second argument:

codemod "RunBazinga\((.*), (.*)\)" "RunBazinga(\2, \1)"

Caveats

There are a few caveats to note on this approach:

  • regular expressions will often have corner cases. For example, the RunBazinga argument reordering will fail on cases where there is a comma in the subexpressions, e.g. RunBazinga(SubCall(1, 1), SubCall(1, 1)). We leave finding a way to address this as an exercise for the reader ;).
  • if your editor setup supports managed refactorings powered by a language server, that can be even handier (right click on a function, “reorder arguments”) and is not prone to corner cases like the one noted above. By all means use such features when available. The point of the workflow described here is to have a method that is always in our back pocket, regardless of the capabilities of the editor setup.
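The first caveat is easy to reproduce with Python’s re module. The sketch below only demonstrates the failure on the nested-call example; it deliberately does not attempt a fix. Because .* is greedy, the first group runs up to the last comma on the line, so the groups no longer line up with the real arguments.

```python
import re

PATTERN = r"RunBazinga\((.*), (.*)\)"
REPLACEMENT = r"RunBazinga(\2, \1)"

line = "RunBazinga(SubCall(1, 1), SubCall(1, 1))"
match = re.search(PATTERN, line)
first_arg, second_arg = match.group(1), match.group(2)
mangled = re.sub(PATTERN, REPLACEMENT, line)

print(repr(first_arg))   # 'SubCall(1, 1), SubCall(1'
print(repr(second_arg))  # '1)'
print(mangled)           # RunBazinga(1), SubCall(1, 1), SubCall(1)
```

A correct swap would have left this particular line looking the same, since both arguments are identical; instead the call is mangled, which is the kind of change the interactive preview helps us catch before accepting it.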

Summary

In these notes we saw how codemod and regex101 can be used to prepare and apply match-and-transform refactorings in a way that’s interactive, visual, and doesn’t actually require good mastery of regular expressions.

If you have any thoughts on this workflow or want to share your own approach to refactoring, please leave us a comment in the section below!

Przemek Pietrzkiewicz