refactoring – On Responsive Development

Refactoring macros to functions

C macros are probably one of my least favourite things, so when I encounter them my spontaneous reaction is to do away with them. They have a nasty resemblance to functions but completely different semantics (text replacement), and they are usually almost impossible to get under test. So it’s better to get rid of them.
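To make the difference in semantics concrete, here is a minimal sketch. The names MAX_MACRO, max_function and next_value are made up for illustration and have nothing to do with c-xrefactory. Because the macro pastes the text of its argument wherever the parameter appears, an argument with a side effect can end up being evaluated more than once, whereas a function evaluates each argument exactly once.

```c
#include <assert.h>

/* Hypothetical macro: each argument is pasted into the body as text. */
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* The same logic as a function: each argument is evaluated exactly once. */
static int max_function(int a, int b) {
    return a > b ? a : b;
}

/* Counts how many times next_value() has been called. */
static int call_count = 0;

static int next_value(void) {
    call_count++;
    return 10;
}

/* Returns how many times next_value() runs when passed to the macro. */
static int calls_made_by_macro(void) {
    call_count = 0;
    int result = MAX_MACRO(next_value(), 5);
    assert(result == 10);
    return call_count;
}

/* Returns how many times next_value() runs when passed to the function. */
static int calls_made_by_function(void) {
    call_count = 0;
    int result = max_function(next_value(), 5);
    assert(result == 10);
    return call_count;
}
```

In the macro version the text `next_value()` appears both in the condition and in the selected branch, so it runs twice; the function version calls it once.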

In my current hobby project, refactoring the C refactoring tool c-xrefactory (https://github.com/thoni56/c-xrefactory) to a maintainable state, I encountered a particularly macro-infested area, the lexer. I’ll leave my speculations about how the code got this way for another day, and focus on a fairly mechanical sequence of steps that I have been using to refactor a number of macros into nice, clean, testable C functions.

Here is a short excerpt of some code using a macro:

static void processLine(void) {
    Lexem lexem;
    int l, h, v=0, len;
    Position pos;

    ...

    PassLex(cInput.currentLexem, lexem, l, v, h, pos, len, 1);
    if (lexem != CONSTANT) return;
...

Here’s (again, a part of) the macro (yes, it’s long…):

#define PassLex(input, lexem, lineval, val, hash, pos, length, linecount) { \
        if (lexem > MULTI_TOKENS_START) {                                 \
            if (isIdentifierLexem(lexem)){                                \
                char *tmpcc,tmpch;                                        \
                hash = 0;                                                 \
                for(tmpcc=input,tmpch= *tmpcc; tmpch; tmpch = *++tmpcc) { \
                    SYMTAB_HASH_FUN_INC(hash, tmpch);                     \
                }                                                         \
                SYMTAB_HASH_FUN_FINAL(hash);                              \
                tmpcc ++;                                                 \
                GetLexPosition((pos),tmpcc);                              \
                input = tmpcc;                                            \
            } else if (lexem == STRING_LITERAL) {                         \
                char *tmpcc,tmpch;                                        \
                for(tmpcc=input,tmpch= *tmpcc; tmpch; tmpch = *++tmpcc);  \
                tmpcc ++;                                                 \
                GetLexPosition((pos),tmpcc);                              \
                input = tmpcc;                                            \
            } else if (lexem == LINE_TOK) {                               \
                GetLexToken(lineval,input);                               \
...

As you can see there are multiple levels of macros invoking macros invoking macros. So it is extremely difficult and frustrating to try to follow the expansion manually. Of course, you could let the C preprocessor do it, which I tried, but that turns the code into something almost impossible to recognize, so that did not work out well.

The strategy I devised was to first manually expand one of the invocations, then do any adjustments necessary to be able to do the third step, extracting to a “real” C function. Luckily, c-xrefactory is fully operational so I could use it to do the extraction.

Here are the detailed steps:

1. Align local variable names with macro arguments

Macro semantics are textual replacement, i.e. a formal argument of a macro stands for the text of the actual argument in the invocation, not its value or even its address. A clever way to neutralize this replacement is to use actual arguments whose names are exactly the same as the formal arguments; then you can just paste the body of the macro in place of the invocation.

In the example we would change

    Lexem lexem;
    int l, h, v=0, len;
    Position pos;
...
    PassLex(cInput.currentLexem, lexem, l, v, h, pos, len, 1);

so that variable names align, to

    Lexem lexem;
    int lineval, hash, val=0, length;
    Position pos;
...
    PassLex(cInput.currentLexem, lexem, lineval, val, hash, pos, length, 1);

Now, if we pasted the body of the macro here, most uses of the arguments in the body would simply match local variables.

2. Create adapter variables

The first and last actual arguments are not local variables that we can change to match. But we can create such local variables:

    Lexem lexem;
    int lineval, hash, val=0, length;
    Position pos;
    ...
    {
        char *input = cInput.currentLexem;
        int linecount = 1;
        PassLex(input, lexem, lineval, val, hash, pos, length, linecount);
        ...
        cInput.currentLexem = input;
    }
    ...

Of course, in the general case you would also have to restore any modified data, in this example copying input back to cInput.currentLexem.

And the good thing is that at this point all your tests should still pass. You have test coverage for this code, right?
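The first two steps can be tried out on something much smaller than PassLex. The following is only a sketch: SKIP_WORD, struct Buffer and skip_one_word are hypothetical names invented for this illustration. The local variable is named `length` to match the macro’s formal argument (step 1), and the struct member, which cannot be renamed, is copied into an adapter variable called `input` and copied back afterwards (step 2).

```c
#include <assert.h>

/* Hypothetical macro: advances `input` past one NUL-terminated word
   and stores the number of characters skipped in `length`. */
#define SKIP_WORD(input, length) {              \
        char *tmp = input;                      \
        while (*tmp != '\0') tmp++;             \
        length = (int)(tmp - input);            \
        input = tmp + 1;                        \
    }

/* Hypothetical stand-in for a structure like cInput. */
struct Buffer { char *current; };

static int skip_one_word(struct Buffer *buffer) {
    assert(buffer != 0 && buffer->current != 0);
    int length;                         /* step 1: name matches the formal argument */
    {
        char *input = buffer->current;  /* step 2: adapter variable */
        SKIP_WORD(input, length);
        buffer->current = input;        /* copy the modified value back */
    }
    return length;
}
```

Since every actual argument is now spelled exactly like its formal argument, the macro body could be pasted in place of the invocation without any further editing.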

3. Replace the invocation with the body

As the heading says, comment out the invocation and paste the body of the macro just after it:

    Lexem lexem;
    int lineval, hash, val=0, length;
    Position pos;
    ...
    {
        char *input = cInput.currentLexem;
        int linecount = 1;
        // PassLex(input, lexem, lineval, val, hash, pos, length, linecount);
        if (lexem > MULTI_TOKENS_START) {                                 \
            if (isIdentifierLexem(lexem)){                                \
                char *tmpcc,tmpch;                                        \
                hash = 0;                                                 \
                for(tmpcc=input,tmpch= *tmpcc; tmpch; tmpch = *++tmpcc) { \
                    SYMTAB_HASH_FUN_INC(hash, tmpch);                     \
                }                                                         \
                SYMTAB_HASH_FUN_FINAL(hash);                              \
                tmpcc ++;                                                 \
                GetLexPosition((pos),tmpcc);                              \
                input = tmpcc; 
        ...
    }

As we can see, the parameters to the macro will now directly correspond to local variables in the context of the replacement.

Run your tests! They should still pass.

4. Clean up

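Now we can clean up the code in preparation for the extraction, for example by removing the line-continuation backslashes that came along with the pasted macro body.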

5. Extract the function

With the help of a C refactoring browser, like c-xrefactory, that does semantic flow analysis, an extraction will be smooth and optimal in the sense that variables that are not input or output will be restricted to local use inside the function.

In this particular example, extracting a C function from the first line after the commented out invocation to just before any restoring of values will give us


    Lexem lexem;
    int lineval, hash, val=0, length;
    Position pos;
    ...
    {
        char *input = cInput.currentLexem;
        int linecount = 1;
        // PassLex(input, lexem, lineval, val, hash, pos, length, linecount);
        input = passLex(input, lexem, pos, linecount);
        cInput.currentLexem = input;
    }
    ...

And you have turned a macro into a function!
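One payoff is that the result can be called directly from a test. As a small sketch, assume a hypothetical toy macro SKIP_WORD(input, length), not the real PassLex, whose body walks past one NUL-terminated word, stores the word’s length in `length` and leaves `input` pointing just after the terminator. Extracting that body could give a function like the one below; skip_word is an invented name and the signature is only what I would expect an extraction to produce, not actual tool output.

```c
#include <assert.h>
#include <stddef.h>

/* Extracted from the body of the hypothetical SKIP_WORD macro.
   `input` is read and then modified by the body, so it becomes both
   a parameter and the return value; `length` is only written, so it
   becomes an out-parameter. The temporary stays local. */
static char *skip_word(char *input, int *length) {
    assert(input != NULL && length != NULL);
    char *tmp = input;
    while (*tmp != '\0')
        tmp++;
    *length = (int)(tmp - input);
    return tmp + 1;     /* first character after the terminating NUL */
}
```

A unit test can now feed it a small buffer and check the returned pointer and length, which is exactly what was so hard to do with the macro.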

We can, of course, do some final touch up, like doing away with the linecount and input temporary variables by using the values directly as arguments to the function.

    Lexem lexem;
    int lineval, hash, val=0, length;
    Position pos;
    ...
    // PassLex(input, lexem, lineval, val, hash, pos, length, linecount);
    cInput.currentLexem = passLex(cInput.currentLexem, lexem, pos, 1);
    ...

I was really lucky with this example because not only did I get a function instead of a macro, but also the argument list became shorter, and some local variables were no longer needed. I’m not sure this was because of the lousy macro in the first place, but I think this strategy could be of general use for cleaning up messy old C code.

Actually, I did this twice for two different invocations and extracted to two different functions. As the code for the two was identical (and all the tests passed) I was confident that the operation had succeeded and could proceed with just replacing the other calls after the established pattern.

Refactoring: Variable interdependencies

I’m working on a legacy project with catastrophically bad code. Two small examples are variable names like ‘ttt’, ‘iii’ and so on, and the use of such variables for multiple purposes. This makes it very hard to trace data dependencies and understand which dependencies are real.

In one instance I found code similar to the following

if (function1(&variable)) {
    function2(&variable);
}

The ‘variable’ is obviously an “in/out argument” but it is not obvious whether it actually transfers information from ‘function1’ to ‘function2’. This seems to indicate a tight connection between the two functions.

If there are tests that cover this, one simple way to make that connection obvious is to separate the ‘variable’ into two.

if (function1(&variable1)) {
    function2(&variable2);
}

This will immediately make it evident whether the out-data from ‘function1’ is used in the second call. Passing tests will, of course, not guarantee that there is no context in which it would be, but if your tests fail, then you have a case where it really is.
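Here is a small self-contained sketch of the trick. The bodies of function1 and function2 are invented stand-ins, chosen so that the second function really does depend on what the first one wrote; in real legacy code that is precisely the thing you do not know yet.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in: writes a value through the pointer. */
static bool function1(int *value) {
    assert(value != 0);
    *value = 42;
    return true;
}

/* Hypothetical stand-in: reads what it is given, then writes. */
static void function2(int *value) {
    assert(value != 0);
    *value += 1;
}

/* Original shape: one variable shared by both calls. */
static int shared_variable(void) {
    int variable = 0;
    if (function1(&variable)) {
        function2(&variable);
    }
    return variable;
}

/* After the split: function2 no longer sees what function1 wrote. */
static int split_variables(void) {
    int variable1 = 0;
    int variable2 = 0;
    if (function1(&variable1)) {
        function2(&variable2);
    }
    return variable2;
}
```

With these stand-ins the two versions return different results, so any test that checks the outcome of the original code would fail after the split and thereby expose the dependency.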

Refactoring often depends on your knowledge about the code; this is one small trick to learn more about it.
