Refactoring macros to functions

C macros are probably one of my least favourite things… So when I encounter them my spontaneous reaction is to do away with them. They bear a nasty resemblance to functions but have completely different semantics (text replacement), and they are usually almost impossible to get under test. So it’s better to get rid of them.

In my current hobby project, refactoring the C refactoring tool c-xrefactory (https://github.com/thoni56/c-xrefactory) to a maintainable state, I encountered a particularly macro-infested area, the lexer. I’ll leave my speculations about how the code got this way for another day, and focus on a fairly mechanical sequence of steps that I have been using to refactor a number of macros into nice, clean, testable C functions.

Here is a short excerpt of some code using a macro:

static void processLine(void) {
    Lexem lexem;
    int l, h, v=0, len;
    Position pos;

    ...

    PassLex(cInput.currentLexem, lexem, l, v, h, pos, len, 1);
    if (lexem != CONSTANT) return;
...

Here’s (again, a part of) the macro (yes, it’s long…) :

#define PassLex(input, lexem, lineval, val, hash, pos, length, linecount) { \
        if (lexem > MULTI_TOKENS_START) {                                 \
            if (isIdentifierLexem(lexem)){                                \
                char *tmpcc,tmpch;                                        \
                hash = 0;                                                 \
                for(tmpcc=input,tmpch= *tmpcc; tmpch; tmpch = *++tmpcc) { \
                    SYMTAB_HASH_FUN_INC(hash, tmpch);                     \
                }                                                         \
                SYMTAB_HASH_FUN_FINAL(hash);                              \
                tmpcc ++;                                                 \
                GetLexPosition((pos),tmpcc);                              \
                input = tmpcc;                                            \
            } else if (lexem == STRING_LITERAL) {                         \
                char *tmpcc,tmpch;                                        \
                for(tmpcc=input,tmpch= *tmpcc; tmpch; tmpch = *++tmpcc);  \
                tmpcc ++;                                                 \
                GetLexPosition((pos),tmpcc);                              \
                input = tmpcc;                                            \
            } else if (lexem == LINE_TOK) {                               \
                GetLexToken(lineval,input);                               \
...

As you can see there are multiple levels of macros invoking macros invoking macros. So it is extremely difficult and frustrating to try to follow the expansion manually. Of course, you could let the C preprocessor do it, which I tried, but that turns the code into something almost impossible to recognize, so that did not work out well.

The strategy I devised was to first manually expand one of the invocations, then do any adjustments necessary to be able to do the third step, extracting to a “real” C function. Luckily, c-xrefactory is fully operational so I could use it to do the extraction.

Here are the detailed steps:

1. Align local variable names with macro arguments

Macro semantics are textual replacement, i.e. a formal argument of a macro stands for the text you write as the actual argument in the invocation, not its value or even its address. A clever way to neutralise this replacement is to use actual arguments whose names are exactly the same as the formal arguments; then you can just paste the body of the macro in place of the invocation.
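To see what textual replacement means in practice, here is a minimal sketch. The names ADD_ONE_MACRO and add_one_function are made up for the illustration and have nothing to do with c-xrefactory. The macro changes the caller’s variable because the argument is pasted in as text, while the function only ever gets a copy of the value.

```c
#include <assert.h>

/* Hypothetical macro: 'counter' is replaced by the text of the actual argument */
#define ADD_ONE_MACRO(counter) { counter = counter + 1; }

/* Hypothetical function: 'counter' is a copy of the caller's value */
void add_one_function(int counter) {
    counter = counter + 1;      /* only the local copy changes */
    (void)counter;
}

/* Returns a local variable after it has been handed to the macro */
int value_after_macro(void) {
    int a = 0;
    ADD_ONE_MACRO(a);           /* expands to { a = a + 1; } */
    return a;
}

/* Returns a local variable after it has been handed to the function */
int value_after_function(void) {
    int b = 0;
    add_one_function(b);        /* b itself is untouched */
    return b;
}

/* Self-check that can be called from a test or from main() */
void check_macro_versus_function(void) {
    assert(value_after_macro() == 1);
    assert(value_after_function() == 0);
}
```

This is also why a macro cannot simply be renamed into a function: every argument the body assigns to has to become a return value or a pointer parameter.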

In the example we would change

    Lexem lexem;
    int l, h, v=0, len;
    Position pos;
...
    PassLex(cInput.currentLexem, lexem, l, v, h, pos, len, 1);

so that variable names align, to

    Lexem lexem;
    int lineval, hash, val=0, length;
    Position pos;
...
    PassLex(cInput.currentLexem, lexem, lineval, val, hash, pos, length, 1);

Now, if we were to paste the body of the macro here, most of the uses of arguments in the body would just match local variables.

2. Create adapter variables

The first and last actual arguments are not local variables that we can change to match. But we can create such local variables:

    Lexem lexem;
    int lineval, hash, val=0, length;
    Position pos;
    ...
    {
        char *input = cInput.currentLexem;
        int linecount = 1;
        PassLex(input, lexem, lineval, val, hash, pos, length, linecount);
        ...
        cInput.currentLexem = input;
    }
    ...

Of course, in the general case you would also have to restore any modified data, in this example copying input back to cInput.currentLexem.

And the good thing is that at this point all your tests should still pass. You have test coverage for this code, right?
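Steps 1 and 2 can be tried out on something much smaller than PassLex. The following is only a sketch under my own assumptions: SKIP_STRING, struct Input and skip_with_adapter are hypothetical stand-ins, not code from c-xrefactory. It shows a local variable that already has the name of a formal argument, an adapter variable for the argument that is not a plain local, and the copy back afterwards.

```c
#include <assert.h>

/* Hypothetical macro standing in for something like PassLex: moves 'input'
   past a NUL-terminated string and counts its characters in 'length' */
#define SKIP_STRING(input, length) {    \
        length = 0;                     \
        while (*input != '\0') {        \
            input++;                    \
            length++;                   \
        }                               \
        input++;                        \
    }

/* Hypothetical stand-in for a structure like cInput */
struct Input {
    const char *currentLexem;
};

/* Skips one string in 'in' and returns its length */
int skip_with_adapter(struct Input *in) {
    int length;                                 /* already named like the formal argument */
    {
        const char *input = in->currentLexem;   /* adapter variable */
        SKIP_STRING(input, length);
        in->currentLexem = input;               /* copy the modified value back */
    }
    return length;
}

/* Self-check: "abc" is three characters and the next string starts at index 4 */
void check_skip_with_adapter(void) {
    static const char buffer[] = "abc\0def";
    struct Input in = { buffer };
    assert(skip_with_adapter(&in) == 3);
    assert(in.currentLexem == buffer + 4);
}
```

Since every actual argument is now spelled exactly like the formal one, the invocation can be swapped for the macro body without touching a single identifier.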

3. Replace the invocation with the body

As the heading says, comment out the invocation and paste the body of the macro just after it:

    Lexem lexem;
    int lineval, hash, val=0, length;
    Position pos;
    ...
    {
        char *input = cInput.currentLexem;
        int linecount = 1;
        // PassLex(input, lexem, lineval, val, hash, pos, length, linecount);
        if (lexem > MULTI_TOKENS_START) {                                 \
            if (isIdentifierLexem(lexem)){                                \
                char *tmpcc,tmpch;                                        \
                hash = 0;                                                 \
                for(tmpcc=input,tmpch= *tmpcc; tmpch; tmpch = *++tmpcc) { \
                    SYMTAB_HASH_FUN_INC(hash, tmpch);                     \
                }                                                         \
                SYMTAB_HASH_FUN_FINAL(hash);                              \
                tmpcc ++;                                                 \
                GetLexPosition((pos),tmpcc);                              \
                input = tmpcc; 
        ...
    }

As we can see, the parameters to the macro will now directly correspond to local variables in the context of the replacement.

Run your tests! They should still pass.

4. Clean up

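Now we can clean up the code in preparation for the extraction: remove the trailing line-continuation backslashes left over from the macro definition and fix the indentation.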

5. Extract the function

With the help of a C refactoring browser, like c-xrefactory, that does semantic flow analysis, an extraction will be smooth and optimal in the sense that variables that are not input or output will be restricted to local use inside the function.

In this particular example, extracting a C function from the first line after the commented out invocation to just before any restoring of values will give us


    Lexem lexem;
    int lineval, hash, val=0, length;
    Position pos;
    ...
    {
        char *input = cInput.currentLexem;
        int linecount = 1;
        // PassLex(input, lexem, lineval, val, hash, pos, length, linecount);
        input = passLex(input, lexem, pos, linecount);
        cInput.currentLexem = input;
    }
    ...

And you have turned a macro into a function!
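For a feel of what an extraction decides, here is a hand-written toy version, not actual tool output. HASH_AND_SKIP and hashAndSkip are hypothetical, and c-xrefactory may well choose a different signature than I do here. The in/out variable becomes parameter plus return value, as in the passLex call above, the pure output goes through a pointer (my assumption), and the temporary stays local. The last function runs both versions side by side, which is a cheap way to gain confidence that nothing changed.

```c
#include <assert.h>

/* Hypothetical macro, before: all arguments are pasted in as text */
#define HASH_AND_SKIP(input, hash) {                    \
        const char *tmp;                                \
        hash = 0;                                       \
        for (tmp = input; *tmp != '\0'; tmp++)          \
            hash = hash * 31 + (unsigned char)*tmp;     \
        input = tmp + 1;                                \
    }

/* After: 'input' is read and written, so it goes in as a parameter and comes
   back as the return value; 'hash' is only written, so it goes out through a
   pointer; 'tmp' is neither input nor output and stays local */
const char *hashAndSkip(const char *input, unsigned *hash) {
    const char *tmp;
    *hash = 0;
    for (tmp = input; *tmp != '\0'; tmp++)
        *hash = *hash * 31 + (unsigned char)*tmp;
    return tmp + 1;
}

/* Runs both versions on the same text and returns 1 if they agree */
int sameResult(const char *text) {
    const char *macroInput = text;
    unsigned macroHash;
    HASH_AND_SKIP(macroInput, macroHash);

    unsigned functionHash;
    const char *functionInput = hashAndSkip(text, &functionHash);

    return macroInput == functionInput && macroHash == functionHash;
}

/* Self-check on an ordinary and an empty string */
void check_same_result(void) {
    assert(sameResult("identifier"));
    assert(sameResult(""));
}
```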

We can, of course, do some final touch up, like doing away with the linecount and input temporary variables by using the values directly as arguments to the function.

    Lexem lexem;
    int lineval, hash, val=0, length;
    Position pos;
    ...
    // PassLex(input, lexem, lineval, val, hash, pos, length, linecount);
    cInput.currentLexem = passLex(cInput.currentLexem, lexem, pos, 1);
    ...

I was really lucky with this example because not only did I get a function instead of a macro, but also the argument list became shorter, and some local variables were no longer needed. I’m not sure this was because of the lousy macro in the first place, but I think this strategy could be of general use for cleaning up messy old C code.

Actually, I did this twice for two different invocations and extracted to two different functions. As the code for the two was identical (and all the tests passed) I was confident that the operation had succeeded and could proceed with just replacing the other calls after the established pattern.

Refactoring: Variable interdependencies

I’m working on a legacy project with catastrophically bad code. Two small examples are variable names like ‘ttt’, ‘iii’ and so on, and the use of such variables for multiple purposes. This makes it very hard to trace data dependencies and to understand which dependencies are real.

In one instance I found code similar to the following

if (function1(&variable)) {
    function2(&variable);
}

The ‘variable’ is obviously an “in/out argument” but it is not obvious whether it actually transfers information from ‘function1’ to ‘function2’. This seems to indicate a tight connection between the two functions.

If there are tests that cover this, one simple way to make that connection obvious is to separate the ‘variable’ into two.

if (function1(&variable1)) {
    function2(&variable2);
}

This will immediately make it evident if the out-data from ‘function1’ is used in the second call. It will, of course, not guarantee that there is no context in which it would be, but if your tests fail, then you have a case where it really was.
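A minimal sketch of the trick, with function bodies that I have invented purely for the illustration: here ‘function2’ silently depends on what ‘function1’ left behind, so the result changes as soon as the variable is split, and any test covering the result would fail.

```c
#include <assert.h>

/* Hypothetical: reports success and leaves a value in its argument */
int function1(int *value) {
    *value = 42;
    return 1;
}

/* Hypothetical: silently builds on whatever is already in its argument */
void function2(int *value) {
    *value = *value + 1;
}

/* Original shape: one variable shared by both calls */
int with_shared_variable(void) {
    int variable = 0;
    if (function1(&variable)) {
        function2(&variable);
    }
    return variable;
}

/* After the split: function2 no longer sees the output of function1 */
int with_split_variables(void) {
    int variable1 = 0, variable2 = 0;
    if (function1(&variable1)) {
        function2(&variable2);
    }
    return variable2;
}

/* Self-check: the two shapes disagree, which exposes the hidden dependency */
void check_split_reveals_dependency(void) {
    assert(with_shared_variable() == 43);
    assert(with_split_variables() == 1);
}
```

Had the two results been equal for every covered case, that would have been a hint, though no proof, that the variable was merely being reused.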

Refactoring often depends on your knowledge about the code, and this is one small trick to learn more about it.

TDD Series Part 1 – First write a test? Not.

I’m an avid TDD’er. I just love it. I teach it a lot. To those who think they know and to beginners. And it is not so easy to grok. And maybe it’s an acquired taste. I feel it is an essential tool in a programmer’s bag of tricks, and am amazed that so few actually know it well enough. And that there are still those that actively denounce it. It is not universally applicable, true, but in my mind, if you consider yourself a programmer, you must know how to do it fluently. Period.

In a few installments I plan on showing the way I do it. And possibly allow a few others to acquire the taste, and some proficiency. Maybe even some more seasoned coders out there will pick up some tricks. Or teach me some…

For me TDD is very much about three words: focus, confidence and pace.

TDD done right allows, and forces, you to focus. We all know that focus is key to progress. But also that you usually only manage to focus on smaller, clear, things. Big, fuzzy things are hard to pinpoint, to focus on and of course, to get any progress on. So the first step to TDD is partitioning into smaller chunks.

Let’s get started. You have probably read, or heard, that the first thing that you should do is write a test.

Wrong.


Texas Hold’em with Cgreen

I’ve been involved with the development of Cgreen for a few years, so when Software Craftsmanship Linköping asked me to do a TDD session for them I obviously chose that as my basis.

Cgreen is nice for allowing modern TDDing in C (and C++) using a fluent API, mocks and the rest.

I talked and we coded. I selected the Texas Hold’em kata, which is interesting because of the multitude of dimensions that need to be covered. It is also a good kata to repeat in order to experiment with a different order of the tests. (Actually, I did it from memory and got it wrong: players have 2 private cards and community cards are dealt until a player folds. So the tests below are inaccurate.)

Xrefactory to c-xref – refactoring C with Emacs

For a long time, probably around a decade, I have been using a refactoring tool built by Slovakian researcher Marián Vittek. It was probably one of the first refactoring tools to cross the “Refactoring Rubicon”.

It is an Emacs plugin that adds refactoring, navigation, completion and crossreference functionality for the C language. There is also some Java support, and they built a commercial C++ version.

Mostly it just works. Of course it has some trouble with heavy macro usage, and it’s missing a few basic refactorings; e.g. it doesn’t correctly extract an expression to a function returning a value, so you need to edit the result. I hadn’t really thought much about it until I started developing on a new computer and took a quick look for a new version. I knew the project was kind of hibernating so I hadn’t been up-to-date with events.

To my sadness the xref-tech site was no more. After some googling I found that the C version had survived as a SourceForge project that Marián had created back in 2009.

This post is very much a payback for the good service Xrefactory has given me over many years. And a strong recommendation to you to look into c-xref if you are into Emacs and C programming.

Do you know of any similar tools for C?

UPDATE: I’ve been working on trying to resurrect the project. It is a challenge to understand and to evolve, but I’m making progress. Follow, or participate in, the project at GitHub.

From CVS to Git, the short story

I decided, finally, to move my main hobby project from CVS to Git. I wasn’t new to Git but I hadn’t worked with it for real. So I thought it was a good idea to start doing that and learning the ropes.

Of course there were two parts to that. The first was migrating the repository, which this blog post will not talk about at all. My only tip is to do that on a genuine Linux system; on Cygwin I ran into a lot of problems.

The second part was to start learning to use Git on a day to day basis. So here’s a very short tutorial on Git for CVS users.


Debugging memory leaks with Valgrind and GDB

While debugging memory leaks in one of my private projects, I discovered that GDB and Valgrind can actually operate together in a very nice fashion.

GDB is capable of debugging remote programs, like for embedded device software development, by using a remote protocol to communicate with a proxy within the device.

Valgrind is an almost necessary tool if you are working in an environment of dynamically allocated and returned memory. It follows each allocation in your program and tracks it to see if it is returned properly, continues to be referenced or is lost in space, which is a ‘memory leak’. And as with any leak, given enough time you will drown: in this case the program requires more and more memory, until either it is eating up your whole computer or you run out of memory.

XP2012

I’m just back from XP2012 which was held at Malmömässan in Malmö, Sweden. So when it’s “at home” you just can’t miss out. I stayed almost the whole week and felt injected with a lot of inspiration, which is exhausting.

While, in my eyes, the program didn’t look quite at the same level as some of the previous XP’s, I had some very inspiring sessions, including an information-packed keynote by Dave Snowden, a promising workshop with Tobias Anderberg and Ola Ellnestam, a presentation on agile contracts by lawyer Lars Ahrred which brought hope and tips for future cooperation, and a workshop with Ivana Gancheva and Bent Myllerup on coaching/teaching balance and the new concept “Validated Influencing”.

An Open Space session about “Deliberate Practice” initiated by me became an extreme success thanks to Willem Larsen, David Campey and Markku Åhman. Thanks guys.

Willem also did a language/fluency hunting session which was inspiring to see. I’ll take away the “Live” tool (as a participant being engaged, inspired, finding forms that make us get into that operating mode).

The conference dinner prepared by Jan Boris-Möller (an engineer turned master chef challenging almost every cooking preconception) was so much more interesting after his presentation and the following conversation with him on the height of chefs’ hats, the chemistry and physics of cooking, and waiters selling what works (rather than what the customer thinks he wants).

Also my own presentation, a remake of my “Agile Analysis” titled “Continuous Analysis, or Kanban for Product Owners”, got some traction.