Refactoring macros to functions

C macros are probably one of my least favourite things… So when I encounter them my spontaneous reaction is to do away with them. They have a nasty resemblance to functions, but their semantics are completely different (textual replacement), and they are usually almost impossible to get under test. So it’s better to get rid of them.
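
A textbook illustration of how textual replacement differs from function call semantics (not from this project) is a macro whose argument has a side effect:

    #include <stdio.h>

    /* Textbook example, not from c-xrefactory: the macro pastes its
       argument expression in twice, side effects included. */
    #define SQUARE(x) ((x) * (x))

    static int square(int x) { return x * x; }

    int main(void) {
        int i = 3, j = 3;
        int a = square(j++);   /* j is incremented exactly once */
        int b = SQUARE(i++);   /* ((i++) * (i++)): undefined behaviour */
        printf("a=%d b=%d\n", a, b);
        return 0;
    }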

In my current hobby project, refactoring the C refactoring tool c-xrefactory (https://github.com/thoni56/c-xrefactory) to a maintainable state, I encountered a particularly macro-infested area, the lexer. I’ll leave my speculations about how the code got this way for another day, and focus on a fairly mechanical sequence of steps that I have been using to refactor a number of macros into nice, clean, testable C functions.

Here is a short excerpt of some code using a macro:

static void processLine(void) {
    Lexem lexem;
    int l, h, v=0, len;
    Position pos;

    ...

    PassLex(cInput.currentLexem, lexem, l, v, h, pos, len, 1);
    if (lexem != CONSTANT) return;
...

Here’s (again, a part of) the macro (yes, it’s long…) :

#define PassLex(input, lexem, lineval, val, hash, pos, length, linecount) { \
        if (lexem > MULTI_TOKENS_START) {                                 \
            if (isIdentifierLexem(lexem)){                                \
                char *tmpcc,tmpch;                                        \
                hash = 0;                                                 \
                for(tmpcc=input,tmpch= *tmpcc; tmpch; tmpch = *++tmpcc) { \
                    SYMTAB_HASH_FUN_INC(hash, tmpch);                     \
                }                                                         \
                SYMTAB_HASH_FUN_FINAL(hash);                              \
                tmpcc ++;                                                 \
                GetLexPosition((pos),tmpcc);                              \
                input = tmpcc;                                            \
            } else if (lexem == STRING_LITERAL) {                         \
                char *tmpcc,tmpch;                                        \
                for(tmpcc=input,tmpch= *tmpcc; tmpch; tmpch = *++tmpcc);  \
                tmpcc ++;                                                 \
                GetLexPosition((pos),tmpcc);                              \
                input = tmpcc;                                            \
            } else if (lexem == LINE_TOK) {                               \
                GetLexToken(lineval,input);                               \
...

As you can see, there are multiple levels of macros invoking macros invoking macros, which makes it extremely difficult and frustrating to try to follow the expansion manually. Of course, you could let the C preprocessor do it, which I tried, but that turns the code into something almost impossible to recognize, so that did not work out well.

The strategy I devised was to first manually expand one of the invocations, then make any adjustments necessary to enable the third step: extracting a “real” C function. Luckily, c-xrefactory is fully operational, so I could use it to do the extraction.

Here are the detailed steps:

1. Align local variable names with macro arguments

Macro semantics are textual replacement, i.e. a formal argument of a macro stands for the text, not the value or even the address, that you use as the actual argument in the invocation. A clever way to neutralize this replacement is to use actual arguments with exactly the same names as the formal arguments; then you can just paste the body of the macro in place of the invocation.

In the example we would change

    Lexem lexem;
    int l, h, v=0, len;
    Position pos;
...
    PassLex(cInput.currentLexem, lexem, l, v, h, pos, len, 1);

so that variable names align, to

    Lexem lexem;
    int lineval, hash, val=0, length;
    Position pos;
...
    PassLex(cInput.currentLexem, lexem, lineval, val, hash, pos, length, 1);

Now, if we were to paste the body of the macro here, most of the uses of the arguments in the body would simply match local variables.
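
In miniature, with a made-up macro rather than PassLex, the trick looks like this:

    /* Made-up example: the actual argument has the same name as the
       formal argument, so the macro body can be pasted in verbatim. */
    #define INCREMENT(counter) { (counter)++; }

    void demo(void) {
        int counter = 0;
        INCREMENT(counter);   /* expands to { (counter)++; }, exactly the body */
    }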

2. Create adapter variables

The first and last actual arguments are not local variables that we can change to match. But we can create such local variables:

    Lexem lexem;
    int lineval, hash, val=0, length;
    Position pos;
    ...
    {
        char *input = cInput.currentLexem;
        int linecount = 1;
        PassLex(input, lexem, lineval, val, hash, pos, length, linecount);
        ...
        cInput.currentLexem = input;
    }
    ...

Of course, in the general case you would also have to restore any modified data; in this example that means copying input back into cInput.currentLexem.

And the good thing is that at this point all your tests should still pass. You have test coverage for this code, right?

3. Replace the invocation with the body

Comment out the invocation and paste the body of the macro just after it:

    Lexem lexem;
    int lineval, hash, val=0, length;
    Position pos;
    ...
    {
        char *input = cInput.currentLexem;
        int linecount = 1;
        // PassLex(input, lexem, lineval, val, hash, pos, length, linecount);
        if (lexem > MULTI_TOKENS_START) {                                 \
            if (isIdentifierLexem(lexem)){                                \
                char *tmpcc,tmpch;                                        \
                hash = 0;                                                 \
                for(tmpcc=input,tmpch= *tmpcc; tmpch; tmpch = *++tmpcc) { \
                    SYMTAB_HASH_FUN_INC(hash, tmpch);                     \
                }                                                         \
                SYMTAB_HASH_FUN_FINAL(hash);                              \
                tmpcc ++;                                                 \
                GetLexPosition((pos),tmpcc);                              \
                input = tmpcc; 
        ...
    }

As we can see, the parameters to the macro will now directly correspond to local variables in the context of the replacement.

Run your tests! They should still pass.

4. Clean up

Now we can clean up the code in preparation for the extraction.
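
In this case that mostly means stripping the now-pointless line-continuation backslashes and re-indenting, so the first branch reads:

    if (lexem > MULTI_TOKENS_START) {
        if (isIdentifierLexem(lexem)) {
            char *tmpcc, tmpch;
            hash = 0;
            for (tmpcc = input, tmpch = *tmpcc; tmpch; tmpch = *++tmpcc) {
                SYMTAB_HASH_FUN_INC(hash, tmpch);
            }
            SYMTAB_HASH_FUN_FINAL(hash);
            tmpcc++;
            GetLexPosition((pos), tmpcc);
            input = tmpcc;
        }
        ...
    }

Run the tests again to make sure the cleanup didn’t change anything.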

5. Extract the function

With the help of a C refactoring browser that does semantic flow analysis, like c-xrefactory, the extraction will be smooth and optimal, in the sense that variables that are neither input nor output will be restricted to local use inside the function.

In this particular example, extracting a C function from the first line after the commented-out invocation to just before any restoring of values will give us


    Lexem lexem;
    int lineval, hash, val=0, length;
    Position pos;
    ...
    {
        char *input = cInput.currentLexem;
        int linecount = 1;
        // PassLex(input, lexem, lineval, val, hash, pos, length, linecount);
        input = passLex(input, lexem, pos, linecount);
        cInput.currentLexem = input;
    }
    ...

And you have turned a macro into a function!
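
For completeness, here is a sketch of what the extracted function might look like, reconstructed from the call site above. The exact signature the tool produces is an assumption; for instance, if GetLexPosition writes into pos, the tool would pass it by pointer instead.

    /* Sketch only: reconstructed from the call site, not the actual
       output of the tool. */
    static char *passLex(char *input, Lexem lexem, Position pos, int linecount) {
        if (lexem > MULTI_TOKENS_START) {
            if (isIdentifierLexem(lexem)) {
                char *tmpcc, tmpch;
                int hash = 0;   /* not needed by the caller, so now local */
                for (tmpcc = input, tmpch = *tmpcc; tmpch; tmpch = *++tmpcc) {
                    SYMTAB_HASH_FUN_INC(hash, tmpch);
                }
                SYMTAB_HASH_FUN_FINAL(hash);
                tmpcc++;
                GetLexPosition((pos), tmpcc);
                input = tmpcc;
            }
            /* ... the remaining branches from the macro body, some of
               which use linecount ... */
        }
        return input;   /* the updated read position becomes the return value */
    }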

We can, of course, do some final touch-up, like doing away with the linecount and input temporary variables by passing the values directly as arguments to the function.

    Lexem lexem;
    int lineval, hash, val=0, length;
    Position pos;
    ...
    // PassLex(input, lexem, lineval, val, hash, pos, length, linecount);
    cInput.currentLexem = passLex(cInput.currentLexem, lexem, pos, 1);
    ...

I was really lucky with this example, because not only did I get a function instead of a macro, but the argument list also became shorter, and some local variables were no longer needed. I’m not sure whether that was just because the macro was lousy in the first place, but I think this strategy could be of general use for cleaning up messy old C code.

Actually, I did this twice for two different invocations and extracted them to two different functions. As the code for the two was identical (and all the tests passed) I was confident that the operation had succeeded and could proceed with simply replacing the other calls following the established pattern.

Refactoring: Variable interdependencies

I’m working on a legacy project with catastrophically bad code. Two small examples are variable names like ‘ttt’, ‘iii’ and so on, and the reuse of such variables for multiple purposes. This makes it very hard to trace the flow of data and understand what the actual dependencies are.

In one instance I found code similar to the following

if (function1(&variable)) {
    function2(&variable);
}

The ‘variable’ is obviously an “in/out argument”, but it is not obvious whether it actually transfers information from ‘function1’ to ‘function2’. The pattern seems to indicate a tight connection between the two functions.

If there are tests that cover this, one simple way to make that connection obvious is to split the ‘variable’ into two.

if (function1(&variable1)) {
    function2(&variable2);
}

This will immediately make it evident whether the out-data from ‘function1’ is used in the second call. It will, of course, not guarantee that there is no context in which it is, but if your tests fail, you have found a case where it really was.
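
A slightly fuller sketch, with placeholder names and types, makes the mechanics clear: both variables start from the same state, so behaviour can only change if ‘function2’ actually consumed what ‘function1’ wrote.

    /* Hypothetical example: 'initial_value' and the int type are
       placeholders for whatever the real code uses. */
    int variable1 = initial_value;
    int variable2 = initial_value;

    if (function1(&variable1)) {
        function2(&variable2);   /* failing tests here expose a real dependency */
    }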

Refactoring often depends on your knowledge of the code; this is one small trick for learning more about it.

Texas Hold’em with Cgreen

I’ve been involved with the development of Cgreen for a few years, so when Software Craftsmanship Linköping asked me to do a TDD session for them I obviously chose that as my basis.

Cgreen is nice for allowing modern TDDing in C (and C++) using a fluent API, mocks and the rest.

I talked and we coded. I selected the Texas Hold’em kata, which is interesting because of the multitude of dimensions that need to be covered. It is also a good kata to redo, experimenting with different orderings of the tests. (Actually, I did it from memory and got it wrong: players have 2 private cards and community cards are dealt until a player folds. So the tests in the session were inaccurate.)
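
To give a flavour of what such tests look like in Cgreen (the poker API used here, rank_of and Rank, is a made-up assumption, not part of Cgreen or the session):

    #include <cgreen/cgreen.h>

    /* Assumed kata API, not part of Cgreen: */
    typedef enum { HIGH_CARD, PAIR, TWO_PAIRS } Rank;
    extern Rank rank_of(const char *hand);

    Describe(Poker);
    BeforeEach(Poker) {}
    AfterEach(Poker) {}

    Ensure(Poker, recognizes_a_pair) {
        assert_that(rank_of("2H 2D 5S 9C KD"), is_equal_to(PAIR));
    }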

Xrefactory to c-xref – refactoring C with Emacs

For a long time, probably around a decade, I have been using a refactoring tool built by Slovak researcher Marián Vittek. It was probably one of the first refactoring tools to cross the “Refactoring Rubicon”.

It is an Emacs plugin that adds refactoring, navigation, completion and cross-reference functionality for the C language. There is also some Java support, and they built a commercial C++ version.

Mostly it just works. Of course, it has some trouble with heavy macro usage, and it’s missing a few basic refactorings; e.g. it doesn’t correctly extract an expression to a function returning a value, so you need to edit the result. I hadn’t really thought much about it until I started developing on a new computer and took a quick look for a new version. I knew the project was kind of hibernating, so I hadn’t been up to date with events.

To my sadness the xref-tech site was no more. After some googling I found that the C version had survived as a SourceForge project that Marián had created back in 2009.

This post is very much a payback for the good service Xrefactory has given me over many years, and a strong recommendation to look into c-xref if you are into Emacs and C programming.

Do you know of any similar tools for C?

UPDATE: I’ve been working on resurrecting the project. It is quite a challenge to understand and evolve, but I’m making progress. Follow, or participate in, the project on GitHub.

From CVS to Git, the short story

I decided, finally, to move my main hobby project from CVS to Git. I wasn’t new to Git but I hadn’t worked with it for real. So I thought it was a good idea to start doing that and learning the ropes.

Of course there were two parts to that. The first was migrating the repository, which this blog post will not talk about at all; my only tip is to do it on a genuine Linux system, since on Cygwin I ran into a lot of problems.

The second part was to start learning to use Git on a day-to-day basis, which resulted in a very short tutorial on Git for CVS users.


XP2012

I’m just back from XP2012, which was held at Malmömässan in Malmö, Sweden. When it’s “at home” you just can’t miss out. I stayed almost the whole week and came away injected with a lot of inspiration, which is exhausting.

While, in my eyes, the program didn’t look quite on the same level as some of the previous XP conferences, I had some very inspiring sessions, including an information-packed keynote by Dave Snowden, a promising workshop with Tobias Anderberg and Ola Ellnestam, a presentation on agile contracts by lawyer Lars Ahrred that brought hope and tips for future cooperation, and a workshop with Ivana Gancheva and Bent Myllerup on the coaching/teaching balance and the new concept of “Validated Influencing”.

The Open Space session on “Deliberate Practice” that I initiated became a huge success thanks to Willem Larsen, David Campey and Markku Åhman. Thanks, guys.

Willem also did a language/fluency hunting session, which was inspiring to see. I’ll take away the “Live” tool (as a participant: being engaged, inspired, finding forms that make us get into that operating mode).

The conference dinner prepared by Jan Boris-Möller (an engineer turned master chef challenging almost every cooking preconception) was so much more interesting after his presentation and the following conversation with him on the height of chefs’ hats, the chemistry and physics of cooking, and waiters selling what works (rather than what the customer thinks he wants).

My own presentation, a remake of my “Agile Analysis” talk titled “Continuous Analysis, or Kanban for Product Owners”, also got some traction.

Increments and iterations

Describing the differences, and similarities, between the two words iteration and increment has been very hard for most of us. Using paintings as an explanation has never really “clicked” with me…

But now there is a nice and clarifying description by Eivind Nordby. With the help of some well-known guys, and maybe someone not so well known, Eivind takes a step forward in understanding the concepts and explains that both can be applied in the process and the product dimensions, but still mean either “adding” (incremental) or “reworking” (iterative).

And I suppose it’s the fact that both addition and re-work can be applied in both the process and the product dimension that makes this so hard to pinpoint and describe.

In a “true” agile sense we really like re-work in the process dimension (repeating the activities so that we can get good at them). But we dislike re-work in the product dimension (it could be considered an “unknown amount of work left to do”) because we want the functionality to be Done Done. In real life, though, we mostly aren’t really, really Done Done. Sometimes because of misunderstandings, time constraints and what not, but also quite often because of the “systems implementation uncertainty principle”: the fact that implementing a system changes both the perception of that system and the needs it should fulfill.

So I guess we should continue to strive for pure incrementality in the product dimension, but sometimes accept a “failure” and then iterate a bit, particularly to get the early feedback that is so essential for delivering the functionality and properties that are really needed, rather than the perceived needs.

The (im)possibility of planning development

When I talk to project managers, or managers in general, one of their main concerns is the precision, or lack thereof, of their planning. It is still common for development projects to overrun their deadlines, resulting in frustration, loss of money and trust, and causing a lot of extra work in re-planning dependent activities. So many managers look to Agile for a solution to this problem.

But very few seem to realize the inherent problem in planning development work. It is not uncommon for managers of large projects to think of planning as a simple process of converting required functionality into man-hours and then allocating enough people to do the hours. It seems to work when planning other types of projects, so why shouldn’t it work for development?

Well, first, does it really work for other types of projects? Software people have always been blamed for being the worst when it comes to planning; road work and house building are always on time. Well, no, they’re not. At least most of them aren’t.

What is development, really? Some people view software development as a production process: manufacturing the software from the requirements. Sometimes it can be, but then we usually quickly create a tool that can do that repetitive work for us. So what’s left? Only the parts that are not repeatable, the ones that require engineering and design. That means that development work is a creative process. Or rather, it is problem solving. Constantly solving new problems is what development is. At its core it’s like solving a continuous flow of crossword puzzles. And as with crossword puzzles, some things are easy, some are as hard as we thought, and some are much, much harder. Did you ever give up on a crossword puzzle?

So can you tell me how long it’s going to take you to solve the Sunday crossword puzzle? Of course you can’t. You don’t know what the questions will be, what problems you will have to solve. So development work is inherently unplannable. Period.

How can we then promise anything at all about when something will be ready? Agile planning uses statistical methods to get planning to work. And statistical methods need multiple values to work; you can’t use a single value as a basis for statistics. And you don’t do statistics by guessing, you measure. The more data points you have, the better your statistics will be.

Doing the Sunday crosswords for a year will give you 52 data points, giving your guess for the next one a reasonable probability of being right. It still won’t give you any guarantees for how long the next one will take, but on average you will know. If you wanted to know the total time for the next 52, you’d have a pretty good guess.
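
As a toy illustration of the arithmetic, with invented sample times:

    #include <stdio.h>

    /* Toy projection: average the measured times and extrapolate.
       The sample values are invented for illustration. */
    int main(void) {
        double minutes[] = {35, 50, 42, 61, 38, 47};
        int n = sizeof(minutes) / sizeof(minutes[0]);

        double sum = 0;
        for (int i = 0; i < n; i++)
            sum += minutes[i];
        double average = sum / n;

        printf("average: %.1f min, projected total for 52: %.0f min\n",
               average, average * 52);
        return 0;
    }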

If you do crossword puzzles of various sizes and types, you could probably find some statistical correlation between the number of squares or questions in a puzzle and the time it takes. This adds to your statistical samples, maybe up to a few hundred squares over a year, increasing the statistical confidence of your future projections.

If you want good statistically based projections you need many actual samples, and many planned samples. And how does Agile planning help us with that?

By

  • breaking down functionality into small parts
  • always including everything required to maintain quality
  • measuring average development speed

Because development work is problem solving, we need statistical support for our planning, and because we need statistical support for our planning, we need many samples. The agile techniques that give us those are small stories, done criteria, velocity and story points. And as many samples as we can get.

Short tour

I’m doing a short “tour” together with two other speakers, Janne Lundberg from Atlas Copco Tools and Joakim Pilborg from KnowIT Technology Management. The theme of the tour is, of course, lean and agile, and I’m talking about “Agile in the Large”, one of my favourite topics. Here are the slides that I’m showing.