Makefiles for mathematicians

A tutorial for using make.
To be read after the tutorial about the compiler toolchain.

Abstract
The make utility formalizes as a directed graph the problem of compiling several source files. The vertices of the graph are files and the edges are the instructions to transform the corresponding files. Vertices with only outgoing edges are the source files. Vertices with only incoming edges are the target files. Vertices with both incoming and outgoing edges are intermediary files. The build process is performed by creating the target files according to the instructions given on this directed graph. The edges of this directed graph are called rules. The make program comes with a large set of predefined rules, and these rules can be augmented or changed by writing a Makefile.

Disclaimer
The goal of this tutorial is to showcase the timeless beauty of Makefiles, not to give a set of recipes.

1. Introduction

The make program is an extremely elegant UNIX utility. It solves the problem of building a complex program given a formally described set of dependencies. With the -j option, it builds as many files as possible in parallel.

In this tutorial we explain several use cases. We start with the simplest possible case (compiling a single file), and we move to more complicated cases (like compiling several files, or linking against libraries).

2. Making without makefiles

It is not necessary to write any makefile to use make; for example, when we want to build a single file. We start with this simple case, which already illustrates almost all the features of make.

2.1. Basic structure

Imagine that you want to compile a simple program hello.c

#include <stdio.h>

int main(void)
{
	return printf("hello world\n");
}

Then you run

make hello

And it will compile your program (by running cc hello.c -o hello).

This is the whole story. Using make is always the same: you ask make to build one file, and it tries to do so in the most sensible way. Everything else consists of minor details, like setting the compiler options, or minor variations, like building more than one file.

2.2. Idempotence and dependences

In the previous example, note that running make hello a second time does nothing. Since the file hello is already built, there is nothing else to do. The program make is idempotent: running make two times is the same as running it once. This is a fundamental property. How does the program know that there is no more work to do? Because it compares the modification dates of the files.

Thus, to force a recompilation, you can change the date of the source code (e.g., by running touch hello.c) and then make will build it again.
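
For example, the forced rebuild described above looks like this:

touch hello.c   # make the source newer than the executable
make hello      # runs "cc hello.c -o hello" again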

Now try the following:

rm -f hello    # delete the compiled file
make hello.o   # will run "cc -c hello.c -o hello.o"
make hello     # will run "cc hello.o -o hello"

Notice that, when the file hello.o already exists, make knows that the source is already compiled, and it only needs to link it to produce the final executable. It always tries to build the requested file with the least possible amount of work. If you touch hello.o and then you run make hello, it will only do the linking, because the file hello is older than hello.o.

How does make know what to build from what? Because it has a secret list of implicit rules that tell it so. These rules form a directed graph, which in our case is the following:

   hello.c ---------------------------> hello.o -----------------------> hello
            cc -c hello.c -o hello.o             cc hello.o -o hello

The nodes of this graph are the filenames. The edges of this graph are the instructions to create one file from another. When you run make hello, the program checks that the requested file appears in the graph, and finds a directed path from a file that exists to the requested file. Then, the program runs the instructions corresponding to each edge in the path, skipping the edges whose target already exists and is newer than its source.

This is all that make does. The only control left to the user is the specification of the graph.

2.3. Variables

You can set variables to fine-tune the build process. The most important variable is CC, which specifies the C compiler; its default value is typically cc. There are two ways to change the value of a variable (without writing a Makefile): either you set an environment variable of the same name, or you pass it as an argument to the make invocation.

# first technique: give argument to make
make CC=clang hello    # compile "hello" using clang

# second technique: specify environment variable
export CC=clang
make hello             # compile "hello" using clang

The following table summarizes the most important variables

variable   meaning                  default   example assignment
CC         C compiler               cc        tcc
CXX        C++ compiler             c++       clang
CFLAGS     flags for C compiler               -O2 -Wall
CXXFLAGS   flags for C++ compiler             -O2 -Wall
CPPFLAGS   flags for preprocessor             -I /path/to/my/includes -DMACRO=value
LDFLAGS    flags for linker                   -L /path/to/my/libs
LDLIBS     libraries for linker               -lmylib
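
For instance, all of these variables can be combined in a single invocation. The following sketch assumes a hypothetical library libmylib installed under $HOME/local (the paths and the library name are only for illustration):

make hello \
    CFLAGS="-O2 -Wall" \
    CPPFLAGS="-I$HOME/local/include" \
    LDFLAGS="-L$HOME/local/lib" \
    LDLIBS="-lmylib"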

Variables are very powerful. For example, the following shell script builds the same program using five C compilers and different compiler options (debug and release mode):

for i in gcc clang tcc icc suncc; do
	make CC=$i CFLAGS="-Wall -g" hello
	mv hello hello_debug_$i

	make CC=$i CFLAGS="-Wall -O3" hello
	mv hello hello_release_$i
done

This script creates 10 different executables with all the compilers and all the compiler options (assuming that the named compilers are installed). This kind of script is useful for checking that your program gives zero warnings with a large set of compilers and compiler options. And we have not written a Makefile yet!

3. Makefiles

Making without makefiles may be an interesting exercise, but it is not more practical than calling the compiler directly. The real interest of make is that it lets you compile many files into one program (or many programs) in a single stroke.

3.1. A very simple Makefile

This is how you would specify the graph of the example above in the Makefile language:

hello.o: hello.c
	cc -c hello.c -o hello.o

hello: hello.o
	cc hello.o -o hello

That's a complete Makefile. It is a list of rules. Each rule has the following form:

target: source1 source2 ... sourcen
	instructions to build target from sources

Very important: the instructions are indented using one tab. Spaces will not work.

Once you have written a makefile, you can request to build a target by running make target. If you don't specify a target, the first target will be built.
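
For example, with the two-rule makefile above, where hello.o happens to be the first target:

make          # builds only hello.o, the first target of the makefile
make hello    # builds the executable hello (and hello.o first, if needed)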

3.2. Using variables

This simple makefile, where the rules are explicit, is actually less powerful than an empty makefile that uses the implicit rules. There are two ways in which it is less powerful: (1) we cannot change the compiler or the flags; and (2) the name hello is hardcoded, so the makefile does not say how to build files with different names. The first problem is solved using variables. The second problem will be solved later using pattern rules.

You can define variables inside the makefile and use them later in the rules:

COMPILER=gcc
COMPILER_FLAGS=-O3

hello: hello.c
	$(COMPILER) $(COMPILER_FLAGS) hello.c -o hello

Variables defined inside the makefile are taken as default values. They can be overridden by redefining them as arguments in the call to make. Of course, the variable names that we have chosen in this example are preposterous. Here, we should have used the standard names, which already have default values, so that other people can expect standard behaviour from our makefile (such as taking into account their preferred compiler flags):

hello: hello.c
	$(CC) $(CFLAGS) hello.c -o hello

Notice that the makefile above is equivalent to an empty file, because it matches an implicit rule. However, for clarity we will work on this simple example and add more files to build. Until section 3.6, forget about implicit rules.

3.3. Multiple source files

For now, we have just dealt with a single file to compile. In a typical case, the source code of a program will span several files (let's say, three files hello.c, options.c and lib.c). Then, we can specify the rules for building each file:

hello: hello.o options.o lib.o
	$(CC) hello.o options.o lib.o -o hello

hello.o: hello.c
	$(CC) $(CFLAGS) -c hello.c -o hello.o

options.o: options.c
	$(CC) $(CFLAGS) -c options.c -o options.o

lib.o: lib.c
	$(CC) $(CFLAGS) -c lib.c -o lib.o

Now, running make will call the compiler four times: once for each object file, and then once to link all the objects into one executable. This is embarrassingly parallel; indeed, running make -j will launch the compilation of the three object files in parallel.

3.4. Automatic variables

In the example above, there is a lot of redundancy: the names of the files appear many times. The redundancy can be removed by using local or automatic variables:

variable   meaning
$@         name of the target
$^         list of all prerequisites
$<         the first prerequisite
$*         the stem of a pattern rule

Thus, when the name of the target or of the prerequisites appears inside a rule (which happens almost always), we can simplify the rule using these automatic variables:

hello: hello.o options.o lib.o
	$(CC) $^ -o $@

hello.o: hello.c
	$(CC) $(CFLAGS) -c $< -o $@

options.o: options.c
	$(CC) $(CFLAGS) -c $< -o $@

lib.o: lib.c
	$(CC) $(CFLAGS) -c $< -o $@

3.5. Pattern rules

The makefile that we have just written exhibits a higher-level type of redundancy: the rules themselves are all the same! Moreover, the name of each target and that of its prerequisite are the same, differing only by their extension. Pattern rules allow us to express this kind of redundancy:

hello: hello.o options.o lib.o
	$(CC) $^ -o $@

%.o: %.c
	$(CC) $(CFLAGS) -c $< -o $@

The character % is a placeholder for an arbitrary string. If you request to build a file with the extension .o, then this rule will match, and it will try to build the .o file from the .c file in the indicated way.
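
As an illustration of the stem variable $*, here is a hypothetical variant of this pattern rule that also keeps the preprocessed source next to each object; when the rule builds options.o, the stem is "options", so $*.i expands to options.i:

%.o: %.c
	$(CC) $(CPPFLAGS) -E $< > $*.i
	$(CC) $(CFLAGS) -c $< -o $@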

3.6. Mixing implicit and explicit rules

In the makefile above, notice that the rules for building the object files are unnecessary, because they do exactly the same thing as the implicit rules. Thus, an equivalent makefile is the following.

hello: hello.o options.o lib.o
	$(CC) $^ -o $@

Rather short, isn't it? You just say that you need file.o, and the implicit rules take care of building it from file.c.

3.7. Dependencies without rules

Now we are in the rarefied atmosphere of theories of excessive beauty and we are nearing a high plateau on which geometry, optics, mechanics, and wave mechanics meet on a common ground. Only concentrated thinking, and a considerable amount of re-creation, will reveal the full beauty of our subject in which the last word has not been spoken yet. —Cornelius Lanczos, The Variational Principles of Mechanics.

The two-line makefile above can be further shortened to this thing of beauty:

hello: hello.o options.o lib.o

This is a complete makefile, equivalent to the examples given above. How is that even possible? What kind of sorcery is going on here?

This works because the prerequisites of a target can be stated on several separate lines, and they are simply accumulated (a target may appear in several rules, but at most one of them can carry instructions). Thus, if we write the prerequisites separately and spell out the needed implicit rules explicitly, it is equivalent to the following

hello: hello.o

hello: options.o

hello: lib.o

%: %.o
	$(CC) $^ -o $@

%.o: %.c
	$(CC) $(CFLAGS) -c $< -o $@

When we put all the prerequisites on the same line and expand all the patterns, we recover EXACTLY the same text as in section 3.3.

3.8. Implicit rules

We have talked before about a "secret" list of implicit rules. Actually, there is nothing secret about it. The implicit rules are defined explicitly by pattern rules and they look exactly like the last two patterns of the previous section. To look at the complete list of implicit rules run the following command:

make -p -f /dev/null > implicit_rules.txt

This will create a text file with the list of all implicit rules (and much other information). Running make without a makefile is exactly equivalent to using this file. Now, this file may seem overwhelming; it is probably very long, because make uses a lot of heuristics, and they are all specified here. But somewhere in the middle it must contain lines that look more or less like this:

%: %.o
	$(CC) $^ -o $@ $(LDLIBS)

%.o: %.c
	$(CC) $(CFLAGS) -c $< -o $@

...which are just the two pattern rules of section 3.7. See section 6.4 below for a more complete view of the default pattern rules.

It is highly recommended to print the list of implicit rules for your make setup, and read it thoroughly. Even if it is long, it is nothing more than a sequence of variable assignments and pattern rules.

3.9. Phony targets

It is not necessary for a rule to create any file (make will not verify it anyway); you can run all sorts of crazy stuff in the instructions. The most typical example is a clean target that, instead of creating a file called "clean", simply removes all the generated files. Or you can have a check target that runs the unit tests of your code. This is then our fancy makefile:

OBJ = main.o options.o lib.o

hello: $(OBJ)
	$(CC) $(OBJ) -o hello

clean:
	rm -f $(OBJ) hello

check: hello
	./hello -test

Notice that the clean target has no dependences. The check target has the file hello as dependency, so it will compile the hello file if needed.

NOTE: if you have files named clean or check then all hell will break loose. To protect against this risk, you can precede these targets with a line that says ".PHONY:". But I like to live on the edge.
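
For the record, the cautious version adds the following single line to the makefile above, declaring that clean and check do not name real files:

.PHONY: clean check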

Given the makefile above, the following shell script builds the program using five C compilers, and runs the test suite for each of the resulting executables, both in debug and in release mode (for a total of 10 checks of the test suite):

for i in gcc clang tcc icc suncc; do
for m in "-O3 -DNDEBUG" "-g"; do
	make clean check CC=$i CFLAGS="$m"
done
done

If you want to be really neat, design the test suite so that it is silent upon success, and add the -s (silent) option to make. Then, the script will only produce output when something fails.

Try doing that with cmake!

4. Practical issues

In an ideal world (from the point of view of the makefile writer), your program is written from scratch using an old standard of the programming language, and it uses no external libraries. In practice, however, your program may rely on some modern features of the language (which require compiler flags), and it may need external libraries that are installed under strange names. Also, the dependences between the source files may be somewhat convoluted, and writing them by hand is error-prone. Let's see what we can do about all these problems.

4.1. Using external libraries

The program make will never try to find where external libraries are located; it is just not its job. In theory, this is not a problem at all. If your program requires, e.g., the libtiff library, then you simply add -ltiff to the compilation line. The following makefile compiles a program that requires libtiff:

OBJ=main.o options.o lib.o

hello: $(OBJ)
	$(CC) $(CFLAGS) $(OBJ) -o hello -ltiff

This will work correctly as long as libtiff is installed on your system. What does it mean, exactly, for a library "to be installed on your system"? Well, by definition, it means that this makefile works! More precisely, it means that the following three things are true:

  1. When you write #include <tiffio.h> in your source code, the preprocessor finds this include file.
  2. When you add -ltiff to the compilation line, the linker is able to find the library file.
  3. (Only in the case of dynamic linking) when you run your program, the dynamic linker is able
    to find the file libtiff.so on your system.

For example, if the program hello of section 3.7 requires libtiff, then this is a complete makefile for compiling the program

LDLIBS = -ltiff
hello  : hello.o options.o lib.o

This works because $(LDLIBS) is used in the implicit rule for linking objects.

4.2. Using non-installed external libraries

For GNU and BSD systems, libraries are often correctly installed: once you install a library using the package manager, this library becomes available to the compiler without further ado. In case the library is not installed, the compilation will produce a clear error message, which I suppose is the desired behaviour.

In other situations (e.g., bizarre systems without package managers like OSX, or user-installed libraries), you may want to use a library that is not "correctly installed" according to the three points above. Then, the solution is to correctly install it! This can be done by setting three environment variables:

  1. Add the path to the include files of your library to the variable CPATH.
  2. Add the path to the object files of your library (.so or .a) to the variable LIBRARY_PATH.
  3. If necessary, add the path to the dynamic objects to the variable LD_LIBRARY_PATH.

Once these three variables are set, the library is correctly installed and your makefile can be run. Notice that this task is independent of the makefile itself; it is part of the installation of the library, on systems that do not have a decent package manager. These variables are recognized by all the compilers that I know of (GCC, clang, TCC, and the Intel and Sun compilers).
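
For example, here is a sketch of the three settings for a library installed by hand under the hypothetical prefix $HOME/local:

export CPATH=$HOME/local/include
export LIBRARY_PATH=$HOME/local/lib
export LD_LIBRARY_PATH=$HOME/local/lib
make hello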

4.3. Setting platform-specific flags

It is strongly advised to write portable code that compiles out of the box on any system. Today, this is much, much easier than a few years ago, because most systems are POSIX-compliant (with slightly different versions of the POSIX standard, though). Thus, horrible ``portability'' tools like automake, autoconf, and cmake are mostly unnecessary.

Still, there are a few situations where your code with a simple makefile is not straightforwardly portable to all the systems that you may care about. I show two examples: compiling modern ANSI C on older versions of GCC, and ``enabling'' OpenMP on the platforms where it is available. Once you understand these hacks, you can easily rewrite them for other situations. The basic idea is to run a shell command whose output is empty or not, according to the condition you want to check, and to capture this output from within a $(shell ...) directive:

# The following hack allows modern ANSI C (C99 and newer) to be compiled on
# very old and unmaintained versions of the gcc compiler (older than
# gcc 5.1, released in April 2015).  These old versions of GCC are able
# to compile C99 and C11, but some features are not enabled by default, so
# the hack enables them explicitly if the compiler does not advertise a
# modern C standard.  The clang compiler does not typically need such a hack.
#
# hack for older compilers (adds gnu99 option to CFLAGS, if necessary)
ifeq (,$(shell $(CC) $(CFLAGS) -dM -E - < /dev/null | grep __STDC_VERSION__))
CFLAGS := $(CFLAGS) -std=gnu99
endif

# use OpenMP only if not clang
ifeq (0,$(shell $(CC) -v 2>&1 | grep -c "clang"))
CFLAGS := $(CFLAGS) -fopenmp
endif

If you have a favorite Makefile hack, you can send it to me and I will add it here.

4.4. Automatic generation of dependences

Often, automatic generation of dependences is unnecessary: each object file depends on the corresponding source file (that has the same name but different extension). This case is already covered by the implicit rule %.o:%.c.

However, there are still some situations where there are other dependences between source files; for example, when you #include another file from your source code. In principle, since the included file is read directly by the compiler, it does not need to appear on the command line, and the compilation succeeds without this dependence being stated anywhere. Thus, the short answer is that even in that case, the dependence need not be known by make.

But when you are developing code, the situation is different: if you change the included file, you may want to recompile the object. Thus, make must know about this dependency.

Fortunately, most compilers accept the -MM option, which prints the list of files included by a source file, conveniently formatted as makefile dependencies. The following makefile deals with this automatically:

# regular makefile stuff
LDLIBS  = -ltiff
hello   : hello.o lib.o options.o

# generation and inclusion of missing dependencies
deps.mk :
	$(CC) -MM $(shell ls *.c) > deps.mk
-include deps.mk
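
For example, if hello.c and options.c both include a hypothetical header options.h, the generated deps.mk contains lines like the following, which is exactly the format that make expects for stating extra prerequisites:

hello.o: hello.c options.h
lib.o: lib.c
options.o: options.c options.h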

5. Example makefiles

5.1. Single executable, many source files

This is the case that we have solved above

BIN    = hello
OBJ    = hello.o options.o lib.o

$(BIN) : $(OBJ)

clean  :
	$(RM) $(BIN) $(OBJ)

5.2. Many executables, one source file for each

This can also be done using only implicit rules.

BIN = foo bar baz

all: $(BIN)

clean:
	$(RM) $(BIN)

5.3. Many executables, many source files for each

When you have a set of object files common to a set of separate executables.

BIN     = foo bar baz
OBJ     = lib1.o lib2.o lib3.o

default : $(BIN)

$(BIN)  : $(OBJ)

clean   :
	$(RM) $(BIN) $(OBJ)

5.4. Building a static library

This is just like the previous case but putting all the objects inside a static library for ease of linking.

BIN     = foo bar baz
OBJ     = lib1.o lib2.o lib3.o
LIB     = libmine.a

default : $(BIN)
$(BIN)  : $(OBJ)
$(LIB)  : $(LIB)($(OBJ))

clean   :
	$(RM) $(BIN) $(LIB) $(OBJ)
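
As a usage sketch (main.c is a hypothetical client program, not part of the example above), the archive can be built on its own and then linked like any ordinary file:

make libmine.a               # build the archive from its object members
cc main.c libmine.a -o main  # link a program against the static library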

5.5. Options and libraries

You can ``enhance'' the example above with compiler options and additional libraries to obtain a very general Makefile:

# user-editable configuration
CFLAGS  = -march=native -Os

# required libraries
LDLIBS  = -lpng -ljpeg -ltiff

# files
BIN     = foo bar baz
OBJ     = lib1.o lib2.o lib3.o

# default target: build all the binaries
default : $(BIN)

# each binary depends on all the object files
$(BIN)  : $(OBJ)

# bureaucracy
clean   : ; $(RM) $(BIN) $(OBJ)
.PHONY  : default clean

Notice that there is no real harm in linking unused object files: at worst the executable gets slightly larger, and unused members of libraries given as -llib are not even pulled in by the linker.

5.6. Interaction with pkg-config and the like

The pkg-config tool provides a way for people to use libraries that are not fully installed on their system. It is a simple program that prints whatever horrible flags are necessary for compiling and linking against the library. It has three relevant options, here shown with their output on my system (for libpoppler):

pkg-config poppler --cflags       # prints  -I/usr/include/poppler
pkg-config poppler --libs         # prints -lpoppler
pkg-config poppler --version      # prints 0.26

The output of pkg-config is straightforward to use inside Makefiles.
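
For example, here is a sketch of the relevant lines of a makefile for a program that uses poppler; the flags are filled in by pkg-config when make parses the file:

CFLAGS = -O2 $(shell pkg-config poppler --cflags)
LDLIBS = $(shell pkg-config poppler --libs)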

Other packages, such as gdal, prefer to avoid the standard pkg-config system and provide their own gdal-config with similar behaviour. Thus, if your program requires support for gdal, you simply do the following:

# variables
CFLAGS  = -march=native -Os $(shell gdal-config --cflags)
LDLIBS  = $(shell gdal-config --libs)

# files
BIN     = foo bar baz
OBJ     = lib1.o lib2.o lib3.o

# default target: build all the binaries
default : $(BIN)

# each binary depends on all the object files
$(BIN)  : $(OBJ)

# bureaucracy
clean   : ; $(RM) $(BIN) $(OBJ)
.PHONY  : default clean

5.7. Multiple directories

All the examples given above work on a flat directory structure. This simplification allows us to harness the full power of implicit rules. If you want to work with more complex directory structures, you will have to write the patterns yourself. Here we show an example with separate source and output directories (for the many objects / many executables case):

# files
BIN     := prog1 prog2 prog3
OBJ     := lib1.o lib2.o lib3.o

# add appropriate prefix to filenames
BIN     := $(addprefix bin/,$(BIN))
OBJ     := $(addprefix src/,$(OBJ))

# default target
default : $(BIN)

# rule to build each executable
bin/%   : src/%.o  $(OBJ)
	$(CC) $(LDFLAGS) -o $@ $^ $(LDLIBS)

# bureaucracy
clean   : ; $(RM) $(BIN) $(OBJ)
.PHONY  : default clean

I would advise splitting your source code into subdirectories only when you have a lot of files (say, more than 100).

6. The Makefile language

In what language is a Makefile written? The answer is: in three separate and different languages:

  1. The core language, which describes the dependences between files and indicates which rule should be used to build each file.
  2. The shell language, which describes the rules themselves. This is just regular UNIX shell, and it is actually interpreted by the shell, not by make.
  3. The macro language, which allows you to define variables, implicit and pattern rules, and to include other files.

Moreover, there is a set of pre-defined macros. This is actually very important, since it allows writing extremely succinct makefiles.

6.1. The core makefile language

The core makefile language describes a directed graph explicitly. Each edge is written either using a tab:

to : from
	edge

or a semicolon

to : from ; edge

The vertices are filenames, and the edges are shell instructions.

This core language is extremely portable across all the historical implementations of make.

6.2. The shell language

The edges of the graph, or rules, are written in plain UNIX shell. This text is first pre-processed by the makefile macro language, which replaces the dollar-variables that it finds before sending the text to the shell. Thus, if you want the shell to receive a dollar character, you have to escape it (with another dollar character).
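
For example, in the following rule (a minimal sketch, assuming a variable OBJ defined elsewhere in the makefile), $(OBJ) is expanded by make, while the shell variable f needs a doubled dollar sign to reach the shell:

list:
	for f in $(OBJ); do echo "object: $$f"; done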

Formally, it is easy to distinguish between the make language and the shell language parts of a Makefile: lines starting with TAB are interpreted by the shell, and the other lines are interpreted by make. This is almost true: the shell is also used inside makefiles in one other place, as the first and only argument of the $(shell ...) directive.

If you really need to, you can change the actual shell used for running the instructions by changing the make variable SHELL; for example, SHELL=/bin/zsh. But it is strongly advised to use only POSIX shell features.

6.3. The macro language

The makefile macro language is the ``fancy'' part of the Makefile. It is largely non-portable, but equivalent constructions exist in the two main implementations of make: BSD make and GNU make. While it is possible to write makefiles in a portable way, they do not tend to be beautiful (mainly because the implicit rules are slightly different).

You have to think of the macro language as a pre-processor of your makefile, just like the C preprocessor. It expands the macros in your makefile until only core-language constructions remain. Among the available features, the ones used in this tutorial are: variable assignment and substitution, automatic variables, pattern rules, conditionals (ifeq ... endif), inclusion of other makefiles (include), and function calls such as $(shell ...), $(wildcard ...) and $(addprefix ...).
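
Here is a small sketch showing several of these constructions together (the file extra.mk is hypothetical):

SRC := $(wildcard *.c)        # function call: all C sources in this directory
OBJ := $(SRC:.c=.o)           # substitution: the same names with extension .o
ifeq ($(DEBUG),1)             # conditional on a variable
CFLAGS := $(CFLAGS) -g -O0
endif
-include extra.mk             # include another makefile, if it exists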

There are many more features available in the GNU macro language: the GNU Make manual has more than 200 pages!

6.4. The set of predefined macros

The output of make -p is indeed overwhelming. Yet, the only lines of concern for C and C++ are the following:

SHELL    = /bin/sh  # shell to run the rules
RM       = rm -f    # command to delete files
CC       = cc       # default C compiler
CXX      = c++      # default C++ compiler
CFLAGS   =          # C compiler flags
CPPFLAGS =          # C preprocessor flags
CXXFLAGS =          # C++ compiler flags
LDFLAGS  =          # linker flags
LDLIBS   =          # libraries

# build an object from a C source file
%.o : %.c  ; $(CC) $(CFLAGS) $(CPPFLAGS) -c -o $@ $<

# link an executable from an object
%   : %.o  ; $(CC) $(LDFLAGS) $^ $(LDLIBS) -o $@

# directly compile and link a C source file
%   : %.c  ; $(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $^ $(LDLIBS) -o $@

# build an object from a C++ source file
%.o : %.cc  ; $(CXX) $(CXXFLAGS) $(CPPFLAGS) -c -o $@ $<

# directly compile and link a C++ source file
%   : %.cc  ; $(CXX) $(CXXFLAGS) $(CPPFLAGS) $(LDFLAGS) $^ $(LDLIBS) -o $@

When you run make without a Makefile, it is just as if this file was already present.

7. Bourbakist Makefiles

The human hand is perfectly optimized for grabbing a stone and throwing it at the eye of a mammoth. Using it for playing the violin is extremely awkward and unnatural; clearly not what it was made for. Yet, it is a beautiful act. Human civilization is all about using our God-given tools for purposes other than those they were designed for. —Cmdr. Armando Rampas.

Forget about compiling. The make program is useful in a very general situation: whenever you want to run a complex pipeline with many intermediary files. Typically this task is accomplished by writing a shell script or (God forbid) a Python script. However, we will show that using a makefile may be a better idea.

7.1. A simple shell script

For example, consider the following simple script that registers several images and computes the fusion of them all:

# input images: i{00..11}.png
# output image: out_med.png
# intermediate: i*.sift p*.txt h*.txt reg_*.png

IDX=`seq -w 0 11`
SIZE=`imprintf "%w %h" i00.png`

# compute sift descriptors of each image
for i in $IDX; do
        sift i$i.png > i$i.sift
done

# register each image to the first one
for i in $IDX; do
        siftu pairR 0.8 i00.sift i$i.sift p$i.txt  # match pairs
        ransac hom 1000 1 30 h$i.txt < p$i.txt     # find homography
        homwarp h$i.txt $SIZE i$i.png reg_$i.png   # warp
done

# compute the median value at each position
vecov med reg_*.png -o out_med.png

This is a classical computational photography problem, and this script is a perfectly acceptable way of solving it. Shell scripts are cool. Yet, we will see how to improve it a bit.

7.2. How to parallelize your shell script

This script runs the tasks in series. This is wasteful on a large computer with, say, 32 cores, because it could be running the tasks in parallel. No problem: GNU parallel is very easy to use. You simply print the instructions that you want to run, and pass the resulting text to parallel:

IDX=`seq -w 0 11`
SIZE=`imprintf "%w %h" i00.png`

# 1. compute sift descriptors of each image
for i in $IDX; do
        echo "sift i$i.png > i$i.sift"
done | parallel

# 2. register each image to the reference one

# 2.1. match pairs
for i in $IDX; do
        echo siftu pairR 0.8 i00.sift i$i.sift p$i.txt
done | parallel

# 2.2. find homography
for i in $IDX; do
        echo "ransac hom 1000 1 30 h$i.txt < p$i.txt"
done | parallel

# 2.3. warp
for i in $IDX; do
        echo homwarp h$i.txt $SIZE i$i.png reg_$i.png
done | parallel

# 3. compute the median value at each position
vecov med reg_*.png -o out_med.png

This version of the script will run all the tasks in parallel and will be much faster.

Is this the best parallelization possible? No. Notice that if, for example, one of the tasks of step 2.1 takes much longer than the others, there will be a long wait between steps 2.1 and 2.2, during which only one processor will be working. How can we solve this problem? In this case it seems easy: we just have to parallelize at a coarser level, sending to GNU parallel lines that contain the whole computation for each image, as sketched below. But if the dependences between files are more complicated, the problem soon becomes difficult.
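
For instance, a coarser-grained version of step 2 sends one complete registration pipeline per image; each echoed line is a single job for parallel (this sketch assumes the sift descriptors of step 1 have already been computed):

# 2. register each image to the reference one, one job per image
for i in $IDX; do
        echo "siftu pairR 0.8 i00.sift i$i.sift p$i.txt && ransac hom 1000 1 30 h$i.txt < p$i.txt && homwarp h$i.txt $SIZE i$i.png reg_$i.png"
done | parallel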

7.3. How to make your shell script restartable

Another issue with the first script is that it always runs all the steps. Imagine that you change the fusion criterion in the last line of the script. Then, when you re-run the script, all the steps are performed again, which is wasteful because all the intermediary files are identical. The typical solution to this problem is to COMMENT OUT all of the script except the lines that you want to re-run. But of course this is ugly. A slightly better option is to check, before recomputing each file, whether it is already newer than its inputs:

IDX=`seq -w 0 11`
SIZE=`imprintf "%w %h" i00.png`

# 1. compute sift descriptors of each image
for i in $IDX; do
        test i$i.png -ot i$i.sift ||
        sift i$i.png > i$i.sift
done

# 2. register each image to the reference one

# 2.1. match pairs
for i in $IDX; do
        test i$i.sift -ot p$i.txt ||
        siftu pairR 0.8 i00.sift i$i.sift p$i.txt
done

# 2.2. find homography
for i in $IDX; do
        test p$i.txt -ot h$i.txt ||
        ransac hom 1000 1 30 h$i.txt < p$i.txt
done

# 2.3. warp
for i in $IDX; do
        test h$i.txt -ot reg_$i.png ||
        homwarp h$i.txt $SIZE i$i.png reg_$i.png
done

# 3. compute the median value at each position
test reg_00.png -ot out_med.png ||
vecov med reg_*.png -o out_med.png

Notice that this is just the original script, with an explicit test added before each command: if the output file is already newer than its input, the command is not run. This simple modification turns the script into a much more useful one. It has the following nice properties:

  1. It is restartable: if it is interrupted, or if you change one step, re-running it recomputes only the files that are out of date.
  2. It is still an ordinary shell script, barely longer than the original.

Yet, it has the following problems:

  1. The steps run one after the other, so the parallelization of the previous section is lost (or has to be added back by hand).
  2. Each test must be written and kept consistent with the command next to it, which is tedious and error-prone.

7.4. Replace your shell script by a makefile

There is, after all, a free lunch. If you rewrite your original script as a makefile:

# variables
SIZE       := $(shell imprintf "%w %h" i00.png)
INPUTS     := $(wildcard i*.png)
REGISTERED := $(INPUTS:i%.png=reg_%.png)

# default target
default: out_med.png

# 1. compute sift descriptors of each image
i%.sift: i%.png
	sift i$*.png > i$*.sift

# 2.1. match pairs
p%.txt: i00.sift i%.sift
	siftu pairR 0.8 i00.sift i$*.sift p$*.txt

# 2.2. find homography
h%.txt: p%.txt
	ransac hom 1000 1 30 h$*.txt < p$*.txt

# 2.3. warp
reg_%.png: i%.png h%.txt
	homwarp h$*.txt $(SIZE) i$*.png reg_$*.png

# 3. fusion
out_med.png: $(REGISTERED)
	vecov med $^ -o $@

Now you get, for free:

  1. Parallelism: running make -j executes as many independent steps as possible at the same time.
  2. Restartability: only the files that are out of date with respect to their prerequisites are recomputed.
  3. An explicit, readable description of the dependences between all the intermediary files.

I say ``for free'' because this makefile has essentially the same length and complexity as the original script, and it is just as easy to write (once you are fluent in makefile language).
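
For instance, a typical working session with this makefile looks like the following (assuming, as above, a machine with 32 cores):

make -j 32        # run the whole pipeline with 32 parallel jobs
touch i07.png     # suppose that one of the input images has changed
make -j 32        # recompute only the files that depend on i07.png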

Notice that you can also join several steps into a single rule...

# variables
SIZE      := $(shell imprintf "%w %h" i00.png)
INPUTS    := $(shell ls i*.png)

# fusion of all registered images
out_med.png: $(INPUTS:i%.png=reg_%.png)
	vecov med $^ -o $@

# register each image to the first one
reg_%.png: i00.png i%.png
	sift i$*.png > i$*.sift
	siftu pairR 0.8 i00.sift i$*.sift p$*.txt
	ransac hom 1000 1 30 h$*.txt < p$*.txt
	homwarp h$*.txt $(SIZE) i$*.png reg_$*.png

...to obtain a very short makefile. Some people are really into this sort of thing. Personally, I prefer to give an explicit target for each intermediate file, but this is just a matter of taste. This is the kind of discussion to have with other native makefile speakers, sipping scotch next to the fireplace.

8. References

  1. The most important reference is the man page of make in your system: man make.
  2. The output of make -p -f /dev/null, that prints the set of implicit rules.
  3. The comprehensive documentation of GNU make.
  4. The UNIX Programming Environment, B. W. Kernighan and R. Pike, Prentice-Hall, 1984.
  5. Andy Chu's two absolutely amazing articles on the combination of shell, awk and make.
  6. Gagallium's cool tricks for writing portable conditionals.
Last updated: 11 June 2017, Enric Meinhardt-Llopis