Building C projects
Tags: aimake internals c development | Written by Alex Smith, 2014-11-05
C is a compiled language, and as such, C programs need to be converted to executables before they can be used. Typically, programmers do this with some sort of build system, rather than running the necessary commands manually, because the compilation process for a large project tends to be quite complex.
NetHack's build system is one of the parts of the code that is most frequently replaced in variants. The problem is that the build system in NetHack 3.4.3 requires manual intervention from the end user in order to work correctly, which nowadays (in 2014) is almost unheard of. However, the normal methods of modifying the build system (e.g. using autoconf as in UnNetHack or AceHack, or cmake as in NitroHack) basically just add extra layers of abstraction; and in a world as poorly standardised as that of building C, what typically happens is that each of these layers is trying to work around incompatibilities and nonstandard behaviour in the layers below it, meaning that you have a complex stack built mostly out of workarounds.
Another issue is that there are many more steps in building a C program than most people realise. Some of these are handled by the build system, but others are typically left to the programmer or end user to do manually. This wastes everyone's time, really; the vast majority of the build process can be automated, and thus should be, because the time of a computer is so much less valuable than the time of a human (and computers are faster at it, anyway).
For NetHack 4, I aimed to flatten these levels of abstraction as much as possible; instead of writing yet another wrapper around yet another set of poorly standardised existing tools, I'm using aimake, a build system I wrote originally for NetHack 4 (but which is not specific to it). It's still somewhat unfinished – there are known bugs, missing features, and interface issues – but it works quite effectively for building NetHack 4, and hopefully will become more widely used some day, when it's ready for release.
There are two main differences between aimake and other existing build tools. One is that aimake aims to avoid relying on lower-level build tools; most build systems work by producing output that serves as input to other build systems, and this can go down multiple layers before reaching anything that actually builds the code (e.g. with GNU Autotools, Makefile.am is the source for Makefile.in, which is the source for Makefile, which contains instructions for actually building the code). The other is that aimake automates many more of these steps than most build systems do, handling steps which are often left to humans to deal with.
One problem with this method, though, is that it can lead to gaps in understanding; most people who work on build systems are merely trying to adapt to existing tools, rather than actually handling the build themselves. This can make aimake quite hard to understand, as it works at a relatively unfamiliar level of abstraction. Therefore, I'm writing this blog post to explain what actually goes on when you build a C program. A disclaimer: this post focuses on how things typically work on modern consumer operating systems (especially Linux and Windows), rather than trying to cover all possible obscure systems. C doesn't have to work like this; it just normally does in practice.
Normally, I'd start with the first step of the build and move forwards from there. However, that order of presenting things is actually quite confusing when it comes to building C; many of the steps are only needed to prepare for future steps, and it's hard to explain something without first explaining why it's needed. Thus, I'll start in the middle and work backwards and forwards from there. However, I'll first list the steps in order, before explaining them out of order (this order is not set in stone, but is a sensible and logical one to use):
1. Configuration
2. Standard directory detection
3. Source file dependency calculation
4. Header file location
5. Header precompilation
6. Preprocessing
7. Compilation and assembly
8. Object file dependency calculation
9. Linking
10. Installation
11. Resource linking
12. Package generation
13. Dynamic linking
Most build systems handle steps 3, 5 to 7, 9 and 10 in this list (leaving the others to be done manually, before or after the build). aimake currently handles 1 to 11 and has made a start on 12. Step 13 is almost always handled by the operating system.
And now, the main body of this post, a description of the build itself.
6: Preprocessing
As an example throughout this blog post, I'm going to use the following Hello World program in C.
    #include <stdio.h>

    int main(void)
    {
        fputs("Hello, world!\n", stdout);
        return 0;
    }
Right now, I want to focus on the #include <stdio.h> line. A common misconception among people new to C is that it has something to do with libraries, typically that it causes the definition of fputs to be "linked in" to the program somehow. Actually, its effect is quite different: it serves to tell the compiler what types fputs and stdout have. The above program is, on Linux, 100% equivalent to this one:
    struct foo;
    typedef struct foo FILE;
    extern int fputs(const char *, FILE *);
    extern FILE *stdout;

    int main(void)
    {
        fputs("Hello, world!\n", stdout);
        return 0;
    }
We no longer have any mention of stdio.h anywhere, and yet the program still works. So clearly, it's nothing to do with linking. What's actually going on here is that stdio.h is just a list of declarations of types, functions, and variables, together with some macro definitions. The C standard places no restrictions on how a compiler chooses to implement these definitions, but in practice, headers like stdio.h are nearly always implemented in a slightly extended dialect of C (that uses compiler-specific extensions to do things that could not be expressed in C directly, such as adding buffer overflow check code based on the lengths of arrays in the source program).
In this case, we're defining FILE as an incomplete type (the compiler doesn't need to know the definition of FILE to compile the program, just that it's some sort of struct, because all sane compilers treat pointers to all structs identically); fputs as a function from a const char * and a FILE * to an int; and stdout as a variable of type FILE *. The extern means that this is just a declaration, and this file does not necessarily contain a definition of the function and variable in question. (We'll see where the definition actually is later on, when the linker gets involved.)
On Linux, stdout is actually a variable. On Windows using mingw, however, an accessor function is needed to find the value of the standard filehandles stdin, stdout and stderr, so the preprocessor expands the program to something like this (I've omitted some details, and thousands of irrelevant lines):
    struct foo;
    typedef struct foo FILE;
    extern int fputs(const char *, FILE *);
    extern FILE *__iob_func();

    int main(void)
    {
        fputs("Hello, world!\n", &__iob_func()[1]);
        return 0;
    }
We can now see what the purpose of standard header files like stdio.h is: it's to paper over system-specific differences in the implementation details of the standard library. What the preprocessor is doing, then, is evaluating "preprocessor directives" like #include (to use definitions in a header file) or #define (to do symbol substitution on a source file, such as replacing stdout with &__iob_func()[1]). The result is a C source file that you could just have written directly, but is a little more system-specific than would be appropriate for the source files of your project.
In addition to standard header files like stdio.h, projects can also have their own header files (e.g. NetHack's hack.h). From the point of view of the preprocessor, these work exactly the same way as standard header files, and they contain similar content (although, obviously, they tend to be less system-specific). These are distinguished by using double quotes rather than angle brackets, i.e. you would write #include "hack.h" but #include <stdio.h>.
The preprocessor is also traditionally responsible for removing comments from the source. It still does this by default, for backwards compatibility, but modern compilers have no issue with comments, and the build process will work just fine even if you tell the preprocessor to leave the comments in.
Nowadays, the preprocessor is nearly always built into the compiler, because they work so closely together. However, the preprocessor can also be run individually. The standard name for a command that does preprocessing is cpp (standing for, I believe, "C preprocessor"); this is normally implemented as a wrapper for the C compiler, providing the -E command-line option to tell it to do preprocessing only (and in fact, aimake currently uses the -E method when it needs to access the preprocessor individually, in order to prevent mismatches between which preprocessor and compiler it uses). The standard filename extension for preprocessed output is .i.
5: Header precompilation
In the early days of C programming, the preprocessor worked on a very low level, pretty much just textually manipulating the source. Although preprocessors can still do that – in case someone decides to use them for something other than C, which is not unheard of – this rapidly runs into scaling problems in practice. The problem is that the standard headers have to be designed to handle all possible programs that might want to include them; our Hello World program just used fputs and stdout, but stdio.h has many other features, any or all of which a program might use. The result is that our preprocessed Hello World program is 1192 lines long; most of them are blank, and most of the rest are unused, but the compiler has no way to know this until after preprocessing.
A traditional build process would preprocess each source file individually, meaning that each header file would have to be read by the preprocessor once for each source file. This has two problems. One of these is that it takes a long time (especially when using C++, whose build process is pretty similar to that of C). Modern computers are fast enough that the time taken is of the order of tens of seconds, rather than days, but even having to wait an extra minute in your build can significantly slow down your development process (because it reduces the rate at which you can use your compiler to catch errors, and means you can't get useful feedback from an IDE). The other problem is that if there is, say, a mistake in a declaration in a header file, this will cause an error compiling every single file that includes that header, whether it cares about the declaration or not.
Given that modern compilers work so closely with their preprocessor counterparts, there is a solution to these problems: the preprocessor and compiler can cooperate to compile as much of a header file as possible ahead of time, without needing to know which source files will eventually use it. Although not every part of a header file can be precompiled (e.g. the effect of the textual substitution that #define performs is impossible to precalculate, because it might end up being stringified), much of a header file is declarations, which can be. This means that the header file only needs to be parsed once (rather than once per file that uses it), and that errors in it may be caught before it's used. (It also helps in the common case where the source files are changed without changing any headers; the same precompiled header can be used as last time.)
Although relatively old (I remember using them back in the 1990s), precompiled headers are nonetheless a newer innovation than most other parts of the C toolchain, and thus are relatively nonstandardised. gcc and clang use the extension .h.gch for precompiled headers; Visual Studio seems to use .ipch.
aimake currently always precompiles headers. I'm not 100% convinced this actually saves time – although it speeds up the compiler in most cases, it slows down aimake because it has to keep track of them – but the improved error messages are worth it.
4: Header file location
In order to be able to include header files, you have to be able to find them on the filesystem. The algorithm used by the vast majority of preprocessors is very simple: the preprocessor has a hardcoded list of directories, which the user can add to, and the preprocessor will scan them in sequence until it finds the file it's looking for. In the case of #include "file.h", it will also scan the directory containing the source file.
Although sufficient for small projects, in large projects this approach is highly inadequate. Unlike a small project, a large project will typically be formed out of several, mostly unrelated, parts (e.g. NetHack 4's source ships with libuncursed, which was originally written for NetHack 4 but is conceptually separate, and with libjansson, which was written entirely independently of NetHack and chosen as a dependency by the NitroHack developer). And with disparate sources come naming clashes. So far, NetHack 4 has managed to avoid naming clashes among the header files we ship (although there are currently six names which are used for multiple source files: dump.c, log.c, messages.c, options.c, topten.c, windows.c), but avoiding them becomes increasingly unlikely as projects get larger.
There is another problem, which has a simple solution, but is worth mentioning because it catches some people out: not all C compilers are configured correctly for the system they run on, and may miss a few system headers as a result. I've most commonly seen this happen with the directory /usr/local/include on Linux systems, which is intended for system-wide header files that were installed manually, rather than by the operating system's package manager. The fix is just to add the directory as an include directory for every compile, using command-line options.
Anyway, as a result of the potential for name clashes, the set of header files to include for each individual source file needs to be determined for that file individually. The normal solution in build systems is to do this manually; the user will specify a list of directories to scan for header files, typically a project-wide list with options to override it for particular source files. Although these lists tend to be quite simple to specify with some sort of loop, and are not hard to write, the fact that they need to be written at all increases the chance of human error (and this sort of error has definitely caused problems in practice). Their existence also encourages the tendency to use multiple nested build systems with different configurations that is frequently seen in large projects, and which is nearly always a bad idea. (There is a relatively famous paper about this called Recursive Make Considered Harmful [PDF].)
In aimake, I use the approach of scanning the entire source tree for header files with appropriate names, then working out which header is intended by comparing paths (we take the closest matching header file in terms of directory structure). Different parts of a project have to be placed in different directories anyway when using a hierarchical build, and doing so is sensible anyway, so this is not a particularly odious requirement.
It's also necessary to scan the system include directories for header files, because they aren't always at a consistent location on every system. For instance, on my Ubuntu laptop, the PostgreSQL header that NetHack 4's server daemon needs is stored at /usr/include/postgresql/libpq-fe.h. On some other people's systems, it's just /usr/include/libpq-fe.h. The solution to this problem I've most commonly seen used in practice is to require the user to specify the location as a configuration variable, then use conditional compilation to select between #include <postgresql/libpq-fe.h> and #include <libpq-fe.h> respectively. Getting the build system to just search for the header file in question on the filesystem is simpler, and requires less human intervention, and thus is preferable. aimake does this, in addition to scanning the source tree.
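The search itself is as simple as it sounds; using stdio.h as a stand-in (since libpq-fe.h may not be installed on a given system), the equivalent shell one-liner would be:

```shell
# Look for the header anywhere under the system include root,
# rather than hardcoding which subdirectory it lives in.
find /usr/include -name 'stdio.h' 2>/dev/null
```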
There is one other subtlety here: because the preprocessor works at the level of directories rather than files, it's impossible to exclude a file from the header file search if another file in the same directory is included. This might seem minor, as there's no reason to arrange header files like that during the actual build. What it does do, however, is make it much more difficult to reliably discover dependency loops; a build system wants to avoid using files that aren't meant to have been built yet (and are left over from a previous compile), but the directory-level granularity of a preprocessor's header file search makes this impossible to avoid without using a separate directory for each generated header file (in case some are meant to have been generated at that point in the build, but not others). As a workaround for this problem, and for numerous other reasons, aimake now copies each header file in the source tree to a directory of its own in the build tree before building (with a #line directive so that errors are still reported against the original source file); I haven't had to resort to this for system headers, because they can't be confused with user headers due to the different syntax, and because system toolchain maintainers work to avoid clashes, but it would be a possibility if necessary.
It should be noted that the original reason I wrote aimake was that I'd accidentally introduced a dependency loop into a beta of my "attempt to merge AceHack with NitroHack" in late March 2012, a couple of days before it got the name NetHack 4, and couldn't figure out what was going wrong with the build (back then, I was using the cmake build system inherited from NitroHack, which didn't realise that anything was amiss and was hopeless at giving appropriate feedback for this kind of mistake). Because I was under time pressure (wanting to get the program ready for April 1), I quickly knocked a script together as a replacement build system to get the build working, and after a couple of rewrites and numerous smaller changes, that script eventually became aimake.
3: Source file dependency calculation
When working on a large project, you want to avoid recompiling it every time there are changes made to any file; rather, you want to compile just the bits that actually changed. This is probably the motivating factor behind the earliest build process automation tools; this sort of optimization is hard to express in a shellscript or batch file, leading to the creation of tools like make, which compare timestamps to determine what needs to be done in any given build.
I'd argue that nowadays, trying to generate input for make is probably the single biggest mistake you can make in a build system, even though it's very common practice; the problems are that dependency resolution is a relatively small part of a build, and that working around differences between make implementations (some of which are very primitive) is quite complex, so it's normally simpler to just resolve the dependencies yourself rather than trying to get make to do it. make is also quite limited in what it can do; for example, it has no obvious support for rebuilding a file when the build rules for that file in the Makefile change (as opposed to rebuilding nothing, or the entire project). Most build systems work as wrappers around make (or the equivalent for IDEs like Visual Studio); aimake just skips this level of abstraction and handles the dependencies itself. (I'm not the first person to come to this conclusion; for instance, the standard toolchain for Perl used to use ExtUtils::MakeMaker, which generated Makefiles, but is now moving in the direction of Module::Build, which handles the dependencies itself.)
Anyway, regardless of whether you use make, aimake, or some completely different tool to do the dependency calculation, you need to know what those dependencies actually are. This was originally left to the person writing the code to specify, and often still is in simple projects. However, as a task that can reasonably be automated (and that is somewhat time-consuming to do by hand for large projects), automating it makes a lot of sense.
Something many people don't realise about dependency calculation is that correct dependency information is actually needed for two different purposes:
- Ensuring that if a file's dependencies change, it will be rebuilt.
- Ensuring that a file won't be built until after its dependencies.
The most common method of dependency automation nowadays is to get the preprocessor to report on every header file it includes, during the build (and preprocessors have pretty advanced support for this; most can generate Makefile fragments directly). However, this leads to a chicken-and-egg problem (you can't calculate dependencies until you compile, but can't compile without knowing the dependencies). The standard solution to this (e.g. as used by GNU autotools) is to start the first build with no dependencies. This satisfies goal 1 (because on the first build, everything has to be rebuilt anyway, thus it doesn't matter that you don't know its dependencies), but fails goal 2, which tends to be dealt with using ugly workarounds. (Just look at this mess.)
An alternative tradeoff (and one that aimake uses, although other choices are reasonable depending on what your goals are) is to run the source files through the preprocessor in a separate stage, without compiling them, to get hold of the dependencies. Thus, in aimake, this is step 3 of the build (whereas with many other build systems, it would be step 6).
Although this seems simpler than producing dependencies as a side-effect of a compile would be, modern preprocessors actually make it much harder than you would think. The first problem is that they include header files transitively; this is what you want for a build (because you're actually building), but not what you want for dependency calculations (it's much faster, in the sense of "linear rather than quadratic", to look at what headers are needed without actually compiling them). The next problem is that the headers in question might not exist yet (e.g. they're generated files), and (in the order aimake does things) won't have been located yet. At least gcc and clang can be told to report on but ignore nonexistent files, which would seem to solve this problem.
My first attempt at implementing dependency automation was to ignore the first problem (as it only affected performance) and use the obvious solution to the second, but it led to unexpected compile failures in NetHack, due to a reasonably subtle problem. What happened was that one of the files in NetHack is called magic.h; although it's part of the source tree, it hadn't been located yet (and in general, because a good build system should handle arbitrary files being generated during the compile without prior warning, there's no point in locating header files early, because that won't help with generated files). This is fine if there's no file magic.h anywhere, because then gcc and clang will treat it as empty and just report on the dependency, which is exactly what we want. However, some operating systems have a system header magic.h (which I believe is related to detecting file formats), leading the preprocessor to report a dependency on the system magic.h (and all its transitive dependencies) rather than the one in the source tree!
Clearly, the behaviour I actually needed was to treat all header files as empty, rather than actually including them. Sadly, though, preprocessors typically don't have an option to do this. I actually seriously considered writing my own preprocessor to deal with the problem, but it would have been a mess and probably too large (aimake is implemented as a single file that you can ship with your project, much like configure, so I want to keep the size down if possible; it's already a little over ten thousand lines long). I eventually found a simpler solution, even if it is a bit sad: aimake reads the header files itself, looking for #include statements, then creates an empty file named after each header it finds, and tells the preprocessor to look there for include files.
Note that one limitation of this method of dependency handling is that a file can't determine which header files to include dynamically based on constants included in a different header file. There are plausible useful reasons to do this, and it's permitted by the C standard. However, looking at what actually happened in practice in NetHack 4, it turned out that almost every location where it happened was a mistake that caused header file inclusion order to be relevant when it shouldn't have been (and the other cases were reasonably fixable). So again, this seems like a reasonable tradeoff to make.
2: Standard directory detection
In order to be able to do any sort of processing of the system headers and libraries, you first have to know where they are. With header files, you need to know the location of the system headers to be able to scan the subdirectories of that location for header files like libpq-fe.h, which I discussed earlier. Libraries intended for public use are rarely stored in subdirectories; however, aimake needs to be able to look inside them in order to determine which library holds a particular symbol, so it needs to find the set of standard libraries, too.
It's actually a surprisingly hard problem to determine where the system headers and libraries are stored with no hardcoded knowledge of the system you're on, and this may be part of the reason why header file location and object file dependency calculation are normally left to humans by most build systems, rather than being automated. The headers and libraries could be just about anywhere on a system (e.g. on my Windows system, I installed the compiler to a subdirectory of my Documents directory, which could have had an entirely arbitrary name); assuming that all we know the location of is the executables of the C toolchain, can we find the rest?
I tried several techniques, and eventually settled upon two that seem reasonably portable (by which I mean that they work on Linux with gcc/clang and GNU ld/gold, on Windows with mingw, and apparently on Mac OS X too, although that's mostly been tested by other people, because I don't have a Mac handy):
- For header files, the preprocessor must know where the header files are in order to be able to include them; and although there's no standard option for asking where they are, there is a standard option (-M) for reporting dependencies. Thus, all we need is to produce a file with one dependency in each of the possible standard header file directories. Currently, aimake uses five header files as dependencies. There's a full explanation in its source, but here's a quick summary of the possible directories that header files can be sorted into on systems I've seen, and which header file aimake uses as a test:
  - Header files that ship with the compiler (<iso646.h>, because there is literally no reason not to ship it with a compiler, especially as many C libraries don't ship it);
  - Header files that were patched by the compiler (<limits.h>, the only file that gcc patches unconditionally);
  - Architecture-dependent header files (<sys/types.h>; none of the standard C headers are stored here on most systems which have it, so I used the most architecture-dependent POSIX header);
  - Header files that ship with the C library (<setjmp.h> is the only header I've found that's in this category on every system I tested, probably because it's too weird for the compiler to mess with it);
  - Nonstandard (i.e. non-C/POSIX) header files that shipped with libraries (<zlib.h>, because zlib is one of the world's most widely-used libraries).
- For libraries, again there's no standard way to find out where the linker is looking, even though it must know; I attempted to use gcc --print-search-dirs even though it's gcc-specific, but it appears to only find library directories that gcc needs for its own internal use, rather than all the library directories on the system. The most portable method I've found is to ask the linker to link each of the libraries we care about into a small test file (one at a time, so that the failure to find a library won't affect the search for the others), and get it to produce a debug trace as it does so.

  There are several problems with this method (even though I use it anyway). The first is that although the option to ask linkers for a trace is standard (-t), the format of the output is not, and can vary wildly even on a single linker depending on the reason that the library is being linked in. Here's a current regex aimake might use for parsing the output (with the part aimake uses to avoid matching the source file removed, and with file extensions set appropriately for Linux; the file extensions need to change on other systems):

      /^(?:(?:-l|lib)[^ ]+ )?\(?(.*?\/.*?(?:\.a|\.o|\.so)(?:\.[0-9]+)?)(?:\)|\([^\/]*\)|\z)/

  The next problem is that, looking at the files matched by this, sometimes some of them will be text files! This is due to the fact that library developers sometimes have to do weird things for backwards compatibility. Here's what libncurses.so looks like on my Ubuntu system, in full:

      INPUT(libncurses.so.5 -ltinfo)

  I don't know for certain, but my guess as to what happened here is that libncurses got split into two libraries sometime in the past (libncurses and libtinfo), and thus libncurses.so became a wrapper linker script that pulls in both libraries. Obviously, this script is useless for things like determining which symbols exist in which libraries. However, the linker trace will also contain both of the libraries it pulls in, so all that's necessary with such wrappers is to ignore them. aimake scans the first kilobyte or so of the library looking for non-ASCII characters; if it fails to find any, it assumes that the "library" is actually just a linker script.

  Another issue is that, if we don't use at least one symbol from a library, the linker may optimize it out; and in some circumstances (e.g. if it's a static library), it then won't show up in the linker trace output. aimake currently only searches for libraries that the user mentions might be required in the configuration file (for performance reasons; searching every library on the entire system for symbols takes about an hour on my laptop), so I solved this by asking the user to mention the name of a symbol that they expect to be in that library; aimake then forces the linker to link that symbol using the standard -u option. An added benefit of this approach is that it ensures that we find the library we expected, rather than a different library with the same name.

  A bad side-effect of this concerns the standard libraries of the system, the ones that are linked in by default. (In addition to needing to analyse these in order to do dependency calculations correctly, as any dependencies on them won't need to be explicitly satisfied, we also need to not pass them to the linker, because Mac OS X's linker complains and refuses to build if you try to explicitly link a standard library.) We can't know how those are divided up between files, so specifying a symbol is of no use. Thus, I ended up having to settle on the rather drastic method of asking the linker to link in every symbol in the standard libraries (--whole-archive, plus --allow-multiple-definition to reduce the amount of error spam you get as a result). This normally doesn't produce a usable output file (nor could it really be expected to), but aimake doesn't care about this, just about the trace output. Incidentally, it causes mingw's linker to crash, but only after it's already printed the trace output aimake needs. (This is why building NetHack 4 on Windows shows an error dialogue box about ld.exe crashing.)
All this annoys me, as something this apparently basic really shouldn't be this difficult. I guess the problem is that nobody really expects someone to try to determine inherently system-specific information (where the headers and libraries are installed) in a system-agnostic way.
1: Configuration
Traditionally, the first thing done in any build is to fix values for
the parts of the build that vary from build to build. This includes
everything that needs to be specified by a human before the build
starts; thus, configuration is quite lengthy and time-consuming with
NetHack 3.4.3, as you have to answer questions like "what terminal
codes library does your system use?". aimake
cuts down considerably
on this sort of question, because it can determine so much by itself,
but there are always going to be questions that only a human can
answer, such as "are you interested in building the tiles ports", or
"do you want me to install this to your home directory or
system-wide?". (At least aimake
mostly manages to condense all
these choices down to a couple of command-line options. It does have
one major problem in terms of user-friendliness right now, though,
which is that there's no simple way to specify unusual options you
want to pass to the compiler, or similar special build rules; you can
do it but the syntax is horrendous. This is something I need to work
on before aimake
is really ready for production use.)
However, there are also lots of questions whose answers can be
determined experimentally, and which a computer can answer as easily
(in fact, more easily) than a human. This is normally performed by a
program named configure
(typically, but not always, generated by GNU autoconf
), with a separate configure
program written for each
project and shipped with the distribution tarball. configure
's job
is to paper over nonportabilities between systems by determining what
needs to be done on each one.
Just as happened with curses, though (as I wrote in
my blog post on portable terminal control codes), configure
is
increasingly solving the wrong problem. I have written systems based
on GNU Autotools that aimed to do everything by the book as much as
possible, but it was mostly as a joke / thought exercise; modern
programs don't need to worry about, say, stdlib.h
not existing (and
don't really have an obvious workaround for if it doesn't). At least
autoconf
's documentation mentions that that particular check is
obsolete nowadays, and recommends leaving it out.
There are several checks that are more useful, but so far, I've only
actually needed two checks of this nature in NetHack 4 (and aimake
has support for doing them), a very small portion of the build system
as a whole (and not something that warrants being one third of the
"standard" build sequence, which for autotools, is configure
,
make
, make install
). One is about how to set the compiler to C11
mode (--std=c11
, -std=c11
, or nothing and let the compiler fall
back to non-standards-compliance mode, which contains all the C11 and
C99 features we use in the case of gcc
and clang
). The other is
how to indicate that a function never returns: is it _Noreturn
(the
version recently standardised in C11), __declspec(noreturn)
(used by
Microsoft compilers), or __attribute__((noreturn))
(used by gcc
and clang
before the standard syntax was invented)? I originally
tried to detect this using compiler predefined macros, but this caused
problems on some specific old versions of clang
(which are still
used nowadays on some Mac OS X systems). Neither of these checks
exists in my current version of autoconf
, so it wouldn't have helped
much here.
It should be noted that this is the most user-visible step (because it
requires the most manual intervention), and also the one that varies
the most between build systems. Thus, it needs the best interface,
and it's a particular pity that aimake
falls down here.
7: Compilation and assembly
We've now seen pretty much everything that happens at the C level, from the source files up to the preprocessor. As a recap from earlier, the preprocessor's output is typically a slightly extended C, which contains declarations for all the relevant parts of the standard library (and typically for irrelevant parts, too, but those get optimized out). The next part of the build process is the compilation and assembly, which takes the preprocessor's output and transforms it into an "object file".
Just as the preprocessor's output is slightly extended C, the object
file (which is the linker's input) is slightly extended machine code.
The difference from the machine code that a processor actually runs is
that it's full of instructions to the linker, as well as instructions
to the processor; typical linker instructions are along the lines of
"this string is read-only data, place it with the other strings for me
and tell the MMU to mark it read-only", or "I know I said to call the
all-zeroes address as a subroutine, but can you replace that with the
address of strcmp
when you figure out where it is?". In short, an
object file contains as much of the machine code as can be calculated
by looking at one file in isolation; the rest is made out of linker
instructions to patch up the file later on in the build.
There are two basic approaches that can be used for this C to machine
code transformation. One is to do it directly; the other is to go via
assembly code (which is almost as low-level as machine code, but which
is much more human-readable, and which can express various linker
instructions that machine code can't). When assembly code exists as an
intermediate step (or sometimes even when it doesn't – some compilers
can produce it on demand), compilers typically support the -S
option
to let you take a look at it. Our Hello World program expands to just
30 lines of assembly. It's also possible to "disassemble" the object
file, to determine what its machine code would look like as assembly,
to make it more human readable. Here's the Hello World program as C
(as a reminder):
    #include <stdio.h>

    int main(void)
    {
        fputs("Hello, world!\n", stdout);
        return 0;
    }
And here's how it looks as assembly produced by the compiler, and how the object file looks after disassembly (rearranged so that common lines are next to each other; my disassembler also printed the machine code as hexadecimal, but I've removed that to save space, even though it leads to the newline and NUL at the end of "Hello, world!" displaying ambiguously as periods):
            .file   "t.c"
            .section        .rodata
    .LC0:
            .string "Hello, world!\n"
            .text
            .globl  main
            .type   main, @function
    main:
    .LFB0:
            .cfi_startproc
            pushq   %rbp
            .cfi_def_cfa_offset 16
            .cfi_offset 6, -16
            movq    %rsp, %rbp
            .cfi_def_cfa_register 6
            movq    stdout(%rip), %rax
            movq    %rax, %rcx
            movl    $14, %edx
            movl    $1, %esi
            movl    $.LC0, %edi
            call    fwrite
            movl    $0, %eax
            popq    %rbp
            .cfi_def_cfa 7, 8
            ret
            .cfi_endproc
    .LFE0:
            .size   main, .-main
            .ident  "GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2"
            .section        .note.GNU-stack,"",@progbits
    Contents of section .rodata:
     Hello, world!..

    Disassembly of section .text:
            push   %rbp
            mov    %rsp,%rbp
            mov    0x0(%rip),%rax
            mov    %rax,%rcx
            mov    $0xe,%edx
            mov    $0x1,%esi
            mov    $0x0,%edi
            callq  22 <main+0x22>
            mov    $0x0,%eax
            pop    %rbp
The first listing is the assembly output for the Hello World program; we can
see that it contains many notes for the linker (the lines starting
with .
), and various constructs that don't exist in machine code
(e.g. it can say call fwrite
, and mention stdout
, even without
knowing where fwrite
and stdout
are in memory). Comparing this to
the machine code produced by disassembling the object file, we can see
that many of the lines are basically the same, but some have bits
replaced by zeroes; movq stdout(%rip), %rax
cannot sensibly be
interpreted as mov 0x0(%rip),%rax
, for instance. Likewise, the call
to fwrite
has been replaced by a call to the current instruction
pointer location (which also translates as zeroes in machine code).
It's worth noting that, even though this program was compiled without
optimization, the compiler has done a small optimization anyway; the
original Hello World program called fputs
, but the compiler has
compiled it into a call to fwrite
. Both these functions need to use
a buffer to store their output, according to the C standard; fputs
has to do a series of strncpy
or equivalent to fill the buffer
(strncpy
instead of strcpy
because it has to allow for the
potential that the string might be very long and so overflow the
buffer, causing it to be output in sections), whereas fwrite
knows
the length before it even starts, and (for short strings like this)
can memcpy
it into the buffer directly. memcpy
is faster than
strncpy
because it can copy more than one character at a time
(strncpy
has to take care not to read past the end of the string,
and doesn't know where that will be in advance), so fwrite
should be
faster than fputs
. (We can also see the "14" in the assembly code, which isn't in the original program: it's the length of the string
"Hello, world!\n"
that appears in our program, not counting the
terminating NUL because that isn't output.)
The object file places its output into various sections, which tells
the linker (and eventually the OS kernel) about the purpose of the
data stored there. Our main
function goes into a section that is,
rather confusingly, named .text
; this contains the actual code that
makes up our program. (This is on Linux; on other operating systems,
the name of the section varies, but is nearly always some variation on
"text" for historical reasons, even though executable code is about as
binary on the binary/text distinction as you can get.) The string
"Hello, world!\n"
itself is in a separate section, .rodata
, which
stores read-only data.
The .cfi_
lines tell the assembler to generate stack unwind
information, which is mostly irrelevant here (it's used by exception
handling in C++ to figure out which catch
blocks might be relevant
for the current code location, and by debuggers to produce stack
backtraces); they also compile into a section (.eh_frame
) in the
object file.
In addition to the sections, our object file also contains headers,
which contain instructions to the linker; the information about names
like main
and stdout
hasn't just disappeared, but as it can't be
expressed in machine code, it needs to be expressed some other way.
On Linux, we can see these headers using objdump -x
. This is more
than a screenful of text, but here are some interesting excerpts:
    SYMBOL TABLE:
    [...]
    0000000000000000 g     F .text  0000000000000029 main
    0000000000000000         *UND*  0000000000000000 stdout
    0000000000000000         *UND*  0000000000000000 fwrite

    RELOCATION RECORDS FOR [.text]:
    OFFSET           TYPE              VALUE
    0000000000000007 R_X86_64_PC32     stdout-0x0000000000000004
    0000000000000019 R_X86_64_32       .rodata
    000000000000001e R_X86_64_PC32     fwrite-0x0000000000000004
The "symbol table" tells the linker about any identifiers in the
source code that need to be given a consistent meaning between files.
In this case, main is at the start of the .text section (with a size of 0x29 bytes), and the compiler doesn't know where stdout
and fwrite
are. Meanwhile, the "relocations" tell the linker which parts of the
machine code will need patching up; it's asking the linker to
substitute in appropriate values for just before stdout
, just before
fwrite
, and for the start of this file's part of the .rodata
section (which is where the "Hello, world!\n"
string is stored).
The compiler cannot figure out where any of these will be; it can't
know where the .rodata
section is relative to the .text
section
because it doesn't know how large those sections will be, and stdout
and fwrite
are in the standard library, not in the Hello World
program itself. The hexadecimal numbers in the left column tell the
linker where in the .text
section the new values will need to be
substituted (and the central column explains what format the linker
should use for the substitutions).
The traditional name for a compiler is cc
, and for an assembler is
as
. However, the compiler also traditionally runs the linker once
it's done; although appropriate for very small programs, this is not
what you normally want for a large project (where you're going to want
to recompile just the source files that changed, but relink all the
executables that depend on any file that changed). Thus, it's typical
to tell the compiler to do just the compile (and assemble, if
necessary) using the -c
command line option, which is standard.
Object files have the extension .o
with most toolchains, although
some Windows toolchains use .obj
.
8: Object file dependency calculation
Once we have our object files, the next step is to work out how to fit
them together to form executables. There are two sources of object
files; one is the C source files we just built, and the other is the
system libraries. (System libraries can ship object files directly;
more commonly, though, they're bundled together into libraries of
object files, with extensions like .lib
or .a
, which are simply
archives containing multiple object files, typically with some extra
metadata to allow the linker to figure out which of the contained
object files are relevant more quickly. On Linux or UNIX-based
systems like Mac OS X you can use the command ar x
to unpack a
library and look at the object files inside it directly.)
In order to produce an executable, we need a list of object files and
libraries that together have no unmet dependencies; any undefined
symbols in any of the object files (such as stdout
in our Hello
World object file) need to be defined by one of the other object
files. We also need an "entry point" to the program, so that it has
somewhere to start running. (The way this is typically implemented is
for the entry point to be somewhere in a standard library, normally in
an object file whose sole purpose is to provide an entry point; and
that object file calls main
, which it leaves undefined, performing
initialization before the call and cleanup afterwards.)
Thus, one thing we need to do as part of the build is to produce such a list. This is a job that's normally left to the programmer, and in my opinion, making the programmer do it is a terrible idea. As usual with jobs that are left to humans, it's error-prone; unlike some of the other things that are normally left to humans, it's also somewhat time-consuming, as whenever you add a new file to your project, you'll need to add it as a source for one or more of your executables (or libraries, if you build libraries as part of your build).
This is thus an excellent candidate for automation, and the concept
here is pretty simple: we have one executable for each main
function, and we find the files required to produce it by recursively
looking for object files or libraries that define symbols that are
undefined in the object files that are already part of the build. In
the case of our Hello World program, aimake
would start by checking
the symbol table for our main program, finding it needed definitions
of stdout
and fwrite
, then finding those in the standard library,
and it'd be done.
One potential problem here is if symbols with the same name are
defined in multiple files, but aimake
uses heuristics to figure out which file to link against; these work excellently in practice for correct
programs. However, there's another problem: when the input programs
contain subtle errors (such as linking against implementation details
of a library defined elsewhere in the project, rather than using the
public API of that library), aimake
can normally find a solution
that works anyway (for this example, linking against the relevant
object files in that library directly), even though we wouldn't want
it to. At the moment, I check to ensure this hasn't happened by
inspecting the symbol tables manually once every few months, but this
is unsatisfactory. Ideally, there should be some way to tell aimake
(or some way to have it automatically deduce) that certain files
should be kept separate; a crude way to do this at the moment is to
define symbols with the same name in each to force a link error, but
this isn't really satisfactory.
In order to actually discover which symbols are defined (or referenced
but undefined) in the various object files, we need to look at their
symbol table headers. We could use a program like objdump
for this,
but it's rather system-specific; there's a more generally usable
program, nm
, that deals with symbol tables specifically. Again, we
have to deal with variations in the output format, and come across a
particularly weird problem: POSIX defines a standard output format for
nm
, but the way you get at that format varies from nm
implementation to nm
implementation, making it not particularly
useful as a standard. Meanwhile, the default output format of nm
is
consistent enough that just matching it directly seems to cause no
problems.
9: Linking
Once we know which object files we need to combine to make an executable, the next step is to use the linker to actually make our executable.
The first job of the linker is to work out how to lay everything out
in memory; it will generally combine sections with the same name into
contiguous blocks. On modern systems, the sections keep their
identity all the way into the executable, and so combining them means
that the executable's headers will be simpler and shorter. There are
multiple reasons that the executable needs to know about the sections.
One is to correctly configure permissions on memory while the
executable is running; the kernel will tell the computer's memory
management unit (MMU) which parts of memory are supposed to be
readable, writable, and/or executable, and violating these rules will
cause a crash. For the sections used by our Hello World program,
.rodata
is readable but not writable or executable (which is useful
for diagnosing errors in which you try to write to a string literal,
or to deallocate one); .text
is executable and readable but not
writable (for security reasons, toolchains try to avoid memory that's
both writable and executable as far as possible). Another reason is
that some of the sections can be optimized out of the output entirely;
the .bss
section is for read-write data that's initialized entirely
with zeroes, which is a common enough special case that it makes sense
to not have to pad the executable with that many bytes of zeroes. The
kernel knows about this rule, but needs the .bss
section identified
so it can allocate memory for it at runtime.
The linker has a lot of control over the memory addresses used by the program; object files typically have no opinion on where something should be placed in memory, but it's in the linker's power to fix all the addresses within the resulting file, and this is what it does in typical usage. (There's no need to worry about clashes between executables; the MMU will place each executable in its own separate address space, so each executable has perfect control over how it wants to lay out memory.)
Once the linker's decided all the addresses that the program should
use, it creates an output file that's basically a memory image of each
section, and substitutes in addresses according to the relocations
requested by the compiler. This list of sections, plus appropriate
headers, is the executable file (i.e. .exe
on Windows) that's
actually produced on disk. In order to run such a file, the operating
system basically just has to memory-map all the sections in the
executable at the virtual addresses that the linker indicates, then
starts the program running at the start address indicated. (Modern
operating systems can thus optimize memory storage for executables the
same way they would for memory-mapped files.)
It's worth noting that some operating systems have rather simpler
executable formats than this. A DOS .com
file is a memory image of
a single section (starting at a virtual address of 0x0100), with an
entry point at the start of the section, that's readable, writable and
executable. The advantage of this format is that it doesn't need a
header at all, as all the information that it would specify is fixed;
the disadvantage is that because the 8086's memory management was so
primitive, it means that the entire program is limited to 65280 bytes
of program, static data, and stack.
Although the main job of the linker is to fix memory addresses and patch up the relocations accordingly, it also has a few other jobs. One is handling code that defines a variable in multiple source files. There are three ways in C to specify variables that the linker might potentially have to process:
- Declaration that is never a definition, e.g. extern int foo;
- Tentatively defining a variable as zero-initialized, e.g. int foo;
- Definitely a definition due to an initializer, e.g. int foo = 0;
The first and third cases are well-defined and have the same meaning
everywhere; the first uses extern
to explicitly state it isn't a
definition, and the third initializes the variable (and thus must be a
definition). The second case is more ambiguous; the standard states
that it's always a definition, but not all C programmers seem to have
got the message, and sometimes it's incorrectly used as just a
declaration instead. Worse, sometimes all your object files just say
int foo;
, and none of them initialize it; now (in pre-standard C)
one of them is a definition, the others are declarations.
Many linkers thus have special logic to handle this case, converting all but one plain int foo;-style definition into declarations
if it's required for the program to link correctly (by allocating all
the definitions at the same memory address). Some don't, perhaps for
technical issues (e.g. building shared libraries on Mac OS X).
Because well-written programs shouldn't be doing this sort of thing
anyway, and because it would confuse aimake
's heuristics, aimake
tells the compiler to tell the linker that all definitions are
definitely definitions (rather than possibly declarations), and thus
can help catch this sort of error. (NetHack 4 was unintentionally
doing this for a long time; two different variables windowprocs
were
getting merged together, when they shouldn't have been. It didn't
affect the operation of the program, as it happens, so I'd never have
caught it without aimake
telling me something was wrong. It didn't
manage to articulate the problem very clearly, though, and I had an
incorrect fix in place for months, until I finally figured out what
was wrong when working on the Mac OS X port of NetHack 4.)
The linker can also catch certain problems with an executable that the
compiler can't; for instance, it can notice that a function isn't
defined anywhere (leading to an undefined symbol error). However, the
linker can't check as much as you'd like; if you declare a variable as
an int
in one file and a float
in another, then it's very unlikely
that the linker will be able to discover the type mismatch (it might
know about the expected size of the two resulting symbols, on some
operating systems, but both types are usually 4 bytes long; in order
to know about the type, the only source of information the linker
could rely on is the debug information, which linkers typically don't
parse and which the compiler doesn't always generate).
10: Installation
Although you might think that the build is done once the executables are produced, there are still several steps left in a full build process. One problem at this point in the build is that, although we have a working executable, it would be very hard to ship. One possible build layout is for the executables, object files, and source files to all be mixed together in the source tree (something I personally hate because it makes it impossible to have multiple builds from the same source, and makes reversing a build hard, but which is nonetheless very common); this is too disorganized to ship easily. The alternative, placing files generated during the build in a separate directory, means that the executables are separate from any data they might need (which is still sitting in the source tree). Either way, some files are going to need to be copied and/or reorganized before they're really usable.
The simplest possible form of installation is simply taking the files
we just built, together with relevant data files from the source, and
copying them to the location the user asked us to install them in
(back during configuration, at the start of the build). There's
nothing conceptually too difficult about this; however, it's nearly
always a separate step from the actual build due to the potential need
for elevated permissions (it's common to want to install into system
directories that aren't writable by ordinary users, like /usr/bin
on
Linux or a subdirectory of CSIDL_PROGRAM_FILES
(which is typically
C:\Program Files
on the filesystem) on Windows).
There are also some subtle details that build systems take care of during this step. For instance, the build system needs to create directories before installing into them. Another example is that it's common to "strip" executables while installing them; this discards all information in the executable (symbol tables, debug information, etc.) that isn't absolutely necessary for it to run, and so reduces disk usage at the expense of debuggability. (A more recent innovation is to split out the debug symbols into a separate file, typically stored on a network and downloaded when debugging information is needed.)
Another job of the install process is to set the permissions on the
files it installs. This is surprisingly easy to get wrong, especially
when they need to be different from the defaults. For example, on
Linux and UNIX-based systems, NetHack in general and NetHack 4 in
particular set permissions so that players can't write the high score
table, or write bones files, without going through the nethack
executable or using administrator privileges. Some distributions
(which modified this step of the install process, despite warnings in
the documentation that their new configuration was a bad idea) managed
to screw this up to the extent of introducing a security bug that
could allow anyone with local access to take over the whole system.
It's also worth noting the "DESTDIR convention", which will become
relevant later. The idea is that you compile as if you were
installing into one location (telling your program to look for its
data files in /usr/share
, for instance), then actually place the
compiled files in a different location (typically somewhere in your
home directory). It's easy to add this feature to an installer, and
forgetting it will make some of the later steps of the build process
almost impossible to complete. (The way you specify this using GNU
automake
, and with most hand-written makefiles, is to set a make
variable DESTDIR
; aimake
uses a command-line option --destdir
.)
In addition to its use in generating packages, this is also handy when
installing into a chroot; I use this feature when updating the server
on nethack4.org.
Still, this is one of the simpler steps. The main problem from the
point of view of a build system designer is that because it needs
enhanced privileges, the UI will inevitably be more complex than for
the other steps. aimake
supports three different models for
handling the install step:
- If you don't actually need permissions for your install, you can just use -i (i.e. "install as well as building") directly, and things work fine.
- You can tell aimake how to elevate permissions with the -S option, e.g. -S sudo, and it'll use that method (sudo on Linux is very similar to UAC on Windows; it can only be used on an administrative account, and prompts for your password, elevating one process if it's entered correctly).
- You can run aimake without -i to do the build; then elevate permissions, and run with -i --install-only. (This is how -S is implemented internally.)
The second method is the most convenient, and minimizes the risk of
permissions-related errors (a build directory full of files you don't
own is always frustrating to deal with). The third method is the one
that most other build systems use, so it's included to make it easier
to write a wrapper that makes aimake
look like, say, GNU autotools.
11: Resource linking
We're now past the point where most build systems consider their work to be done. This is a pity, as resource linking is something that typically has to be done at the same time as the installation; the normal workaround (that's been repeated so many times across so many projects that many people don't realise it is a workaround) is to manually write build rules to perform the relevant steps. On Windows, the situation is less bad, because Visual Studio does a resource link just after linking (and thus it happens immediately before installation).
The term "resource link" comes from Windows, but the general principle
is the same across operating systems: a plain executable by itself is
not particularly usable, because all it has is a filename, and (if
you're lucky) documentation. Documentation might be useful to a
human, but what can a computer do with a name like nethack4.exe
but
place it in a list of thousands of other executables that nobody will
ever scroll through? (This is not just a hypothetical; it's Firefox's
actual UI on Linux for selecting a program to open a file, when it
doesn't know of an appropriate option already, because it just opens
up an "Open File" dialog box, in which you even have to navigate to /usr/bin
by hand. This really sucks when all you wanted was to open
a text file in a text editor.)
Clearly, what is needed is some sort of metadata: a longer description
than the filename, an icon, perhaps a version number, things like
that. On Windows, this is compiled into the executable. On Linux,
it's typically done by means of .desktop
files in a known directory
(/usr/share/applications
); many desktop environments maintain an
index of these files so that they can grab the metadata for a given
executable on demand. (I'm not sure how this works on Mac OS X,
although I'd be interested for someone to tell me.)
It's impossible for a build system to know what all the metadata
should be without being told. However, aimake
can certainly
translate it to an appropriate format for the OS by itself, and the
amount of information it needs to make a decent shot at it is actually
very small. (The only pieces of information that it can't at least
make a reasonable guess at or a plausible placeholder for are the
version number and icon; and if not specified, it'll just tell the OS
to use its default placeholder icon.)
It would be reasonable to do this step before the install, rather than
after; that's where Visual Studio does it, and I originally tried to
put it there. However, doing it just after the install (while
aimake
still has elevated permissions, so that it can write to
system directories like /usr/share/applications
) turned out to be
much simpler in terms of aimake
code (which in turn made NetHack 4's
aimake.rules
simpler as it didn't have to try to deal with the
pre-resource-link and post-resource-link versions of executables
separately). Besides, on Windows, you want to make a Start menu
shortcut as well, and because shell links on Windows have a curious
mix of overengineering and underengineering (some of which is to
compensate for bad design decisions made elsewhere), it's impossible
to create a robustly behaving shortcut unless its target exists at the
time. (aimake
currently does not make these shortcuts during the
install process for this reason, and also because most of the existing
wrappers use the wrong APIs and when I wrote a new wrapper, it was 153
lines of C++ before even writing the code to communicate with aimake
itself.)
As a fun fact, the two main NetHack 4 executables, `nethack4` and
`nethack4-sdl`, differ only in filename and metadata (`nethack4-sdl`
is a symlink on Linux, because the metadata is in external files).
The filename difference determines the default interface plugin to use
(an ending of `-sdl` loads the graphical interface by default); as
well as explaining the interface difference, the metadata is used to
request a terminal window for `nethack4`, while not opening one for
`nethack4-sdl`, because it doesn't need one and opening one anyway
would look ugly.
12: Package generation
So far, we've been focusing on the build and install process as it looks to a developer. However, distributing a source tarball and telling users to build it is considered a very user-unfriendly method of distribution nowadays. On Linux, nontechnical users want a package that they can point their distribution's package manager to. On Windows, most users will want an installer that they can double-click on. (Additionally, nothing about the previous steps has any particular support for an uninstaller, something which any software package should have; the sort of end-user installation framework discussed in this section typically comes with an uninstaller as well.)
This step is going to be inherently somewhat system-specific; Windows
Installer has many differences from `dpkg` (the Debian equivalent),
for instance. However, some things are the same. The most notable is
that the input into these install tools is a folder hierarchy
containing the files that would be installed; the DESTDIR convention
is very useful here. (The package generation tools also need to be
told about the permissions on the files, but ideally, administrator
permissions wouldn't be needed for a DESTDIR install; on Linux, the
workaround for this problem is a program called `fakeroot`, which
intercepts attempts to set permissions and remembers which permissions
the files "should" have, and on Windows, `aimake` will make a list of
all the files it's installing and the permissions they need and pass
that to the package generation tool separately.)
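The DESTDIR convention itself is simple enough to show in a few lines of shell; this hypothetical example stages an install into a scratch directory rather than the live filesystem, producing exactly the kind of tree a package generator wants as input:

```shell
# Stage files under a scratch root instead of installing for real;
# a packaging tool can then archive the tree below $DESTDIR.
DESTDIR=$(mktemp -d)
mkdir -p "$DESTDIR/usr/bin" "$DESTDIR/usr/share/applications"
printf '#!/bin/sh\necho hello\n' > "$DESTDIR/usr/bin/hello"
chmod 755 "$DESTDIR/usr/bin/hello"
# The staged tree mirrors the final install layout:
( cd "$DESTDIR" && find . -type f | sort )
```

No administrator permissions are needed at any point, which is exactly why `fakeroot` (or a separate permissions list, as on Windows) is needed to record the ownership the files should eventually have.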
Currently, `aimake`'s support for generating an installer on Windows
is almost complete; it can get as far as generating an input file for
WiX, which can then take the build process the rest of the way.
The only thing it doesn't do is actually run WiX; that's currently
left to the user, even though automating it should be easier than
doing it manually. (One problem is that Windows Installer has a few
design issues that need working around, and some of the recommended
workarounds are ridiculous; eventually, I stopped trying to use them,
and instead came up with my own workarounds, which both practically
and theoretically seem to work better, but now the build process spams
warnings and errors because I'm not using the recommended workarounds.
Interestingly, the errors and many of the warnings are false
positives caused by the same design issues that made the workarounds
necessary in the first place. It's a good thing that "errors" from
Windows Installer's consistency checkers are really just warnings, and
thus can be turned off.)
The situation on Linux is a little more complex. So far I've been
focusing on Debian (who have the most widely used package format).
Debian's policy is written assuming that either the package was
developed for Debian in particular, or that it was developed without
knowledge of Debian and was packaged separately by one of their
developers. The case of "a package that exists on its own but knows
how to deploy itself on Debian" makes it quite hard to work out an
appropriate source package (a binary package is much easier).
Currently, `aimake` itself has all the appropriate options to handle
being run by `dpkg-buildpackage`, which requires a short wrapper
script (31 lines in NetHack 4, probably about 20 for most packages) to
tell `dpkg-buildpackage` how `aimake` wants to be run. Much of the
Debian metadata should be the same for all `aimake`-using programs,
though (just as happened with WiX), so I'm hopeful that eventually,
`aimake` will have an option to just automate all the steps in
generating a Debian package. This would lead to a weird situation in
which `aimake` is running `dpkg-buildpackage` to run itself, but this
method maximises the chance that the build is reproducible.
13: Dynamic linking
As something that happens at runtime, this is a little irrelevant to
the design of a build system, but I'm mentioning it for completeness.
So far, we've been implicitly assuming that the build is a so-called
static build, in which all the code that runs at runtime is part of
the executable. In practice, though, this isn't normally the case; an
executable will depend on shared libraries (`.so` files on Linux,
`.dll` on Windows, `.dylib` on Mac OS X), which contain code that's
shared between multiple executables.
Conceptually, dynamic linking is pretty similar to regular, static
linking; the dynamic linker picks an address for the shared library
and maps it into the address space of the executable, which then makes
calls into it. This means that the executable needs to contain
some sort of relocations, just the same way that the object file did.
Here are the five relocations that end up in our Hello World
executable on my 64-bit Linux system (as printed by `readelf -r`):

```
Relocation section '.rela.dyn' at offset 0x3a8 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000600ff8  000200000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0
000000601040  000400000005 R_X86_64_COPY     0000000000601040 stdout + 0

Relocation section '.rela.plt' at offset 0x3d8 contains 3 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000601018  000100000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main + 0
000000601020  000200000007 R_X86_64_JUMP_SLO 0000000000000000 __gmon_start__ + 0
000000601028  000300000007 R_X86_64_JUMP_SLO 0000000000000000 fwrite + 0
```
We can see our old friends `stdout` and `fwrite` here, still waiting
to be linked against `libc`. However, we can see that the offsets for
the relocations are much higher than existed in the object file;
that's because the linker has already determined addresses for
everything in the executable itself, and so knows where the
relocations need to be at runtime. Dynamic linking is harder than
static linking because the OS really wants the shared library to be
bitwise identical between all the processes that run it (so that it
can maintain just one copy of it in physical memory); on 64-bit Linux,
the solution to this is to place the relocations in various tables
that the dynamic linker updates, rather than in the code directly, and
making calls indirectly via the table. (This won't work for `stdout`,
which is a variable rather than a function; the linker resolved the
problem by putting it in a separate table.) This also means that the
shared library needs to be written in such a way that it can be moved
around in memory and still run correctly, but on x86_64, the
processor makes this pretty easy by allowing memory references to be
relative to the program counter. (Meanwhile, on 32-bit x86, it's a
complete nightmare; the compiler can handle it, but it has to jump
through such hoops that you can instead tell the compiler to not
bother, in which case the dynamic linker has to make a copy of the
shared library in memory and patch up all the relocations the same way
that the regular linker does. Linux's dynamic linker is actually
willing to do this; on x86_64, though, it refuses, explaining that it
really should be the compiler's job to make the library
position-independent on a system where it's so easy.)
I won't go into all the details of dynamic linking here, partly
because they aren't relevant to understanding build systems and partly
because I don't know what they are. However, there is one issue that
needs to be addressed: what does it mean to link against a shared
library, given that its code doesn't actually end up in the
executable? The answer lies in the concept of an import library,
which is a file that looks like a static library to the linker, but
actually just puts relocations into the output executable, rather than
code. On Linux, each shared library can also act as an import library
for itself, which makes shared library deployment pretty easy, but
writing build systems confusing (especially when you consider the case
of two shared libraries that need symbols from each other at runtime;
this is not technically a circular dependency, but you need to somehow
come up with a working import library by other means, such as by first
linking the libraries without the dependency on each other). On
Windows, import libraries tend to be separate files that are generated
by a program called `dlltool` from a description of which symbols go
into the shared library; `aimake` gives it the object files from which
the shared libraries are built as a handy method of describing the
shared library without actually needing a copy of it (and thus
breaking up the circular dependency that way).
There's also a point that's important for programmers, too. On Windows, imports from a shared library will need special code generated for them, so they have to be marked in the source file. Likewise, exports need to be marked as such inside the shared library on Windows. On Linux, exports don't need to be marked, but they should be; this makes it possible for a build system to hide all the non-exported symbols from the API of the shared library by means of command-line options, thus reducing namespace pollution.
Handling appropriate marking is normally done by the preprocessor. You typically have a header file declaring the API of your shared library, which looks something like this:
```c
#ifdef IN_SHARED_LIBRARY_INTERNALS
# define IMPORT_EXPORT(x) AIMAKE_EXPORT(x)
#else
# define IMPORT_EXPORT(x) AIMAKE_IMPORT(x)
#endif

int IMPORT_EXPORT(function1) (void);
int IMPORT_EXPORT(function2) (int, int);
```
Then, users of the shared library just include the header; the shared
library itself does `#define IN_SHARED_LIBRARY_INTERNALS` and then
includes the header, and thus gets the "export" definitions rather
than the "import" definitions. The `AIMAKE_EXPORT` and
`AIMAKE_IMPORT` macros are defined by `aimake` to expand into whatever
system-specific annotation is needed to import or export from a shared
library. (The exact syntax is subject to change, because `gcc`
doesn't like the way the current syntax interacts with functions that
return pointers; I'm currently working around that with `typedef`, but
really need a better solution.)
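Concretely, on common platforms these macros might expand to something like the following; these are my guesses at plausible expansions, not `aimake`'s actual definitions:

```c
/* Hypothetical expansions; aimake's real definitions may differ. */
#if defined(_WIN32)
# define AIMAKE_EXPORT(x) __declspec(dllexport) x
# define AIMAKE_IMPORT(x) __declspec(dllimport) x
#else
/* With -fvisibility=hidden on the compiler command line, only symbols
   marked "default" become part of the library's public API; imports
   need no annotation at all on ELF systems. */
# define AIMAKE_EXPORT(x) __attribute__((visibility("default"))) x
# define AIMAKE_IMPORT(x) x
#endif

/* The annotation lands between the return type and the function name,
   which is where gcc's trouble with pointer-returning functions
   comes from. */
int AIMAKE_EXPORT(exported_function) (void) { return 42; }
```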
As far as I can tell, this sort of marking is necessary to create
shared libraries, as the API of the shared library has to be specified
somehow. However, `aimake` will spot these annotations and
automatically generate shared libraries rather than executables when
it sees them, so no extra work is needed beyond the bare minimum.
(Some improvements to this are needed, though; it currently doesn't
work on Mac OS X because I don't know enough about shared libraries on
that system, and there really should be some way to override the whole
mechanism and generate static libraries instead.)
Conclusions
Hopefully, this post should give C programmers something of a better understanding of the toolchain that goes into actually building their programs; perhaps it'll even inspire someone to go into toolchain development. I hope I've also made the point that a lot more of a build toolchain can and should be automated than typically is; there's a lot of programmer time being wasted right now dealing with things that should really be done by a computer. Most of the attempts I've seen to fix this are dealing with the wrong problem; people look at existing build systems, and think "we need a better tool for writing Makefiles" or "we need a better tool for generating configuration", when these are really relatively small parts of a build.
I hope `aimake` acts as a proof of concept that almost the entire
build process can and should be automated, and perhaps one day grows
into a tool that can be widely used for this purpose (even if today,
it suffers from UI problems, the occasional bug, missing features, and
incomplete platform support). At the very least, maybe the world's
build system designers will be inspired to deal with every part of the
build, not just the small corners they previously worked on, and I'll
be able to use something standard rather than being forced to work on
`aimake`.