Submit the solution of this task according to the Submitting instructions before Tuesday Oct 10 23:59:59.
To correctly submit this task:
- Fill out the registration form so that we can create your private repository.
- Go over the first task (programs and dynamic libraries).
- Solve the second task by creating a Makefile as specified (and copy / save it).
- When your private repository gets created, submit it according to the submitting instructions.
Go over the following "tutorial" (and try it for yourself!):
- compiling a simple C program
- compiling a C program consisting of multiple units
- compiling dynamic libraries
- running dynamically linked programs
Let's start with a very simple C program (simple.c):
int main(int argc, char *argv[])
{
    return 0;
}
We can compile it by running gcc:
$ gcc -o simple simple.c
Simple! We are using the -o simple option to specify an output file (otherwise gcc would use the default name of a.out; you can google the historic reasons for that ;).
Let's check that it actually works:
$ ./simple
$ echo $?
0
The command echo $? prints out the return code of the last executed command, so we can see that simple really returns 0.
The gcc command performed several steps for us, which is handy when compiling a simple program, but might not be appropriate for bigger applications:
- first it compiled the source .c file (which is also called a compilation unit) into an object file (which would normally be called simple.o),
- then it linked the object file with any required libraries to produce the executable.
If we wanted to do this manually, we would first have to actually compile the source (thus creating the object file):
$ gcc -c -o simple.o simple.c
and then "link" it with any standard libraries to produce the actual executable:
$ gcc -o simple simple.o
For larger applications, source code is often split into multiple files. A (very simplistic) example can be seen in the files number.h, number.c and program.c.
Gcc can again do all the necessary steps for us when compiling the program:
$ gcc -o program program.c number.c
$ ./program
$ echo $?
47
This has, however, various disadvantages:
- if we change even one of the input files, all compilation units (C source files) would be recompiled;
- sometimes one unit (source file) can be used to generate multiple binaries, thus being compiled over and over, although it might not make sense to break it out to an actual library.
Because of this, C code is usually compiled in the way we've seen before: first compiling the individual source units into object files and then linking them together as needed.
By the way: static libraries are really nothing more than a collection of object files.
To compile our example program we would really do:
# compile program.c into program.o
$ gcc -c -o program.o program.c
# compile number.c into number.o
$ gcc -c -o number.o number.c
# link them together
$ gcc -o program program.o number.o
$ ./program
$ echo $?
47
Note that we actually compiled program.c before number.c: it does not really matter, as long as the header files correctly specify the "interface".
Even if we "reuse" object files (either manually or as static libraries) to create multiple executables, the final executables would have a lot of code duplicated. For example, the standard library would be included in every C application.
To avoid this, operating systems can use dynamic libraries: the code is not present directly in the executable, but when the operating system runs it, it finds all the (dynamic) libraries (DLLs on Windows and .so files on Linux) and "includes" them in the process.
Let's have a closer look at our executable program:
$ file program
program: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked,
interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, not stripped
$ ls -l program
-rwxr-xr-x 1 yoyo users 7968 Sep 26 04:11 program
$ ldd program
linux-vdso.so.1 (0x00007ffcb95ec000)
libc.so.6 => /lib64/libc.so.6 (0x00007f8af47f6000)
/lib64/ld-linux-x86-64.so.2 (0x00007f8af4b8f000)
The important bits are: it is a dynamically linked executable; it is just under 8kB; and it references 3 other libraries (although two of them are sort of "fake": the first one is provided by the kernel, and the last one is the actual dynamic loader).
The libc.so.6 is actually the "dynamic" version of the standard C library.
We could also try to build a static version of our program, with all the libraries and code included:
$ gcc -static -o program program.c number.c
$ file program
program: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.32, not stripped
$ ls -l program
-rwxr-xr-x 1 yoyo users 948352 Sep 26 04:24 program
$ ldd program
not a dynamic executable
Now it's almost 1MB because it contains all the required code from the standard library.
Note: the "not stripped" part actually tells us that our executable contains a bit more than is really needed to run it, usually helpful for debugging or further processing of the executable (note: we did not actually include any real debugging information; try compiling the program with -g). To actually remove all the extra information, you can use the strip command:
$ strip program
$ ls -l program
-rwxr-xr-x 1 yoyo users 730312 Sep 26 04:25 program
To actually compile a dynamically linked application or binary, one additional problem needs to be taken care of: we need to tell the compiler to create code that works even when it is loaded in different parts of memory.
Questions to google: why? How does static compilation work? How are symbol (function) addresses resolved when dynamically linking?
To achieve that, we need to pass the -fPIC (Position Independent Code) option to the compiler when compiling code that will be dynamically loaded:
$ gcc -c -o number.o -fPIC number.c
We can then "link" it into a shared library:
$ gcc -o libnumber.so -shared number.o
Note: shared libraries on Linux are named libLIBRARYNAME.so; this is a convention used by the linker and loader to find the library files. Libraries are also usually versioned (e.g. libsomething.so.10.2) to allow multiple ABI versions to coexist on a single computer.
After that we can compile and link the actual executable:
$ gcc -c -o program.o -fPIC program.c
$ gcc -o program program.o -lnumber -L.
This time, when linking the program we specified the library(ies) to link against and, because it is not present in a standard directory, also the path where to find it (.). Note that the lib prefix and .so suffix are added by the linker.
Question: is the -fPIC option needed also for the executable itself?
Let's try to run our freshly built dynamically linked executable:
$ ./program
./program: error while loading shared libraries: libnumber.so: cannot open shared object file: No such file or directory
Oops, that didn't go that well. The problem is similar to why we needed to add the -L. option: the loader, just like the linker, looks for libraries only in predefined system locations.
On Linux this is similar also when actually looking up programs (i.e. why we need to include ./ when running our program).
Note: Windows does actually look for executables and DLLs in the current directory, which could be considered a security risk. Question: why?
Let's check again with ldd:
$ ldd ./program
linux-vdso.so.1 (0x00007fff947fc000)
libnumber.so => not found
libc.so.6 => /lib64/libc.so.6 (0x00007fccea578000)
/lib64/ld-linux-x86-64.so.2 (0x00007fccea911000)
Fortunately we can tell the loader (the program that actually loads our executable and then all its libraries into memory) to look in custom directories by setting the LD_LIBRARY_PATH environment variable:
$ export LD_LIBRARY_PATH=$PWD
$ ldd ./program
linux-vdso.so.1 (0x00007ffe3b2d6000)
libnumber.so => /home/yoyo/osprog/osprog/l01/libnumber.so (0x00007f8cceb45000)
libc.so.6 => /lib64/libc.so.6 (0x00007f8cce7ac000)
/lib64/ld-linux-x86-64.so.2 (0x00007f8cced47000)
$ ./program
$ echo $?
47
Question: why did we use $PWD instead of just .?
Note: this section is a bit more technical and you can skip directly to the Makefile task if you are not interested.
What's the actual size of our "smallest" C program?
$ ls -l simple
-rwxr-xr-x 1 yoyo users 7896 Sep 26 03:08 simple
Almost 8kB! Not very big, but more than we would expect for a program that does nothing except return zero... Why is it so big? The answer lies in two things: executable formats and libraries:
- Each operating system needs more than just the machine code to run a program: the "type" of the machine code (32-bit? 64-bit? ARM? Intel?...), the address where the code starts, any dynamic libraries and symbols it might use, a data section with pre-filled data etc. On Windows the usual format for binaries is EXE (though that's a bit of a simplification), on Linux it is ELF.
- Our program really does more than just return a number. The C standard says that the main function will be executed and its return value will be the exit value of our program, but the machine code might have to do more: variables might need to be initialized etc. Similarly, the standard says that a function like printf should format text and print it, but that's not an operating system "function", so the code must come from somewhere. All this is included in what is called the standard C library. Although we are not using functions like printf, gcc still included parts of it to handle program startup and exit.
Let's see what we actually got:
$ objdump -h simple
simple: file format elf64-x86-64
Sections:
Idx Name Size VMA LMA File off Algn
...
11 .text 00000171 0000000000400410 0000000000400410 00000410 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
...
The objdump -h command lists "sections" inside an ELF binary. The section that contains actual code is called .text. We can see that it is actually 0x171 (i.e. 369) bytes long. You can try and google what the other sections are for.
Let's see what code we actually have. objdump -d will give us a disassembly of all sections that contain code:
$ objdump -d simple
simple: file format elf64-x86-64
Disassembly of section .init:
00000000004003c0 <_init>:
...
Disassembly of section .text:
0000000000400410 <_start>:
...
0000000000400506 <main>:
400506: 55 push %rbp
400507: 48 89 e5 mov %rsp,%rbp
40050a: 89 7d fc mov %edi,-0x4(%rbp)
40050d: 48 89 75 f0 mov %rsi,-0x10(%rbp)
400511: b8 00 00 00 00 mov $0x0,%eax
400516: 5d pop %rbp
400517: c3 retq
400518: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
40051f: 00
...
We can see that there's actually some code in other sections and also more than our main function in the .text section. These are functions from the standard library that handle initialization and "finalization" of any standard library / C features as well as the actual startup and exit of our program.
Because we didn't compile the binary as a static executable, only the parts that are really needed were included (though this is a big simplification ;) and a lot of other parts would be loaded dynamically:
$ ldd ./simple
linux-vdso.so.1 (0x00007ffeb81fc000)
libc.so.6 => /lib64/libc.so.6 (0x00007f0375be3000)
/lib64/ld-linux-x86-64.so.2 (0x00007f0375f7c000)
If you first compile the source code into an object file, you can use objdump
on the object file to actually see that it really contains only our code (plus
information needed to actually link it).
Create a Makefile with the following targets:
- default target (invoked by just running make): compile the program and library from the dynamic library example. This should create two files (binaries) named program and libnumber.so (plus any auxiliary files used during the compilation).
- clean: remove any auxiliary files created during the compilation, but leave the program and library.
- distclean: remove all generated files (targets and also aux files).
- test: run the program and display the returned value.
The default target should correctly rebuild the targets if the sources change
(hint: the program must be recompiled if the header from the library changes).
There is a test.sh script that checks some of your Makefile functionality (but not everything).
Bonus: try to create as small a Makefile as possible. (Note: whitespace doesn't count, so try to keep it as readable as possible. Also: make variables and builtin rules are your friends).
Makefiles are collections of recipes that specify how files (programs, libraries) should be created. You can either (try to) read the official make manual or google some tutorials. A very short introduction follows here.
A single recipe in a Makefile looks like this:
target: dep1 dep2...
<one tab>command1
<one tab>command2
...
target is the name of the final file that will be created, dep1 ... are the names of files that target is generated from (i.e. it depends on them). Whenever dependencies change, make will re-run the commands to (re)generate the target.
If any of the commands fail (return non-zero status), make will fail.
The first target in a Makefile is called the default target and will be built when make is called without any arguments.
There can be targets whose commands don't really generate the target file. These are called "phony" targets and make will run the associated commands every time they are requested. It is nice to list them as dependencies of a special .PHONY target (as can be seen in the following example).
The following example "renders" a PHP file into HTML or plain text (using the lynx command-line browser), assuming page.php includes another file header.php:
.PHONY: default
default: page.html
page.html: page.php header.php
	php page.php >page.html
page.txt: page.html
	lynx -dump page.html >page.txt
.PHONY: show
show: page.txt
	@echo "Generated plaintext:"
	@cat page.txt
Normally make prints all the commands being executed. This can be disabled for specific commands by prepending @ as seen with the echo and cat commands in the example. Note that the php and lynx commands still get printed if you call make show when the html and txt files are not yet generated.
Submit your solution by committing the required files (Makefile) under the directory l01 and creating a pull request against the l01 branch.
A correctly created pull request should appear in the list of PRs for l01.