Binary Files and Processes in Linux
Created On 17. Apr 2021
Updated: 2021-06-06 01:55:00.938153000 +0000
Created By: acidghost
In Linux systems, executable files use the Executable and Linkable Format (ELF).
Let's read the information of an ELF file. Here I'm running readelf against hexdump:
readelf -a /usr/bin/hexdump
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Shared object file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x18f0
Start of program headers: 64 (bytes into file)
Start of section headers: 24936 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 9
Size of section headers: 64 (bytes)
Number of section headers: 28
Section header string table index: 27
Looks cool, but what's in there? The ELF header starts with the magic bytes. To see what is in those bytes we can decode them with Python's built-in bytes.fromhex (shown here in IPython):
In [1]: bytes.fromhex("7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00")
Out[1]: b'\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00'
Or in plain Python:
>>> print(''.join([chr(int(c, 16)) for c in "7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00".split()]))
ELF
Or just echo it out:
$ echo -e $"\x7f\x45\x4c\x46"
ELF
Isn't this amazing? The bytes 45 4c 46 spell ELF, and they are preceded by the byte 7f. The 'Magic' at the start of each ELF header identifies the file as an ELF file, and that is how Linux knows how to load and execute it.
This can also be seen with the 'file' command. Just run file /usr/bin/hexdump.
Further, there is information defining the file's architecture (64-bit) and how it represents numbers (2's complement, little endian). The entry point address defines the address where the execution of the program begins.
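The header fields discussed above can also be decoded by hand. Below is a minimal sketch that interprets the 16 identification bytes printed by readelf. The function name parse_ident and the lookup tables are my own for illustration, not part of any library.

```python
# Decode the first 16 bytes (e_ident) of an ELF file.
ELF_CLASS = {1: "ELF32", 2: "ELF64"}
ELF_DATA = {1: "little endian", 2: "big endian"}


def parse_ident(ident: bytes) -> dict:
    """Return the class, byte order, version and OS/ABI stored in e_ident."""
    if len(ident) < 16 or ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return {
        "class": ELF_CLASS.get(ident[4], "unknown"),
        "data": ELF_DATA.get(ident[5], "unknown"),
        "version": ident[6],
        "osabi": ident[7],
    }


# The magic bytes printed by readelf for /usr/bin/hexdump above.
magic = bytes.fromhex("7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00")
print(parse_ident(magic))
```

For the bytes above this reports ELF64 and little endian, which agrees with the Class and Data lines of the readelf output.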
Program Header
Program headers (also called segment headers) describe how the program should be loaded. The program header table of an ELF file has several entry types, of which the most important ones are:
- INTERP: defines the library that should be used to load this ELF into memory
- LOAD: defines a part of the file that should be loaded into memory
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x00000000000001f8 0x00000000000001f8 R E 0x8
INTERP 0x0000000000000238 0x0000000000000238 0x0000000000000238
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000005860 0x0000000000005860 R E 0x200000
LOAD 0x0000000000005b30 0x0000000000205b30 0x0000000000205b30
0x00000000000004fa 0x00000000000005b0 RW 0x200000
DYNAMIC 0x0000000000005c40 0x0000000000205c40 0x0000000000205c40
0x0000000000000200 0x0000000000000200 RW 0x8
NOTE 0x0000000000000254 0x0000000000000254 0x0000000000000254
0x0000000000000044 0x0000000000000044 R 0x4
GNU_EH_FRAME 0x0000000000005190 0x0000000000005190 0x0000000000005190
0x00000000000000f4 0x00000000000000f4 R 0x4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x10
GNU_RELRO 0x0000000000005b30 0x0000000000205b30 0x0000000000205b30
0x00000000000004d0 0x00000000000004d0 R 0x1
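The INTERP entry gives the offset and size of the path string that names the interpreter that will load the program. VirtAddr specifies the virtual memory address of a segment. For a position-independent executable such as this one, it is relative to a load base that is usually randomized for better security. FileSiz is how many bytes are read from the file and MemSiz is how much memory the segment occupies once loaded. Each LOAD entry gives the offset and file size of a part of the file that gets loaded.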
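To make the relation between VirtAddr, MemSiz and the final addresses concrete, here is a small sketch that computes the page-aligned range each LOAD segment covers. It assumes a 4 KiB page size, and it uses the load base 0x56223c2b1000 that appears in the /proc/PID/maps listing in the Virtual Memory Space section. The function name mapped_range is made up for this example.

```python
PAGE = 0x1000  # assumed 4 KiB page size


def mapped_range(base: int, vaddr: int, memsz: int) -> tuple:
    """Page-aligned [start, end) that a LOAD segment occupies when loaded at base."""
    start = (base + vaddr) & ~(PAGE - 1)                    # round down to a page
    end = (base + vaddr + memsz + PAGE - 1) & ~(PAGE - 1)   # round up to a page
    return start, end


base = 0x56223C2B1000  # load base taken from the /proc/PID/maps listing
# (VirtAddr, MemSiz) of the two LOAD entries shown above
for vaddr, memsz in [(0x0, 0x5860), (0x205B30, 0x5B0)]:
    start, end = mapped_range(base, vaddr, memsz)
    print(f"{start:x}-{end:x}")
```

The two ranges line up with the hexdump regions in the maps listing. The second segment shows up there as two lines because part of it is later made read-only (GNU_RELRO).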
Section Headers
Section headers hold the metadata that describes the program's components. They will look familiar if you have written an x86 assembly program, because they mirror its code structure. The important ones are:
- .text - the executable code of the program
- .plt, .got - for resolving dynamically linked functions/variables
- .data - initialized data (e.g. global arrays with initial values)
- .rodata - initialized read-only data (e.g. string constants)
- .bss - uninitialized data (e.g. global arrays without initial values)
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .interp PROGBITS 0000000000000238 00000238
000000000000001c 0000000000000000 A 0 0 1
[ 2] .note.ABI-tag NOTE 0000000000000254 00000254
0000000000000020 0000000000000000 A 0 0 4
[ 3] .note.gnu.build-i NOTE 0000000000000274 00000274
0000000000000024 0000000000000000 A 0 0 4
[ 4] .gnu.hash GNU_HASH 0000000000000298 00000298
0000000000000060 0000000000000000 A 5 0 8
[ 5] .dynsym DYNSYM 00000000000002f8 000002f8
0000000000000600 0000000000000018 A 6 1 8
[ 6] .dynstr STRTAB 00000000000008f8 000008f8
00000000000002ac 0000000000000000 A 0 0 1
[ 7] .gnu.version VERSYM 0000000000000ba4 00000ba4
0000000000000080 0000000000000002 A 5 0 2
[ 8] .gnu.version_r VERNEED 0000000000000c28 00000c28
0000000000000080 0000000000000000 A 6 2 8
[ 9] .rela.dyn RELA 0000000000000ca8 00000ca8
0000000000000420 0000000000000018 A 5 0 8
[10] .rela.plt RELA 00000000000010c8 000010c8
0000000000000438 0000000000000018 AI 5 23 8
[11] .init PROGBITS 0000000000001500 00001500
0000000000000017 0000000000000000 AX 0 0 4
[12] .plt PROGBITS 0000000000001520 00001520
00000000000002e0 0000000000000010 AX 0 0 16
[13] .plt.got PROGBITS 0000000000001800 00001800
0000000000000008 0000000000000008 AX 0 0 8
[14] .text PROGBITS 0000000000001810 00001810
0000000000002fa2 0000000000000000 AX 0 0 16
[15] .fini PROGBITS 00000000000047b4 000047b4
0000000000000009 0000000000000000 AX 0 0 4
[16] .rodata PROGBITS 00000000000047c0 000047c0
00000000000009d0 0000000000000000 A 0 0 8
[17] .eh_frame_hdr PROGBITS 0000000000005190 00005190
00000000000000f4 0000000000000000 A 0 0 4
[18] .eh_frame PROGBITS 0000000000005288 00005288
00000000000005d8 0000000000000000 A 0 0 8
[19] .init_array INIT_ARRAY 0000000000205b30 00005b30
0000000000000008 0000000000000008 WA 0 0 8
[20] .fini_array FINI_ARRAY 0000000000205b38 00005b38
0000000000000008 0000000000000008 WA 0 0 8
[21] .data.rel.ro PROGBITS 0000000000205b40 00005b40
0000000000000100 0000000000000000 WA 0 0 32
[22] .dynamic DYNAMIC 0000000000205c40 00005c40
0000000000000200 0000000000000010 WA 6 0 8
[23] .got PROGBITS 0000000000205e40 00005e40
00000000000001a8 0000000000000008 WA 0 0 8
[24] .data PROGBITS 0000000000206000 00006000
000000000000002a 0000000000000000 WA 0 0 8
[25] .bss NOBITS 0000000000206040 0000602a
00000000000000a0 0000000000000000 WA 0 0 32
[26] .gnu_debuglink PROGBITS 0000000000000000 0000602c
0000000000000034 0000000000000000 0 0 4
[27] .shstrtab STRTAB 0000000000000000 00006060
0000000000000101 0000000000000000 0 0 1
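As a quick way to read the table, the sketch below looks up which section contains a given address. Only a handful of rows are copied from the listing above, and the helper name section_for is hypothetical.

```python
# (name, address, size) for a few sections copied from the table above.
SECTIONS = [
    (".plt", 0x1520, 0x2E0),
    (".text", 0x1810, 0x2FA2),
    (".rodata", 0x47C0, 0x9D0),
    (".got", 0x205E40, 0x1A8),
    (".data", 0x206000, 0x2A),
    (".bss", 0x206040, 0xA0),
]


def section_for(addr: int):
    """Return the name of the section containing addr, or None."""
    for name, start, size in SECTIONS:
        if start <= addr < start + size:
            return name
    return None


print(section_for(0x18F0))  # the entry point address from the ELF header
```

The entry point 0x18f0 from the ELF header falls inside .text, which is what we would expect for the first instruction of the program.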
How do we interact with ELF files?
- gcc - compiles source code into an ELF file
- patchelf - changes libraries, the interpreter, etc.
- objcopy - exports and imports sections
- objdump - disassembles the ELF file
- ldd - lists the shared objects a file depends on
- kaitai struct (https://ide.kaitai.io/) - an interactive tool to inspect ELF files
Processes
A process is launched as a result of the fork or clone system calls, which produce a copy of the calling process. To launch a program, the caller forks itself into a parent and a child. The child then calls the execve system call and becomes the requested program.
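The fork-then-execve pattern can be sketched in a few lines of Python, whose os module exposes thin wrappers around these system calls. The function name spawn is my own, and the program being started (the Python interpreter itself) is just a convenient stand-in.

```python
import os
import sys


def spawn(argv: list) -> int:
    """Fork, exec argv in the child, and return the child's exit code."""
    pid = os.fork()                      # the process is duplicated here
    if pid == 0:
        try:
            os.execv(argv[0], argv)      # child: replace itself with the program
        finally:
            os._exit(127)                # only reached if the exec failed
    _, status = os.waitpid(pid, 0)       # parent: wait for the child to finish
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)          # the child was killed by a signal


if __name__ == "__main__":
    code = spawn([sys.executable, "-c", "import sys; sys.exit(7)"])
    print("child exited with", code)
```

This is roughly what a shell does for every command it runs: fork, exec in the child, wait in the parent.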
Every process will have:
- state - running, waiting, stopped, zombie
- priority and other scheduling information
- parent, siblings, children
- shared resources - files, pipes, sockets
- virtual memory space
- security context
When a process is launched it will usually call __libc_start_main() in libc, which in turn calls the program's main() function. libc.so is the standard C library and is loaded into almost every Linux process. It provides functionality such as printf(), scanf(), malloc() and many others.
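One way to see that libc is already sitting in a process is to call one of its functions through ctypes. This is only a sketch and assumes a typical dynamically linked Python on Linux, where ctypes.CDLL(None) returns a handle to the symbols already loaded into the process.

```python
import ctypes

# CDLL(None) asks for the symbols already loaded into this process,
# which on a typical Linux system includes the C library.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"hexdump"))
```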
Dynamically and Statically Linked ELF Files
When loading a file, the Linux kernel checks whether it is a dynamically or statically linked ELF file. If the file is dynamically linked, like the hexdump binary analysed before, the kernel loads the interpreter defined in the ELF and lets it take control. Dynamically linked means that the ELF file depends on specific libraries that have to be loaded alongside it. ELF files can also be statically linked, which makes them self-contained, with no need to load libraries externally. On Linux systems ELF files are usually dynamically linked. Static linking could be a more secure practice, but statically linked files are larger and can put more strain on the system, so dynamic linking stays the preferred choice.
Interpreter
Before the program runs, the kernel checks the beginning of the file. It first looks for the ELF interpreter, which is also known as the "loader". We can check which interpreter the file has by running:
$ readelf -a /usr/bin/hexdump | grep interpreter
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
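The same lookup can be done without readelf by walking the program headers and returning the string that the INTERP entry points to. The sketch below handles only little-endian ELF64 files, and the name find_interp is made up for this example.

```python
import struct

PT_INTERP = 3  # program header type of the interpreter entry


def find_interp(data: bytes):
    """Return the interpreter path of a little-endian ELF64 image, or None."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    # e_phoff sits at offset 32; e_phentsize and e_phnum at offsets 54 and 56.
    (phoff,) = struct.unpack_from("<Q", data, 32)
    phentsize, phnum = struct.unpack_from("<HH", data, 54)
    for i in range(phnum):
        # p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz
        p_type, _flags, p_offset, _va, _pa, p_filesz = struct.unpack_from(
            "<IIQQQQ", data, phoff + i * phentsize
        )
        if p_type == PT_INTERP:
            path = data[p_offset : p_offset + p_filesz]
            return path.rstrip(b"\x00").decode()
    return None  # no INTERP entry (e.g. a statically linked file)
```

Calling it on the contents of a file, for example find_interp(open("/usr/bin/hexdump", "rb").read()), should give the same path that readelf reports.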
The interpreter can be temporarily overridden by invoking it directly and passing it the program to run:
$ /lib64/ld-linux-x86-64.so.2 /usr/bin/hexdump -n 100
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000000 6161 6161 6161 6161 6161 6161 6161 6161
It can also be changed permanently with patchelf --set-interpreter:
$ patchelf --set-interpreter /great_interpreter /usr/bin/hexdump
Since this interpreter doesn't exist, hexdump will no longer be able to execute. If we query the interpreter again:
$ readelf -a /usr/bin/hexdump | grep interpreter
We should see /great_interpreter set as the interpreter. Why does this happen? When bash starts a program it forks a child that calls execve, and the kernel in turn starts the interpreter that is set in the file. If we strace hexdump, we will see the system calls made by the program during execution. At the beginning we see the execve call made by the child process of bash:
execve("/lib64/ld-linux-x86-64.so.2", ["/lib64/ld-linux-x86-64.so.2", "/usr/bin/hexdump", "-n", "100"], 0x7fff36ff9f48 /* 60 vars */) = 0
The loader then tries every listed path, and once openat succeeds it reads the ELF header of the library:
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libbsd.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P4\0\0\0\0\0\0"..., 832) = 832
Scripts start with a #! and the kernel will extract the interpreter from the rest of that line and execute it with the original file as an argument. An argument can be passed to the interpreter this way (on Linux, everything after the interpreter path is handed over as a single argument):
#! /bin/echo argument1 argument2
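How such a line turns into an argument vector can be sketched as follows. This mimics the Linux behaviour, where whatever follows the interpreter path is passed as one single argument. The function names are hypothetical and this is an illustration, not the kernel's code.

```python
def parse_shebang(first_line: str):
    """Split a '#!' line into the interpreter and at most one argument."""
    if not first_line.startswith("#!"):
        return None
    rest = first_line[2:].strip()
    if not rest:
        return None
    parts = rest.split(None, 1)          # split at the first run of whitespace only
    interpreter = parts[0]
    argument = parts[1] if len(parts) > 1 else None
    return interpreter, argument


def build_argv(script_path: str, first_line: str) -> list:
    """Argument vector the interpreter is started with for script_path."""
    interpreter, argument = parse_shebang(first_line)
    argv = [interpreter]
    if argument is not None:
        argv.append(argument)
    return argv + [script_path]


print(build_argv("./script", "#! /bin/echo argument1 argument2"))
```

For the example line this yields /bin/echo, then the single string "argument1 argument2", then the script path.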
Libraries
LD_PRELOAD and LD_LIBRARY_PATH are both environment variables that influence which libraries are loaded when the program is launched. LD_PRELOAD names libraries to load before all others, and LD_LIBRARY_PATH lists directories to search first. Both are handled by the loader (not the kernel), which processes LD_PRELOAD first and then LD_LIBRARY_PATH. LD_LIBRARY_PATH can be set for a traced program in the following way:
$ strace -E LD_LIBRARY_PATH=/some/library/path /usr/bin/hexdump 2>&1 | head -n 20
And then we can see that the loader searches /some/library/path:
openat(AT_FDCWD, "/some/library/path/tls/x86_64/x86_64/libbsd.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT
Both can be set in this way:
$ LD_LIBRARY_PATH=/some/library/path LD_PRELOAD=somefile.so /usr/bin/hexdump
With patchelf we can also set the DT_RPATH:
$ patchelf --set-rpath /some/path /usr/bin/hexdump
The loader will then look for libraries in the specified path.
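The idea of searching directories in order can be illustrated with a deliberately simplified model. It ignores LD_PRELOAD, the run path stored in the file, the loader cache and the hardware-specific subdirectories visible in the strace output, and it only looks at LD_LIBRARY_PATH entries followed by a list of default directories. The function name and parameters are my own.

```python
import os


def search_library(name: str, ld_library_path: str, default_dirs: list):
    """Return the first existing path for name, or None.

    Simplified model: LD_LIBRARY_PATH directories first, then the defaults.
    """
    dirs = [d for d in ld_library_path.split(":") if d] + list(default_dirs)
    for directory in dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

With this model a library placed in an LD_LIBRARY_PATH directory shadows one of the same name in a default directory, which is the effect the strace output hints at.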
Virtual Memory Space
The virtual memory space is where all the libraries get loaded. It also contains the heap, the stack, memory mapped by the program, helper regions and kernel code, and it can be inspected in /proc/self/maps. Virtual memory is dedicated to the process, while physical memory is shared among the whole system.
To catch the memory layout of a running hexdump, I launch it against another file and stop the execution with CTRL + Z. Then I can run ps aux | grep hexdump to find the process PID and look up its layout with cat /proc/PID/maps.
56223c2b1000-56223c2b7000 r-xp 00000000 08:01 265426 /usr/bin/hexdump
56223c4b6000-56223c4b7000 r--p 00005000 08:01 265426 /usr/bin/hexdump
56223c4b7000-56223c4b8000 rw-p 00006000 08:01 265426 /usr/bin/hexdump
56223dcb2000-56223dcd3000 rw-p 00000000 00:00 0 [heap]
7f2799a0e000-7f2799d3e000 r--p 00000000 08:01 265519 /usr/lib/locale/locale-archive
7f2799d3e000-7f2799d58000 r-xp 00000000 08:01 1076370 /lib/x86_64-linux-gnu/libpthread-2.27.so
7f2799d58000-7f2799f57000 ---p 0001a000 08:01 1076370 /lib/x86_64-linux-gnu/libpthread-2.27.so
7f2799f57000-7f2799f58000 r--p 00019000 08:01 1076370 /lib/x86_64-linux-gnu/libpthread-2.27.so
7f2799f58000-7f2799f59000 rw-p 0001a000 08:01 1076370 /lib/x86_64-linux-gnu/libpthread-2.27.so
7f2799f59000-7f2799f5d000 rw-p 00000000 00:00 0
7f2799f5d000-7f2799f64000 r-xp 00000000 08:01 1076373 /lib/x86_64-linux-gnu/librt-2.27.so
7f2799f64000-7f279a163000 ---p 00007000 08:01 1076373 /lib/x86_64-linux-gnu/librt-2.27.so
7f279a163000-7f279a164000 r--p 00006000 08:01 1076373 /lib/x86_64-linux-gnu/librt-2.27.so
7f279a164000-7f279a165000 rw-p 00007000 08:01 1076373 /lib/x86_64-linux-gnu/librt-2.27.so
7f279a165000-7f279a34c000 r-xp 00000000 08:01 1050956 /lib/x86_64-linux-gnu/libc-2.27.so
7f279a34c000-7f279a54c000 ---p 001e7000 08:01 1050956 /lib/x86_64-linux-gnu/libc-2.27.so
7f279a54c000-7f279a550000 r--p 001e7000 08:01 1050956 /lib/x86_64-linux-gnu/libc-2.27.so
7f279a550000-7f279a552000 rw-p 001eb000 08:01 1050956 /lib/x86_64-linux-gnu/libc-2.27.so
7f279a552000-7f279a556000 rw-p 00000000 00:00 0
7f279a556000-7f279a569000 r-xp 00000000 08:01 1048661 /lib/x86_64-linux-gnu/libbsd.so.0.8.7
7f279a569000-7f279a768000 ---p 00013000 08:01 1048661 /lib/x86_64-linux-gnu/libbsd.so.0.8.7
7f279a768000-7f279a769000 r--p 00012000 08:01 1048661 /lib/x86_64-linux-gnu/libbsd.so.0.8.7
7f279a769000-7f279a76a000 rw-p 00013000 08:01 1048661 /lib/x86_64-linux-gnu/libbsd.so.0.8.7
7f279a76a000-7f279a76b000 rw-p 00000000 00:00 0
7f279a76b000-7f279a792000 r-xp 00000000 08:01 1050952 /lib/x86_64-linux-gnu/ld-2.27.so
7f279a975000-7f279a979000 rw-p 00000000 00:00 0
7f279a992000-7f279a993000 r--p 00027000 08:01 1050952 /lib/x86_64-linux-gnu/ld-2.27.so
7f279a993000-7f279a994000 rw-p 00028000 08:01 1050952 /lib/x86_64-linux-gnu/ld-2.27.so
7f279a994000-7f279a995000 rw-p 00000000 00:00 0
7fff84bca000-7fff84beb000 rw-p 00000000 00:00 0 [stack]
7fff84bee000-7fff84bf1000 r--p 00000000 00:00 0 [vvar]
7fff84bf1000-7fff84bf3000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Each line contains:
- the starting and ending address of the region in the process address space
- permissions - the kind of access the process has to the memory (read, write, execute, private or shared)
- offset - where in the mapped file the region starts
- dev - the major and minor number (in hex) of the device that holds the mapped file
- inode - the inode number of the mapped file
- pathname - the name of the file the region is mapped from. When there is no pathname the region is anonymous memory, and names in brackets mark special regions such as [heap], [stack] and [vdso], the virtual dynamic shared object that lets some system calls run without switching to kernel mode
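The fields listed above can be pulled apart with a few lines of Python. This is a minimal sketch with a made-up function name, fed with the first line of the listing.

```python
def parse_maps_line(line: str) -> dict:
    """Split one /proc/PID/maps line into its fields."""
    fields = line.split(None, 5)         # the pathname may be absent or contain spaces
    start, end = (int(x, 16) for x in fields[0].split("-"))
    return {
        "start": start,
        "end": end,
        "size": end - start,
        "perms": fields[1],
        "offset": int(fields[2], 16),
        "dev": fields[3],
        "inode": int(fields[4]),
        "pathname": fields[5].strip() if len(fields) > 5 else "",
    }


entry = parse_maps_line(
    "56223c2b1000-56223c2b7000 r-xp 00000000 08:01 265426 /usr/bin/hexdump"
)
print(hex(entry["size"]), entry["perms"], entry["pathname"])
```

The size of this first region comes out as 0x6000 bytes, the page-rounded size of the first LOAD segment.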
Initialization
Programs can have functions that run before main() is called. These are called constructors. For example, libc can initialize memory regions for dynamic allocations when the program launches. A constructor can also be specified manually in a C program:
#include <stdio.h>

/* runs before main() */
__attribute__((constructor)) void yey()
{
    puts("yey!");
}
Then it can be compiled like this:
$ gcc -static-pie -o program-static program.c
When the resulting program is launched it will print yey! at the very start. Constructors can be used in combination with LD_PRELOAD to inject libraries into a process, which can be useful for custom configs and debugging.
Environment Variables
Environment variables can be used to change the behavior of some utilities. Below is a nice program that loops through all environment variables and outputs them.
#include <stdio.h>

int main(int argc, char **argv, char **envp)
{
    /* envp is a NULL-terminated array of "NAME=value" strings */
    for (int i = 0; envp[i] != 0; i++) puts(envp[i]);
    return 0;
}
Compile it:
$ gcc -o env env.c
Variables can be added for a single run like this: ENV=VAR ./env, and the program can then be used directly as a source for debugging.
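The same experiment can be scripted: start a child process with one extra variable and let it report what it sees. This is a sketch in which the Python interpreter stands in for the compiled ./env program, and the function name child_sees is hypothetical.

```python
import os
import subprocess
import sys


def child_sees(name: str, value: str) -> str:
    """Run a child process with name=value added and return what it reads back."""
    env = dict(os.environ, **{name: value})   # copy of our environment plus one entry
    result = subprocess.run(
        [sys.executable, "-c", "import os, sys; print(os.environ[sys.argv[1]])", name],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(child_sees("ENV", "VAR"))
```

The variable only exists in the child's environment, just like with the ENV=VAR prefix in the shell.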
System Calls
There are over 300 system calls in Linux. A few examples are:
- int execve(const char *filename, char **argv, char **envp) - replaces the current process image with a new program
- ssize_t write(int fd, const void *buf, size_t nbytes) - write to a file descriptor
Frequently used syscalls are open, read, write (also used in cat) and fork, execve, wait (in a shell).
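Python's os module wraps many of these calls almost one to one, which makes it easy to try them out. The sketch below pushes a few bytes through a pipe with write and reads them back with read. The function name is my own, and the payload is assumed to be small enough to fit into the pipe buffer.

```python
import os


def pipe_roundtrip(payload: bytes) -> bytes:
    """Send a small payload through a pipe using pipe, write, read and close."""
    read_fd, write_fd = os.pipe()
    try:
        written = os.write(write_fd, payload)   # write: returns the byte count
        os.close(write_fd)
        write_fd = -1
        return os.read(read_fd, written)        # read: returns at most `written` bytes
    finally:
        os.close(read_fd)
        if write_fd != -1:
            os.close(write_fd)


print(pipe_roundtrip(b"hello from write"))
```

Running this under strace shows the corresponding pipe, write, read and close system calls among the ones made by the Python interpreter itself.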
Shared Memory
By sharing memory between processes it is possible to establish communication that doesn't require system calls once it has been set up. An easy way to do it is a shared memory-mapped file in /dev/shm (see man mmap).
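A minimal sketch of this idea: a parent maps a file from /dev/shm, forks, and the child passes a message back simply by writing to the mapped memory. The function name is made up, the fallback to the default temp directory is my own assumption for systems without a writable /dev/shm, and the waitpid call is only there to keep the example synchronized.

```python
import mmap
import os
import tempfile


def share_via_mmap(message: bytes) -> bytes:
    """Let a forked child pass a message (up to one page) to its parent."""
    # Prefer /dev/shm; fall back to the default temp directory if unavailable.
    shm_dir = "/dev/shm" if os.access("/dev/shm", os.W_OK) else None
    with tempfile.NamedTemporaryFile(dir=shm_dir) as backing:
        backing.truncate(mmap.PAGESIZE)
        shared = mmap.mmap(backing.fileno(), mmap.PAGESIZE)  # shared mapping by default
        pid = os.fork()
        if pid == 0:
            try:
                shared[: len(message)] = message   # child: a plain memory write
            finally:
                os._exit(0)
        os.waitpid(pid, 0)                         # parent: wait, then read the memory
        try:
            return bytes(shared[: len(message)])
        finally:
            shared.close()


print(share_via_mmap(b"ping"))
```

The write in the child and the read in the parent are ordinary memory accesses; only setting up the mapping and the synchronization go through the kernel.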
Signals
Signals pause the process execution and invoke a handler. Handlers are functions that take one argument - the signal number. The signals that can't be handled are SIGKILL and SIGSTOP. For an overview of the available signals see man 7 signal. We encounter signals very often, for example a 'segmentation fault' is reported through a signal (SIGSEGV).
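Installing a handler looks like this in Python. Note that Python handlers receive two arguments (the signal number and the interrupted stack frame) instead of the single argument of a C handler, and that they must be installed from the main thread. The sketch also tries to install a handler for SIGKILL, which the kernel refuses.

```python
import os
import signal

received = []


def handler(signum, frame):
    """Record the signal number we were invoked with."""
    received.append(signum)


signal.signal(signal.SIGUSR1, handler)      # install the handler
os.kill(os.getpid(), signal.SIGUSR1)        # send the signal to ourselves
print("handler saw:", received)

try:
    signal.signal(signal.SIGKILL, handler)  # SIGKILL cannot be handled
    kill_handled = True
except OSError:
    kill_handled = False
print("SIGKILL handler installed:", kill_handled)
```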
Termination
A process terminates either when it receives an unhandled signal or when it calls the exit() system call. Every process must be "reaped". This means that after termination it stays in the zombie state until it is wait()ed on by its parent. When this happens its exit code is returned to the parent and the process is freed. If the parent dies without waiting on its children, they are re-parented to PID 1 and stay there until they are cleaned up.
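The zombie state can be observed directly. In the sketch below a child exits immediately, the parent reads the child's state from /proc/PID/stat before reaping it, and only then calls waitpid. The function names are hypothetical, and the example assumes that /proc is mounted and that SIGCHLD is not being ignored.

```python
import os
import time


def proc_state(pid: int) -> str:
    """Return the one-letter state of pid from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The state letter follows the command name, which is wrapped in parentheses.
    return stat.rsplit(")", 1)[1].split()[0]


def zombie_demo():
    """Return (state before wait, exit code) for a child that exits immediately."""
    pid = os.fork()
    if pid == 0:
        os._exit(3)                       # child terminates right away
    deadline = time.monotonic() + 2.0
    state = proc_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)                  # give the child time to exit
        state = proc_state(pid)
    _, status = os.waitpid(pid, 0)        # reap: the zombie entry is freed
    return state, os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(zombie_demo())
```

Until waitpid is called the child shows up with state Z, and after the call its entry in /proc is gone.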
References
https://www.intezer.com/blog/research/executable-linkable-format-101-part1-sections-segments/
https://gist.github.com/CMCDragonkai/10ab53654b2aa6ce55c11cfc5b2432a4
https://0xax.gitbooks.io/linux-insides/content/SysCall/linux-syscall-1.html
https://pwn.college/modules/intro