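Basic Tools and the Basic Structure of ELF Files
The basic tools are radare2, objdump, and readelf.
An ELF file consists of a file header (the ELF header), the contents of its sections, and two tables that describe those contents: the program header table and the section header table.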
Sections and segments
Wikipedia distinguishes the two concepts as follows:
The segments contain information that is necessary for runtime
execution of the file, while sections contain important data for linking
and relocation. Any byte in the entire file can be owned by at most one
section, and there can be orphan bytes which are not owned by any
section.
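As the quote shows, the main difference is that sections hold the information needed for linking and are consumed by the linker, while segments hold the data needed at run time and are consumed by the operating system. Note that "linking" here covers both static linking, which is finished before the executable is produced, and dynamic linking, which happens at run time. Sections and segments therefore do not conflict; they are two views of the same file. To avoid ambiguity, this article always uses the terms "section" and "segment" in this sense.
Let's use readelf to explore the sections and segments of /bin/sh: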
    # Section information for /bin/sh
    $ readelf -S /bin/sh
    There are 28 section headers, starting at offset 0x1d378:

    Section Headers:
      [Nr] Name              Type             Address           Offset
           Size              EntSize          Flags  Link  Info  Align
      [ 0]                   NULL             0000000000000000  00000000
           0000000000000000  0000000000000000           0     0     0
      [ 1] .interp           PROGBITS         00000000000002a8  000002a8
           000000000000001c  0000000000000000   A       0     0     1
      ...
      [26] .gnu_debuglink    PROGBITS         0000000000000000  0001d240
           0000000000000034  0000000000000000           0     0     4
      [27] .shstrtab         STRTAB           0000000000000000  0001d274
           0000000000000101  0000000000000000           0     0     1
    Key to Flags:
      W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
      L (link order), O (extra OS processing required), G (group), T (TLS),
      C (compressed), x (unknown), o (OS specific), E (exclude),
      l (large), p (processor specific)
    # Segment information for /bin/sh
    $ readelf -l /bin/sh

    Elf file type is DYN (Shared object file)
    Entry point 0x4760
    There are 11 program headers, starting at offset 64

    Program Headers:
      Type           Offset             VirtAddr           PhysAddr
                     FileSiz            MemSiz              Flags  Align
      PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                     0x0000000000000268 0x0000000000000268  R      0x8
      INTERP         0x00000000000002a8 0x00000000000002a8 0x00000000000002a8
                     0x000000000000001c 0x000000000000001c  R      0x1
          [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
      LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                     0x00000000000033d0 0x00000000000033d0  R      0x1000
      LOAD           0x0000000000004000 0x0000000000004000 0x0000000000004000
                     0x0000000000011e6d 0x0000000000011e6d  R E    0x1000
      LOAD           0x0000000000016000 0x0000000000016000 0x0000000000016000
                     0x0000000000005798 0x0000000000005798  R      0x1000
      LOAD           0x000000000001bf30 0x000000000001cf30 0x000000000001cf30
                     0x0000000000001310 0x0000000000003f40  RW     0x1000
      DYNAMIC        0x000000000001cb08 0x000000000001db08 0x000000000001db08
                     0x00000000000001f0 0x00000000000001f0  RW     0x8
      NOTE           0x00000000000002c4 0x00000000000002c4 0x00000000000002c4
                     0x0000000000000044 0x0000000000000044  R      0x4
      GNU_EH_FRAME   0x0000000000017e44 0x0000000000017e44 0x0000000000017e44
                     0x00000000000007fc 0x00000000000007fc  R      0x4
      GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                     0x0000000000000000 0x0000000000000000  RW     0x10
      GNU_RELRO      0x000000000001bf30 0x000000000001cf30 0x000000000001cf30
                     0x00000000000010d0 0x00000000000010d0  R      0x1

     Section to Segment mapping:
      Segment Sections...
       00
       01     .interp
       02     .interp .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
       03     .init .plt .plt.got .text .fini
       04     .rodata .eh_frame_hdr .eh_frame
       05     .init_array .fini_array .data.rel.ro .dynamic .got .data .bss
       06     .dynamic
       07     .note.gnu.build-id .note.ABI-tag
       08     .eh_frame_hdr
       09
       10     .init_array .fini_array .data.rel.ro .dynamic .got
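Note the Section Headers and Program Headers in the two outputs: the former are the entries that describe sections, the latter the entries that describe segments. The readelf -l output also shows that one segment can contain several sections. Indeed, it is the linker that places one or more sections into each segment at link time.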
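The containment idea behind the "Section to Segment mapping" can be sketched in Python. This is a simplified check on file offsets only, using the .interp section and the first four program headers from the output above; readelf's real logic also looks at virtual addresses and section types, and the function name `section_in_segment` is made up for illustration.

```python
def section_in_segment(sec_offset, sec_size, seg_offset, seg_filesz):
    """Return True if the section's file range lies entirely inside the segment's."""
    sec_end = sec_offset + sec_size
    seg_end = seg_offset + seg_filesz
    return seg_offset <= sec_offset and sec_end <= seg_end


# (Offset, Size) of .interp, copied from the readelf -S output above.
interp_section = (0x2A8, 0x1C)

# (Offset, FileSiz) of the first four segments, copied from readelf -l.
segments = {
    "00 PHDR": (0x40, 0x268),
    "01 INTERP": (0x2A8, 0x1C),
    "02 LOAD": (0x0, 0x33D0),
    "03 LOAD": (0x4000, 0x11E6D),
}

# Collect every segment whose file range covers .interp.
owners = [
    name
    for name, (seg_offset, seg_filesz) in segments.items()
    if section_in_segment(*interp_section, seg_offset, seg_filesz)
]
print(owners)
```

The result agrees with the mapping printed by readelf, where .interp is listed under both segment 01 and segment 02.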
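The readelf -S output has both an Address column and an Offset column. To understand the difference, we first need to look at how section header information is stored in the file, using the structure definition found in the Linux kernel headers.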
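The section headers form an array of a fixed-layout data structure, called the section header table, which locates every section in the file. An index into this array is called a section header table index. The location and dimensions of the table (its file offset, entry size, and entry count) are stored in the ELF header, which readelf -h displays: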
    $ readelf -h /bin/sh
    ELF Header:
      Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
      Class:                             ELF64
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              DYN (Shared object file)
      Machine:                           Advanced Micro Devices X86-64
      Version:                           0x1
      Entry point address:               0x4760
      Start of program headers:          64 (bytes into file)
      Start of section headers:          119672 (bytes into file)  # offset of section header table
      Flags:                             0x0
      Size of this header:               64 (bytes)
      Size of program headers:           56 (bytes)
      Number of program headers:         11  # number of segments
      Size of section headers:           64 (bytes)
      Number of section headers:         28  # number of sections
      Section header string table index: 27
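To make the role of these header fields concrete, here is a minimal Python sketch that unpacks a little-endian ELF64 header and computes where the i-th entry of the section header table starts. The field names follow the ELF specification (e_shoff, e_shentsize, e_shnum, e_shstrndx); the helper names and the hand-built sample header are assumptions for illustration, with the sample filled in from the readelf -h values above.

```python
import struct

# Little-endian ELF64 header layout: e_ident[16], e_type, e_machine, e_version,
# e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
# e_shentsize, e_shnum, e_shstrndx (64 bytes in total).
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry", "e_phoff",
    "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
    "e_shentsize", "e_shnum", "e_shstrndx",
)


def parse_elf64_header(data):
    """Unpack the first 64 bytes of a little-endian ELF64 file into a dict."""
    header = dict(zip(FIELDS, ELF64_EHDR.unpack_from(data)))
    ident = header["e_ident"]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    return header


def section_header_offset(header, index):
    """File offset of the section header with the given table index."""
    return header["e_shoff"] + index * header["e_shentsize"]


# Rebuild the header shown by readelf -h above (e_type 3 = DYN, e_machine 62 = x86-64).
ident = bytes.fromhex("7f454c46020101000000000000000000")
sample = ELF64_EHDR.pack(ident, 3, 62, 1, 0x4760, 64, 119672, 0, 64, 56, 11, 64, 28, 27)

header = parse_elf64_header(sample)
print(hex(header["e_shoff"]), header["e_shnum"], header["e_shentsize"])
print(section_header_offset(header, header["e_shstrndx"]))
```

On a real file, the same function can be fed the first 64 bytes, for example `parse_elf64_header(open("/bin/sh", "rb").read(64))`. Note that 119672 is 0x1d378, the offset that readelf -S reported for the section header table.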
Some section header table indexes are reserved. The most important ones are:

| Value | Name | Explanation |
|---|---|---|
| 0x0000 | SHN_UNDEF | Marks a meaningless section reference |
| 0xfff1 | SHN_ABS | Specifies absolute values for the corresponding reference |
| 0xfff2 | SHN_COMMON | Symbols defined relative to it are common symbols |

Note: common symbols exist only in relocatable object files.
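A small Python sketch shows how a tool might interpret a section index before using it to look up the section header table. The three named constants come from the table above; SHN_LORESERVE (0xff00) and SHN_HIRESERVE (0xffff) are not listed there but bound the reserved range in the ELF specification. The function name is hypothetical.

```python
SHN_UNDEF = 0x0000
SHN_LORESERVE = 0xFF00  # start of the reserved range (not in the table above)
SHN_ABS = 0xFFF1
SHN_COMMON = 0xFFF2
SHN_HIRESERVE = 0xFFFF  # end of the reserved range (not in the table above)


def describe_section_index(shndx):
    """Say whether an index is reserved or refers to a real table entry."""
    if shndx == SHN_UNDEF:
        return "SHN_UNDEF"
    if shndx == SHN_ABS:
        return "SHN_ABS"
    if shndx == SHN_COMMON:
        return "SHN_COMMON"
    if SHN_LORESERVE <= shndx <= SHN_HIRESERVE:
        return "reserved"
    return f"section {shndx}"


for value in (0, 27, 0xFFF1, 0xFFF2, 0xFF00):
    print(hex(value), describe_section_index(value))
```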
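At this point we have a rough picture of how section-related data is organized inside an ELF file. The following is a simple diagram:
In the Linux kernel source, each section header is defined by the following structure: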
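    typedef struct elf64_shdr {
      Elf64_Word sh_name;       /* Section name: offset into the section header string table */
      Elf64_Word sh_type;       /* Section type (SHT_*) */
      Elf64_Xword sh_flags;     /* Section attribute flags (SHF_*) */
      Elf64_Addr sh_addr;       /* Virtual address of the section at execution */
      Elf64_Off sh_offset;      /* Offset of the section from the start of the file */
      Elf64_Xword sh_size;      /* Size of the section in bytes */
      Elf64_Word sh_link;       /* Index of a related section; meaning depends on sh_type */
      Elf64_Word sh_info;       /* Additional information; meaning depends on sh_type */
      Elf64_Xword sh_addralign; /* Required alignment of the section */
      Elf64_Xword sh_entsize;   /* Entry size if the section holds a table, else 0 */
    } Elf64_Shdr;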
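The same layout can be read from Python with the struct module, which is a convenient way to check that the ten fields add up to the 64-byte entry size reported by readelf -h. This is a sketch for little-endian ELF64 only. The sample entry mirrors the .interp line of the readelf -S output above, except for sh_name, whose value (11) is a made-up placeholder because readelf prints the resolved name rather than the offset.

```python
import struct
from collections import namedtuple

# Elf64_Word = 4 bytes (I); Elf64_Xword, Elf64_Addr, Elf64_Off = 8 bytes (Q).
SHDR = struct.Struct("<IIQQQQIIQQ")

Elf64Shdr = namedtuple(
    "Elf64Shdr",
    "sh_name sh_type sh_flags sh_addr sh_offset sh_size "
    "sh_link sh_info sh_addralign sh_entsize",
)


def parse_shdr(data, offset=0):
    """Unpack one section header starting at the given byte offset."""
    return Elf64Shdr(*SHDR.unpack_from(data, offset))


# .interp: PROGBITS (1), flags A (0x2), Address 0x2a8, Offset 0x2a8, Size 0x1c,
# Link 0, Info 0, Align 1, EntSize 0. sh_name=11 is a hypothetical value.
raw = SHDR.pack(11, 1, 0x2, 0x2A8, 0x2A8, 0x1C, 0, 0, 1, 0)

entry = parse_shdr(raw)
print(SHDR.size)
print(entry)
```

To read entry i of a real file, the offset argument would be e_shoff + i * e_shentsize, with both values taken from the ELF header.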
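The section header table of /bin/sh (see the readelf -S output earlier) makes this concrete: its columns Name, Type, Address, Offset, Size, EntSize, Flags, Link, Info, and Align correspond one to one to the fields of this structure.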
sh_type can take the following values:

| Value | Name | Explanation |
|---|---|---|
| 0 | SHT_NULL | The section header is inactive |
| 1 | SHT_PROGBITS | Information defined completely by the program |
| 2 | SHT_SYMTAB | A full symbol table: all symbols needed during linking, plus some that dynamic linking does not need |
| 3 | SHT_STRTAB | A string table (a file can have multiple) |
| 4 | SHT_RELA | Relocation entries with explicit addends (a file can have multiple) |
| 5 | SHT_HASH | A symbol hash table used in dynamic linking |
| 6 | SHT_DYNAMIC | Information for dynamic linking |
| 7 | SHT_NOTE | Information that marks the file in some way |
| 8 | SHT_NOBITS | Looks like SHT_PROGBITS but occupies no space in the file |
| 9 | SHT_REL | Relocation entries without explicit addends (a file can have multiple) |
| 10 | SHT_SHLIB | Reserved, but semantics not specified |
| 11 | SHT_DYNSYM | Only the symbols needed for dynamic linking |
| 0x70000000 | SHT_LOPROC | Lower bound of the range reserved for processor-specific semantics |
| 0x7fffffff | SHT_HIPROC | Upper bound of the range reserved for processor-specific semantics |
| 0x80000000 | SHT_LOUSER | Lower bound of the range of section types reserved for application programs |
| 0xffffffff | SHT_HIUSER | Upper bound of the range of section types reserved for application programs |
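As a sketch of how these values are used, the following Python function turns a raw sh_type number into a readable name, treating the LOPROC..HIPROC and LOUSER..HIUSER pairs as ranges rather than single values. The function and dictionary names are illustrative, and values outside the table (for example OS-specific types) are simply reported as not covered.

```python
SH_TYPE_NAMES = {
    0: "SHT_NULL",
    1: "SHT_PROGBITS",
    2: "SHT_SYMTAB",
    3: "SHT_STRTAB",
    4: "SHT_RELA",
    5: "SHT_HASH",
    6: "SHT_DYNAMIC",
    7: "SHT_NOTE",
    8: "SHT_NOBITS",
    9: "SHT_REL",
    10: "SHT_SHLIB",
    11: "SHT_DYNSYM",
}

SHT_LOPROC, SHT_HIPROC = 0x70000000, 0x7FFFFFFF
SHT_LOUSER, SHT_HIUSER = 0x80000000, 0xFFFFFFFF


def sh_type_name(value):
    """Map an sh_type value to its name, or to the reserved range it falls in."""
    if value in SH_TYPE_NAMES:
        return SH_TYPE_NAMES[value]
    if SHT_LOPROC <= value <= SHT_HIPROC:
        return "processor-specific"
    if SHT_LOUSER <= value <= SHT_HIUSER:
        return "application-specific"
    return "not covered by the table"


for value in (1, 8, 11, 0x70000001, 0x80000000):
    print(hex(value), sh_type_name(value))
```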
sh_flags can contain the following bits:

| Name | Value | Explanation |
|---|---|---|
| SHF_WRITE | 0x1 | Data in this section should be writable during process execution |
| SHF_ALLOC | 0x2 | The section occupies memory during process execution |
| SHF_EXECINSTR | 0x4 | This section contains executable machine instructions |
| SHF_MASKPROC | 0xf0000000 | All bits in this mask are reserved for processor-specific semantics |
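Because sh_flags is a bit mask, several flags can be set at once. The sketch below decodes a mask into the letters that readelf prints in its Flags column (W, A, X, and p for processor-specific bits). It handles only the four flags in the table above; other bits that readelf knows about, such as M (merge) and S (strings), are deliberately ignored, and the function name is made up.

```python
SHF_WRITE = 0x1
SHF_ALLOC = 0x2
SHF_EXECINSTR = 0x4
SHF_MASKPROC = 0xF0000000


def flag_letters(sh_flags):
    """Return readelf-style letters for the flags covered by the table above."""
    letters = ""
    if sh_flags & SHF_WRITE:
        letters += "W"
    if sh_flags & SHF_ALLOC:
        letters += "A"
    if sh_flags & SHF_EXECINSTR:
        letters += "X"
    if sh_flags & SHF_MASKPROC:
        letters += "p"
    return letters


print(flag_letters(SHF_WRITE | SHF_ALLOC))      # a writable data section
print(flag_letters(SHF_ALLOC | SHF_EXECINSTR))  # a code section
```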
The meaning of sh_link and sh_info depends on sh_type:

| sh_type | sh_link | sh_info |
|---|---|---|
| SHT_DYNAMIC | The section header index of the string table used by entries in the section | 0 |
| SHT_HASH | The section header index of the symbol table to which the hash table applies | 0 |
| SHT_REL, SHT_RELA | The section header index of the associated symbol table | The section header index of the section to which the relocation applies |
| SHT_SYMTAB, SHT_DYNSYM | The section header index of the associated string table | One greater than the symbol table index of the last local symbol (binding STB_LOCAL) |
| other | SHN_UNDEF | 0 |
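As the table shows, sh_link tells a section of a given type where to find the additional information it depends on, such as its string table or symbol table.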
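As a worked example, take the .symtab entry from the readelf -S excerpt later in this article: Size 0x450, EntSize 0x18, Link 23, Info 41. The Python sketch below applies the SHT_SYMTAB row of the table to those numbers; the function name and the keys of the returned dict are hypothetical.

```python
def symtab_summary(sh_size, sh_entsize, sh_link, sh_info):
    """Interpret sh_link and sh_info for a SHT_SYMTAB or SHT_DYNSYM section."""
    total = sh_size // sh_entsize  # number of symbol table entries
    return {
        "string_table_index": sh_link,  # section holding the symbol names
        "total_symbols": total,
        # sh_info is one greater than the index of the last local symbol,
        # so indexes 0 .. sh_info - 1 are the local entries.
        "local_symbols": sh_info,
        "non_local_symbols": total - sh_info,
    }


summary = symtab_summary(sh_size=0x450, sh_entsize=0x18, sh_link=23, sh_info=41)
print(summary)
```

Link 23 points at section [23], which is .strtab in that excerpt, so the symbol names of .symtab are stored there.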
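With this background, we can finally look at the difference between sh_addr and sh_offset.
When sh_addr is 0, the section does not appear in the process's address space at run time; otherwise its value is the address of the section while the program runs. Whether a section occupies memory at all is indicated by the SHF_ALLOC flag.
sh_offset is the section's offset from the beginning of the ELF file. As the sh_type values above show, a section of type SHT_NOBITS occupies no space in the file; in that case sh_offset is only a conceptual position, that is, where the section would be placed if it had contents.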
The following excerpt from a readelf -S output illustrates the difference:
      [Nr] Name              Type             Address           Offset
           Size              EntSize          Flags  Link  Info  Align
      [20] .bss              NOBITS           0000000000004020  00003020
           0000000000000008  0000000000000000  WA       0     0     1
      [21] .comment          PROGBITS         0000000000000000  00003020
           0000000000000026  0000000000000001  MS       0     0     1
      [22] .symtab           SYMTAB           0000000000000000  00003048
           0000000000000450  0000000000000018          23    41     8
      [23] .strtab           STRTAB           0000000000000000  00003498
           0000000000000160  0000000000000000           0     0     1
      [24] .shstrtab         STRTAB           0000000000000000  000035f8
           00000000000000cb  0000000000000000           0     0     1
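The .bss section has type NOBITS and its Flags include A (alloc). This means the section takes up no space in the ELF file, but memory must be allocated for it when the program runs. Its Address is the section's run-time address, and its Offset is the conceptual file offset. The fact that .comment starts at the same Offset (0x3020) confirms that .bss occupies no bytes in the file.
The four sections that follow (.comment, .symtab, .strtab, .shstrtab) share one trait: none of them has the A flag, so none appears in the address space at run time. Their Address is therefore 0, while their Offset is a real position in the file.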
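The two observations above can be checked with a few lines of Python. The sketch separates "bytes occupied in the file" (zero for SHT_NOBITS) from "loaded at run time" (the SHF_ALLOC bit), using the rows of the excerpt. The helper names are made up, and the .comment flags value 0x30 stands for M and S (SHF_MERGE 0x10, SHF_STRINGS 0x20), which the flag table in this article does not list.

```python
SHT_NOBITS = 8
SHF_ALLOC = 0x2


def end_in_file(sh_type, sh_offset, sh_size):
    """File offset just past the section; NOBITS sections take no file space."""
    file_bytes = 0 if sh_type == SHT_NOBITS else sh_size
    return sh_offset + file_bytes


def is_loaded(sh_flags):
    """True if the section occupies memory while the program runs."""
    return bool(sh_flags & SHF_ALLOC)


# (name, sh_type, sh_flags, sh_addr, sh_offset, sh_size) from the excerpt.
sections = [
    (".bss", 8, 0x3, 0x4020, 0x3020, 0x8),
    (".comment", 1, 0x30, 0x0, 0x3020, 0x26),
    (".symtab", 2, 0x0, 0x0, 0x3048, 0x450),
    (".strtab", 3, 0x0, 0x0, 0x3498, 0x160),
    (".shstrtab", 3, 0x0, 0x0, 0x35F8, 0xCB),
]

for name, sh_type, sh_flags, sh_addr, sh_offset, sh_size in sections:
    print(name, "loaded:", is_loaded(sh_flags),
          "file range:", hex(sh_offset), "to", hex(end_in_file(sh_type, sh_offset, sh_size)))
```

The end of .bss in the file equals its own offset, and the end of .symtab (0x3498) is exactly where .strtab begins.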
That covers the background needed to understand ELF sections. The next article will look at the individual sections of an ELF file in detail.