Write boot-block and a create-image utility with precode

The whole assignment is in the assignment_info folder; the files there describe what to do and what the finished project should look like. I believe only the bootblock.S and createimage.c files in the src folder need to be edited. If you have any questions, just ask, but everything should be in the assignment_info folder. The task is to write a boot block that loads the kernel supplied in the precode. To make this bootable, the create-image utility must also be written so that it combines the boot block and the kernel into a bootable image.

Additional Instructions:

src/createimage.c

#include <assert.h>
#include <elf.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define IMAGE_FILE "image"
#define ARGS "[--extended] [--vm] <bootblock> <executable-file> ..."

/* Variable to store pointer to program name */
char *progname;

/* Variable to store pointer to the filename for the file being read. */
char *elfname;

/* Structure to store command line options */
static struct {
    int vm;
    int extended;
} options;

/* prototypes of local functions */
static void create_image(int nfiles, char *files[]);
static void error(char *fmt, ...);

int main(int argc, char **argv)
{
    /* Process command line options */
    progname = argv[0];
    options.vm = 0;
    options.extended = 0;
    while ((argc > 1) && (argv[1][0] == '-') && (argv[1][1] == '-')) {
        char *option = &argv[1][2];

        if (strcmp(option, "vm") == 0) {
            options.vm = 1;
        } else if (strcmp(option, "extended") == 0) {
            options.extended = 1;
        } else {
            error("%s: invalid option\nusage: %s %s\n", progname, progname, ARGS);
        }
        argc--;
        argv++;
    }
    if (options.vm == 1) {
        /* This option is not needed in project 1 so we don't bother
         * implementing it */
        error("%s: option --vm not implemented\n", progname);
    }
    if (argc < 3) {
        /* at least 3 args (createimage bootblock kernel) */
        error("usage: %s %s\n", progname, ARGS);
    }
    create_image(argc - 1, argv + 1);
    return 0;
}

static void create_image(int nfiles, char *files[])
{
    /* This is where you should start working on your own implementation
     * of createimage. Don't forget to structure the code into
     * multiple functions in a way which seems logical, otherwise the
     * solution will not be accepted. */
    fprintf(stderr, "This version of %s doesn't do anything.\n", progname);
    exit(-1);
}

/* print an error message and exit */
static void error(char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
    if (errno != 0) {
        perror(NULL);
    }
    exit(EXIT_FAILURE);
}

src/bochsrc

floppya: 1_44=image, status=inserted
boot: floppy
cpu: count=1, ips=10000000, reset_on_triple_fault=1
megs: 32
#floppy_bootsig_check: disabled=1
log: /dev/stdout
panic: action=ask
error: action=report
info: action=report
debug: action=ignore
debugger_log: -
com1: enabled=0
parport1: enabled=0
vga: update_freq=60
keyboard: serial_delay=250, paste_delay=100000
#floppy_command_delay: 500
mouse: enabled=0
private_colormap: enabled=0
#gdbstub: enabled=1, port=1234, text_base=0, data_base=0, bss_base=0

src/Makefile

# Makefile for the OS projects.

FLOPPYDEV ?= $(shell ./flpdevdetect)

CC = gcc
LD = ld

CREATEIMAGE=./createimage.given
BOOTBLOCK=bootblock.given

# When you are ready to test your own implementation of createimage, you
# should uncomment the following line:
#CREATEIMAGE=./createimage

# When you are ready to test your own implementation of the bootblock, you
# should uncomment the following line:
#BOOTBLOCK=bootblock

# Where to locate the kernel in memory
KERNEL_ADDR = 0x1000

# Compiler flags
#
# -fno-builtin:
#     Don't recognize builtin functions that do not begin with
#     '__builtin_' as prefix.
#
# -fomit-frame-pointer:
#     Don't keep the frame pointer in a register for functions that don't
#     need one.
#
# -O2:
#     Turn on all optional optimizations except for loop unrolling and
#     function inlining.
#
# -c:
#     Compile or assemble the source files, but do not link.
#
# -Wall:
#     Enable all warnings (Same as all `-W' options combined)
#
# -m32:
#     Compile for 32-bit CPUs
CCOPTS = -Wall -Wextra -g -c -fomit-frame-pointer -O2 -fno-builtin -m32 -DKERNEL_ADDR=$(KERNEL_ADDR)

# Linker flags
#
# -nostartfiles:
#     Do not use the standard system startup files when linking.
#
# -nostdlib:
#     Don't use the standard system libraries and startup files when
#     linking. Only the files you specify will be passed to the linker.
#
# -melf-i386:
#     Emulate the i386 ELF linker
#
# -Ttext <xxxx>:
#     Use <xxxx> as the starting address for the text segment of the
#     output file.
LDOPTS = -nostartfiles -nostdlib -melf_i386 -Ttext

# Makefile targets

all: image

kernel: kernel.o
	$(LD) $(LDOPTS) $(KERNEL_ADDR) -o kernel $<

bootblock: bootblock.o
	$(LD) $(LDOPTS) 0x0 -o bootblock $<

createimage: createimage.o
	$(CC) -m32 -o createimage $<

# Create an image to put on the floppy
image: $(BOOTBLOCK) kernel $(CREATEIMAGE)
	$(CREATEIMAGE) --extended $(BOOTBLOCK) kernel

# Put the image on the floppy (these two stages are independent, as both
# vmware and bochs can run using an image file stored on the harddisk)
boot: image
ifneq ($(FLOPPYDEV),)
	dd if=./image of=$(FLOPPYDEV) bs=512
else
	@echo -e "Cannot seem to find a usable floppy drive. If you think you\
have a usable\ndevice that isn't detected, try this:\
'make FLOPPYDEV=/dev/somdevice boot'"
endif

# Clean up!
clean:
	rm -f *.o
	rm -f createimage image bootblock kernel

# No, really, clean up!
distclean: clean
	rm -f *~
	rm -f \#*
	rm -f *.bak
	rm -f bochsout.txt

# How to compile a C file
%.o:%.c
	$(CC) $(CCOPTS) $<

# How to assemble
%.o:%.s
	$(CC) $(CCOPTS) $<

%.o:%.S
	$(CC) $(CCOPTS) -x assembler-with-cpp $<

# How to produce assembler input from a C file
%.s:%.c
	$(CC) $(CCOPTS) -S $<

src/flpdevdetect

#!/usr/bin/perl
# -*-perl-*-

$usbfloppy = "/dev/sda";
$floppy = "/dev/fd0";

if (-w $usbfloppy && (stat(_))[5] == getgrnam("floppy")) {
    print "$usbfloppy\n";
} elsif (-w $floppy) {
    print "$floppy\n";
}

src/bootblock_example.s

# NB: this does not replace bootblock.s, which you have been
# given as part of the pre-code. It is only meant as a hint, and
# to show a few examples of some useful code constructions.
#
# Several of the things demonstrated here might be done differently
# (more compact, more elegant, ++), but this is at least a starting
# point that you may use.
#
# jmb

    .equ    BOOT_SEGMENT, 0x07c0
    .equ    DISPLAY_SEGMENT, 0xb800

# You still need to decide where to put the stack
#   .equ    STACK_SEGMENT, 0xXXXX
#   .equ    STACK_POINTER, 0xXXXX

.text
    .globl  _start
    .code16

_start:
    jmp     over

os_size:
    .word   0

over:
    # setup stack
    movw    $STACK_SEGMENT, %ax
    movw    %ax, %ss
    movw    $STACK_POINTER, %sp

    # setup data segment
    movw    $BOOT_SEGMENT, %ax
    movw    %ax, %ds

    # -------------------
    # Example of a simple if-construction,
    #   if (a == 2)
    #       a = 3;
    #
    movw    $3, %ax             # set test value for a
    cmpw    $2, %ax
    jne     nope
    movw    $3, %ax
nope:

    # -------------------
    # Example of
    #   for (i = 0; i < 5; i++)
    #       a = i;
    #
    movw    $0, %cx             # Use CX as variable 'i'
loop1:
    cmpw    $5, %cx
    jge     loop1done           # Jump if greater than or equal
    movw    %cx, %ax            # Use AX as variable 'a'
    incw    %cx
    jmp     loop1
loop1done:
    # Notice that cmpw $5, %cx would have been
    # cmp cx, 5 in Intel syntax. jge in this case
    # therefore means "jump to loop1done if %cx is greater
    # than or equal to 5"

    # ------------------
    # Here, I've added a call to print, such that the code resembles
    # something like this:
    #   for (i = 0; i < 5; i++) {
    #       a = i;
    #       print(mystring);    /* Mystring is a char/string pointer */
    #   }
    movw    $0, %cx             # Use CX as variable 'i'
loop1b:
    cmpw    $5, %cx
    jge     loop1bdone          # Jump if greater than or equal
    movw    %cx, %ax            # Use AX as variable 'a'
    movl    $mystring, %esi     # test string for debug
    call    print               # call print routine
    incw    %cx
    jmp     loop1b
loop1bdone:

    # ---------------------
    #   a = 0;
    #   do {
    #       a = a + 1;
    #   } while (a < 10);
    movw    $0, %ax
loop2:
    incw    %ax
    cmpw    $10, %ax
    jl      loop2

    # -------------------
    # Save value before function call
    pushw   %ax                 # AX contains something we don't want to lose
    movw    $mystring, %si
    call    print
    popw    %ax

    # say hello to user
    movl    $hellostring, %esi
    call    print

forever:
    jmp     forever

# routine to print a zero terminated string pointed to by esi
# Overwrites: AX, DS, BX
print:
    movw    $BOOT_SEGMENT, %ax
    movw    %ax, %ds
print_loop:
    lodsb
    cmpb    $0, %al
    je      print_done
    movb    $14, %ah
    movl    $0x0002, %ebx
    int     $0x10
    jmp     print_loop
print_done:
    retw

# messages
mystring:
    .asciz  "test.\n\r"
hellostring:
    .asciz  "Hi there.\n\r"

src/kernel.s

.data                           # Data segment

# Some strings
kernel:
    .asciz  "[Kernel]-> "
testing:
    .asciz  "Running a trivial test... "
works:
    .asciz  "Seems Ok. Now go get some sleep :)."
not:
    .asciz  "*Failed*"

# 'Newline' string ('carriage return', 'linefeed', '\0')
newline:
    .byte   10
    .byte   13
    .byte   0

# An integer
result:
    .word   1000

.text                           # Code segment
.code16                         # Real mode
.globl _start                   # The entry point must be global

#
# The first instruction to execute in a program is called the entry
# point. The linker expects to find the entry point in the "symbol" _start
# (with underscore).
#
_start:
    pushw   %bp                 # Setup stack frame
    movw    %sp,%bp

    pushw   $newline
    call    displayString       # Print messages
    pushw   $kernel
    call    displayString
    pushw   $testing
    call    displayString
    pushw   $1000
    call    trivialTest         # trivialTest(1000)
    addw    $8,%sp              # Pop newline, kernel, testing, and '1000'
    cmpw    %ax,result
    jne     .L6                 # If (trivialTest(1000) != 1000) goto L6
    pushw   $works
    jmp     .L12
.L6:                            # Test failed
    pushw   $not
.L12:
    call    displayString       # Print ok/failed message
    addw    $2,%sp
    pushw   $newline
    call    displayString
    addw    $2,%sp
.L8:                            # Loop forever
    jmp     .L8

#
# int trivialTest(n)
# {
#     if (n > 0) {
#         trivialTest(n-1);
#     }
#     return n;
# }
trivialTest:
    pushw   %bp                 # Setup stack frame
    movw    %sp,%bp
    movw    4(%bp),%ax          # Move argument to ax
    testw   %ax,%ax             # Logical compare (sets SF, ZF and PF)
    jg      .L2                 # if (argument > 0) goto L2
    xorw    %ax,%ax             # else return 0
    popw    %bp
    retw
.L2:
    decw    %ax
    pushw   %ax
    call    trivialTest         # trivialTest(argument - 1)
                                # (Recursive calls until argument == 0)
    addw    $2,%sp              # Pop argument
    incw    %ax
    popw    %bp
    retw                        # Return (argument in ax)

displayString:
    pushw   %bp                 # Setup stack frame
    movw    %sp,%bp
    pushw   %ax                 # Save ax, bx, cx, si, es
    pushw   %bx
    pushw   %cx
    pushw   %si
    pushw   %es
    movw    %ds, %ax            # Make sure ES points to the right
    movw    %ax, %es            # segment
    movw    4(%bp),%cx          # Move string adr to cx
    movw    %cx, %si
loop:
    lodsb                       # Load character to write (c) into al,
                                # and increment si
    cmpb    $0, %al
    jz      done                # if (c == '\0') exit loop
    movb    $14,%ah             # else print c
    movw    $0x0002,%bx
    # int 0x10 sends a character to the display
    # ah = 0xe (14)
    # al = character to write
    # bh = active page number (we use 0x00)
    # bl = foreground color (we use 0x02)
    int     $0x10
    jmp     loop
done:
    popw    %es                 # Restore saved registers
    popw    %si
    popw    %cx
    popw    %bx
    popw    %ax
    popw    %bp
    retw                        # Return to caller

src/createimage.given (prebuilt binary, no text content)

src/bootblock.given (prebuilt binary, no text content)

src/bootblock.S

# bootblock.s

# .equ symbol, expression
# This directive sets the value of the symbol to the expression
    .equ    BOOT_SEGMENT, 0x07c0
    .equ    DISPLAY_SEGMENT, 0xb800
    .equ    KERNEL_SEGMENT, 0x0000
    .equ    KERNEL_OFFSET, 0x1000

# You need to decide where to put the stack
#   .equ    STACK_SEGMENT, 0xXXXX
#   .equ    STACK_POINTER, 0xXXXX

.text                           # Code segment
.globl _start                   # The entry point must be global
.code16                         # Real mode
.org 0x0

#
# The first instruction to execute in a program is called the entry
# point. The linker expects to find the entry point in the "symbol" _start
# (with underscore).
#
_start:
    jmp     beyondReservedSpace

kernelSize:
    .word   0                   # bootimage will write size of kernel, in sectors

beyondReservedSpace:
    movw    $DISPLAY_SEGMENT, %bx
    movw    %bx, %es

    # Clear screen
    movw    $0x0a00, %ax        # Fill with black background / green foreground
    movw    $2000, %cx          # Number of characters (80x25 screen in text mode = 2000)
    xorw    %di, %di            # DI = 0
    rep stosw

    movb    $0x4b, %es:(0x0)    # Write 'K' in the upper left corner of the screen

forever:
    jmp     forever             # Loop forever
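The kernelSize word in bootblock.S is left at 0 and, according to its comment, is meant to be filled in with the kernel size in sectors when the image is built; the createimage manual page in assignment_info puts that value at byte offset 2 of the image. A minimal sketch of the two small pieces of arithmetic this involves is given below. The helper names sectors_needed and patch_kernel_size are made up for illustration and are not part of the precode, and the 512-byte sector size is taken from the manual page's description of the bootblock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE    512  /* bytes per sector, per the createimage manual page */
#define OS_SIZE_OFFSET 2    /* kernelSize sits at byte 2 of the image, right after the initial jmp */

/* Hypothetical helper: number of whole sectors needed to hold `bytes` bytes
 * (rounds up, so a partially filled last sector still counts). */
static uint16_t sectors_needed(size_t bytes)
{
    return (uint16_t)((bytes + SECTOR_SIZE - 1) / SECTOR_SIZE);
}

/* Hypothetical helper: store the kernel size, in sectors, into an in-memory
 * copy of the boot sector. The two bytes are written explicitly in
 * little-endian order, so the result does not depend on the byte order of
 * the machine running createimage. */
static void patch_kernel_size(unsigned char bootsector[SECTOR_SIZE], uint16_t sectors)
{
    bootsector[OS_SIZE_OFFSET]     = (unsigned char)(sectors & 0xff);
    bootsector[OS_SIZE_OFFSET + 1] = (unsigned char)(sectors >> 8);
}
```

A real create_image would still have to read the ELF files, copy their loadable contents into the image, and pad each part to a sector boundary; the sketch covers only the size bookkeeping.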

assignment_info/pc-arch.pdf

A Guide to Programming Pentium/Pentium Pro Processors
Kai Li, Princeton University

The goal of this document is to provide brief and concise documentation on Pentium PC architectures. It has a short description of the Intel Pentium and Pentium Pro processors and a brief introduction to assembly programming with the GNU assembler. Two useful reference books are Pentium Pro Family Developers Manual, Volume 2: Programmer's Reference Manual, Intel Corporation, 1996, and Pentium Pro Family Developers Manual, Volume 3: Operating System Writer's Manual, Intel Corporation, 1996. The on-line versions are available at http://www.x86.org/intel.doc/intelDocs.html.

1 Pentium/Pentium Pro Processor

1.1 Modes

The Pentium and Pentium Pro processors have three operating modes:

- Real-address mode. This mode lets the processor address "real" memory addresses. It can address up to 1 Mbyte of memory (20 bits of address). It can also be called "unprotected" mode, since operating system (such as DOS) code runs in the same mode as the user applications. Pentium and Pentium Pro processors have this mode to be compatible with early Intel processors such as the 8086. The processor is set to this mode following a power-up or a reset and can be switched to protected mode using a single instruction.
- Protected mode. This is the preferred mode for a modern operating system. It allows applications to use virtual memory addressing and supports multiple programming environments and protections.
- System management mode. This mode is designed for fast state snapshot and resumption. It is useful for power management.

There is also a virtual-8086 mode that allows the processor to execute 8086 software in the protected, multi-tasking environment.

1.2 Register Set

There are three types of registers: general-purpose data registers, segment registers, and status and control registers. The following figure shows these registers:

[Figure: register set]

General-purpose Registers

The eight 32-bit general-purpose data registers are used to hold operands for logical and arithmetic operations, operands for address calculations, and memory pointers. The following shows what they are used for:

- EAX: Accumulator for operands and results data.
- EBX: Pointer to data in the DS segment.
- ECX: Counter for string and loop operations.
- EDX: I/O pointer.
- ESI: Pointer to data in the segment pointed to by the DS register; source pointer for string operations.
- EDI: Pointer to data (or destination) in the segment pointed to by the ES register; destination pointer for string operations.
- ESP: Stack pointer (in the SS segment).
- EBP: Pointer to data on the stack (in the SS segment).

The following figure shows that the lower 16 bits of the general-purpose registers can be used with the names AX, BX, CX, DX, BP, SP, SI, and DI (the names for the corresponding 32-bit ones have a prefix "E" for "extended"). Each of the lower two bytes of the EAX, EBX, ECX, and EDX registers can be referenced by the names AH, BH, CH, and DH (high bytes) and AL, BL, CL, and DL (low bytes).

[Figure: 16-bit and 8-bit names of the general-purpose registers]

Segment Registers

There are six segment registers that hold 16-bit segment selectors. A segment selector is a special pointer that identifies a segment in memory. The six segment registers are:

- CS: code segment register
- SS: stack segment register
- DS, ES, FS, GS: data segment registers

Four data segment registers provide programs with flexible and efficient ways to access data.

Modern operating systems and applications use the flat (unsegmented) memory model: all the segment registers are loaded with the same segment selector, so that all memory references a program makes are to a single linear-address space.

When writing application code, you generally create segment selectors with assembler directives and symbols. The assembler and/or linker then creates the actual segment selectors associated with these directives and symbols. If you are writing system code, you may need to create segment selectors directly. (A detailed description of the segment-selector data structure is given in Chapter 3, Protected-Mode Memory Management, of the Pentium Pro Family Developer's Manual, Volume 3.)

Project 1 uses the real-address mode and needs to set up the segment registers properly.

EFLAGS Register

The 32-bit EFLAGS register contains a group of status flags, a control flag, and a group of system flags. The following shows the function of the EFLAGS register bits:

Function                          Bit(s)      Type
ID Flag (ID)                      21          system
Virtual Interrupt Pending (VIP)   20          system
Virtual Interrupt Flag (VIF)      19          system
Alignment Check (AC)              18          system
Virtual 8086 Mode (VM)            17          system
Resume Flag (RF)                  16          system
Nested Task (NT)                  14          system
I/O Privilege Level (IOPL)        13 to 12    system
Overflow Flag (OF)                11          status
Direction Flag (DF)               10          control
Interrupt Enable Flag (IF)        9           system
Trap Flag (TF)                    8           system
Sign Flag (SF)                    7           status
Zero Flag (ZF)                    6           status
Auxiliary Carry Flag (AF)         4           status
Parity Flag (PF)                  2           status
Carry Flag (CF)                   0           status

Bits 1, 3, 5, 15, and 22 through 31 of this register are reserved. To understand what these fields mean and how to use them, please see Sections 3.6.3 and 3.6.4 in Pentium Pro Family Developers Manual, Volume 2: Programmer's Reference Manual.

EIP Register (Instruction Pointer)

The EIP register (or instruction pointer) can also be called the "program counter." It contains the offset in the current code segment of the next instruction to be executed. It is advanced from one instruction boundary to the next in straight-line code, or it is moved ahead or backwards by a number of instructions when executing JMP, Jcc, CALL, RET, and IRET instructions. The EIP cannot be accessed directly by software; it is controlled implicitly by control-transfer instructions (such as JMP, Jcc, CALL, and RET), interrupts, and exceptions. The EIP register can be loaded indirectly by modifying the value of a return instruction pointer on the procedure stack and executing a return instruction (RET or IRET).

Note that the value of the EIP may not match the current instruction because of instruction prefetching. The only way to read the EIP is to execute a CALL instruction and then read the value of the return instruction pointer from the procedure stack.

The x86 processors also have control registers that are not used in project 1, and they are thus omitted from this document.

1.3 Addressing

Bit and Byte Order

Pentium and Pentium Pro processors use "little endian" as their byte order. This means that the bytes of a word are numbered starting from the least significant byte and that the least significant bit of a word is in the least significant byte.

Data Types

The Pentium/Pentium Pro provides four data types: a byte (8 bits), a word (16 bits), a doubleword (32 bits), and a quadword (64 bits). Note that a doubleword is equivalent to "long" in the GNU assembler.

Memory Addressing

One can use either the flat memory model or the segmented memory model. With the flat memory model, memory appears to a program as a single, continuous address space, called a linear address space. Code (a program's instructions), data, and the procedure stack are all contained in this address space. The linear address space is byte addressable, with addresses running contiguously from 0 to 2^32 - 1.

With the segmented memory model, memory appears to a program as a group of independent address spaces called segments. When using this model, code, data, and stacks are typically contained in separate segments. To address a byte in a segment, a program must issue a logical address, which consists of a segment selector and an offset. (A logical address is often referred to as a far pointer.) The segment selector identifies the segment to be accessed and the offset identifies a byte in the address space of the segment.
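In the real-address mode that Project 1 uses, the translation from a logical address to a linear address follows the standard 8086 rule, which the guide does not spell out: the 16-bit segment value is multiplied by 16 and the 16-bit offset is added. The sketch below illustrates that rule with the constants from the precode (BOOT_SEGMENT 0x07c0, DISPLAY_SEGMENT 0xb800, and the kernel at 0x0000:0x1000); the function name real_mode_linear is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Real-address mode: linear address = segment * 16 + offset.
 * Hypothetical helper for illustration only. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

With this rule, 0x07c0:0x0000 and 0x0000:0x7c00 name the same byte, which is why the boot block can use 0x07c0 as its data segment even though it is loaded at linear address 0x7c00.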
The programs running on a Pentium Pro processor can address up to 16,383 segments of different sizes and types. Internally, all the segments that are defined for a system are mapped into the processor's linear address space. So, the processor translates each logical address into a linear address to access a memory location. This translation is transparent to the application program.

1.4 Processor Reset

A cold boot or a warm boot can reset the CPU. A cold boot means powering up the system, whereas a warm boot occurs when the three keys CTRL-ALT-DEL are pressed together: the keyboard BIOS sets a special flag and resets the CPU.

Upon reset, the processor sets itself to real mode with interrupts disabled and key registers set to a known state. For example, the state of the EFLAGS register is 00000002H, and the memory is unchanged. Thus, the memory will contain garbage upon a cold boot. The CPU then jumps to the BIOS (Basic Input/Output System), which loads the bootstrap loader program from the diskette drive or the hard disk and begins executing it. The BIOS loads the bootstrap loader at the fixed address 0:7C00 and jumps to the starting address.

2 Assembly Programming

It often takes a while to master the techniques of programming in assembly language for a particular machine. On the other hand, it should not take much time to learn assembly programming for Pentium or Pentium Pro processors if you are familiar with another processor. This section assumes that you are already familiar with GNU assembly syntax (learned from the course Introduction to Programming Systems or its equivalent).

The simplest way to learn assembly programming is to compile a simple C program into its assembly source code as a template. For example,

    gcc -S -c foo.c

will compile foo.c into its assembly source foo.s. The source code will tell you common opcodes, directives, and addressing syntax.

The goal of this section is to answer some frequently encountered questions and provide pointers to related documents.

2.1 Memory operands

Pentium and Pentium Pro processors use a segmented memory architecture. This means that memory locations are referenced by means of a segment selector and an offset:

- The segment selector specifies the segment containing the operand, and
- The offset (the number of bytes from the beginning of the segment to the first byte of the operand) specifies the linear or effective address of the operand.

The segment selector can be specified either implicitly or explicitly. The most common method of specifying a segment selector is to load it in a segment register and then allow the processor to select the register implicitly, depending on the type of operation being performed. The processor automatically chooses a segment according to the following rules:

- Code segment register CS for instruction fetches
- Stack segment register SS for stack pushes and pops as well as references using ESP or EBP as a base register
- Data segment register DS for all data references except when relative to the stack or a string destination
- Data segment register ES for the destinations of string instructions

The offset part of the memory address can be specified either directly as a static value (called a displacement) or through an address computation made up of one or more of the following components:

- Displacement: An 8-, 16-, or 32-bit value.
- Base: The value in a general-purpose register.
- Index: The value in a general-purpose register except EBP.
- Scale factor: A value of 2, 4, or 8 that is multiplied by the index value.

An effective address is computed by:

    Offset = Base + (Index * Scale) + Displacement

The offset which results from adding these components is called an effective address of the selected segment. Each of these components can have either a positive or negative (2's complement) value, with the exception of the scaling factor.
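The effective-address formula above can be checked with a few lines of C. This is only a sketch of the arithmetic, not of how the processor implements it; the function name effective_address is made up for illustration, and a scale of 1 is used here to mean "no scaling".

```c
#include <assert.h>
#include <stdint.h>

/* Offset = Base + (Index * Scale) + Displacement, computed modulo 2^32.
 * The displacement is signed; converting it to uint32_t and adding gives
 * the usual two's-complement wrap-around result. */
static uint32_t effective_address(uint32_t base, uint32_t index,
                                  uint32_t scale, int32_t displacement)
{
    return base + index * scale + (uint32_t)displacement;
}
```

For example, with base 0x1000, index 3, scale 4, and displacement 8 the offset is 0x1000 + 12 + 8 = 0x1014, which in AT&T syntax corresponds to an operand of the form 8(%ebx,%esi,4) when EBX holds 0x1000 and ESI holds 3.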
2.2 Instruction Syntax

There are two conventions for instruction syntax and representation: Intel and AT&T. Most documents, including those at http://www.x86.org, use the Intel convention, whereas the GNU assembler uses the AT&T convention. The main differences are:

Immediate operands
    Intel: Undelimited, e.g.: push 4, mov ebx, d00ah
    AT&T:  Preceded by "$", e.g.: push $4, movl $0xd00a, %eax

Register operands
    Intel: Undelimited, e.g.: eax
    AT&T:  Preceded by "%", e.g.: %eax

Argument order (e.g. adds the address of C variable "foo" to register EAX)
    Intel: Dest, source [, source2], e.g.: add eax, _foo
    AT&T:  Source, [source,] dest, e.g.: addl $_foo, %eax

Single-size operands
    Intel: Implicit with register name, byte ptr, word ptr, or dword ptr, e.g.: mov al, foo
    AT&T:  opcode{b,w,l}, e.g.: movb foo, %al

Address a C variable "foo"
    Intel: [_foo]
    AT&T:  _foo

Address memory pointed to by a register (e.g. EAX)
    Intel: [eax]
    AT&T:  (%eax)

Address a variable offset by a value in the register
    Intel: [eax + _foo]
    AT&T:  _foo(%eax)

Address a value in an array "foo" of 32-bit integers
    Intel: [eax*4+foo]
    AT&T:  _foo(,%eax,4)

Equivalent to C code *(p+1), if EAX holds the value of p
    Intel: [eax+1]
    AT&T:  1(%eax)

In addition, with the AT&T syntax, the name for a long JUMP is ljmp and for a long CALL is lcall.

Section 6-6 of Pentium Pro Family Developers Manual, Volume 2: Programmer's Reference Manual has a complete list of the Pentium Pro instructions. Section 11 provides the detailed description of each instruction. The instruction names use the Intel convention, and you need to convert them to the AT&T syntax.

2.3 Assembler Directives

The GNU assembler directives are machine independent, so your knowledge about assembly programming applies. All directive names begin with a period "." and the rest are letters in lower case. Here are some examples of commonly used directives:

    .ascii "string"                 defines an ASCII string "string"
    .byte 10, 13, 0                 defines three bytes
    .word 0x0456, 0x1234            defines two words
    .long 0x001234, 0x12345         defines two long words
    .equ STACK_SEGMENT, 0x9000      sets symbol STACK_SEGMENT to the value 0x9000
    .globl symbol                   makes "symbol" global (useful for defining global labels and procedure names)
    .code16                         tells the assembler to insert the appropriate override prefixes so the code will run in real mode

When using directives to define a string, bytes, or a word, you often want to make sure that they are aligned to a 32-bit long word by padding with additional bytes.

2.4 Inline Assembly

The most basic way to insert inline assembly into the assembly code generated by the gcc compiler is to use

    asm( "assembly-instruction" );

where assembly-instruction will be inlined at the position of the asm statement. This is a very convenient way to inline assembly instructions that require no registers. For example, you can use

    asm( "cli" );

to disable interrupts and

    asm( "sti" );

to enable interrupts.

The general format for inline assembly code in C is:

    asm( "statements": output_regs: input_regs: used_regs);

where statements are the assembly instructions. If there is more than one instruction, you can use "\n\t" to separate them to make them look pretty.

"input_regs" tells the gcc compiler which C variables to move to which registers. For example, if you would like to load variable "foo" into register EAX and "bar" into register ECX, you would say

    : "a" (foo), "c" (bar)

gcc uses single letters to represent all registers:

    Single letter   Register
    a               eax
    b               ebx
    c               ecx
    d               edx
    S               esi
    D               edi
    I               constant value (0 to 31)
    q               allocate a register from EAX, EBX, ECX, EDX
    r               allocate a register from EAX, EBX, ECX, EDX, ESI, EDI

Note that you cannot specify register AH or AL this way. You need to get to EAX first and then go from there.

"output_regs" provides output registers. A convenient way to do this is to let the gcc compiler pick the registers for you. You need to say "=q" or "=r" to let the gcc compiler pick registers for you. You can refer to the first allocated register with "%0", the second with "%1", and so on, in the assembly instructions. If you refer to the registers in the input register list, you simply say "0" or "1" without the "%" prefix.

"used_regs" lists the registers that are used (or clobbered) in the assembly code.

To understand exactly how to do this, please try to use gcc to compile a piece of C code containing the following inline assembly:

    asm ("leal (%1,%1,4), %0"
         : "=r" (x_times_5)
         : "r" (x) );

and

    asm ("leal (%0,%0,4), %0"
         : "=r" (x)
         : "0" (x) );

Also, to prevent the gcc compiler's optimizer from removing the assembly code, you can add the keyword volatile to your inline assembly. Here are some macro code examples:

    #define disable() __asm__ __volatile__ ("cli");
    #define enable() __asm__ __volatile__ ("sti");

to disable and enable interrupts. For more information, you can read another guide that covers inline assembly in more detail.

assignment_info/createimage.html

NAME

createimage - create operating system image suitable for placement on a boot disk.

SYNOPSIS

createimage [--extended] [--vm] <bootblock> <file> ...

DESCRIPTION

This manual page documents the createimage program used to produce an image suitable for placement on a boot disk. When run, the result is placed in a file called image residing in the directory the command was invoked from. The image can be placed on a boot disk (e.g. /dev/fd0) by issuing

    cat image > /dev/fd0

on the shell command-line, assuming that you have supplied the bootblock code as one of the executable components.

createimage parses each of the given executable files according to the ELF specification. Thus, the executable files must be compliant with the ELF standard of position independent linking.

The format of the image is fairly simple.
The first 512 bytes of the file contains the code for the bootblock. The memory image of the entire OS follows the bootblock. The size of the OS (in sectors) is stored as a "short int" (2 bytes) at location 2 (counting from 0) in the image. OPTIONS --extended Print extended information useful for debugging the operating system image, process placement in memory and so on. Provides detailed information on the size and position of each executable component in the image file. --vm  This option tells createimage to produce an image that can be easily tailored to a virtual memory operating system. A structure, called directory , is placed at the end of the image file, describing where in physical  memory the processes should be placed. SEE ALSO objdump(1), gcc(1), ld(1), gas(1) Check the ELF Documentation available at the course homepage for details on parsing of the executable components. BUGS The --extended switch must be placed before the --vm switch __MACOSX/assignment_info/._createimage.html assignment_info/.DS_Store __MACOSX/assignment_info/._.DS_Store assignment_info/elfdoc.pdf I Executable and Linkable Format (ELF) Contents Preface 1 OBJECT FILESIntroduction 1-1 ELF Header 1-3 Sections 1-8 String Table 1-16 Symbol Table 1-17 Relocation 1-21 2 PROGRAM LOADING AND DYNAMIC LINKINGIntroduction 2-1 Program Header 2-2 Program Loading 2-7 Dynamic Linking 2-10 3 C LIBRARYC Library 3-1 I IndexIndex I-1 Tool Interface Standards (TIS) Portable Formats Specification, Version 1.1 i ELF: Executable and Linkable Format ii Portable Formats Specification, Version 1.1 Tool Interface Standards (TIS) Figures and Tables Figure 1-1: Object File Format 1-1 Figure 1-2: 32-Bit Data Types 1-2 Figure 1-3: ELF Header 1-3 Figure 1-4: e _ i d e n t [ ] Identification Indexes 1-5 Figure 1-5: Data Encoding E L F D A T A 2 L S B 1-6 Figure 1-6: Data Encoding E L F D A T A 2 M S B 1-6 Figure 1-7: 32-bit Intel Architecture Identification, e _ i d e n t 1-7 Figure 1-8: Special Section Indexes 1-8 
Figure 1-9: Section Header 1-9 Figure 1-10: Section Types, s h _ t y p e 1-10 Figure 1-11: Section Header Table Entry: Index 0 1-11 Figure 1-12: Section Attribute Flags, s h _ f l a g s 1-12 Figure 1-13: s h _ l i n k and s h _ i n f o Interpretation 1-13 Figure 1-14: Special Sections 1-13 Figure 1-15: String Table Indexes 1-16 Figure 1-16: Symbol Table Entry 1-17 Figure 1-17: Symbol Binding, E L F 3 2 _ S T _ B I N D 1-18 Figure 1-18: Symbol Types, E L F 3 2 _ S T _ T Y P E 1-19 Figure 1-19: Symbol Table Entry: Index 0 1-20 Figure 1-20: Relocation Entries 1-21 Figure 1-21: Relocatable Fields 1-22 Figure 1-22: Relocation Types 1-23 Figure 2-1: Program Header 2-2 Figure 2-2: Segment Types, p _ t y p e 2-3 Figure 2-3: Note Information 2-4 Figure 2-4: Example Note Segment 2-5 Figure 2-5: Executable File 2-7 Figure 2-6: Program Header Segments 2-7 Figure 2-7: Process Image Segments 2-8 Figure 2-8: Example Shared Object Segment Addresses 2-9 Figure 2-9: Dynamic Structure 2-12 Figure 2-10: Dynamic Array Tags, d _ t a g 2-12 Figure 2-11: Global Offset Table 2-17 Figure 2-12: Absolute Procedure Linkage Table 2-17 Figure 2-13: Position-Independent Procedure Linkage Table 2-18 Figure 2-14: Symbol Hash Table 2-19 Figure 2-15: Hashing Function 2-20 Figure 3-1: l i b c Contents, Names without Synonyms 3-1 Figure 3-2: l i b c Contents, Names with Synonyms 3-1 Figure 3-3: l i b c Contents, Global External Data Symbols 3-2 Tool Interface Standards (TIS) Portable Formats Specification, Version 1.1 iii Preface ELF: Executable and Linking Format The Executable and Linking Format was originally developed and published by UNIX System Labora- tories (USL) as part of the Application Binary Interface (ABI). The Tool Interface Standards committee (TIS) has selected the evolving ELF standard as a portable object file format that works on 32-bit Intel Architecture environments for a variety of operating systems. 
The ELF standard is intended to streamline software development by providing developers with a set of binary interface definitions that extend across multiple operating environments. This should reduce the number of different interface implementations, thereby reducing the need for recoding and recompiling code.

About This Document

This document is intended for developers who are creating object or executable files on various 32-bit environment operating systems. It is divided into the following three parts:

    Part 1, "Object Files" describes the ELF object file format for the three main types of object files.

    Part 2, "Program Loading and Dynamic Linking" describes the object file information and system actions that create running programs.

    Part 3, "C Library" lists the symbols contained in libsys, the standard ANSI C and libc routines, and the global data symbols required by the libc routines.

NOTE: References to X86 architecture have been changed to Intel Architecture.

1 OBJECT FILES

    Introduction
        File Format
        Data Representation
    ELF Header
        ELF Identification
        Machine Information
    Sections
        Special Sections
    String Table
    Symbol Table
        Symbol Values
    Relocation
        Relocation Types

Introduction

Part 1 describes the iABI object file format, called ELF (Executable and Linking Format). There are three main types of object files.

A relocatable file holds code and data suitable for linking with other object files to create an executable or a shared object file.

An executable file holds a program suitable for execution; the file specifies how exec(BA_OS) creates a program’s process image.

A shared object file holds code and data suitable for linking in two contexts.
First, the link editor [see ld(SD_CMD)] may process it with other relocatable and shared object files to create another object file. Second, the dynamic linker combines it with an executable file and other shared objects to create a process image.

Created by the assembler and link editor, object files are binary representations of programs intended to execute directly on a processor. Programs that require other abstract machines, such as shell scripts, are excluded.

After the introductory material, Part 1 focuses on the file format and how it pertains to building programs. Part 2 also describes parts of the object file, concentrating on the information necessary to execute a program.

File Format

Object files participate in program linking (building a program) and program execution (running a program). For convenience and efficiency, the object file format provides parallel views of a file’s contents, reflecting the differing needs of these activities. Figure 1-1 shows an object file’s organization.

Figure 1-1: Object File Format

    Linking View                        Execution View
    ELF header                          ELF header
    Program header table (optional)     Program header table
    Section 1                           Segment 1
    . . .
    Section n                           Segment 2
    . . .                               . . .
    Section header table                Section header table (optional)

An ELF header resides at the beginning and holds a "road map" describing the file’s organization.
Sections hold the bulk of object file information for the linking view: instructions, data, symbol table, relocation information, and so on. Descriptions of special sections appear later in Part 1. Part 2 discusses segments and the program execution view of the file.

A program header table, if present, tells the system how to create a process image. Files used to build a process image (execute a program) must have a program header table; relocatable files do not need one. A section header table contains information describing the file’s sections. Every section has an entry in the table; each entry gives information such as the section name, the section size, etc. Files used during linking must have a section header table; other object files may or may not have one.

NOTE: Although the figure shows the program header table immediately after the ELF header, and the section header table following the sections, actual files may differ. Moreover, sections and segments have no specified order. Only the ELF header has a fixed position in the file.

Data Representation

As described here, the object file format supports various processors with 8-bit bytes and 32-bit architectures. Nevertheless, it is intended to be extensible to larger (or smaller) architectures. Object files therefore represent some control data with a machine-independent format, making it possible to identify object files and interpret their contents in a common way. Remaining data in an object file use the encoding of the target processor, regardless of the machine on which the file was created.
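As a rough illustration of reading target-encoded data without relying on the byte order of the machine running the tool, the sketch below decodes 2-byte and 4-byte unsigned values from a byte buffer. It assumes the least-significant-byte-first encoding (ELFDATA2LSB) that this specification later requires for the 32-bit Intel Architecture. The helper names read_half_lsb and read_word_lsb are hypothetical and do not come from any ELF header file.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decode a 2-byte unsigned value whose least
 * significant byte is stored at the lowest address. */
uint16_t read_half_lsb(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: decode a 4-byte unsigned value stored the
 * same way. Each byte is widened before shifting so that no bits
 * are lost on hosts with a 16-bit int. */
uint32_t read_word_lsb(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Decoding byte by byte like this gives the same result on any host, whereas copying the bytes straight into an integer only works when the host happens to share the target's byte order.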
Figure 1-2: 32-Bit Data Types

    Name            Size   Alignment   Purpose
    Elf32_Addr      4      4           Unsigned program address
    Elf32_Half      2      2           Unsigned medium integer
    Elf32_Off       4      4           Unsigned file offset
    Elf32_Sword     4      4           Signed large integer
    Elf32_Word      4      4           Unsigned large integer
    unsigned char   1      1           Unsigned small integer

All data structures that the object file format defines follow the "natural" size and alignment guidelines for the relevant class. If necessary, data structures contain explicit padding to ensure 4-byte alignment for 4-byte objects, to force structure sizes to a multiple of 4, etc. Data also have suitable alignment from the beginning of the file. Thus, for example, a structure containing an Elf32_Addr member will be aligned on a 4-byte boundary within the file.

For portability reasons, ELF uses no bit-fields.

ELF Header

Some object file control structures can grow, because the ELF header contains their actual sizes. If the object file format changes, a program may encounter control structures that are larger or smaller than expected. Programs might therefore ignore "extra" information. The treatment of "missing" information depends on context and will be specified when and if extensions are defined.
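One way to honor the "structures can grow" rule is to step through a table using the entry size recorded in the file, copying only the leading bytes the program understands. The following is a minimal sketch of that idea, not part of the specification. The function name read_table_entry and its parameters are hypothetical, and refusing entries that are smaller than expected is only one possible treatment of "missing" information.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy entry `index` of a table whose per-entry
 * size (`entsize`) is read from the file rather than assumed.
 * Returns 0 on success, -1 if the entry lies outside the table or is
 * smaller than the structure the caller understands. */
int read_table_entry(const unsigned char *table, size_t table_size,
                     size_t entsize, size_t index,
                     void *out, size_t out_size)
{
    if (entsize < out_size)
        return -1;   /* "missing" information: this sketch refuses it */
    if (entsize == 0 || index >= table_size / entsize)
        return -1;   /* entry lies outside the table */

    /* Step by the recorded size, copy only the part we understand,
     * and ignore any "extra" trailing bytes. */
    memcpy(out, table + index * entsize, out_size);
    return 0;
}
```

A caller would pass e_phentsize or e_shentsize from the ELF header as entsize, and the size of its own structure as out_size.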
Figure 1-3: ELF Header

    #define EI_NIDENT 16

    typedef struct {
        unsigned char e_ident[EI_NIDENT];
        Elf32_Half    e_type;
        Elf32_Half    e_machine;
        Elf32_Word    e_version;
        Elf32_Addr    e_entry;
        Elf32_Off     e_phoff;
        Elf32_Off     e_shoff;
        Elf32_Word    e_flags;
        Elf32_Half    e_ehsize;
        Elf32_Half    e_phentsize;
        Elf32_Half    e_phnum;
        Elf32_Half    e_shentsize;
        Elf32_Half    e_shnum;
        Elf32_Half    e_shstrndx;
    } Elf32_Ehdr;

e_ident
The initial bytes mark the file as an object file and provide machine-independent data with which to decode and interpret the file’s contents. Complete descriptions appear below, in "ELF Identification."

e_type
This member identifies the object file type.

    Name        Value    Meaning
    ET_NONE     0        No file type
    ET_REL      1        Relocatable file
    ET_EXEC     2        Executable file
    ET_DYN      3        Shared object file
    ET_CORE     4        Core file
    ET_LOPROC   0xff00   Processor-specific
    ET_HIPROC   0xffff   Processor-specific

Although the core file contents are unspecified, type ET_CORE is reserved to mark the file. Values from ET_LOPROC through ET_HIPROC (inclusive) are reserved for processor-specific semantics. Other values are reserved and will be assigned to new object file types as necessary.

e_machine
This member’s value specifies the required architecture for an individual file.
    Name       Value   Meaning
    EM_NONE    0       No machine
    EM_M32     1       AT&T WE 32100
    EM_SPARC   2       SPARC
    EM_386     3       Intel 80386
    EM_68K     4       Motorola 68000
    EM_88K     5       Motorola 88000
    EM_860     7       Intel 80860
    EM_MIPS    8       MIPS RS3000

Other values are reserved and will be assigned to new machines as necessary. Processor-specific ELF names use the machine name to distinguish them. For example, the flags mentioned below use the prefix EF_; a flag named WIDGET for the EM_XYZ machine would be called EF_XYZ_WIDGET.

e_version
This member identifies the object file version.

    Name         Value   Meaning
    EV_NONE      0       Invalid version
    EV_CURRENT   1       Current version

The value 1 signifies the original file format; extensions will create new versions with higher numbers. The value of EV_CURRENT, though given as 1 above, will change as necessary to reflect the current version number.

e_entry
This member gives the virtual address to which the system first transfers control, thus starting the process. If the file has no associated entry point, this member holds zero.

e_phoff
This member holds the program header table’s file offset in bytes. If the file has no program header table, this member holds zero.

e_shoff
This member holds the section header table’s file offset in bytes. If the file has no section header table, this member holds zero.

e_flags
This member holds processor-specific flags associated with the file. Flag names take the form EF_machine_flag. See "Machine Information" for flag definitions.

e_ehsize
This member holds the ELF header’s size in bytes.

e_phentsize
This member holds the size in bytes of one entry in the file’s program header table; all entries are the same size.

e_phnum
This member holds the number of entries in the program header table.
Thus the product of e_phentsize and e_phnum gives the table’s size in bytes. If a file has no program header table, e_phnum holds the value zero.

e_shentsize
This member holds a section header’s size in bytes. A section header is one entry in the section header table; all entries are the same size.

e_shnum
This member holds the number of entries in the section header table. Thus the product of e_shentsize and e_shnum gives the section header table’s size in bytes. If a file has no section header table, e_shnum holds the value zero.

e_shstrndx
This member holds the section header table index of the entry associated with the section name string table. If the file has no section name string table, this member holds the value SHN_UNDEF. See "Sections" and "String Table" below for more information.

ELF Identification

As mentioned above, ELF provides an object file framework to support multiple processors, multiple data encodings, and multiple classes of machines. To support this object file family, the initial bytes of the file specify how to interpret the file, independent of the processor on which the inquiry is made and independent of the file’s remaining contents.

The initial bytes of an ELF header (and an object file) correspond to the e_ident member.

Figure 1-4: e_ident[ ] Identification Indexes

    Name         Value   Purpose
    EI_MAG0      0       File identification
    EI_MAG1      1       File identification
    EI_MAG2      2       File identification
    EI_MAG3      3       File identification
    EI_CLASS     4       File class
    EI_DATA      5       Data encoding
    EI_VERSION   6       File version
    EI_PAD       7       Start of padding bytes
    EI_NIDENT    16      Size of e_ident[]

These indexes access bytes that hold the following values.
EI_MAG0 to EI_MAG3
A file’s first 4 bytes hold a "magic number," identifying the file as an ELF object file.

    Name      Value   Position
    ELFMAG0   0x7f    e_ident[EI_MAG0]
    ELFMAG1   'E'     e_ident[EI_MAG1]
    ELFMAG2   'L'     e_ident[EI_MAG2]
    ELFMAG3   'F'     e_ident[EI_MAG3]

EI_CLASS
The next byte, e_ident[EI_CLASS], identifies the file’s class, or capacity.

    Name           Value   Meaning
    ELFCLASSNONE   0       Invalid class
    ELFCLASS32     1       32-bit objects
    ELFCLASS64     2       64-bit objects

The file format is designed to be portable among machines of various sizes, without imposing the sizes of the largest machine on the smallest. Class ELFCLASS32 supports machines with files and virtual address spaces up to 4 gigabytes; it uses the basic types defined above.

Class ELFCLASS64 is reserved for 64-bit architectures. Its appearance here shows how the object file may change, but the 64-bit format is otherwise unspecified. Other classes will be defined as necessary, with different basic types and sizes for object file data.

EI_DATA
Byte e_ident[EI_DATA] specifies the data encoding of the processor-specific data in the object file. The following encodings are currently defined.

    Name          Value   Meaning
    ELFDATANONE   0       Invalid data encoding
    ELFDATA2LSB   1       See below
    ELFDATA2MSB   2       See below

More information on these encodings appears below. Other values are reserved and will be assigned to new encodings as necessary.

EI_VERSION
Byte e_ident[EI_VERSION] specifies the ELF header version number. Currently, this value must be EV_CURRENT, as explained above for e_version.
EI_PAD
This value marks the beginning of the unused bytes in e_ident. These bytes are reserved and set to zero; programs that read object files should ignore them. The value of EI_PAD will change in the future if currently unused bytes are given meanings.

A file’s data encoding specifies how to interpret the basic objects in a file. As described above, class ELFCLASS32 files use objects that occupy 1, 2, and 4 bytes. Under the defined encodings, objects are represented as shown below. Each column gives the byte stored at that byte number.

Encoding ELFDATA2LSB specifies 2’s complement values, with the least significant byte occupying the lowest address.

Figure 1-5: Data Encoding ELFDATA2LSB

    Value        Byte 0   Byte 1   Byte 2   Byte 3
    0x01         01
    0x0102       02       01
    0x01020304   04       03       02       01

Encoding ELFDATA2MSB specifies 2’s complement values, with the most significant byte occupying the lowest address.

Figure 1-6: Data Encoding ELFDATA2MSB

    Value        Byte 0   Byte 1   Byte 2   Byte 3
    0x01         01
    0x0102       01       02
    0x01020304   01       02       03       04

Machine Information

For file identification in e_ident, the 32-bit Intel Architecture requires the following values.

Figure 1-7: 32-bit Intel Architecture Identification, e_ident

    Position            Value
    e_ident[EI_CLASS]   ELFCLASS32
    e_ident[EI_DATA]    ELFDATA2LSB

Processor identification resides in the ELF header’s e_machine member and must have the value EM_386.

The ELF header’s e_flags member holds bit flags associated with the file. The 32-bit Intel Architecture defines no flags; so this member contains zero.

Sections

An object file’s section header table lets one locate all the file’s sections. The section header table is an array of Elf32_Shdr structures as described below. A section header table index is a subscript into this array.
The ELF header’s e_shoff member gives the byte offset from the beginning of the file to the section header table; e_shnum tells how many entries the section header table contains; e_shentsize gives the size in bytes of each entry.

Some section header table indexes are reserved; an object file will not have sections for these special indexes.

Figure 1-8: Special Section Indexes

    Name            Value
    SHN_UNDEF       0
    SHN_LORESERVE   0xff00
    SHN_LOPROC      0xff00
    SHN_HIPROC      0xff1f
    SHN_ABS         0xfff1
    SHN_COMMON      0xfff2
    SHN_HIRESERVE   0xffff

SHN_UNDEF
This value marks an undefined, missing, irrelevant, or otherwise meaningless section reference. For example, a symbol "defined" relative to section number SHN_UNDEF is an undefined symbol.

NOTE: Although index 0 is reserved as the undefined value, the section header table contains an entry for index 0. That is, if the e_shnum member of the ELF header says a file has 6 entries in the section header table, they have the indexes 0 through 5. The contents of the initial entry are specified later in this section.

SHN_LORESERVE
This value specifies the lower bound of the range of reserved indexes.

SHN_LOPROC through SHN_HIPROC
Values in this inclusive range are reserved for processor-specific semantics.

SHN_ABS
This value specifies absolute values for the corresponding reference. For example, symbols defined relative to section number SHN_ABS have absolute values and are not affected by relocation.

SHN_COMMON
Symbols defined relative to this section are common symbols, such as FORTRAN COMMON or unallocated C external variables.

SHN_HIRESERVE
This value specifies the upper bound of the range of reserved indexes. The system reserves indexes between SHN_LORESERVE and SHN_HIRESERVE, inclusive; the values do not reference the section header table. That is, the section header table does not contain entries for the reserved indexes.
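The lookup rule described above (entry address = e_shoff + index * e_shentsize, with reserved indexes excluded) can be sketched as follows. This is an illustrative sketch only. It assumes the Linux <elf.h> header that the assignment precode already includes, and it assumes the tool runs on a host whose byte order matches the file (true for ELFDATA2LSB files on x86). The function name get_section_header is hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy section header `index` out of an ELF file
 * held in memory. Returns 0 on success, -1 on any failure. */
int get_section_header(const unsigned char *file, size_t file_size,
                       unsigned int index, Elf32_Shdr *out)
{
    Elf32_Ehdr eh;
    unsigned long long offset;

    if (file_size < sizeof eh)
        return -1;
    memcpy(&eh, file, sizeof eh);        /* avoids unaligned access */

    if (eh.e_shoff == 0 || eh.e_shentsize < sizeof *out)
        return -1;                       /* no table, or entries too small */
    if (index >= SHN_LORESERVE || index >= eh.e_shnum)
        return -1;                       /* reserved or out-of-range index */

    /* Entry address = e_shoff + index * e_shentsize. */
    offset = (unsigned long long)eh.e_shoff
           + (unsigned long long)index * eh.e_shentsize;
    if (offset > file_size || file_size - offset < sizeof *out)
        return -1;                       /* table runs past end of file */

    memcpy(out, file + (size_t)offset, sizeof *out);
    return 0;
}
```

Index 0 is accepted here because the table does contain an entry for it, while SHN_ABS, SHN_COMMON and the other reserved values are rejected because no table entry exists for them.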
Sections contain all information in an object file, except the ELF header, the program header table, and the section header table. Moreover, object files’ sections satisfy several conditions.

    Every section in an object file has exactly one section header describing it. Section headers may exist that do not have a section.

    Each section occupies one contiguous (possibly empty) sequence of bytes within a file.

    Sections in a file may not overlap. No byte in a file resides in more than one section.

    An object file may have inactive space. The various headers and the sections might not "cover" every byte in an object file. The contents of the inactive data are unspecified.

A section header has the following structure.

Figure 1-9: Section Header

    typedef struct {
        Elf32_Word sh_name;
        Elf32_Word sh_type;
        Elf32_Word sh_flags;
        Elf32_Addr sh_addr;
        Elf32_Off  sh_offset;
        Elf32_Word sh_size;
        Elf32_Word sh_link;
        Elf32_Word sh_info;
        Elf32_Word sh_addralign;
        Elf32_Word sh_entsize;
    } Elf32_Shdr;

sh_name
This member specifies the name of the section. Its value is an index into the section header string table section [see "String Table" below], giving the location of a null-terminated string.

sh_type
This member categorizes the section’s contents and semantics. Section types and their descriptions appear below.

sh_flags
Sections support 1-bit flags that describe miscellaneous attributes. Flag definitions appear below.

sh_addr
If the section will appear in the memory image of a process, this member gives the address at which the section’s first byte should reside. Otherwise, the member contains 0.
sh_offset
This member’s value gives the byte offset from the beginning of the file to the first byte in the section. One section type, SHT_NOBITS described below, occupies no space in the file, and its sh_offset member locates the conceptual placement in the file.

sh_size
This member gives the section’s size in bytes. Unless the section type is SHT_NOBITS, the section occupies sh_size bytes in the file. A section of type SHT_NOBITS may have a non-zero size, but it occupies no space in the file.

sh_link
This member holds a section header table index link, whose interpretation depends on the section type. A table below describes the values.

sh_info
This member holds extra information, whose interpretation depends on the section type. A table below describes the values.

sh_addralign
Some sections have address alignment constraints. For example, if a section holds a doubleword, the system must ensure doubleword alignment for the entire section. That is, the value of sh_addr must be congruent to 0, modulo the value of sh_addralign. Currently, only 0 and positive integral powers of two are allowed. Values 0 and 1 mean the section has no alignment constraints.

sh_entsize
Some sections hold a table of fixed-size entries, such as a symbol table. For such a section, this member gives the size in bytes of each entry. The member contains 0 if the section does not hold a table of fixed-size entries.

A section header’s sh_type member specifies the section’s semantics.
Figure 1-10: Section Types, sh_type

    Name           Value
    SHT_NULL       0
    SHT_PROGBITS   1
    SHT_SYMTAB     2
    SHT_STRTAB     3
    SHT_RELA       4
    SHT_HASH       5
    SHT_DYNAMIC    6
    SHT_NOTE       7
    SHT_NOBITS     8
    SHT_REL        9
    SHT_SHLIB      10
    SHT_DYNSYM     11
    SHT_LOPROC     0x70000000
    SHT_HIPROC     0x7fffffff
    SHT_LOUSER     0x80000000
    SHT_HIUSER     0xffffffff

SHT_NULL
This value marks the section header as inactive; it does not have an associated section. Other members of the section header have undefined values.

SHT_PROGBITS
The section holds information defined by the program, whose format and meaning are determined solely by the program.

SHT_SYMTAB and SHT_DYNSYM
These sections hold a symbol table. Currently, an object file may have only one section of each type, but this restriction may be relaxed in the future. Typically, SHT_SYMTAB provides symbols for link editing, though it may also be used for dynamic linking. As a complete symbol table, it may contain many symbols unnecessary for dynamic linking. Consequently, an object file may also contain a SHT_DYNSYM section, which holds a minimal set of dynamic linking symbols, to save space. See "Symbol Table" below for details.

SHT_STRTAB
The section holds a string table. An object file may have multiple string table sections. See "String Table" below for details.

SHT_RELA
The section holds relocation entries with explicit addends, such as type Elf32_Rela for the 32-bit class of object files. An object file may have multiple relocation sections. See "Relocation" below for details.

SHT_HASH
The section holds a symbol hash table. All objects participating in dynamic linking must contain a symbol hash table. Currently, an object file may have only one hash table, but this restriction may be relaxed in the future. See "Hash Table" in Part 2 for details.
SHT_DYNAMIC
The section holds information for dynamic linking. Currently, an object file may have only one dynamic section, but this restriction may be relaxed in the future. See "Dynamic Section" in Part 2 for details.

SHT_NOTE
The section holds information that marks the file in some way. See "Note Section" in Part 2 for details.

SHT_NOBITS
A section of this type occupies no space in the file but otherwise resembles SHT_PROGBITS. Although this section contains no bytes, the sh_offset member contains the conceptual file offset.

SHT_REL
The section holds relocation entries without explicit addends, such as type Elf32_Rel for the 32-bit class of object files. An object file may have multiple relocation sections. See "Relocation" below for details.

SHT_SHLIB
This section type is reserved but has unspecified semantics. Programs that contain a section of this type do not conform to the ABI.

SHT_LOPROC through SHT_HIPROC
Values in this inclusive range are reserved for processor-specific semantics.

SHT_LOUSER
This value specifies the lower bound of the range of indexes reserved for application programs.

SHT_HIUSER
This value specifies the upper bound of the range of indexes reserved for application programs. Section types between SHT_LOUSER and SHT_HIUSER may be used by the application, without conflicting with current or future system-defined section types.

Other section type values are reserved. As mentioned before, the section header for index 0 (SHN_UNDEF) exists, even though the index marks undefined section references. This entry holds the following.
Figure 1-11: Section Header Table Entry: Index 0

    Name           Value       Note
    sh_name        0           No name
    sh_type        SHT_NULL    Inactive
    sh_flags       0           No flags
    sh_addr        0           No address
    sh_offset      0           No file offset
    sh_size        0           No size
    sh_link        SHN_UNDEF   No link information
    sh_info        0           No auxiliary information
    sh_addralign   0           No alignment
    sh_entsize     0           No entries

A section header’s sh_flags member holds 1-bit flags that describe the section’s attributes. Defined values appear below; other values are reserved.

Figure 1-12: Section Attribute Flags, sh_flags

    Name            Value
    SHF_WRITE       0x1
    SHF_ALLOC       0x2
    SHF_EXECINSTR   0x4
    SHF_MASKPROC    0xf0000000

If a flag bit is set in sh_flags, the attribute is "on" for the section. Otherwise, the attribute is "off" or does not apply. Undefined attributes are set to zero.

SHF_WRITE
The section contains data that should be writable during process execution.

SHF_ALLOC
The section occupies memory during process execution. Some control sections do not reside in the memory image of an object file; this attribute is off for those sections.

SHF_EXECINSTR
The section contains executable machine instructions.

SHF_MASKPROC
All bits included in this mask are reserved for processor-specific semantics.

Two members in the section header, sh_link and sh_info, hold special information, depending on section type.
Figure 1-13: sh_link and sh_info Interpretation

SHT_DYNAMIC
    sh_link: The section header index of the string table used by entries in the section.
    sh_info: 0

SHT_HASH
    sh_link: The section header index of the symbol table to which the hash table applies.
    sh_info: 0

SHT_REL, SHT_RELA
    sh_link: The section header index of the associated symbol table.
    sh_info: The section header index of the section to which the relocation applies.

SHT_SYMTAB, SHT_DYNSYM
    sh_link: The section header index of the associated string table.
    sh_info: One greater than the symbol table index of the last local symbol (binding STB_LOCAL).

other
    sh_link: SHN_UNDEF
    sh_info: 0

Special Sections

Various sections hold program and control information. Sections in the list below are used by the system and have the indicated types and attributes.
Figure 1-14: Special Sections

    Name        Type           Attributes
    .bss        SHT_NOBITS     SHF_ALLOC + SHF_WRITE
    .comment    SHT_PROGBITS   none
    .data       SHT_PROGBITS   SHF_ALLOC + SHF_WRITE
    .data1      SHT_PROGBITS   SHF_ALLOC + SHF_WRITE
    .debug      SHT_PROGBITS   none
    .dynamic    SHT_DYNAMIC    see below
    .dynstr     SHT_STRTAB     SHF_ALLOC
    .dynsym     SHT_DYNSYM     SHF_ALLOC
    .fini       SHT_PROGBITS   SHF_ALLOC + SHF_EXECINSTR
    .got        SHT_PROGBITS   see below
    .hash       SHT_HASH       SHF_ALLOC
    .init       SHT_PROGBITS   SHF_ALLOC + SHF_EXECINSTR
    .interp     SHT_PROGBITS   see below
    .line       SHT_PROGBITS   none
    .note       SHT_NOTE       none
    .plt        SHT_PROGBITS   see below
    .relname    SHT_REL        see below
    .relaname   SHT_RELA       see below
    .rodata     SHT_PROGBITS   SHF_ALLOC
    .rodata1    SHT_PROGBITS   SHF_ALLOC
    .shstrtab   SHT_STRTAB     none
    .strtab     SHT_STRTAB     see below
    .symtab     SHT_SYMTAB     see below
    .text       SHT_PROGBITS   SHF_ALLOC + SHF_EXECINSTR

.bss
This section holds uninitialized data that contribute to the program’s memory image. By definition, the system initializes the data with zeros when the program begins to run. The section occupies no file space, as indicated by the section type, SHT_NOBITS.

.comment
This section holds version control information.

.data and .data1
These sections hold initialized data that contribute to the program’s memory image.

.debug
This section holds information for symbolic debugging. The contents are unspecified.

.dynamic
This section holds dynamic linking information. The section’s attributes will include the SHF_ALLOC bit. Whether the SHF_WRITE bit is set is processor specific. See Part 2 for more information.
.dynstr
This section holds strings needed for dynamic linking, most commonly the strings that represent the names associated with symbol table entries. See Part 2 for more information.

.dynsym
This section holds the dynamic linking symbol table, as "Symbol Table" describes. See Part 2 for more information.

.fini
This section holds executable instructions that contribute to the process termination code. That is, when a program exits normally, the system arranges to execute the code in this section.

.got
This section holds the global offset table. See "Special Sections" in Part 1 and "Global Offset Table" in Part 2 for more information.

.hash
This section holds a symbol hash table. See "Hash Table" in Part 2 for more information.

.init
This section holds executable instructions that contribute to the process initialization code. That is, when a program starts to run, the system arranges to execute the code in this section before calling the main program entry point (called main for C programs).

.interp
This section holds the path name of a program interpreter. If the file has a loadable segment that includes the section, the section’s attributes will include the SHF_ALLOC bit; otherwise, that bit will be off. See Part 2 for more information.

.line
This section holds line number information for symbolic debugging, which describes the correspondence between the source program and the machine code. The contents are unspecified.

.note
This section holds information in the format that "Note Section" in Part 2 describes.

.plt
This section holds the procedure linkage table. See "Special Sections" in Part 1 and "Procedure Linkage Table" in Part 2 for more information.

.relname and .relaname
These sections hold relocation information, as "Relocation" below describes.
    If the file has a loadable segment that includes relocation, the sections’ attributes will include the SHF_ALLOC bit; otherwise, that bit will be off. Conventionally, name is supplied by the section to which the relocations apply. Thus a relocation section for .text normally would have the name .rel.text or .rela.text.
.rodata and .rodata1
    These sections hold read-only data that typically contribute to a non-writable segment in the process image. See “Program Header” in Part 2 for more information.
.shstrtab
    This section holds section names.
.strtab
    This section holds strings, most commonly the strings that represent the names associated with symbol table entries. If the file has a loadable segment that includes the symbol string table, the section’s attributes will include the SHF_ALLOC bit; otherwise, that bit will be off.
.symtab
    This section holds a symbol table, as “Symbol Table” in this section describes. If the file has a loadable segment that includes the symbol table, the section’s attributes will include the SHF_ALLOC bit; otherwise, that bit will be off.
.text
    This section holds the “text,” or executable instructions, of a program.

Section names with a dot (.) prefix are reserved for the system, although applications may use these sections if their existing meanings are satisfactory. Applications may use names without the prefix to avoid conflicts with system sections. The object file format lets one define sections not in the list above. An object file may have more than one section with the same name.

Section names reserved for a processor architecture are formed by placing an abbreviation of the architecture name ahead of the section name. The name should be taken from the architecture names used for e_machine. For instance, .FOO.psect is the psect section defined by the FOO architecture. Existing extensions are called by their historical names.
Pre-existing Extensions

.sdata    .tdesc    .sbss    .lit4    .lit8    .reginfo    .gptab    .liblist    .conflict

String Table

String table sections hold null-terminated character sequences, commonly called strings. The object file uses these strings to represent symbol and section names. One references a string as an index into the string table section. The first byte, which is index zero, is defined to hold a null character. Likewise, a string table’s last byte is defined to hold a null character, ensuring null termination for all strings. A string whose index is zero specifies either no name or a null name, depending on the context. An empty string table section is permitted; its section header’s sh_size member would contain zero. Non-zero indexes are invalid for an empty string table.

A section header’s sh_name member holds an index into the section header string table section, as designated by the e_shstrndx member of the ELF header. The following figures show a string table with 25 bytes and the strings associated with various indexes.

Index   +0  +1  +2  +3  +4  +5  +6  +7  +8  +9
 0      \0  n   a   m   e   .   \0  V   a   r
10      i   a   b   l   e   \0  a   b   l   e
20      \0  \0  x   x   \0

Figure 1-15: String Table Indexes

Index   String
 0      none
 1      name.
 7      Variable
11      able
16      able
24      null string

As the example shows, a string table index may refer to any byte in the section. A string may appear more than once; references to substrings may exist; and a single string may be referenced multiple times. Unreferenced strings also are allowed.
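As a rough illustration of the indexing rules above, the following C sketch rebuilds the 25-byte example table and looks strings up by index. The names example_strtab, strtab_get and strtab_selfcheck are invented for this example and are not part of the ELF specification or of <elf.h>; the bounds and termination checks are an assumption about what a cautious reader would want, not something the specification mandates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The 25-byte example table: 24 explicit bytes plus the literal's own
 * terminating null, which lands at index 24. (Hypothetical name.) */
static const char example_strtab[25] = "\0name.\0Variable\0able\0\0xx";

/* Hypothetical helper: return the string that starts at `index`, or NULL
 * when the table is empty, the index is out of range, or the last byte of
 * the table is not the null character the format calls for. */
static const char *strtab_get(const char *table, size_t size, size_t index)
{
    if (size == 0 || index >= size || table[size - 1] != '\0')
        return NULL;
    return table + index;
}

/* Checks a few rows of the index table from the figure. */
static void strtab_selfcheck(void)
{
    size_t n = sizeof example_strtab;

    assert(strcmp(strtab_get(example_strtab, n, 1), "name.") == 0);
    assert(strcmp(strtab_get(example_strtab, n, 7), "Variable") == 0);
    /* Index 11 points into the middle of "Variable": a substring reference. */
    assert(strcmp(strtab_get(example_strtab, n, 11), "able") == 0);
    /* Index 0 is the null (or absent) name. */
    assert(strtab_get(example_strtab, n, 0)[0] == '\0');
}
```

Calling strtab_selfcheck() from a test program exercises the lookups. Returning a pointer into the table, rather than copying, mirrors how a string table index refers to any byte in the section.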
Symbol Table

An object file’s symbol table holds information needed to locate and relocate a program’s symbolic definitions and references. A symbol table index is a subscript into this array. Index 0 both designates the first entry in the table and serves as the undefined symbol index. The contents of the initial entry are specified later in this section.

Name        Value
STN_UNDEF   0

A symbol table entry has the following format.

Figure 1-16: Symbol Table Entry

    typedef struct {
        Elf32_Word    st_name;
        Elf32_Addr    st_value;
        Elf32_Word    st_size;
        unsigned char st_info;
        unsigned char st_other;
        Elf32_Half    st_shndx;
    } Elf32_Sym;

st_name
    This member holds an index into the object file’s symbol string table, which holds the character representations of the symbol names. If the value is non-zero, it represents a string table index that gives the symbol name. Otherwise, the symbol table entry has no name.
    NOTE: External C symbols have the same names in C and object files’ symbol tables.
st_value
    This member gives the value of the associated symbol. Depending on the context, this may be an absolute value, an address, etc.; details appear below.
st_size
    Many symbols have associated sizes. For example, a data object’s size is the number of bytes contained in the object. This member holds 0 if the symbol has no size or an unknown size.
st_info
    This member specifies the symbol’s type and binding attributes. A list of the values and meanings appears below. The following code shows how to manipulate the values.
    #define ELF32_ST_BIND(i)    ((i) >> 4)
    #define ELF32_ST_TYPE(i)    ((i) & 0xf)
    #define ELF32_ST_INFO(b, t) (((b) << 4) + ((t) & 0xf))

st_other
    This member currently holds 0 and has no defined meaning.
st_shndx
    Every symbol table entry is “defined” in relation to some section; this member holds the relevant section header table index. As Figure 1-7 and the related text describe, some section indexes indicate special meanings.

A symbol’s binding determines the linkage visibility and behavior.

Figure 1-17: Symbol Binding, ELF32_ST_BIND

Name         Value
STB_LOCAL    0
STB_GLOBAL   1
STB_WEAK     2
STB_LOPROC   13
STB_HIPROC   15

STB_LOCAL
    Local symbols are not visible outside the object file containing their definition. Local symbols of the same name may exist in multiple files without interfering with each other.
STB_GLOBAL
    Global symbols are visible to all object files being combined. One file’s definition of a global symbol will satisfy another file’s undefined reference to the same global symbol.
STB_WEAK
    Weak symbols resemble global symbols, but their definitions have lower precedence.
STB_LOPROC through STB_HIPROC
    Values in this inclusive range are reserved for processor-specific semantics.

Global and weak symbols differ in two major ways.

When the link editor combines several relocatable object files, it does not allow multiple definitions of STB_GLOBAL symbols with the same name. On the other hand, if a defined global symbol exists, the appearance of a weak symbol with the same name will not cause an error. The link editor honors the global definition and ignores the weak ones.
Similarly, if a common symbol exists (i.e., a symbol whose st_shndx field holds SHN_COMMON), the appearance of a weak symbol with the same name will not cause an error. The link editor honors the common definition and ignores the weak ones.

When the link editor searches archive libraries, it extracts archive members that contain definitions of undefined global symbols. The member’s definition may be either a global or a weak symbol. The link editor does not extract archive members to resolve undefined weak symbols. Unresolved weak symbols have a zero value.

In each symbol table, all symbols with STB_LOCAL binding precede the weak and global symbols. As “Sections” above describes, a symbol table section’s sh_info section header member holds the symbol table index for the first non-local symbol.

A symbol’s type provides a general classification for the associated entity.

Figure 1-18: Symbol Types, ELF32_ST_TYPE

Name          Value
STT_NOTYPE    0
STT_OBJECT    1
STT_FUNC      2
STT_SECTION   3
STT_FILE      4
STT_LOPROC    13
STT_HIPROC    15

STT_NOTYPE
    The symbol’s type is not specified.
STT_OBJECT
    The symbol is associated with a data object, such as a variable, an array, etc.
STT_FUNC
    The symbol is associated with a function or other executable code.
STT_SECTION
    The symbol is associated with a section. Symbol table entries of this type exist primarily for relocation and normally have STB_LOCAL binding.
STT_FILE
    Conventionally, the symbol’s name gives the name of the source file associated with the object file. A file symbol has STB_LOCAL binding, its section index is SHN_ABS, and it precedes the other STB_LOCAL symbols for the file, if it is present.
STT_LOPROC through STT_HIPROC
    Values in this inclusive range are reserved for processor-specific semantics.
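To make the packing of st_info concrete, here is a small C sketch that splits the byte into its binding and type halves using the macros given earlier in this section. The macros and constant values are copied from the figures above; the helper names binding_name, type_name and st_info_selfcheck, and the strings they return, are made up for illustration. The constants are declared locally so the sketch does not depend on a particular <elf.h>.

```c
#include <assert.h>
#include <string.h>

/* Local copies of the macros from the text. */
#define ELF32_ST_BIND(i)    ((i) >> 4)
#define ELF32_ST_TYPE(i)    ((i) & 0xf)
#define ELF32_ST_INFO(b, t) (((b) << 4) + ((t) & 0xf))

/* Values from Figures 1-17 and 1-18. */
enum { STB_LOCAL = 0, STB_GLOBAL = 1, STB_WEAK = 2, STB_LOPROC = 13 };
enum { STT_NOTYPE = 0, STT_OBJECT = 1, STT_FUNC = 2, STT_SECTION = 3,
       STT_FILE = 4, STT_LOPROC = 13 };

/* Hypothetical helper: human-readable binding (upper four bits). */
static const char *binding_name(unsigned char st_info)
{
    int bind = ELF32_ST_BIND(st_info);

    switch (bind) {
    case STB_LOCAL:  return "local";
    case STB_GLOBAL: return "global";
    case STB_WEAK:   return "weak";
    default:         return bind >= STB_LOPROC ? "processor-specific" : "unknown";
    }
}

/* Hypothetical helper: human-readable type (lower four bits). */
static const char *type_name(unsigned char st_info)
{
    int type = ELF32_ST_TYPE(st_info);

    switch (type) {
    case STT_NOTYPE:  return "notype";
    case STT_OBJECT:  return "object";
    case STT_FUNC:    return "func";
    case STT_SECTION: return "section";
    case STT_FILE:    return "file";
    default:          return type >= STT_LOPROC ? "processor-specific" : "unknown";
    }
}

static void st_info_selfcheck(void)
{
    /* A global function: binding 1 in the high nibble, type 2 in the low. */
    unsigned char info = ELF32_ST_INFO(STB_GLOBAL, STT_FUNC);

    assert(info == 0x12);
    assert(strcmp(binding_name(info), "global") == 0);
    assert(strcmp(type_name(info), "func") == 0);
}
```

A symbol-table dumper could call binding_name and type_name on each entry's st_info to print one line per symbol.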
Function symbols (those with type STT_FUNC) in shared object files have special significance. When another object file references a function from a shared object, the link editor automatically creates a procedure linkage table entry for the referenced symbol. Shared object symbols with types other than STT_FUNC will not be referenced automatically through the procedure linkage table.

If a symbol’s value refers to a specific location within a section, its section index member, st_shndx, holds an index into the section header table. As the section moves during relocation, the symbol’s value changes as well, and references to the symbol continue to “point” to the same location in the program. Some special section index values give other semantics.

SHN_ABS
    The symbol has an absolute value that will not change because of relocation.
SHN_COMMON
    The symbol labels a common block that has not yet been allocated. The symbol’s value gives alignment constraints, similar to a section’s sh_addralign member. That is, the link editor will allocate the storage for the symbol at an address that is a multiple of st_value. The symbol’s size tells how many bytes are required.
SHN_UNDEF
    This section table index means the symbol is undefined. When the link editor combines this object file with another that defines the indicated symbol, this file’s references to the symbol will be linked to the actual definition.

As mentioned above, the symbol table entry for index 0 (STN_UNDEF) is reserved; it holds the following.
Figure 1-19: Symbol Table Entry: Index 0

Name       Value       Note
st_name    0           No name
st_value   0           Zero value
st_size    0           No size
st_info    0           No type, local binding
st_other   0
st_shndx   SHN_UNDEF   No section

Symbol Values

Symbol table entries for different object file types have slightly different interpretations for the st_value member.

In relocatable files, st_value holds alignment constraints for a symbol whose section index is SHN_COMMON.

In relocatable files, st_value holds a section offset for a defined symbol. That is, st_value is an offset from the beginning of the section that st_shndx identifies.

In executable and shared object files, st_value holds a virtual address. To make these files’ symbols more useful for the dynamic linker, the section offset (file interpretation) gives way to a virtual address (memory interpretation) for which the section number is irrelevant.

Although the symbol table values have similar meanings for different object files, the data allow efficient access by the appropriate programs.

Relocation

Relocation is the process of connecting symbolic references with symbolic definitions. For example, when a program calls a function, the associated call instruction must transfer control to the proper destination address at execution. In other words, relocatable files must have information that describes how to modify their section contents, thus allowing executable and shared object files to hold the right information for a process’s program image. Relocation entries are these data.
Figure 1-20: Relocation Entries

    typedef struct {
        Elf32_Addr  r_offset;
        Elf32_Word  r_info;
    } Elf32_Rel;

    typedef struct {
        Elf32_Addr  r_offset;
        Elf32_Word  r_info;
        Elf32_Sword r_addend;
    } Elf32_Rela;

r_offset
    This member gives the location at which to apply the relocation action. For a relocatable file, the value is the byte offset from the beginning of the section to the storage unit affected by the relocation. For an executable file or a shared object, the value is the virtual address of the storage unit affected by the relocation.
r_info
    This member gives both the symbol table index with respect to which the relocation must be made, and the type of relocation to apply. For example, a call instruction’s relocation entry would hold the symbol table index of the function being called. If the index is STN_UNDEF, the undefined symbol index, the relocation uses 0 as the “symbol value.” Relocation types are processor-specific. When the text refers to a relocation entry’s relocation type or symbol table index, it means the result of applying ELF32_R_TYPE or ELF32_R_SYM, respectively, to the entry’s r_info member.

    #define ELF32_R_SYM(i)     ((i) >> 8)
    #define ELF32_R_TYPE(i)    ((unsigned char)(i))
    #define ELF32_R_INFO(s, t) (((s) << 8) + (unsigned char)(t))

r_addend
    This member specifies a constant addend used to compute the value to be stored into the relocatable field.

As shown above, only Elf32_Rela entries contain an explicit addend. Entries of type Elf32_Rel store an implicit addend in the location to be modified. Depending on the processor architecture, one form or the other might be necessary or more convenient. Consequently, an implementation for a particular machine may use one form exclusively or either form depending on context.
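The r_info packing described above can be tried out with a short C sketch. The three macros are the ones from the text; the rel32 type is a local stand-in for Elf32_Rel that uses fixed-width members, and make_rel and rel_selfcheck are hypothetical helpers written only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the macros from the text. */
#define ELF32_R_SYM(i)     ((i) >> 8)
#define ELF32_R_TYPE(i)    ((unsigned char)(i))
#define ELF32_R_INFO(s, t) (((s) << 8) + (unsigned char)(t))

/* Stand-in for Elf32_Rel (hypothetical name, same member order). */
typedef struct {
    uint32_t r_offset;
    uint32_t r_info;
} rel32;

/* Hypothetical helper: build an entry from a symbol table index and a
 * relocation type. The index occupies the upper 24 bits of r_info and
 * the type the lowest 8, so indexes above 0xffffff would be truncated. */
static rel32 make_rel(uint32_t offset, uint32_t sym_index, unsigned char type)
{
    rel32 r;

    r.r_offset = offset;
    r.r_info = ELF32_R_INFO(sym_index, type);
    return r;
}

static void rel_selfcheck(void)
{
    /* Symbol 5, relocation type 2, applied at section offset 0x10. */
    rel32 r = make_rel(0x10, 5, 2);

    assert(r.r_info == 0x502);
    assert(ELF32_R_SYM(r.r_info) == 5);
    assert(ELF32_R_TYPE(r.r_info) == 2);

    /* Index 0 is STN_UNDEF: the relocation then uses 0 as the symbol value. */
    r = make_rel(0x20, 0, 8);
    assert(ELF32_R_SYM(r.r_info) == 0);
    assert(ELF32_R_TYPE(r.r_info) == 8);
}
```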
A relocation section references two other sections: a symbol table and a section to modify. The section header’s sh_info and sh_link members, described in “Sections” above, specify these relationships. Relocation entries for different object files have slightly different interpretations for the r_offset member.

In relocatable files, r_offset holds a section offset. That is, the relocation section itself describes how to modify another section in the file; relocation offsets designate a storage unit within the second section.

In executable and shared object files, r_offset holds a virtual address. To make these files’ relocation entries more useful for the dynamic linker, the section offset (file interpretation) gives way to a virtual address (memory interpretation).

Although the interpretation of r_offset changes for different object files to allow efficient access by the relevant programs, the relocation types’ meanings stay the same.

Relocation Types

Relocation entries describe how to alter the following instruction and data fields (bit numbers appear in the lower box corners).

Figure 1-21: Relocatable Fields

    word32: one 32-bit field, bit 31 (left) through bit 0 (right)

word32
    This specifies a 32-bit field occupying 4 bytes with arbitrary byte alignment. These values use the same byte order as other word values in the 32-bit Intel Architecture.

    Example: for the value 0x01020304, byte 3 holds 01 (bits 31 and down), byte 2 holds 02, byte 1 holds 03, and byte 0 holds 04 (down to bit 0).

Calculations below assume the actions are transforming a relocatable file into either an executable or a shared object file. Conceptually, the link editor merges one or more relocatable files to form the output. It first decides how to combine and locate the input files, then updates the symbol values, and finally performs the relocation. Relocations applied to executable or shared object files are similar and accomplish the same result. Descriptions below use the following notation.
A
    This means the addend used to compute the value of the relocatable field.
B
    This means the base address at which a shared object has been loaded into memory during execution. Generally, a shared object file is built with a 0 base virtual address, but the execution address will be different.
G
    This means the offset into the global offset table at which the address of the relocation entry’s symbol will reside during execution. See “Global Offset Table” in Part 2 for more information.
GOT
    This means the address of the global offset table. See “Global Offset Table” in Part 2 for more information.
L
    This means the place (section offset or address) of the procedure linkage table entry for a symbol. A procedure linkage table entry redirects a function call to the proper destination. The link editor builds the initial procedure linkage table, and the dynamic linker modifies the entries during execution. See “Procedure Linkage Table” in Part 2 for more information.
P
    This means the place (section offset or address) of the storage unit being relocated (computed using r_offset).
S
    This means the value of the symbol whose index resides in the relocation entry.

A relocation entry’s r_offset value designates the offset or virtual address of the first byte of the affected storage unit. The relocation type specifies which bits to change and how to calculate their values. The SYSTEM V architecture uses only Elf32_Rel relocation entries; the field to be relocated holds the addend. In all cases, the addend and the computed result use the same byte order.
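The notation above can be made concrete with a small C sketch for two of the calculations listed in Figure 1-22, S + A (R_386_32) and S + A - P (R_386_PC32). It assumes Elf32_Rel-style entries, so the addend A is read from the word32 field itself, and it reads and writes that field byte by byte in the Intel (least significant byte first) order. The helper names read_le32, write_le32, apply_rel and apply_rel_selfcheck are invented for this example; this is an illustration of the arithmetic, not a description of how any particular link editor is written.

```c
#include <assert.h>
#include <stdint.h>

/* Relocation type values from Figure 1-22. */
enum { R_386_32 = 1, R_386_PC32 = 2 };

/* Read a word32 stored least significant byte first, at any alignment. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a word32 least significant byte first. */
static void write_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Hypothetical helper: relocate the field at `field`.
 *   S = value of the referenced symbol
 *   P = place (offset or address) of the field being relocated
 * Returns 0 on success, -1 for a type this sketch does not handle. */
static int apply_rel(unsigned char *field, uint32_t type, uint32_t S, uint32_t P)
{
    uint32_t A = read_le32(field);   /* implicit addend stored in the field */

    switch (type) {
    case R_386_32:
        write_le32(field, S + A);
        return 0;
    case R_386_PC32:
        write_le32(field, S + A - P);  /* unsigned wraparound is intended */
        return 0;
    default:
        return -1;
    }
}

static void apply_rel_selfcheck(void)
{
    /* Addend -4 in the field, symbol at 0x1000, field located at 0x800. */
    unsigned char field[4] = { 0xfc, 0xff, 0xff, 0xff };

    assert(apply_rel(field, R_386_PC32, 0x1000, 0x800) == 0);
    assert(read_le32(field) == 0x7fc);
}
```

Doing the arithmetic in uint32_t means a negative addend such as -4 is handled by ordinary modulo 2^32 wraparound, which gives the same bit pattern as signed arithmetic would.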
Figure 1-22: Relocation Types

Name             Value   Field    Calculation
R_386_NONE       0       none     none
R_386_32         1       word32   S + A
R_386_PC32       2       word32   S + A - P
R_386_GOT32      3       word32   G + A - P
R_386_PLT32      4       word32   L + A - P
R_386_COPY       5       none     none
R_386_GLOB_DAT   6       word32   S
R_386_JMP_SLOT   7       word32   S
R_386_RELATIVE   8       word32   B + A
R_386_GOTOFF     9       word32   S + A - GOT
R_386_GOTPC      10      word32   GOT + A - P

Some relocation types have semantics beyond simple calculation.

R_386_GOT32
    This relocation type computes the distance from the base of the global offset table to the symbol’s global offset table entry. It additionally instructs the link editor to build a global offset table.
R_386_PLT32
    This relocation type computes the address of the symbol’s procedure linkage table entry and additionally instructs the link editor to build a procedure linkage table.
R_386_COPY
    The link editor creates this relocation type for dynamic linking. Its offset member refers to a location in a writable segment. The symbol table index specifies a symbol that should exist both in the current object file and in a shared object. During execution, the dynamic linker copies data associated with the shared object’s symbol to the location specified by the offset.
R_386_GLOB_DAT
    This relocation type is used to set a global offset table entry to the address of the specified symbol. The special relocation type allows one to determine the correspondence between symbols and global offset table entries.
R_386_JMP_SLOT
    The link editor creates this relocation type for dynamic linking. Its offset member gives the location of a procedure linkage table entry.
    The dynamic linker modifies the procedure linkage table entry to transfer control to the designated symbol’s address [see “Procedure Linkage Table” in Part 2].
R_386_RELATIVE
    The link editor creates this relocation type for dynamic linking. Its offset member gives a location within a shared object that contains a value representing a relative address. The dynamic linker computes the corresponding virtual address by adding the virtual address at which the shared object was loaded to the relative address. Relocation entries for this type must specify 0 for the symbol table index.
R_386_GOTOFF
    This relocation type computes the difference between a symbol’s value and the address of the global offset table. It additionally instructs the link editor to build the global offset table.
R_386_GOTPC
    This relocation type resembles R_386_PC32, except it uses the address of the global offset table in its calculation. The symbol referenced in this relocation normally is _GLOBAL_OFFSET_TABLE_, which additionally instructs the link editor to build the global offset table.

2 PROGRAM LOADING AND DYNAMIC LINKING

Introduction
Program Header
Base Address
Note Section
Program Loading
Dynamic Linking
Program Interpreter
Dynamic Linker
Dynamic Section
Shared Object Dependencies
Global Offset Table
Procedure Linkage Table
Hash Table
Initialization and Termination Functions

Introduction

Part 2 describes the object file information and system actions that create running programs. Some information here applies to all systems; other information is processor-specific.

Executable and shared object files statically represent programs. To execute such programs, the system uses the files to create dynamic program representations, or process images.
A process image has segments that hold its text, data, stack, and so on. The major sections in this part discuss the following.

Program header. This section complements Part 1, describing object file structures that relate directly to program execution. The primary data structure, a program header table, locates segment images within the file and contains other information necessary to create the memory image for the program.

Program loading. Given an object file, the system must load it into memory for the program to run.

Dynamic linking. After the system loads the program, it must complete the process image by resolving symbolic references among the object files that compose the process.

NOTE: There are naming conventions for ELF constants that have specified processor ranges. Names such as DT_, PT_, for processor-specific extensions, incorporate the name of the processor: DT_M32_SPECIAL, for example. Pre-existing processor extensions not using this convention will be supported.

Pre-existing Extensions

DT_JMP_REL

Program Header

An executable or shared object file’s program header table is an array of structures, each describing a segment or other information the system needs to prepare the program for execution. An object file segment contains one or more sections, as “Segment Contents” describes below. Program headers are meaningful only for executable and shared object files. A file specifies its own program header size with the ELF header’s e_phentsize and e_phnum members [see “ELF Header” in Part 1].
Figure 2-1: Program Header

    typedef struct {
        Elf32_Word p_type;
        Elf32_Off  p_offset;
        Elf32_Addr p_vaddr;
        Elf32_Addr p_paddr;
        Elf32_Word p_filesz;
        Elf32_Word p_memsz;
        Elf32_Word p_flags;
        Elf32_Word p_align;
    } Elf32_Phdr;

p_type
    This member tells what kind of segment this array element describes or how to interpret the array element’s information. Type values and their meanings appear below.
p_offset
    This member gives the offset from the beginning of the file at which the first byte of the segment resides.
p_vaddr
    This member gives the virtual address at which the first byte of the segment resides in memory.
p_paddr
    On systems for which physical addressing is relevant, this member is reserved for the segment’s physical address. Because System V ignores physical addressing for application programs, this member has unspecified contents for executable files and shared objects.
p_filesz
    This member gives the number of bytes in the file image of the segment; it may be zero.
p_memsz
    This member gives the number of bytes in the memory image of the segment; it may be zero.
p_flags
    This member gives flags relevant to the segment. Defined flag values appear below.
p_align
    As “Program Loading” later in this part describes, loadable process segments must have congruent values for p_vaddr and p_offset, modulo the page size. This member gives the value to which the segments are aligned in memory and in the file. Values 0 and 1 mean no alignment is required. Otherwise, p_align should be a positive, integral power of 2, and p_vaddr should equal p_offset, modulo p_align.

Some entries describe process segments; others give supplementary information and do not contribute to the process image. Segment entries may appear in any order, except as explicitly noted below. Defined type values follow; other values are reserved for future use.
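The p_align rule described above lends itself to a quick check in C. The sketch below uses a local stand-in for Elf32_Phdr with fixed-width members; the names phdr32, is_power_of_two, phdr_alignment_ok and phdr_selfcheck are hypothetical, and the sample values are borrowed from the executable-file example that appears later in this part. A tool such as createimage might run a check like this before copying a segment, though nothing in the specification requires it.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for Elf32_Phdr (hypothetical name, same member order). */
typedef struct {
    uint32_t p_type;
    uint32_t p_offset;
    uint32_t p_vaddr;
    uint32_t p_paddr;
    uint32_t p_filesz;
    uint32_t p_memsz;
    uint32_t p_flags;
    uint32_t p_align;
} phdr32;

static int is_power_of_two(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Hypothetical helper: returns 1 when p_align, p_vaddr and p_offset are
 * consistent with the rule in the text, 0 otherwise. */
static int phdr_alignment_ok(const phdr32 *ph)
{
    if (ph->p_align == 0 || ph->p_align == 1)
        return 1;                       /* no alignment required */
    if (!is_power_of_two(ph->p_align))
        return 0;
    return ph->p_vaddr % ph->p_align == ph->p_offset % ph->p_align;
}

static void phdr_selfcheck(void)
{
    /* Text segment of the later example: PT_LOAD (1), PF_R + PF_X (5). */
    phdr32 text = { 1, 0x100, 0x8048100, 0, 0x2be00, 0x2be00, 5, 0x1000 };

    assert(phdr_alignment_ok(&text));

    /* Shift the file offset by one byte and the congruence breaks. */
    text.p_offset = 0x101;
    assert(!phdr_alignment_ok(&text));
}
```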
Figure 2-2: Segment Types, p_type

Name         Value
PT_NULL      0
PT_LOAD      1
PT_DYNAMIC   2
PT_INTERP    3
PT_NOTE      4
PT_SHLIB     5
PT_PHDR      6
PT_LOPROC    0x70000000
PT_HIPROC    0x7fffffff

PT_NULL
    The array element is unused; other members’ values are undefined. This type lets the program header table have ignored entries.
PT_LOAD
    The array element specifies a loadable segment, described by p_filesz and p_memsz. The bytes from the file are mapped to the beginning of the memory segment. If the segment’s memory size (p_memsz) is larger than the file size (p_filesz), the “extra” bytes are defined to hold the value 0 and to follow the segment’s initialized area. The file size may not be larger than the memory size. Loadable segment entries in the program header table appear in ascending order, sorted on the p_vaddr member.
PT_DYNAMIC
    The array element specifies dynamic linking information. See “Dynamic Section” below for more information.
PT_INTERP
    The array element specifies the location and size of a null-terminated path name to invoke as an interpreter. This segment type is meaningful only for executable files (though it may occur for shared objects); it may not occur more than once in a file. If it is present, it must precede any loadable segment entry. See “Program Interpreter” below for further information.
PT_NOTE
    The array element specifies the location and size of auxiliary information. See “Note Section” below for details.
PT_SHLIB
    This segment type is reserved but has unspecified semantics. Programs that contain an array element of this type do not conform to the ABI.
PT_PHDR
    The array element, if present, specifies the location and size of the program header table itself, both in the file and in the memory image of the program.
    This segment type may not occur more than once in a file. Moreover, it may occur only if the program header table is part of the memory image of the program. If it is present, it must precede any loadable segment entry. See “Program Interpreter” below for further information.
PT_LOPROC through PT_HIPROC
    Values in this inclusive range are reserved for processor-specific semantics.

NOTE: Unless specifically required elsewhere, all program header segment types are optional. That is, a file’s program header table may contain only those elements relevant to its contents.

Base Address

Executable and shared object files have a base address, which is the lowest virtual address associated with the memory image of the program’s object file. One use of the base address is to relocate the memory image of the program during dynamic linking.

An executable or shared object file’s base address is calculated during execution from three values: the memory load address, the maximum page size, and the lowest virtual address of a program’s loadable segment. As “Program Loading” in this chapter describes, the virtual addresses in the program headers might not represent the actual virtual addresses of the program’s memory image. To compute the base address, one determines the memory address associated with the lowest p_vaddr value for a PT_LOAD segment. One then obtains the base address by truncating the memory address to the nearest multiple of the maximum page size. Depending on the kind of file being loaded into memory, the memory address might or might not match the p_vaddr values.

As “Sections” in Part 1 describes, the .bss section has the type SHT_NOBITS. Although it occupies no space in the file, it contributes to the segment’s memory image.
Normally, these uninitialized data reside at the end of the segment, thereby making p_memsz larger than p_filesz in the associated program header element.

Note Section

Sometimes a vendor or system builder needs to mark an object file with special information that other programs will check for conformance, compatibility, etc. Sections of type SHT_NOTE and program header elements of type PT_NOTE can be used for this purpose. The note information in sections and program header elements holds any number of entries, each of which is an array of 4-byte words in the format of the target processor. Labels appear below to help explain note information organization, but they are not part of the specification.

Figure 2-3: Note Information

    namesz
    descsz
    type
    name . . .
    desc . . .

namesz and name
    The first namesz bytes in name contain a null-terminated character representation of the entry’s owner or originator. There is no formal mechanism for avoiding name conflicts. By convention, vendors use their own name, such as “XYZ Computer Company,” as the identifier. If no name is present, namesz contains 0. Padding is present, if necessary, to ensure 4-byte alignment for the descriptor. Such padding is not included in namesz.
descsz and desc
    The first descsz bytes in desc hold the note descriptor. The ABI places no constraints on a descriptor’s contents. If no descriptor is present, descsz contains 0. Padding is present, if necessary, to ensure 4-byte alignment for the next note entry. Such padding is not included in descsz.
type
    This word gives the interpretation of the descriptor. Each originator controls its own types; multiple interpretations of a single type value may exist.
    Thus, a program must recognize both the name and the type to “understand” a descriptor. Types currently must be non-negative. The ABI does not define what descriptors mean.

To illustrate, the following note segment holds two entries.

Figure 2-4: Example Note Segment

          +0    +1    +2    +3
namesz    7
descsz    0                         No descriptor
type      1
name      X     Y     Z     ' '
          C     o     \0    pad
namesz    7
descsz    8
type      3
name      X     Y     Z     ' '
          C     o     \0    pad
desc      word 0
          word 1

NOTE: The system reserves note information with no name (namesz == 0) and with a zero-length name (name[0] == '\0') but currently defines no types. All other names must have at least one non-null character.

NOTE: Note information is optional. The presence of note information does not affect a program’s ABI conformance, provided the information does not affect the program’s execution behavior. Otherwise, the program does not conform to the ABI and has undefined behavior.

Program Loading

As the system creates or augments a process image, it logically copies a file’s segment to a virtual memory segment. When, and if, the system physically reads the file depends on the program’s execution behavior, system load, etc. A process does not require a physical page unless it references the logical page during execution, and processes commonly leave many pages unreferenced. Therefore delaying physical reads frequently obviates them, improving system performance.
To obtain this efficiency in practice, executable and shared object files must have segment images whose file offsets and virtual addresses are congruent, modulo the page size.

Virtual addresses and file offsets for the SYSTEM V architecture segments are congruent modulo 4 KB (0x1000) or larger powers of 2. Because 4 KB is the maximum page size, the files will be suitable for paging regardless of physical page size.

Figure 2-5: Executable File

    File Offset   File                    Virtual Address
    0             ELF header
                  Program header table
                  Other information
    0x100         Text segment            0x8048100
                  . . .
                  0x2be00 bytes           0x8073eff
    0x2bf00       Data segment            0x8074f00
                  . . .
                  0x4e00 bytes            0x8079cff
    0x30d00       Other information
                  . . .

Figure 2-6: Program Header Segments

    Member     Text          Data
    p_type     PT_LOAD       PT_LOAD
    p_offset   0x100         0x2bf00
    p_vaddr    0x8048100     0x8074f00
    p_paddr    unspecified   unspecified
    p_filesz   0x2be00       0x4e00
    p_memsz    0x2be00       0x5e24
    p_flags    PF_R + PF_X   PF_R + PF_W + PF_X
    p_align    0x1000        0x1000

Although the example's file offsets and virtual addresses are congruent modulo 4 KB for both text and data, up to four file pages hold impure text or data (depending on page size and file system block size). The first text page contains the ELF header, the program header table, and other information. The last text page holds a copy of the beginning of data. The first data page has a copy of the end of text. The last data page may contain file information not relevant to the running process.
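The congruence requirement can be checked directly from a program header. The following is a minimal sketch, assuming the 32-bit ELF structures from <elf.h> that createimage.c already includes; the function name segment_is_pageable is hypothetical and not part of the specification.

```c
#include <elf.h>
#include <stdint.h>

/* Returns 1 when the segment's file offset and virtual address leave the
 * same remainder modulo page_size, 0 otherwise.
 * Assumption: page_size is a power of 2 (for example 0x1000). */
int segment_is_pageable(const Elf32_Phdr *ph, uint32_t page_size)
{
    uint32_t mask = page_size - 1;

    return (ph->p_offset & mask) == (ph->p_vaddr & mask);
}
```

With the values from Figure 2-6 and a page size of 0x1000, both segments pass: the text segment's offset and address both leave a remainder of 0x100, and the data segment's both leave 0xf00.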
Logically, the system enforces the memory permissions as if each segment were complete and separate; segments' addresses are adjusted to ensure each logical page in the address space has a single set of permissions. In the example above, the region of the file holding the end of text and the beginning of data will be mapped twice: at one virtual address for text and at a different virtual address for data.

The end of the data segment requires special handling for uninitialized data, which the system defines to begin with zero values. Thus if a file's last data page includes information not in the logical memory page, the extraneous data must be set to zero, not the unknown contents of the executable file. “Impurities” in the other three pages are not logically part of the process image; whether the system expunges them is unspecified. The memory image for this program follows, assuming 4 KB (0x1000) pages.

Figure 2-7: Process Image Segments

    Virtual Address   Contents             Size                Segment
    0x8048000         Header padding       0x100 bytes         Text
    0x8048100         Text segment         0x2be00 bytes       Text
    0x8073f00         Data padding         0x100 bytes         Text
    0x8074000         Text padding         0xf00 bytes         Data
    0x8074f00         Data segment         0x4e00 bytes        Data
    0x8079d00         Uninitialized data   0x1024 zero bytes   Data
    0x807ad24         Page padding         0x2dc zero bytes    Data

One aspect of segment loading differs between executable files and shared objects. Executable file segments typically contain absolute code. To let the process execute correctly, the segments must reside at the virtual addresses used to build the executable file. Thus the system uses the p_vaddr values unchanged as virtual addresses.
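The sizes of the uninitialized data and page padding in Figure 2-7 follow from the program header fields. The sketch below shows one way to compute them and to build a segment's memory image; it assumes Elf32_Phdr from <elf.h> and a file that has already been read into a buffer, and all three helper names are hypothetical.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Number of uninitialized bytes at the end of the segment:
 * the part of the memory image that has no counterpart in the file. */
uint32_t segment_bss_size(const Elf32_Phdr *ph)
{
    return ph->p_memsz - ph->p_filesz;
}

/* Number of bytes between the end of the segment's memory image and the
 * next page boundary. Assumption: page_size is a power of 2. */
uint32_t segment_page_padding(const Elf32_Phdr *ph, uint32_t page_size)
{
    uint32_t end = ph->p_vaddr + ph->p_memsz;
    uint32_t rem = end & (page_size - 1);

    return rem ? page_size - rem : 0;
}

/* Copy the p_filesz bytes stored in the file, then zero-fill the rest of
 * the memory image up to p_memsz. dest must hold at least p_memsz bytes. */
void segment_fill_image(const Elf32_Phdr *ph, const unsigned char *file,
                        unsigned char *dest)
{
    memcpy(dest, file + ph->p_offset, ph->p_filesz);
    memset(dest + ph->p_filesz, 0, ph->p_memsz - ph->p_filesz);
}
```

For the data segment in the example (p_vaddr 0x8074f00, p_filesz 0x4e00, p_memsz 0x5e24), the uninitialized part is 0x5e24 - 0x4e00 = 0x1024 bytes and the image ends at 0x807ad24, leaving 0x2dc bytes to the next 4 KB boundary. A create-image utility could apply the same copy-then-zero step when writing a segment, although the exact padding rules for the boot image are set by the assignment, not by this sketch.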
On the other hand, shared object segments typically contain position-independent code. This lets a segment's virtual address change from one process to another, without invalidating execution behavior. Though the system chooses virtual addresses for individual processes, it maintains the segments' relative positions. Because position-independent code uses relative addressing between segments, the difference between virtual addresses in memory must match the difference between virtual addresses in the file. The following table shows possible shared object virtual address assignments for several processes, illustrating constant relative positioning. The table also illustrates the base address computations.

Figure 2-8: Example Shared Object Segment Addresses

    Source      Text         Data         Base Address
    File        0x200        0x2a400      0x0
    Process 1   0x80000200   0x8002a400   0x80000000
    Process 2   0x80081200   0x800ab400   0x80081000
    Process 3   0x900c0200   0x900ea400   0x900c0000
    Process 4   0x900c6200   0x900f0400   0x900c6000

Dynamic Linking

Program Interpreter

An executable file may have one PT_INTERP program header element. During exec(BA_OS), the system retrieves a path name from the PT_INTERP segment and creates the initial process image from the interpreter file's segments. That is, instead of using the original executable file's segment images, the system composes a memory image for the interpreter. It then is the interpreter's responsibility to receive control from the system and provide an environment for the application program.

The interpreter receives control in one of two ways.
First, it may receive a file descriptor to read the executable file, positioned at the beginning. It can use this file descriptor to read and/or map the executable file's segments into memory. Second, depending on the executable file format, the system may load the executable file into memory instead of giving the interpreter an open file descriptor. With the possible exception of the file descriptor, the interpreter's initial process state matches what the executable file would have received. The interpreter itself may not require a second interpreter.

An interpreter may be either a shared object or an executable file. A shared object (the normal case) is loaded as position-independent, with addresses that may vary from one process to another; the system creates its segments in the dynamic segment area used by mmap(KE_OS) and related services. Consequently, a shared object interpreter typically will not conflict with the original executable file's original segment addresses. An executable file is loaded at fixed addresses; the system creates its segments using the virtual addresses from the program header table. Consequently, an executable file interpreter's virtual addresses may collide with the first executable file; the interpreter is responsible for resolving conflicts.

Dynamic Linker

When building an executable file that uses dynamic linking, the link editor adds a program header element of type PT_INTERP to an executable file, telling the system to invoke the dynamic linker as the program interpreter.

NOTE: The locations of the system provided dynamic linkers are processor-specific.
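Locating the PT_INTERP element amounts to a scan of the program header table. The sketch below assumes Elf32_Phdr from <elf.h>, a file that has already been read into memory, and a segment holding a null-terminated path name; the function name find_interp is hypothetical. A statically linked kernel image, such as the one handled in this assignment, would normally have no such element.

```c
#include <elf.h>
#include <stddef.h>

/* Scan phnum program headers for a PT_INTERP element.
 * Returns a pointer to the interpreter path name inside the file buffer,
 * or NULL when the file has no PT_INTERP element.
 * Assumption: the segment contents are a null-terminated string. */
const char *find_interp(const unsigned char *file,
                        const Elf32_Phdr *phdrs, size_t phnum)
{
    for (size_t i = 0; i < phnum; i++) {
        if (phdrs[i].p_type == PT_INTERP)
            return (const char *)(file + phdrs[i].p_offset);
    }
    return NULL;
}
```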
Exec(BA_OS) and the dynamic linker cooperate to create the process image for the program, which entails the following actions:

    Adding the executable file's memory segments to the process image;
    Adding shared object memory segments to the process image;
    Performing relocations for the executable file and its shared objects;
    Closing the file descriptor that was used to read the executable file, if one was given to the dynamic linker;
    Transferring control to the program, making it look as if the program had received control directly from exec(BA_OS).

The link editor also constructs various data that assist the dynamic linker for executable and shared object files. As shown above in “Program Header,” these data reside in loadable segments, making them available during execution. (Once again, recall the exact segment contents are processor-specific. See the processor supplement for complete information.)

    A .dynamic section with type SHT_DYNAMIC holds various data. The structure residing at the beginning of the section holds the addresses of other dynamic linking information.
    The .hash section with type SHT_HASH holds a symbol hash table.
    The .got and .plt sections with type SHT_PROGBITS hold two separate tables: the global offset table and the procedure linkage table. Sections below explain how the dynamic linker uses and changes the tables to create memory images for object files.

Because every ABI-conforming program imports the basic system services from a shared object library, the dynamic linker participates in every ABI-conforming program execution.

As “Program Loading” explains in the processor supplement, shared objects may occupy virtual memory addresses that are different from the addresses recorded in the file's program header table.
The dynamic linker relocates the memory image, updating absolute addresses before the application gains control. Although the absolute address values would be correct if the library were loaded at the addresses specified in the program header table, this normally is not the case.

If the process environment [see exec(BA_OS)] contains a variable named LD_BIND_NOW with a non-null value, the dynamic linker processes all relocation before transferring control to the program. For example, all the following environment entries would specify this behavior.

    LD_BIND_NOW=1
    LD_BIND_NOW=on
    LD_BIND_NOW=off

Otherwise, LD_BIND_NOW either does not occur in the environment or has a null value. The dynamic linker is permitted to evaluate procedure link
