盒子
Contents
  1. 16-bit protected mode
  2. 32-bit protected mode
  3. Exercise 3
    1. What is ELF and three types of ELF
    2. ELF File Format
  4. Loading the Kernel
    1. Exercise 5.
    2. Exercise 6.

MIT 6.828 Lab 1, Part 2: JOS Bootstrap (the boot loader)

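On a PC, floppy disks and hard disks are divided into 512-byte regions called sectors. A sector is the minimum transfer granularity of the disk: every read or write must cover one or more whole sectors. If a disk is bootable, its first sector is called the boot sector. When the BIOS finds a bootable floppy or hard disk, it loads the 512-byte boot sector, which holds the boot loader, into physical memory at addresses 0x7c00 through 0x7dff, and then uses a jmp instruction to set CS:IP to 0000:7c00, passing control to the boot loader.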

The boot loader consists of one assembly file, boot/boot.S, and one C file, boot/main.c. It performs two main tasks:

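  • It switches the processor from real mode to 32-bit protected mode, because only in protected mode can software address physical memory above 1MB (real-mode segment:offset addressing reaches only about the first 1MB).
  • It reads the kernel from the hard disk into memory.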

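obj/boot/boot.asm is a disassembly of the boot loader; it makes it easy to see exactly where in memory each part of the boot loader's code resides. Likewise, obj/kern/kernel.asm is a disassembly of the kernel.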

16-bit protected mode

In the 80286’s 16-bit protected mode, selector values are interpreted completely differently than in real mode. In real mode, a selector value is a paragraph number of physical memory. In protected mode, a selector value is an index into a descriptor table. In both modes, programs are divided into segments. In real mode, these segments are at fixed positions in physical memory and the selector value denotes the paragraph number of the beginning of the segment. In protected mode, the segments are not at fixed positions in physical memory. In fact, they do not have to be in memory at all!

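Protected mode makes virtual memory possible: only the code and data currently in use need to be kept in memory, while the rest can be stored temporarily on disk until it is needed again. In 16-bit protected mode, whole segments are moved between memory and disk, so when a segment is brought back from disk it will quite likely be placed at a different physical address than before. This works because a selector names a descriptor rather than a fixed physical location.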

32-bit protected mode

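The 80386 introduced 32-bit protected mode, which differs from 16-bit protected mode in two main ways:

  • Offsets are extended to 32 bits, allowing a segment to be up to 4 gigabytes in size.
  • Segments can be divided into 4KB units called pages, and the virtual memory system swaps pages instead of whole segments, so only part of a segment needs to be in memory at any given time. In the 286's 16-bit protected mode, a segment is either entirely in memory or entirely swapped out to disk, an approach that is clearly impractical for segments as large as 4GB.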

Exercise 3

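Set a breakpoint at address 0x7c00, which is where the boot sector will be loaded, and continue execution until that breakpoint. Trace through the code in boot/boot.S, using the source file and the disassembly obj/boot/boot.asm to keep track of where you are. Also use GDB's x/i command to disassemble sequences of instructions in the boot loader, and compare the original boot.S source with both the disassembly in obj/boot/boot.asm and GDB's output.
  Trace into bootmain() in boot/main.c, and then into readsect(). Identify the exact assembly instructions that correspond to each statement in readsect(). Trace through the rest of readsect(), back out into bootmain(), and identify the assembly for the for loop that reads the rest of the kernel from disk into memory. Find out what code runs when the loop finishes, set a breakpoint there, continue to that breakpoint, and then step through the remainder of the boot loader. Answer the following questions: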

  1. At what point does the processor switch into 32-bit protected mode?

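To switch from real mode to protected mode, the boot loader (not the BIOS) first loads the GDTR register with lgdt gdtdesc. In protected mode, memory is addressed through segment descriptors, so the global descriptor table must be prepared before the switch. (boot.S does not set up an interrupt descriptor table; it simply disables interrupts with cli.) The boot loader then sets the PE bit in CR0 and executes ljmp $PROT_MODE_CSEG, $protcseg, which reloads CS with a 32-bit code segment selector. From that instruction on, the processor executes 32-bit protected-mode code.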

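  2. What is the last instruction the boot loader executes, and what is the first instruction of the kernel it just loaded?
    The boot loader's disassembly in obj/boot/boot.asm shows that, after reading the kernel from disk, the boot loader enters the kernel by calling through a function pointer:
    ((void (*)(void)) (ELFHDR->e_entry))();
    7d6b: ff 15 18 00 01 00 call *0x10018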

This call is therefore the last instruction the boot loader executes. Set a breakpoint at address 0x7d6b:

(gdb) b *0x7d6b
Breakpoint 2 at 0x7d6b
(gdb) c
Continuing.
=> 0x7d6b: call *0x10018
Breakpoint 2, 0x00007d6b in ?? ()
(gdb) si
=> 0x10000c: movw $0x1234,0x472
0x0010000c in ?? ()

So the last instruction the boot loader executes is call *0x10018, and the first instruction the kernel executes is movw $0x1234,0x472.

  3. Where is the kernel's first instruction? It is at address 0x10000c.

What is ELF and three types of ELF

ELF (Executable and Linking Format) is an object file format. There are three main types of object files:

  • A relocatable file holds code and data suitable for linking with other object files to create an executable or a shared object file.
  • An executable file holds a program suitable for execution.
  • A shared object file holds code and data suitable for linking in two contexts. First, the link editor may process it with other relocatable and shared object files to create another object file. Second, the dynamic linker combines it with an executable file and other shared objects to create a process image.

Created by the assembler and link editor, object files are binary representations of programs intended to execute directly on a processor.

ELF File Format

The significance of the ELF file format:
Object files participate in program linking (building a program) and program execution (running a program). For convenience and efficiency, the object file format provides parallel views of a file’s contents, reflecting the differing needs of these activities. (In other words, ELF presents the same file from two angles: the linking view shows the layout the linker works with, and the execution view shows the layout used to run the program.)
(Figure: Object File Format, linking view and execution view)

The role of each part of an ELF file in the linking and execution views:

  • An ELF header resides at the beginning and holds a “road map” describing the file’s organization.
  • Sections hold the bulk of object file information for the linking view: instructions, data, symbol table, relocation information, and so on.
  • A program header table, if present, tells the system how to create a process image. Files used to build a process image (execute a program) must have a program header table; relocatable files do not need one.
  • A section header table contains information describing the file’s sections. Every section has an entry in the table; each entry gives information such as the section name, the section size, and so on. Files used during linking must have a section header table; other object files may or may not have one.

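As the Object File Format figure shows, the ELF header always sits at the very beginning of the file. The other parts, however, do not necessarily appear in the order drawn in the figure. The ELF header structure is shown below; the fields that matter here are:
(Figure: ELF header structure)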

  • e_phoff
    This member holds the program header table’s file offset in bytes. If the file has no program header table, this member holds zero.
  • e_phnum
    This member holds the number of entries in the program header table. Thus the product of e_phentsize and e_phnum gives the table’s size in bytes. If a file has no program header table, e_phnum holds the value zero.
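  4. How does the boot loader decide how many sectors it must read to bring the entire kernel into memory? Where does it find this information?
    // ph: first program header entry; eph: one past the last entry
    ph = (struct Proghdr *) ((uint8_t *) ELFHDR + ELFHDR->e_phoff);
    eph = ph + ELFHDR->e_phnum;
    for (; ph < eph; ph++)
        // read p_memsz bytes at file offset p_offset into physical address p_pa
        readseg(ph->p_pa, ph->p_memsz, ph->p_offset);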

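These statements are where the boot loader learns how much to read into memory. The first statement locates the kernel's program header table; struct Proghdr describes one program header entry, so ph points to the start of the table (its first entry) and eph points just past the last entry. The for loop then calls readseg once per entry with p_pa, p_memsz, and p_offset as arguments: it reads ph->p_memsz bytes starting at file offset ph->p_offset into memory at physical address ph->p_pa. readseg works in whole 512-byte sectors, so the number of sectors read is determined by the program headers, which the boot loader finds through e_phoff and e_phnum in the ELF header.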

program header

  • p_paddr
    On systems for which physical addressing is relevant, this member is reserved for the segment’s physical address.
  • p_offset
    This member gives the offset from the beginning of the file at which the first byte of the segment resides
  • p_memsz
    This member gives the number of bytes in the memory image of the segment; it may be zero.

Loading the Kernel

The C definitions of the ELF headers are in inc/elf.h. The program sections you will see most often are:

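  • .text: the program's executable instructions.
  • .rodata: read-only data, such as ASCII string constants produced by the C compiler.
  • .data: the data section holding the program's initialized data, such as a global variable declared with an initializer like int x = 6;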

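When the linker computes a program's memory layout, it reserves space for uninitialized global variables in a section called .bss, which follows .data in memory. C requires uninitialized global variables to start out as zero, so there is no need to store any contents for .bss in the ELF binary; the linker records only the address and size of .bss.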

Take particular note of the “VMA” (or link address) and the “LMA” (or load address) of the .text section. The load address of a section is the memory address at which that section should be loaded into memory. The link address of a section is the memory address from which the section expects to execute.

How JOS sets the correct link address:

We set the link address by passing -Ttext 0x7C00 to the linker in boot/Makefrag, so the linker will produce the correct memory addresses in the generated code.
(Figure: link address)

Exercise 5.

Trace through the first few instructions of the boot loader again and identify the first instruction that would “break” or otherwise do the wrong thing if you were to get the boot loader’s link address wrong. Then change the link address in boot/Makefrag to something wrong, run make clean, recompile the lab with make, and trace into the boot loader again to see what happens. Don’t forget to change the link address back and make clean again afterward!

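(Figure: tracing the boot loader with the link address changed to 0x8c00)
With a wrong link address, the instructions before the first absolute address reference still run correctly, because the BIOS always jumps to 0x7c00 and the short jumps in boot.S are relative. The first instruction that does the wrong thing is lgdt gdtdesc, which loads the GDT descriptor from the linked address of gdtdesc rather than from where it actually sits; the following ljmp $PROT_MODE_CSEG, $protcseg then jumps to the wrong place. Raising the link address further to 0xf000 still builds, but 0xfff0 makes the build fail, most likely because the 16-bit boot code can no longer encode absolute addresses that exceed 0xffff.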

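The e_entry field of the ELF header is important: it holds the link address of the program's entry point, that is, the (virtual) address at which execution should begin. You can view the kernel's entry point with: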

$ objdump -f obj/kern/kernel
obj/kern/kernel: file format elf32-i386
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x0010000c

So the kernel's first instruction is at 0x0010000c, which matches my answer to Exercise 3.

Exercise 6.

Reset the machine (exit QEMU/GDB and start them again). Examine the 8 words of memory at 0x00100000 at the point the BIOS enters the boot loader, and then again at the point the boot loader enters the kernel. Why are they different? What is there at the second breakpoint? (You do not really need to use QEMU to answer this question. Just think.)

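0x00100000 (1MB) is where the BIOS ROM region ends. When the BIOS enters the boot loader, the system is still in real mode and the boot loader has not yet loaded anything above 1MB, so the 8 words at 0x00100000 are all zero:
(Figure: memory at 0x00100000 when the BIOS enters the boot loader)
When the boot loader enters the kernel, however, it has already switched from real mode to 32-bit protected mode, which can address memory above 1MB, and it has copied the kernel into memory. As we saw when looking at the kernel's LMA, the kernel's load address is 0x00100000, so the kernel's code is loaded at physical address 1MB.
(Figure: memory at 0x00100000 when the boot loader enters the kernel)
These words match the first few lines of the kernel's disassembly in obj/kern/kernel.asm, which confirms that physical address 0x00100000 holds the kernel's code.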
