OS Programming - The First Steps

08 Aug 2025

There are many tutorials out there to help you get started with your own hobby operating system (OS). This series of blog posts focuses on one resource that I used and discusses improvements drawn from other sources.

The original source was available at cs.bham.ac.uk. It is no longer accessible, but the Internet Archive has made snapshots of it. The repository from Carlos Fenollosa (cfenollosa) builds on this work. These tutorials aren't finished, but by the end, you'll have your own OS-like boot image that you can extend with C.

This unfinished state presents a problem, which is discussed in this issue. Another tutorial, based on the outcome of the previous lecture, has been published here. Some of the techniques and features used in this project and in the previous lecture appear to be incorrect, outdated, or simply poor practice. In an earlier issue thread in cfenollosa's project, another Git repository is mentioned that comments heavily on the existing codebase and improves it. I intend to implement these improvements, understand what they mean, and write about them.

A good file to start with is the bootloader (bootsect.asm), which contains the following comments:

1.  ; SMH: you *need* a longjmp to reset cs, as the first thing - some BIOSes will land you at 07c0:0000 and not 0000:7c00
2.  ; SMH: also, missing stub bpb, -1 to csm compat score
3.  ; SMH: ds:[BOOT_DRIVE] accesses `ds` while it's value is unknown
4.  ; SMH: bp is literally never used. why the hell is it here then?
5.  ; OHNO: ss:sp might be invalid after this instruction because ss could be nonzero, and you must reset it.
6.  ; SMH: use si not di...
7.  ; SMH: if the ebfe (jmp $) is never executed, and switch_to_pm can't sanely return, why dont you just tailcall there?
8.  ; SMH: don't hardcode sector sizes, the incbin directive exists for a reason.
9.  ; EH: maybe print a message or something?
10. ; EH: it's not like there is other code running here, and we all know that the only way to save registers is pusha
11. ; SMH: come on you even have a string defined for returning from the kernel...
12. ; EH: this is where incbin goes

Several comments refer to the same underlying error and highlight different aspects of it, so let's look at points 1, 3 and 5 together. The code segment register (CS) is combined with the instruction pointer (IP) to locate the next instruction in memory. Right after booting, their values are uncertain: some BIOSes set CS:IP to 07c0:0000 instead of 0000:7c00 (if you're interested in the reason why, read this). Although both pairs resolve to the same physical address, the code is assembled with [org 0x7c00], which assumes CS = 0, so CS should be set explicitly. Only a far jump (jmp segment:offset) reloads CS. The remaining segment registers and the stack pointer are just as undefined: DS is used implicitly when reading [BOOT_DRIVE] (point 3), and SS:SP determines where the stack lives (point 5). They could hold any value, so it's better to set them to fixed values. In summary, the following changes result:

[org 0x7c00]
[bits 16]

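start:
    jmp 0x0000:flushcs  ; far jump: loads CS = 0 and IP = flushcs

flushcs:
    xor ax, ax
    mov ds, ax          ; DS = 0, so [label] accesses match [org 0x7c00]
    mov es, ax
    mov ss, ax          ; loading SS blocks interrupts until after the next instruction,
    mov sp, 0x7c00      ; so SS:SP is set as a pair; the stack grows down below the boot sector
    cld                 ; DF = 0, so lodsb and other string instructions increment SI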
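The address arithmetic behind points 1 and 3 is easy to check by hand. In real mode, a segment:offset pair becomes a physical address as in the first line below, so both CS:IP pairs reach the same byte. The catch is that [org 0x7c00] makes NASM encode every label as an offset from segment 0. Relative jumps and calls only store a distance and work either way, but an absolute address, such as an indirect jump target or a data access like [BOOT_DRIVE] through DS, lands 0x7c00 bytes too far when the segment is 0x07c0 (n stands for a label's distance from the start of the sector):

```latex
\begin{aligned}
\text{physical} &= \text{segment} \times 16 + \text{offset} \\
\texttt{0x07C0} \times 16 + \texttt{0x0000} &= \texttt{0x7C00} = \texttt{0x0000} \times 16 + \texttt{0x7C00} \\
\texttt{0x07C0} \times 16 + (\texttt{0x7C00} + n) &= \texttt{0xF800} + n \neq \texttt{0x7C00} + n
\end{aligned}
```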

Next, I'll focus on point 2. The SMH comment notes that a stub BPB is missing and deducts one point from an imaginary CSM (Compatibility Support Module) compatibility score. In other words, a BIOS Parameter Block (BPB), even one filled with dummy values, must be present for older BIOSes, or for firmware booting through CSM, to boot the image successfully. Additionally, some BIOSes appear to overwrite the memory (and potentially boot sector code) where the BPB is usually located (0x7c03, i.e. byte 3 of the sector). The far jump cannot be placed before the BPB: it is five bytes long (opcode, 2-byte offset, 2-byte segment), but only three bytes precede the BPB, which is why a short jmp plus a nop comes first and the far jump follows the BPB. The BPB is part of the FAT file system. To learn more about this, take a look here and here. Code examples are available here, here and here.

[org 0x7c00]
[bits 16]

start:
    jmp short begin  ; force the 2-byte short form; with the nop, exactly 3 bytes precede the BPB
    nop
    db "        "    ; oem id
    dw 512           ; sector size
    db 0             ; sectors per cluster
    dw 0             ; reserved sectors
    db 0             ; fat count
    dw 0             ; root dir entries
    dw 0             ; sector count
    db 0             ; media type
    dw 0             ; sectors per fat
    dw 18            ; sectors per track
    dw 2             ; heads count
    dd 0             ; hidden sectors
    dd 0             ; sector count big
    db 0             ; drive number
    db 0             ; reserved / flags
    db 0             ; signature
    dd 0             ; volume id
    db "           " ; volume label
    times 8 db 0     ; filesystem type

begin:
    jmp 0x0000:flushcs

flushcs:
    ; ...

Let's continue with point 4. The instructions that load the BP register can be deleted, since BP is never used afterwards. Point 6 was harder to understand. Consider this instruction: mov BX, MSG_REAL_MODE. The suggestion is not to store the starting address of the string MSG_REAL_MODE in BX, which is typically used as a base register for memory addressing. Instead, the address should go into SI (source index), the register from which string instructions such as lodsb and movsb read their source. This can be done with lea SI, [MSG_REAL_MODE]. Inside the print function, lodsb then loads the byte at DS:SI into AL and increments SI (as long as the direction flag is clear). You can read more about the registers here.
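A minimal sketch of such a print routine, assuming the BIOS teletype service (int 0x10 with AH = 0x0e) for output and NUL-terminated strings. It repeats the start-up code from earlier, including a cld, because lodsb only increments SI while the direction flag is clear and the BIOS makes no promise about its state. The BPB is left out for brevity and the message text is made up. The file assembles on its own with nasm -f bin and should boot in an emulator such as QEMU.

```nasm
[org 0x7c00]
[bits 16]

start:
    jmp 0x0000:flushcs       ; normalise CS to 0 (point 1)

flushcs:
    xor ax, ax
    mov ds, ax               ; lodsb reads from DS:SI, so DS must be 0
    mov es, ax
    mov ss, ax
    mov sp, 0x7c00
    cld                      ; DF = 0: lodsb increments SI

    lea si, [MSG_REAL_MODE]  ; SI = offset of the string
    call print

hang:
    hlt                      ; sleep until the next interrupt, then loop
    jmp hang

; print: write the NUL-terminated string at DS:SI via BIOS teletype output
; clobbers AX, BX and SI
print:
    xor bx, bx               ; BH = page 0 (BL only matters in graphics modes)
.next:
    lodsb                    ; AL = byte at DS:SI, then SI = SI + 1
    test al, al              ; NUL terminator reached?
    jz .done
    mov ah, 0x0e             ; teletype output; reloaded each time in case the BIOS clobbers AH
    int 0x10
    jmp .next
.done:
    ret

MSG_REAL_MODE db "Started in 16-bit real mode", 0   ; illustrative text

    times 510 - ($ - $$) db 0   ; pad to 510 bytes
    dw 0xaa55                   ; boot signature
```

mov si, MSG_REAL_MODE would work just as well as lea here; both put the label's offset into SI.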

A tail call can be used to address point 7. Nothing fancy: a call that is immediately followed by ret is replaced by a jmp, so control no longer has to return into load_kernel after disk_load finishes. Below is a minimized visualization of the program flow before and after adding the tail call to disk_load:

# before
start -> load_kernel -> ...
          |
          \-> disk_load

# after
start -> load_kernel -> disk_load -> ...
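One way to express the two diagrams in code is sketched below. load_kernel, disk_load and switch_to_pm are empty stand-ins (hypothetical placeholders, not the real routines), so only the control flow matters. A call that is immediately followed by ret can become a jmp: disk_load's own ret then returns straight to load_kernel's caller. Point 7 suggests the same for switch_to_pm, which never returns, so the call and the jmp $ after it collapse into a single jmp.

```nasm
[org 0x7c00]
[bits 16]

start:
    jmp 0x0000:flushcs

flushcs:
    xor ax, ax
    mov ds, ax
    mov es, ax
    mov ss, ax               ; a valid stack is needed for call/ret
    mov sp, 0x7c00

    call load_kernel         ; control comes back here when disk_load executes ret
    jmp switch_to_pm         ; tail call: switch_to_pm never returns, so no jmp $ after it

load_kernel:
    ; ... set up the registers disk_load expects ...
    ; before:  call disk_load
    ;          ret
    jmp disk_load            ; after: disk_load's ret returns directly to start

disk_load:                   ; hypothetical stand-in for the real int 0x13 read
    ret

switch_to_pm:                ; hypothetical stand-in: the real routine enters protected mode
    cli
.hang:
    hlt
    jmp .hang

    times 510 - ($ - $$) db 0
    dw 0xaa55
```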

Points 8 and 12 can be resolved by adding the following to the end of the code, after the boot signature (dw 0xaa55):

KERNEL_OFFSET:
incbin "kernel/kernel.bin"

This way, the offset doesn't have to be hardcoded. Points 9 and 11 are resolved by simply removing the jmp $ after loading the kernel, as well as the string defined for returning from the kernel. We won't be returning from this kernel.
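Because the kernel image now sits right behind the boot sector, its size is known at assembly time, so the sector count passed to the BIOS can be derived instead of hardcoded. A minimal sketch under a few assumptions: kernel/kernel.bin is a flat binary linked to run at 0x7e00 (where the label ends up, 512 bytes after 0x7c00), the disk is read with CHS via int 0x13 AH = 0x02, and the kernel fits on the first track (at most 17 sectors with the 18 sectors per track declared in the BPB), since some BIOSes refuse reads that cross a track boundary. KERNEL_END and KERNEL_SECTORS are names introduced here for illustration.

```nasm
[org 0x7c00]
[bits 16]

start:
    jmp 0x0000:flushcs

flushcs:
    xor ax, ax
    mov ds, ax
    mov es, ax               ; ES:BX is the int 0x13 read buffer
    mov ss, ax
    mov sp, 0x7c00
    ; DL still holds the boot drive number passed in by the BIOS

    mov bx, KERNEL_OFFSET    ; ES:BX = 0x0000:0x7e00, right behind this sector
    mov ah, 0x02             ; BIOS: read sectors using CHS
    mov al, KERNEL_SECTORS   ; sector count, derived from the incbin size below
    mov ch, 0                ; cylinder 0
    mov dh, 0                ; head 0
    mov cl, 2                ; sector 2 (CHS sectors start at 1; sector 1 is this one)
    int 0x13
    jc hang                  ; CF set: read failed (a real handler would print an error)
    ; success: the real bootloader would tail-call switch_to_pm here

hang:
    hlt
    jmp hang

    times 510 - ($ - $$) db 0
    dw 0xaa55

KERNEL_OFFSET:
incbin "kernel/kernel.bin"
    align 512, db 0          ; pad so the image ends on a sector boundary
KERNEL_END:                  ; hypothetical helper label

KERNEL_SECTORS equ (KERNEL_END - KERNEL_OFFSET) / 512   ; whole sectors to read
```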

With all points except number 10 addressed, I would like to reference the bootboot.s code in OpenBSD again. Some may want to copy the code from lines 267 to 277 and use the "INT 0x13 Extensions Installation Check" to see whether LBA addressing is supported when accessing the disk, or whether CHS addressing is the only option. Read more about it here. I'll use only CHS addressing, since I'm not currently planning to use more than about 8 GB of a boot disk (the usual INT 0x13 CHS limit of 1024 cylinders × 255 heads × 63 sectors × 512 bytes, roughly 8.4 × 10^9 bytes), and I'm okay with the other limitations that apply. In this case, saving the drive number to a separate variable can be omitted, as long as DL still holds the value the BIOS passed in when the disk is read.
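For readers who do want the LBA check, a rough sketch of the INT 0x13 Extensions Installation Check follows. It is not the OpenBSD code, and use_lba and use_chs are hypothetical placeholders. With AH = 0x41, BX = 0x55aa and DL = drive, a BIOS that supports the extensions clears CF, returns BX = 0xaa55, and sets bit 0 of CX if the extended disk access functions (including the LBA read, AH = 0x42) are available.

```nasm
[org 0x7c00]
[bits 16]

start:
    jmp 0x0000:flushcs

flushcs:
    xor ax, ax
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov sp, 0x7c00

    ; INT 0x13 Extensions Installation Check; DL = boot drive from the BIOS
    mov ah, 0x41
    mov bx, 0x55aa
    int 0x13
    jc use_chs               ; CF set: extensions not supported
    cmp bx, 0xaa55           ; BX comes back swapped only if they are installed
    jne use_chs
    test cx, 1               ; bit 0: extended disk access functions available
    jz use_chs

use_lba:                     ; hypothetical: read the kernel with AH = 0x42 here
    jmp hang

use_chs:                     ; hypothetical: read the kernel with AH = 0x02 here
    jmp hang

hang:
    hlt
    jmp hang

    times 510 - ($ - $$) db 0
    dw 0xaa55
```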

The new bootloader code is available here. Let's see what's next!