Building an OS Kernel

During the Fall 2024 semester, Qinghan Chen and I implemented a small Unix-like operating system kernel for CMU’s 15-410 OS course. The project required building each core subsystem—from virtual memory to scheduling to system calls—directly on top of x86 hardware architecture. This post summarizes the concepts we worked with and the reasoning behind several of our design choices.

Part 1: Core Concepts

Virtual Memory and Paging

Our starting point was establishing a functional virtual memory subsystem. On x86-32, the paging mechanism divides a virtual address into three fields: a 10-bit page directory index, a 10-bit page table index, and a 12-bit page offset. The CR3 register stores the physical address of the active page directory, and the processor uses this structure to locate the physical frame corresponding to a virtual address.

Each page directory and page table entry includes not only an address but also permission and status bits such as present, read/write, and user/supervisor. These bits combine across levels, so an accessible user page requires consistent permissions at both the directory and table entries. Implementing paging correctly meant ensuring every address translation path had valid and coherent metadata.

Context Switching and Scheduling

To support multiple runnable threads, the kernel must save and restore execution context predictably. Our context structure includes all general-purpose registers, segment registers, the values automatically pushed by the processor during an interrupt, and the control registers CR2 and CR3.

Context switches occur either preemptively (via timer interrupts) or voluntarily (via system calls such as yield() or deschedule()). In both cases, the kernel saves the current thread’s stack pointer, identifies a new thread to run, loads its saved state, and resumes execution. Maintaining a uniform stack layout allowed us to use a single context-restore path regardless of how the switch was initiated.

Interrupt Handling and Privilege Transitions

Interrupts are central to both preemption and device interaction. The processor dispatches an interrupt using the IDT and, when transitioning from user mode, switches to the kernel stack designated in the Task State Segment. It saves user-mode SS, ESP, EFLAGS, CS, and EIP, and for certain faults, an error code.

Handlers run with full kernel privileges. On completion, iret restores the saved state and returns to the appropriate privilege level. Ensuring that our handlers and wrapper assembly preserved stack invariants was essential; even small deviations could cause unrecoverable faults on return.

System Call Interface

We implemented more than twenty system calls covering threading, process management, memory operations, and basic I/O. Each call is invoked using a designated software interrupt. The handler retrieves arguments from user registers, validates them, and performs the requested operation.

Argument validation is essential for robustness. User programs can present invalid pointers or malformed data, so we implemented controlled read/write helpers that check addresses against user-space bounds before the kernel dereferences them.

Device Drivers

The kernel includes simple drivers for the timer, keyboard, and console:

  • Timer: Programs the PIT to generate periodic interrupts, updates a global tick count, and notifies the scheduler.
  • Keyboard: Reads scan codes from the controller, translates them to characters, and stores them in a buffer for system calls like readline() and getchar().
  • Console: Writes to VGA text-mode memory and reads cursor state from CRT controller ports.

These drivers operate through explicit I/O port instructions and operate at interrupt level, requiring careful synchronization.

Part 2: Design Decisions

Dual Page Directory Structure

To manage page tables safely, we used two page directories:

  1. A “sudo” directory used exclusively by the kernel for modifying page tables.
  2. Per-process page directories for execution.

The sudo directory identity-maps all physical memory into a consistent kernel virtual space. When updating user page tables, the kernel switches temporarily to this directory, modifies the structures, and switches back. This avoids reliance on recursive mappings and guarantees that page table memory is always reachable.

Kernel space is mapped identically across all page directories, allowing the same kernel code and data to execute regardless of which process is active.

Round-Robin Scheduling with Separate Queues

Our scheduler uses round-robin scheduling with three distinct queues—runnable, descheduled, and blocked—each backed by hash-indexed lists for efficient lookup. Threads move between queues based on system call activity or I/O events.

Thread selection follows a simple policy: the head of the runnable queue is executed, then rotated to the end at each preemption point. An idle thread (TID 0) runs when no user threads are available, simplifying the control flow.

Thread and Process Metadata

We maintained per-thread and per-process records using hash tables. Thread metadata includes identifiers, the kernel stack base and current SP, and any user-configured exception handlers. Process metadata includes PID, parent PID, thread membership, child process status, and wait queues for wait().

Each thread receives its own kernel stack. The stack pointer saved during a context switch allows the kernel to resume execution reliably.

Physical Memory Allocation

The course provides an LMM (List-based Memory Manager) for physical memory allocation. The kernel uses lmm_alloc_page() and lmm_free_page() to manage frames and page tables. A global mutex protects LMM operations since they operate on shared state.

The allocator resides entirely in kernel-mapped memory, ensuring it remains accessible even if the process page directory changes.

Context Representation

We enforced a specific stack layout in our interrupt wrappers to ensure consistency. After PUSHA saves general-purpose registers, we push segment registers and selected control registers. The context-switch routine saves the current ESP and restores the next thread’s ESP before executing POPA and iret.

Treating all interrupts—timer, system call, or otherwise—uniformly simplified the switching logic and reduced special-case paths.

Error Handling

Assertions were used extensively during development to detect logic errors early. The kernel halts with diagnostic output if an invariant is violated. For production behavior, however, system call handlers validate all user inputs and return error codes rather than performing unsafe operations.

Conclusion

Building the kernel required reasoning about software and hardware at the same level of detail—tracking how each instruction, register change, and memory access would behave on the actual processor. A line from Alan Perlis summarized the experience particularly well: “To understand a program you must become both the machine and the program.” It was a fitting way to close the course, and one I found especially resonant after working through the system from the ground up.

Explore Next

SyncLeet Chrome Extension

Other Projects