Monday, September 17, 2012

Linux Page Tables

Hardware-wise, we have a two-level page table structure, where the first
level has 4096 entries, and the second level has 256 entries.  Each entry
is one 32-bit word.  Most of the bits in the second-level entry are used
by hardware, and there aren't any "accessed" and "dirty" bits.

Linux, on the other hand, has a three-level page table structure, which can
be wrapped to fit a two-level page table structure easily, using only the
PGD and PTE levels.  However, Linux also expects one "PTE" table per page,
and at least a "dirty" bit.

Therefore, we tweak the implementation slightly: we tell Linux that we
have 2048 entries in the first level, each of which is 8 bytes (in other
words, two hardware pointers to the second level).  The second level
contains two hardware PTE tables arranged contiguously, preceded by Linux
versions which contain the state information Linux needs.  We therefore
end up with 512 entries in the "PTE" level.

This leads to the page tables having the following layout:

   pgd             pte
|        |
+--------+
|        |       +------------+ +0
+- - - - +       | Linux pt 0 |
|        |       +------------+ +1024
+--------+ +0    | Linux pt 1 |
|        |-----> +------------+ +2048
+- - - - + +4    |  h/w pt 0  |
|        |-----> +------------+ +3072
+--------+ +8    |  h/w pt 1  |
|        |       +------------+ +4096

See L_PTE_xxx below for definitions of bits in the "Linux pt", and
PTE_xxx for definitions of bits appearing in the "h/w pt".

PMD_xxx definitions refer to bits in the first level page table.
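To make the layout concrete, here is a minimal C sketch of how one 32-bit virtual address is split under both views. The macro, struct and function names are hypothetical (they are not the kernel's); the numbers come from the description above, assuming 4 KiB pages and 4-byte entries.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; values follow the layout described above. */
#define SKETCH_PAGE_SHIFT      12   /* 4 KiB pages */
#define SKETCH_HW_PGD_SHIFT    20   /* 4096 hardware L1 entries, 1 MiB each */
#define SKETCH_LX_PGD_SHIFT    21   /* 2048 Linux PGD entries, 2 MiB each */
#define SKETCH_LX_PTRS_PER_PTE 512  /* Linux pt 0 + Linux pt 1 */
#define SKETCH_HW_PTRS_PER_PTE 256  /* one hardware second-level table */
#define SKETCH_ENTRY_SIZE      4    /* one 32-bit word per entry */
#define SKETCH_HW_PT_OFFSET    2048 /* h/w pt 0 starts at +2048 in the PTE page */

struct pt_walk {
    unsigned lx_pgd;  /* index into the 2048-entry Linux PGD */
    unsigned hw_pgd;  /* index into the 4096-entry hardware L1 table */
    unsigned lx_pte;  /* index into the 512-entry Linux PTE level */
    unsigned hw_pt;   /* which hardware table in the page: 0 or 1 */
    unsigned hw_pte;  /* index within that 256-entry hardware table */
    unsigned lx_off;  /* byte offset of the Linux PTE within the page */
    unsigned hw_off;  /* byte offset of the hardware PTE within the page */
};

static struct pt_walk decode_va(uint32_t va)
{
    struct pt_walk w;

    w.lx_pgd = va >> SKETCH_LX_PGD_SHIFT;
    w.hw_pgd = va >> SKETCH_HW_PGD_SHIFT;
    w.lx_pte = (va >> SKETCH_PAGE_SHIFT) & (SKETCH_LX_PTRS_PER_PTE - 1);
    /* Each 8-byte Linux PGD entry holds two hardware L1 pointers; the low
     * bit of the hardware index picks the pointer, and so the h/w pt. */
    w.hw_pt = w.hw_pgd & 1;
    w.hw_pte = (va >> SKETCH_PAGE_SHIFT) & (SKETCH_HW_PTRS_PER_PTE - 1);
    w.lx_off = w.lx_pte * SKETCH_ENTRY_SIZE;
    /* The hardware copy of an entry sits a fixed 2048 bytes after the Linux copy. */
    w.hw_off = SKETCH_HW_PT_OFFSET + w.lx_off;

    /* Both views must agree on which 4 KiB page is being mapped. */
    assert(w.lx_pte == w.hw_pt * SKETCH_HW_PTRS_PER_PTE + w.hw_pte);
    return w;
}
```

For example, decode_va(0xC0312345) gives Linux PGD index 1537 and hardware L1 index 3075; the page is Linux PTE 274, which is entry 18 of h/w pt 1, with the Linux copy at byte 1096 and the hardware copy at byte 3144 of the PTE page.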

The "dirty" bit is emulated by granting hardware write permission if and
only if the page is marked both "writable" and "dirty" in the Linux PTE.
This means that a write to a clean page will cause a permission fault, and
the Linux MM layer will mark the page dirty via handle_pte_fault().
For the hardware to notice the permission change, the TLB entry must
be flushed, and ptep_set_access_flags() does that for us.
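A minimal C model of the dirty-bit emulation: the hardware entry only becomes writable once the Linux entry is both writable and dirty, and the write-fault path sets the dirty bit and recomputes the hardware entry. The bit names and positions are hypothetical, not the kernel's real L_PTE_xxx/PTE_xxx values, and the TLB flush is only marked by a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for this sketch only. */
#define LX_PRESENT  (1u << 0)
#define LX_WRITE    (1u << 1)
#define LX_DIRTY    (1u << 2)

#define HW_VALID    (1u << 0)
#define HW_WRITABLE (1u << 1)

/* Derive the hardware entry from the Linux entry. */
static uint32_t hw_from_linux(uint32_t lx)
{
    uint32_t hw = 0;

    if (!(lx & LX_PRESENT))
        return 0;                  /* not present: any access faults */
    hw |= HW_VALID;                /* readable */
    if ((lx & LX_WRITE) && (lx & LX_DIRTY))
        hw |= HW_WRITABLE;         /* write allowed only once dirty */
    return hw;
}

/* Model of the write-fault path: returns true if the fault was resolved. */
static bool handle_write_fault(uint32_t *lx, uint32_t *hw)
{
    if (!(*lx & LX_PRESENT) || !(*lx & LX_WRITE))
        return false;              /* a genuine protection error */
    *lx |= LX_DIRTY;               /* the MM layer marks the page dirty */
    *hw = hw_from_linux(*lx);
    /* The real kernel would also flush the stale TLB entry here. */
    assert(*hw & HW_WRITABLE);     /* the retried write now succeeds */
    return true;
}
```

Starting from a clean, writable page, the first write faults, handle_write_fault() sets the dirty bit, and from then on the hardware entry carries write permission.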

The "accessed" or "young" bit is emulated by a similar method: we only
allow accesses to the page if the "young" bit is set.  Accesses to a page
whose "young" bit is clear will cause a fault, and handle_pte_fault() will
set the young bit for us as long as the page is marked present in the
corresponding Linux PTE.  Again, ptep_set_access_flags() will ensure that
the TLB is up to date.
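The young bit can be modelled the same way. In this sketch (hypothetical bit names again, not the kernel's) the hardware entry is valid only when the Linux entry is both present and young, and the access-fault path sets young before recomputing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for this sketch only. */
#define LX_PRESENT (1u << 0)
#define LX_YOUNG   (1u << 1)
#define HW_VALID   (1u << 0)

/* Hardware access is granted only to pages that are present and young. */
static uint32_t hw_from_linux(uint32_t lx)
{
    return ((lx & LX_PRESENT) && (lx & LX_YOUNG)) ? HW_VALID : 0;
}

/* Model of the access-fault path: returns true if the fault was resolved. */
static bool handle_access_fault(uint32_t *lx, uint32_t *hw)
{
    if (!(*lx & LX_PRESENT))
        return false;              /* really not mapped: a genuine fault */
    *lx |= LX_YOUNG;               /* record that the page was accessed */
    *hw = hw_from_linux(*lx);      /* the real kernel then updates the TLB */
    assert(*hw & HW_VALID);        /* the retried access now succeeds */
    return true;
}
```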

However, when the "young" bit is cleared, we deny access to the page
by clearing the hardware PTE.  Currently Linux does not flush the TLB
for us in this case, which means the TLB will retain the translation
until either the TLB entry is evicted under pressure, or a context
switch which changes the user space mapping occurs.
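The consequence of not flushing can be shown with a one-entry TLB model. This is a hypothetical simplification: the struct, bit names and functions are illustrative only, and a real TLB holds many entries rather than a single page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bits for this sketch only. */
#define LX_PRESENT (1u << 0)
#define LX_YOUNG   (1u << 1)
#define HW_VALID   (1u << 0)

/* A one-entry TLB caching the hardware entry for a single page. */
struct tlb {
    bool     valid;
    uint32_t hw;
};

/* The MMU consults the TLB first and only walks the table on a miss. */
static bool access_ok(struct tlb *t, const uint32_t *hw_pte)
{
    if (!t->valid) {               /* miss: walk the page table */
        if (!(*hw_pte & HW_VALID))
            return false;          /* fault */
        t->hw = *hw_pte;
        t->valid = true;           /* cache the translation */
    }
    return (t->hw & HW_VALID) != 0;
}

/* Clearing young zeroes the hardware PTE but does not touch the TLB. */
static void clear_young(uint32_t *lx, uint32_t *hw_pte)
{
    *lx &= ~LX_YOUNG;
    *hw_pte = 0;
}

/* Eviction, or a context switch that changes the user mapping, drops the entry. */
static void tlb_flush(struct tlb *t)
{
    t->valid = false;
}
```

After clear_young(), an access still succeeds through the cached entry, so the young bit is not set again until tlb_flush() forces a fresh table walk.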

Sunday, September 16, 2012

Saturday, September 15, 2012

kexec


Kexec is a patch to the Linux kernel that allows you to boot directly into a new kernel from the currently running one. Compared with the normal boot sequence, kexec skips the entire first part (firmware and bootloader) and jumps directly into the kernel that we want to boot. There is no hardware reset, no firmware operation, and no bootloader involved. The weakest link in the boot sequence, the firmware, is completely avoided. The big gain from this feature is that system reboots are extremely fast. For enterprise-class systems, kexec drastically reduces reboot-related downtime. For kernel and system software developers, kexec makes it possible to reboot quickly during development or testing without going through the costly firmware stage every time.
The kexec patch is the work of Eric Biederman, and the project is under active development.
Since this feature touches so many sensitive parts of the operating system, a great deal of care is needed to make it all work properly. The biggest challenge for kexec is that, in Linux, the new kernel must occupy the same place in memory as the currently executing one. Replacing the existing kernel in memory with the new one, while still running in the context of the existing kernel, is a tough task. Another big issue is the state of the devices in the system. Firmware always initializes (or resets) the devices to a known "sane" state; because kexec bypasses the firmware stage, the state of the devices is unreliable.
Subsequent sections of this article show how these challenges are overcome and how direct booting into a new kernel is achieved. Note that, at the time of writing, kexec is available only on the 32-bit x86 platform. Although work is underway to port kexec to other platforms, there is no working version of the code yet. Hence, all technical details in the subsequent sections are specific to the x86 platform.

Kexec has two components. The first is the userspace component known as "kexec-tools." The second is the actual kernel patch. The two parts achieve the two main operations of kexec: loading the new kernel into memory and rebooting into it. Getting a kexec-enabled kernel is simple: download the kexec-tools package and the kernel-specific patch, build the kexec-tools package to obtain the kexec tool, apply the patch to the kernel tree, and then build and boot the patched kernel. Of course, make sure you have selected the CONFIG_KEXEC option while building the kernel.
As mentioned above, using kexec consists of (1) loading the kernel to be rebooted to into memory, and (2) actually rebooting to it. To load a kernel, the syntax is as follows:
kexec -l <kernel-image> --append="<command-line-options>"
where <kernel-image> is the kernel file that you intend to reboot to and <command-line-options> contain the command-line parameters that need to be passed to the new kernel. Because the wrong command-line options can cause problems during the reboot, passing the contents of /proc/cmdline is the safest way to ensure that legal values are passed to the rebooting kernel.
For example, if the kernel image you want to reboot to is /boot/bzImage, and the contents of /proc/cmdline are "root=/dev/hda1", the command to load the kernel would be:
kexec -l /boot/bzImage --append="root=/dev/hda1"
Then, to actually reboot to the loaded kernel, just type:
kexec -e
The system will reboot immediately. Unlike the normal reboot process, kexec does not perform a clean shutdown of the system before rebooting. It is left to you to kill all applications and unmount file systems before attempting a kexec reboot.
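Because the safest --append value is the current contents of /proc/cmdline, the load step is easy to script. A minimal C sketch, assuming the resulting argv is handed to execvp() rather than a shell, so the spaces in the command line need no quoting; the helper names are hypothetical and not part of kexec-tools.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Turn the raw contents of /proc/cmdline into "--append=<cmdline>",
 * dropping the trailing newline. Returns 0 on success, -1 if out is too small. */
static int make_append_arg(const char *raw, char *out, size_t outsz)
{
    size_t len = strcspn(raw, "\n");
    int n = snprintf(out, outsz, "--append=%.*s", (int)len, raw);

    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}

/* Fill argv for execvp("kexec", ...): kexec -l <image> --append=<cmdline> */
static void make_kexec_load_argv(const char *image, const char *append_arg,
                                 const char *argv[5])
{
    assert(image != NULL && append_arg != NULL);
    argv[0] = "kexec";
    argv[1] = "-l";
    argv[2] = image;
    argv[3] = append_arg;   /* one argv element, spaces included */
    argv[4] = NULL;
}
```

After reading /proc/cmdline into a buffer, a caller would run execvp(argv[0], (char *const *)argv) and, once the load succeeds, kexec -e.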
One of the biggest challenges in the development of kexec comes from the fact that the Linux kernel runs from a fixed address in memory. This means that the new kernel needs to sit at the same place that the current kernel is running from. On x86 systems, the kernel sits at physical address 0x100000 (virtual address 0xc0100000, that is, PAGE_OFFSET + 0x100000, where PAGE_OFFSET is 0xc0000000). The task of overwriting the old kernel with the new one is done in three stages:
  1. Copy the new kernel into memory.
  2. Move this kernel image into dynamic kernel memory.
  3. Copy this image into the real destination (overwriting the current kernel), and start the new kernel.
The first two stages are achieved during the "loading" of the kernel. The first task here is to interpret the contents of the kernel image file. Kexec-tools has been built so that, in principle, you could load and boot any kernel (even a non-Linux one). Currently, it is possible to boot any elf32-format kernel image. The file is parsed and the kernel "segments" are loaded into buffers. These segments are categorized based on the nature of their contents. For example, in the case of the commonly used "bzImage" kernel file format, the typical segments are 16-bit kernel code, 32-bit kernel code, and the initial ramdisk. The structure used to track these segments is known as kexec_segment and is a fairly simple structure:

Listing 1. The kexec_segment structure
struct kexec_segment {
   void *buf;
   size_t bufsz;
   void *mem;
   size_t memsz;
};

The first two elements of the structure point to the userspace buffer and its size, while the next two elements indicate the final destination of the segment and its size.
Once the format-specific loader in kexec-tools has read the image into user memory, the image is transferred to dynamic kernel memory through the sys_kexec system call. This system call allocates dynamic kernel pages for each of the segments passed from userspace and copies the segments into these kernel pages.
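The copy into kernel pages can be modelled in user space. In this sketch calloc() stands in for the kernel page allocator, and two rules are assumptions borrowed from the mainline kexec_load() interface rather than stated in this article: bufsz may not exceed memsz, and the bytes between bufsz and memsz end up zero-filled. stage_segment() is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct kexec_segment {          /* as in Listing 1 */
    void *buf;
    size_t bufsz;
    void *mem;
    size_t memsz;
};

#define SKETCH_PAGE_SIZE 4096u

/* Copy one userspace segment into a freshly allocated, page-rounded buffer.
 * Returns the buffer (caller frees) or NULL on a bad segment or no memory. */
static unsigned char *stage_segment(const struct kexec_segment *seg,
                                    size_t *staged_len)
{
    size_t len;
    unsigned char *kbuf;

    assert(staged_len != NULL);
    if (seg->memsz == 0 || seg->bufsz > seg->memsz)
        return NULL;                           /* more data than room */
    len = (seg->memsz + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE
          * SKETCH_PAGE_SIZE;                  /* round up to whole pages */
    kbuf = calloc(1, len);                     /* pages start out zeroed */
    if (kbuf == NULL)
        return NULL;
    memcpy(kbuf, seg->buf, seg->bufsz);        /* the tail stays zero */
    *staged_len = len;
    return kbuf;
}
```

A 3-byte buffer with a 5000-byte destination, for instance, is staged into two 4 KiB pages whose bytes after the third are all zero.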
Kexec also allocates a kernel page to store a small stub of assembly code, known as the reboot_code_buffer. This stub does the actual job of overwriting the current kernel with the to-be-rebooted kernel and then jumps to it. The reboot_code_buffer is the only buffer that resides in its final resting place. In other words, it is executed from the same place that it is initially loaded to. To achieve this on systems with the MMU enabled, the page holding the code is identity mapped. Simply put, this involves creating a page table entry in init_mm (the kernel's page table structure) with the same physical and virtual address. This is necessary so that the code remains accessible during the reboot operation, as discussed later.
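Why the identity mapping matters can be shown with a toy page table. In the normal kernel mapping a virtual address differs from its physical address by PAGE_OFFSET (0xc0000000 is assumed here, the usual x86 3 GB/1 GB split), so code running there would suddenly be fetching from somewhere else once paging is switched off. An identity-mapped page translates to itself, so the instruction pointer stays valid across the switch. The small mapping list and all function names are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT  12
#define SKETCH_PAGE_OFFSET 0xc0000000u   /* assumed x86 3G/1G split */
#define MAX_MAPPINGS       8

struct mapping { uint32_t vpn, ppn; };   /* virtual -> physical page number */

static struct mapping maps[MAX_MAPPINGS];
static unsigned nr_maps;

static void map_page(uint32_t va, uint32_t pa)
{
    assert(nr_maps < MAX_MAPPINGS);
    maps[nr_maps].vpn = va >> SKETCH_PAGE_SHIFT;
    maps[nr_maps].ppn = pa >> SKETCH_PAGE_SHIFT;
    nr_maps++;
}

/* Translate with paging enabled; returns false if the address is unmapped. */
static bool translate(uint32_t va, uint32_t *pa)
{
    for (unsigned i = 0; i < nr_maps; i++)
        if (maps[i].vpn == va >> SKETCH_PAGE_SHIFT) {
            *pa = (maps[i].ppn << SKETCH_PAGE_SHIFT)
                  | (va & ((1u << SKETCH_PAGE_SHIFT) - 1));
            return true;
        }
    return false;
}

/* Normal kernel (linear) mapping: va = pa + PAGE_OFFSET. */
static void map_kernel_page(uint32_t pa) { map_page(pa + SKETCH_PAGE_OFFSET, pa); }

/* Identity mapping: va == pa. */
static void identity_map_page(uint32_t pa) { map_page(pa, pa); }

/* Code keeps running when paging is turned off only if its address
 * means the same thing with and without translation. */
static bool survives_paging_off(uint32_t pc)
{
    uint32_t pa;

    return translate(pc, &pa) && pa == pc;
}
```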
Information about the reboot_code_buffer, the various segments, and other details is maintained through the use of the kimage structure:

Listing 2. The kimage structure
struct kimage {
        kimage_entry_t head;
        kimage_entry_t *entry;
        kimage_entry_t *last_entry;

        unsigned long destination;
        unsigned long offset;

        unsigned long start;
        struct page *reboot_code_pages;

        unsigned long nr_segments;
        struct kexec_segment segment[KEXEC_SEGMENT_MAX+1];

        struct list_head dest_pages;
        struct list_head unuseable_pages;
};

The most important parts of this structure are, of course, the segment[KEXEC_SEGMENT_MAX+1] elements, which point to the buffers in kernel memory containing the image, and the reboot_code_pages pointer to the assembly stub used during reboot.
Once the kernel image has been loaded, the system is ready to reboot into it. The actual reboot into the new kernel starts with the kexec -e command. This command essentially asks the kernel to perform a reboot using the sys_reboot system call, but with the special flag LINUX_REBOOT_CMD_KEXEC.
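From user space, the execute step comes down to a single reboot call. A minimal sketch using glibc's reboot() wrapper and the LINUX_REBOOT_CMD_KEXEC constant from <linux/reboot.h>; it assumes a kernel has already been loaded with kexec -l and that the caller has the CAP_SYS_BOOT capability. The function name is hypothetical, and the sync() call is a precaution added here; it is no substitute for stopping applications and unmounting file systems yourself.

```c
#include <assert.h>         /* static_assert (C11) */
#include <linux/reboot.h>   /* LINUX_REBOOT_CMD_KEXEC */
#include <sys/reboot.h>     /* glibc reboot(int) */
#include <unistd.h>         /* sync() */

/* The magic value spells "EXEC" in ASCII. */
static_assert(LINUX_REBOOT_CMD_KEXEC == 0x45584543, "unexpected KEXEC command");

/* Jump into the previously loaded kernel. On success this never returns;
 * on failure it returns -1 with errno set (e.g. EPERM without CAP_SYS_BOOT). */
int kexec_reboot(void)
{
    sync();                                   /* flush dirty buffers first */
    return reboot(LINUX_REBOOT_CMD_KEXEC);
}
```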
The reboot system call, upon seeing the special flag, transfers control to the machine_kexec() function. The actions performed by machine_kexec() are extremely architecture-specific. In the current x86 implementation, the sequence of actions is as follows:
  1. To access the identity-mapped reboot_code_buffer, switches from the current process's mm struct to using the kernel's init_mm structure.
  2. Stops the APICs and disables interrupts.
  3. Copies the assembly stub code into the reboot_code_buffer that was allocated during the loading of the kernel image. The assembly code is found in the relocate_new_kernel routine.
  4. Loads all the segment registers with the kernel data segment (__KERNEL_DS) value, and invalidates the GDT and IDT.
  5. Jumps to the code in the reboot_code_buffer and passes it some vital information as parameters, such as the indirection page containing the source/destination addresses of the kernel image, the starting address of the new kernel, the address of the reboot_code_buffer page, and a flag indicating whether the system has physical address extension (PAE) enabled.
The assembly stub code performs the following operations:
  • Reads the arguments from the stack, stores them in registers, and disables interrupts.
  • Using the address of its own page, which has been passed to it as an argument, sets up a stack at the end of that page.
  • Stores the starting address of the new kernel image onto the stack so that a return from the stub code automatically takes the system to the new kernel image.
  • Disables paging by clearing the paging (PG) bit in the cr0 register.
  • Resets the cr4 control register to zero.
  • Flushes the Translation Lookaside Buffers (TLBs) by writing zero to cr3, the page directory base register.
  • Copies all the kernel image pages onto their final destination pages.
  • Flushes the TLB once again.
  • Resets all the registers to zero, except the stack pointer register esp (as it is pointing to the stack containing the starting address of the new kernel).
  • "Returns" from the stub code. This automatically takes the system to the new kernel.
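The copy loop in the stub can be modelled in C. The article does not spell out how the source/destination list in the indirection page is encoded; this sketch assumes the encoding used by mainline kexec, where each entry is a page address with the flags IND_DESTINATION, IND_INDIRECTION, IND_DONE and IND_SOURCE in its low bits. Memory is simulated with a small array of tiny pages, entries hold page indices instead of physical addresses, and an indirection entry simply points to another position in the same list array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flag values as in mainline kexec's IND_* definitions. */
#define IND_DESTINATION 0x1
#define IND_INDIRECTION 0x2
#define IND_DONE        0x4
#define IND_SOURCE      0x8

typedef uintptr_t kimage_entry_t;

/* Simulation parameters (hypothetical): tiny pages, indices instead of addresses. */
#define SIM_PAGE_SIZE 64
#define SIM_NR_PAGES  16
#define ENTRY_SHIFT   4
#define ENTRY(idx, flag) (((kimage_entry_t)(idx) << ENTRY_SHIFT) | (flag))

static unsigned char sim_mem[SIM_NR_PAGES][SIM_PAGE_SIZE];

/* Walk the entry list the way the stub does: set the destination, follow
 * indirections, and copy each source page to the next destination page. */
static void relocate_pages(const kimage_entry_t *list)
{
    const kimage_entry_t *ptr = list;
    size_t dest = 0;

    for (;;) {
        kimage_entry_t e = *ptr++;
        size_t idx = (size_t)(e >> ENTRY_SHIFT);

        if (e & IND_DESTINATION) {
            dest = idx;
        } else if (e & IND_INDIRECTION) {
            ptr = list + idx;          /* continue in the next list page */
        } else if (e & IND_DONE) {
            break;
        } else if (e & IND_SOURCE) {
            assert(dest < SIM_NR_PAGES && idx < SIM_NR_PAGES);
            memcpy(sim_mem[dest], sim_mem[idx], SIM_PAGE_SIZE);
            dest++;                    /* destination pages are contiguous */
        }
    }
}
```

With a list of DESTINATION(8), SOURCE(1), INDIRECTION(4), and then SOURCE(2), DONE at position 4, pages 1 and 2 end up in pages 8 and 9.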
After this sequence completes, the new kernel takes control and the system is booted up normally.

Systems with high availability requirements and kernel developers who have to constantly reboot their systems will benefit most from kexec. Because kexec skips the most time-consuming parts of system reboot, namely the firmware stage, reboots are extremely quick and availability is increased.
Kexec also has interesting applications in crash dumping tools. The Linux Kernel Crash Dumps (LKCD) project has used kexec to develop a different dumping mechanism. On a system panic or a user-initiated dump, the system memory image is compressed and stored in available free memory pages. Next, the system is rebooted into another kernel using kexec. This new kernel is told where the dump is stored and reserves those memory regions so that nothing else uses them. Subsequently, the memory dump can be written out either to a disk partition or across the network to a different machine.
The key to this design is the fact that by avoiding the firmware stage during reboot, LKCD is able to prevent the physical memory contents from being erased by the firmware. In a crash situation, LKCD also does not have to depend on an unreliable disk or network device driver to write out the memory image to the destination. Once a reboot has been performed and the system is in a reliable state, the dump is written out to the destination using normal system device drivers.
At the time of writing, kexec is available only on the 32-bit x86 platform. Having it on other architectures such as PPC64 and AMD64 would be helpful. Also, better integration with the shutdown interface for graceful termination of processes, shutdown of devices, and unmounting of file systems would make it much more convenient for the average user.
You can contribute to the development of kexec. To get started, try out kexec on a test system. You can also join the "fastboot" mailing list, where all the technical discussions about the project take place.

Thursday, September 13, 2012

Kernel Timers


Interrupt IDs 0-31 -> internal
Interrupt IDs 32-255 -> external devices

Interrupt IDs 0-15 -> inter-processor interrupts (IPIs)

ID 27 -> global timer
ID 29 -> private (local) timer
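The ID ranges above can be captured in a small classifier. In GIC terminology, IDs 0-15 are software-generated interrupts (SGIs, used for IPIs), 16-31 are private peripheral interrupts (PPIs) and 32 and up are shared peripheral interrupts (SPIs). The enum and function names are hypothetical, and the 255 upper bound simply follows these notes.

```c
#include <assert.h>

enum irq_class {
    IRQ_IPI,            /* 0-15: inter-processor interrupts */
    IRQ_GLOBAL_TIMER,   /* 27 */
    IRQ_LOCAL_TIMER,    /* 29 */
    IRQ_INTERNAL,       /* other internal IDs in 16-31 */
    IRQ_EXTERNAL,       /* 32-255: external devices */
    IRQ_INVALID,
};

static enum irq_class classify_irq(unsigned id)
{
    if (id <= 15)
        return IRQ_IPI;
    if (id == 27)
        return IRQ_GLOBAL_TIMER;
    if (id == 29)
        return IRQ_LOCAL_TIMER;
    if (id <= 31)
        return IRQ_INTERNAL;
    if (id <= 255)
        return IRQ_EXTERNAL;
    return IRQ_INVALID;
}
```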

ARM Global Timer

Interrupts all the CPUs using ID 27.


ARM Local Timer

Interrupts only the local CPU using ID 29.

The local timer is not supported on OMAP4 ES2.1.

Rescheduling interrupts are delivered as IPIs.

Monday, September 10, 2012

ARM TrustZone

ARM

7 execution modes

Secure/Non-secure state

Monitor mode

Vector base register (VBAR)


To provide separate exception handling for each world, a TrustZone-enabled processor implements three sets of exception vector tables: one for the Normal world, one for the Secure world, and one for Monitor mode.

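If high vectors are enabled (that is, the V bit is set in the CP15 System Control Register), exceptions taken in the Secure or Non-secure state use the vector table at 0xFFFF0000 regardless of the value of VBAR. Monitor mode is not affected by the V bit: its vector table base is always taken from the Monitor Vector Base Address Register (MVBAR).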
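A minimal sketch of how the vector address is chosen, assuming the ARMv7 Security Extensions behaviour in which the Secure and Non-secure worlds each have their own banked V bit and VBAR, while Monitor mode always uses MVBAR. The enum and function names are hypothetical; the offsets are the standard ARMv7 vector table offsets.

```c
#include <assert.h>
#include <stdint.h>

enum world { WORLD_NONSECURE, WORLD_SECURE, WORLD_MONITOR };

/* Offsets within a vector table (ARMv7). */
enum vector_offset {
    VEC_RESET   = 0x00,
    VEC_UNDEF   = 0x04,
    VEC_SVC_SMC = 0x08,   /* SVC in Secure/Non-secure tables, SMC in the monitor table */
    VEC_PABT    = 0x0C,
    VEC_DABT    = 0x10,
    VEC_IRQ     = 0x18,
    VEC_FIQ     = 0x1C,
};

/* sctlr_v and vbar are the banked values for the world taking the exception. */
static uint32_t vector_address(enum world w, int sctlr_v, uint32_t vbar,
                               uint32_t mvbar, enum vector_offset off)
{
    uint32_t base;

    if (w == WORLD_MONITOR)
        base = mvbar;             /* Monitor mode: MVBAR, V bit ignored */
    else if (sctlr_v)
        base = 0xFFFF0000u;       /* high vectors override VBAR */
    else
        base = vbar;
    return base + (uint32_t)off;
}
```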


ARM Features


Each core has the following features:
  • ARMv7 CPU at 600 MHz
  • 32 KB of L1 instruction cache with parity check
  • 32 KB of L1 data cache with parity check
  • Embedded FPU for single- and double-precision scalar floating-point operations
  • Memory management unit (MMU)
  • ARM, Thumb-2 and ThumbEE instruction set support
  • TrustZone© security extension
  • Program Trace Macrocell and CoreSight© component for software debug
  • JTAG interface
  • AMBA© 3 AXI 64-bit interface
  • 32-bit timer with 8-bit prescaler
  • Internal watchdog (also usable as a timer)


The dual-core configuration is completed by a common set of components:
  • Snoop control unit (SCU) to manage inter-process communication, cache-to-cache and system memory transfers, and cache coherency
  • Generic interrupt controller (GIC) configured to support 128 independent interrupt sources with software-configurable priority and routing between the two cores
  • 64-bit global timer with 8-bit prescaler
  • Asynchronous accelerator coherency port (ACP)
  • Parity support to detect internal memory failures during runtime
  • 512 KB of unified 8-way set-associative L2 cache with support for parity check and ECC
  • L2 cache controller based on the PL310 IP released by ARM
  • Dual 64-bit AMBA 3 AXI interface with possible filtering on the second one to use a single port for DDR memory access
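The L2 figures above are enough to work out the cache geometry. A minimal sketch; the 32-byte line size is an assumption (it is the PL310's line length, but the list above does not state it), and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct cache_geom {
    uint32_t size, ways, line;          /* bytes, associativity, bytes */
    uint32_t sets, offset_bits, index_bits;
};

static uint32_t log2_u32(uint32_t x)
{
    uint32_t n = 0;

    assert(x != 0 && (x & (x - 1)) == 0);   /* powers of two only */
    while (x >>= 1)
        n++;
    return n;
}

static struct cache_geom cache_geometry(uint32_t size, uint32_t ways, uint32_t line)
{
    struct cache_geom g = { size, ways, line, 0, 0, 0 };

    g.sets = size / (ways * line);      /* 512 KiB / (8 * 32 B) = 2048 */
    g.offset_bits = log2_u32(line);
    g.index_bits = log2_u32(g.sets);
    return g;
}

/* Split a physical address into tag, set index and byte offset in the line. */
static void split_addr(const struct cache_geom *g, uint32_t pa,
                       uint32_t *tag, uint32_t *set, uint32_t *off)
{
    *off = pa & (g->line - 1);
    *set = (pa >> g->offset_bits) & (g->sets - 1);
    *tag = pa >> (g->offset_bits + g->index_bits);
}
```

With these numbers the cache has 2048 sets, so a 32-bit address splits into a 16-bit tag, an 11-bit set index and a 5-bit line offset.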


TEX[2:0]  C  B  Description
000       0  0  Strongly ordered
000       0  1  Shareable device
000       1  0  Outer and inner write-through, no write-allocate
000       1  1  Outer and inner write-back, no write-allocate
001       0  0  Outer and inner non-cacheable
001       0  1  Reserved
001       1  0  IMPLEMENTATION DEFINED
001       1  1  Outer and inner write-back, write-allocate
010       0  0  Non-shareable device
010       0  1  Reserved
010       1  -  Reserved
011       -  -  Reserved
1BB       A  A  Cacheable memory; outer = BB (TEX[1:0]), inner = AA (C, B)


AA/BB  Attribute
00     Non-cacheable
01     Write-back, write-allocate
10     Write-through, no write-allocate
11     Write-back, no write-allocate
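A small decoder for these encodings, assuming TEX remap is disabled and following the ARM Architecture Reference Manual convention in which BB (TEX[1:0]) is the outer policy and AA (C, B) the inner policy. The function and table names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* AA/BB cache policy encodings from the second table. */
static const char *const cache_policy[4] = {
    "Non-cacheable",
    "Write-back, write-allocate",
    "Write-through, no write-allocate",
    "Write-back, no write-allocate",
};

/* Rows of the first table for TEX[2] == 0, indexed by TEX[1:0]:C:B. */
static const char *const tex0_encoding[16] = {
    "Strongly ordered", "Shareable device",
    "Outer and inner write-through, no write-allocate",
    "Outer and inner write-back, no write-allocate",
    "Outer and inner non-cacheable", "Reserved",
    "IMPLEMENTATION DEFINED", "Outer and inner write-back, write-allocate",
    "Non-shareable device", "Reserved", "Reserved", "Reserved",
    "Reserved", "Reserved", "Reserved", "Reserved",
};

/* Describe a TEX/C/B combination (TEX remap disabled) into out. */
static void describe_tex_cb(unsigned tex, unsigned c, unsigned b,
                            char *out, size_t outsz)
{
    assert(tex < 8 && c < 2 && b < 2);
    if (tex & 4) {
        unsigned outer = tex & 3;        /* BB = TEX[1:0] */
        unsigned inner = (c << 1) | b;   /* AA = C:B */
        snprintf(out, outsz, "Cacheable; outer: %s; inner: %s",
                 cache_policy[outer], cache_policy[inner]);
    } else {
        snprintf(out, outsz, "%s", tex0_encoding[(tex << 2) | (c << 1) | b]);
    }
}
```

For example, TEX = 101, C = 1, B = 0 describes memory that is write-back, write-allocate in the outer cache and write-through, no write-allocate in the inner cache.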

Saturday, September 8, 2012

Build errors when building for the PandaBoard


arch/arm/mach-omap2/omap-headsmp.S: Assembler messages: 
arch/arm/mach-omap2/omap-headsmp.S:36: Error: selected processor does not support ARM mode `smc #0'

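If you get this error message for a .S file, add the line below to that .S file. Newer versions of the GNU assembler only accept the smc instruction once the Security Extensions are explicitly enabled, which is what this directive does.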
.arch_extension sec