Top and Bottom Halves in Linux

Top and Bottom Halves

One of the main problems with interrupt handling is how to perform lengthy tasks within a handler. Often a substantial amount of work must be done in response to a device interrupt, but interrupt handlers need to finish up quickly and not keep interrupts blocked for long. These two needs (work and speed) conflict with each other, leaving the driver writer in a bit of a bind.

Linux (along with many other systems) resolves this problem by splitting the interrupt handler into two halves. The so-called top half is the routine that actually responds to the interrupt—the one you register with request_irq. The bottom half is a routine that is scheduled by the top half to be executed later, at a safer time.

The big difference between the top-half handler and the bottom half is that all interrupts are enabled during execution of the bottom half—that’s why it runs at a safer time. In the typical scenario, the top half saves device data to a device-specific buffer, schedules its bottom half, and exits: this operation is very fast.

The bottom half then performs whatever other work is required, such as awakening processes, starting up another I/O operation, and so on. This setup permits the top half to service a new interrupt while the bottom half is still working.

Almost every serious interrupt handler is split this way. For instance, when a network interface reports the arrival of a new packet, the handler just retrieves the data and pushes it up to the protocol layer; actual processing of the packet is performed in a bottom half.

The Linux kernel has two different mechanisms that may be used to implement bottom-half processing, Tasklets are often the preferred mechanism for bottom-half processing; they are very fast, but all tasklet code must be atomic. The alternative to tasklets is workqueues, which may have a higher latency but that are allowed to sleep.


Remember that tasklets are a special function that may be scheduled to run, in software interrupt context, at a system-determined safe time. They may be scheduled to run multiple times, but tasklet scheduling is not cumulative; the tasklet runs only once, even if it is requested repeatedly before it is launched. No tasklet ever runs in parallel with itself, since they run only once, but tasklets can run in parallel with other tasklets on SMP systems. Thus, if your driver has multiple tasklets, they must employ some sort of locking to avoid conflicting with each other.

Tasklets are also guaranteed to run on the same CPU as the function that first schedules them. Therefore, an interrupt handler can be secure that a tasklet does not begin executing before the handler has completed. However, another interrupt can certainly be delivered while the tasklet is running, so locking between the tasklet and the interrupt handler may still be required.

Tasklets must be declared with the DECLARE_TASKLET macro:

DECLARE_TASKLET(name, function, data);

name is the name to be given to the tasklet, function is the function that is called to execute the tasklet (it takes one unsigned long argument and returns void), and data is an unsigned long value to be passed to the tasklet function.

The short driver declares its tasklet as follows:

void short_do_tasklet(unsigned long);
DECLARE_TASKLET(short_tasklet, short_do_tasklet, 0);

The function tasklet_schedule is used to schedule a tasklet for running. If short is loaded with tasklet=1, it installs a different interrupt handler that saves data and schedules the tasklet as follows:

irqreturn_t short_tl_interrupt(int irq, void *dev_id, struct pt_regs *regs)
    do_gettimeofday((struct timeval *) tv_head); /* cast to stop 'volatile' warning */
    short_wq_count++; /* record that an interrupt arrived */
    return IRQ_HANDLED;

The actual tasklet routine, short_do_tasklet, will be executed shortly (so to speak) at the system’s convenience. As mentioned earlier, this routine performs the bulk of the work of handling the interrupt; it looks like this:

void short_do_tasklet (unsigned long unused)
    int savecount = short_wq_count, written;
    short_wq_count = 0; /* we have already been removed from the queue */
     * The bottom half reads the tv array, filled by the top half,
     * and prints it to the circular text buffer, which is then consumed
     * by reading processes

    /* First write the number of interrupts that occurred before this bh */
    written = sprintf((char *)short_head,"bh after %6i\n",savecount);
    short_incr_bp(&short_head, written);

     * Then, write the time values. Write exactly 16 bytes at a time,
     * so it aligns with PAGE_SIZE

    do {
        written = sprintf((char *)short_head,"%08u.%06u\n",
                (int)(tv_tail->tv_sec % 100000000),
        short_incr_bp(&short_head, written);
    } while (tv_tail != tv_head);

    wake_up_interruptible(&short_queue); /* awake any reading process */

Among other things, this tasklet makes a note of how many interrupts have arrived since it was last called. A device such as short can generate a great many interrupts in a brief period, so it is not uncommon for several to arrive before the bottom half is executed. Drivers must always be prepared for this possibility and must be able to determine how much work there is to perform from the information left by the top half.


Recall that workqueues invoke a function at some future time in the context of a special worker process. Since the workqueue function runs in process context, it can sleep if need be. You cannot, however, copy data into user space from a workqueue, the worker process does not have access to any other process’s address space.

The short driver, if loaded with the wq option set to a nonzero value, uses a workqueue for its bottom-half processing. It uses the system default workqueue, so there is no special setup code required; if your driver has special latency requirements (or might sleep for a long time in the workqueue function), you may want to create your own, dedicated workqueue. We do need a work_struct structure, which is declared and initialized with the following:

static struct work_struct short_wq;

    /* this line is in short_init(  ) */
    INIT_WORK(&short_wq, (void (*)(void *)) short_do_tasklet, NULL);


Our worker function is short_do_tasklet, which we have already seen in the previous section.

When working with a workqueue, short establishes yet another interrupt handler that looks like this:

irqreturn_t short_wq_interrupt(int irq, void *dev_id, struct pt_regs *regs)
    /* Grab the current time information. */
    do_gettimeofday((struct timeval *) tv_head);

    /* Queue the bh. Don't worry about multiple enqueueing */

    short_wq_count++; /* record that an interrupt arrived */
    return IRQ_HANDLED;