Tuesday, June 28, 2011

Writing a C library, part 2

This is part two in a series of blog-posts about best practices for writing C libraries. Previous installments: part one.

Event handling and the main loop

Event-driven applications, especially GUI applications, are often built around the idea of a "main loop" that intercepts and dispatches all sorts of events such as key/button presses, incoming IPC calls, mouse movements, timers and file/socket I/O and so on. The main loop typically "call back" into the application whenever an event happens.

For example, the GLib/Gtk+ library stack is built around the GMainContext, GMainLoop and GSource types and other library stacks provide similar abstractions. Many kernel and systems-level programmers often look funny at GUI programmers when they utter the word "main loop" - much the same way GUI programmers stare confused at kernel programmers when they say put() or get() something. The truth is that a main-loop is really a well-known concept with a different name: it's basically an abstraction of OS primitives such as select(2), poll(2) or equivalent on e.g. Windows.

It is important to note that a multi-threaded application may run different main loops in different threads to ensure that callbacks happen in the right threads - in GLib this is achieved by using the g_main_context_push_thread_default() function which records the main loop for the current thread in thread-local storage. This variable is in turn read when starting an asynchronous operation (such as g_dbus_connection_call() or g_input_stream_read_async()) to ensure that the passed callback function is invoked in a thread running a main loop for the context set with g_main_context_push_thread_default() earlier.

Some main loops, for example the GLib one, allows creating recursive main loops and this is used to implement GtkDialog's run() method. While this indeed appears to block the calling thread, it is important to note that events are still being processed (to e.g. process input events and redraw animations). Specifically, this means that the functions (plural since applies to everything in the call stack) that brought up the dialog might end getting called again (from a callback). Thus, when using functions like gtk_dialog_run() you need to ensure that your functions are either re-entrant or that they are guaranteed to not get called when the dialog is showing (typically achieved by making the dialog modal so the UI action triggering the display of the dialog can't be accessed). Because of pitfalls like this, you must clearly document if a function is using a recursive main loop.

Note that main loops are not a GUI-only concept - a lot of daemons (e.g. background process without any GUI) are built around this concept since it nicely integrates events from any source whether they are file descriptor based or synthetic such as timers or logging events. In fact, a considerable part of the system-level software on a modern Linux system is built on top of GLib and uses its main event loop abstraction to dispatch events - most of the time such daemons sit idle in one or more main loops and wait for D-Bus messages to arrive (to service a client), a timeout to fire (maybe to kick off periodic house-keeping tasks) or a child process to terminate (when using a helper program or process to do work).

Now that we've explained what a main loop is (or rather, what the idea of a main loop is), let's look at why this matters if you a writing a C library. First of all, if your library doesn't need to deliver events to users, you don't need to worry about main loops. Most libraries, however, are not that simple - for example, libudev delivers events when devices are plugged or changed, NetworkManager wants to inform of changes in networking and so on.

If your library is using GLib, it is often suitable to just require that the user runs the GLib main loop (if the application is using another main loop, it can either integrate the GLib main loop (like Qt does) or run it in a separate thread) and use g_main_context_get_thread_default() when setting up a callback. This is the way many GObject-based libraries, such as libpolkit-gobject-1libnm-glib or libgudev work - for example, callbacks connected to the GUdevClient::uevent signal will be called in what was the thread-default main loop when the object was constructed. For a shared resource, such as a message bus connection, a good policy is that callbacks happen in what was the the thread-default main loop when the method was called (see e.g. g_dbus_connection_signal_subscribe() where this is the case) since applications or libraries have no absolute control of when the shared object was created. In any case, functions dealing with callbacks must always document in what context the callback happens in.

On the other hand, if your library is not using GLib, a good way to provide notification is simply to a) provide a file descriptor that e.g. turns readable when there are events to process; and b) provide a function that processes events (and possibly invoke callback functions registered by the user). A good example of this is libudev's udev_monitor_get_fd() and udev_monitor_receive_device() functions. This way the application (or the library using your library) can easily control what thread the event is handled in. As an example of how libudev is integrated into the GLib main loop, see here and here. In the libudev case, the returned file descriptor is the underlying netlink socket used to receive the event from udevd (via the kernel); in the case that there are no natural file descriptor (could be the event is only happening in response to a certain entry in, say, a log file), your library could use pipe(2) (or eventfd(2) if on Linux) and use a private worker thread to signal the other end.

If your library provide callback functions, make sure they take user_data arguments so the user can easily associate callbacks with other objects and data. If the scope of your callback is undefined (e.g. may fire more than once or if there is no way to disconnect the callback), also provide a way to free the user_data pointer when it is no longer needed - otherwise the application will leak data that needs to be freed later. See g_bus_watch_name() and  for an example.

Checklist

  • Provide APIs for main loop integration
  • Make sure callback functions take user_data arguments (possibly with free an accompanying free function)

Synchronous and Asynchronous I/O

It is important for users of a library to know if calling a function involves doing synchronous I/O (also called blocking I/O). For example, an application with an user interface need to be responsive to user input and may even need to update the user interface every frame for smooth animations (e.g. 60 times a second). To avoid unresponsive applications and jerky animations, its UI thread must never call any functions that does any synchronous I/O.

Note that even loading a file from local disk may block for a very long amount of time - sometimes tens of seconds. For example the file may not be in the page cache and the hard disk with the file system may be powered down - or the file may be in the users home directory which could be on a network filesystem such as NFS. Other examples of blocking IO includes local IPC such as D-Bus or Unix domain sockets.

If an operation is known to take a long time to complete (synchronous or otherwise), it is often nice if it is possible to easily cancel the it (perhaps from another thread). For example, see the GCancellable type in the GLib stack. Another nicety (although easily implemented via the GCancellable type) is a way to set a timeout for potentially long-running operations - see e.g. g_dbus_connection_send_message_with_reply() and g_dbus_proxy_call() and note how the latter has an object-wide timeout so the timeout only has to be set once.

Some libraries provide both synchronous and asynchronous versions of a function where the former blocks the calling thread and the latter doesn't. Typically asynchronous I/O is implemented using worker threads (where the worker thread is doing synchronous I/O) but it could also involve communicating with another process via IPC (e.g. D-Bus) or even TCP/IP. For example, in the libgio-2.0 case asynchronous file I/O is implemented via synchronous IO (e.g. basically read(2) and write(2) calls) in worker threads (using a GThreadPool) simply because the Linux kernel does not (yet?) provide an adequate way to do asynchronous I/O suitable for libraries (see also: colorful notes about Asynchronous I/O). On the upside, this is mostly an implementation detail and the libgio-2.0 implementation can migrate to a non-threaded approach should such a mechanism be made available in the future.

Asynchronous I/O typically involves callbacks (or at least some kind of event notification) and thus involves a main loop. If a library provides functions for this, it should clearly state what thread the callback will happen in, and whether it requires the application to run a (specific kind of) main loop - see the previous section about main loops for details.

If a library is thread-safe, it is often easier for the application itself to just use the synchronous version of a function in a worker-thread - if using GLib, g_io_scheduler_push_job() is the right way to do that.

In some cases synchronous I/O is implemented by using a recursive main loop (typically by using the asynchronous form of the function) - this should be avoided as it typically causes all kinds of problems because of reentrancy and events being processed while waiting for the supposedly synchronous operation to complete. As always, clearly document what your code is doing.

Some libraries, such as those in the GLib stack, use a consistent pattern for asynchronous I/O for all of its functions involving the GAsyncResult / GSimpleAsyncResult, GAsyncReadyCallback and GCancellable types - this makes it a lot easier both for programmers and for higher-level language bindings especially since important things like life-cycles are part of this model (for example, you are guaranteed that the callback will always happen, even on cancellation, timeout or error).

Checklist

  • Clearly document if a function does any synchronous I/O
  • Ideally suffix synchronous functions it with _sync() so it's easy to inspect large code-trees using e.g. grep(1) 
  • Consider if an operation needs to be available in synchronous or asychronous form or both.
  • Point to both synchronous and asynchronous functions in your API documentation.
  • If possible, use an established model (such as the GIO model) for I/O instead of rolling your own

1 comment: