Boost Threading

Abstract

With the desire for a platform independent threading library becoming stronger, Boost has started a project to work on this. A threading library is a major piece of work and there are many different things people speaking of when talking of a threading library. The intention of this document is to provide the general framework, allowing the discussions to concentrate on specific, well defined aspects of the threading library. Eventually it should contain a discussion on different alternatives how to address the various problems including idioms, techniques, and library components, as well as a description of library components and potentially necessary language support, suitable for implementation and possibly standardization.

Introduction

For many problems a solutions involving [pseudo] paralellization seems to be reasonable although it is well known that paralellism produces new problems. Apart from simplifying certain tasks paralell executation can also be employed to make optimal use of multiple processors being available. Although solutions can become simpler using multiple threads, a major new problem is introduced: Data consistency can be dwarfed by having multiple threads modifying the same piece of data at the same time. That is, multi threading introduces race conditions. The whole areas of multi threading can be seen as consisting of basically two tasks:

Manipulation of threads, ie. their creation, termination, scheduling, manipulation of thread properties like a thread's priority, etc.
Preventation of race conditions involving all kinds of different techniques including atomic access, locking, and thread local data.

Although there are only two major areas, there are lots of thing which can be tought of when talking about a threading library. This is because on many systems there already exist one or even more facilities to deal with threads. Thus, depending on ones view, it does not necessarily make sense to create yet another facility doing the whole work but rather uses what is there, merely adding some form of a wrapper. ... but then, what is going on with systems lacking all multi threading facilities? It is also not at all clear how a wrapper would look like: Should the wrapper cover everything or provide "only" access to the commonly used methods?

Although it is tempting (at least to some people) to create a complete and new threading library right away, this cannot be the goal of this effort! Instead, it has to be possible that whatever is created as part of this effort can be integrated with existing threading facilities. This is particularily true for Pthreads, the POSIX approach to multi threading, at least if the multi threading support should ever be integrated with standard C++. One reason for this necessity is the application of the Boost threading facilities in libraries which want to support multi threading but otherwise don't really care about threads. Such components should be applicable independent from the underlying threading facility being used. That is, whatever library components are provided from the Boost Threading Library, these are at first only interfaces. Boost might provide implementation of these interfaces for different threading facilities but it should be possible to [easily] implement these interfaces for whatever is used on a specific platform or in a specific project.

Since the topics of thread manipulation and preventation of race conditions are widely independent (not completely: certain thread manipulations might incur race conditions unless special care is taken), these topics will be discussed separately. However, in both cases it should be attempted to support the solutions to the various problems with idioms and corresponding library components to make dealing with the problems as easy as possible.

Goals of the Boost Threading Library

To set up a general measure when considering components for inclusion into the Boost Threading Library, here are some general rules which should be obeyed by anything added to this library (at least some parts of it are identical to the general goals of the Boost Project).

The components should be portable across a variaty of platforms. Since mostly interfaces are defined, this means that the interface can be implemented using a variaty of existing threading libraries. In particular, it has to be implemented using the Pthreads interface.
Any resources, in particular locks, allocatable should have an automated cleanup, ie. the destructor of a stack based object does the cleanup automatically when the corresponding block is exited.
TODO: probably there are more general goals...

Thread Manipulation

This section is intentionally left blank since I (ie. Dietmar Kühl) am neither an expert nor interested in this area. Somebody else has to pick this up. If nobody picks up this area, support will be dropped eventually.

Preventing Race Conditions

People doing multi threaded programming are often that deep into their stuff that they consider deadlocks to be the heart of multi threading problems. Well, it is not! It is actually just a problem introduced by a certain technique to prevent race conditions, namely to protect resources using locks. Although it is almost certain that at the heart of most thread synchronization methods a mutex lock will be used it is worth to think about alternative approaches and to keep the requirements on the basic synchronization mechanisms as low as possible to allow maximum portability and performance. In particular as each use of a mutex incurs some performance penalty which should be avoided where possible for maximum performance.

To determine different techniques for preventation of race conditions it has to be determined how the user of some component will interact with threads. This is rather important because as a result of the related decisions the interface to library component will be affected as well as the well as the usage pattern for this component. Here are the basic options how thread safety can be achieved (this does not cover some of the problems arising from the selected approach, eg. deadlocks have also be prevented somehow):

unlocked: Resources are not at all protected against multiple accesses from different threads but internally they make sure that any resource internally used is accessed in a thread safe manner. However, the user has to make sure that the resource is only accessed from one thread at a time (see for a detailed description).
explicit locking: Before accessing some resource which might be shared with another thread the resources is explicitly locked by the user by acquiring a lock which is somehow associated with the resource.
implicit locking: Resources do their synchronization on their own and the user is not at all bothered with threading stuff at all.

Clearly, we all want the last option: Just using whatever we want to without caring about threads and everything works just fine! Cool. Unfortunately, I doubt we can reach this goal and also provide suitable performace. At the very least the last option has have impacts on the interface of components. For example, the following simple excerpt from a program using a stack class might fail due to an access to the stack from another thread:

if (!stack.empty()) stack.pop();
The stack class has no possibility to lock the stack for the whole portion of the code but only for individual calls to methods. That is, both the calls to empty() and pop() can be made thread safe but another thread might remove the last element in the stack in between the two calls. However, here is a version of this code which can be made thread safe:

!stack.empty() && stack.pop();
No explicit lock is acquired but an implicitly acquired lock is returned from empty() which is released at the end of the full expression, ie. after the executation of pop() which has to return something convertible to a Boolean expression. The interface change is kind of minimal but the return tpyes from empty() and pop() changed. Whether the Boost Threading Library should promote such techniques which probably have certain drawbacks with respect to prevention of deadlocks due to lengthy held locks is a question which is still to be discussed, though...

Different of the approaches mentioned above can be used in the same application. Actually, some approaches might be more suitable for low level library components while others are more suitable for higher level library components which internally use the lower level library components with a different synchroniztion policy. However, in discussion it should be made clear what kind of interface is assumed for an idiom or a technique proposed.

What causes problems?

Before looking into possible idioms and techniques to avoid the problems, lets more precisely spell out what causes problems. Actually, this is fairly simple: Any access to a resource shared between two or more procsses might cause a race condition. Simple, isn't it? Unfortunately, it is not always obvious whether a resource is shared between multiple threads or if it at least uses a different shared resource internally.

However, since a thread has to know about a resource to actually use it, any resource created by a thread which is not made known to other threads does not cause any problems, as long as the resource is implemented in a thread safe manner itself! That is, any object from a thread safe library known to only one thread does not need any form of consideration. In particular, any object allocated on the thread's stack falls into this category as long as neither the thread nor the object itself register the object somewhere where it becomes visible to other threads.

Techniques to avoid problems

There are basically two approaches how to avoid problems, ie. race conditions due to multi threading:

Do not access resources shared between multiple threads. Although this is not a workable solution in all cases, it is a very reasonable approach which should always be considered. If it is not possible to prevent all accesses to shared resources, it should be attempted to access shared resources only as infrequently as possible.
Somehow indicating that one thread is already using a resource and preventing forthcoming attempts to use this resource. This is normally achieved with some form of lock.

Since the first approach is almost certainly insufficient, locks play an important role. Since they are likely to be used a lot, it is desirable that they have a maximum of performance and, to even further reduce the performance penalty, avoid their use where possible. In addition, it is necessary to set up techniques and, if possible, programmatic support to avoid deadlocks.

Following are some sections outlining general approaches to avoid certain problems. This is just a start and has to be completed. Also, there will be separated documents providing more indepth details on the approaches as these mature.

Partial Order of Locks

A very simple approach to avoiding deadlocks is to define a partial order on the locks, making sure that a thread never attempts to lock a lock which is less than any lock already hold by this thread. People familiar with using data bases probably know this approach, too: The same concept is used for data base locking (surprise, surprise, ...).

It would be nice to somehow give programmatic support for detecting loops which would violate the restriction for partial orders. This can be done at runtime by tracking certain information when locking (see throwing deadlock). Turning this into an enforced compile time features seems to be impossible. However, if a user kind of registers the resources used underneath this registry can be used to have the compiler spit out [probably strange] error messages. The idea is to determine at compile time some sort of a "level" for each resource used through the template instantiation mechanism. If there is a loop, the level cannot be computed and an infinite recursion will have the compiler spit out some error message like "template nesting level reached".

Throwing Deadlock

If deadlocks are not prevented by eg. using some form of partial ordring of locks they should be detected and some action taken. Since aborting any involved thread is clearly out of discussion some other approach is necessary to break the tie. Since C++ is used as programming language and locks are automatically released when a scope is exited, throwing an exception on detecting a deadlock is an obvious solution. To detect a deadlock it is necessary to know which locks are hold by a thread and for each locked lock which thread is holding it: This defines a graph which would contain a cycle if waiting for a locked lock would create a deadlock.

The implementation of such a feature requires that the underlying locking mechanism allows a non-blocking attempt to acquire a lock. If this is provided, when a lock is to be locked it is first attempted to lock it non-blockingly. Only if this fails some work is necessary: The graph has to be searched to find out whether locking the corresponding lock would create a deadlock. If this is the case, an exception is thrown. Otherwise, a blocking attempt to acquire the lock is made.

Service Demotion

When accessing a certain service which is provided by a resource shared by serveral threads, it is not always necessary to obtain a lock for each service request. Instead, an auxiliary object can be used which tries to supply the requested service first and only if it cannot fulfill the request on its own contacts the shared service.

For example, consider heap allocations: There is basically one pool for heap memory. This pool can be accessed using the standard std::allocator. Doing so directly requires some form of thread synchronization for each request. Instead, an auxiliary thread local allocator can be used which maintains its own pool of memory. Only if the pool is empty, the shared allocator is contacted to obtain more memory. Of course, there are some details to be worked out: For example, if a garbage collecting thread is used, this thread's pool would be ever growing and the globally available memory would become exhausted. However, the general idea should be clear.

Basically this means that a service is split into two parts: One part shared amoung threads and one part local to each thread using this service.

Immutable Services

Race conditions can only occure if a shared resource is modified by one thread while another thread tries to read or also modify the same resource. Thus, there is no race_condition if a resource is never modified. As a consequence, immutable services don't have to be protected against race conditions.

But where is an immutable service useful as it can't modify anything? Well, it can modify things! It is only the service itself which has to be immutable. For example, the facets in the standard localization library are immutable (with respect to the service they provide; there is a mutable reference count but this does not affect normal processing at all) but they are mutating what is passed to them as argument. Of coure, the resources identified by the arguments have to be protected against race conditions if they are shared between several threads. However, if they are not shared there is no need to protect anything.