Masking Asynchronous Exceptions

{-# LANGUAGE ScopedTypeVariables #-}
module Mask where

import Control.Concurrent.MVar
import Control.Exception
import GHC.IO.Handle.Types (Handle__)

The base library explains asynchronous exceptions and masking quite well, but still this is one of the topics that is often misunderstood and some of its crucial parts like interruptible operations are not well enough documented to slip under the radar too often.

Synchronous vs Asynchronous exceptions

There are two main ways of throwing exceptions in Haskell, either with throwIO or throwTo. throwIO throws an exception in the current thread in synchronous way, throwTo allows to throw an exception in some other thread, hence the name asynchronous exceptions. But let’s start from the beginning, why we even need asynchronous exceptions? This is nicely answered in the paper Asynchronous Exceptions in Haskell:

Speculative computation. A parent thread might start a child thread to compute some value speculatively; later the parent thread might decide that it does not need the value so it may want to kill the child thread.

Timeouts: If some computation does not complete within a specified time budget, it should be aborted.

User interrupt. Interactive systems often need to cancel a computation that has already been started, for example when the user clicks on the “stop” button in a web browser.

Resource exhaustion. Most Haskell implementations use a stack and heap, both of which are essentially finite resources, so it seems reasonable to inform the program when memory is running out, in order that it can take remedial action. Since such exceptions can occur at almost any program point, it is natural to treat them as asynchronous.

Asynchronous exceptions can interrupt almost any computation, masking is provided as a way to make it predictable, so we can reason about it. Let’s examine this standard example:

withLockUnmasked :: MVar () -> IO a -> IO a
withLockUnmasked lock k = do
  takeMVar lock
  a <- catch k (\(e :: SomeException) -> putMVar lock ()
                                      >> throwIO e)
  putMVar lock () 
  return a

The problem with withLockUnmasked is that an asynchronous exception could be thrown just after takeMVar is executed but before the catch installs the handler. To be able to fix this, Haskell provides primitive operations which allow to mask asynchronous exceptions. Each thread keeps the following masking state (original haddocks preserved):

-- | Describes the behaviour of a thread when an asynchronous
-- exception is received.
data MaskingState
  = Unmasked
  -- ^ asynchronous exceptions are unmasked (the normal state)
  | MaskedInterruptible
  -- ^ the state during 'mask': asynchronous exceptions are masked, but
  -- blocking operations may still be interrupted
  | MaskedUninterruptible
  -- ^ the state during 'uninterruptibleMask': asynchronous exceptions are
  -- masked, and blocking operations may not be interrupted

Let us stress that in MaskedInterruptible state, which is a result of using mask function, asynchronous exceptions can be thrown, but only by interruptible operations. In the MaskedUninterruptible asynchronous exceptions cannot be thrown even by blocking / interruptible operations. Asynchronous Exceptions in Haskell paper specifies interruptible operations as:

Any operation which may need to wait indefinitely for a resource (e.g.,takeMVar) may receive asynchronous exceptions even within an enclosing mask, but only while the resource is unavailable.Such operations are termed interruptible operations.

The complete list of interruptible operations is astonishingly short:

takeMVar when the MVar is empty,
putMVar when the MVar is non-empty,
retry :: STM a,
throwTo
interruptible ffi calls.
there are also some primitive operations in non-threaded RTS, which are interruptible, but this will be eventually fixed.

In particular none of the operations listed below are interruptible:

IORef operations,
safe and unsafe foreign calls. There is a subtle difference between safe and unsafe ffi: when a safe ffi returns haskell thread will check if there are pending asynchronous exceptions while haskell rts does not know about unsafe ffi calls.

We have two ways of fixing withLockUnamsked:

withLockMaskedInterruptible :: MVar () -> IO a -> IO a
withLockMaskedInterruptible lock k = mask $ \unmask -> do
  takeMVar lock
  a <- catch (unmask k)
             (\(e :: SomeException) -> putMVar lock ()
                                    >> throwIO e)
  putMVar lock () 
  return a

withLockMaskedUninterruptible :: MVar () -> IO a -> IO a
withLockMaskedUninterruptible lock k = uninterruptibleMask $ \unmask -> do
  takeMVar lock
  a <- catch (unmask k)
             (\(e :: SomeException) -> putMVar lock ()
                                    >> throwIO e)
  putMVar lock () 
  return a

The difference between both of them is subtle. To get to the point we need to analyse which operations are blocking / interruptible. As specified in Asynchronous Exceptions in Haskell, takeMVar is blocking only if v :: MVar () is empty. We have two cases:

v is non-empty: neither of the two can raise asynchronous exception while executing takeMVar. This means that both implementations will install the exception handler before an asynchronous exception is raised, and this is what we wanted.
if v is empty: the semantics of MVar ensures that takeMVar is interruptible until it is empty, once takeMVar takes the value it becomes non-blocking. This ensure that asynchronous exceptions can be raised by withLockMaskedInterruptible only when takeMVar is blocked. This also means that withLockMaskedInterruptible will install the catch handler once takeMVar past the point of being interruptible.

The crucial difference between withLockMaskedInterruptible and withLockMaskedUninterruptible is that the later will never throw an async exception while takeMVar is blocked while the lock is empty (e.g. taken by some other thread).

It seem that analysing the code which is using mask is more difficult than when using uninterruptibleMask, is it really so? The main problem with withLockMaskedUninterruptible is that it can potentially introduce deadlocks; a program might become unresponsive: interruptions like one delivered by signals (e.g. CTRL-C) are delivered using asynchronous exceptions; or it could introduce undeliverable timeouts, which in networking applications can introduce safety hazards. Because deadlocks are non-local properties there is actually no way to analyse if withLockMaskedUninterruptible is safe or not without its context, it depends on the program where it is used. For this reasons the documentation of uninterruptibleMask says in capital letters: THIS SHOULD BE USED WITH GREAT CARE. In my experience, debugging asynchronous exceptions is easier than debugging deadlocks. When logging is done right you can see asynchronous exceptions in the logs, but you will not see a bunch of threads being deadlocked.

Takeaways from this are:

mask masks asynchronous exceptions only in non-interruptible code allowing to raise asynchronous exceptions in blocking operations;
uninterruptibleMask masks asynchronous exceptions, but it can introduce deadlocks;
takeMVar is only blocking if the MVar is empty.

‘base’ and ‘safe-exceptions’ / ‘unliftio’

Both safe-exceptions and unliftio are using bracket implementation which is masking using uninterruptibleMask, while base package is using mask. Which one is more appropriate in a library?

This is base’s implementation of bracket:

bracket :: IO a
        -> (a -> IO b)
        -> (a -> IO c)
        -> IO c
bracket before after thing =
  mask $ \restore -> do
    a <- before
    r <- restore (thing a) `onException` after a
    _ <- after a
    return r

The version used by unliftio and safe-exceptions are both implemented using try but the crucial difference is that they are using uninterruptibleMask when executing the after callback. Since they are using uninterruptibleMask they need to use try to avoid blocking exceptions when executing the before handler. This is to minimize time when a thread is in MaskedUninterruptible state.

Let us look at the most common resource handlers:

File Handles

The base package does not export Handle constructors, they are an implementation detail, so let’s bring the definition here:

data Handle
  = FileHandle           -- A normal handle to a file
        FilePath         -- the file (used for error messages
                         -- only)
        !(MVar Handle__)

  | DuplexHandle         -- A handle to a read/write stream
        FilePath         -- file for a FIFO, otherwise some
                         --   descriptive string (used for error
                         --   messages only)
        !(MVar Handle__) -- The read side
        !(MVar Handle__)

The hClose :: Handle -> IO () calls hClose’ which masks exceptions while calling takeMVar (in withHandle’), and continues (while exceptions are masked) with hClose_help, which does a few interesting things:

flush a buffer,
close a decoder,
call GHC.IO.Device.close to close the file descriptor.

Flushing file handle buffer is done by either safe (non-threaded rts) or unsafe (threaded rts) ffi call (using c_write or c_safe_write), and thus is uninterruptible. Closing the decoder is either a no-op (return ()) or an unsafe foreign call to iconv_close. We are left with analysing GHC.IO.Device.close. Handle__ is using IODevice FD instance. On systems that support either epoll, poll or kqueue (e.g. on Linux, MacOS, FreeBSD, and a likes) it is done via event manager. On other operating systems (Windows) closing file handle is done by a direct foreign call (this might change in the future with the new mio Windows event manager based on I/O completion ports). When event manager is involved, the closeFdWith is used. I recently fixed a bug which made it interruptible, see PR #4942. However, because calls which involve epoll are very fast all the blocking operations done by closeFdWith would block for a very short time, making it quite unlikely to be an issue (but if you run a service for long enough it could be observed).

The conclusion is that closing a file handle could only block on the TMVar holding Handle__. This makes it non-blocking if file handles are not closed concurrently.

What about asynchronous usage of file handles?

This would mean trying to close a file handle from multiple threads. The question is why one would ever need that? A good pattern for using file handles is withFile. Where the callback is only used to read and parse the content of the file and return the result. This is runs in a synchronous way which makes the bracket used by withFile not leak any resources. Another design pattern for using file handles comes from streaming libraries like pipes, streaming or conduit. To ensure that the handle is correctly closed, one needs to use ResourceT from resourcet package. Streaming is synchronous so we are safe when using hClose.

Sockets

The network package close calls (for non-Windows, threaded RTS) closeFdWith with an uninterruptible FFI function (c_close). Which we already know that is non-blocking.

For non-threaded RTS or on Windows, Network.Socket.close directly calls the uninterruptle c_close.

Threads

When dealing with threads as resources the killThread is implemented using throwTo which is an interruptible operation. It blocks until exception is delivered to the target thread. When using killThread as a resource finaliser we should use uninterruptibleMask_ . killThread instead (this week I fixed exactly such bug, see). The same applies when using async package which exports uninterruptibleCancel :: Async a -> IO (). Let us note that the withAsync is internally using uninterruptibleMask_

Conclusions

Back to safe-exceptions and unliftio. In my opinion the base has a better choice for bracket. Debugging deadlocks, is quite hard. GHC has deadlock detection mechanism but it is not always reliable. Both deadlocks and resource leaks can be silent (no logs), but the latter are clearly visible when tracking state of the program with some system tools, or just looking at /proc directly.

The essential trade-offs between both implementations is which kind of programmer’s errors they allow:

base implementation allows to use interruptible release resource function, which might not run till its completion in presence of an asynchronous exception;
safe-exception implementation allows to make the program deadlock (which might even prevent a program from termination when a kill signal is send, e.g. via CTRL-C).

Another common argument for safe-exceptions is that it allows to avoid to catch asynchronous exceptions by default. Catching SomeException can indeed lead to trouble, though for that reason it has been recommended to catch exceptions which you care about not all of them. To avoid catching asynchronous exceptions one can always use the following snippet (though it’s still recommended to be more specific than any synchronous exception!):

catchJust (\e ->
            case fromException e :: Maybe SomeAsyncException of
              Just _  -> Nothing
              Nothing -> Just e
          )

Above all, use your best judgement how to handle exceptions. In some applications using Either as a return type is the best option for dealing with synchronous exceptions, in some critical applications catching any exceptions at some level (the non-recommended way) is the safest way to avoid triggering bugs which would use rare execution path. There is just not a single way which which suits all applications and it all depends on the security / threat model under which one is operating.

References

Asynchronous Exceptions in Haskell, S. Marlow, S.P. Jones, A.Morgan, J. Reppy, December 12, 2006; Note that the specification is using asynchronous throwTo rather than a synchronous one as implemented in GHC, the relevant semantic changes are discussed in section 9.
Dealing with Asynchronous Exceptions during Resource Acquisition, D. Coutts, E. de Vries, 2014.
Interruptible or uninterruptible cleanup, issue in safe-exceptions package.
Uninterruptible closeFdWith merge request #4942