Cooperative pool deadlock when calling into an opaque subsystem

This is a follow-up to @nsc’s post Deadlock When Using DispatchQueue from Swift Task from July 2023.

Here’s a reduced example that reliably (for me) creates a deadlock in the cooperative thread pool, caused by exhausting the pool with calls into an opaque subsystem (Apple’s Vision framework in this case).

The code

The full example is on GitHub at ole/SwiftConcurrencyDeadlock (macOS only because it uses an Apple API). The essential piece of code is this:

let imageURL: URL = …
try await withThrowingTaskGroup(of: (id: Int, faceCount: Int).self) { group in
    // This deadlocks when the number of child tasks is larger than the
    // number of CPU cores on your machine. Try using a smaller range.
    for i in 1...25 {
        group.addTask {
            // Perform face detection using Vision framework
            print("Task \(i) starting")
            let request = VNDetectFaceRectanglesRequest()
            let requestHandler = VNImageRequestHandler(url: imageURL)
            try requestHandler.perform([request])
            let faces = request.results ?? []
            return (id: i, faceCount: faces.count)
        }
    }
    for try await (id, faceCount) in group {
        print("Task \(id) detected \(faceCount) faces")
    }
}

This creates a task group with a bunch of child tasks, and each child task calls synchronously into Vision.framework to perform face detection on an image. The example has been reduced from a real-world codebase that worked similarly. (In the example, all child tasks run the face detection on the same image, making it wasteful. But you can imagine wanting to run this on multiple different images in parallel.)

Running this code deadlocks every time (in my tests) as soon as the number of child tasks (= the iteration count of the for loop) is greater than or equal to the number of threads in the cooperative pool (10 on my 10-core M1 Pro machine).

Observation

When you stop the deadlocked process in the debugger, it looks like this:

  • 10 child tasks have started running
  • Each of the 10 threads in the cooperative pool is hanging in a lock inside a dispatchGroupWait call that originated in Vision.framework
  • No other thread is doing any meaningful work that would make progress toward unblocking one of the cooperative threads.

Analysis

My interpretation:

  • Although Vision.framework provides a synchronous API to clients (VNImageRequestHandler.perform), it internally performs some async work and uses GCD for this.
  • Vision.framework schedules the async work on its own dispatch queue and then uses DispatchGroup.wait() to block the thread on which the request came in until the work is done (sketched below).
  • The async work item gets scheduled, but GCD never gives it a thread to run on because all available threads are tied up by cooperative-pool tasks, which are themselves waiting for this very work item. Neither side can make progress, hence the deadlock.
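If that's right, the shape of the problem can be sketched in a few lines. This is pure speculation about a closed-source framework - the queue name and function are made up - but it shows the "synchronous facade over async work" pattern:

import Dispatch

// Hypothetical stand-in for Vision's internal queue.
let internalQueue = DispatchQueue(label: "vision.internal")

// A synchronous facade over internal async work, roughly how I suspect
// VNImageRequestHandler.perform is implemented.
func performSynchronously() {
    let group = DispatchGroup()
    group.enter()
    internalQueue.async {
        // ... the actual detection work ...
        group.leave()
    }
    // Blocks the calling thread - in our case a cooperative-pool thread -
    // until the enqueued block has run. If GCD never gives that block a
    // thread, we wait forever.
    group.wait()
}

When every cooperative thread sits in group.wait() at the same time, none of the enqueued work items ever gets a thread, and the process deadlocks.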

Relevant quotes from last year's thread:

DispatchQueue is over-committing, but if you’re enqueuing to a queue with queue.sync, you’re also tying up a thread that may be from an executor that is not. sync has the isolation properties of the queue but doesn’t change the underlying reality of what thread you’re on.
Deadlock When Using DispatchQueue from Swift Task - #19 by John_McCall

(I imagine the DispatchGroup.wait() inside Vision behaves very much like DispatchQueue.sync in this regard.)

The general rule is that you should never block work in Swift concurrency on work that isn’t already actively running on a thread (“future work”). That is being violated here because the barrier block is not actively running — it’s been enqueued, but it cannot clear the barrier without running, which it needs a thread to do.
Deadlock When Using DispatchQueue from Swift Task - #25 by John_McCall

This is the rule we're (unknowingly) violating here. The async work item inside Vision.framework is enqueued, but not yet actively running. And the next quote explains why GCD doesn't bring up a new thread for this work item:

The specific implementation reason this blows up today is that both Swift concurrency and Dispatch’s queues are serviced by the same underlying pool of threads — Swift concurrency’s jobs just promise not to block on future work. So when Dispatch is deciding whether to over-commit, i.e. to create an extra thread in order to process the barrier block, it sees that all the threads are tied up by Swift concurrency jobs, which promise to not be blocked on future work and therefore can be assumed to eventually terminate without extra threads being created. Therefore, it doesn’t make an extra thread, and since you are blocking on future work, you’re deadlocking.
Deadlock When Using DispatchQueue from Swift Task - #25 by John_McCall

It's really easy to run into deadlocks

I realize that Swift concurrency’s tendency to deadlock is not qualitatively different from GCD, only quantitatively. If you rewrote the code with GCD, it would also deadlock when GCD exhausts its thread pool limit. You can try this easily by dispatching onto a global dispatch queue from inside the child task and using a continuation to bridge back into Swift concurrency land.

GCD-based “workaround”
try await withThrowingTaskGroup(of: (id: Int, faceCount: Int).self) { group in
    // This "fixes" the deadlock at the cost of thread explosion.
    // Also, GCD's max thread pool size is 64, so if you increase to 64 or
    // more child tasks it will deadlock again.
    for i in 1...64 {
        group.addTask {
            print("Task \(i) starting")
            return try await withCheckedThrowingContinuation { c in
                DispatchQueue.global().async {
                    do {
                        let request = VNDetectFaceRectanglesRequest()
                        let requestHandler = VNImageRequestHandler(url: imageURL)
                        try requestHandler.perform([request])
                        let faces = request.results ?? []
                        c.resume(returning: (id: i, faceCount: faces.count))
                    } catch {
                        c.resume(throwing: error)
                    }
                }
            }
        }
    }
    …

So I guess you can say that the code as written is bad because it creates an unbounded amount of work at once, which either leads to thread explosion (bad) or deadlock (very bad).

Conclusion

These are the points I'm hoping to make:

It's really easy to run into deadlocks with Swift concurrency.

  • Not using "dangerous" APIs (such as DispatchSemaphore, DispatchGroup, or DispatchQueue.sync) in your async code is not enough. Any call into an opaque subsystem, no matter how innocuous it looks (cf. requestHandler.perform() above), may internally block on future work (as defined by @John_McCall above), thus violating the general rule.
  • The small thread pool size limit makes it way more likely to run into these problems in the real world than it used to be with GCD, even if the qualitative behavior is not so different (@wadetregaskis made the same point last year).
  • If you test your code only on your 20-core dev machine but your customers run it on 4–6 cores, you may not be aware of how many deadlocks you're creating.

It's unclear to me what the best workaround is.

  • If any opaque subsystem is a potential deadlock problem, I don't see how you can reliably avoid such problems in real-world code, especially in the Apple world where calling into closed-source opaque frameworks is the norm.

    The default executor for tasks does not overcommit, so if you’re using a system that relies on overcommit for progress, and you cannot rewrite it, then you need to be very careful to only call into it from a thread that is definitely from an overcommitting executor.
    Deadlock When Using DispatchQueue from Swift Task - #17 by John_McCall

    I am willing to be careful, but I'm not sure how to reliably identify these problems before shipping my code to customers.

  • We've seen above that pushing the work out to a global dispatch queue is problematic at best: it causes thread explosion and only shifts the deadlock problem out.

  • I think a proper solution should somehow limit the amount of parallelism to a reasonable width, perhaps a little less than the number of CPU cores.

    • Is this the best solution? If so, I'd love to have built-in APIs that make this convenient.
    • OperationQueue does have the ability to limit the number of operations running simultaneously, but I've always found it pretty awkward to use.
    • A width-limited TaskGroup could be a useful thing to have. You can write this code manually (see the sketch after this list), but it's quite a bit of boilerplate. I briefly attempted to write myself an abstraction for this, but that also turned out to be harder than expected.
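For illustration, here's a minimal sketch of such a helper (the name forEachConcurrently is mine, not a built-in API; error handling and result collection kept deliberately simple). The idea would be to pick a width slightly below the core count so that threads remain available for the opaque subsystem's internal work:

/// Runs `body` for each element, keeping at most `width` child tasks in flight.
func forEachConcurrently<Element: Sendable>(
    _ elements: [Element],
    width: Int,
    _ body: @escaping @Sendable (Element) async throws -> Void
) async throws {
    try await withThrowingTaskGroup(of: Void.self) { group in
        var iterator = elements.makeIterator()
        // Seed the group with at most `width` child tasks.
        for _ in 0..<width {
            guard let element = iterator.next() else { break }
            group.addTask { try await body(element) }
        }
        // Each time a child task finishes, start the next one.
        while try await group.next() != nil {
            if let element = iterator.next() {
                group.addTask { try await body(element) }
            }
        }
    }
}

Even this is subtle to get right (cancellation, collecting results, rethrowing), which is why a built-in API would be welcome.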

In short, I don't know what the best answer is. cc @hborla and @mattie in case you want to include this in your concurrency migration guide (which I look forward to! Thanks for doing this!).

LIBDISPATCH_COOPERATIVE_POOL_STRICT

Sidenote: I tried setting the environment variable LIBDISPATCH_COOPERATIVE_POOL_STRICT=1 as a way to help identify such problems. It would be nice if running my code with this flag provided a way to detect potential deadlocks during development, independent of the number of cores of the machine the code runs on.

But to my surprise, the program runs to completion without deadlocking! Setting the environment variable does limit the cooperative thread pool width to 1 (you can observe that the child tasks run sequentially), but setting it seems to have other effects too. When you stop the app in the debugger, you can see that there is now another thread servicing Vision.framework's internal dispatch queue, so everything makes forward progress, hence no deadlock.

Does GCD change how it interops with the cooperative thread pool when LIBDISPATCH_COOPERATIVE_POOL_STRICT is set?

17 Likes

It looks like this Apple API is using a dispatch group as a semaphore. Semaphores can, as you're seeing, lead to deadlock when used within the Swift concurrent thread pool. Please file feedback against it using Apple's Feedback Assistant. Thanks!

1 Like

Thanks for your sample code. Since I read about those potential deadlocks in the interaction between GCD (aka Dispatch) and Swift concurrency, I haven't been feeling comfortable with the async SQLite accesses of GRDB, which use GCD under the hood. No one has ever reported this issue, but maybe people are shy.

So I tried to reproduce the deadlock, inspired by your example, with a DatabasePool, the database connection that can perform parallel reads:

func testTaskGroup() async throws {
    let dbPool = try makeDatabasePool()
    try await dbPool.write { db in
        try db.execute(sql: "CREATE TABLE t(a)")
    }

    try await withThrowingTaskGroup(of: Int.self) { group in
        for i in 1...1000 {
            group.addTask {
                print("Task \(i) starting")
                return try await dbPool.read(Table("t").fetchCount)
            }
        }
        for try await count in group {
            print(count)
        }
    }
}

And... This works. No deadlock. Xcode 15.3, macOS 14.3.1, M1 Max. :smiling_face_with_tear::thinking:

Screenshot taken in the middle of the 1000 concurrent tasks (not reproduced here).

This is a serious topic, so I'll share how, precisely, GCD is used here, even if I don't understand everything.

The read async method just uses a continuation on top of a completion-based method:

public func read<T: Sendable>(_ value: @escaping @Sendable (Database) throws -> T) async throws -> T {
    try await withUnsafeThrowingContinuation { continuation in
        asyncRead { result in
            do {
                try continuation.resume(returning: value(result.get()))
            } catch {
                continuation.resume(throwing: error)
            }
        }
    }
}

The asyncRead method asynchronously acquires a connection from a pool of SQLite connections. The pool has a maximum size: some reads must wait until a reader becomes available (this limit is enforced by a DispatchSemaphore, which we'll see below). The call to readerPool.asyncGet below eventually provides an available connection, and a release closure that must be called when we're done with the connection (so that the connection becomes available for another read). Slightly simplified while retaining the important GCD stuff:

public func asyncRead(_ value: @escaping @Sendable (Result<Database, Error>) -> Void) {
    readerPool.asyncGet { result in
        do {
            let (reader, releaseReader) = try result.get()
            // Second async jump because that's how `Pool.asyncGet` has to be used.
            reader.async { db in
                value(.success(db))
                releaseReader(.reuse)
            }
        } catch {
            value(.failure(error))
        }
    }
}

The interesting comment above is "Second async jump because that's how Pool.asyncGet has to be used".

The reason for this async jump (we're talking about plain GCD DispatchQueue.async) is this post by @soroush, where he explains how to avoid thread explosion with GCD. Importantly, each concurrent job that has to wait for the semaphore is enqueued on a single serial DispatchQueue (see the blog post). Here's the code of readerPool.asyncGet:

/// Eventually produces a tuple (element, release), where element is
/// intended to be used asynchronously.
///
/// Client must call release(), only once, after the element has been used.
///
/// - important: The `execute` argument is executed in a serial dispatch
///   queue, so make sure you use the element asynchronously.
func asyncGet(_ execute: @escaping @Sendable (Result<ElementAndRelease, Error>) -> Void) {
    // Inspired by https://khanlou.com/2016/04/the-GCD-handbook/
    // > We wait on the semaphore in the serial queue, which means that
    // > we’ll have at most one blocked thread when we reach maximum
    // > executing blocks on the concurrent queue. Any other tasks the user
    // > enqueues will sit inertly on the serial queue waiting to be
    // > executed, and won’t cause new threads to be started.
    semaphoreWaitingQueue.async {
        execute(Result { try self.get() })
    }
}

The DispatchSemaphore is finally there, in pool.get() (again, simplified):

typealias ElementAndRelease = (element: T, release: @Sendable (PoolCompletion) -> Void)

/// Returns a tuple (element, release)
/// Client must call release(), only once, after the element has been used.
func get() throws -> ElementAndRelease {
    itemsSemaphore.wait()
    ...
    return (element: element, release: {
        itemsSemaphore.signal()
    })
}

That's the complete picture of GCD usage in GRDB. I omitted a few details in the pool that look unrelated to thread creation, such as a concurrent queue and a dispatch group that make it possible to run some code with the guarantee that no connection is used (similar to the "barrier" flag of dispatch queues) - features that are not used in the test.

I thought it could be interesting to share this experience. I'm not sure this is the workaround that @ole is asking about, but it looks like it is possible to write GCD code that plays well with Swift concurrency?

6 Likes

Thanks for the suggestion, Jonathan. I don't think I'll do this though, because I don't want to engage with or support Apple's hostile bug reporting process (from an outside developer's perspective! I understand it's a different story if you work for Apple). Sorry, and I hope you understand.

11 Likes

At a glance, it doesn't look like anything in your example would directly block a thread in the cooperative pool, since the async part is using withUnsafeContinuation to interact with the GCD- and callback-based parts. The issue there with semaphores has more to do with not being able to propagate QoS or priority if a higher-priority task ends up depending on the suspended task. From Ole's screenshots in the OP, it appears that Apple framework code is directly blocking the thread in non-async code that was called from tasks running in the cooperative pool, keeping the thread from being able to run other tasks and ultimately starving the pool when all the threads block.

4 Likes

Yes, that's correct.

Thanks Gwendal! And also thanks to @soroush and Mike Rhodes for their 2016 articles! I remember reading Soroush's post then, but I did not remember this pattern for limiting the amount of concurrency.

I was able to adopt this pattern in my GCD-based "workaround" from the first post and I think I really like this:

  • It avoids the thread explosion.
  • It fixes the deadlock: the Swift concurrency thread pool isn't blocked (thanks to the continuations), and the semaphore keeps us from exhausting the GCD thread pool.

Here's the updated code (I also pushed this to GitHub):

/// Alternative implementation that fixes the deadlock.
///
/// Based on:
///
/// - Gwendal Roué’s reply on the Swift forums: https://forums.swift.org/t/cooperative-pool-deadlock-when-calling-into-an-opaque-subsystem/70685/3
/// - Soroush Khanlou, The GCD handbook: https://khanlou.com/2016/04/the-GCD-handbook/
/// - Mike Rhodes, Limiting concurrent execution using GCD: https://web.archive.org/web/20160613023817/https://dx13.co.uk/articles/2016/6/4/limiting-concurrent-execution-using-gcd.html
func performWorkUsingGCD(maxConcurrency: Int) async throws {
    let imageURL = findResourceInBundle("church-of-the-king-j9jZSqfH5YI-unsplash.jpg")!
    try await withThrowingTaskGroup(of: (id: Int, faceCount: Int).self) { group in
        let semaphore = DispatchSemaphore(value: maxConcurrency)
        let semaphoreWaitQueue = DispatchQueue(label: "maxConcurrency control")
        for i in 1...100 {
            group.addTask {
                print("Task \(i) starting")
                return try await withUnsafeThrowingContinuation { c in
                    semaphoreWaitQueue.async {
                        semaphore.wait()
                        DispatchQueue.global().async {
                            defer {
                                semaphore.signal()
                            }
                            do {
                                let request = VNDetectFaceRectanglesRequest()
                                let requestHandler = VNImageRequestHandler(url: imageURL)
                                try requestHandler.perform([request])
                                let faces = request.results ?? []
                                c.resume(returning: (id: i, faceCount: faces.count))
                            } catch {
                                c.resume(throwing: error)
                            }
                        }
                    }
                }
            }
        }
        for try await (id, faceCount) in group {
            print("Task \(id) detected \(faceCount) faces")
        }
    }
}
8 Likes

Agreed. As a way to control this, all dispatch queues created by a GRDB connection have a QoS that is controlled by the user, and defaults to userInteractive. This is not very fine-grained, but it fits most needs (querying the db in response to user interaction), and avoids most runtime warnings.
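Concretely, something like this (a simplified sketch - see GRDB's Configuration documentation for the exact API):

import GRDB

var config = Configuration()
// QoS applied to all dispatch queues created for this connection.
config.qos = .userInitiated
let dbQueue = try DatabaseQueue(path: "/path/to/db.sqlite", configuration: config)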

I'm happy :slight_smile:


EDIT

I made another test, this time with a semaphore that allows more parallel readers than there are CPU cores.

There is still no deadlock, as predicted by @Joe_Groff, but we do have "too many" threads. This is not the classic, pathological thread explosion - the user is still in control. But we don't get the graceful adaptation to the number of CPU cores that would be satisfying to see.

In the end, this is not really a problem for GRDB, because limiting the number of database connections is more a matter of limiting memory consumption - the default limit is 5 readers (no rationale for this number; I just rolled a die and liked the digit).
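For reference, a sketch of how that limit is configured (again, see the Configuration docs for the exact API):

import GRDB

var config = Configuration()
// Caps the number of reader connections in the pool (5 is the default).
config.maximumReaderCount = 5
let dbPool = try DatabasePool(path: "/path/to/db.sqlite", configuration: config)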

But in general, I don't know how to instruct GCD to fine-tune the number of threads according to the physical limits of the computer. That would make the maxConcurrency parameter of @ole's function obsolete.

Thank you. It's quite reassuring to read a rationale that excludes the potential for deadlocks. :pray:

Yes, as long as you stay under GCD's max thread limit (64 on macOS in my tests), you should be fine in terms of avoiding deadlock.

There are OS-specific APIs to query the number of CPU cores. On macOS you can do it like this (APIs on Linux etc. will probably be different):

import Darwin

var size: size_t = 4 // = MemoryLayout<Int32>.size, the size of cpuCount
var cpuCount: Int32 = 0
let result = sysctlbyname("hw.logicalcpu", &cpuCount, &size, nil, 0)
if result == 0 {
    print(cpuCount) // prints 10 on my machine (Apple M1 Pro)
}

Look at the sysctl(3) man page for more info.

2 Likes

The sky is the limit! The test completes even with ridiculously large figures.

There are OS-specific APIs to query the number of CPU cores.

Thank you for the technique :star_struck: Maybe there is a technique that has GCD do it automatically, but at least we're not blocked: with the knowledge exposed in this thread, we can avoid thread explosion, play gracefully with Swift concurrency, and adapt our amount of parallelism to the physical limits. Sounds like a win to me :partying_face:

Is there a call that returns the number of threads (say, on macOS)?

For counting threads, I have this C function (tested in the iOS simulator):

#include <unistd.h>
#include <mach/mach.h>
#include "getThreadsCount.h"

// https://stackoverflow.com/a/21571172/525656
int getThreadsCount(void) {
    thread_array_t threadList;
    mach_msg_type_number_t threadCount;
    task_t task;

    kern_return_t kernReturn = task_for_pid(mach_task_self(), getpid(), &task);
    if (kernReturn != KERN_SUCCESS) {
        return -1;
    }

    kernReturn = task_threads(task, &threadList, &threadCount);
    if (kernReturn != KERN_SUCCESS) {
        return -1;
    }
    vm_deallocate(mach_task_self(), (vm_address_t)threadList, threadCount * sizeof(thread_act_t));

    return threadCount;
}

I use it in order to test that thread explosion does not happen under heavy load :slight_smile:

func testStuff() throws {
    if getThreadsCount() < 0 {
        throw XCTSkip("Thread count is not available")
    }

    // Proceed with the test
}
2 Likes

Serial DispatchQueues target the overcommit global queues by default; they are not limited to 64 threads. These overcommit global queues are otherwise inaccessible via public APIs. dump() is an easy way to observe this:
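For example (output from memory and abbreviated; the exact format varies by OS version):

import Dispatch

let queue = DispatchQueue(label: "my-serial-queue")
dump(queue)
// Prints something like:
// - <OS_dispatch_queue_serial: my-serial-queue[0x...]>
//   target = com.apple.root.default-qos.overcommit[0x...]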

Regarding the original post, it's very concerning that some Apple APIs are unsafe to use on Swift's cooperative pool due to deadlock risk. I've long been skeptical of the usability and safety of the fixed-width cooperative pool; I tend to perform system calls and computation-heavy code on DispatchQueues instead. Learning that some Apple APIs carry deadlock risk will only accelerate this tendency.

5 Likes

If you have Foundation available, a simpler version is:

ProcessInfo.processInfo.activeProcessorCount
5 Likes

Today I learned. Thank you!

Thank you. FWIW, it works in the iOS simulator and on macOS without App Sandbox, and fails on iOS devices or on macOS with App Sandbox enabled.


I managed to fix it.

extension Thread {
    static var count: Int {
        var threadList: thread_array_t?
        var threadCount = mach_msg_type_number_t()
        let err = task_threads(mach_task_self_, &threadList, &threadCount)
        if err != KERN_SUCCESS { return -1 }
        guard let threadList else { return 0 }
        vm_deallocate(mach_task_self_, .init(bitPattern: threadList), UInt(threadCount) * UInt(MemoryLayout<thread_act_t>.stride))
        return Int(threadCount)
    }
}

This works on iOS devices and on macOS with App Sandbox enabled.

Not sure if I translated this C correctly:

vm_deallocate(mach_task_self(), (vm_address_t)threadList, threadCount * sizeof(thread_act_t));
3 Likes

Only the iOS simulator, yeah?

See the message just above: instead of using task_for_pid(mach_task_self()...) (which could get you the mach port of an arbitrary process - and that's restricted), use mach_task_self() itself, as you are getting information about the current process.

1 Like