{"slug": "swift-concurrency-manifesto", "title": "Swift Concurrency Manifesto", "summary": "The Swift Concurrency Manifesto, authored by Chris Lattner, outlines a long-term vision for introducing a first-class concurrency model to Swift, focusing on task-based abstractions like async/await and actors to eliminate shared mutable state. The document emphasizes improving programmer experience by making concurrent code safer, easier to design, and more maintainable, while noting that it is not a finalized proposal but a catalyst for community discussion. Swift has historically avoided built-in concurrency, relying on OS abstractions like GCD, but the manifesto aims to address issues such as race conditions and unclear data protection in current patterns.", "body_md": "# Swift Concurrency Manifesto\n\nAuthor: [Chris Lattner](https://github.com/lattner)\n\n[Chinese Translation](https://gist.github.com/yxztj/7744e97eaf8031d673338027d89eea76) by [Jason Yu](https://github.com/yxztj)\n\n## Contents\n\n- [Introduction](#introduction)\n- [Overall Vision](#overall-vision)\n- [Part 1: Async/await: Beautiful asynchronous APIs](#part-1-asyncawait-beautiful-asynchronous-apis)\n- [Part 2: Actors: Eliminating shared mutable state](#part-2-actors-eliminating-shared-mutable-state)\n- [Part 3: Reliability through fault isolation](#part-3-reliability-through-fault-isolation)\n- [Part 4: Improving system architecture](#part-4-improving-system-architecture)\n- [Part 5: The crazy and brilliant future](#part-5-the-crazy-and-brilliant-future)\n- [Learning from other concurrency designs](#learning-from-other-concurrency-designs)\n\n## Introduction\n\nThis document is published in the style of a \"Swift evolution manifesto\", outlining a long-term\nview of how to tackle a very large problem.  It explores *one possible* approach to adding\na first-class concurrency model to Swift, in an effort to catalyze positive discussion that leads\nus to a best-possible design.  
As such, it isn't an approved or finalized design\nprescriptive of what Swift will end up adopting.  It is the job of public debate on the open\nsource [swift-evolution mailing list](https://github.com/apple/swift-evolution) to discuss and\niterate towards that ultimate answer, and we may end up with a completely different approach.\n\nWe focus on task-based concurrency abstractions commonly encountered in client\nand server applications, particularly those that are highly event driven (e.g. responding\nto UI events or requests from clients).  This does not attempt to be a comprehensive survey\nof all possible options, nor does it attempt to solve all possible problems in the space\nof concurrency.\nInstead, it outlines a single coherent design thread that can be built over the span of years to\nincrementally drive Swift to further greatness.\n\n### Concurrency in Swift 1...4\n\nSo far, Swift was carefully designed to avoid most concurrency topics, because we specifically did\nnot want to cut off any future directions.  Instead, Swift programmers use OS abstractions (like\nGCD, pthreads, etc) to start and manage tasks.  The design of GCD and Swift's trailing\nclosure syntax fit well together, particularly after the major update to the GCD APIs in Swift 3.\n\nWhile Swift has generally stayed away from concurrency topics, it has made some\nconcessions to practicality.  For example, ARC reference count operations are atomic,\nallowing references to classes to be shared between threads.  Weak references are also\nguaranteed to be thread atomic, Copy-On-Write (🐮) types like Array and String are sharable,\nand the runtime provides some other basic guarantees.\n\n### Goals and non-goals of this manifesto\n\nConcurrency is a broad and sweeping concept that can cover a wide range of topics.  To help\nscope this down a bit, here are some non-goals for this proposal:\n\n - We are focusing on task based concurrency, not data parallelism.  
This is why we focus on\n   GCD and threads as the baseline, while completely ignoring SIMD vectorization,\n   data parallel for loops, etc.\n - In the systems programming context, it is important for Swift developers to have low-level\n   opt-in access to something like the C or C++ memory consistency model.  This is definitely\n   interesting to push forward, but is orthogonal to this work.\n - We are not discussing APIs to improve existing concurrency patterns (e.g. atomic integers,\n   better GCD APIs, etc).\n\nSo what are the actual goals?  Well, because it is already possible to express concurrent apps\nwith GCD, our goal is to make the experience *far better than it is today* by appealing to the\ncore values of Swift: we should aim to reduce the programmer time necessary to get from\nidea to a *working and efficient* implementation. In particular, we aim to improve the\nconcurrency story in Swift along these lines:\n\n - Design: Swift should provide (just) enough language and library support for\n   programmers to know what to reach for when concurrent abstractions are\n   needed.  There should be a structured \"right\" way to achieve most tasks.\n - Maintenance: The use of those abstractions should make Swift code easier to\n   reason about.  For example, it is often difficult to know what data is\n   protected by which GCD queue and what the invariants are for a heap based\n   data structure.\n - Safety: Swift's current model provides no help for race conditions, deadlock\n   and other concurrency problems.  Completion handlers can get called on a\n   surprising queue.  These issues should be improved, and we would like to get\n   to a \"safe by default\" programming model.\n - Scalability: Particularly in server applications, it is desirable to have\n   hundreds of thousands of tasks that are active at a time (e.g. one for every\n   active client of the server).\n - Performance:  As a stretch goal, it would be great to improve performance,\n   e.g. 
by reducing the number of synchronization operations performed, and\n   perhaps even reducing the need for atomic accesses on many ARC operations.\n   The compiler should be aided by knowing how and where data can cross task\n   boundaries.\n - Excellence: More abstractly, we should look to the concurrency models\n   provided by other languages and frameworks, and draw together the best ideas\n   from wherever we can get them, aiming to be better overall than any\n   competitor.\n \nThat said, it is absolutely essential that any new model coexists with existing\nconcurrency constructs and existing APIs.  We cannot build a conceptually\nbeautiful new world without also building a pathway to get existing apps into\nit.\n\n\n### Why a first class concurrency model?\n\nIt is clear that the multicore world isn't the future: it is the present! As such, it is\nessential for Swift to make it straight-forward for programmers to take\nadvantage of hardware that is already prevalent in the world.  At the same time,\nit is already possible to write concurrent programs: since adding a concurrency model\nwill make Swift more complicated, we need a strong justification for that complexity.\nTo show opportunity for improvement, let's explore some of the pain that Swift\ndevelopers face with the current approaches.  Here we focus on GCD since almost\nall Swift programmers use it.\n\n#### Asynchronous APIs are difficult to work with\n\nModern Cocoa development involves a lot of asynchronous programming using closures and completion handlers, but these APIs are hard to use.  
This gets particularly problematic when many asynchronous operations are used, error handling is required, or control flow between asynchronous calls is non-trivial.\n\nThere are many problems in this space, including the \"pyramid of doom\" that frequently occurs:\n\n```swift\nfunc processImageData1(completionBlock: (result: Image) -> Void) {\n    loadWebResource(\"dataprofile.txt\") { dataResource in\n        loadWebResource(\"imagedata.dat\") { imageResource in\n            decodeImage(dataResource, imageResource) { imageTmp in\n                dewarpAndCleanupImage(imageTmp) { imageResult in\n                    completionBlock(imageResult)\n                }\n            }\n        }\n    }\n}\n```\n\nError handling is particularly ugly, because Swift's natural error handling mechanism cannot be used.  You end up with code like this:\n\n```swift\nfunc processImageData2(completionBlock: (result: Image?, error: Error?) -> Void) {\n    loadWebResource(\"dataprofile.txt\") { dataResource, error in\n        guard let dataResource = dataResource else {\n            completionBlock(nil, error)\n            return\n        }\n        loadWebResource(\"imagedata.dat\") { imageResource, error in\n            guard let imageResource = imageResource else {\n                completionBlock(nil, error)\n                return\n            }\n            decodeImage(dataResource, imageResource) { imageTmp, error in\n                guard let imageTmp = imageTmp else {\n                    completionBlock(nil, error)\n                    return\n                }\n                dewarpAndCleanupImage(imageTmp) { imageResult, error in\n                    guard let imageResult = imageResult else {\n                        completionBlock(nil, error)\n                        return\n                    }\n                    completionBlock(imageResult, nil)\n                }\n            }\n        }\n    }\n}\n```\n\nPartially because asynchronous APIs are onerous to use, there are 
many APIs defined in a synchronous form that can block (e.g. `UIImage(named: ...)`), and many of these APIs have no asynchronous alternative.  Having a natural and canonical way to define and use these APIs will allow them to become pervasive.  This is particularly important for new initiatives like the Swift on Server group.\n\n#### What queue am I on?\n\nBeyond being syntactically inconvenient, completion handlers are problematic because their\nsyntax suggests that they will be called on the current queue, but that is not always the case.\nFor example, one of the top recommendations on Stack Overflow is to implement your own\ncustom async operations with code like this (Objective-C syntax):\n\n```objective-c\n- (void)asynchronousTaskWithCompletion:(void (^)(void))completion;\n{\n  dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{\n\n    // Some long running task you want on another thread\n\n    dispatch_async(dispatch_get_main_queue(), ^{\n      if (completion) {\n        completion();\n      }\n    });\n  });\n}\n```\n\nNote how it is hard coded to call the completion handler on the main queue.  This is an\ninsidious problem that can lead to surprising results and bugs like race conditions.  For\nexample, since a lot of iOS code already runs on the main queue, you may have been using\nan API built like this with no problem.  However, a simple refactor to move that code to a\nbackground queue will introduce a really nasty problem where the code will queue hop\nimplicitly - introducing subtle undefined behavior!\n\nThere are several straight-forward ways to improve this situation like better documentation\nor better APIs in GCD.  However, the fundamental problem here is that there is no apparent\nlinkage between queues and the code that runs on them.  
This makes it difficult to design\nfor, difficult to reason about and maintain existing code, and makes it more challenging to\nbuild tools to debug, profile, and reason about what is going wrong, etc.\n\n#### Shared mutable state is bad for software developers\n\nLet's define \"Shared mutable state\" first: \"state\" is simply data used by the program.  \"Shared\"\nmeans the data is shared across multiple tasks (threads, queues, or whatever other concurrency\nabstraction is used).  State shared by itself is not harmful: so long as no-one is modifying the\ndata, it is no problem having multiple readers of that data.\n\nThe concern is when the shared data is mutable, and therefore someone is changing it while\nother tasks are looking at it.  This opens an enormous can of worms that the software world has been\ngrappling with for many decades now.  Given that there are multiple things looking at and\nchanging the data, some sort of synchronization is required or else race conditions, semantic\ninconsistencies and other problems arise.\n\nThe natural first tools to reach for are mutexes or locks.  Without attempting to survey the\nfull body of work\naround this, I'll claim that locking and mutexes introduce a number of problems: you need to\nensure that data is consistently protected by the right locks (or else bugs and memory safety\nissues result), determine the granularity of locking, avoid deadlocks, and deal with many other\nproblems.  There have been a number of attempts to improve this situation, notably\n`synchronized` methods in Java (which were later imported into Objective-C).  This sort of\nthing improves the syntactic side of the equation but doesn't fix the underlying problem.\n\nOnce an app is working, you then run into performance problems, because mutexes are\ngenerally very inefficient - particularly when there are many cores and threads.  
Given decades\nof experience with this model, there are a number of attempts to solve certain corners of the\nproblem, including\n[readers-writer locks](https://en.wikipedia.org/wiki/Readers–writer_lock),\n[double-checked locking](https://en.wikipedia.org/wiki/Double-checked_locking), low-level\n[atomic operations](https://en.wikipedia.org/wiki/Linearizability#Primitive_atomic_instructions)\nand advanced techniques like\n[read/copy/update](https://en.wikipedia.org/wiki/Read-copy-update).  Each of these improves\non mutexes in some respect, but the incredible complexity, unsafety, and fragility of the\nresulting model is itself a sign of a problem.\n\nWith all that said, shared mutable state is incredibly important when you're working at the\nlevel of systems programming: e.g. if you're *implementing* the GCD API or a kernel in Swift,\nyou absolutely must be able to have full ability to do this.  This is why it is ultimately important\nfor Swift to eventually define an opt-in memory consistency model for Swift code.  While it is\nimportant to one day do this, doing so would be an orthogonal effort and thus is not the\nfocus of this proposal.\n\nI encourage anyone interested in this space to read [Is\nParallel Programming Hard, And, If So, What Can You Do About\nIt?](https://www.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html).  It is\na great survey developed by Paul E. McKenney who has\nbeen driving forward efforts to get the Linux kernel to scale to massively multicore\nmachines (hundreds of cores).  
Besides being an impressive summary of hardware characteristics\nand software synchronization approaches, it also shows the massive complexity creep that\nhappens when you start to care a lot about multicore scalability with pervasively shared\nmutable state.\n\n#### Shared mutable state is bad for hardware\n\nOn the hardware side of things, shared mutable state is problematic for a number of reasons.\nIn brief, the present is pervasively multicore - but despite offering the ability to view these\nmachines as shared memory devices, they are actually incredibly\n[NUMA / non-uniform](https://en.wikipedia.org/wiki/Non-uniform_memory_access).\n\nTo oversimplify a bit, consider what happens when two different cores are trying to read and\nwrite the same memory data: the cache lines that hold that data are arbitrated by (e.g.) the\n[MESI protocol](https://en.wikipedia.org/wiki/MESI_protocol), which only allows a cache\nline to be mutable in a single processor's L1 cache.  Because of this, performance quickly\nfalls off of a cliff: the cache line starts ping-pong'ing between the cores, and\nmutations to the cache line have to be pushed out to other cores that are simply reading it.\n\nThis has a number of other knock on effects: processors have quickly moved to having\n[relaxed consistency models](https://en.wikipedia.org/wiki/Consistency_model) which make\nshared memory programming even more complicated.  Atomic accesses (and other\nconcurrency-related primitives like compare/exchange) are now 20-100x slower than non-atomic\naccesses.  These costs and problems continue to scale with core count, yet it\nisn't hard to find a large machine with dozens or hundreds of cores today.\n\nIf you look at the recent breakthroughs in hardware performance, they have come from\nhardware that has dropped the goal of shared memory.  
Notably,\n[GPUs](https://en.wikipedia.org/wiki/Graphics_processing_unit) have been very\nsuccessful at scaling to extremely high core counts, largely because they expose a\nprogramming model that encourages the use of fast local memory instead of shared global\nmemory.  Supercomputers frequently use [MPI](https://en.wikipedia.org/wiki/Message_Passing_Interface)\nfor explicitly managed memory transfers, etc.  If you explore this from first principles, the\nspeed of light and wire delay become inherently limiting factors for very large shared\nmemory systems.\n\nThe point of all of this is that it is highly desirable for Swift to move in a direction where Swift\nprograms run great on large-scale multi-core machines.  With any luck, this could unblock the\nnext step in hardware evolution.\n\n#### Shared mutable state doesn't scale beyond a single process\n\nOk, it is somewhat tautological, but any model built on shared mutable state doesn't work\nin the absence of shared memory.\n\nBecause of this, the software industry has a complexity explosion of systems for [interprocess\ncommunication](https://en.wikipedia.org/wiki/Inter-process_communication): things like\n[sockets, signals, pipes, MIG,\nXPC](https://www.mikeash.com/pyblog/friday-qa-2009-01-16.html), and many others.\nOperating systems then invariably\nintroduce variants of the same abstractions that exist in a single process, including locks (file\nlocking), shared mutable state (memory mapped files), etc.  Beyond IPC, [distributed\ncomputation](https://en.wikipedia.org/wiki/Distributed_computing)\nand cloud APIs then reimplement the same abstractions in yet-another way, because\nshared memory is impractical in that setting.\n\nThe key observation here is simply that this is a really unfortunate state of\naffairs.  
A better world would be for app developers to have a way to\nbuild their data abstractions, concurrency abstractions, and reason about their\napplication in the large, even if it is running across multiple machines in a cloud ecosystem.\nIf you want your single process app to start running in an IPC or distributed setting, you\nshould only have to teach your types how to serialize/code themselves, deal with new\nerrors that can arise, then configure where you want each bit of code to run.  You shouldn't\nhave to rewrite large parts of the application - certainly not with an entirely new technology\nstack.\n\nAfter all, app developers don't design their API with JSON as the input and output format\nfor each function, so why should cloud developers?\n\n## Overall vision\n\nThis manifesto outlines several major steps to address these problems, which can be added\nincrementally to Swift over the span of years.  The first step is quite concrete, but subsequent\nsteps get increasingly vague: this is an early manifesto and there is more design work to\nbe done.  Note that the goal here is not to come up with inherently novel ideas, it is to pull\ntogether the best ideas from wherever we can get them, and synthesize those ideas into\nsomething self-consistent that fits with the rest of Swift.\n\nThe overarching observation here is that there are four major abstractions in computation\nthat are interesting to build a model on top of:\n\n  - traditional control flow\n  - asynchronous control flow\n  - message passing and data isolation\n  - distributed data and compute\n\nSwift already has a fully-developed model for the first point, incrementally refined and\nimproved over the course of years, so we won't talk about it here.  It is important to observe\nthat the vast majority of low-level computation benefits from imperative control flow,\n[mutation with value semantics](https://developer.apple.com/videos/play/wwdc2015/414/),\nand yes, reference semantics with classes.  
These concepts are the important low-level\nprimitives that computation is built on, and reflect the basic abstraction of CPUs.\n\nAsynchrony is the next fundamental abstraction that must be tackled in Swift, because it is\nessential to programming in the real world where we are talking to other machines, to slow\ndevices (spinning disks are still a thing!), and looking to achieve concurrency between multiple\nindependent operations.  Furthermore, latency of apparently identical operations is sometimes\nsubject to significant jitter, for example when a network drops a packet (forcing a retry after a\ntimeout) or when fast path/slow path optimizations (e.g. caches) come into play.\n\nFortunately, Swift is not the first language to face\nthese challenges: the industry as a whole has fought this dragon and settled on\n[async/await](https://en.wikipedia.org/wiki/Await) as the right abstraction.  We propose\nadopting this proven concept outright (with a Swift spin on the syntax).  Adopting\nasync/await will dramatically improve existing Swift code, dovetailing with existing and\nfuture approaches to concurrency.\n\nThe next step is to provide a programmer abstraction to define and model the independent\ntasks in a program, as well as the data that is owned by those tasks.  We propose the\nintroduction of a first-class [actor model](https://en.wikipedia.org/wiki/Actor_model), which\nprovides a way to define and reason about independent tasks that communicate between\nthemselves with asynchronous message sending.  
The actor model has a deep history of\nstrong academic work and was adopted and proven in\n[Erlang](https://www.erlang.org) and [Akka](http://akka.io), which successfully power a large\nnumber of highly scalable and reliable systems.\nWith the actor model as a baseline, we believe we can achieve data isolation by ensuring that\nmessages sent to actors do not lead to shared mutable state.\n\nSpeaking of reliable systems, introducing an actor model is a good opportunity and excuse\nto introduce a mechanism for handling and partially recovering from runtime failures (like\nfailed force-unwrap operations, out-of-bounds array accesses, etc).  We explore several\noptions that are possible to implement and make a recommendation that we think will be a\ngood fit for UI and server applications.\n\nThe final step is to tackle whole system problems by enabling actors to run in different\nprocesses or even on different machines, while still communicating asynchronously through\nmessage sends.  This can extrapolate out to a number of interesting long term possibilities,\nwhich we briefly explore.\n\n\n## Part 1: Async/await: beautiful asynchronous APIs\n\nNOTE: This section is concrete enough to have a [fully baked\nproposal](https://gist.github.com/lattner/429b9070918248274f25b714dcfc7619) with more details.\n\nNo matter what global concurrency model is settled on for Swift, it is hard to ignore the\nglaring problems we have dealing with asynchronous APIs.  Asynchronicity is unavoidable\nwhen dealing with independently executing systems: e.g. anything involving I/O (disks,\nnetworks, etc), a server, or even other processes on the same system.  
It is typically \"not ok\"\nto block the current thread of execution just because something is taking a while to load.\nAsynchronicity also comes up when dealing with multiple independent operations that can\nbe performed in parallel on a multicore machine.\n\nThe current solution to this in Swift is to use \"completion handlers\" with closures.  These are\n[well understood](https://grokswift.com/completion-handlers-in-swift/) but also have a large\nnumber of well understood problems: they often stack up a pyramid of doom, make error\nhandling awkward, and make control flow extremely difficult.\n\nThere is a well-known solution to this problem, called\n[async/await](https://en.wikipedia.org/wiki/Await).  It is a popular programming style that\nwas first introduced in C# and was later adopted in many other languages, including Python,\nJavaScript, Scala, Hack, Dart, etc.  Given its widespread success and acceptance\nby the industry, I suggest that we do the obvious thing and support this in Swift.\n\n### async/await design for Swift\n\nThe general design of async/await drops right into Swift, but a few tweaks make it fit into\nthe rest of Swift more consistently.  We suggest adding `async` as a function modifier akin\nto the existing `throws` function modifier.  Functions (and function types) can be declared as\n`async`, and this indicates that the function is a\n[coroutine](https://en.wikipedia.org/wiki/Coroutine).  Coroutines are functions that may return\nnormally with a value, or may suspend themselves and internally return a continuation.\n\nThis approach allows the completion handler to be absorbed into the language.  For example,\nbefore you might write:\n\n```swift\nfunc loadWebResource(_ path: String, completionBlock: (result: Resource) -> Void) { ... 
}\nfunc decodeImage(_ r1: Resource, _ r2: Resource, completionBlock: (result: Image) -> Void)\nfunc dewarpAndCleanupImage(_ i : Image, completionBlock: (result: Image) -> Void)\n\nfunc processImageData1(completionBlock: (result: Image) -> Void) {\n    loadWebResource(\"dataprofile.txt\") { dataResource in\n        loadWebResource(\"imagedata.dat\") { imageResource in\n            decodeImage(dataResource, imageResource) { imageTmp in\n                dewarpAndCleanupImage(imageTmp) { imageResult in\n                    completionBlock(imageResult)\n                }\n            }\n        }\n    }\n}\n```\n\nwhereas now you can write:\n\n```swift\nfunc loadWebResource(_ path: String) async -> Resource\nfunc decodeImage(_ r1: Resource, _ r2: Resource) async -> Image\nfunc dewarpAndCleanupImage(_ i : Image) async -> Image\n\nfunc processImageData1() async -> Image {\n    let dataResource  = await loadWebResource(\"dataprofile.txt\")\n    let imageResource = await loadWebResource(\"imagedata.dat\")\n    let imageTmp      = await decodeImage(dataResource, imageResource)\n    let imageResult   = await dewarpAndCleanupImage(imageTmp)\n    return imageResult\n}\n```\n\n`await` is a keyword that works like the existing `try` keyword: it is a noop at runtime, but\nindicates to a maintainer of the code that non-local control flow can happen at that point.\nBesides the addition of the `await` keyword, the async/await model allows you to write\nobvious and clean imperative code, and the compiler handles the generation of state\nmachines and callback handlers for you.\n\nOverall, adding this will dramatically improve the experience of working with completion\nhandlers, and provides a natural model to compose futures and other APIs on top of.\nMore details are contained in [the full\nproposal](https://gist.github.com/lattner/429b9070918248274f25b714dcfc7619).\n\n### New asynchronous APIs\n\nThe introduction of async/await into the language is a great opportunity to introduce 
more\nasynchronous APIs to Cocoa and perhaps even entire new framework extensions (like a revised\nasynchronous file I/O API).  The [Server APIs Project](https://swift.org/server-apis/) is also\nactively working to define new Swift APIs, many of which are intrinsically asynchronous.\n\n\n## Part 2: Actors: Eliminating shared mutable state\n\nGiven the ability to define and use asynchronous APIs with expressive \"imperative style\" control\nflow, we now look to give developers a way to carve up their application into multiple\nconcurrent tasks.  We propose adopting the model of\n[actors](https://en.wikipedia.org/wiki/Actor_model): Actors naturally represent real-world\nconcepts like \"a document\", \"a device\", \"a network request\", and are particularly well suited\nto event driven architectures like UI applications, servers, device drivers, etc.\n\nSo what is an actor?  As a Swift programmer, it is easiest to think of an actor as a\ncombination of a `DispatchQueue`, the data that queue protects, and messages that can be\nrun on that queue.  Because they are embodied by an (internal) queue abstraction, you\ncommunicate with Actors asynchronously, and actors guarantee that the data they protect is\nonly touched by the code running on that queue.  This provides an \"island of serialization\nin a sea of concurrency\".\n\nIt is straight-forward to adapt legacy software to an actor interface, and it is possible to\nprogressively adopt actors in a system that is already built on top of GCD or other\nconcurrency primitives.\n\n### Actor Model Theory\n\nActors have a deep theoretical basis and have been explored by academia since the 1970s -\nthe [Wikipedia page on actors](https://en.wikipedia.org/wiki/Actor_model) and the\n[c2 wiki page](http://wiki.c2.com/?ActorsModel) are good places\nto start reading if you'd like to dive into some of the theoretical fundamentals that back the\nmodel.  
A challenge of this work (for Swift's purposes) is that academia assumes a pure actor\nmodel (\"everything is an actor\"), and assumes a model of communication so limited that it\nmay not be acceptable for Swift.  I'll provide a broad stroke summary of the advantages of\nthis pure model, then talk about how to address the problems.\n\nAs Wikipedia says:\n\n> In response to a message that it receives, an actor can: make local decisions, create more\n> actors, send more messages, and determine how to respond to the next message received.\n> Actors may modify private state, but can only affect each other through messages (avoiding\n> the need for any locks).\n\nActors are cheap to construct and you communicate with an actor using efficient\nunidirectional asynchronous message sends (\"posting a message in a mailbox\").\nBecause these messages are unidirectional, there is no waiting, and thus deadlocks are\nimpossible.  In the academic model, all data sent in these messages is deep copied, which\nmeans that there is no data sharing possible between actors.  Because actors cannot touch\neach other's state (and have no access to global state), there is no need for any\nsynchronization constructs, eliminating all of the problems with shared mutable state.\n\nTo make this work pragmatically in the context of Swift, we need to solve several problems:\n\n- we need a strong computational foundation for all the computation within a task.  Good\n  news: this is already done in Swift 1...4!\n- unidirectional async message sends are great, but inconvenient for some things.  We want\n  a model that allows messages to return a value (even if we encourage them not to), which\n  requires a way to wait for that value. This is the point of adding async/await.\n- we need to make message sends efficient: relying on a deep copy of each argument is not\n  acceptable.  
Fortunately - and not accidentally - we already have Copy-On-Write (🐮) value\n  types and [move semantics](https://github.com/apple/swift/blob/master/docs/OwnershipManifesto.md)\n  on the way as a basis to build from.  The trick is dealing with reference types, which are\n  discussed below.\n- we need to figure out what to do about global mutable state, which already exists in Swift.\n  One option is considered below.\n  \n### Example actor design for Swift\n\nThere are several possible ways to manifest the idea of actors into Swift.  For the purposes of\nthis manifesto, I'll describe them as a new type in Swift because it is the least confusing way\nto explain the ideas and this isn't a formal proposal.  I'll note right here up front that this is\nonly one possible design: the right approach may be for actors to be a special kind of class,\na model described below.\n\nWith this design approach, you'd define an actor with the `actor` keyword.  An actor can\nhave any number of data members declared as instance members, can have normal methods,\nand extensions work with them as you'd expect.  Actors are reference types and have an\nidentity which can be passed around as a value.  Actors can conform to protocols and\notherwise dovetail with existing Swift features as you'd expect.\n\nWe need a simple running example, so let's imagine we're building the data model for an app\nthat has a tableview with a list of strings.  The app has UI to add and manipulate the list.  
It\nmight look something like this:\n\n```swift\n  actor TableModel {\n    let mainActor : TheMainActor\n    var theList : [String] = [] {\n      didSet {\n        mainActor.updateTableView(theList)\n      }\n    }\n    \n    init(mainActor: TheMainActor) { self.mainActor = mainActor }\n\n    // this checks to see if all the entries in the list are capitalized:\n    // if so, it capitalizes the string before returning it to encourage\n    // capitalization consistency in the list.\n    func prettify(_ x : String) -> String {\n      // Details omitted: it inspects theList, adjusting the\n      // string before returning it if necessary.\n    }\n\n    actor func add(entry: String) {\n      theList.append(prettify(entry))\n    }\n  }\n```\n\nThis illustrates the key points of an actor model:\n\n- The actor defines the state local to it as instance data, in this case the reference to\n   `mainActor` and `theList` are the data in the actor.\n- Actors can send messages to any other actor they have a reference to, using traditional\n  dot syntax.\n- Normal (non-actor) methods can be defined on the actor for convenience, and\n  they have full access to the state within their `self` actor.\n- `actor` methods are the messages that actors accept.  Marking a method as `actor`\n  imposes certain restrictions upon it, described below.\n- It isn't shown in the example, but new instances of the actor are created by using the\n  initializer just like any other type: `let dataModel = TableModel(mainActor: mainActor)`.\n- Also not shown in the example, but `actor` methods are implicitly `async`, so they can\n  freely call `async` methods and `await` their results.\n\nIt has been found in other actor systems that an actor abstraction like this encourages the\n\"right\" abstractions in applications, and maps well to the conceptual way that programmers\nthink about their data.  
For example, given this data model it is easy to create multiple\ninstances of this actor, one for each document in an MDI application.\n\nThis is a straight-forward implementation of the actor model in Swift and is enough to achieve\nthe basic advantages of the model.  However, it is important to note that there are a number\nof limitations being imposed here that are not obvious, including:\n\n- An `actor` method cannot return a value, throw an error, or have an `inout` parameter.\n- All of the parameters must produce independent values when copied (see below).\n- Local state and non-`actor` methods may only be accessed by methods defined lexically\n  on the actor or in an extension to it (whether they are marked `actor` or otherwise).\n\n### Extending the model through await\n\nThe first limitation (that `actor` methods cannot return values) is easy to address as we've\nalready discussed.  Say the app developer needs a quick way to get the number of entries in\nthe list, a way that is visible to other actors they have running around.  We should simply\nallow them to define:\n\n```swift\n  extension TableModel {\n    actor func getNumberOfEntries() -> Int {\n      return theList.count\n    }\n  }\n```\n\nThis allows them to await the result from other actors:\n\n```swift\n  print(await dataModel.getNumberOfEntries())\n```\n\nThis dovetails perfectly with the rest of the async/await model.  It is unrelated to this\nmanifesto, but we'll observe that the more idiomatic way to\ndefine that specific example is as an `actor var`.  Swift currently doesn't allow property\naccessors to `throw` or be `async`.  When this limitation is relaxed, it would be\nstraight-forward to allow `actor var`s to provide the more natural API.\n\nNote that this extension makes the model far more usable in cases like this, but erodes the\n\"deadlock free\" guarantee of the actor model.  
An await on an `actor` method suspends the\ncurrent task, and since you can get circular waits, you can end up with deadlock.  This is\nbecause only one message is processed by the actor at a time.  The simplest case occurs\nif an actor waits on itself directly (possibly through a chain of references):\n\n```swift\n  extension TableModel {\n    actor func f() {\n       ...\n       let x = await self.getNumberOfEntries()   // trivial deadlock.\n       ...\n    }\n  }\n```\n\nThe trivial case like this can also be trivially diagnosed by the compiler.  The complex case\nwould ideally be diagnosed at runtime with a trap, depending on the runtime implementation\nmodel.\n\nThe solution for this is to encourage people to use `Void`-returning `actor` methods that \"fire\nand forget\".  There are several reasons to believe that these will be the most common: the\nasync/await model described syntactically encourages people not to use it (by requiring\nmarking, etc), many of the common applications of actors are event-driven applications\n(which are inherently one way), the eventual design of UI and other system frameworks\ncan encourage the right patterns from app developers, and of course documentation can\ndescribe best practices.\n\n### About that main thread\n\nThe example above shows `mainActor` being passed in, following theoretically pure actor\nhygiene.  However, the main thread in UIKit and AppKit is already global state, so we might\nas well admit that and make code everywhere nicer.  As such, it makes sense for AppKit and\nUIKit to define and vend a public global constant actor reference, e.g. something like this:\n\n```swift\npublic actor MainActor {  // Bikeshed: could be named \"actor UI {}\"\n   private init() {}      // You can't make another one of these.\n   // Helpful public stuff could be put here to make app developers happy. 
:-)\n}\npublic let mainActor = MainActor()\n```\n\nThis would allow app developers to put their extensions on `MainActor`, making their code\nmore explicit and clear about what *needs* to be run on the main thread.  If we got really\ncrazy, someday Swift should allow data members to be defined in extensions on classes,\nand app developers would then be able to put their state that must be manipulated on the\nmain thread directly on the MainActor.\n\n### Data isolation\n\nThe way that actors eliminate shared mutable state and explicit synchronization is through\ndeep copying all of the data that is passed to an actor in a message send, and preventing\ndirect access to actor state without going through these message sends.  This all composes\nnicely, but can quickly introduce inefficiencies in practice because of all the data copying\nthat happens.\n\nSwift is well positioned to deal with this for a number of reasons: its strong focus on value\nsemantics means that copying of these values is a core operation understood and known by\nSwift programmers everywhere.  Second, the use of Copy-On-Write (🐮) as an\nimplementation approach fits perfectly with this model.  Note how, in the example above,\nthe `TableModel` actor sends a copy of the `theList` array back to the UI thread so it can\nupdate itself.  In Swift, this is a super efficient O(1) operation that does some ARC stuff: it\ndoesn't actually copy or touch the elements of the array.\n\nThe third piece, which is still in development, will come as a result of the work on adding\n[ownership semantics](https://github.com/apple/swift/blob/master/docs/OwnershipManifesto.md)\nto Swift.  
When this is available, advanced programmers will have the ability to *move*\ncomplex values between actors, which is typically also a super-efficient O(1) operation.\n\nThis leaves us with three open issues: 1) how do we know whether something has proper\nvalue semantics, 2) what do we do about reference types (classes and closures), and 3) what\ndo we do about global state.  All three of these options should be explored in detail, because\nthere are many different possible answers to these. I will explore a simple model below in\norder to provide an existence proof for a design, but I do not claim that it is the best model\nwe can find.\n\n#### Does a type provide proper value semantics?\n\nThis is something that many many Swift programmers have wanted to be able to know the\nanswer to, for example when defining generic algorithms that are only correct in the face of\nproper value semantics.  There have been numerous proposals for how to determine this,\nand I will not attempt to summarize them, instead I'll outline a simple proposal just to provide\nan existence proof for an answer:\n\n- Start by defining a simple marker protocol (the name of which is intentionally silly to reduce\n  early bikeshedding) with a single requirement:\n  `protocol ValueSemantical { func valueSemanticCopy() -> Self }`\n- Conform all of the applicable standard library types to `ValueSemantical`.  For example,\n  Array conforms when its elements conform - note that an array of reference types doesn't\n  always provide the semantics we need.\n- Teach the compiler to synthesize conformance for structs and enums whose members are\n  all `ValueSemantical`, just like we do for `Codable`.\n- The compiler just checks for conformance to the `ValueSemantical` protocol and\n  rejects any arguments and return values that do not conform.\n\nTo reiterate, the name `ValueSemantical` really isn't the right name for this: things like\n`UnsafePointer`, for example, shouldn't conform.  
Enumerating the possible options and\nevaluating the naming tradeoffs between them is a project for another day though.\n\nIt is important to realize that this design does *not guarantee memory safety*.  Someone\ncould implement the protocol in the wrong way (thus lying about satisfying the requirements)\nand shared mutable state could occur.  In the author's opinion, this is the right tradeoff:\nsolving this would require introducing onerous type system mechanics (e.g. something like\nthe capabilities system in the [Pony](https://www.ponylang.org/) language).  Swift already\nprovides a model where memory safe APIs (e.g. `Array`) are implemented in terms of memory\nunsafety (e.g. `UnsafePointer`), the approach described here is directly analogous.\n\n*Alternate Design*: Another approach is to eliminate the requirement from the protocol:\njust use the protocol as a marker, which is applied to types that already have the right\nbehavior.  When it is necessary to customize the copy operation (e.g. for a reference type),\nthe solution would be to box values of that type in a struct that provides the right value\nsemantics.  This would make it more awkward to conform, but this design eliminates having\n\"another kind of copy\" operation, and encourages more types to provide value semantics.\n\n#### Reference types: Classes\n\nThe solution to this is simple: classes need to conform to `ValueSemantical` (and\nimplement the requirement) properly, or else they cannot be passed as a parameter or result\nof an `actor` method.  In the author's opinion, giving classes proper value semantics will not\nbe that big of a deal in practice for a number of reasons:\n\n- The default (non-conformance) is the right default: the only classes that conform will be\n  ones that a human thought about.\n- Retroactive conformance allows app developers to handle cases not addressed by the\n  framework engineers.\n- Cocoa has a number of classes (e.g. 
the entire UI frameworks) that are only usable on the\n  main thread.  By definition, these won't get passed around.\n- A number of classes in Cocoa are already semantically immutable, making it trivial and\n  cheap for them to conform.\n\nBeyond that, when you start working with an actor system, it is an inherent part of the\napplication design that you don't allocate and pass around big object graphs: you allocate\nthem in the actor you intend to manipulate them with.  This is something that has been\nfound true in Scala/Akka for example.\n\n#### Reference types: Closures and Functions\n\nIt is not safe to pass an arbitrary value with function type across an actor message,\nbecause it could close over arbitrary actor-local data.  If that data is closed over\nby-reference, then the recipient actor would have arbitrary access to data in the sending\nactor's state.  That said, there is at least one important exception that we should carve\nout: it is safe to pass a closure *literal* when it is known that it only closes over\ndata by copy: using the same `ValueSemantical` copy semantics described above.\n\nThis happens to be an extremely useful carveout, because it permits some interesting \"callback\"\nabstractions to be naturally expressed without tight coupling between actors.  Here is a silly\nexample:\n\n```swift\n    otherActor.doSomething { self.incrementCount($0) }\n```\n\nIn this case `OtherActor` doesn't have to know about `incrementCount` which is defined\non the self actor, reducing coupling between the actors.\n\n#### Global mutable state\n\nSince we're friends, I'll be straight with you: there are no great answers here.  Swift and C\nalready support global mutable state, so the best we can do is discourage the use of it.  We\ncannot automatically detect a problem because actors need to be able to transitively use\nrandom code that isn't defined on the actor.  For example:\n\n```swift\nfunc calculate(thing : Int) -> Int { ... 
}\n\nactor Foo {\n  actor func exampleOperation() {\n     let x = calculate(thing: 42)\n     ...\n  }\n}\n```\n\nThere is no practical way to know whether `calculate` is thread-safe or not.  The only solution\nis to scatter tons of annotations everywhere, including in headers for C code.  I think that\nwould be a non-starter.\n\nIn practice, this isn't as bad as it sounds, because the most common operations\nthat people use (e.g. `print`) are already internally synchronizing, largely because people are\nalready writing multithreaded code.  While it would be nice to magically solve this long\nstanding problem with legacy systems, I think it is better to just completely ignore it and tell\ndevelopers not to define or use global variables (global `let`s are safe).\n\nAll hope is not lost though: Perhaps we could consider deprecating global `var`s from Swift\nto further nudge people away from them. Also, any accesses to unsafe global mutable\nstate from an actor context can and should be warned about.  Taking some steps like this\nshould eliminate the most obvious bugs.\n\n### Scalable Runtime\n\nThus far, we've dodged the question about how the actor runtime should be implemented.\nThis is intentional because I'm not a runtime expert!  From my perspective, building on top\nof GCD is great if it can work for us, because it is proven and using it reduces risk from the\nconcurrency design.  I also think that GCD is a\nreasonable baseline to start from: it provides the right semantics, it has good low-level\nperformance, and it has advanced features like Quality of Service support which are just as\nuseful for actors as they are for anything else.  
It would be easy to provide access to these\nadvanced features by giving every actor a `gimmeYourQueue()` method.\n\nHere are some potential issues using GCD which we will need to figure out:\n\n**Kernel Thread Explosions**\n\nOur goal is to allow actors to be used as a core unit of abstraction within a program, which\nmeans that we want programmers to be able to create as many of them as they want, without\nrunning into performance problems.  If scalability problems come up, you end up having to\naggregate logically distinct stuff together to reduce the number of actors, which leads to complexity\nand loses some of the advantages of data isolation.  The model as proposed should scale\nexceptionally well, but depends on the runtime to make this happen in practice.\n\nGCD is already quite scalable, but one concern is that it can be subject to kernel thread\nexplosions, which occur when a\nGCD task blocks in a way that the kernel and runtime cannot reason about.  In response,\nthe GCD runtime allocates new kernel threads, each of which gets a stack... and these stacks\ncan fragment the heap.  This is problematic in the case\nof a server workload that wants to instantiate hundreds of thousands of actors - at\nleast one for every incoming network connection.\n\nProvably solving thread explosions is probably impossible/impractical in any runtime given\nthe need to interoperate with C code and legacy systems that aren't built in pure Swift.  That\nsaid, perfection isn't necessary: we just need a path that moves towards it, and provides\nprogrammers a way to \"get their job done\" when an uncooperative framework or API is hit\nin practice.  I'd suggest a three step approach to resolving this:\n\n - Make existing frameworks incrementally \"async safe\" over time.  
Ensure that new APIs are\n   done right, and make sure that no existing APIs ever go from “async safe” to “async unsafe”.\n - Provide a mechanism that developers can use to address problematic APIs that they\n    encounter in practice.  It should be something akin to “wrap your calls in a closure and\n    pass it to a special GCD function”, or something else of similar complexity.\n- Continue to improve perf and debugger tools to help identify problematic cases that occur\n   in practice.\n\nThis approach of focusing on problematic APIs that developers hit in practice should work\nparticularly well for server workloads, which are the ones most likely to need a large number of\nactors at a single time.  Legacy server libraries are also much more likely to be async friendly\nthan arbitrary other C code.\n\n**Actor Shutdown**\n\nThere are also questions about how actors are shut down.  The conceptually ideal model is\nthat actors are implicitly released when their reference count drops to zero and when the last\nenqueued message is completed.  This will probably require some amount of runtime\nintegration.\n\n**Bounded Queue Depths**\n\nAnother potential concern is that GCD queues have unbounded depth: if you have a\nproducer/consumer situation, a fast producer can outpace the consumer and continuously\ngrow the queue of work.  It would be interesting to investigate options for\nproviding bounded queues that throttle or block the producer in this sort of situation.  Another\noption is to make this purely an API problem, encouraging the use of reactive streams and\nother abstractions that provide back pressure.\n\n### Alternative Design: Actors as classes\n\nThe design above is simple and self consistent, but may not be the right model, because\nactors have a ton of conceptual overlap with classes.  
Observe:\n\n- Actors have reference semantics, just like classes.\n- Actors form a graph, this means that we need to be able to have `weak`/`unowned`\n  references to them.\n- Subclassing of actors makes just as much sense as subclassing of classes, and would\n  work the same way.\n- Some people incorrectly think that Swift hates classes: this is an opportunity to restore\n  some of their former glory.\n\nHowever, actors are not *simple classes*: here are some differences:\n\n- Only actors can have `actor` methods on them.  These methods have additional\n  requirements put on them in order to provide the safety in the programming model we seek.\n- An \"actor class\" deriving from a \"non-actor base class\" would have to be illegal, because\n  the base class could escape self or escape local state references in an unsafe way.\n\nOne important pivot-point in discussion is whether subclassing of actors is desirable.  If so,\nmodeling them as a special kind of class would be a very nice simplifying assumption,\nbecause a lot of complexity comes in with that (including all the initialization rules etc).  If not,\nthen defining them as a new kind of type is defensible, because they'd be very simple and\nbeing a separate type would more easily explain the additional rules imposed on them.\n\nSyntactically, if we decided to make them classes, it makes sense for this to be a modifier\non the class definition itself, since actorhood fundamentally alters the contract of the class,\ne.g.:\n\n```swift\nactor class DataModel : SomeBaseActor { ... }\n```\n\nalternatively, since you can't derive from non-actor classes anyway, we could just make the\nbase class be `Actor`:\n\n```swift\nclass DataModel : Actor { ... }\n```\n\n### Further extensions\n\nThe design sketch above is the minimal but important step forward to build concurrency\nabstractions into the language, but really filling out the model will almost certainly require a\nfew other common abstractions.  
For example:\n\n- [Reactive streams](https://en.wikipedia.org/wiki/Reactive_Streams) is a common way to\n  handle communication between async actors, and helps provide solutions to backpressure.\n  [Dart's stream design](https://www.dartlang.org/tutorials/language/streams) is one example.\n  \n- Relatedly, it makes sense to extend the `for/in` loop to asynchronous sequences - likely through the introduction of\n   a new `AsyncSequence` protocol.  FWIW, this is likely to be added to\n   [C# 8.0](https://channel9.msdn.com/Blogs/Seth-Juarez/A-Preview-of-C-8-with-Mads-Torgersen#time=16m30s).\n\n - A first class `Future` type is commonly requested.  I expect the importance of it\n   to be far less than in languages that don't have (or who started without) async/await,\n   but it is still a very useful abstraction for handling cases where you want to kick off simple\n   overlapping computations within a function.\n\n### Intra-actor concurrency\n\nAnother advanced concept that could be considered is allowing someone to define a\n\"multithreaded actor\", which provides a standard actor API, but where synchronization and\nscheduling of tasks is handled by the actor itself, using traditional synchronization\nabstractions instead of a GCD queue.  Adding this would mean that there is shared mutable\nstate *within* the actor, but that isolation *between* actors is still preserved.  This is\ninteresting to consider for a number of reasons:\n\n- This allows the programming model to be consistent (where an \"instance of an actor\n   represents a thing\") even when the thing can be implemented with internal concurrency.\n   For example, consider an abstraction for a network card/stack: it may want to do its own\n   internal scheduling and prioritizing of many different active pieces of work according to its\n   own policies, but provide a simple-to-use actor API on top of that.  
The fact that the actor\n   can handle multiple concurrent requests is an implementation detail the clients shouldn’t\n   have to be rewritten to understand.\n\n- Making this non-default would provide proper progressive disclosure of complexity.\n\n- You’d still get improved safety and isolation of the system as a whole, even if individual actors are “optimized” in this way.\n\n- When incrementally migrating code to the actor model, this would make it much easier to\n   provide actor wrappers for existing concurrent subsystems built on shared mutable state\n   (e.g. a database whose APIs are threadsafe).\n\n- Something like this would also probably be the right abstraction for imported RPC services\n  that allow for multiple concurrent synchronous requests.\n  \n- This abstraction would be unsafe from the memory safety perspective, but this is widely\n   precedented in Swift.  Many safe abstractions are built on top of memory unsafe\n   primitives - consider how `Array` is built on `UnsafePointer` - and this is an important\n   part of the pragmatism and \"get stuff done\" nature of the Swift programming model.\n\nThat said, this is definitely a power-user feature, and we should understand, build, and get\nexperience using the basic system before considering adding something like this.\n\n\n## Part 3: Reliability through fault isolation\n\nSwift has many aspects of its design that encourage programmer errors (aka software\nbugs :-) to be caught at compile time: a static type system, optionals, encouraging covered\nswitch cases, etc.  
However, some errors may only be caught at runtime, including things like\nout-of-bound array accesses, integer overflows, and force-unwraps of nil.\n\nAs described in the [Swift Error Handling\nRationale](https://github.com/apple/swift/blob/master/docs/ErrorHandlingRationale.rst), there\nis a tradeoff that must be struck: it doesn't make sense to force programmers to write logic\nto handle every conceivable edge case: even discounting the boilerplate that would generate,\nthat logic is likely to itself be poorly tested and therefore full of bugs.  We must carefully\nweigh and trade off complex issues in order to get a balanced design.  These tradeoffs are\nwhat led to Swift's approach that does force programmers to think about and write code to\nhandle all potentially-nil pointer references, but not to have to think about integer overflow on\nevery arithmetic operation.  The new challenge is that integer overflow still must be\ndetected and handled somehow, and the programmer hasn't written any recovery code.\n\nSwift handles these with a [fail fast](https://en.wikipedia.org/wiki/Fail-fast) philosophy: it is\npreferable to detect and report a programmer error as quickly as possible, rather than\n\"blunder on\" with the hope that the error won't matter.  Combined with rigorous testing (and\nperhaps static analysis technology in the future), the goal is to make bugs shallow, and provide\ngood stack traces and other information when they occur.  This encourages them to be found\nand fixed quickly early in the development cycle.  However, when the app ships, this\nphilosophy is only great if all the bugs were actually found, because an undetected problem\ncauses the app to suddenly terminate itself.\n\nSudden termination of a process is hugely problematic if it jeopardizes user data, or - in the\ncase of a server app - if there are hundreds of clients currently connected to the server at the\ntime.  
While it is impossible in general to do perfect resolution of an arbitrary programmer\nerror, there is prior art for how to handle common problems gracefully.  In the case of Cocoa,\nfor example, if an `NSException` propagates up to the top of the runloop, it is useful to try to\nsave any modified documents to a side location to avoid losing data.  This isn't guaranteed\nto work in every case, but when it does, the\nuser is very happy that they haven't lost their progress.  Similarly, if a server crashes\nhandling one of its client's requests, a reasonable recovery scheme is to finish handling the\nother established connections in the current process, but push off new connection requests\nto a restarted instance of the server process.\n\nThe introduction of actors is a great opportunity to improve this situation, because actors\nprovide an interesting granularity level between the \"whole process\" and \"an individual class\"\nwhere programmers think about the invariants they are maintaining.  Indeed, there is a bunch\nof prior art in making reliable actor systems, and again, Erlang is one of the leaders (for a\ngreat discussion, see [Joe Armstrong's PhD thesis](http://erlang.org/download/armstrong_thesis_2003.pdf)).  We'll\nstart by sketching the basic model, then talk about a potential design approach.\n\n### Actor Reliability Model\n\nThe basic concept here is that an actor that fails has violated its own local invariants, but that\nthe invariants in other actors still hold: this is because we've defined away shared\nmutable state.  This gives us the option of killing the individual actor that broke its invariants\ninstead of taking down the entire process.  
Given the definition of the basic actor model\nwith unidirectional async message sends, it is possible to have the runtime just drop any new\nmessages sent to the actor, and the rest of the system can continue without even knowing\nthat the actor crashed.\n\nWhile this is a simple approach, there are two problems:\n\n- Actor methods that return a value could be in the process of being `await`ed, but if the\n  actor has crashed those awaits will never complete.\n- Dropping messages may itself cause deadlock because of higher-level communication\n  invariants that are broken.  For example, consider this actor, which waits for 10 messages\n  before passing on the message:\n  \n```swift\n  actor Merge10Notifications {\n    var counter : Int = 0\n    let otherActor = ...  // set up by the init.\n    actor func notify() {\n      counter += 1\n      if counter >= 10 {\n        otherActor.notify()\n      }\n    }\n  }\n```\n\nIf one of the 10 actors feeding notifications into this one crashes, then the program will wait\nforever to get that 10th notification.  Because of this, someone designing a \"reliable\" actor\nneeds to think about more issues, and work slightly harder to achieve that reliability.\n\n### Opting into reliability\n\nGiven that a reliable actor requires more thought than building a simple actor, it is reasonable\nto look for opt-in models that provide [progressive disclosure of\ncomplexity](https://en.wikipedia.org/wiki/Progressive_disclosure).  The first thing\nyou need is a way to opt in.  As with actor syntax in general, there are two\nbroad options: first-class actor syntax or a class declaration modifier, i.e., one of:\n\n```swift\n  reliable actor Notifier { ... }\n  reliable actor class Notifier { ... 
}\n```\n\nWhen one opts an actor into caring about reliability, a new requirement is imposed on all\n`actor` methods that return a value: they are now required to be declared `throws` as well.\nThis forces clients of the actor to be prepared for a failure when/if the actor crashes.\n\nImplicitly dropping messages is still a problem.  I'm not familiar with the approaches taken in\nother systems, but I imagine two potential solutions:\n\n1) Provide a standard library API to register failure handlers for actors, allowing higher level\n   reasoning about how to process and respond to those failures.  An actor's `init()` could\n   then use this API to register its failure handler with the system.\n2) Force *all* `actor` methods to throw, with the semantics that they only throw if the actor\n   has crashed.  This forces clients of the reliable actor to handle a potential crash, and do so\n   on the granularity of all messages sent to that actor.\n  \nBetween the two, the first approach is more appealing to me, because it allows factoring\nout the common failure logic in one place, rather than having every caller have to write (hard\nto test) logic to handle the failure in a fine grained way.  For example, a document actor could\nregister a failure handler that attempts to save its data in a side location if it ever crashes.\n\nThat said, both approaches are feasible and should be explored in more detail.\n\n*Alternate design*: An alternate approach is to make all actors be \"reliable\" actors, by making\nthe additional constraints a simple part of the actor model.  This reduces the number of\nchoices a Swift programmer gets-to/has-to make.  
If the async/await model ends up making\nasync imply throwing, then this is probably the right direction, because the `await` on a value\nreturning method would be implicitly a `try` marker as well.\n\n### Reliability runtime model\n\nBesides the high level semantic model that the programmer faces, there are also questions\nabout what the runtime model is.  When an actor crashes:\n\n - What state is its memory left in?\n - How well can the process clean up from the failure?\n - Do we attempt to release memory and other resources (like file descriptors) managed by that actor?\n\nThere are multiple possible designs, but I\nadvocate for a design where **no cleanup is performed**: if an actor crashes, the runtime\npropagates that error to other actors and runs any recovery handlers (as described in the\nprevious section) but that it **should not** attempt to further clean up the resources owned by\nthe actor.\n\nThere are a number of reasons for this, but the most important is that the failed actor just\nviolated its own consistency with whatever invalid operation it attempted to perform.  At this\npoint, it may have started a transaction but not finished it, or may be in any other sort of\ninconsistent or undefined state.  Given the high likelihood for internal inconsistency, it is\nprobable that the high-level invariants of various classes aren't intact, which means it isn't\nsafe to run the `deinit`-ializers for the classes.\n\nBeyond the semantic problems we face, there are also practical complexity and efficiency\nissues at stake: it takes code and metadata to be able to unwind the actor's stack and release\nactive resources.  This code and metadata takes space in the application, and it also takes\ntime at compile time to generate it.  
As such, the choice to provide a model that attempted\nto recover from these\nsorts of failures would mean burning significant code size and compile time for something\nthat isn't supposed to happen.\n\nA final (and admittedly weak) reason for this approach is that a \"too clean\" cleanup runs the\nrisk that programmers will start treating fail-fast conditions as a soft error that\ndoesn't need to be handled with super-urgency.  We really do want these bugs to be found\nand fixed in order to achieve the high reliability software systems that we seek.\n\n## Part 4: Improving system architecture\n\nAs described in the motivation section, a single application process runs in the context of a\nlarger system: one that often involves multiple processes (e.g. an app and an XPC daemon)\ncommunicating through [IPC](https://www.mikeash.com/pyblog/friday-qa-2009-01-16.html),\nclients and servers communicating through networks, and\nservers communicating with each other in \"[the cloud](https://tr4.cbsistatic.com/hub/i/r/2016/11/29/9ea5f375-d0dd-4941-891b-f35e7580ae27/resize/770x/982bcf36f7a68242dce422f54f8d445c/49nocloud.jpg)\" (using\nJSON, protobufs, GRPC, etc...).  The points\nof similarity across all of these are that they mostly consist of independent tasks that\ncommunicate with each other by sending structured data using asynchronous message\nsends, and that they cannot practically share mutable state.  This is starting to sound familiar.\n\nThat said, there are differences as well, and attempting to paper over them (as was done\nin the older Objective-C \"[Distributed\nObjects](https://www.mikeash.com/pyblog/friday-qa-2009-02-20-the-good-and-bad-of-distributed-objects.html)\" system)\nleads to serious problems:\n\n- Clients and servers are often written by different entities, which means that APIs must be\nable to evolve independently.  
Swift is already great at this.\n- Networks introduce new failure modes that the original API almost certainly did not\n  anticipate.  This is covered by \"reliable actors\" described above.\n- Data in messages must be known-to-be `Codable`.\n- Latency is much higher to remote systems, which can impact API design because\n  too-fine-grained APIs perform poorly.\n\nIn order to align with the goals of Swift, we cannot sweep these issues under the rug: we\nwant to make the development process fast, but \"getting something up and running\" isn't the\ngoal: it really needs to work - even in the failure cases.\n\n### Design sketch for interprocess and distributed compute\n\nThe actor model is a well-known solution in this space, and has been deployed\nsuccessfully in less-mainstream languages like\n[Erlang](https://en.wikipedia.org/wiki/Erlang_(programming_language)#Concurrency_and_distribution_orientation).\nBringing the ideas to Swift just requires that we make sure it fits cleanly into the existing\ndesign, taking advantage of the characteristics of Swift and ensuring that it stays true to the\nprinciples that guide it.\n\nOne of these principles is the concept of [progressive disclosure of\ncomplexity](https://en.wikipedia.org/wiki/Progressive_disclosure): a Swift developer\nshouldn't have to worry about IPC or distributed compute if they don't care about it.  This\nmeans that actors should opt-in through a new declaration modifier, aligning with the ultimate\ndesign of the actor model itself, i.e., one of:\n\n```swift\n  distributed actor MyDistributedCache { ... }\n  distributed actor class MyDistributedCache { ... }\n```\n\nBecause it has done this, the actor is now subject to two additional requirements.\n\n - The actor must fulfill the requirements of a `reliable actor`, since a\n   `distributed actor` is a further refinement of a reliable actor.  
This means that all\n   value returning `actor` methods must throw, for example.\n - Arguments and results of `actor` methods must conform to `Codable`.\n \n In addition, the author of the actor should consider whether the `actor` methods make\n sense in a distributed setting, given the increased latency that may be faced.  Using coarse\n grain APIs could be a significant performance win.\n \nWith this done, the developer can write their actor like normal: no change of language or\ntools, no change of APIs, no massive new conceptual shifts.  This is true regardless of\nwhether you're talking to a cloud service endpoint over JSON or an optimized API using\nprotobufs and/or GRPC.  There are very few cracks that appear in the model, and the ones\nthat do have pretty obvious reasons: code that mutates global\nstate won't have that visible across the entire application architecture, files created in the file\nsystem will work in an IPC context, but not a distributed one, etc.\n\nThe app developer can now put their actor in a package and share it between their app and their\nservice.  The major change in code is at the allocation site of `MyDistributedCache`, which\nwill now need to use an API to create the actor in another process instead of calling its\ninitializer directly.  If you want to start using a standard cloud API, you should be able to\nimport a package that vends that API as an actor interface, allowing you to completely\neliminate your code that slings around JSON blobs.\n\n### New APIs required\n\nThe majority of the hard part of getting this to work is on the framework side.  For example,\nit would be interesting to start building things like:\n\n- New APIs need to be built to start actors in interesting places: IPC contexts, cloud\n  providers, etc.  
These APIs should be consistent with each other.\n- The underlying runtime needs to be built, which handles the serialization, handshaking,\n  distributed reference counting of actors, etc.\n- To optimize IPC communications with shared memory (mmaps), introduce a new protocol\n  that refines `ValueSemantical`.  Heavy weight types can then opt into using it where it\n  makes sense.\n- A DSL that describes cloud APIs should be built (or an existing one adopted) to\n  autogenerate the boilerplate necessary to vend an actor API for a cloud service.\n\nIn any case, there is a bunch of work to do here, and it will take multiple years to prototype,\nbuild, iterate, and perfect it.  It will be a beautiful day when we get here though.\n\n## Part 5: The crazy and brilliant future\n\nLooking even farther down the road, there are even more opportunities to eliminate\naccidental complexity by removing arbitrary differences in our language, tools, and APIs.\nYou can find these by looking for places with asynchronous communications patterns,\nmessage sending and event-driven models, and places where shared mutable state doesn't\nwork well.\n\nFor example, GPU compute and DSP accelerators share all of these characteristics: the\nCPU talks to the GPU through asynchronous commands (e.g. sent over DMA requests and\ninterrupts).  It could make sense to use a subset of Swift code (with new APIs for GPU\nspecific operations like texture fetches) for GPU compute tasks.\n\nAnother place to look is event-driven applications like interrupt handlers in embedded\nsystems, or asynchronous signals in Unix.  If a Swift script wants to sign up for notifications\nabout `SIGWINCH`, for example, it should be easy to do this by registering your actor and\nimplementing the right method.\n\nGoing further, a model like this begs for re-evaluation of some long-held debates in the software\ncommunity, such as the divide between microkernels and monolithic kernels.  
Microkernels\nare generally considered to be academically better (e.g. due to memory isolation of different\npieces, independent development of drivers from the kernel core, etc), but monolithic kernels\ntend to be more pragmatic (e.g. more efficient).  The proposed model allows some really\ninteresting hybrid approaches, and allows subsystems to be moved \"in process\" of the main\nkernel when efficiency is needed, or pushed \"out of process\" when they are untrusted or\nwhen reliability is paramount, all without rewriting tons of code to achieve it.  Swift's focus on\nstable APIs and API resilience also encourages and enables a split between the core kernel\nand driver development.\n\nIn any case, there is a lot of opportunity to make the software world better, but it is also a\nlong path to carefully design and build each piece in a deliberate and intentional way.  Let's\ntake one step at a time, ensuring that each is as good as we can make it.\n\n## Learning from other concurrency designs\n\nWhen designing a concurrency system for Swift, we should look at the designs of other\nlanguages to learn from them and ensure we have the best possible system.  There are\nthousands of different programming languages, but most have very small communities, which\nmakes it hard to draw practical lessons from those communities.  Here we look at a few\ndifferent systems, focusing on how their concurrency design works, ignoring syntactic and\nother unrelated aspects of their design.\n\n### Pony\n\nPerhaps the most relevant active research language is the [Pony programming\nlanguage](https://www.ponylang.org).  It is actor-based and uses them along with other techniques\nto provide a type-safe, memory-safe, deadlock-free, and datarace-free programming model.\nThe biggest\nsemantic difference between the Pony design and the Swift design is that Pony invests a\nlot of design complexity into providing reference capabilities, which impose a high\nlearning curve.  
In contrast, the model proposed here builds on Swift's mature system of\nvalue semantics.  If transferring object graphs between actors (in a guaranteed memory safe\nway) becomes important in the future, we can investigate expanding the [Swift Ownership\nModel](https://github.com/apple/swift/blob/master/docs/OwnershipManifesto.md) to\ncover more of these use-cases.\n\n\n### Akka Actors in Scala\n\n[Akka](http://akka.io) is a framework written in the [Scala programming\nlanguage](https://www.scala-lang.org), whose mission is to \"Build powerful reactive,\nconcurrent, and distributed applications more easily\".  The key to this is their well developed\n[Akka actor system](http://doc.akka.io/docs/akka/current/scala/actors.html), which is the\nprincipal abstraction that developers use to realize these goals (and it, in turn, was heavily\ninfluenced by [Erlang](https://www.erlang.org)).  One of the great things about\nAkka is that it is mature and widely used by a lot of different organizations and people.  This\nmeans we can learn from its design, from the design patterns the community has explored,\nand from experience reports describing how well it works in practice.\n\nThe Akka design shares a lot of similarities with the design proposed here, because it is an\nimplementation of the same actor model.  It is built on futures, asynchronous message sends,\neach actor is a unit of concurrency, there are well-known patterns for when and how actors\nshould communicate, and Akka supports easy distributed computation (which they call\n\"location transparency\").\n\nOne difference between Akka and the model described here is that Akka is a library feature,\nnot a language feature.  This means that it can't provide additional type system and safety\nfeatures that the model we describe does.  
For example, it is possible to accidentally [share\nmutable state](https://manuel.bernhardt.io/2016/08/02/akka-anti-patterns-shared-mutable-state/)\nwhich leads to bugs and erosion of the model.  Their message loops are also manually written\nloops with pattern matching, instead of being automatically dispatched to `actor` methods -\nthis leads to somewhat more boilerplate.  Akka actor messages are untyped (marshalled\nthrough an Any), which can lead to surprising bugs and difficulty reasoning about what the\nAPI of an actor is (though the [Akka\nTyped](http://doc.akka.io/docs/akka/2.5.3/scala/typed.html) research project is exploring\nways to fix this).  Beyond that though, the two models are very comparable - and, no, this\nis not an accident.\n\nKeeping these differences in mind, we can learn a lot about how well the model works in\npractice, by reading the numerous blog posts and other documents available online,\nincluding, for example:\n- Lots of [Tutorials](http://danielwestheide.com/blog/2013/02/27/the-neophytes-guide-to-scala-part-14-the-actor-approach-to-concurrency.html)\n- [Best practices and design patterns](https://www.safaribooksonline.com/library/view/applied-akka-patterns/9781491934876/ch04.html)\n- Descriptions of the ease and benefits of [sharding servers written in Akka](http://michalplachta.com/2016/01/23/scalability-using-sharding-from-akka-cluster/)\n- Success reports from lots of folks.\n\nFurther, it is likely that some members of the Swift community have encountered this\nmodel; it would be great if they shared their experiences, both positive and negative.\n\n### Go\n\nThe [Go programming language](https://golang.org) supports a first-class approach to\nwriting concurrent programs based on goroutines and (bidirectional) channels. This model\nhas been very popular in the Go community and directly reflects many of the core values of\nthe Go language, including simplicity and preference for programming with low levels of\nabstraction.  
I have no evidence that this is the case, but I speculate that this model was\ninfluenced by the domains that Go thrives in: the Go model of channels and communicating\nindependent goroutines almost directly reflects how servers communicate over network\nconnections (including core operations like `select`).\n\nThe proposed Swift design is higher abstraction than the Go model, but directly reflects one\nof the most common patterns seen in Go: a goroutine whose body is an infinite loop over a\nchannel, decoding messages sent on the channel and acting on them.  Perhaps the simplest\nexample is this Go code (adapted from [this blog\npost](https://www.golang-book.com/books/intro/10)):\n\n```go\nfunc printer(c chan string) {\n  for {\n    msg := <- c\n    fmt.Println(msg)\n  }\n}\n```\n\n... is basically analogous to this proposed Swift code:\n\n```swift\nactor Printer {\n  actor func print(message: String) {\n    // Qualified so that the global function is called, not this method.\n    Swift.print(message)\n  }\n}\n```\n\nThe Swift design is more declarative than the Go code, but doesn't show many advantages\nor disadvantages in something this small.  However, with more realistic examples, the\nadvantages of the higher-level declarative approach become clearer.  For example,\nit is common for goroutines to listen on multiple channels, one for each message they\nrespond to.  
This example (borrowed from [this blog\npost](http://marcio.io/2015/07/handling-1-million-requests-per-minute-with-golang/)) is fairly\ntypical:\n\n```go\n// Worker represents the worker that executes the job\ntype Worker struct {\n  WorkerPool  chan chan Job\n  JobChannel  chan Job\n  quit        chan bool\n}\n\nfunc NewWorker(workerPool chan chan Job) Worker {\n  return Worker{\n    WorkerPool: workerPool,\n    JobChannel: make(chan Job),\n    quit:       make(chan bool)}\n}\n\nfunc (w Worker) Start() {\n  go func() {\n    for {\n      select {\n      case job := <-w.JobChannel:\n        // ...\n      case <-w.quit:\n        // ...\n      }\n    }\n  }()\n}\n\n// Stop signals the worker to stop listening for work requests.\nfunc (w Worker) Stop() {\n  go func() {\n    w.quit <- true\n  }()\n}\n```\n\nThis sort of thing is much more naturally expressed in our proposed model:\n\n```swift\nactor Worker {\n  actor func do(job: Job) {\n    // ...\n  }\n\n  actor func stop() {\n    // ...\n  }\n}\n```\n\nThat said, there are advantages and other tradeoffs to the Go model as well.  Go builds on\n[CSP](https://en.wikipedia.org/wiki/Communicating_sequential_processes), which allows\nmore ad hoc structures of communication.  For example, because\ngoroutines can listen to multiple channels it is occasionally easier to set up some (advanced)\ncommunication patterns.  Synchronous messages to a channel can only be completely sent\nif there is something listening and waiting for them, which can lead to performance\nadvantages (and some disadvantages).  Go doesn't\nattempt to provide any sort of memory safety or data isolation, so goroutines have the\nusual assortment of mutexes and other APIs to use, and are subject to standard bugs like\ndeadlocks and [data races](http://accelazh.github.io/go/Goroutine-Can-Race).  
Races can\neven break [memory safety](https://research.swtch.com/gorace).\n\nI think that the most important thing the Swift community can learn from Go's concurrency\nmodel is the huge benefit that comes from a highly scalable runtime model.  It is common to\nhave hundreds of thousands or even a million goroutines running around in a server.  The\nability to stop worrying about \"running out of threads\" is huge, and is one of the key decisions\nthat contributed to the rise of Go in the cloud.\n\nThe other lesson is that (while it is important to have a \"best default\" solution to reach for in\nthe world of concurrency) we shouldn't overly restrict the patterns that developers are allowed\nto express.  This is a key reason why the async/await design is independent of futures or any\nother abstraction.  A channel library in Swift will be as efficient as the one in Go, and if shared\nmutable state and channels are the best solution to some specific problem, then we should\nembrace that fact, not hide from it.  That said, I expect these cases to be very rare :-)\n\n### Rust\n\nRust's approach to concurrency builds on the strengths of its ownership system to allow\nlibrary-based concurrency patterns to be built on top.  Rust supports message passing\n(through channels), but also supports locks and other typical abstractions for shared mutable\nstate.  
Rust's approaches are well suited for systems programmers, who are the primary\ntarget audience of Rust.\n\nOn the positive side, the Rust design provides a lot of flexibility, a wide range of different\nconcurrency primitives to choose from, and familiar abstractions for C++ programmers.\n\nOn the downside, their ownership model has a higher learning curve than the design\ndescribed here, their abstractions are typically very low level (great for systems programmers,\nbut not as helpful for higher levels), and they don't provide much guidance for programmers\nabout which abstractions to choose, how to structure an application, etc.  Rust also doesn't\nprovide an obvious model to scale into distributed applications.\n\nThat said, improving synchronization for Swift systems programmers will be a goal once the\nbasics of the [Swift Ownership\nModel](https://github.com/apple/swift/blob/master/docs/OwnershipManifesto.md) come\ntogether.  When that happens, it makes sense to take another look at the Rust abstractions\nto see which would make sense to bring over to Swift.\n", "url": "https://wpnews.pro/news/swift-concurrency-manifesto", "canonical_source": "https://gist.github.com/lattner/31ed37682ef1576b16bca1432ea9f782", "published_at": "2017-08-11 01:35:38+00:00", "updated_at": "2026-05-23 07:36:09.709834+00:00", "lang": "en", "topics": ["developer-tools", "open-source"], "entities": ["Chris Lattner", "Jason Yu", "Swift"], "alternates": {"html": "https://wpnews.pro/news/swift-concurrency-manifesto", "markdown": "https://wpnews.pro/news/swift-concurrency-manifesto.md", "text": "https://wpnews.pro/news/swift-concurrency-manifesto.txt", "jsonld": "https://wpnews.pro/news/swift-concurrency-manifesto.jsonld"}}