diff --git a/apps/labs/posts/python-abi-abi3t.md b/apps/labs/posts/python-abi-abi3t.md new file mode 100644 index 000000000..5f8c86330 --- /dev/null +++ b/apps/labs/posts/python-abi-abi3t.md @@ -0,0 +1,592 @@ +--- +title: 'What Every Python Developer Should Know About the CPython ABI' +authors: [nathan-goldbaum] +published: June 25, 2026 +description: 'An introduction to the concept of the Application Binary Interface (ABI), the various CPython ABIs, and the new abi3t stable ABI in Python 3.15.' +category: [PyData ecosystem] +featuredImage: + src: /posts/python-abi-abi3t/cpython_api_layers_listing.png + alt: 'Five nested ellipses illustrating the layering of the Python C API. The outermost ellipse is gray and labeled "Internal API". The next enclosed ellipse is red and is labeled "Private API". The next enclosed ellipse is yellow and is labeled "Unstable API". The next enclosed ellipse is blue and labeled "Version-specific API". The next enclosed ellipse is green and is labeled "Limited API".' +hero: + imageSrc: /posts/python-abi-abi3t/cpython_api_layers_hero.png + imageAlt: 'Five nested ellipses illustrating the layering of the Python C API. The outermost ellipse is gray and labeled "Internal API". The next enclosed ellipse is red and is labeled "Private API". The next enclosed ellipse is yellow and is labeled "Unstable API". The next enclosed ellipse is blue and labeled "Version-specific API". The next enclosed ellipse is green and is labeled "Limited API".' +--- + +# What Every Python Developer Should Know About the CPython ABI + +The CPython Application Binary Interface (ABI) backs Python's main superpower: the ability to easily call into native C, C++, Rust, or Fortran code and for that code to call back into the interpreter and update the state of Python objects. + +What exactly is an ABI? How does it differ from an Application Programming Interface (API)? +Why is this post titled "What _Every_ Python Developer Should Know About +the CPython ABI"? 
+Aren't these low-level details the kind of thing we can ignore most of the time in a high-level language like Python? + +In this post I hope to answer all these questions and build up your intuition about these topics. +I also hope you'll learn some useful information about how Python projects that include native extensions are distributed, what the ABI compatibility tags that show up in wheel filenames mean, and how projects can choose to target different Python ABIs depending on the tradeoffs they want to make. +By the end, you should be able to look at any wheel filename and know which Python interpreters it will install on — and why. + +## The CPython C API and the Python ABI + +The Python interpreter does a bit of a magic trick when you execute a script +like in the cartoon below. The `np.array` function can be called by Python +code, but it turns out this function is _implemented_ [_in +C_](https://github.com/numpy/numpy/blob/f8c34f2927ba812a3efe9bc978d84aa47f27bff7/numpy/_core/src/multiarray/multiarraymodule.c#L1720). The CPython interpreter handles this transparently. + +
+ A cartoon showing a Python script with NumPy code calling into a cloud representing the CPython interpreter runtime which in turn calls into a NumPy C extension. +
+ +How does it achieve this trick? +Via the [CPython C API](https://docs.python.org/3/c-api/index.html) and the corresponding [Python ABI](https://docs.python.org/3/c-api/stable.html#c-api-stability). +Of course that doesn't really explain what those terms mean. +It certainly doesn't help that these two related, intertwined concepts are named with such similar-sounding acronyms. +I will attempt to clarify that in the rest of this post. + +### The CPython C API + +[CPython](https://github.com/python/cpython), as the name suggests, is implemented in the C programming language. +It began as a research project and was written to be easy to extend with new functionality. +To access the internals of the interpreter, you need only `#include "Python.h"` in a C program, which gives rich access to the internal state of the interpreter and hooks to call into the interpreter via the CPython C API, the very same C API that the core interpreter implementation is based on. + +What is the C API exactly? +It's the set of C macros, typedefs, functions, and structs exposed by the header that defines the C interface: `Python.h`. +It's possible to write C code that replicates the behavior of any Python script, at the cost of compilation, verbosity, and exposure to the pitfalls of the C programming language. +The reward is raw execution speed. + +It is often possible to achieve order-of-magnitude or even several order-of-magnitude speedups by translating Python code to a compiled language that can call into the C API. +The [Cython](https://cython.readthedocs.io/en/latest/) programming language takes advantage of this by (among other things) compiling Python code to C source that, once compiled and linked into a larger program, behaves like the Python it was generated from. 
+This translation isn't always perfect, but it is serviceable enough to back the implementations of popular libraries like [Pandas](https://pandas.pydata.org/), [scikit-image](https://scikit-image.org/), or [scikit-learn](https://scikit-learn.org/). + +### The distinction between a C API and an ABI + +What isn't the C API? +The C API is purely a construct of the C programming language. +Code written in languages that aren't C can call into a C API, but only by using the platform-specific and architecture-specific calling conventions used by all code that executes directly on the CPU, abstracting away functionality that is not expressible in C. + +The diagram below illustrates the distinction between the CPython C API and the Python ABI. + +
+ A two-panel hand-drawn diagram contrasting the API and the ABI. The left panel, in violet and marked with a small C-source-file icon, is titled "API — Application Programming Interface" and lists three things the API covers, each with a code example: function signatures (PyObject *PyDict_GetItemRef(…)); macros, typedefs, and inline functions (Py_INCREF(op)); and the header you compile against (#include "Python.h"). The right panel, in blue and marked with a small computer-chip icon, is titled "ABI — Application Binary Interface" and lists three things the ABI covers: exported symbol names (PyDict_GetItemRef); struct sizes and field offsets, illustrated by a memory-layout diagram of PyObject as two 8-byte fields — ob_refcnt at byte offset 0 and ob_type at offset 8, 16 bytes total; and platform-specific details, illustrated by a matrix of operating systems (rows: Windows, macOS, and Linux) against CPU architectures (columns: x86-64, arm64, and ppc64le), with a green dot marking each supported build — x86-64 and arm64 for all three operating systems, and ppc64le for Linux only. +
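The struct layout in the right-hand panel can be observed from Python itself. The following is a minimal sketch, not part of the original post, that assumes CPython (where `id()` returns an object's memory address) and the GIL-enabled `PyObject` layout shown in the diagram, with `ob_refcnt` first and `ob_type` immediately after it. The helper name `read_ob_type_address` is hypothetical.

```python
import ctypes
import sysconfig


def read_ob_type_address(obj):
    """Return the address stored in obj's ob_type field, or None.

    Assumes CPython, where id() is the object's memory address, and the
    GIL-enabled PyObject layout from the diagram: ob_refcnt, then ob_type.
    """
    if sysconfig.get_config_var("Py_GIL_DISABLED"):
        # Free-threaded builds lay PyObject out differently, so the
        # offset computed below would not point at ob_type there.
        return None
    ob_type_offset = ctypes.sizeof(ctypes.c_ssize_t)  # size of ob_refcnt
    return ctypes.c_void_p.from_address(id(obj) + ob_type_offset).value


value = 3.5
address = read_ob_type_address(value)
if address is not None:
    # On a GIL-enabled build the pointer should be the float type object.
    print(address == id(float))
```

Reading raw memory like this is only an illustration of why field offsets are part of the ABI: compiled extensions bake offsets like this one into their machine code, so the layout cannot change without breaking them.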
+ +A C API includes all of the details that are expressed in a C header, including full function signatures, macros, typedefs, and inline functions. It also includes the header itself. + +A C ABI includes all of the _symbols_ (the variables and function declarations) exposed in a header like `Python.h`. +A symbol in the ABI corresponding to a C API function holds the name of the function, the number of arguments, the types of the arguments, and the type of the value returned by the function, if any. +Many variables exposed in C headers are C structs, so the layout of these structs is also part of the corresponding ABI. +The layout of the struct is the order and types of all of the members of the struct. + +Notably, a C ABI does _not_ include some things that _are_ in the C API. +This includes all items that are particular to the conventions of the C language or the C preprocessor like [macros](), [typedefs](https://en.wikipedia.org/wiki/Typedef), and [inline functions](). +The Python ABI also doesn't depend on the names of the function arguments: all it cares about is how to store and lay out the instances of different types exposed in a C API in memory. + +This can be a little confusing when working with the CPython C API and thinking about the difference between the ABI and API because many names in the C API are implemented as macros but logically behave as functions. +Because they're implemented as macros, these items do not appear in the Python ABI. +This means that programming languages that cannot compile C syntax, like Rust, cannot use any C API items that are defined as C macros or inline functions. +Instead, Rust extensions rely on Rust re-implementations of macros and static inline functions exposed by the C API based on items that _are_ in the Python ABI. +C++ _is_ compatible with the C preprocessor, so you can use typedefs, macros, and inline functions exposed in the CPython C API in C++ extensions. + +A C ABI is also fundamentally platform-specific.
Below, I refer to parts of the ABI that are specific only to CPython's ABI as "the Python ABI", but always keep in mind that there are distinct ABIs on different platforms. + +These details about the C ABI in particular are very important in practice for C++ and Rust. +Rust does not have any ABI stability guarantees: the layout and ordering of struct members are not guaranteed by the compiler except if a struct is explicitly set to follow C ABI conventions using [`#[repr(C)]`](https://doc.rust-lang.org/nomicon/other-reprs.html#reprc). +C++ does allow defining C++ APIs with stable ABIs but doing so is [much more complex](https://community.kde.org/Policies/Binary_Compatibility_Issues_With_C++) due to details around ABI stability in various C++ standard library implementations as well as complexities like [name mangling](https://en.wikipedia.org/wiki/Name_mangling). +In practice, the C ABI is the low-level lingua franca of the modern computing environment, used by most popular programming languages to enable interoperability at the binary level. + +## The Python ABI + +When talking about a concrete Python extension module that targets a particular CPU and OS, it helps to conceptually organize the ABI definition into two layers: the _platform ABI_ (platform-specific details) and the _Python ABI_ (Python-specific details), which I'll discuss in turn. + +### What is a platform ABI? + +The platform ABI governs details of how exactly machine code executes on each architecture: how a compiler translates calling a C function like [`PyDict_GetItemRef`](https://docs.python.org/3/c-api/dict.html#c.PyDict_GetItemRef) into a concrete set of machine code that sets CPU registers and other platform-specific details per the specification of whatever platform the code ultimately runs on. + +A platform ABI determines things like the exact way machine code needs to pass arguments to a function. 
For example, on the [`x86_64`](https://wiki.osdev.org/System_V_ABI#x86-64) architecture under the System V ABI, only the first six non-floating-point arguments of a function are passed via registers; the remaining arguments are passed via the stack. +Normally, developers do not need to worry about details like this: compilers automatically generate machine code appropriate for whatever platform ABI a developer wants to target. +However, it _is_ important to know that different operating systems and CPU architectures have unique ABIs and that code compiled for one platform ABI is completely unusable on another platform ABI. +This fact determines much of the design of the [binary wheel distribution format](https://packaging.python.org/en/latest/specifications/binary-distribution-format/), which we'll discuss in more detail below. + +The biggest consequence is that each platform and CPU architecture, each with its own distinct platform ABI, requires its own unique builds. +This is one reason projects like `NumPy` distribute so many binary wheels with each release: projects need to build binaries for each platform ABI they want to support. + +I am glossing over details a little bit here: things like the system libc library implementation (musl vs glibc on Linux) or the compiler implementation (MSVC vs Cygwin on Windows) can also impact the ABI. +In many contexts, this complexity is parameterized with a so-called ["target triple"](https://mcyoung.xyz/2025/04/14/target-triples/) like `aarch64-apple-darwin`. +Each distinct target triple corresponds to a distinct set of ABI conventions that a compiler toolchain must support to generate valid binaries. + +### ABI tags and wheel filenames: understanding wheel compatibility + +Most popular Python packages ship binaries to the [Python Package Index](https://pypi.python.org) (PyPI) in the form of binary wheels. A wheel is a zip-compressed folder containing Python code and, optionally, compiled artifacts. 
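As a rough illustration of the platform-specific details discussed in this section, the standard library can report which platform a given interpreter was built for. This sketch is not from the original post; the helper name `describe_interpreter` is hypothetical, and the values it prints differ from machine to machine.

```python
import platform
import sys
import sysconfig


def describe_interpreter():
    """Collect the platform details that determine binary compatibility."""
    return {
        # Which Python implementation is running, e.g. "cpython".
        "implementation": sys.implementation.name,
        # The minor version, which matters for version-specific builds.
        "version": f"{sys.version_info.major}.{sys.version_info.minor}",
        # Operating system and architecture as seen by the build system.
        "platform": sysconfig.get_platform(),
        # The CPU architecture reported by the operating system.
        "machine": platform.machine(),
        # The filename suffix this interpreter expects on extension modules.
        "ext_suffix": sysconfig.get_config_var("EXT_SUFFIX"),
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

The `ext_suffix` entry is worth a look: on CPython it typically encodes the interpreter version and platform directly in the filename of every compiled extension module.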
+ +Other posts go into this in a lot more detail, but here we're particularly concerned with the filename of a wheel file. Let's consider what happens when I `pip install cryptography` in my development environment running on an ARM Mac laptop: + +```bash +$ pip install cryptography +Collecting cryptography + Downloading cryptography-48.0.0-cp311-abi3-macosx_10_9_universal2.whl.metadata (4.3 kB) +Collecting cffi>=2.0.0 (from cryptography) + Downloading cffi-2.0.0-cp314-cp314-macosx_11_0_arm64.whl.metadata (2.6 kB) +Collecting pycparser (from cffi>=2.0.0->cryptography) + Downloading pycparser-3.0-py3-none-any.whl.metadata (8.2 kB) +Downloading cryptography-48.0.0-cp311-abi3-macosx_10_9_universal2.whl (8.0 MB) + ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.0/8.0 MB 11.6 MB/s 0:00:00 +Downloading cffi-2.0.0-cp314-cp314-macosx_11_0_arm64.whl (181 kB) +Downloading pycparser-3.0-py3-none-any.whl (48 kB) +Installing collected packages: pycparser, cffi, cryptography +Successfully installed cffi-2.0.0 cryptography-48.0.0 pycparser-3.0 +``` + +Let's pay particular attention to the filenames of the wheel files, because most of the information in a wheel filename is related to ABI compatibility. + +
+ A diagram illustrating the structure of a wheel filename. There are six pieces of information in each wheel filename: the name of the distribution, the distribution's version, the Python version tag, the ABI tag, the platform tag, and the wheel filename extension.
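The filename structure in the diagram can be split apart with a few lines of Python. This is a simplified sketch rather than a full implementation of the wheel specification: the `WheelName` type and `parse_wheel_filename` helper are hypothetical names introduced here, and the optional build tag that the specification allows between the version and the Python tag is accepted but discarded. For real tooling, the `packaging` library's `packaging.utils.parse_wheel_filename` is the more robust choice.

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # Five components normally, six when an optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of components: {filename!r}")
    distribution, version = parts[0], parts[1]
    # The three compatibility tags are always the last three components.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return WheelName(distribution, version, python_tag, abi_tag, platform_tag)


for name in [
    "cryptography-48.0.0-cp311-abi3-macosx_10_9_universal2.whl",
    "cffi-2.0.0-cp314-cp314-macosx_11_0_arm64.whl",
    "pycparser-3.0-py3-none-any.whl",
]:
    wheel = parse_wheel_filename(name)
    print(wheel.distribution, wheel.python_tag, wheel.abi_tag, wheel.platform_tag)
```

Run against the three wheels from the `pip` output above, this separates out the `cp311`/`abi3`, `cp314`/`cp314`, and `py3`/`none` tag pairs.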
+ +The third, fourth, and fifth pieces, the [compatibility tags](https://packaging.python.org/en/latest/specifications/platform-compatibility-tags/), are the ones that matter for ABI compatibility. The "python" tag indicates either the exact version supported by the wheel or the minimum supported version. +Whether the version is a minimum or an exact version depends on the ABI tag that follows it. +The `py3` tag used by `pycparser` indicates that this is usable by _any_ Python 3 interpreter running _any_ Python version, not necessarily just CPython. +The `cp311` and `cp314` tags indicate that the `cryptography` and `cffi` wheels are only usable with CPython. Other Python implementations will need a different wheel. + +The ABI tag indicates the Python ABI the wheel file targets. +The `none` ABI tag used by `pycparser` indicates that the wheel doesn't target any particular native ABI: the wheel includes only pure-Python code and binary compatibility doesn't need to be considered. +The `cp314` tag used by `cffi` indicates that the wheel supports Python 3.14 exactly and no other minor Python version. +Finally, the `abi3` tag used by `cryptography` indicates that this wheel targets the [Python Stable ABI subset](https://docs.python.org/3/c-api/stable.html#stable-application-binary-interface), and is forward-compatible with all future Python 3 versions that support the `abi3` ABI. + +It's worth emphasizing that `abi3` wheels are _not_ installable on [the free-threaded build](https://py-free-threading.github.io) of CPython. We'll explain more below what the `cp3XY` and `abi3` tags mean and see why the free-threaded build threw a monkey wrench into this scheme. + +## The Layers of the CPython C API + +In the early days, there wasn't much distinction between "the interpreter" and "what's exposed by the C API": you simply were able to monkey around at will with internal state in the interpreter. 
+While this enabled lots of cool stuff, it also had a cost: extensions regularly broke with different Python releases, requiring careful fixes to adapt to changes in interpreter internals. +This also introduced a design pressure to freeze internals, lest people building on Python complain about changes breaking their code. + +These days there is a much clearer distinction between CPython internals and the publicly exposed API, and all signs point to the trend of isolating interpreter internals continuing into the future. + +This is managed by breaking up the "full" C API surface used by the interpreter into five different layers, illustrated in the diagram below. + +
+ Five nested ellipses illustrating the layering of the Python C API. The outermost ellipse is gray and labeled "Internal API, exposed only if `Py_BUILD_CORE` is defined". The next enclosed ellipse is red and outlined with a dashed line defined in the legend to mean "Usable with `#include "Python.h"`". The red ellipse is labeled "Private API `_Py*` prefix". The next enclosed ellipse is yellow and is labeled "Unstable API `PyUnstable_*` prefix". The next enclosed ellipse is blue and labeled "Version-specific API". The next enclosed ellipse is green with a dark shaded outline the legend defines to mean "Usable if `Py_LIMITED_API` is defined" and is labeled "Limited API". +
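The naming conventions called out in the diagram (`_Py*` for private names, `PyUnstable_*` for unstable ones) are simple enough to check mechanically. The helper below is a hypothetical sketch based only on those prefixes. A name alone cannot reveal whether an item is internal, version-specific, or part of the limited API, so everything else with a `Py` prefix is lumped together as "public".

```python
def classify_c_api_name(name):
    """Guess a C API layer from a name's prefix alone (a rough heuristic)."""
    if name.startswith("PyUnstable_"):
        return "unstable"
    if name.startswith(("_Py", "_PY")):
        # Internal-only names often share this prefix, so "private" here
        # means "private or internal".
        return "private"
    if name.startswith(("Py", "PY")):
        # Could be version-specific or limited; the name does not say which.
        return "public"
    return "not a CPython C API name"


for name in ["PyDict_GetItemRef", "_PyDict_GetItem_KnownHash", "PyUnstable_Code_New"]:
    print(f"{name}: {classify_c_api_name(name)}")
```

A check like this can be handy when auditing an extension's source for names that carry weaker stability promises.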
+ +Below we describe each of these layers and explain how this separation enables many different use cases. + +### Internal C API + +The outermost layer is for people who work on the CPython implementation. +In order to use it, a C or C++ program must define the `Py_BUILD_CORE` compiler macro, which is equivalent to declaring that your program is a part of the CPython interpreter itself. +As [the CPython developer guide](https://devguide.python.org/developer-workflow/c-api/#the-internal-api) indicates, the internal API is subject to change at any moment and should not be relied on. +Defining `Py_BUILD_CORE` and opting into using the internal C API is equivalent to saying you're willing to take on maintenance for code that can and will break at any time. +For 99.9% of people who are not CPython contributors, the internal API should not be used. + +### Private C API + +This layer is _private_ in the sense that its items shouldn't be used outside CPython, but "public" in the sense that they're available after a plain `#include "Python.h"`, with no special macros. +Names in the C API with a leading underscore are private. +To pick a random example: [`_PyDict_GetItem_KnownHash`](https://github.com/python/cpython/blob/2e5843e13fcfd768a435d82e6182af403844432c/Include/cpython/dictobject.h#L45-L46) allows users to bypass the normal Python `dict` interface to directly access a dict entry with an already-computed hash. +This optimization is useful sometimes in the interpreter, so it's defined. +It's not a documented primitive, so there are no guarantees about it being available in a future version of the C API. +It _is_ part of the Python ABI, though, so it inherits the ABI's stability guarantees even though it carries no API guarantees. +Sometimes symbols are exposed like this for technical reasons, sometimes for historical reasons, and sometimes because a CPython user requested the ability to do something and was OK with the lack of API stability. 
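One way to see the difference between "in the API" and "in the ABI" for names like these is to ask the running interpreter which symbols it actually exports. The sketch below uses `ctypes.pythonapi` for that. The helper `is_exported` is a hypothetical name, and whether a private symbol such as `_PyDict_GetItem_KnownHash` is present depends on the CPython version you run it on, so no particular result is promised for it.

```python
import ctypes


def is_exported(symbol_name):
    """Return True if the running CPython exports `symbol_name`."""
    # Looking up a missing symbol on ctypes.pythonapi raises
    # AttributeError, which hasattr() turns into False.
    return hasattr(ctypes.pythonapi, symbol_name)


# Py_IncRef is a real exported function, so it is part of the ABI.
print("Py_IncRef:", is_exported("Py_IncRef"))
# Py_INCREF lives only in the headers (macro / inline function), so no
# symbol with that name exists in the binary.
print("Py_INCREF:", is_exported("Py_INCREF"))
# A private name: present or absent depending on the CPython version.
print("_PyDict_GetItem_KnownHash:", is_exported("_PyDict_GetItem_KnownHash"))
```

The first two lines show the macro-versus-function distinction from earlier in the post; the third shows that a private, undocumented name can still be a perfectly ordinary exported symbol.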
+ +### Unstable C API + +The next innermost layer encloses all of the documented functionality in the CPython C API. +The outermost documented layer is [the unstable C API](https://docs.python.org/3/c-api/stable.html#unstable-c-api). +There is an extra layer here because not everything that's documented is stable. +If a serious defect is discovered in an unstable C API item it may be radically changed or removed entirely in a CPython minor release. + +These items serve a real purpose, but the CPython developers are not ready to commit to long-term support. +Usually this is because unstable API items rely on or expose interpreter implementation details and it's not yet clear if the resulting behavior should be enshrined in the "official" C API. +The unstable API is most useful for projects or organizations who can follow development of the CPython C API and are OK with occasional breakage when warranted. + +### The Version-Specific C API + +The next innermost layer is the Python version-specific C API and includes all symbols that follow the CPython C API [stability policy](https://docs.python.org/3/c-api/stable.html#c-api-stability), as first laid out in [PEP 387](https://peps.python.org/pep-0387/) back in 2009. +While public symbols in the CPython C API can be removed following a deprecation period or to fix a serious defect, this only happens after a period of public discussion and consideration of the community impact. Critically, CPython has a policy not to add or remove items in the version-specific API from the first beta release of a Python minor version onward. +For example, this year Python 3.15.0b1 came out in May and that release froze the Python 3.15 version-specific C API. +All subsequent betas, release candidates, and official releases share the same set of API items, including function signatures and struct layouts, with no changes allowed until the next Python minor release. +This also means the ABI is frozen. 
+This allows projects to release binaries targeting a specific Python version and be assured they will continue to work for future bugfix releases of that version. + +### Limited C API + +Finally, the most restricted innermost layer corresponds, fittingly, to the "Limited" C API, a deliberately curated subset of the full CPython C API. +The limited C API corresponds to a subset of the full Python ABI: the stable Python ABI. +The CPython project [commits](https://docs.python.org/3/c-api/stable.html#stable-abi) to keeping items in the stable ABI as part of the Python ABI forever. +C or C++ extensions can opt into compiling for this mode by defining the `Py_LIMITED_API` macro to a hex constant that corresponds to a particular CPython version. + +A somewhat confusing point that is often glossed over is that the limited C API _also_ changes from Python version to Python version. +You can build extensions that target _either one_ of, say, the Python 3.13 version-specific API or the limited API. +It's also possible to target limited API versions older than the Python version used to build an extension. +An extension or wheel build targeting, say, the Python 3.10 limited API can be imported on Python 3.10 and _any newer Python release_. + +Items in the limited API can be deprecated and removed, but the corresponding symbol in the ABI must remain available forever to ensure binary compatibility. +That means that the limited API is not necessarily forward compatible: source code written against the limited API for Python 3.9 may not compile against the limited API for Python 3.12. +However, a binary wheel targeting the Python 3.9 limited API built from the same project will be importable on Python 3.12, thanks to the stable ABI. + +There is a big "but" in the above claim that targeting the limited API allows supporting _any_ Python version newer than Python 3.10. 
+It turns out that this guarantee depends critically on structs that have long been exposed in the CPython ABI and how free-threaded Python wants to change the layout of these structs. + +## CPython C ABI stability and binary wheel builds + +There are two levels of ABI stability provided by CPython: + +- The ABI is frozen for all releases within a Python minor version, say from Python 3.13.0 to the last Python 3.13.x bugfix release. +- ABI items that are in the limited C API will never be removed, even if a future version of the limited API removes an API item. + +Of course "never" is a little strong. If a particularly bad design flaw is discovered, an item can be removed from the limited API, and the _implementation_ behind its ABI entry replaced with one that raises at runtime. +The _symbol_ itself stays in the ABI, though, so an extension that uses the removed function still compiles, links, and imports successfully. It only risks a runtime error later, on a newer Python, if it actually _calls_ the removed function. + +These two guarantees map onto two of the API layers from the previous section: the version-specific API backs the per-release `cp3XY` ABI, and the limited API's stable subset backs the forward-compatible `abi3` ABI. Each enables a different kind of build for distributors of extension modules: + +### The version-specific ABI: `cp3XY` + +The guarantee that the Python version-specific C API does not change within a Python minor release series corresponds to a stability policy for the CPython version-specific ABI. +In practice, it means that extension modules and binary wheels built using Python 3.15.0b1 can be imported using any subsequent version of Python 3.15, even years into the future. + +Projects shipping binary wheels using the version-specific ABI have ABI compatibility tags in the wheel filename like `cp314-cp314`. 
If a wheel has this particular tag, that indicates the wheel has binaries that are built using the GIL-enabled version-specific ABI for Python 3.14. +We'll cover shortly why I had to add "GIL-enabled" there and why that's important. + +Because the version-specific ABI is specific to a particular minor release of CPython, a project must explicitly add support for new Python releases every year. +Some projects, like NumPy, target this ABI because they have the contributor resources to manage that level of coordination, with an annual deadline corresponding to the release of CPython in the fall. +Many projects do not have the resources to track CPython like that and instead tend to fall behind, adding support for new CPython versions months or even years after the CPython release happened. + +### The Stable ABI: `abi3` + +The guarantee that items in the limited C API will never go away enables a different kind of _forward-compatible_ ABI. +You can build binaries using the limited API as it was defined in, say, Python 3.10. +Because of the stability guarantees provided by the stable subset of the CPython ABI, you can rely on future Python versions being able to import your binary: all the symbols it needs from the CPython ABI should still be there. + +There is one major problem with this scheme: `abi3`, as originally defined, shares a detail with the version-specific ABI, namely that the layout of the `PyObject` struct is exposed. + +Until the advent of the free-threaded interpreter, projects had a choice of supporting a build matrix like the following: + +