The Bitcoin Core Kernel
It has been nearly two years since I started working on the Bitcoin Core kernel library - a software library encapsulating the Bitcoin Core validation logic, that is, all the logic required to determine if a given block extends the current best-work chain. The library purposely does not include any P2P networking, JSON RPC, REST or ZMQ servers, a wallet, additional database indexes, or a user interface. Started by Carl Dong, it has long been desired by contributors and users of Bitcoin Core. He originally introduced it as a de-spaghettification of Bitcoin Core’s validation logic into a more distilled, maintainable order that cleanly separates validation from non-validation. The design of its API was left for a later point in time. Building upon this original work, the project is now at a point where this separation is largely complete. It is now later.
In this post, I want to lay out my vision for its future. Articulating personal visions for a project of this scope is dangerous. It may generate undue influence and blur reality to resemble desired outcomes. Ideas should be able to stand on their own merit and not on the hypothesized results. Disagreement on even the details of these visions can lead to a hardening of dogmatic opinions and make us inflexible to new data and arguments. My ego is not immune to this. I had to concede bad ideas and abandon work that already cost me effort over the past two years, and it made me upset. At the same time, attracting people to your project requires you to share ideas and tell a narrative. Not talking about goals is equally exclusionary. It makes it harder for newcomers to enter and equally supports hidden agendas. People interacting with the project are often left guessing what its goals are.
It may well be that none of the features I talk about ever ship. They may equally prove overengineered, insufficiently useful (or, imo more importantly, fun) to other developers, or have dangerous side effects. The project has already been a success for Bitcoin Core. Developers profit from the increased clarity of splitting validation from non-validation code, and the code is more consistent than it was three years ago. Even if it were never to progress beyond the current stage, it has already provided significant benefits.
The project has so far concentrated on cleanly separating Bitcoin Core’s validation logic from its non-validation logic. However, there are still some bits left where functionality used by validation is somewhat duplicated in multiple locations. For example, some of the RPC code could be simplified if it consistently called logic through the same interface other components of the code used.
Wholesale exposing the bulk of the logic currently composing the library could be dangerous. It would bind users to Bitcoin Core’s internal development. If the code changes, it may break their use case or force frequent rebases. This could place more pressure on Bitcoin Core developers, making it even harder to evolve the code. In the case of a future soft fork, users of the library would have to carefully vet every data structure and function they depend on in case a change alters their logic. This would include low-level changes in Bitcoin Core’s code that might not be documented. If this vetting is skipped, continued usage could lead to a consensus failure. A stable and safe API for external users is required to solve this.
Many changes to the internal validation code, such as making serialization more performant, introducing new database formats, or further improving the structure of the code, would not need to change the API. Development on Bitcoin Core would remain flexible, while care for backward compatibility and versioning would only have to be taken if the API is changed. Ensuring backward compatibility of a C API is a long-established practice across thousands of software projects. For example, if a function requires different or more arguments after a soft fork, a newly tweaked version of the function would be added to the API.
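The additive pattern described above can be sketched in a few lines. This is only an illustration in Python, not the proposed C header: the function names, the flag, and the toy check are all hypothetical. libbitcoinconsensus evolved in a similar way when `bitcoinconsensus_verify_script_with_amount` was added next to the existing `bitcoinconsensus_verify_script`.

```python
from typing import Sequence

# Hypothetical flag marking rules that need the amounts of the spent outputs.
FLAG_NEEDS_AMOUNTS = 1 << 0


def _verify(script: bytes, flags: int, spent_amounts: Sequence[int]) -> bool:
    """Toy stand-in for the internal logic; this is NOT real script validation."""
    if flags & FLAG_NEEDS_AMOUNTS and len(spent_amounts) == 0:
        return False  # the newer rules cannot be checked without amounts
    return len(script) > 0


def verify_script(script: bytes, flags: int) -> bool:
    """Original entry point. Its signature never changes, so old callers keep working."""
    return _verify(script, flags, ())


def verify_script_with_amounts(
    script: bytes, flags: int, spent_amounts: Sequence[int]
) -> bool:
    """Added later for rules that need more inputs; the old function stays untouched."""
    return _verify(script, flags, spent_amounts)


old_caller_ok = verify_script(b"\x51", 0)
new_rules_via_old_api = verify_script(b"\x51", FLAG_NEEDS_AMOUNTS)
new_caller_ok = verify_script_with_amounts(b"\x51", FLAG_NEEDS_AMOUNTS, [5000])
```

The internals behind `_verify` are free to change, while callers only ever see the two public functions. A caller that needs the newer rules has to opt in by switching to the new function.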
Most recently I opened a pull request for an initial version of an API for the library. It provides a C header (src/kernel/bitcoinkernel.h) that external projects can use to access Bitcoin Core’s validation logic directly. Concretely, this means the C header defines functions that call the same code that executes when running a Bitcoin Core full node. The header not only allows users to validate blocks, but also to perform more low-level tasks, such as reading block data and validating scripts, and, in future iterations, validating headers and transactions. A more in-depth description of it is provided in the pull request, and I have a rust-bitcoinkernel crate that links to it.
There are downsides to shipping a C header rather than a regular C++ header as an interface. With the bulk of Bitcoin Core’s code being written in C++, a thin layer translating it to C is needed, which will take additional review and necessarily duplicate some code. The benefit of a C API is that most other programming languages have mature C bindings, but often more haphazard or no C++ bindings. C++ is also less stable as an interface: newer features evolve more rapidly and new standards are released every three years. Choosing a C++ API could place pressure on the project to adopt new language features and standards more slowly, so that the ecosystem has time to catch up and some backward compatibility is retained.
Another downside of having a stable external C API is that it is not elegantly usable directly by the existing internal code of Bitcoin Core. If it were, it would further complicate the code by adding yet another interface. Translating from C++ to C and back to C++ within the same codebase introduces needless indirection. Not having a direct user of the API in Bitcoin Core could increase the maintenance burden, since problems with it are not readily noticed, or not perceived as pressing. An example of this problem can be found in the now deprecated libbitcoinconsensus, a library for validating just scripts. It took nearly two years after the activation of taproot for support to land in its API. A solution to this could be writing some small utilities that use the API, like a block linearizer, a reindexing tool, or a replacement of the existing bitcoin-chainstate example program. Using your own product within your organisation in this way is commonly referred to as dogfooding. Tests against the API may also help. A more permanent solution could be moving any logic introduced by the API that goes beyond just translating C++ types to C into a separate interface that may then be reused by other parts of the code.
The C header could be used by external projects to re-implement a full node while still relying on Bitcoin Core’s validation code. This would greatly reduce the risk of consensus bugs that re-writing the validation code could otherwise introduce. Alternative Bitcoin full node implementations have run into consensus / script validation bugs in their validation code, e.g. btcd and libbitcoin (though the libbitcoin bug was unreleased). It should be noted that libbitcoin already optionally supports using the now deprecated libbitcoinconsensus (the projects inconveniently have confusingly similar names, a choice that predates me). Similarly floresta also uses libbitcoinconsensus for its script validation. Determining the amount of API flexibility required to make this a success, even for highly optimized and opinionated use cases, is difficult. For maximum efficiency users would be forced to re-use the same data structure types that the library uses. At a minimum, the library should leave “Bitcoin Core”-isms like assumevalid, assumeutxo, and transaction policy configurable to the user. Alternative node implementations could implement their own P2P, wallet, indexes, and UI functionality in any programming language they see fit.
The library may also be useful for data science projects or index builders like electrs that need to read block data, sparing them from re-implementing Bitcoin Core’s internal data representations. Bitcoin Core’s latest v28 release introduced XOR-ed block data storage. This broke many existing block data parsers, for example, the built-in one from mempool/electrs. If they use the kernel library in the future, they could simply update to the newest version of the library and continue operations, instead of having to roll out a patch. When I was writing my thesis a couple of years ago, I also had to implement my own Bitcoin Core data reader for collecting block and transaction data. It is generally a bad idea for the ecosystem to rely on Bitcoin Core internal data representations as opposed to a stable interface. Projects currently using the JSON RPC interface to read block data from Bitcoin Core could use the kernel library in the future. Reading directly from the filesystem through the kernel is far more performant than serializing the data to JSON and sending it over a socket.
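To make the breakage concrete, here is a minimal sketch of why a parser that scans block files for the network magic stops working once the bytes on disk are XOR-ed. It assumes a short key repeated cyclically over the file by byte offset. The key value, the dummy record, and the helper name are made up for the demo and are not taken from Bitcoin Core.

```python
MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # prefix of each record in a block file


def xor_with_key(data: bytes, key: bytes, offset: int = 0) -> bytes:
    """XOR data with a repeating key, aligned to its absolute offset in the file."""
    if not key:
        return data
    return bytes(b ^ key[(offset + i) % len(key)] for i, b in enumerate(data))


key = bytes.fromhex("0123456789abcdef")  # made-up 8-byte key for the demo
# Record layout: magic, little-endian payload size, then a dummy 80-byte payload.
record = MAINNET_MAGIC + (80).to_bytes(4, "little") + bytes(80)
on_disk = xor_with_key(record, key)

# A parser unaware of the obfuscation no longer finds the magic bytes.
naive_parser_finds_magic = on_disk.startswith(MAINNET_MAGIC)

# XOR is its own inverse, so applying the same key again restores the record.
recovered = xor_with_key(on_disk, key)

# Reading from the middle of a file only works if the key is aligned to the offset.
tail = xor_with_key(on_disk[10:], key, offset=10)
```

Every external parser has to learn details like the key alignment on its own. A reader built on the library would get them from the code that wrote the files.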
The kernel library is called “a kernel” because, unlike many other libraries, it manages resources, writes to disk, and spawns threads. It stores blocks and maintains indexes for the block store and unspent outputs. It could be beneficial to extend the API to become sans-I/O. Instead of doing all the database operations internally, the library would allow the user to define their own storage strategies. This could enable users to store data in their own formats and database choices, or even leave everything in-memory. Having abstract interfaces inside the code can also benefit testing. Unit and fuzz tests would not have to touch any file system, and the abstract interfaces could be mocked to test their behaviour. However, there is a bad history of changing the database used for the consensus engine. When the database for tracking the unspent outputs was changed in version 0.8 from Berkeley DB to LevelDB, a subsequent unanticipated change in the validation rules led to a chain split and a double spend (details in BIP 50).
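A sans-I/O design could look roughly like the following sketch, in which the caller hands a storage object to the library. Everything here is hypothetical: the `BlockStore` interface, the class names, and `accept_block`, which only hashes the header and stores the bytes without validating anything.

```python
import hashlib
from typing import Dict, List, Optional, Protocol


class BlockStore(Protocol):
    """Hypothetical storage interface the library would call instead of touching disk."""

    def put(self, block_hash: bytes, raw_block: bytes) -> None: ...

    def get(self, block_hash: bytes) -> Optional[bytes]: ...


class InMemoryBlockStore:
    """Keeps everything in a dict; no file system involved."""

    def __init__(self) -> None:
        self._blocks: Dict[bytes, bytes] = {}

    def put(self, block_hash: bytes, raw_block: bytes) -> None:
        self._blocks[block_hash] = raw_block

    def get(self, block_hash: bytes) -> Optional[bytes]:
        return self._blocks.get(block_hash)


class RecordingBlockStore(InMemoryBlockStore):
    """Test double that also records which hashes were written."""

    def __init__(self) -> None:
        super().__init__()
        self.writes: List[bytes] = []

    def put(self, block_hash: bytes, raw_block: bytes) -> None:
        self.writes.append(block_hash)
        super().put(block_hash, raw_block)


def accept_block(store: BlockStore, raw_block: bytes) -> bytes:
    """Stand-in for the library: double-SHA256 the 80-byte header, then store the block.

    No validation is performed here.
    """
    header = raw_block[:80]
    block_hash = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    store.put(block_hash, raw_block)
    return block_hash


store = RecordingBlockStore()
raw = bytes(80) + b"\x00"  # dummy all-zero header plus a zero transaction count
block_hash = accept_block(store, raw)
```

Because `accept_block` only depends on the interface, a test can pass in the recording store and check what was written without creating any files. A production user could supply an implementation backed by the database of their choice.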
Having a sans-I/O interface can enable the library to be targeted at platforms that do not provide a full operating system. This could enable validation on embedded devices, or, by compiling to Wasm, directly in the browser. Embedded validation with a much-reduced storage requirement could be interesting for projects like the validating lightning signer. More recently there has been interest in the Bitcoin Core repository in compiling some of the validation files for 32-bit RISC-V bare metal. This is the target architecture used by the risczero project to create a zero-knowledge circuit from an arbitrary program. Being able to validate blocks, headers, scripts, or transactions in zero-knowledge using the same logic Bitcoin Core uses could give people more confidence in bridges like BitVM.
The project is by its very nature slow, incremental work. Coming up with an ideal structure for everything and then implementing and submitting it all at once is not workable, since it would bring too much risk. Instead, the path to a good API will continue to be incremental, and I expect it to be marked as “experimental” for a while if it gets merged. I hope to eventually blog about some of the topics mentioned here in detail. C API design in particular (a first for me) is a topic I could not find many textbooks on. In contrast, I have received a wealth of feedback on it from other developers, so it could be an intriguing follow-up topic. So far the kernel project has been met with a great deal of charity and enthusiasm from existing Bitcoin Core contributors. This was by no means a given, since I took the project over from a prior contributor who had an established contribution portfolio, while I was an infrequent contributor with just a couple of pull requests to my name. I hope that enthusiasm will not only carry over into the next stage but also motivate external developers to take a look at the proposed API. I’m excited about this phase of the project, and it will be interesting to see which parts will make it and how.