Here are three protocol changes that I consider important to improve reliability and performance of the Solana network and the Firedancer client. In other words: My Solana-land New Year’s resolution for 2024.
(Software optimization is a game of whack-a-mole)
Preparing an on-chain program for execution should be as simple as mapping bytecode and data into virtual machine memory. In reality, loading a typical user program often takes longer than actually executing it.
This is mostly owed to historical choices to allow the protocol to execute the outputs of rustc directly. This was thought to offer a better developer experience than requiring a post-processing step.
What resulted was an error-prone and inefficient format.
The situation is best explained by Firedancer’s implementation:
A 1,200-line code footprint in this context is not only slow but also complicates cross-client compatibility testing.
Designing a replacement binary format is a trivial exercise. Adopting it is mostly a social issue, as it will require approval from all contributors to the virtual machine and compiler toolchain.
Once the new format is implemented, the old format should be retired.
My preferred solution thus far is a one-time sweep over the account database that converts all existing deployed programs. Any VM bytecode would be kept strictly identical, such that immutability of program code is preserved.
The state size of the Solana network is gradually approaching a point where it is no longer practical to keep in DRAM. In fact, it is technically unbounded due to an overly simplistic storage cost model.
One obvious solution is to spill accounts over to cheaper storage such as NVMe. But unless done very carefully, this would kill any prospect of greatly increasing network performance.
Solana is DRAM-heavy for reasons that this Intel technical paper neatly summarizes: https://www.intel.com/content/www/us/en/developer/articles/technical/memory-performance-in-a-nutshell.html
Keeping account state in-memory allows on-chain programs to access arbitrary account data with consistently low latency. More importantly, current generation x86 systems can achieve total memory bandwidth of hundreds of gigabytes per second.
When operating in a storage environment with highly asymmetric latency and bandwidth, a blockchain runtime will certainly become limited by I/O capabilities unless fee models become aware of account locality.
For example, writing ten 1MB accounts 1 million times each is going to be significantly faster than writing 10^6 accounts of 10MB each once, even though both cases write 10TB of data. (The former would operate at a bandwidth of hundreds of GB/s via L3 cache, the latter at least 100x slower.)
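The arithmetic behind this example can be checked with a short sketch. The function name and the notion of "working set" (the distinct bytes touched) are introduced here for illustration only; whether 10MB actually fits in L3 cache depends on the CPU.

```python
MB = 10**6
TB = 10**12

def workload(num_accounts, account_size, writes_per_account):
    """Return (total bytes written, working set size in bytes)."""
    total_written = num_accounts * account_size * writes_per_account
    working_set = num_accounts * account_size  # distinct bytes touched
    return total_written, working_set

# Ten 1MB accounts, each written one million times.
hot_total, hot_set = workload(10, 1 * MB, 10**6)

# One million 10MB accounts, each written once.
cold_total, cold_set = workload(10**6, 10 * MB, 1)

print(hot_total // TB, "TB written over a", hot_set // MB, "MB working set")
print(cold_total // TB, "TB written over a", cold_set // TB, "TB working set")
```

Both workloads move the same 10TB, but the first keeps reusing 10MB of state while the second touches 10TB of distinct state. A fee model that only counts bytes written cannot tell them apart.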
With the goal of staying in DRAM territory in mind, let’s instead take measures to reduce state growth rate.
An effective first step to improve cost modeling is to dynamically increase storage fees as free space decreases. Currently, storage fees are implemented via a “minimum account balance” that can be fully reclaimed.
I don’t expect that it is possible to achieve negative state growth with this dynamic alone, though. At least, not until exceedingly high fees make the network unusable.
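To make the idea of a dynamic storage fee concrete, here is a minimal sketch of one possible curve. The formula, the function name, and the parameters are all assumptions for illustration; no specific pricing function has been proposed here.

```python
def min_balance_per_byte(used_bytes, capacity_bytes, base_rate):
    """Hypothetical fee curve: the per-byte minimum balance grows
    inversely with the fraction of state capacity that is still free.

    Integer arithmetic is used so that every node computes the same value.
    """
    if not 0 <= used_bytes < capacity_bytes:
        raise ValueError("used_bytes must be in [0, capacity_bytes)")
    free_bytes = capacity_bytes - used_bytes
    return base_rate * capacity_bytes // free_bytes

# With a hypothetical base rate of 10 units per byte:
for used in (0, 50, 90, 99):
    print(used, "% full ->", min_balance_per_byte(used, 100, 10), "per byte")
```

Under this assumed curve the price doubles at 50% utilization and rises a hundredfold at 99%, which illustrates why such a mechanism slows growth long before it can reverse it.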
Which leads us to the next research item.
Did you know that 76% of Solana accounts have not been accessed in the last 6 months? (Credit to @andrewhong5297 for this data point)
Continuing from the above, there is significant opportunity to reduce the in-memory set of accounts.
Hash trees have become the de facto standard approach to provide large amounts of state to on-chain programs (commonly called “compression” within Solana). Compressed program data consists of just the root of a concurrently modifiable hash tree. The original data itself is no longer replicated across the blockchain network and is stored separately by whoever chooses to keep it (such as in a p2p torrent network), but it can be recovered as needed. Currently, each program has to include its own logic for state compression.
@toly has suggested introducing compression generically via a new storage class of compressed accounts. To reclaim DRAM, the runtime would periodically sweep the database for the oldest accounts and evict them if their balance drops below a dynamic minimum (determined using the aforementioned state growth limiting mechanic).
To avoid complexities with storage asymmetry, transactions that attempt to access compressed accounts would fail. Users can decompress their accounts at any time by re-uploading all data fragments, along with a cryptographic proof that the data was not tampered with.
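The evict, fail, and restore lifecycle described above could look roughly like the following toy model. Everything here is hypothetical: the `Account` fields, the function names, and the use of a single SHA-256 hash of the full data as the retained commitment (a real design would more likely keep a hash tree root and check a proof against it).

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

class AccountCompressed(Exception):
    """Raised when a transaction touches an evicted account."""

@dataclass
class Account:
    balance: int
    last_access_slot: int
    data: Optional[bytes]               # None once evicted from memory
    commitment: Optional[bytes] = None  # hash kept in place of the data

def sweep(accounts, min_balance, max_evictions):
    """Evict the least recently accessed accounts below the dynamic minimum."""
    resident = [k for k, a in accounts.items() if a.data is not None]
    resident.sort(key=lambda k: accounts[k].last_access_slot)  # oldest first
    evicted = []
    for key in resident:
        if len(evicted) >= max_evictions:
            break
        acct = accounts[key]
        if acct.balance < min_balance:
            acct.commitment = hashlib.sha256(acct.data).digest()
            acct.data = None
            evicted.append(key)
    return evicted

def read(accounts, key, slot):
    acct = accounts[key]
    if acct.data is None:
        raise AccountCompressed(key)  # the transaction fails, no slow-path fetch
    acct.last_access_slot = slot
    return acct.data

def decompress(accounts, key, fragments):
    """Restore an account from re-uploaded fragments if they match the commitment."""
    acct = accounts[key]
    data = b"".join(fragments)
    if acct.data is not None:
        raise ValueError("account is not compressed")
    if hashlib.sha256(data).digest() != acct.commitment:
        raise ValueError("uploaded data does not match commitment")
    acct.data, acct.commitment = data, None

accounts = {
    "old-poor": Account(balance=5, last_access_slot=10, data=b"hello world"),
    "old-rich": Account(balance=500, last_access_slot=11, data=b"keep me"),
    "new-poor": Account(balance=5, last_access_slot=900, data=b"recent"),
}
evicted = sweep(accounts, min_balance=100, max_evictions=1)
```

In this run only `old-poor` is evicted: it is the oldest account below the minimum, and the eviction budget is one. Reading it afterwards raises `AccountCompressed` until `decompress` is called with fragments that hash to the stored commitment.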
The main difficulties lie in client/wallet “RPC” infrastructure and choosing an appropriate storage solution. Getting either of these wrong risks data loss. Storage networks like Filecoin and Arweave are of particular interest.
Preliminary design work for all of the above is under way at Solana Labs and Firedancer. I hope to publish detailed designs and technical proposals for all three items in the coming weeks. As always, I’d love to invite the wider community for discussion. I’m curious what you think.