Bank Capture File

ripatel-jump · July 24, 2023, 3:12pm

Summary

Resolving bank hash mismatches between different validators and validator releases is an arduous process.

The most widely used approach involves:

dumping bank hash pre-images to the validator log files (where they are mixed with unrelated log output),
using a log parsing tool to extract the information, and
accumulating the changes for every slot, then constructing a diff.

Background

The bank hash commits to the execution inputs and state changes of a slot.
A bank hash mismatch occurs when two Solana runtime implementations output different bank hashes for the same inputs (same state, same slot).
This implies that the two runtime implementations are incompatible, which is a severe bug that has to be fixed.
The bank hash pre-image refers to the raw inputs fed into the hash function.
This pre-image is highly useful for debugging, as it pinpoints the input that differs.
There is no standard for encoding the pre-image; all solutions so far rely on hacks that are incompatible across different validator code bases.
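To illustrate why a structured pre-image helps, here is a minimal sketch of diffing the pre-images captured by two validators for the same slot. The in-memory shape (account key mapped to a dict of hashed fields) and the name `diff_preimages` are assumptions made for this example, not part of any existing tool.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical shape: account key (hex string) -> fields fed into the account hash.
Preimage = Dict[str, Dict[str, object]]


def diff_preimages(a: Preimage, b: Preimage) -> List[Tuple[str, str, object, object]]:
    """Return (key, field, value_a, value_b) for every input that differs."""
    diffs: List[Tuple[str, str, object, object]] = []
    for key in sorted(set(a) | set(b)):
        acc_a: Optional[Dict[str, object]] = a.get(key)
        acc_b: Optional[Dict[str, object]] = b.get(key)
        # An account touched by only one validator is reported as a whole.
        if acc_a is None or acc_b is None:
            diffs.append((key, "<account>", acc_a, acc_b))
            continue
        # Otherwise compare field by field.
        for field in sorted(set(acc_a) | set(acc_b)):
            if acc_a.get(field) != acc_b.get(field):
                diffs.append((key, field, acc_a.get(field), acc_b.get(field)))
    return diffs
```

With a shared encoding, both sides could load their capture into this shape and the diff would point directly at the mismatching account and field.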

Requirements

The following pseudocode declares the hash constructions that make up the bank hash.

account_hash := blake3 {
  le u64 lamports
  le u64 slot
  le u64 rent_epoch
  []u8 data
  u8 executable
  [32]u8 owner
  [32]u8 key
}

accounts_delta_hash := merkle {
  leaf = [32]byte account_hash
  branch = sha256 {
    [1..=16][32]byte node
  }
}

bank_hash := sha256 {
  [32]byte parent_bank_hash
  [32]byte accounts_delta_hash
  le u64 signature_count
  [32]byte last_blockhash
}
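
As a rough illustration of the pseudocode, the sketch below builds the byte string that would be fed into `account_hash` and computes a fanout-16 SHA-256 tree over leaf hashes. The function names are hypothetical, the BLAKE3 step is left out because it is not in Python's standard library, and the handling of empty or single-leaf inputs is an assumption that may not match any validator bit for bit.

```python
import hashlib
import struct
from typing import List


def account_preimage(lamports: int, slot: int, rent_epoch: int, data: bytes,
                     executable: bool, owner: bytes, key: bytes) -> bytes:
    """Concatenate the account_hash inputs in the order listed above."""
    if len(owner) != 32 or len(key) != 32:
        raise ValueError("owner and key must be 32 bytes each")
    return (
        struct.pack("<QQQ", lamports, slot, rent_epoch)  # three little-endian u64
        + data
        + bytes([1 if executable else 0])
        + owner
        + key
    )


def merkle_root(leaves: List[bytes], fanout: int = 16) -> bytes:
    """Hash groups of up to `fanout` nodes with SHA-256 until one node remains."""
    if not leaves:
        # Assumption: an empty tree hashes the empty string.
        return hashlib.sha256(b"").digest()
    level = list(leaves)
    while True:
        level = [
            hashlib.sha256(b"".join(level[i:i + fanout])).digest()
            for i in range(0, len(level), fanout)
        ]
        if len(level) == 1:
            return level[0]
```

A capture file would need to record exactly these inputs (the per-account fields and the ordered list of leaf hashes) so that another implementation can recompute each step and find where the results diverge.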

The solution must be able to serialize all of the above data in a language-agnostic format. There should be consensus among validator developers, and every team should be willing to implement and work with this format.

The serialized size is estimated to be hundreds of megabytes per slot.
Therefore, the serialization scheme used should also be efficient.

Stretch Goals

Ideally, this file format should support streaming use and compress well.
Perhaps we could wrap the serialized blobs (for example, Protobuf messages) in a binary container format, such as .tar.zst or a custom format.
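
One way to get streaming behaviour from a custom container is length-prefixed framing. The sketch below is only an illustration of that idea: the 4-byte little-endian prefix and the names `write_record` and `read_records` are assumptions, not the format proposed in this thread.

```python
import io
import struct
from typing import BinaryIO, Iterator


def write_record(out: BinaryIO, payload: bytes) -> None:
    """Append one record: 4-byte little-endian length, then the payload."""
    out.write(struct.pack("<I", len(payload)))
    out.write(payload)


def read_records(src: BinaryIO) -> Iterator[bytes]:
    """Yield payloads one at a time without loading the whole file."""
    while True:
        header = src.read(4)
        if not header:
            return  # clean end of stream
        if len(header) != 4:
            raise ValueError("truncated length prefix")
        (size,) = struct.unpack("<I", header)
        payload = src.read(size)
        if len(payload) != size:
            raise ValueError("truncated record")
        yield payload


# Usage: write two opaque blobs (e.g. serialized per-slot messages), read them back.
buf = io.BytesIO()
for blob in (b"slot-1", b"slot-2"):
    write_record(buf, blob)
buf.seek(0)
records = list(read_records(buf))
```

Because each record is self-delimiting, the byte stream can be piped through a general-purpose compressor and consumed slot by slot on the other side.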

Possible Solutions

Designing a data structure representing the above information is trivial.
However, it is not obvious which serialization scheme should be used.

JSON

steviez at Solana Labs has been working on a JSON-based solution.

This format can be easily upgraded, but we’d argue it is a little too free-form and does not offer great performance.
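
To give a rough feel for the size overhead, the sketch below encodes the same account fields once as JSON and once as a packed binary record. The field names and both layouts are made up for this comparison; they are not the JSON format mentioned above.

```python
import base64
import json
import struct


def encode_json(lamports: int, data: bytes, owner: bytes) -> bytes:
    """Hypothetical JSON record: binary fields must be text-encoded."""
    record = {
        "lamports": lamports,
        "data": base64.b64encode(data).decode("ascii"),
        "owner": owner.hex(),
    }
    return json.dumps(record).encode("utf-8")


def encode_binary(lamports: int, data: bytes, owner: bytes) -> bytes:
    """Hypothetical packed record: u64 lamports, u32 data length, data, owner."""
    return struct.pack("<QI", lamports, len(data)) + data + owner
```

Binary account data has to be base64- or hex-encoded to fit in JSON, which inflates it by roughly a third or more before any parsing cost is considered. At hundreds of megabytes per slot, that overhead adds up.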

Custom Binary Format

I’ve worked on a custom binary format for maximum performance.
There are a number of obvious shortcomings:

It is not easily upgradable
It is more difficult to implement and debug

Protobuf

After meeting with the Firedancer team on this topic, we settled on a middle ground between the two options above: Protobuf. A Protobuf schema can be upgraded just like JSON structures, but it also offers powerful cross-language tooling, a schema language for coordinating these upgrades, and decent performance. Finally, Solana validators already use the Protobuf stack for RPC.

We would like to request comments from client developers, and invite validator developers to collaborate on a solution.

ripatel-jump · July 24, 2023, 11:38pm

Published a draft of the file format here: Initial solcap API by ripatel-fd · Pull Request #543 · firedancer-io/firedancer · GitHub
