sRFC 00004: Native Events Program

Solana Events for Programs

Summary

This RFC proposes a native “Event” program built into the Solana runtime to address the current limitations in log storage and validation. By implementing a standardized event interface, we aim to achieve the following goals:

  1. Logs should be available as long as Solana blocks are available.
  2. There must be a way to validate that logs (or events) returned by the RPC operators are stored in the block and are truthful.
  3. There must be a clear pricing curve for permanent log storage on Solana, either implicit or explicit.

Problem Description

Currently, Solana logs are truncated to 10 KiB max per transaction, and there is no way to retrieve the untruncated logs stored in blocks or to validate the logs returned by RPC nodes. Solana applications work around log truncation by calling another program (such as a “no operation” program) and sending the desired log as instruction data. Such invocations are usually returned faithfully by RPC operators, but an operator could maliciously (or accidentally) drop the history of invocations, and nobody would be able to detect this without replaying the historical transaction.

It’s worth noting that EVM chains index their programs only through events, since program data layout is theoretically unspecified. Indexing Solana programs only through logs will not be feasible until there is a reliable way to retrieve them.

Goals

  1. Define an “Event” interface for Solana programs that allows logs to be stored for as long as Solana blocks are available.
  2. Implement a mechanism to validate that the logs returned by RPC operators are accurate and stored in the block.
  3. Establish a clear pricing curve for log storage on Solana.

Possible Implementation

High level summary

Expose a new program built into the runtime that Merkleizes the logs emitted over the course of a block and, at the end of the block, writes the Merkle tree root to a read-only account at a PDA derived from the block (e.g. seeds of [block_number.to_le_bytes()]). These accounts are referred to as EventStorage accounts. This would allow dApps to reliably use this built-in program’s logs for app-critical needs such as indexing.
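As a rough sketch of that end-of-block step, the runtime could fold every event emitted during the block into a single Merkle root. The `merkle_root` helper below is hypothetical, and std’s `DefaultHasher` stands in for a real cryptographic hash such as SHA-256:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy stand-in for a cryptographic hash; a real implementation
/// would use SHA-256 over the raw bytes.
fn hash_bytes(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

/// Hash two child nodes into their parent node.
fn hash_pair(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    left.hash(&mut h);
    right.hash(&mut h);
    h.finish()
}

/// Fold all events emitted in a block into a single Merkle root.
fn merkle_root(events: &[Vec<u8>]) -> u64 {
    if events.is_empty() {
        return 0;
    }
    // Leaf level: one hash per event.
    let mut level: Vec<u64> = events.iter().map(|e| hash_bytes(e)).collect();
    // Pairwise-combine levels until a single root remains;
    // an odd leftover node is promoted unchanged.
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| {
                if pair.len() == 2 {
                    hash_pair(pair[0], pair[1])
                } else {
                    pair[0]
                }
            })
            .collect();
    }
    level[0]
}
```

The resulting root is what would be written into the block’s EventStorage account.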

Event Instruction and Storage

We can define an event program for Solana that stores event data in a Solana block, hashes all of the block’s event data into a 32-byte Merkle root, and stores that root in an EventStorage account. This allows logs to be recovered from block data, and allows downstream clients to reliably verify RPC responses.

/// Event program instruction
pub enum EventProgramInstruction {
  /// `LogEvent` instruction that takes raw event bytes as input
  /// and logs the event to the on-chain storage.
  LogEvent(Vec<u8>),
}

/// EventStorage account
pub struct EventStorage {
  pub block_id: u64,
  /// 32-byte Merkle root of all event data in the block.
  pub events_hash: [u8; 32],
}

/// Event struct returned by RPC operators
pub struct Event {
  /// Position of the event within the block.
  pub block_index: u16,
  pub event_data: Vec<u8>,
}

Log Validation

To validate that the logs returned by RPC operators are accurate and stored in the block, we can ask RPC nodes to return the Merkle proof along with the requested logs, allowing clients to verify the logs’ authenticity against the Merkle root stored in the EventStorage account.
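A client-side check might look like the following sketch, where `ProofStep` and `verify_event` are hypothetical names and a toy std hash stands in for SHA-256:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy stand-in for SHA-256.
fn hash_bytes(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

/// Hash two child nodes into their parent node.
fn hash_pair(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    left.hash(&mut h);
    right.hash(&mut h);
    h.finish()
}

/// One step of a Merkle proof: the sibling hash and whether it
/// sits to the left of the running hash.
struct ProofStep {
    sibling: u64,
    sibling_is_left: bool,
}

/// Recompute the root from an event and its proof, then compare
/// against the root read from the EventStorage account.
fn verify_event(event_data: &[u8], proof: &[ProofStep], expected_root: u64) -> bool {
    let mut acc = hash_bytes(event_data);
    for step in proof {
        acc = if step.sibling_is_left {
            hash_pair(step.sibling, acc)
        } else {
            hash_pair(acc, step.sibling)
        };
    }
    acc == expected_root
}
```

The RPC response would carry the event bytes plus the proof steps; the client only needs the on-chain root to check them.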

Pricing Curve

To establish a clear pricing curve for log storage on Solana, we can implement the following:

  1. Modify the LogEvent instruction to charge the feePayer of the executing transaction a storage fee, paid in native tokens, proportional to the size of the event data.
  2. Introduce a StorageFee struct that contains the base storage fee and the additional fee per byte of event data, both adjustable via the SIMD process.
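A minimal sketch of that fee computation, with hypothetical field names and purely illustrative numbers (actual values would be set through the SIMD process):

```rust
/// Fee parameters for the event program. The concrete values
/// would be set (and later adjusted) via the SIMD process;
/// the numbers used below are illustrative only.
pub struct StorageFee {
    pub base_lamports: u64,
    pub lamports_per_byte: u64,
}

impl StorageFee {
    /// Total fee charged to the feePayer for one LogEvent call,
    /// scaling linearly with the size of the event data.
    pub fn charge(&self, event_data_len: usize) -> u64 {
        self.base_lamports + self.lamports_per_byte * event_data_len as u64
    }
}
```

A flat base fee plus a per-byte component keeps small events cheap while making the marginal cost of large payloads explicit.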

Alternative Solutions

We could instead just add a pricing mechanism for the existing logs emitted via sol_log_data and require RPC operators to store them permanently. This could be as simple as increasing the CU cost of log syscalls to reflect an explicit “price per byte” of a transaction. If transaction logs are capped at 10 KiB, and a single transaction has a flat cost of 5000 lamports (at minimum, for a single signer), then each byte of the transaction execution record is implicitly priced at roughly 0.5 lamports.
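The implicit figure comes from dividing the flat fee by the log cap; a quick check:

```rust
/// Implicit price per byte when a flat transaction fee is spread
/// over the maximum log size.
fn implicit_price_per_byte(flat_fee_lamports: f64, max_log_bytes: f64) -> f64 {
    flat_fee_lamports / max_log_bytes
}
// 5000 lamports / 10 KiB = 5000.0 / 10240.0 ≈ 0.49 lamports per byte
```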

Currently, this low cost of adding to the Solana ledger pushes applications to run their own validator to retrieve the logs their programs have emitted. Running a validator simply to index and debug one’s own program is expensive for an application in both time and money.

Perhaps by introducing a separate fee schedule for logging priced above 0.5 lamports per byte, we could discourage logging volume per transaction, keeping it under 10 KiB. This would allow most currently existing RPC operators to comply with an SLA to consistently deliver complete logs for each transaction.

Recommended Implementation

We recommend implementing the built-in event program and storage as described above. This approach would address the current limitations in log storage and validation while also providing clear pricing for writing data to the Solana ledger.

There are two issues with the alternative approach of simply adding explicit transaction fees for logging.
First, existing programs that perform logging will see increased fees, which may create an unfair competitive disadvantage among DeFi dApps. Second, the only way to validate the transaction logs returned by RPC operators would still be to run your own validator and compare logs after replaying the transaction.

In comparison, the built-in program provides both a new mechanism for log storage and a superior method for validating logs, reducing the burden on dApps. We therefore recommend the built-in event program solution, and encourage further implementation research.


Cool!

So, just for clarity: you basically take all of the emitted logs for a block, hash them into a merkle tree, store the root in the account for that blockhash, and can verify all logs emitted during that block are valid via the merkle proof?

So with this implementation, one could get their program logs from their RPC provider, and the RPC provider can also provide the merkle proof to validate that those logs are represented in the stored account?
aka, “proof of logging” :grinning:

Since the suggested program is proposed to be embedded in the runtime, I’m curious how we could possibly expand/leverage this proposal to solve for the issue of truncating the logs as well.

Thinking about something like persistent logging or a log-retention pipeline: if an enterprise-scale program wanted to retain lots of logs for a period of time, and also have the validation proposed here that those logs are accurate and represented within the block, how could we integrate un-truncating logs?


Yes and instead of logging, they’ll be executing a syscall to the event program, which does the hashing at the end of the block, and posts the merkle root to an EventStorage account on-chain. This should also increase the fee charged to the transaction feePayer, proportional to the amount of data that was emitted to the event program.

If enterprise-scale programs want to store “logs” (aka sol_log_data logs), then they should run their own validator and only trust the logs emitted from replaying their programs’ own transactions. This could potentially be a good business for RPC operators.

I think it’s fine to make current “logs” a very transient & unreliable source of program metadata, and then encourage RPC operators to charge customers to serve program logs, since they are additional burden on RPC operation.


I think it would be great if there was a reliable way to associate events with programs that emitted them!


Here’s a concrete example of program control flow that is uninterpretable without access to full logs for transaction history.

These two transactions look like they both have the same control flow, but the second inner instruction is actually different:

Program A → Program B
Program A → Program C

Program A → Program B
Program B → Program C

With full logs, we would be able to index the 2nd inner instruction properly, but when we only have access to truncated logs, like in the 2nd transaction, we cannot tell which program invoked program C.

Just wanted to add this example to the discussion to emphasize the importance of this primitive for indexing programs. Personally I don’t think there are any RPC providers that offer “full logs” as a service, but having a few different ones offering that service would solve this problem entirely.

Code for the repro transactions here: GitHub - ngundotra/solana-event-logging-repro: indexing is hard
