Discriminator Database

Context

Teams in the Solana ecosystem regularly face challenges interacting with unknown deployed contracts and parsing unknown instructions. A community discriminator dataset would create a public good: a shared collection of IDL discriminators that any developer can pull from when needed, incrementally increasing developer productivity and the speed at which Solana tooling can interact with programs.

Please see the following RFP, which outlines a request to create an open source discriminator database. The Solana Foundation lays out a list of proposed solutions, but the technology used to build the database will be decided during grant negotiation between the grantee and the Solana Foundation.

Logistics

Take note of the application deadline (10/04/2024). The maximum grant amount is currently earmarked at $60k in USD-equivalent locked SOL. The final grantee will work with the Solana Foundation to decide on the final terms of the agreement, including negotiation of rigorous but attainable milestones.

Ground Rules

This thread can be used for comments, questions, praise, and/or criticism, and is intended to be an open forum for any prospective responders. This thread is also an experiment in increasing the transparency with which RFPs are fielded by the Solana ecosystem, so please be mindful that we’re all here to learn and grow.

Responses to this RFP are not required to be public (though this is recommended), but if it is helpful to share notes or combine forces, please use this thread for such purposes.


Link: Airtable - Solana Foundation Active RFPs


Hi! Thank you for publishing this RFP, sounds like a really valuable improvement!

Could you please help with the following questions?

  1. What kind of collisions/conflicts/duplications is mentioned in the RFP description?
  2. Should the frontend interface for uploading IDLs require the signature of the program authority in order to upload an IDL for that program? For contract verification tools this isn’t necessary, because their main purpose is comparing the uploaded source code with the bytecode deployed to the blockchain. In this case, however, a third-party actor could intentionally upload an incorrect IDL for a given program, so it sounds like this feature is necessary, unless I’m missing something.
  3. What does “resolve an input given a hash” mean in the RFP description?

Thank you in advance!

UPD: as far as I understood from discussions in the Anchor Discord, the apr.dev registry is discontinued and no longer working. Is it right that the part of the RFP about the Anchor registry watcher is no longer relevant?

Hi @pkxro,

We’re a team of 3 seasoned blockchain engineers, actively working full time on the problem that this spec describes, and we are super excited to see an RFP for this!
The spec makes sense to us; I have a few questions regarding some specific points:

A Community dataset that allows developers to upload discriminators and their associated inputs with metadata about program relevance with collision detection

I’d like to check my understanding here: is the idea to help developers check the validity of their payloads before transaction signing?

Community dataset hosted with chunked and compressed parquet files that developers can download on some regular cadence

Would it make sense to partition this dataset by protocol author / teams?
Could you unpack the intention / need driving this requirement?

Thank you for your help!

@ludovic

I’d like to check my understanding here: is the idea to help developers check the validity of their payloads before transaction signing?

Yes, exactly that – you should be able to upload an IDL and parse out all of the discriminators, or upload them individually as the first 8 bytes of the sha256 hash. Note that there is some new, yet-to-be-upstreamed work that makes discriminators work for arbitrary lengths: Support custom discriminators · Issue #3097 · coral-xyz/anchor · GitHub

Would it make sense to partition this dataset by protocol author / teams?
Could you unpack the intention / need driving this requirement?

Generally speaking, it is quite hard for data and monitoring teams to parse instructions cleanly. The goal of this RFP is to offer a UI, an API, and a rate-limited ability to get a full dump of the dataset (Apache Parquet is just what the data ecosystem converged on). Partitioning is ultimately up to you, but since the output of this is just a set of discriminators/IDLs and not every parsed instruction ever, it should ideally not be too cumbersome because the footprint is quite small.
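The chunked, compressed export could be as simple as the following stdlib-only sketch (using gzipped JSON lines as a stand-in for Parquet, which would need a library like pyarrow; file names and schema are illustrative, not from the RFP):

```python
import gzip
import json

def write_chunks(records, chunk_size=2, prefix="discriminators"):
    """Write records in fixed-size, compressed chunks that clients
    can download and refresh on a regular cadence."""
    paths = []
    for i in range(0, len(records), chunk_size):
        path = f"{prefix}-{i // chunk_size:04d}.jsonl.gz"
        with gzip.open(path, "wt") as f:
            for rec in records[i:i + chunk_size]:
                f.write(json.dumps(rec) + "\n")
        paths.append(path)
    return paths

records = [{"disc": f"{i:016x}", "name": f"ix_{i}"} for i in range(5)]
print(write_chunks(records))  # 5 records, chunk_size=2 -> 3 chunk files
```

Because the dataset is just discriminators plus metadata (not parsed transaction history), even the full dump stays small enough to re-download wholesale.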


@dmozhevitin

What kind of collisions/conflicts/duplications is mentioned in the RFP description?

As a result of Support custom discriminators · Issue #3097 · coral-xyz/anchor · GitHub, the potential for collisions increases. With variable-length discriminators you ultimately move from the standard 8-byte (2^64) space to much smaller ones.

Should the frontend interface for uploading IDLs require the signature of program authority to upload the IDL for it? For the contract verification tools it isn’t necessary because their main purpose is comparing the uploaded source code with the bytecode deployed to the blockchain, but in this case some 3rd party actor can upload incorrect IDL for a given program intentionally, so it sounds like that this feature is necessary, unless I’m missing something.

Ideally this does not require the program authority to upload a discriminator or IDL. The idea here is to put the dataset in the hands of developers: if someone has previously found an IDL, or has a discriminator and knows the args, they should be able to upload it. The key here is that the dataset does not attribute an IDL to a program. You can combine this user-generated dataset with a scrape of all the Anchor PDAs that hold IDLs to create a very comprehensive list that also includes authorized IDLs (we’ve already indexed all of the Anchor IDLs and are happy to share the dataset).

What does “resolve an input given a hash” mean in the RFP description?

Reverse mapping of hash → function if the function is already in the dataset
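That reverse lookup could be sketched as follows (the dataset shape and instruction names are hypothetical; the actual schema is up to the grantee):

```python
import hashlib

def disc(name: str) -> str:
    """Hex of the standard Anchor discriminator for an instruction name."""
    return hashlib.sha256(f"global:{name}".encode()).digest()[:8].hex()

# hypothetical community dataset: discriminator hex -> instruction metadata
dataset = {
    disc("initialize"): {"name": "initialize", "args": []},
    disc("transfer"): {"name": "transfer", "args": ["amount: u64"]},
}

def resolve(discriminator_hex: str):
    """Resolve an input given a hash: return metadata if known, else None."""
    return dataset.get(discriminator_hex)

print(resolve(disc("transfer")))  # known discriminator -> metadata dict
print(resolve("00" * 8))          # unknown discriminator -> None
```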

@pkxro thank you for the details + pointer to 3097!
I would love to show you what we’re building – will you be at Breakpoint?

Hi @pkxro!

Thank you for your answers!

I have one more question regarding this RFP: it mentions the ability to upload/search discriminators/IDLs for a given program. I wonder what the point is of providing the ability to upload a discriminator separately, since uploading the IDL covers all of the program’s discriminators. Does it make sense to have the frontend/API accept only IDLs?

As a follow-up question, I also wonder what “upload discriminator” means. Uploading the discriminator by itself doesn’t seem to make much sense – is it right that “uploading the discriminator” also includes uploading the associated account/instruction data (i.e. the relevant part of the IDL)?

Thank you in advance!
