SIMD-0326: Proposal for the New Alpenglow Consensus Protocol

Yes, very much so. Rotor will have its own SIMD.

Yes, there will be many more SIMDs related to Alpenglow. Some of them are already PRs, for instance SIMD-0337 about the data structure of optimistic leader change.

Many of the SIMDs (like 337) will be about “implementation details”. This first Alpenglow SIMD is really setting the stage for what is to come.

Votor and Rotor however are independent of each other. Votor is more urgent since Turbine is already good (but Alpenglow will further improve with Rotor in place).

We understand that this SIMD is special and does not exactly follow the style set in the past. However, nobody wants to read a 200-page SIMD. So it is a bit more high-level than what you might be used to.

Yes, but this SIMD is special, and it is the base for all the other SIMDs to come. If we do not get consensus here, we cannot move forward.

Look, Votor is much more than “replacing the current voting mechanism”. It is a complete change of the whole core protocol.

Agreed. We have already started implementing Alpenglow independently of this SIMD. However, we engineers need a signal on whether we can go ahead, i.e., whether there is general consensus on moving forward with Alpenglow.

We are working on a migration plan as we speak, and this migration plan also includes a fallback plan.

@solostaker:

  1. Could you expand on the future plans for the VAT? As I understand it SIMD-0257 proposed removing vote fees entirely, greatly reducing the fixed cost for validator operators. Is such a thing possible under the VAT scheme?

We don’t have a concrete plan for VAT in the future. We discussed a few alternatives, and also another post higher up suggests a future VAT plan. We will try to come up with a proposal for a more dynamic VAT, however, that will come completely with its own SIMD. (This is the Alpenglow SIMD, not the “change the economics” SIMD.)

  2. What is the replacement for blockhash now that PoH is gone? As an ecosystem observer this seems like the biggest double-spend attack vector. Do we still have the guarantee that transactions cannot be spoofed or resubmitted under the new scheme?

We will still have a blockhash, and (unless we write another SIMD about this) we will keep the 150-slot replay protection. We will also still have slot numbers. Nothing changes in this regard.
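For readers coming from the double-spend angle, here is a minimal sketch of how a fixed blockhash window plus signature deduplication rejects replays. It is a simplified model, not Agave's implementation: the class name `ReplayGuard`, the convention that a blockhash stays usable while it is at most 150 slots old, and tracking seen signatures in a plain set are all assumptions for illustration.

```python
MAX_BLOCKHASH_AGE_SLOTS = 150  # replay window from the answer above; exact boundary convention is assumed


class ReplayGuard:
    """Toy model: accept a tx only if its blockhash is recent and its signature is new."""

    def __init__(self):
        self.blockhash_slot = {}      # blockhash -> slot in which it was produced
        self.seen_signatures = set()  # signatures already executed

    def record_blockhash(self, blockhash, slot):
        self.blockhash_slot[blockhash] = slot

    def try_accept(self, signature, blockhash, current_slot):
        slot = self.blockhash_slot.get(blockhash)
        if slot is None or current_slot - slot > MAX_BLOCKHASH_AGE_SLOTS:
            return False  # unknown or expired blockhash: the tx can never land
        if signature in self.seen_signatures:
            return False  # same signed tx resubmitted inside the window: rejected as a duplicate
        self.seen_signatures.add(signature)
        return True
```

Because a tx with an expired blockhash is rejected anyway, a real implementation only needs to remember signatures for the most recent window; this toy never prunes its set.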

  3. Could you expand on Definition 17 from the paper, specifically what is Δtimeout? I see that it is 1Δ + 2Δ; what does this mean in ms? Does the timeout correspond to 400 ms, as in the leader will have less time to build the block? Are there expected changes to the Jito auction because of Δtimeout?

We will start with the timeouts defined exactly as in the white paper, i.e., with every Δ = 400 ms and the timeout computed accordingly. However, during testing we might set the timeout a bit lower. We will always make sure that the timeouts are generous enough even for the most geographically remote locations.
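As a quick worked example of the arithmetic, this takes the question's reading of Definition 17 (Δtimeout = 1Δ + 2Δ) at face value and sets every Δ to 400 ms as described above. The helper name is hypothetical, and the white paper remains the reference for how Δtimeout enters the full timeout schedule.

```python
DELTA_MS = 400  # every Δ set to 400 ms, per the answer above


def delta_timeout_ms(delta_ms=DELTA_MS):
    # Δtimeout = 1Δ + 2Δ = 3Δ (reading taken from the question, not re-derived from the paper)
    return 1 * delta_ms + 2 * delta_ms


print(delta_timeout_ms())     # 1200
print(delta_timeout_ms(300))  # 900: a smaller Δ shrinks the timeout proportionally
```

Under these assumptions Δtimeout comes out at 1.2 s, i.e., three 400 ms Δ units; a lower testing value would shrink this number.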

  4. What happens to unstaked nodes under Alpenglow? I see that the SIMD specifies all validators are staked. Will unstaked validators still be able to participate by sending txs and serving RPC queries?

Let us postpone this discussion to the Rotor SIMD. For the moment nothing changes, since we will still operate with Turbine, so unstaked nodes can still get their shreds from Turbine.

  5. How does section 2.7 work for transactions? “In this case, slices 1,…,t - 1 are ignored for the purpose of execution” - how should I alter my user workflow if this happens? If my user submits a tx and it ends up getting ignored, will it be retried? Or should we ask them to re-sign the tx?

Section 2.7 should not affect your workflow. Section 2.7 makes sure that we can change leaders as efficiently as possible. In 99% of the cases, the leader will anticipate the parent block correctly. Only if the parent changes while sending out the block, the leader will basically restart the block. While technically we do something more advanced, think of it as the leader just scratching whatever they have been doing, and sending out a new block. In other words, if your tx was in the scratched part of the block and it is still valid with the new parent, the leader will just send it out again (in the same position).

  6. Finally, my boomer question: what is the motivation? Is it not possible to speed up TowerBFT? Or is there some fundamental flaw in Solana that requires us to switch? I welcome the speedup, but this seems like a really risky upgrade that opens us up to a lot of FUD. Could you provide some context on why this is necessary, especially now in a bull market with so many eyes on us?

a) There is no security proof for TowerBFT. It’s nice that it works, but Solana needs a protocol which is secure.
b) Alpenglow makes Solana 100x faster (regarding finalization). It is very important that Solana has the best protocol in the market, and Alpenglow will achieve that goal. Ultimately the protocol with the highest adoption (most apps, most txs), the highest security, the highest performance will win. With Alpenglow, Solana will be the top chain in all three metrics.

  7. To add on to the previous point, what steps are we taking to test this change? It seems on par with The Merge. Will there be a parallel chain, or will this take place all at once? Have any auditors signed off on this change? Is anyone looking at loss-of-funds attacks?

We are currently writing a doc on how to switch from Tower to Alpenglow. We call this AlpenSwitch. We will test this switch extensively; in fact, we plan to switch back and forth between Tower and Alpenglow 1000x on Testnet before we go to Mainnet. I agree with you that this is similar to Ethereum’s “Merge”, but we do not plan to have a parallel chain. We simply switch. We have had an audit of Alpenglow, but not yet of AlpenSwitch.

Thanks for these good questions!


@Umberto:

You are correct. This is the main reason why the VAT is lower than the current voting cost. Note that burning also helps validators, because the burn effectively offsets inflation, so the money stays in the community. In fact, the burn helps validators in proportion to their stake as well, so adding 1 SOL of inflation and sending it to validators is essentially very similar to burning 1 SOL.

I disagree with you. 7% inflation – 1% burning ≈ 6% inflation ≈ 7% inflation – 1% additional income. Not everybody who owns SOL is staking, but 80% are… this is the main reason why we didn’t simply go for 2 SOL VAT, but something slightly below.
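A toy calculation helps separate the two positions on burning. It uses made-up round numbers (100 SOL supply, 80 SOL staked, 7 SOL minted to stakers, 1 SOL VAT paid by stakers) and compares the stakers' share of supply when the VAT is returned to stakers, burned, or handed to non-stakers. The figures are purely illustrative, not Solana's actual numbers, and the model ignores that stake is split across many validators with different commissions.

```python
from fractions import Fraction

SUPPLY, STAKED, MINTED, VAT = 100, 80, 7, 1  # made-up round numbers, 80% staked


def staker_share(staked, supply):
    """Stakers' fraction of total supply, kept exact with Fraction."""
    return Fraction(staked, supply)


no_vat = staker_share(STAKED + MINTED, SUPPLY + MINTED)              # VAT returned to stakers
burned = staker_share(STAKED + MINTED - VAT, SUPPLY + MINTED - VAT)  # VAT destroyed
handed_over = staker_share(STAKED + MINTED - VAT, SUPPLY + MINTED)   # VAT paid to non-stakers

cost_of_burn = no_vat - burned           # share stakers lose when the VAT is burned
cost_of_transfer = no_vat - handed_over  # share stakers lose when it goes to non-stakers
print(cost_of_burn, cost_of_transfer)    # 10/5671 1/107
print(cost_of_burn / cost_of_transfer)   # 10/53, the non-staked fraction after the burn
```

In this toy, burning costs stakers about 19% of what an outright transfer to non-stakers would (exactly the non-staked fraction, 10/53). That captures both sides: burning is much cheaper for stakers than a transfer, yet it is not free, and the difference does accrue to non-stakers.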

This is true, and is an argument on top of the previous argument that the switch to Alpenglow will actually be an economic incentive for validators.

Regarding “timing games”: Alpenglow will make this much better.

Your comments are appreciated.

However, please note that all the points you make were clear to us when we proposed the VAT. Our biggest worry is that all of a sudden we have 10,000 new validators joining because participation is now “free”. Once Alpenglow is running and stable, we will look into this again. If you have a better solution that guarantees the validator count will stay stable during the transition, I would like to hear from you.

@Roger thanks for the quick reply:

This is true if all the stake is self-stake, which is not the case on Solana today. Further, the current race to 0 on commissions is forcing more and more operators to set inflation commissions to 0%, and a huge portion of validators rely mostly on block rewards.

On another point

inflation is meant to be a tax on non-stakers. We do agree that current inflation might be dumb (see the discussion around SIMD-0228), but it is still fair. With the burn, you are doing quite the opposite of what inflation is meant for: you are moving SOL from validators’ pockets to non-stakers’ pockets. Even if the effect is small, that’s not fair.

I see no study regarding this. One should consider how parallelization can be made more effective and how the CUs available to applications can be increased to compensate for the absence of vote fees. But I think this is not the goal of this proposal; my concerns relate only to the economic changes introduced by the VAT.

I think the purpose of the proposal is to make all points clear to everyone. I cannot know what is clear to the proposer and what’s not. I know it’s tedious work to write down all the points, but that’s how a proposal should be done, so everyone is aware of what’s known.

You can add a cap plus a 1 SOL VAT (100% burned). This way you don’t change the economics (since 1 SOL per validator per epoch is already burned) and you avoid a wave of new validators joining.

Finally,

the article highlights a possible bug/inefficiency in Agave around CUs and how timing games exploit it. So it was not just about the timing-game issue, but related to the CU discussion. Regarding “Alpenglow will make this much better”, I’d argue that’s not true. If the upside really comes from block-packing inefficiency, Alpenglow will do little about that. And a “pure timing game” can be played on any chain, regardless of the consensus. The question is more about incentives versus risks, which is why you see it happening on Solana and Ethereum (rewards >> risks).


The cap is an ultimate security measure. It works best if it never needs to be enforced. Once it needs to be enforced, it generates a lot of nasty side effects depending on how it is implemented:

  • Say, if we choose the validators randomly, then some validators will sometimes not be included (while they still have to keep the hardware going, and paying for it)
  • If we choose the validators by stake, then the smaller validators will be excluded, and for the medium ones they might try to outbid each other to be included in the next epoch at the cost of the others.
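The two cap-enforcement policies above can be sketched as follows to make their side effects concrete. This is a hypothetical illustration: the function name, representing validators as a name-to-stake dictionary, and reselecting the active set every epoch are assumptions, not part of the SIMD.

```python
import random


def select_active_set(validators, cap, policy, seed=None):
    """validators: dict of name -> stake. Returns the names allowed to vote this epoch."""
    if len(validators) <= cap:
        return set(validators)  # cap not binding: nobody is excluded
    if policy == "random":
        rng = random.Random(seed)
        return set(rng.sample(sorted(validators), cap))  # anyone can drop out in a given epoch
    if policy == "stake":
        # Largest stake first; ties keep dict insertion order because sorted() is stable.
        ranked = sorted(validators, key=lambda name: validators[name], reverse=True)
        return set(ranked[:cap])  # the smallest validators are the ones excluded
    raise ValueError(f"unknown policy: {policy}")
```

With `policy="random"`, a validator can be in one epoch and out the next while still paying for hardware; with `policy="stake"`, validators near the cutoff have an incentive to pull stake away from their neighbors to stay included.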

“moving SOL from validators pockets to non stakers pockets” is a bit of an exaggeration, isn’t it? You make it sound as if it would be more valuable to be a non-staker than a staker, which is clearly not the case. As a non-staker you still pay all the inflation costs, whereas as a staker you clearly do not.

Let me try to argue again: the idea of this proposal is to make the Solana protocol secure and performant. The 1.6 SOL VAT is our proposal to keep the economic changes as small as possible.

If I want to maximize “security and performance,” then we want 30-50 validators, all with 1% to 5% stake, geographically diverse. Apart from geographic diversity, this can best be achieved by setting the VAT dynamically so that we have 30-50 validators. If we have more, the VAT goes up, if we have less, the VAT goes down. Then we don’t have to argue about the VAT amount, as it is chosen by the market. Better?
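The market-driven alternative described above amounts to a simple feedback rule. Here is a minimal sketch, assuming a multiplicative adjustment once per epoch; the function name and the 10% step size are hypothetical choices, and the 30-50 band is the target from the paragraph above.

```python
def next_vat(current_vat, n_validators, low=30, high=50, step=0.10, floor=0.0):
    """Hypothetical per-epoch rule: nudge the VAT until the validator count sits in [low, high]."""
    if n_validators > high:
        return current_vat * (1 + step)  # too many validators: make participation pricier
    if n_validators < low:
        return max(floor, current_vat * (1 - step))  # too few validators: make it cheaper
    return current_vat  # inside the target band: leave the VAT unchanged
```

The step size trades responsiveness against oscillation, since validators react to a new VAT with some lag; the sketch makes no attempt to tune it.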

You can create a list based on performance (like vote performances or whatever).

I’m not saying it’s the right solution, but it certainly wouldn’t affect the current economics, since you set the cap above the actual number of validators → the issue only arises if we start to see an increase in the number of validators, which is exactly what we want to keep under control. Further, as I understand it, it’s a temporary solution to make the transition smooth. So I don’t really see the issue you pointed out.

of course it’s an exaggeration; it is meant to make clear that the reasoning around lower inflation due to more burn is not accurate. Indeed, the burn comes solely from validators, not from network-wide usage (as the burn in the current implementation does).

meh, not really… (exaggerating again to make it clear) if I impose a 10% VAT on stake and burn it, the VAT is paid by validators but inflation is lowered for everyone. So now I have

real_mint = inflation x tot_supply - vat x tot_stake

or

real_apr = real_mint / tot_stake = current_apr - vat (potentially negative)

meaning the dilution for non-stakers is lower by the VAT, and the amount minted for redistribution to stakers is lower by the VAT as well.
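The two formulas above can be checked numerically. This sketch uses the same exaggerated 10% VAT on stake together with made-up round numbers (7% inflation, 1000 SOL supply, 800 SOL staked), and assumes, as the formulas do, that all issuance goes to stakers; the function names simply mirror the variables in the formulas.

```python
def real_mint(inflation, tot_supply, vat, tot_stake):
    """Net new SOL: gross issuance minus the VAT burned (VAT charged as a rate on stake)."""
    return inflation * tot_supply - vat * tot_stake


def real_apr(inflation, tot_supply, vat, tot_stake):
    """Stakers' net yield: real_mint / tot_stake, i.e. current_apr - vat; can go negative."""
    return real_mint(inflation, tot_supply, vat, tot_stake) / tot_stake


current_apr = 0.07 * 1000 / 800  # 7% issuance with 80% staked gives stakers 8.75%
print(real_apr(0.07, 1000, 0.10, 800), current_apr - 0.10)  # both about -0.0125
```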

We all want that, and I’m in favour of this change (as I think everyone at Chorus One)!

It’s this statement that, imho, is not really substantiated. You may have done a companion analysis to prove it, but with the facts we have it’s hard to see how this can be “small”. The 1.6 SOL is not the issue; you could set it to 2 SOL and that would be fine as well. It’s the idea of burning it entirely that has not been shown to be a “small economic change”.

We don’t have a formal analysis. This part of economics is not a hard science.

We understood that the value should not be higher than 2 (because of the cost today), and not lower than 1 (because of the burning rate today). Because roughly 80% of SOL is staked, we argued (among ourselves) that the number should be closer to 2 than to 1, and then we gave a bit of a discount to not make anybody grumpy. I understand that this is not as scientific as the months we invested into finding the best consensus protocol (and proving its correctness).

In the end, for the protocol switch, our goal is to keep the set of validators as close as possible to the set we have today, because we want the system as stable as possible. If you know validators which would quit because of the 1.6 SOL, let me know. If anything, I would expect the number of validators to go slightly up with the chosen constant.

I’m really confused :sweat_smile:

Maybe it’s me, I’m not able to explain what I mean…

You could set the VAT to 10 SOL as well; it’s not the VAT amount per se. The feature that changes the economics is “the willingness to burn it, entirely”. That’s not rocket science, and it changes the current economics for all the points discussed above. No analysis is needed beyond the rough estimate, where you wipe out 1k SOL per epoch for no real reason.

It doesn’t make sense to collect 10 SOL and then redistribute 9 SOL to the validators, as this is more complicated but equivalent to just collecting 1 SOL. With everything else becoming strictly better (for instance, more space for actual transactions, more MEV), potential validators now have an increased incentive to join the chain.

:sweat_smile: I think we are not on the same page…

Well, that’s not true, for several reasons. Removing vote txs doesn’t magically give you more space and therefore more MEV… it’s far from being that simple.

Indeed weird. I thought I understood your argument, but maybe not? Let me try a few questions to first understand where you disagree:
a) Would you agree that collecting 2 SOL VAT / epoch (flat), burning 1 SOL, and redistributing the other 1 SOL (according to stake) is not worse for anybody than the status quo?
b) Would you agree that for small validators it would be better to just collect 1.6 SOL (as in the current version of the SIMD), whereas for big validators it is a bit worse?

I do agree on both!

The issue is that you have an extra feature:

and as we discussed this has 2 main consequences:

  1. Move SOL from validator pockets to non stakers (indeed reducing inflation means reducing the cost for non stakers, and the VAT is paid by validators).
  2. With the current design, validators down to 0.2% of the stake are roughly delta neutral to vote costs per epoch. With this proposal passing as is, they are now 1.6 SOL negative per epoch.

Let me ask differently: If somebody asked you to suggest a simple economic system that preserves the current economics as much as possible, what would you suggest?

Mainly 2 models:

  1. 2 SOL VAT - 50% burn 50% redistributed by stake (i.e. the current implementation)
  2. 1 SOL VAT + cap on validator number (e.g. 2k validator max) with 1 SOL burnt

Model 1 is less fair, but it’s exactly what we have right now.

Model 2 is economically better since you don’t move SOL from low stake to high stake, but you need to define a cap on max validator and define a metric to rank validators (that’s not rocket science, for the purpose of chain incentives alignment you can use a performance metric)
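To compare the options side by side, here is a minimal sketch of the net per-epoch VAT effect on a single validator under model 1, the current SIMD text, and model 2. It assumes every validator pays the same flat VAT and that the redistributed part of model 1 is shared strictly pro rata to stake; the function name and the validator count used below are hypothetical.

```python
def net_sol_per_epoch(stake_share, n_validators, model):
    """Net SOL a validator gains (+) or loses (-) per epoch from the VAT alone.

    Assumes every validator pays the same flat VAT and any redistributed
    portion is shared strictly pro rata to stake.
    """
    if model == "model1":  # 2 SOL VAT: 1 SOL burned, 1 SOL pooled and redistributed by stake
        pool = 1.0 * n_validators
        return -2.0 + stake_share * pool
    if model == "simd":    # 1.6 SOL VAT, 100% burned
        return -1.6
    if model == "model2":  # 1 SOL VAT, 100% burned; the validator cap is enforced separately
        return -1.0
    raise ValueError(f"unknown model: {model}")
```

With a hypothetical 1,000 validators, the model 1 break-even stake is 2/1000 = 0.2%, the same order as the 0.2% figure mentioned above for today's system; above it a validator comes out ahead, below it the loss approaches 2 SOL, while the fully burned variants charge everyone the same flat amount.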

What is the incentive for validators to validate in Alpenglow? Credits appear to be awarded regardless of the “accuracy” of the vote, so a validator could just vote “skip” on every block and receive full credits without actually validating anything. It is possible that I misunderstand the vote credits system as proposed though, and if I do, please help me to understand. Thank you.
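To make the concern concrete, here is a toy version of the credit rule as the question reads it: one credit per slot in which a validator casts any vote, whether notarize or skip. The vote labels and the function are hypothetical, and this reflects the questioner's reading, not the SIMD's exact reward formula.

```python
def credits(votes):
    """Toy rule as read in the question: one credit per slot voted, whatever the vote."""
    return sum(1 for vote in votes if vote in ("notarize", "skip"))


honest = ["notarize", "notarize", "skip", "notarize"]  # votes on what it actually saw
lazy = ["skip"] * 4                                    # never validates anything
print(credits(honest), credits(lazy))  # 4 4
```

Under this reading both validators earn the same credits, so the reward alone cannot tell them apart; that is the gap the replies in this thread argue about.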


You are 100% correct. Let me try to answer this question in two very different ways:

  1. We assume that less than 20% of the validators are malicious, and voting wrongly is malicious behavior! You would need to change your code for absolutely no gain, as you make no less money by just acting correctly. So would anybody really do that?
  2. We will have metrics, so other nodes report what you’re doing. If you vote differently from everybody else, it will be quite obvious. So if this really happens, we can (and will) take countermeasures.

This is changing the economics, and didn’t you clearly state that you didn’t like the 1.6 SOL because it’s changing the economics? :slight_smile:

I can also live with 2 SOL + redistribution. It’s more complex to implement but at least nobody can argue that it’s different from today!

But you can’t prove malicious behavior for a validator voting skip. The validator could have a legitimate case that it didn’t see the shreds. So essentially incentives for proper behavior are not in protocol, they are by “honor system”. This seems worse than the existing vote credits system which through the lockout mechanism rewards votes that align with the supermajority.

The reason for aligning rewards with desired behavior is to ensure that even in cases where you can’t ahead of time predict why validators would have other incentives to vote poorly, validators still will choose to vote in a cluster-beneficial way.

Furthermore, because in protocol you have defined skip votes as just as valid and worth reward as other votes, you are encoding into protocol the notion that voting just skip all the time is perfectly valid. It doesn’t make a lot of sense to me to design a rewards mechanism that through its very design makes voting skip all the time just as valid as any other type of voting, and then calling voting in a way that is aligned with protocol-defined incentives “malicious behavior”.
