Currently, on mainnet-beta, on-chain votes are sent incrementally. Network packets are always subject to some loss, which results in missing votes, but another issue arises when votes are received out of order. This can lead to a situation where the receiving validator has an inaccurate view of the on-chain vote state and has to wait for its local lockout to expire before its votes start to land.
Waiting for lockout to expire is standard behavior when a validator needs to switch forks. However, when a validator’s local vote state diverges from the on-chain vote state, the validator may think its lockout is much higher than what the rest of the cluster observed. The validator may then spin unnecessarily while waiting for the lockout to expire, or simply continue voting with invalid votes, so that its stake is not properly observed by the rest of the cluster. This can cause a chain reaction in which other validators on the network are slower to switch off minority forks because they are delayed in observing the divergent validator’s stake. Therefore, the further a validator’s local vote state diverges from the on-chain vote state, the greater the potential for degraded fork choice performance within the cluster.
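To make the cost of a diverged view concrete, here is a minimal sketch of the lockout arithmetic. It assumes the simplified Tower BFT model in which a vote's lockout doubles with each confirmation (2 raised to the confirmation count). The function names and the example slot numbers are illustrative, not taken from the validator codebase.

```python
# Simplified lockout model (assumption: lockout = 2 ** confirmation_count).
INITIAL_LOCKOUT = 2


def lockout(confirmation_count: int) -> int:
    """Number of slots a vote is locked out for at this confirmation count."""
    return INITIAL_LOCKOUT ** confirmation_count


def last_locked_out_slot(slot: int, confirmation_count: int) -> int:
    """Last slot at which a vote cast on `slot` is still locked out."""
    return slot + lockout(confirmation_count)


def is_locked_out(slot: int, confirmation_count: int, candidate_slot: int) -> bool:
    """True if the vote on `slot` still blocks a switch at `candidate_slot`."""
    return candidate_slot <= last_locked_out_slot(slot, confirmation_count)


# Hypothetical divergence: the validator believes its vote on slot 100 has
# 5 confirmations, while the cluster only observed 2 of them on chain.
local_expiry = last_locked_out_slot(100, 5)
on_chain_expiry = last_locked_out_slot(100, 2)
print("local view locked out through slot", local_expiry)
print("on-chain view locked out through slot", on_chain_expiry)
print("extra slots spent waiting:", local_expiry - on_chain_expiry)
```

In this hypothetical case the validator keeps waiting well after the cluster would already have accepted a switch, which is the unnecessary spinning described above.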
To address this potential issue, the Solana Labs validator client leverages two conjoined changes. Firstly, each validator no longer sends just the incremental vote, but instead sends its entire vote state, known as its vote tower. This reduces forking because the local vote state of a validator is now more tightly in sync with the on-chain vote state. The code responsible for this part of the changes has been included in the Solana Labs validator client codebase since v1.10, but the feature gate that enables it has not yet been activated on mainnet-beta. It has, however, been live on testnet since epoch 365 and has received extensive testing as a result.
The second conjoined change comes in v1.14 and explains why the first part was not activated on mainnet-beta despite being active on testnet for so long. While sending this extra information improves fork choice performance, it causes significantly more block space to be used for votes. Testnet trials showed that this change increases the vote instruction size by around 4 times. This caused a noticeable increase in the number of broadcast shreds, but did not cause any consensus issues. In light of this, Solana Labs engineers opted to delay activation until further optimization could be achieved. This optimization comes as part of v1.14; it addresses the increase in memory and gives the change its name, “Compact Vote State.”
These vote states are now losslessly compressed by utilizing a few tricks. The first is that the vote towers index their slots by offsets from the root, allowing the use of smaller data types for indexing their position in the vote tower. Secondly, a special serialization method is employed that allows the representation of the data during transmission to be further compacted. All in all, the resulting compact vote state now consumes memory overhead comparable to that of the original votes, increasing the memory footprint of a vote by only about 20% while allowing that vote to contain the full vote tower rather than a single sequential member.
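The two tricks can be sketched as follows. This is an illustrative model only, not the validator's actual wire format: it assumes each tower entry is a (slot, confirmation count) pair, that the "special serialization method" behaves like a LEB128-style variable-length integer, and that the header and field widths are as chosen here. All function names are hypothetical.

```python
import struct
from typing import List, Tuple

Tower = List[Tuple[int, int]]  # (slot, confirmation_count) pairs, assumed layout


def encode_varint(n: int) -> bytes:
    """LEB128-style varint: 7 payload bits per byte, high bit = more bytes follow."""
    if n < 0:
        raise ValueError("varint requires a non-negative integer")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next_position) for the varint starting at `pos`."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_full(root: int, tower: Tower) -> bytes:
    """Naive encoding: every slot as a full 8-byte integer, count as 4 bytes."""
    out = struct.pack("<QB", root, len(tower))
    for slot, confirmations in tower:
        out += struct.pack("<QI", slot, confirmations)
    return out


def encode_compact(root: int, tower: Tower) -> bytes:
    """Compact encoding: slots as varint offsets from the root, count as 1 byte."""
    out = struct.pack("<QB", root, len(tower))
    for slot, confirmations in tower:
        out += encode_varint(slot - root) + struct.pack("<B", confirmations)
    return out


def decode_compact(buf: bytes) -> Tuple[int, Tower]:
    """Inverse of encode_compact; the round trip is lossless."""
    root, count = struct.unpack_from("<QB", buf, 0)
    pos = struct.calcsize("<QB")
    tower = []
    for _ in range(count):
        offset, pos = decode_varint(buf, pos)
        tower.append((root + offset, buf[pos]))
        pos += 1
    return root, tower


root = 150_000_000  # hypothetical root slot
tower = [(root + 1 + i, 31 - i) for i in range(31)]  # hypothetical 31-entry tower
print(len(encode_full(root, tower)), "bytes naive")
print(len(encode_compact(root, tower)), "bytes compact")
assert decode_compact(encode_compact(root, tower)) == (root, tower)
```

The saving comes from the fact that slots in a tower sit close to the root, so their offsets fit in one or two bytes even though absolute slot numbers need eight. The byte counts printed here depend entirely on the assumed layout and should not be read as the real instruction sizes.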
While this change improves the fork choice performance of the network, it also enables the implementation of the Solana Improvement Document (SIMD) titled “Timely Vote Credits.” Without this change, that proposal is unable to proceed because it depends on the vote tower.