QUIC-TLS in Firedancer (fd_tls)

Opportunities to simplify TLS over peer-to-peer connections


I wanted to share some progress on fd_tls and kick off general discussion about the use of TLS in the Solana protocol.

Disclaimer: fd_tls is not an officially supported component of Firedancer.


Background

Since the adoption of the QUIC protocol, Solana’s peer-to-peer layer has depended on the TLS protocol for securing connections. Currently, the Solana Labs client uses the rustls library, and Firedancer uses quictls, a fork of OpenSSL.

I started fd_tls as an experiment to replace third-party network dependencies in Firedancer, with the intention of making fd_quic entirely self-hosted. It aims to implement the minimum number of components required to secure peer-to-peer connectivity, while staying compliant with TLS 1.3 (RFC 8446) and QUIC-TLS (RFC 9001).

TLS is commonly seen as a complex standard due to its lengthy history of bugs and changes, all while maintaining backward compatibility. Since the deployment of TLS in Solana has no such backward-compatibility requirements, there is an opportunity to shed some complexity and make the handshake logic of the QUIC protocol more robust against various types of attacks.

The development philosophy for Firedancer thus far has been to own the entire Solana validator stack from OSI Layer 2 upwards. This is a lot of work, but has the advantage of reducing the number of unknowns (such as: “How would our QUIC library behave in a specific edge case?”). It also reveals opportunities for deep optimization. However, all of this new networking code presents additional attack surface and will have to be audited.

Considering the above, we strongly suggest minimizing code complexity and the number of cryptographic algorithms in the Solana validator network.

Protocol

https://quic.xargs.org/ is a great resource explaining every step of the QUIC-TLS handshake. I will try to summarize it in my own words.

QUIC-TLS in Solana is a combination of three separate protocols:

  1. The TLS handshake layer (as the name implies, only active during the handshake)
  2. X.509 certificates (mostly unused)
  3. The QUIC record layer, which specifies how QUIC packets get encrypted (comparable to the TLS or DTLS record layers for TLS connections over TCP or UDP)

In TLS version 1.3, the latest version at the time of writing, creating a connection involves the following high-level steps:

  1. Negotiate a suite of cryptographic algorithms
  2. Establish a “handshake-level” symmetric encryption key using X25519, an Elliptic Curve Diffie-Hellman key exchange algorithm
  3. Exchange and verify X.509 peer certificates containing Ed25519 signatures
  4. Establish an “application-level” symmetric encryption key

TLS versions

An obvious first step is to drop support for legacy TLS versions. TLS 1.3 is more secure and much simpler than older TLS versions. It is almost ubiquitously supported and is already the default in Solana peer-to-peer connections; QUIC-TLS (RFC 9001) does not permit older versions in any case.

Cryptographic Algorithms

TLS 1.3 incorporates a flexible mechanism for negotiating cryptographic algorithms. In early steps of the handshake, the client advertises a list of algorithms it supports. The server then picks a combination of them.

The main types of algorithms being negotiated are as follows:

  1. Key Exchange cryptography. Solana Labs validators support X25519, secp256r1, and secp384r1.
  2. Cipher Suites: Solana Labs validators support the TLS 1.3 recommended Authenticated Encryption suites: AES-128-GCM-SHA256, AES-256-GCM-SHA384, and ChaCha20-Poly1305-SHA256. (Note: The hash in each suite name is used through HMAC in the HKDF key schedule, not only as a “pure” hash.)
  3. Signature Algorithms: Solana Labs validators support 9 signature algorithms, including EdDSA (Ed25519), 2 ECDSA-based schemes, and 6 RSA-based schemes.

Some of the above cryptography is already in use in the Solana protocol.

  • SHA-256 (almost everywhere in the Solana protocol)
  • Ed25519 (transaction signatures)
    • by extension, Curve25519 used in X25519
  • The ChaCha20 block function (on-chain randomness)

Other algorithms were newly introduced by adopting QUIC. Notably, RSA-based signature schemes are considerably slower than the elliptic curve alternatives.

Luckily, to establish a TLS connection, only one cryptographic algorithm of each type is required. Therefore, the first version of fd_tls will only support X25519 KEX, AES-128-GCM-SHA256 AEAD, and Ed25519 signatures (and potentially also ChaCha20-Poly1305-SHA256).
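
To illustrate the negotiation described above, here is a minimal Python sketch of a server that supports exactly one algorithm of each type. The numeric code points are the IANA-registered TLS values; the dictionary layout and the names `pick`, `negotiate`, and `MINIMAL_POLICY` are assumptions made for this example and are not taken from fd_tls.

```python
# Hypothetical server policy: one algorithm per category (IANA TLS code points).
MINIMAL_POLICY = {
    "cipher_suites": [0x1301],  # TLS_AES_128_GCM_SHA256
    "groups": [0x001D],         # x25519
    "sig_algs": [0x0807],       # ed25519
}

# Example of a client that offers a wider set, similar to the list above.
EXAMPLE_CLIENT_OFFER = {
    "cipher_suites": [0x1302, 0x1301, 0x1303],  # AES-256-GCM, AES-128-GCM, ChaCha20
    "groups": [0x001D, 0x0017, 0x0018],         # x25519, secp256r1, secp384r1
    "sig_algs": [0x0403, 0x0807, 0x0804],       # ECDSA P-256, Ed25519, RSA-PSS
}


def pick(client_offer, server_supported):
    """Return the first server-preferred code point the client also offers."""
    offered = set(client_offer)
    for code_point in server_supported:
        if code_point in offered:
            return code_point
    return None


def negotiate(client_offer, server_policy):
    """Pick one algorithm per category or fail the handshake."""
    chosen = {}
    for kind in ("cipher_suites", "groups", "sig_algs"):
        choice = pick(client_offer[kind], server_policy[kind])
        if choice is None:
            # A real server would abort with a fatal alert at this point.
            raise ValueError(f"no common algorithm for {kind}")
        chosen[kind] = choice
    return chosen


print(negotiate(EXAMPLE_CLIENT_OFFER, MINIMAL_POLICY))
```

A client that does not offer one of the three required algorithms cannot connect to such a server, which is the trade-off of shrinking the supported set.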

X.509

Another obvious opportunity for reducing complexity is eliminating the use of X.509 certificates. X.509 secures peer identity through a chain of trust, anchored in a set of root CAs. This model does not fit permissionless networks well, in which peers are inherently identified by their public keys, as opposed to a domain name (like the server of the forum.solana.com site you are currently reading).

Consequently, the use of X.509 certificates in Solana is awkward: Nodes serve auto-generated certificates that are signed by themselves, and their peers verify this useless signature.

For each connection the validator makes, it then generates an additional “CertificateVerify” proof. It involves using the certificate’s key to sign a hash of the handshake messages exchanged so far, which ties the signature to the current connection. This proves that the validator is in possession of the key advertised by the certificate.
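
As a rough sketch of what gets signed, RFC 8446 (section 4.4.3) defines the CertificateVerify input as 64 space bytes, a context string, a zero byte, and the transcript hash. The function name and the placeholder handshake messages below are assumptions for illustration; the Ed25519 signing step itself is omitted because it is not part of the Python standard library.

```python
import hashlib

SERVER_CONTEXT = b"TLS 1.3, server CertificateVerify"
CLIENT_CONTEXT = b"TLS 1.3, client CertificateVerify"


def certificate_verify_input(handshake_messages, is_server=True):
    """Build the byte string that the certificate's key signs.

    handshake_messages: the serialized handshake messages sent so far
    (ClientHello up to and including Certificate), in order.
    Assumes a SHA-256 based cipher suite.
    """
    transcript_hash = hashlib.sha256(b"".join(handshake_messages)).digest()
    context = SERVER_CONTEXT if is_server else CLIENT_CONTEXT
    return b"\x20" * 64 + context + b"\x00" + transcript_hash


# Placeholder messages; a real transcript holds the actual handshake bytes.
to_sign = certificate_verify_input([b"client_hello", b"server_hello", b"certificate"])
print(len(to_sign))  # 64 + 33 + 1 + 32 bytes
```

Because the transcript includes both peers' random values and key shares, a signature captured from one connection cannot be replayed on another.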

From the perspective of the verifier, this means the following steps are involved when accepting a new QUIC connection:

  1. Parse the X.509 certificate (DER serialization over various complex ASN.1 data structures)
  2. Verify the certificate chain (signature verification)
  3. Extract the Ed25519 public key of the peer
  4. Verify the “CertificateVerify” proof
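
To give a sense of step 1, here is a minimal sketch of reading a single DER tag-length-value element in Python. A full X.509 parser has to apply this recursively across many nested structures and then interpret each field. The function name `read_tlv` and its limits (single-byte tags, lengths of at most 4 bytes) are assumptions of this example.

```python
def read_tlv(buf: bytes, offset: int = 0):
    """Read one DER element; return (tag, value, next_offset)."""
    if offset + 2 > len(buf):
        raise ValueError("truncated header")
    tag = buf[offset]
    if tag & 0x1F == 0x1F:
        raise ValueError("multi-byte tags are not handled in this sketch")
    first = buf[offset + 1]
    offset += 2
    if first < 0x80:
        length = first  # short form: the byte is the length
    else:
        num_bytes = first & 0x7F  # long form: the byte counts length bytes
        if num_bytes == 0 or num_bytes > 4 or offset + num_bytes > len(buf):
            raise ValueError("unsupported or truncated length")
        length = int.from_bytes(buf[offset:offset + num_bytes], "big")
        # DER requires the shortest possible length encoding.
        if length < 0x80 or buf[offset] == 0:
            raise ValueError("non-minimal length encoding")
        offset += num_bytes
    end = offset + length
    if end > len(buf):
        raise ValueError("value runs past end of buffer")
    return tag, buf[offset:end], end


# AlgorithmIdentifier for Ed25519: SEQUENCE { OID 1.3.101.112 }
algorithm_id = bytes.fromhex("300506032b6570")
tag, value, _ = read_tlv(algorithm_id)
inner_tag, oid, _ = read_tlv(value)
print(hex(tag), hex(inner_tag), oid.hex())
```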

Raw Public Keys

RFC 7250 introduces a second certificate type: Raw Public Keys (RPKs).

RPKs consist of a minimal ASN.1/DER prefix followed by a copy of the serialized public key.

The new verifier steps then become:

  1. Negotiate RPK support via the CertificateType extension
  2. Parse the RPK ASN.1 prefix
  3. Verify the “CertificateVerify” proof
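
For Ed25519, the RPK is a 44-byte DER-encoded SubjectPublicKeyInfo: a fixed 12-byte prefix followed by the 32-byte public key. The sketch below handles step 2 by comparing against that fixed prefix instead of running a general ASN.1 parser. The function names are hypothetical, and whether fd_tls takes this exact approach is an assumption.

```python
# SEQUENCE(42) { SEQUENCE(5) { OID 1.3.101.112 }, BIT STRING(33, 0 unused bits) }
ED25519_SPKI_PREFIX = bytes.fromhex("302a300506032b6570032100")
ED25519_KEY_LEN = 32


def encode_ed25519_rpk(public_key: bytes) -> bytes:
    """Wrap a raw Ed25519 public key as a Raw Public Key."""
    if len(public_key) != ED25519_KEY_LEN:
        raise ValueError("Ed25519 public keys are 32 bytes")
    return ED25519_SPKI_PREFIX + public_key


def decode_ed25519_rpk(rpk: bytes) -> bytes:
    """Extract the Ed25519 public key, rejecting anything unexpected."""
    expected_len = len(ED25519_SPKI_PREFIX) + ED25519_KEY_LEN
    if len(rpk) != expected_len or not rpk.startswith(ED25519_SPKI_PREFIX):
        raise ValueError("not an Ed25519 raw public key")
    return rpk[len(ED25519_SPKI_PREFIX):]


key = bytes(range(32))  # placeholder key bytes, not a real public key
rpk = encode_ed25519_rpk(key)
print(len(rpk), decode_ed25519_rpk(rpk) == key)
```

Since only one key type is accepted, the parser reduces to a length check and a constant comparison.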

Not only is this mechanism much simpler, it also decreases the maximum byte count of a TLS handshake.

Unfortunately, support for RPKs is sparse. RPKs are currently not supported by stable releases of OpenSSL, GnuTLS, quictls, rustls, or the Go standard library. OpenSSL and GnuTLS both provide experimental support. My attempt to use an Ed25519 RPK with GnuTLS failed for unknown reasons.

If time permits, I would like to contribute RFC 7250 support to the Go standard library and rustls. I would greatly appreciate any help with this task.

fd_tls

Finally, an update on fd_tls:

fd_tls is currently able to correctly derive TLS 1.3 decryption keys up to the handshake level when speaking to OpenSSL. So far, my experience with implementing TLS 1.3 has been quite pleasant. There are no obvious blockers to completing self-hosted QUIC-TLS support; it is simply a matter of time. I am currently working on additional TLS extension types and the Certificate/CertificateVerify message types.
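
For readers unfamiliar with the key schedule, here is a minimal Python sketch of deriving handshake-level secrets as specified in RFC 8446 (section 7.1), plus the QUIC packet protection keys from RFC 9001 (section 5.1). It assumes a SHA-256 suite with AES-128-GCM and no pre-shared key. The function names are chosen for this example and do not reflect the fd_tls API.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output size


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    out, block, counter = b"", b"", 1
    while len(out) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        out += block
        counter += 1
    return out[:length]


def hkdf_expand_label(secret: bytes, label: bytes, context: bytes, length: int) -> bytes:
    full_label = b"tls13 " + label
    info = (length.to_bytes(2, "big")
            + bytes([len(full_label)]) + full_label
            + bytes([len(context)]) + context)
    return hkdf_expand(secret, info, length)


def derive_secret(secret: bytes, label: bytes, transcript: bytes) -> bytes:
    return hkdf_expand_label(secret, label, hashlib.sha256(transcript).digest(), HASH_LEN)


def handshake_secrets(ecdhe_shared_secret: bytes, transcript: bytes) -> dict:
    """transcript: serialized ClientHello followed by ServerHello."""
    zeros = bytes(HASH_LEN)
    early = hkdf_extract(zeros, zeros)  # no PSK: input key material is all zeros
    derived = derive_secret(early, b"derived", b"")
    handshake = hkdf_extract(derived, ecdhe_shared_secret)
    return {
        "early": early,
        "handshake": handshake,
        "client": derive_secret(handshake, b"c hs traffic", transcript),
        "server": derive_secret(handshake, b"s hs traffic", transcript),
    }


def quic_packet_keys(traffic_secret: bytes) -> dict:
    """QUIC packet protection keys for AES-128-GCM."""
    return {
        "key": hkdf_expand_label(traffic_secret, b"quic key", b"", 16),
        "iv": hkdf_expand_label(traffic_secret, b"quic iv", b"", 12),
        "hp": hkdf_expand_label(traffic_secret, b"quic hp", b"", 16),
    }


# Placeholder inputs; real values come from X25519 and the actual handshake bytes.
secrets = handshake_secrets(bytes(32), b"client_hello" + b"server_hello")
print(secrets["early"].hex())
```

The early secret does not depend on any connection input when no PSK is used, so it is a constant that can be checked against the trace in RFC 8448.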

Whether we’ll use fd_tls in production is unclear. Certainly, before attempting to do so, fd_tls needs to pass tlsfuzzer torture and various other conformance tests.

I hope this post was informative, and I’m looking forward to continuing the discussion on Solana’s network protocols.


:+1: Not only is this much easier, it is also considered best practice. Same with restricting cipher suites to the small set of recommended modern suites.

Agreed that removing ASN.1 and X509 parsing attack surface is highly desirable.

But, the big question would be interoperability - as you say, there’s barely any support in third party clients, and it would have to be plumbed all the way up from the TLS library to each QUIC library people might want to use.

I’m not sure how feasible that is and nobody really knows what third party clients are currently in use (mostly traders and similar users).

The QUIC TPU is a public interface used by clients other than solana-validator and Firedancer, and there would have to be some period of time where both kinds of credentials would be accepted.


Agree with all of the above.

But, the big question would be interoperability - as you say, there’s barely any support in third party clients, and it would have to be plumbed all the way up from the TLS library to each QUIC library people might want to use.

Once OpenSSL 3.2 and GnuTLS 3.8.0 get released, hopefully a critical mass is reached to drive further adoption of RFC 7250. I consider this an opportunity for the Solana community to introduce RPK support to other TLS libraries.

My random guess would be that widespread support for RPK takes at least 6 months.

The QUIC protocol itself is cleanly separated from peer authentication, and the other QUIC libraries I’ve looked at (quinn, ngtcp2) don’t require plumbing for this feature. Those take an arbitrary TLS config object. Once rustls supports RPKs, adding support to quinn is as easy as bumping the rustls dependency version number.

The logic to present an X.509 certificate is luckily minimal (a literal memcpy). It is more work to verify a cert, but fd_tls could still call down to the OpenSSL API. So it’s hardly a blocker.

I’m not sure how feasible that is and nobody really knows what third party clients are currently in use (mostly traders and similar users).

The improvement document process will give maintainers of custom peer-to-peer clients an opportunity to share their concerns.

The QUIC TPU is a public interface used by clients other than solana-validator and Firedancer, and there would have to be some period of time where both kinds of credentials would be accepted.

This is the right way and is straightforward to implement: Negotiate between both certificate types using the CertificateType extension. If the other peer does not recognize the extension, the handshake falls back to X.509.
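
A minimal sketch of that server-side selection, assuming the extension and certificate type code points from RFC 7250 (server_certificate_type = 20, X.509 = 0, RawPublicKey = 2). Representing the ClientHello extensions as a dict from extension type to raw bytes is a simplification made for this example.

```python
CERT_TYPE_X509 = 0
CERT_TYPE_RPK = 2
EXT_SERVER_CERTIFICATE_TYPE = 20


def parse_cert_type_list(ext_data: bytes) -> list:
    """ClientHello form: a 1-byte length followed by one byte per type."""
    if len(ext_data) < 2 or ext_data[0] != len(ext_data) - 1:
        raise ValueError("malformed certificate type list")
    return list(ext_data[1:])


def select_server_cert_type(client_extensions: dict,
                            server_preference=(CERT_TYPE_RPK, CERT_TYPE_X509)) -> int:
    """Choose the certificate type the server will present."""
    ext_data = client_extensions.get(EXT_SERVER_CERTIFICATE_TYPE)
    if ext_data is None:
        # The client does not know RFC 7250: fall back to X.509.
        return CERT_TYPE_X509
    offered = parse_cert_type_list(ext_data)
    for cert_type in server_preference:
        if cert_type in offered:
            return cert_type
    # A real server would abort the handshake with a fatal alert here.
    raise ValueError("no common certificate type")


legacy_client = {}
rpk_client = {EXT_SERVER_CERTIFICATE_TYPE: bytes([2, CERT_TYPE_RPK, CERT_TYPE_X509])}
print(select_server_cert_type(legacy_client), select_server_cert_type(rpk_client))
```

The same logic applies in the other direction with the client_certificate_type extension (code point 19) when the server authenticates the client.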


Server-side key schedule
