Deprecate X.509 certs for P2P connections

In QUIC-TLS in Firedancer, I shared some specifics of the TLS setup in TPU/QUIC. I also proposed a number of protocol changes that shed some unnecessary complexity.

In this post, I want to share a technique to almost entirely remove X.509 logic from uses of QUIC in Solana, without changing protocol logic. TL;DR: It allows validator implementations to

  • Eliminate thousands of lines of third party code
  • Reduce validator identity key exposure and signing operations

I’m looking forward to your feedback!

Background

The Solana Labs implementation of the Solana peer-to-peer protocols currently produces unnecessary signatures and imports thousands of lines of unnecessary dependency code. In the interest of safety, it is worth critically reviewing any line of code exposed to untrusted users, as well as any line of code exposed to sensitive data (such as node private keys).

Meet QUIC, the transport layer used in some P2P connections. QUIC comes with a great deal of complexity and feature creep.

Somewhere deep within the connection establishment logic is the target of this post: X.509 certificates. X.509 is an incredibly complex protocol that is mostly useless to the peer-to-peer layer. The only reason it is used is historical: QUIC uses the TLS 1.3 handshake for authentication, and rustls (the TLS library used in Solana Labs) does not support any replacement for it.

Mutual authentication

We are actually only interested in one property of the certificate: It holds the peer’s supposed public key.
When making a connection, peers exchange their public keys and then use a challenge-response mechanism to prove to each other that they possess the corresponding private keys (TLS 1.3 CertificateVerify; RFC 8446, Section 4.4.3).
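To make the challenge-response step concrete, here is a minimal sketch of the byte string that each peer signs in CertificateVerify, following RFC 8446, Section 4.4.3: 64 bytes of 0x20, a context string, a zero separator, and the handshake transcript hash. The function name, the macro names, and the 64-byte cap on the hash are my own assumptions for illustration; the Ed25519 signing call itself is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Context strings from RFC 8446, Section 4.4.3 (33 bytes each). */
#define CV_CTX_SERVER "TLS 1.3, server CertificateVerify"
#define CV_CTX_CLIENT "TLS 1.3, client CertificateVerify"

/* Assumed upper bound: 64 pad bytes + 33 context bytes + 1 separator
   + a transcript hash of at most 64 bytes. */
#define CV_INPUT_MAX (64 + 33 + 1 + 64)

/* Hypothetical helper: builds the message that is signed with the
   identity key. Returns the number of bytes written to out, or 0 if
   the hash does not fit. */
size_t
cert_verify_input( unsigned char         out[ CV_INPUT_MAX ],
                   int                   is_server,
                   unsigned char const * transcript_hash,
                   size_t                hash_sz ) {
  assert( out && transcript_hash );
  char const * ctx    = is_server ? CV_CTX_SERVER : CV_CTX_CLIENT;
  size_t       ctx_sz = strlen( ctx );
  if( hash_sz>64 ) return 0;

  size_t off = 0;
  memset( out+off, 0x20, 64 );                 off += 64;      /* padding   */
  memcpy( out+off, ctx, ctx_sz );              off += ctx_sz;  /* context   */
  out[ off++ ] = 0x00;                                         /* separator */
  memcpy( out+off, transcript_hash, hash_sz ); off += hash_sz; /* content   */
  return off;
}
```

With a 32-byte SHA-256 transcript hash this yields a 130-byte message. The fixed padding and context string are what keep this signature distinct from other uses of the same key within TLS.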

TLS connections on the web would typically also use this X.509 certificate to associate an external identity, like a domain name (e.g. forum.solana.com), as well as a signature chain vouching for the certificate’s validity.

Solana validators, however, are inherently identified by their identity public key. There is no need to associate this key with external information. Consequently, there is no need for X.509 certificates, a signature chain, or any data other than the public key itself.

Notably, validators also have the ability to treat peers as “anonymous” and ignore their identity. This works because the message content is often authenticated by itself, regardless of who the sender is (a gossip message, for example).

Parsing is useless

A surprising amount of code is required to encode and decode X.509 certificates. Complex parsers are particularly susceptible to security issues, such as invalid memory accesses, infinite loops, and unbounded heap allocations. Even memory-safe languages are not enough to prevent these sorts of bugs.

But the only winning move for this game is to not play. If we don’t do complex parsing, we significantly reduce attack surface for these vulnerability classes.

A commendable effort by ANSSI-FR to write a verified parser weighs in at about 10000 lines of code: ANSSI-FR/x509-parser ("a RTE-free X.509 parser") on GitHub. Unfortunately, I could not get the Frama-C verification tooling to run without errors myself. No full X.509 parsing verification effort exists for Rust, but some projects have partial coverage through fuzzing.

To understand how to ditch parsing, let’s look at a hex dump of a minimal certificate that TLS libraries can decode. The two long runs of ff bytes are placeholders for the public key (inside SubjectPublicKeyInfo) and the signature, respectively.

0000:  30 81 f1 30 81 a4 a0 03 02 01 02 02 08 01 01 01  0..0............
0010:  01 01 01 01 01 30 05 06 03 2b 65 70 30 11 31 0f  .....0...+ep0.1.
0020:  30 0d 06 03 55 04 03 0c 06 53 6f 6c 61 6e 61 30  0...U....Solana0
0030:  20 17 0d 37 30 30 31 30 31 30 30 30 30 30 30 5a   ..700101000000Z
0040:  18 0f 34 30 39 36 30 31 30 31 30 30 30 30 30 30  ..40960101000000
0050:  5a 30 00 30 2a 30 05 06 03 2b 65 70 03 21 00 ff  Z0.0*0...+ep.!..
0060:  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
0070:  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff a3  ................
0080:  29 30 27 30 17 06 03 55 1d 11 01 01 ff 04 0d 30  )0'0...U.......0
0090:  0b 82 09 6c 6f 63 61 6c 68 6f 73 74 30 0c 06 03  ...localhost0...
00a0:  55 1d 13 01 01 ff 04 02 30 00 30 05 06 03 2b 65  U.......0.0...+e
00b0:  70 03 41 00 ff ff ff ff ff ff ff ff ff ff ff ff  p.A.............
00c0:  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
00d0:  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
00e0:  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
00f0:  ff ff ff ff                                      ....

It is sufficient to replace the bytes of the public key and signature placeholders with the validator’s actual values.
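The placeholder positions can be read off the dump, but they can also be derived mechanically. The sketch below transcribes the template from the hex dump above and scans it for long runs of 0xff. The names cert_template and find_ff_run are hypothetical, and the minimum run length of 0x20 is an assumption chosen to skip the two single ff bytes that encode the "critical" flags of the extensions.

```c
#include <assert.h>
#include <stddef.h>

/* The 0xf4-byte template, transcribed from the hex dump above. */
unsigned char const cert_template[ 0xf4 ] = {
  0x30,0x81,0xf1,0x30,0x81,0xa4,0xa0,0x03,0x02,0x01,0x02,0x02,0x08,0x01,0x01,0x01,
  0x01,0x01,0x01,0x01,0x01,0x30,0x05,0x06,0x03,0x2b,0x65,0x70,0x30,0x11,0x31,0x0f,
  0x30,0x0d,0x06,0x03,0x55,0x04,0x03,0x0c,0x06,0x53,0x6f,0x6c,0x61,0x6e,0x61,0x30,
  0x20,0x17,0x0d,0x37,0x30,0x30,0x31,0x30,0x31,0x30,0x30,0x30,0x30,0x30,0x30,0x5a,
  0x18,0x0f,0x34,0x30,0x39,0x36,0x30,0x31,0x30,0x31,0x30,0x30,0x30,0x30,0x30,0x30,
  0x5a,0x30,0x00,0x30,0x2a,0x30,0x05,0x06,0x03,0x2b,0x65,0x70,0x03,0x21,0x00,0xff,
  0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,
  0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xa3,
  0x29,0x30,0x27,0x30,0x17,0x06,0x03,0x55,0x1d,0x11,0x01,0x01,0xff,0x04,0x0d,0x30,
  0x0b,0x82,0x09,0x6c,0x6f,0x63,0x61,0x6c,0x68,0x6f,0x73,0x74,0x30,0x0c,0x06,0x03,
  0x55,0x1d,0x13,0x01,0x01,0xff,0x04,0x02,0x30,0x00,0x30,0x05,0x06,0x03,0x2b,0x65,
  0x70,0x03,0x41,0x00,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,
  0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,
  0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,
  0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,
  0xff,0xff,0xff,0xff
};

/* Hypothetical helper: finds the first run of at least min_len
   consecutive 0xff bytes at or after start. Returns the offset of the
   run and stores its length in *run_len. If there is no such run,
   returns sz and stores 0. */
size_t
find_ff_run( unsigned char const * buf,
             size_t                sz,
             size_t                start,
             size_t                min_len,
             size_t *              run_len ) {
  assert( buf && run_len );
  size_t i = start;
  while( i<sz ) {
    if( buf[i]!=0xff ) { i++; continue; }
    size_t j = i;
    while( j<sz && buf[j]==0xff ) j++;   /* extend the run */
    if( j-i>=min_len ) { *run_len = j-i; return i; }
    i = j;                               /* run too short, keep scanning */
  }
  *run_len = 0;
  return sz;
}
```

Scanning from offset 0 with a minimum length of 0x20 finds the public key placeholder at 0x5f (0x20 bytes). Continuing after it finds the signature placeholder at 0xb4 (0x40 bytes), which runs to the end of the 0xf4-byte certificate.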

Similarly, to parse a certificate we could apply this template as a “mask”: check that all bytes except for the placeholders match, and then trivially extract the public key. If the mask does not match, we simply consider the peer fully anonymous, or reject the connection.
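The mask idea can be sketched generically as a comparison that skips a list of "holes". The type hole_t and the function masked_equal are hypothetical names, not taken from any validator codebase. Which byte ranges count as holes is a policy choice: for the template above it would be the public key at offset 0x5f (0x20 bytes), and optionally the signature at offset 0xb4 (0x40 bytes) if arbitrary signature bytes should be tolerated.

```c
#include <assert.h>
#include <stddef.h>

/* A byte range of the template that is allowed to differ
   (hypothetical type for this sketch). */
typedef struct {
  size_t off;
  size_t len;
} hole_t;

/* Returns 1 if buf equals tpl at every index outside the listed holes,
   0 otherwise. Both buffers must be sz bytes long. The loop always
   walks the whole buffer instead of stopping at the first mismatch. */
int
masked_equal( unsigned char const * buf,
              unsigned char const * tpl,
              size_t                sz,
              hole_t const *        holes,
              size_t                hole_cnt ) {
  assert( buf && tpl );
  assert( holes || hole_cnt==0 );
  unsigned char diff = 0;
  for( size_t i=0; i<sz; i++ ) {
    int in_hole = 0;
    for( size_t h=0; h<hole_cnt; h++ ) {
      /* i-off wraps around when i<off, but the first operand is 0 then */
      in_hole |= ( i>=holes[h].off ) & ( i-holes[h].off<holes[h].len );
    }
    if( !in_hole ) diff |= (unsigned char)( buf[i]^tpl[i] );
  }
  return diff==0;
}
```

A caller would first check that the received length equals the template length, then call masked_equal, and only then copy the public key out of its hole.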

This mechanism is also backwards compatible: Validators using a real parser will still be able to decode this template.

It is admittedly hacky, but also simple to reason about. Considering X.509 should have never been used in the Solana protocol, I find it worth deleting a large amount of pointless code.

Self-signing is useless

While tinkering with this, I realized that the signature part of the X.509 certificate is also entirely pointless in the context of peer-to-peer connections. Peer authenticity is proven using a separate signature mechanism in the TLS 1.3 layer. The X.509 signature allows trusted third parties to sign (and thereby certify) someone’s certificate. But as mentioned earlier, we don’t need anyone to certify validator identity keys.

But don’t take my word for it. Validators do not verify the X.509 signature field, and you can put whatever you want in it.

The signature field is required nonetheless, so Solana Labs validators used the “self-signed certificate” pattern, by signing their certificate with their own validator identity key. This is more problematic than it sounds: It creates an instance of key reuse, where the same key is used to sign messages of different types.

Using the CBMC verification system, I was able to prove that this instance of X.509 signing is not ambiguous with regard to any other types. (See here and here). Yet again, it is preferable to not have this risk in the first place.

Another concern is the exposure of the private key itself. Although unlikely, a supply chain attack in third-party dependency code could compromise the private key.

So, let’s simply put a bunch of one bits in the X.509 signature field.

Conclusion

In the Firedancer validator, we replaced ~10000 lines of code with about a dozen.

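/* Embeds the 0xf4-byte certificate template from the hex dump. */
FD_IMPORT_BINARY( template, "cert_template.der" );

/* Writes a certificate for pubkey into cert_out. */
void
generate_cert( uchar       cert_out[ static 0xf4 ],
               uchar const pubkey  [ static 0x20 ] ) {
  memcpy( cert_out,      template, 0xf4 );  /* start from the template */
  memcpy( cert_out+0x5f, pubkey,   0x20 );  /* patch in the public key */
}

/* Returns pubkey_out if cert matches the template outside the public
   key placeholder, NULL otherwise. */
uchar *
extract_pubkey( uchar         pubkey_out[ static 0x20 ],
                uchar const * cert,
                ulong         cert_sz ) {
  uchar check[ 0xf4 ];
  if( cert_sz!=0xf4 ) return NULL;
  memcpy( pubkey_out, cert+0x5f, 0x20 );  /* copy out the candidate key */
  memcpy( check,      cert,      0xf4 );
  memset( check+0x5f, 0xff,      0x20 );  /* restore the placeholder */
  return 0==memcmp( check, template, 0xf4 ) ? pubkey_out : NULL;
}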

A related patch is available for Solana Labs: "don't sign X.509 certs" by ripatel-fd (Pull Request #34202, solana-labs/solana).

A standard solution to address this problem exists and is also implemented in Firedancer: RFC 7250 - Using Raw Public Keys in Transport Layer Security (TLS) and Datagram Transport Layer Security (DTLS)
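For comparison, with raw public keys the Certificate message carries just a SubjectPublicKeyInfo structure instead of a full X.509 certificate. For Ed25519 that is a fixed 12-byte DER prefix (RFC 8410) followed by the 32-byte key, the same bytes found at offsets 0x53 to 0x7e of the template above. The sketch below shows encoding and decoding under that assumption; the function and macro names are hypothetical and this is not Firedancer's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SPKI_PREFIX_SZ 12
#define SPKI_SZ        (SPKI_PREFIX_SZ + 32)

/* DER prefix of an Ed25519 SubjectPublicKeyInfo (RFC 8410). */
static unsigned char const spki_prefix[ SPKI_PREFIX_SZ ] = {
  0x30, 0x2a,                    /* SEQUENCE, 42 bytes             */
  0x30, 0x05,                    /* AlgorithmIdentifier, 5 bytes   */
  0x06, 0x03, 0x2b, 0x65, 0x70,  /* OID 1.3.101.112 (Ed25519)      */
  0x03, 0x21, 0x00               /* BIT STRING, 33 bytes, 0 unused */
};

/* Hypothetical helper: wraps a 32-byte public key in a 44-byte SPKI. */
void
encode_raw_pubkey( unsigned char       spki_out[ SPKI_SZ ],
                   unsigned char const pubkey  [ 32 ] ) {
  assert( spki_out && pubkey );
  memcpy( spki_out,                spki_prefix, SPKI_PREFIX_SZ );
  memcpy( spki_out+SPKI_PREFIX_SZ, pubkey,      32 );
}

/* Hypothetical helper: returns a pointer to the 32-byte key inside
   spki, or NULL if the length or the prefix does not match. */
unsigned char const *
decode_raw_pubkey( unsigned char const * spki,
                   size_t                spki_sz ) {
  assert( spki );
  if( spki_sz!=SPKI_SZ ) return NULL;
  if( 0!=memcmp( spki, spki_prefix, SPKI_PREFIX_SZ ) ) return NULL;
  return spki+SPKI_PREFIX_SZ;
}
```

The decode side is the same template-and-mask idea, only with a 12-byte template and no signature field at all.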


Solana Labs v1.18, which is the majority of testnet, now runs with X.509 dummy certs :tada:
