Why Signature Verification in OpenPGP is hard

An Enigma cipher machine which is probably easier to understand than OpenPGP
An Enigma cipher machine which is less secure but also easier to understand than OpenPGP.
Photo by Mauro Sbicego on Unsplash.

When I first thought about signature verification in OpenPGP I thought “well, it cannot be that hard, right?”. In the end all you got to do is check if a signature was made by the given key and if that signature checks out (is cryptographically correct). Oh boy, was I wrong.

The first major realization that struck me was that there are more than just two factors involved in the signature verification process. While the first two are pretty obvious – the signature itself and the key that created it – another major factor is played by the point in time at which a signature is being verified. OpenPGP keys are changing over the course of their lifespan. Subkeys may expire or be revoked for different reasons. A subkey might be eligible to create valid signatures until its binding signature expires. From that point in time all new signatures created with that key must be considered invalid. Keys can be rebound, so an expired key might become valid again at some point, which would also make (new) signatures created with it valid once again.

But what does it mean to rebind a key? How are expiration dates set on keys and what role plays the reason of a revocation?

The answer to the first two questions is – Signatures!

OpenPGP keys consist of a set of keys and subkeys, user-id packets (which contain email addresses and names and so forth) and lastly a bunch of signatures which tie all this information together. The root of this bunch is the primary key – a key with the ability to create signatures, or rather certifications. Signatures and certifications are basically the same, they just have a different semantic meaning, which is quite an important detail. More to that later.

The main use of the primary key is to bind additional data to it. First and foremost user-id packets, which can (along with the primary keys key-ID) be used to identify the key. Alice might for example have a user-id packet on her key which contains her name and email address. Keys can have more than one user-id, so Alice might also have an additional user-id packet with her work-email or her chat address added to the key.

But simply adding the packet to the key is not enough. An attacker might simply take her key, change the email address and hand the modified key to Bob, right? Wrong. Signatures Certifications to the rescue!

   Primary-Key
      [Revocation Self Signature]
      [Direct Key Signature...]
      [User ID [Signature ...] ...]
      [User Attribute [Signature ...] ...]
      [[Subkey [Binding-Signature-Revocation]
              Subkey-Binding-Signature ...] ...]

Information is not just loosely appended to the key. Instead it is cryptographically bound to it by the help of a certification. Certifications are signatures which can only be created by a key which is allowed to create certifications. If you take a look at any OpenPGP (v4) key, you will see that most likely every single primary key will be able to create certifications. So basically the primary key is used to certify that a piece of information belongs to the key.

The same goes for subkeys. They are also bound to the primary key with the help of a certification. Here, the certification has a special type and is called “subkey binding signature”. The concept though is mostly the same. The primary key certifies that a subkey belongs to it by help of a signature.

Now it slowly becomes complicated. As you can see, up to this point the binding relations have been uni-directional. The primary key claims to be the dominant part of the relation. This might however introduce the risk of an attacker using a primary key to claim ownership of a subkey which was used to make a signature over some data. It would then appear as if the attacker is also the owner of that signature. That’s the reason why a signing-capable subkey must somehow prove that it belongs to its primary key. Again, signatures to the rescue! A subkey binding signature that binds a signing capable subkey MUST contain a primary key binding signature made by the subkey over the primary key. Now the relationship is bidirectional and attacks such as the one mentioned above are mitigated.

So, about certifications – when is a key allowed to create certifications? How can we specify what capabilities a key has?

The answer are Signature… Subpackets!
Those are some pieces of information that are added into a signature that give it more semantic meaning aside from the signature type. Examples for signature subpackets are key flags, signature/key creation/expiration times, preferred algorithms and many more. Those subpackets can reside in two areas of the signature. The unhashed area is not covered by the signature itself, so here packets can be added/removed without breaking the signature. The hashed area on the other hand gets its name from the fact that subpackets placed here are taken into consideration when the signature is being calculated. They cannot be modified without invalidating the signature.

So the unhashed area shall only contain advisory information or subpackets which are “self-authenticating” (meaning information which is validated as a side-effect of validating the signature). An example of a self-authenticating subpacket would be the issuers-id packet, which contains the key-id of the key that created the signature. This piece of information can be verified by checking if the denominated key really created the signature. There is no need to cover this information by the signature itself.

Another really important subpacket type is the key flags packet. It contains a bit-mask that declares what purpose a key can be used for, or rather what purpose the key is ALLOWED to be used for. Such purposes are encryption of data at rest, encryption of data in transit, signing data, certifying data, authentication. Additionally there are key flags indicating that a key has been split by a key-splitting mechanism or that a key is being shared by more than one entity.

Each signature MUST contain a signature creation time subpacket, which states at which data and time a signature was created. Optionally a signature might contain a signature expiration time subpacket which denotes at which point in time a signature expires and becomes invalid. So far so good.

Now, those subpackets can also be placed on certifications, eg. subkey binding signatures. If a subkey binding signature contains a key expiration time subpacket, this indicates that the subkey expires at a certain point in time. An expired subkey must not be used anymore and signatures created by it after it has been expired must be considered invalid. It gets even more complicated if you consider, that a subkey binding signature might contain a key expiration time subpacket, along with a signature expiration time subpacket. That could lead to funny situations. For example a subkey might have two subkey binding signatures. One simply binds the key indefinitely, while the second one has an expiration time. Here the latest binding signature takes precedence, meaning the subkey might expire at lets say t+3, while at t+5 the signature itself expires, meaning that the key regains validity, as now the former binding signature is “active” again.

Not yet complicated enough? Consider this: Whether or not a key is eligible to create signatures is denoted by the key flags subpacket which again is placed in a signature. So when verifying a signature, you have to consult self-signatures on the signing key to see if it carries the sign-data key flag. Furthermore you have to validate that self-signature and check if it was created by a key carrying the certify-other key flag. Now again you have to check if that signature was created by a key carrying the certify-other key flag (given it is not the same (primary) key). Whew.

Lastly there are key revocations. If a key gets lost or stolen or is simply retired, it can be revoked. Now it depends on the revocation reason, what impact the revocation has on past and/or future signatures. If the key was revoked using a “soft” revocation reason (key has not been compromised), the revocation is mostly handled as if it were an expiration. Past signatures are still good, but the key must no longer be used anymore. If it however has a “hard” revocation reason (or no reason at all) this could mean that the key has been lost or compromised. This means that any signature (future and past) that was made by this key has now to be considered invalid, since an attacker might have forged it.

Now, a revocation can only be created by a certification capable key, so in order to check if a revocation is valid, we have to check if the revoking key is allowed to revoke this specific subkey. Permitted revocation keys are either the primary key, or an external key denoted in a revocation key subpacket on a self-signature. Can you see why this introduces complexity?

Revocation signatures have to be handled differently from other signatures, since if the primary key is revoked, is it eligible to create revocation signatures in the first place? What if an external revocation key has been revoked and is now used to revoke another key?

I believe the correct way to tackle signature validity is to first evaluate the key (primary and subkeys) at signature creation time. Evaluating the key at a given point in time tn means we reject all signatures made after tn (except hard revocations) as those are not yet valid. Furthermore we reject all signatures that are expired a tn as those are no longer valid. Furthermore we remove all signatures that are superseded by another more recent signature. We do this for all signatures on all keys in the “correct” order. What we are left with is a canonicalized key ring, which we can now use to verify the signature in question with.

So lets try to summarize every step that we have to take in order to verify a signatures validity.

  • First we have to check if the signature contains a creation time subpacket. If it does not, we can already reject it.
  • Next we check if the signature is expired by now. If it is, we can again reject.
  • Now we have to evaluate the key ring that contains the signatures signing key at the time at which the signature was created.
    • Is the signing key properly bound to the key ring?
      • Was is created before the signature?
      • Was it bound to the key ring before the signature was made?
      • Is the binding signature not expired?
      • Is the binding signature not revoked?
      • Is the subkey binding signature carrying a valid primary key binding signature?
      • Are the binding signatures using acceptable algorithms?
    • Is the subkey itself not expired?
    • Is the primary key not expired?
    • Is the primary key not revoked?
    • Is the subkey not revoked?
  • Is the signing key capable of creating signatures?
  • Was the signature created with acceptable algorithms? Reject weak algorithms like SHA-1.
  • Is the signature correct?

Lastly of course, the user has to decide if the signing key is trustworthy or not, but luckily we can leave this decision up to the user.

As you can see, this is not at all trivial and I’m sure I missed some steps and/or odd edge cases. What makes implementing this even harder is that the specification is deliberately sparse in places. What subpackets are allowed to be placed in the unhashed area? What MUST be placed in the hashed area instead? Furthermore the specification contains errors which make it even harder to get a good picture of what is allowed and what isn’t. I believe what OpenPGP needs is a document that acts as a guide to implementors. That guide needs to specify where and where not certain subpackets are to be expected, how a certain piece of semantic meaning can be represented and how signature verification is to be conducted. It is not desirable that each and every implementor has to digest the whole specification multiple times in order to understand what steps are necessary to verify a signature or to select a valid key for signature creation.

Did you implement signature verification in OpenPGP? What are your thoughts on this? Did you go through the same struggles that I do?

Lastly I want to give a shout-out to the devs of Sequoia-PGP, which have a pretty awesome test suite that covers lots and lots of edge-cases and interoperability concerns of implementations. I definitely recommend everyone who needs to work with OpenPGP to throw their implementation against the suite to see if there are any shortcomings and problems with it.

Closer Look at the Double Ratchet

In the last blog post, I took a closer look at how the Extended Triple Diffie-Hellman Key Exchange (X3DH) is used in OMEMO and which role PreKeys are playing. This post is about the other big algorithm that makes up OMEMO. The Double Ratchet.

The Double Ratchet algorithm can be seen as the gearbox of the OMEMO machine. In order to understand the Double Ratchet, we will first have to understand what a ratchet is.

Before we start: This post makes no guarantees to be 100% correct. It is only meant to explain the inner workings of the Double Ratchet algorithm in a (hopefully) more or less understandable way. Many details are simplified or omitted for sake of simplicity. If you want to implement this algorithm, please read the Double Ratchet specification.

A ratchet tool can only turn in one direction, hence it is eponymous for the algorithm.
Image by Benedikt.Seidl [Public domain]

A ratchet is a tool used to drive nuts and bolts. The distinctive feature of a ratchet tool over an ordinary wrench is, that the part that grips the head of the bolt can only turn in one direction. It is not possible to turn it in the opposite direction as it is supposed to.

In OMEMO, ratchet functions are one-way functions that basically take input keys and derives a new keys from that. Doing it in this direction is easy (like turning the ratchet tool in the right direction), but it is impossible to reverse the process and calculate the original key from the derived key (analogue to turning the ratchet in the opposite direction).

Symmetric Key Ratchet

One type of ratchet is the symmetric key ratchet (abbrev. sk ratchet). It takes a key and some input data and produces a new key, as well as some output data. The new key is derived from the old key by using a so called Key Derivation Function. Repeating the process multiple times creates a Key Derivation Function Chain (KDF-Chain). The fact that it is impossible to reverse a key derivation is what gives the OMEMO protocol the property of Forward Secrecy.

A Key Derivation Function Chain or Symmetric Ratchet

The above image illustrates the process of using a KDF-Chain to generate output keys from input data. In every step, the KDF-Chain takes the input and the current KDF-Key to generate the output key. Then it derives a new KDF-Key from the old one, replacing it in the process.

To summarize once again: Every time the KDF-Chain is used to generate an output key from some input, its KDF-Key is replaced, so if the input is the same in two steps, the output will still be different due to the changed KDF-Key.

One issue of this ratchet is, that it does not provide future secrecy. That means once an attacker gets access to one of the KDF-Keys of the chain, they can use that key to derive all following keys in the chain from that point on. They basically just have to turn the ratchet forwards.

Diffie-Hellman Ratchet

The second type of ratchet that we have to take a look at is the Diffie-Hellman Ratchet. This ratchet is basically a repeated Diffie-Hellman Key Exchange with changing key pairs. Every user has a separate DH ratcheting key pair, which is being replaced with new keys under certain conditions. Whenever one of the parties sends a message, they include the public part of their current DH ratcheting key pair in the message. Once the recipient receives the message, they extract that public key and do a handshake with it using their private ratcheting key. The resulting shared secret is used to reset their receiving chain (more on that later).

Once the recipient creates a response message, they create a new random ratchet key and do another handshake with their new private key and the senders public key. The result is used to reset the sending chain (again, more on that later).

Principle of the Diffie-Hellman Ratchet.
Image by OpenWhisperSystems (modified by author)

As a result, the DH ratchet is forwarded every time the direction of the message flow changes. The resulting keys are used to reset the sending-/receiving chains. This introduces future secrecy in the protocol.

The Diffie-Hellman Ratchet

Chains

A session between two devices has three chains – a root chain, a sending chain and a receiving chain.

The root chain is a KDF chain which is initialized with the shared secret which was established using the X3DH handshake. Both devices involved in the session have the same root chain. Contrary to the sending and receiving chains, the root chain is only initialized/reset once at the beginning of the session.

The sending chain of the session on device A equals the receiving chain on device B. On the other hand, the receiving chain on device A equals the sending chain on device B. The sending chain is used to generate message keys which are used to encrypt messages. The receiving chain on the other hand generates keys which can decrypt incoming messages.

Whenever the direction of the message flow changes, the sending and receiving chains are reset, meaning their keys are replaced with new keys generated by the root chain.

The full Double Ratchet Algorithms Ratchet Architecture

An Example

I think this rather complex protocol is best explained by an example message flow which demonstrates what actually happens during message sending / receiving etc.

In our example, Obi-Wan and Grievous have a conversation. Obi-Wan starts by establishing a session with Grievous and sends his initial message. Grievous responds by sending two messages back. Unfortunately the first of his replies goes missing.

Session Creation

In order to establish a session with Grievous, Obi-Wan has to first fetch one of Grievous key bundles. He uses this to establish a shared secret S between him and Grievous by executing a X3DH key exchange. More details on this can be found in my previous post. He also extracts Grievous signed PreKey ratcheting public key. S is used to initialize the root chain.

Obi-Wan now uses Grievous public ratchet key and does a handshake with his own ratchet private key to generate another shared secret which is pumped into the root chain. The output is used to initialize the sending chain and the KDF-Key of the root chain is replaced.

Now Obi-Wan established a session with Grievous without even sending a message. Nice!

The session initiator prepares the sending chain.
The initial root key comes from the result of the X3DH handshake.
Original image by OpenWhisperSystems (modified by author)

Initial Message

Now the session is established on Obi-Wans side and he can start composing a message. He decides to send a classy “Hello there!” as a greeting. He uses his sending chain to generate a message key which is used to encrypt the message.

Principle of generating message keys from the a KDF-Chain.
In our example only one message key is derived though.
Image by OpenWhisperSystems

Note: In the above image a constant is used as input for the KDF-Chain. This constant is defined by the protocol and isn’t important to understand whats going on.

Now Obi-Wan sends over the encrypted message along with his ratcheting public key and some information on what PreKey he used, the current sending key chain index (1), etc.

When Grievous receives Obi-Wan’s message, he completes his X3DH handshake with Obi-Wan in order to calculate the same exact shared secret S as Obi-Wan did earlier. He also uses S to initialize his root chain.

Now Grevious does a full ratchet step of the Diffie-Hellman Ratchet: He uses his private and Obi-Wans public ratchet key to do a handshake and initialize his receiving chain with the result. Note: The result of the handshake is the same exact value that Obi-Wan earlier calculated when he initialized his sending chain. Fantastic, isn’t it? Next he deletes his old ratchet key pair and generates a fresh one. Using the fresh private key, he does another handshake with Obi-Wans public key and uses the result to initialize his sending chain. This completes the full DH ratchet step.

Full Diffie-Hellman Ratchet Step
Image by OpenWhisperSystems

Decrypting the Message

Now that Grievous has finalized his side of the session, he can go ahead and decrypt Obi-Wans message. Since the message contains the sending chain index 1, Grievous knows, that he has to use the first message key generated from his receiving chain to decrypt the message. Because his receiving chain equals Obi-Wans sending chain, it will generate the exact same keys, so Grievous can use the first key to successfully decrypt Obi-Wans message.

Sending a Reply

Grievous is surprised by bold actions of Obi-Wan and promptly goes ahead to send two replies.

He advances his freshly initialized sending chain to generate a fresh message key (with index 1). He uses the key to encrypt his first message “General Kenobi!” and sends it over to Obi-Wan. He includes his public ratchet key in the message.

Unfortunately though the message goes missing and is never received.

He then forwards his sending chain a second time to generate another message key (index 2). Using that key he encrypt the message “You are a bold one.” and sends it to Obi-Wan. This message contains the same public ratchet key as the first one, but has the sending chain index 2. This time the message is received.

Receiving the Reply

Once Obi-Wan receives the second message and does a full ratchet step in order to complete his session with Grevious. First he does a DH handshake between his private and the Grevouos’ public ratcheting key he got from the message. The result is used to setup his receiving chain. He then generates a new ratchet key pair and does a second handshake. The result is used to reset his sending chain.

Obi-Wan notices that the sending chain index of the received message is 2 instead of 1, so he knows that one message must have been missing or delayed. To deal with this problem, he advances his receiving chain twice (meaning he generates two message keys from the receiving chain) and caches the first key. If later the missing message arrives, the cached key can be used to successfully decrypt the message. For now only one message arrived though. Obi-Wan uses the generated message key to successfully decrypt the message.

Conclusions

What have we learned from this example?

Firstly, we can see that the protocol guarantees forward secrecy. The KDF-Chains used in the three chains can only be advanced forwards, and it is impossible to turn them backwards to generate earlier keys. This means that if an attacker manages to get access to the state of the receiving chain, they can not decrypt messages sent prior to the moment of attack.

But what about future messages? Since the Diffie-Hellman ratchet introduces new randomness in every step (new random keys are generated), an attacker is locked out after one step of the DH ratchet. Since the DH ratchet is used to reset the symmetric ratchets of the sending and receiving chain, the window of the compromise is limited by the next DH ratchet step (meaning once the other party replies, the attacker is locked out again).

On top of this, the double ratchet algorithm can deal with missing or out-of-order messages, as keys generated from the receiving chain can be cached for later use. If at some point Obi-Wan receives the missing message, he can simply use the cached key to decrypt its contents.

This self-healing property was eponymous to the Axolotl protocol (an earlier name of the Signal protocol, the basis of OMEMO).

Acknowledgements

Thanks to syndace and paul for their feedback and clarification on some points.