Closer Look at the Double Ratchet

In the last blog post, I took a closer look at how the Extended Triple Diffie-Hellman Key Exchange (X3DH) is used in OMEMO and which role PreKeys are playing. This post is about the other big algorithm that makes up OMEMO. The Double Ratchet.

The Double Ratchet algorithm can be seen as the gearbox of the OMEMO machine. In order to understand the Double Ratchet, we will first have to understand what a ratchet is.

Before we start: This post makes no guarantees to be 100% correct. It is only meant to explain the inner workings of the Double Ratchet algorithm in a (hopefully) more or less understandable way. Many details are simplified or omitted for sake of simplicity. If you want to implement this algorithm, please read the Double Ratchet specification.

A ratchet tool can only turn in one direction, hence it is eponymous for the algorithm.
Image by Benedikt.Seidl [Public domain]

A ratchet is a tool used to drive nuts and bolts. The distinctive feature of a ratchet tool over an ordinary wrench is, that the part that grips the head of the bolt can only turn in one direction. It is not possible to turn it in the opposite direction as it is supposed to.

In OMEMO, ratchet functions are one-way functions that basically take input keys and derives a new keys from that. Doing it in this direction is easy (like turning the ratchet tool in the right direction), but it is impossible to reverse the process and calculate the original key from the derived key (analogue to turning the ratchet in the opposite direction).

Symmetric Key Ratchet

One type of ratchet is the symmetric key ratchet (abbrev. sk ratchet). It takes a key and some input data and produces a new key, as well as some output data. The new key is derived from the old key by using a so called Key Derivation Function. Repeating the process multiple times creates a Key Derivation Function Chain (KDF-Chain). The fact that it is impossible to reverse a key derivation is what gives the OMEMO protocol the property of Forward Secrecy.

A Key Derivation Function Chain or Symmetric Ratchet

The above image illustrates the process of using a KDF-Chain to generate output keys from input data. In every step, the KDF-Chain takes the input and the current KDF-Key to generate the output key. Then it derives a new KDF-Key from the old one, replacing it in the process.

To summarize once again: Every time the KDF-Chain is used to generate an output key from some input, its KDF-Key is replaced, so if the input is the same in two steps, the output will still be different due to the changed KDF-Key.

One issue of this ratchet is, that it does not provide future secrecy. That means once an attacker gets access to one of the KDF-Keys of the chain, they can use that key to derive all following keys in the chain from that point on. They basically just have to turn the ratchet forwards.

Diffie-Hellman Ratchet

The second type of ratchet that we have to take a look at is the Diffie-Hellman Ratchet. This ratchet is basically a repeated Diffie-Hellman Key Exchange with changing key pairs. Every user has a separate DH ratcheting key pair, which is being replaced with new keys under certain conditions. Whenever one of the parties sends a message, they include the public part of their current DH ratcheting key pair in the message. Once the recipient receives the message, they extract that public key and do a handshake with it using their private ratcheting key. The resulting shared secret is used to reset their receiving chain (more on that later).

Once the recipient creates a response message, they create a new random ratchet key and do another handshake with their new private key and the senders public key. The result is used to reset the sending chain (again, more on that later).

Principle of the Diffie-Hellman Ratchet.
Image by OpenWhisperSystems (modified by author)

As a result, the DH ratchet is forwarded every time the direction of the message flow changes. The resulting keys are used to reset the sending-/receiving chains. This introduces future secrecy in the protocol.

The Diffie-Hellman Ratchet

Chains

A session between two devices has three chains – a root chain, a sending chain and a receiving chain.

The root chain is a KDF chain which is initialized with the shared secret which was established using the X3DH handshake. Both devices involved in the session have the same root chain. Contrary to the sending and receiving chains, the root chain is only initialized/reset once at the beginning of the session.

The sending chain of the session on device A equals the receiving chain on device B. On the other hand, the receiving chain on device A equals the sending chain on device B. The sending chain is used to generate message keys which are used to encrypt messages. The receiving chain on the other hand generates keys which can decrypt incoming messages.

Whenever the direction of the message flow changes, the sending and receiving chains are reset, meaning their keys are replaced with new keys generated by the root chain.

The full Double Ratchet Algorithms Ratchet Architecture

An Example

I think this rather complex protocol is best explained by an example message flow which demonstrates what actually happens during message sending / receiving etc.

In our example, Obi-Wan and Grievous have a conversation. Obi-Wan starts by establishing a session with Grievous and sends his initial message. Grievous responds by sending two messages back. Unfortunately the first of his replies goes missing.

Session Creation

In order to establish a session with Grievous, Obi-Wan has to first fetch one of Grievous key bundles. He uses this to establish a shared secret S between him and Grievous by executing a X3DH key exchange. More details on this can be found in my previous post. He also extracts Grievous signed PreKey ratcheting public key. S is used to initialize the root chain.

Obi-Wan now uses Grievous public ratchet key and does a handshake with his own ratchet private key to generate another shared secret which is pumped into the root chain. The output is used to initialize the sending chain and the KDF-Key of the root chain is replaced.

Now Obi-Wan established a session with Grievous without even sending a message. Nice!

The session initiator prepares the sending chain.
The initial root key comes from the result of the X3DH handshake.
Original image by OpenWhisperSystems (modified by author)

Initial Message

Now the session is established on Obi-Wans side and he can start composing a message. He decides to send a classy “Hello there!” as a greeting. He uses his sending chain to generate a message key which is used to encrypt the message.

Principle of generating message keys from the a KDF-Chain.
In our example only one message key is derived though.
Image by OpenWhisperSystems

Note: In the above image a constant is used as input for the KDF-Chain. This constant is defined by the protocol and isn’t important to understand whats going on.

Now Obi-Wan sends over the encrypted message along with his ratcheting public key and some information on what PreKey he used, the current sending key chain index (1), etc.

When Grievous receives Obi-Wan’s message, he completes his X3DH handshake with Obi-Wan in order to calculate the same exact shared secret S as Obi-Wan did earlier. He also uses S to initialize his root chain.

Now Grevious does a full ratchet step of the Diffie-Hellman Ratchet: He uses his private and Obi-Wans public ratchet key to do a handshake and initialize his receiving chain with the result. Note: The result of the handshake is the same exact value that Obi-Wan earlier calculated when he initialized his sending chain. Fantastic, isn’t it? Next he deletes his old ratchet key pair and generates a fresh one. Using the fresh private key, he does another handshake with Obi-Wans public key and uses the result to initialize his sending chain. This completes the full DH ratchet step.

Full Diffie-Hellman Ratchet Step
Image by OpenWhisperSystems

Decrypting the Message

Now that Grievous has finalized his side of the session, he can go ahead and decrypt Obi-Wans message. Since the message contains the sending chain index 1, Grievous knows, that he has to use the first message key generated from his receiving chain to decrypt the message. Because his receiving chain equals Obi-Wans sending chain, it will generate the exact same keys, so Grievous can use the first key to successfully decrypt Obi-Wans message.

Sending a Reply

Grievous is surprised by bold actions of Obi-Wan and promptly goes ahead to send two replies.

He advances his freshly initialized sending chain to generate a fresh message key (with index 1). He uses the key to encrypt his first message “General Kenobi!” and sends it over to Obi-Wan. He includes his public ratchet key in the message.

Unfortunately though the message goes missing and is never received.

He then forwards his sending chain a second time to generate another message key (index 2). Using that key he encrypt the message “You are a bold one.” and sends it to Obi-Wan. This message contains the same public ratchet key as the first one, but has the sending chain index 2. This time the message is received.

Receiving the Reply

Once Obi-Wan receives the second message and does a full ratchet step in order to complete his session with Grevious. First he does a DH handshake between his private and the Grevouos’ public ratcheting key he got from the message. The result is used to setup his receiving chain. He then generates a new ratchet key pair and does a second handshake. The result is used to reset his sending chain.

Obi-Wan notices that the sending chain index of the received message is 2 instead of 1, so he knows that one message must have been missing or delayed. To deal with this problem, he advances his receiving chain twice (meaning he generates two message keys from the receiving chain) and caches the first key. If later the missing message arrives, the cached key can be used to successfully decrypt the message. For now only one message arrived though. Obi-Wan uses the generated message key to successfully decrypt the message.

Conclusions

What have we learned from this example?

Firstly, we can see that the protocol guarantees forward secrecy. The KDF-Chains used in the three chains can only be advanced forwards, and it is impossible to turn them backwards to generate earlier keys. This means that if an attacker manages to get access to the state of the receiving chain, they can not decrypt messages sent prior to the moment of attack.

But what about future messages? Since the Diffie-Hellman ratchet introduces new randomness in every step (new random keys are generated), an attacker is locked out after one step of the DH ratchet. Since the DH ratchet is used to reset the symmetric ratchets of the sending and receiving chain, the window of the compromise is limited by the next DH ratchet step (meaning once the other party replies, the attacker is locked out again).

On top of this, the double ratchet algorithm can deal with missing or out-of-order messages, as keys generated from the receiving chain can be cached for later use. If at some point Obi-Wan receives the missing message, he can simply use the cached key to decrypt its contents.

This self-healing property was eponymous to the Axolotl protocol (an earlier name of the Signal protocol, the basis of OMEMO).

Acknowledgements

Thanks to syndace and paul for their feedback and clarification on some points.

Shaking Hands With OMEMO: X3DH Key Exchange

This is the first part of a small series about the cryptographic building blocks of OMEMO. This post is about the Extended Triple Diffie Hellman Key Exchange Algorithm (X3DH) which is used to establish a session between OMEMO devices.
Part 2: Closer Look at the Double Ratchet

In the past I have written some posts about OMEMO and its future and how it does compare to the Olm encryption protocol used by matrix.org. However, some readers requested a closer, but still straightforward look at how OMEMO and the underlying algorithms work. To get started, we first have to take a look at its past.

OMEMO was implemented in the Android Jabber Client Conversations as part of a Google Summer of Code project by Andreas Straub in 2015. The basic idea was to utilize the encryption library used by Signal (formerly TextSecure) for message encryption. So basically OMEMO borrows almost all the cryptographic mechanisms including the Double Ratchet and X3DH from Signals encryption protocol, which is appropriately named Signal Protocol. So to begin with, lets look at it first.

The Signal Protocol

The famous and ingenious protocol that drives the encryption behind Signal, OMEMO, matrix.org, WhatsApp and a lot more was created by Trevor Perrin and Moxie Marlinspike in 2013. Basically it consists of two parts that we need to further investigate:

  • The Extended Triple-Diffie-Hellman Key Exchange (X3DH)
  • The Double Ratchet Algorithm

One core principle of the protocol is to get rid of encryption keys as soon as possible. Almost every message is encrypted with another fresh key. This is a huge difference to other protocols like OpenPGP, where the user only has one key which can decrypt all messages ever sent to them. The later can of course also be seen as an advantage OpenPGP has over OMEMO, but it all depends on the situation the user is in and what they have to protect against.

A major improvement that the Signal Protocol introduced compared to encryption protocols like OTRv3 (Off-The-Record Messaging) was the ability to start a conversation with a chat partner in an asynchronous fashion, meaning that the other end didn’t have to be online in order to agree on a shared key. This was not possible with OTRv3, since both parties had to actively send messages in order to establish a session. This was okay back in the days where people would start their computer with the intention to chat with other users that were online at the same time, but it’s no longer suitable today.

Note: The recently worked on OTRv4 will not come with this handicap anymore.

The X3DH Key Exchange

Let’s get to it already!

X3DH is a key agreement protocol, meaning it is used when two parties establish a session in order to agree on a shared secret. For a conversation to be confidential we require, that only sender and (intended) recipient of a message are able to decrypt it. This is possible when they share a common secret (eg. a password or shared key). Exchanging this key with one another has long been kind of a hen and egg problem: How do you get the key from one end to the other without an adversary being able to get a copy of the key? Well, obviously by encrypting it, but how? How do you get that key to the other side? This problem has only been solved after the second world war.

The solution is a so called Diffie-Hellman-Merkle Key Exchange. I don’t want to go into too much detail about this, as there are really great resources about how it works available online, but the basic idea is that each party possesses an asymmetric key pair consisting of a public and a private key. The public key can be shared over insecure networks while the
private key must be kept secret. A Diffie-Hellman key exchange (DH) is the process of combining a public key A with a private key b in order to generate a shared secret. The essential trick is, that you get the same exact secret if you combine the secret key a with the public key B. Wikipedia does a great job at explaining this using an analogy of mixing colors.

Deniability and OTR

In normal day to day messaging you don’t always want to commit to what you said. Especially under oppressive regimes it may be a good idea to be able to deny that you said or wrote something specific. This principle is called deniability.

Note: It is debatable, whether cryptographic deniability ever saved someone from going to jail, but that’s not scope of this blog post.

At the same time you want to be absolutely sure that you are really talking to your chat partner and not to a so called man in the middle. These desires seem to be conflicting at first, but the OTR protocol featured both. The user has an IdentityKey, which is used to identify the user by means of a fingerprint. The (massively and horribly simplified) procedure of creating a OTR session is as follows: Alice generates a random session key and signs the public key with her IdentityKey. She then sends that public key over to Bob, who generates another random session key with which he executes his half of the DH handshake. He then sends the public part of that key (again, signed) back to Alice, who does another DH to acquire the same shared secret as Bob. As you can see, in order to establish a session, both parties had to be online. Note: The signing part has been oversimplified for sake of readability.

Normal Diffie-Hellman Key Exchange

From DH to X3DH

Perrin and Marlinspike improved upon this model by introducing the concept of PreKeys. Those basically are the first halves of a DH-handshake, which can – along with some other keys of the user – be uploaded to a server prior to the beginning of a conversation. This way another user can initiate a session by fetching one half-completed handshake and completing it.

Basically the Signal protocol comprises of the following set of keys per user:

IdentityKey (IK)Acts as the users identity by providing a stable fingerprint
Signed PreKey (SPK)Acts as a PreKey, but carries an additional signature of IK
Set of PreKeys ({OPK})Unsigned PreKeys

If Alice wants to start chatting, she can fetch Bobs IdentityKey, Signed PreKey and one of his PreKeys and use those to create a session. In order to preserve cryptographic properties, the handshake is modified like follows:

DH1 = DH(IK_A, SPK_B)
DH2 = DH(EK_A, IK_B)
DH3 = DH(EK_A, SPK_B)
DH4 = DH(EK_A, OPK_B)

S = KDF(DH1 || DH2 || DH3 || DH4)

EK_A denotes an ephemeral, random key which is generated by Alice on the fly. Alice can now derive an encryption key to encrypt her first message for Bob. She then sends that message (a so called PreKeyMessage) over to Bob, along with some additional information like her IdentityKey IK, the public part of the ephemeral key EK_A and the ID of the used PreKey OPK.

Visual representation of the X3DH handshake

Once Bob logs in, he can use this information to do the same calculations (just with swapped public and private keys) to calculate S from which he derives the encryption key. Now he can decrypt the message.

In order to prevent the session initiation from failing due to lost messages, all messages that Alice sends over to Bob without receiving a first message back are PreKeyMessages, so that Bob can complete the session, even if only one of the messages sent by Alice makes its way to Bob. The exact details on how OMEMO works after the X3DH key exchange will be discussed in part 2 of this series 🙂

X3DH Key Exchange TL;DR

X3DH utilizes PreKeys to allow session creation with offline users by doing 4 DH handshakes between different keys.

A subtle but important implementation difference between OMEMO and Signal is, that the Signal server is able to manage the PreKeys for the user. That way it can make sure, that every PreKey is only used once. OMEMO on the other hand solely relies on the XMPP servers PubSub component, which does not support such behavior. Instead, it hands out a bundle of around 100 PreKeys. This seems like a lot, but in reality the chances of a PreKey collision are pretty high (see the birthday problem).

OMEMO does come with some counter measures for problems and attacks that arise from this situation, but it makes the protocol a little less appealing than the original Signal protocol.

Clients should for example keep used PreKeys around until the end of catch -up of missed message to allow decryption of messages that got sent in sessions that have been established using the same PreKey.

Unified Encrypted Payload Elements for XMPP

Requirements on encryption change from time to time. New technologies pop up and crypto protocols get replaced by new ones. There are also different use-cases that require different encryption techniques.

For that reason there is a number of encryption protocols specified for XMPP, amongst them OMEMO and OpenPGP for XMPP.

Most crypto protocols share in common, that they all aim at encrypting certain parts of the message that is being sent, so that only the recipient(s) can read the encrypted content.

OMEMO is currently only capable to encrypt the messages body. For that reason the body of the message is being encrypted and stored in a <payload/> element, which is added to the message. This is inconvenient, as it makes OMEMO quite inflexible. The protocol cannot be used to secure arbitrary extension elements, which might contain sensitive content as well.

<message to='juliet@capulet.lit' from='romeo@montague.lit' id='send1'>
  <encrypted xmlns='eu.siacs.conversations.axolotl'>
    <header>...</header>
    <!-- the payload contains the encrypted content of the body -->
    <payload>BASE64ENCODED</payload>
  </encrypted>
</message>

The modern OpenPGP for XMPP XEP also uses <payload/> elements, but to transport arbitrary extension elements. The difference is, that in OpenPGP, the payload elements contain the actual payload as plaintext. Those <payload/> elements are embedded in either a <crypt/> or <signcrypt/> element, depending on whether or not the message will be signed and then passed through OpenPGP encryption. The resulting ciphertext is then appended to the message element in form of a <openpgp/> element.

<signcrypt xmlns='urn:xmpp:openpgp:0'>
  <to jid='juliet@example.org'/>
  <time stamp='...'/>
  <rpad>...</rpad>
  <payload>
    <body xmlns='jabber:client'>
      This is a secret message.
    </body>
  </payload>
</signcrypt>

<!-- The above element is passed to OpenPGP and the resulting ciphertext is included in the actual message as an <openpgp/> element -->

<message to='juliet@example.org'>
  <openpgp xmlns='urn:xmpp:openpgp:0'>
    BASE64_OPENPGP_MESSAGE
  </openpgp>
</message>

Upon receiving a message containing an <openpgp/> element, the receiver decrypts the content of it, does some verity checks and then replaces the <openpgp/> element of the message with the extension elements contained in the <payload/> element. That way the original, unencrypted message is constructed.

The benefit of this technique is that the <payload/> element can in fact contain any number of arbitrary extension elements. This makes OpenPGP for XMPPs take on encrypting message content way more flexible.

A logical next step would be to take OpenPGP for XMPPs <payload/> elements and move them to a new XEP, which specifies their use in a unified way. This can then be used by OMEMO and any other encryption protocol as well.

The motivation behind this is, that it would broaden the scope of encryption to cover more parts of the message, like read markers and other metadata.

It could also become easier to implement end-to-end encryption in other scenarios such as Jingle file transfer. Even though there is Jingle Encrypted Transports, this protocol only protects the stream itself and leaves the metadata such as filename, size etc. in the clear. A unified <encrypted/> element would make it easier to encrypt such metadata and could be the better approach to the problem.

QR-Code Generator for OMEMO

OMEMO is, like any other encryption protocol based on trust. The user has to make sure, that the keys they are trusting belong to the right users. Otherwise a so called Man-in-the-Middle attack is possible. An attacker might pretend to be your contact and secretly steal all the messages you thought were encrypted. They are, just to the wrong recipient.

To counteract such attacks, OMEMO encourages the user to verify their contacts fingerprints. A fingerprint can be considered the name of a contacts key. The user has to make sure, that the key A he is presented with really belongs to contact C by checking, if the fingerprints match. As those fingerprints are usually long, random series of characters, this is a tedious task. Luckily there are techniques like QR codes, which make our lifes easier. Instead of comparing two long strings character by character, you just scan a code and are done.

The QR-Code contains the Jabber-ID of the owner, as well as all of their fingerprints. So by scanning your code, a friend might automatically add you to their contact list and mark your devices as trusted. You can also put the QR-Code on your personal website, so people who want to reach out to you can easily establish a secure connection to you.

I spent the last few days looking into JavaFX (I have no real UI designing experience in Java, apart from Android), designing a small tool that can generate OMEMO fingerprint QR-Codes. This is what I came up with:

QR-Code generator with selectable fingerprints

The tool logs into your XMPP account and fetches all your published keys. Then it presents you with a list in which you can select, which fingerprints you want to include in the QR-Code.

There are still a lot of features missing and I consider the tool in no means as completed. My plans are to add the possibility to export QR-Codes to file, as well as to copy the text content of the code to clipboard. You see, there is a lot of work left to do, however I wanted to share my thoughts with you, so maybe client developers can adopt my idea.

Starting to use the Fellowship Card

I recently became a fellow of the FSFE and so I received a nice letter containing the FSFE fellowship OpenPGP smartcard.

After a quick visual examination I approved the card to be *damn cool*, even though the portrait format of the print of it still confuses me when I look at it. I especially like, how optimistically many digits the membership number field has (we can do it!!!). What I don’t like, is the non-https link on the bottom of the backside.

But how to use it now?

It took me some time to figure out, what that card exactly is. The FSFE overview page about the fellowship card misses the information, that this is a OpenPGP V2 card, which might be handy when choosing key sizes later on. I still don’t know, whether the card is version 2.0 or 2.1, but for my usecase it doesn’t really matter. So, what exactly is a smart-card and what CAN I actually do with it?

Well, OpenPGP is a system that allows to encrypt and sign emails, files and other information. That is and was nothing new to me, but what actually was new to me is the fact, that the encryption keys can be stored elsewhere than on the computer or phone. That intrigued me. So why not jump right into it and get some keys on there? – But where to plug it in?

My laptop has no smart-card slot, but there is that big ugly slit at one side, that never really came to value for me, simply because most peripherals that I wanted to connect to my computer, I connected via loved USB. It’s an ExpressCard slot. I knew that there are extension cards that can be fit in there, so they aren’t in the way (like eg. a USB dongle would be). There must be smart-card readers for ExpressCards, right? Right. And since I want to read mails when I’m on a train or bus, I’d find it convenient, when the card reader vanishes inside my laptop.

So I went online and searched for ExpressCard smart-card readers. I ended up buying a Lenovo Gemplus smart-card reader for about 25€. Then I waited. After half an hour I asked myself, if that particular device would work well with GNU/Linux (I use Debian testing on my ThinkPad), so I did some research and reassured me, that there are free drivers. Nice!

While I waited for my card to arrive, I received another letter with my admin pin for the card. Just for the record 😉

Some days later the smart-card reader arrived and I happily shoved it into the ExpressCard slot. I inserted the card and checked via

gpg –card-status

what’s on the card, but I got an error message (unfortunately I don’t remember what exactly it was) about that there was no card available. So I did some more research and it turns out I had to install the package

pcscd

to make it work. After the installation, my smart-card was detected, so I could follow the FSFEs tutorial on how to use the card. So I booted into a live Ubuntu that I had laying around, shut off the internet connection, realized that I needed to install pcscd here as well, reactivated the internet, installed pcscd and disconnected again. At that point in time I wondered, what exact kind of OpenPGP card I had. Somewhere else (forgot where) I read, that the fellowship card is a version 2.0 card, so I could go full 4096 bit RSA. I generated some new keys, which took forever! While I did so, I wrote some nonsense stories into a text editor to generate enough entropy. It still took about 15 minutes for each key to generate (and a lot of nonsense!). What confused me, was the process of removing secret keys and adding them back later (see the tutorial.)

But I did it and now I’m proud owner of a fully functional OpenPGP smart-card + reader. I had some smaller issues with an older GPG key, that I simply revoked (it was about time anyway) and now everything works as intended. I’m a little bit sad, because nearly none of my contacts uses GPG/PGP, so I had to write mails to myself in oder to test the card, but that feel when that little window opens, asking me to insert my card and/or enter my pin pays up for everything 🙂

My main usecase for the card became signing git commits though. Via

git commit -S -m “message”

git commits can be signed with the card (works with normal gpg keys without a card as well)! You just have to add your keys fingerprint to the .gitconfig. Man, that really adds to the experience. Now every time I sign a commit, I feel as if my work is extremely important or I’m a top secret agent or something. I can only recommend that to everyone!

Of course, I know that I might sound a little silly in the last paragraph, but nevertheless, I hope I could at least entertain somebody with my first experiences with the FSFE fellowship card. What I would add to the wish list for a next version of the card is a little field to note the last digits of the fingerprint of the key thats stored on the card. That could be handy for remembering the fingerprint when there is no card reader available. Also it would be quite nice, if the card was usable in combination with smart-phones, even though I don’t know, how exactly that could be accomplished (maybe a usb connector on the card?)

Anyways that’s the end of my first blog post. I hope you enjoyed it. Btw: My GPG key has the ID 0xa027db2f3e1e118a 🙂

Edit: This is a repost from october. In the mean time, I lost my admin pin, because I generated it with KeePassX and did not click on “accept” afterwards. That’s a real issue that should be addressed by the developers, but thats another story. I can still use the card, but I can’t change the key on it, so some day I’ll have to order a new card.