OMEMO Specification Sprint

Photo of an orange and white clown fish inside an anemone

The past weekend some members of the XMPP community gathered in Düsseldorf to work on the next iteration of the OMEMO Encryption Specification. All of us agree that the result – version 0.4 of XEP-0384 – is a huge step forward and better than ever!

On Saturday morning we met up at the Chaosdorf, a local Hacker Space who’s members kindly hosted the sprint. Huge thanks to them for having us!

Prior to the sprint we had collected a list of bullet points of topics we wanted to discuss. Among the more urging topics was proper specification of OMEMO for group chats, support for encrypting extension elements other than the body, as well as clarification on how to implement OMEMO without having to use libsignal. While the latter was technically already possible having a clear written documentation on how to do it is very important.

We spent most of the first day discussing several changes, features and problems and later started writing down the solutions we found. In between – in true Düsseldorf fashion – we snacked on some Onigiri and later went for some nice Ramen together. Saturday afternoon we started working in smaller groups on different parts of the specification. I’m amazed by the know-how and technical understanding that people brought to the table!

On the second day I had to leave relatively early after lunchtime due to family commitments, so I could only follow the remaining development of the XEP via git commits on the train.

Apart from further clarification, the updated spec now contains some additional features and tweaks. It is now possible to encrypt near arbitrary contents of messages with the help of Stanza Content Encryption. OMEMO now defines its own SCE profile. This enables workflows like fully end-to-end encrypted read markers and reactions. Thanks to Marvin and Klaus, the specification now also contains a section about how to opt-out of OMEMO encryption, both completely as well as on a per-conversation basis. Now you no longer have to manually disable OMEMO for that one contact on EVERY device you own.

The biggest part of the discussions went into properly specifying the cryptographic primitives for use with the Double Ratchet Algorithm. Tim and Andy did a great job of describing how to use hash functions and cipher algorithms to keep be able to re-implement OMEMO without having to rely on libsignal alone. Klaus and Marvin figured out some sane rules that help to decide when a device becomes active / inactive / stale. This should preserve the cryptographic guarantees of the encryption even if you don’t use one of your devices for a longer time.

Daniel properly described the workflow of recovering from broken sessions. This should improve OMEMO session stability. He also defined the exact form of OMEMO related XML elements. One notable feature from a users perspective are human readable labels for identity keys. This should make it easier for you to distinguish keys from another.

I’m really excited about the changes and can’t wait to see the first implementations in the real world!

One thing that’s left to do for now is to determine a smooth upgrade path. Clients will probably have to use both the new and old OMEMO in parallel for some time, as the changes are not backwards compatible. This would mean that we cannot immediately benefit from stanza content encryption and are bound to body-only encryption for some more time.

Pitfalls for OMEMO Implementations – Part 1: Inactive Devices

Smack’s OMEMO implementation received a security audit a while ago (huge thanks to the Guardian Project for providing the funding!). Radically Open Security, a non-profit pentesting group from the Netherlands focused on free software and ethical hacking went through the code in great detail to check its correctness and to search for any vulnerabilities. In the end they made some findings, although I wouldn’t consider them catastrophically bad (full disclosure – its my code, so I might be biased :D). In this post I want to go over two of the finding and discuss, what went wrong and how the issue was fixed.

Finding 001: Inactive Devices break Forward Secrecy

Inactive devices which no longer come online retain old chaining keys, which, if compromised, break
confidentiality of all messages from that key onwards.

Penetration Test Report – Radically Open Security
  • Vulnerability Type: Loss of Forward Secrecy.
  • Threat Level: High

Finding 003: Read-Only Devices Compromise Forward Secrecy

Read-only devices never update their keys and thus forfeit forward security.

Penetration Test Report – Radically Open Security
  • Vulnerability Type: Loss of Forward Secrecy.
  • Threat Level: Elevated

These vulnerabilities have the same cause and can be fixed by the same patch. They can happen, if a contact has a device which doesn’t come online (Finding 001), or just doesn’t respond to messages (Finding 003) for an extended period of time. To understand the issue, we have to look deeper into how the cryptography behind the double ratchet works.

Background

In OMEMO, your device has a separate session with every device it sends messages to. In case of one-to-one chat, your phone may have a session with your computer (you can read messages you sent from your phone on your computer too), with your contacts phone, your contacts computer and so on.

In each session, the cryptography happens in a kind of a ping pong way. Lets say you send two messages, then your contact responds with 4 messages, then its your turn again to send some messages and so on and so forth.

To better illustrate whats happening behind the scenes in the OMEMO protocol, lets compare an OMEMO chat with a discussion which is moderated using a “talking stick” (or “speachers staff”). Every time you want to send a message, you acquire the talking stick. Now you can send as many messages as you want. If another person wants to send you a message, you’ve got to hand them the stick first.

Now, cryptographically OMEMO is based on the double ratchet. As the name suggests, the double ratchet consists of two ratchets; the Diffie-Hellman-Ratchet and the Symmetric-Key-Ratchet.

The Symmetric-Key-Ratchet is responsible to encrypt every message you send with a new key. Roughly speaking, this is done by deriving a fresh encryption key from the key which was used to encrypt the previous message you sent. This procedure is called Key Derivation Function Chain (KDF-Chain). The benefit of this technique is, that if an attacker manages to break the key of a message, they cannot use that key to decrypt any previous messages (this is called “forward secrecy“). They can however use that key to derive the message keys of following messages, which is bad.

Functional Principle of a Key Derivation Function Chain

This is where the Diffie-Hellman-Ratchet comes into play. It replaces the symmetric ratchet key every time the “talking stick” gets handed over. Whenever you pass it to your contact (by them sending a message to you), your device executes a Diffie-Hellman key exchange with your contact and the result is used to generate a new key for the symmetric ratchet. The consequence is, that whenever the talking stick gets handed over, the attacker can no longer decrypt any following messages. This is why the protocol is described as “self healing”. The cryptographic term is called “future secrecy“.

Functional Principle of the Diffie-Hellman Ratchet Algorithm

Vulnerability

So, what’s wrong with this technique?

Well, lets say you have a device that you used only once or twice with OMEMO and which is now laying in the drawer, collecting dust. That device might still have active sessions with other devices. The problem is, that it will never acquire the talking stick (it will never send a message). Therefore the Diffie-Hellman-Ratchet will never be forwarded, which will lead to the base key of the Symmetric-Key-Ratchet never being replaced. The consequence is, that once an attacker manages to break one key of the chain, they can use that key to derive all following keys, which will get them access to all messages that device receives in that session. The property of “forward secrecy” is lost.

Smack-omemo already kind of dealt with that issue by removing inactive devices from the device list after a certain amount of time (two weeks). This does however only include own devices, not inactive devices of a contact. In the past I tried to fix this by ignoring inactive devices of a contact, by refusing to encrypt message for them. However, this lead to deadlocks, where two devices were mutually ignoring each other, resulting in a situation where no device could send a message to the other one to recover. As a consequence, Smack omitted deactivating stale contacts devices, which resulted in the vulnerability.

The Patch

This problem could not really be solved without changing the XEP (I’d argue). In my opinion the XEP must specify an explicit way of dealing with inactive devices. Therefore I proposed to use message counters, which gets rid of the deadlock problem. A device is considered as read-only, if the other party sent n messages without getting a reply. That way we can guarantee, that in a worst case scenario, an attacker can get access to n messages. That magical number n would need to be determined though.
I opened a pull request against XEP-0384 which introduces these changes.

Mitigation for the issue has been merged into Smack’s source code already 🙂

As another prevention against this vulnerability, clients could send empty ping messages to forward the Diffie-Hellman-Ratchet. This does however require devices to take action and does not work with clients that are permanently offline though. Both changes would however go hand in hand to improve the OMEMO user experience and security. Smack-omemo already offers the client developer a method to send ping messages 🙂

I hope I could help any fellow developers to avoid this pitfall and spark some discussion around how to address this issue in the XEP.

Happy (ethical) Hacking!

Closer Look at the Double Ratchet

In the last blog post, I took a closer look at how the Extended Triple Diffie-Hellman Key Exchange (X3DH) is used in OMEMO and which role PreKeys are playing. This post is about the other big algorithm that makes up OMEMO. The Double Ratchet.

The Double Ratchet algorithm can be seen as the gearbox of the OMEMO machine. In order to understand the Double Ratchet, we will first have to understand what a ratchet is.

Before we start: This post makes no guarantees to be 100% correct. It is only meant to explain the inner workings of the Double Ratchet algorithm in a (hopefully) more or less understandable way. Many details are simplified or omitted for sake of simplicity. If you want to implement this algorithm, please read the Double Ratchet specification.

A ratchet tool can only turn in one direction, hence it is eponymous for the algorithm.
Image by Benedikt.Seidl [Public domain]

A ratchet is a tool used to drive nuts and bolts. The distinctive feature of a ratchet tool over an ordinary wrench is, that the part that grips the head of the bolt can only turn in one direction. It is not possible to turn it in the opposite direction as it is supposed to.

In OMEMO, ratchet functions are one-way functions that basically take input keys and derives a new keys from that. Doing it in this direction is easy (like turning the ratchet tool in the right direction), but it is impossible to reverse the process and calculate the original key from the derived key (analogue to turning the ratchet in the opposite direction).

Symmetric Key Ratchet

One type of ratchet is the symmetric key ratchet (abbrev. sk ratchet). It takes a key and some input data and produces a new key, as well as some output data. The new key is derived from the old key by using a so called Key Derivation Function. Repeating the process multiple times creates a Key Derivation Function Chain (KDF-Chain). The fact that it is impossible to reverse a key derivation is what gives the OMEMO protocol the property of Forward Secrecy.

A Key Derivation Function Chain or Symmetric Ratchet

The above image illustrates the process of using a KDF-Chain to generate output keys from input data. In every step, the KDF-Chain takes the input and the current KDF-Key to generate the output key. Then it derives a new KDF-Key from the old one, replacing it in the process.

To summarize once again: Every time the KDF-Chain is used to generate an output key from some input, its KDF-Key is replaced, so if the input is the same in two steps, the output will still be different due to the changed KDF-Key.

One issue of this ratchet is, that it does not provide future secrecy. That means once an attacker gets access to one of the KDF-Keys of the chain, they can use that key to derive all following keys in the chain from that point on. They basically just have to turn the ratchet forwards.

Diffie-Hellman Ratchet

The second type of ratchet that we have to take a look at is the Diffie-Hellman Ratchet. This ratchet is basically a repeated Diffie-Hellman Key Exchange with changing key pairs. Every user has a separate DH ratcheting key pair, which is being replaced with new keys under certain conditions. Whenever one of the parties sends a message, they include the public part of their current DH ratcheting key pair in the message. Once the recipient receives the message, they extract that public key and do a handshake with it using their private ratcheting key. The resulting shared secret is used to reset their receiving chain (more on that later).

Once the recipient creates a response message, they create a new random ratchet key and do another handshake with their new private key and the senders public key. The result is used to reset the sending chain (again, more on that later).

Principle of the Diffie-Hellman Ratchet.
Image by OpenWhisperSystems (modified by author)

As a result, the DH ratchet is forwarded every time the direction of the message flow changes. The resulting keys are used to reset the sending-/receiving chains. This introduces future secrecy in the protocol.

The Diffie-Hellman Ratchet

Chains

A session between two devices has three chains – a root chain, a sending chain and a receiving chain.

The root chain is a KDF chain which is initialized with the shared secret which was established using the X3DH handshake. Both devices involved in the session have the same root chain. Contrary to the sending and receiving chains, the root chain is only initialized/reset once at the beginning of the session.

The sending chain of the session on device A equals the receiving chain on device B. On the other hand, the receiving chain on device A equals the sending chain on device B. The sending chain is used to generate message keys which are used to encrypt messages. The receiving chain on the other hand generates keys which can decrypt incoming messages.

Whenever the direction of the message flow changes, the sending and receiving chains are reset, meaning their keys are replaced with new keys generated by the root chain.

The full Double Ratchet Algorithms Ratchet Architecture

An Example

I think this rather complex protocol is best explained by an example message flow which demonstrates what actually happens during message sending / receiving etc.

In our example, Obi-Wan and Grievous have a conversation. Obi-Wan starts by establishing a session with Grievous and sends his initial message. Grievous responds by sending two messages back. Unfortunately the first of his replies goes missing.

Session Creation

In order to establish a session with Grievous, Obi-Wan has to first fetch one of Grievous key bundles. He uses this to establish a shared secret S between him and Grievous by executing a X3DH key exchange. More details on this can be found in my previous post. He also extracts Grievous signed PreKey ratcheting public key. S is used to initialize the root chain.

Obi-Wan now uses Grievous public ratchet key and does a handshake with his own ratchet private key to generate another shared secret which is pumped into the root chain. The output is used to initialize the sending chain and the KDF-Key of the root chain is replaced.

Now Obi-Wan established a session with Grievous without even sending a message. Nice!

The session initiator prepares the sending chain.
The initial root key comes from the result of the X3DH handshake.
Original image by OpenWhisperSystems (modified by author)

Initial Message

Now the session is established on Obi-Wans side and he can start composing a message. He decides to send a classy “Hello there!” as a greeting. He uses his sending chain to generate a message key which is used to encrypt the message.

Principle of generating message keys from the a KDF-Chain.
In our example only one message key is derived though.
Image by OpenWhisperSystems

Note: In the above image a constant is used as input for the KDF-Chain. This constant is defined by the protocol and isn’t important to understand whats going on.

Now Obi-Wan sends over the encrypted message along with his ratcheting public key and some information on what PreKey he used, the current sending key chain index (1), etc.

When Grievous receives Obi-Wan’s message, he completes his X3DH handshake with Obi-Wan in order to calculate the same exact shared secret S as Obi-Wan did earlier. He also uses S to initialize his root chain.

Now Grevious does a full ratchet step of the Diffie-Hellman Ratchet: He uses his private and Obi-Wans public ratchet key to do a handshake and initialize his receiving chain with the result. Note: The result of the handshake is the same exact value that Obi-Wan earlier calculated when he initialized his sending chain. Fantastic, isn’t it? Next he deletes his old ratchet key pair and generates a fresh one. Using the fresh private key, he does another handshake with Obi-Wans public key and uses the result to initialize his sending chain. This completes the full DH ratchet step.

Full Diffie-Hellman Ratchet Step
Image by OpenWhisperSystems

Decrypting the Message

Now that Grievous has finalized his side of the session, he can go ahead and decrypt Obi-Wans message. Since the message contains the sending chain index 1, Grievous knows, that he has to use the first message key generated from his receiving chain to decrypt the message. Because his receiving chain equals Obi-Wans sending chain, it will generate the exact same keys, so Grievous can use the first key to successfully decrypt Obi-Wans message.

Sending a Reply

Grievous is surprised by bold actions of Obi-Wan and promptly goes ahead to send two replies.

He advances his freshly initialized sending chain to generate a fresh message key (with index 1). He uses the key to encrypt his first message “General Kenobi!” and sends it over to Obi-Wan. He includes his public ratchet key in the message.

Unfortunately though the message goes missing and is never received.

He then forwards his sending chain a second time to generate another message key (index 2). Using that key he encrypt the message “You are a bold one.” and sends it to Obi-Wan. This message contains the same public ratchet key as the first one, but has the sending chain index 2. This time the message is received.

Receiving the Reply

Once Obi-Wan receives the second message and does a full ratchet step in order to complete his session with Grevious. First he does a DH handshake between his private and the Grevouos’ public ratcheting key he got from the message. The result is used to setup his receiving chain. He then generates a new ratchet key pair and does a second handshake. The result is used to reset his sending chain.

Obi-Wan notices that the sending chain index of the received message is 2 instead of 1, so he knows that one message must have been missing or delayed. To deal with this problem, he advances his receiving chain twice (meaning he generates two message keys from the receiving chain) and caches the first key. If later the missing message arrives, the cached key can be used to successfully decrypt the message. For now only one message arrived though. Obi-Wan uses the generated message key to successfully decrypt the message.

Conclusions

What have we learned from this example?

Firstly, we can see that the protocol guarantees forward secrecy. The KDF-Chains used in the three chains can only be advanced forwards, and it is impossible to turn them backwards to generate earlier keys. This means that if an attacker manages to get access to the state of the receiving chain, they can not decrypt messages sent prior to the moment of attack.

But what about future messages? Since the Diffie-Hellman ratchet introduces new randomness in every step (new random keys are generated), an attacker is locked out after one step of the DH ratchet. Since the DH ratchet is used to reset the symmetric ratchets of the sending and receiving chain, the window of the compromise is limited by the next DH ratchet step (meaning once the other party replies, the attacker is locked out again).

On top of this, the double ratchet algorithm can deal with missing or out-of-order messages, as keys generated from the receiving chain can be cached for later use. If at some point Obi-Wan receives the missing message, he can simply use the cached key to decrypt its contents.

This self-healing property was eponymous to the Axolotl protocol (an earlier name of the Signal protocol, the basis of OMEMO).

Acknowledgements

Thanks to syndace and paul for their feedback and clarification on some points.

Shaking Hands With OMEMO: X3DH Key Exchange

This is the first part of a small series about the cryptographic building blocks of OMEMO. This post is about the Extended Triple Diffie Hellman Key Exchange Algorithm (X3DH) which is used to establish a session between OMEMO devices.
Part 2: Closer Look at the Double Ratchet

In the past I have written some posts about OMEMO and its future and how it does compare to the Olm encryption protocol used by matrix.org. However, some readers requested a closer, but still straightforward look at how OMEMO and the underlying algorithms work. To get started, we first have to take a look at its past.

OMEMO was implemented in the Android Jabber Client Conversations as part of a Google Summer of Code project by Andreas Straub in 2015. The basic idea was to utilize the encryption library used by Signal (formerly TextSecure) for message encryption. So basically OMEMO borrows almost all the cryptographic mechanisms including the Double Ratchet and X3DH from Signals encryption protocol, which is appropriately named Signal Protocol. So to begin with, lets look at it first.

The Signal Protocol

The famous and ingenious protocol that drives the encryption behind Signal, OMEMO, matrix.org, WhatsApp and a lot more was created by Trevor Perrin and Moxie Marlinspike in 2013. Basically it consists of two parts that we need to further investigate:

  • The Extended Triple-Diffie-Hellman Key Exchange (X3DH)
  • The Double Ratchet Algorithm

One core principle of the protocol is to get rid of encryption keys as soon as possible. Almost every message is encrypted with another fresh key. This is a huge difference to other protocols like OpenPGP, where the user only has one key which can decrypt all messages ever sent to them. The later can of course also be seen as an advantage OpenPGP has over OMEMO, but it all depends on the situation the user is in and what they have to protect against.

A major improvement that the Signal Protocol introduced compared to encryption protocols like OTRv3 (Off-The-Record Messaging) was the ability to start a conversation with a chat partner in an asynchronous fashion, meaning that the other end didn’t have to be online in order to agree on a shared key. This was not possible with OTRv3, since both parties had to actively send messages in order to establish a session. This was okay back in the days where people would start their computer with the intention to chat with other users that were online at the same time, but it’s no longer suitable today.

Note: The recently worked on OTRv4 will not come with this handicap anymore.

The X3DH Key Exchange

Let’s get to it already!

X3DH is a key agreement protocol, meaning it is used when two parties establish a session in order to agree on a shared secret. For a conversation to be confidential we require, that only sender and (intended) recipient of a message are able to decrypt it. This is possible when they share a common secret (eg. a password or shared key). Exchanging this key with one another has long been kind of a hen and egg problem: How do you get the key from one end to the other without an adversary being able to get a copy of the key? Well, obviously by encrypting it, but how? How do you get that key to the other side? This problem has only been solved after the second world war.

The solution is a so called Diffie-Hellman-Merkle Key Exchange. I don’t want to go into too much detail about this, as there are really great resources about how it works available online, but the basic idea is that each party possesses an asymmetric key pair consisting of a public and a private key. The public key can be shared over insecure networks while the
private key must be kept secret. A Diffie-Hellman key exchange (DH) is the process of combining a public key A with a private key b in order to generate a shared secret. The essential trick is, that you get the same exact secret if you combine the secret key a with the public key B. Wikipedia does a great job at explaining this using an analogy of mixing colors.

Deniability and OTR

In normal day to day messaging you don’t always want to commit to what you said. Especially under oppressive regimes it may be a good idea to be able to deny that you said or wrote something specific. This principle is called deniability.

Note: It is debatable, whether cryptographic deniability ever saved someone from going to jail, but that’s not scope of this blog post.

At the same time you want to be absolutely sure that you are really talking to your chat partner and not to a so called man in the middle. These desires seem to be conflicting at first, but the OTR protocol featured both. The user has an IdentityKey, which is used to identify the user by means of a fingerprint. The (massively and horribly simplified) procedure of creating a OTR session is as follows: Alice generates a random session key and signs the public key with her IdentityKey. She then sends that public key over to Bob, who generates another random session key with which he executes his half of the DH handshake. He then sends the public part of that key (again, signed) back to Alice, who does another DH to acquire the same shared secret as Bob. As you can see, in order to establish a session, both parties had to be online. Note: The signing part has been oversimplified for sake of readability.

Normal Diffie-Hellman Key Exchange

From DH to X3DH

Perrin and Marlinspike improved upon this model by introducing the concept of PreKeys. Those basically are the first halves of a DH-handshake, which can – along with some other keys of the user – be uploaded to a server prior to the beginning of a conversation. This way another user can initiate a session by fetching one half-completed handshake and completing it.

Basically the Signal protocol comprises of the following set of keys per user:

IdentityKey (IK)Acts as the users identity by providing a stable fingerprint
Signed PreKey (SPK)Acts as a PreKey, but carries an additional signature of IK
Set of PreKeys ({OPK})Unsigned PreKeys

If Alice wants to start chatting, she can fetch Bobs IdentityKey, Signed PreKey and one of his PreKeys and use those to create a session. In order to preserve cryptographic properties, the handshake is modified like follows:

DH1 = DH(IK_A, SPK_B)
DH2 = DH(EK_A, IK_B)
DH3 = DH(EK_A, SPK_B)
DH4 = DH(EK_A, OPK_B)

S = KDF(DH1 || DH2 || DH3 || DH4)

EK_A denotes an ephemeral, random key which is generated by Alice on the fly. Alice can now derive an encryption key to encrypt her first message for Bob. She then sends that message (a so called PreKeyMessage) over to Bob, along with some additional information like her IdentityKey IK, the public part of the ephemeral key EK_A and the ID of the used PreKey OPK.

Visual representation of the X3DH handshake

Once Bob logs in, he can use this information to do the same calculations (just with swapped public and private keys) to calculate S from which he derives the encryption key. Now he can decrypt the message.

In order to prevent the session initiation from failing due to lost messages, all messages that Alice sends over to Bob without receiving a first message back are PreKeyMessages, so that Bob can complete the session, even if only one of the messages sent by Alice makes its way to Bob. The exact details on how OMEMO works after the X3DH key exchange will be discussed in part 2 of this series 🙂

X3DH Key Exchange TL;DR

X3DH utilizes PreKeys to allow session creation with offline users by doing 4 DH handshakes between different keys.

A subtle but important implementation difference between OMEMO and Signal is, that the Signal server is able to manage the PreKeys for the user. That way it can make sure, that every PreKey is only used once. OMEMO on the other hand solely relies on the XMPP servers PubSub component, which does not support such behavior. Instead, it hands out a bundle of around 100 PreKeys. This seems like a lot, but in reality the chances of a PreKey collision are pretty high (see the birthday problem).

OMEMO does come with some counter measures for problems and attacks that arise from this situation, but it makes the protocol a little less appealing than the original Signal protocol.

Clients should for example keep used PreKeys around until the end of catch -up of missed message to allow decryption of messages that got sent in sessions that have been established using the same PreKey.

A look at Matrix.org’s OLM | MEGOLM encryption protocol

Everyone who knows and uses XMPP is probably aware of a new player in the game. Matrix.org is often recommended as a young, arising alternative to the aging protocol behind the Jabber ecosystem. However the founders do not see their product as a direct competitor to XMPP as their approach to the problem of message exchanging is quite different.

An open network for secure, decentralized communication.

matrix.org

During his talk at the FOSDEM in Brussels, matrix.org founder Matthew Hodgson roughly compared the concept of matrix to how git works. Instead of passing single messages between devices and servers, matrix is all about synchronization of a shared state. A chat room can be seen as a repository, which is shared between all servers of the participants. As a consequence communication in a chat room can go on, even when the server on which the room was created goes down, as the room simultaneously exists on all the other servers. Once the failed server comes back online, it synchronizes its state with the others and retrieves missed messages.

Matrix in the French State

Olm, Megolm – What’s the deal?

Matrix introduced two different crypto protocols for end-to-end encryption. One is named Olm, which is used in one-to-one chats between two chat partners (this is not quite correct, see Updates for clarifying remarks). It can very well be compared to OMEMO, as it too is an adoption of the Signal Protocol by OpenWhisperSystems. However, due to some differences in the implementation Olm is not compatible with OMEMO although it shares the same cryptographic properties.

The other protocol goes by the name of Megolm and is used in group chats. Conceptually it deviates quite a bit from Olm and OMEMO, as it contains some modifications that make it more suitable for the multi-device use-case. However, those modifications alter its cryptographic properties.

Comparing Cryptographic Building Blocks

ProtocolOlmOMEMO (Signal)
IdentityKeyCurve25519X25519
FingerprintKey⁽¹⁾Ed25519none
PreKeysCurve25519X25519
SignedPreKeys⁽²⁾noneX25519
Key Exchange
Algorithm⁽³⁾
Triple Diffie-Hellman
(3DH)
Extended Triple
Diffie-Hellman (X3DH)
Ratcheting AlgoritmDouble RatchetDouble Ratchet
  1. Signal uses a Curve X25519 IdentityKey, which is capable of both encrypting, as well as creating signatures using the XEdDSA signature scheme. Therefore no separate FingerprintKey is needed. Instead the fingerprint is derived from the IdentityKey. This is mostly a cosmetic difference, as one less key pair is required.
  2. Olm does not distinguish between the concepts of signed and unsigned PreKeys like the Signal protocol does. Instead it only uses one type of PreKey. However, those may be signed with the FingerprintKey upon upload to the server.
  3. OMEMO includes the SignedPreKey, as well as an unsigned PreKey in the handshake, while Olm only uses one PreKey. As a consequence, if the senders Olm IdentityKey gets compromised at some point, the very first few messages that are sent could possibly be decrypted.

In the end Olm and OMEMO are pretty comparable, apart from some simplifications made in the Olm protocol. Those do only marginally affect its security though (as far as I can tell as a layman).

Megolm

The similarities between OMEMO and Matrix’ encryption solution end when it comes to group chat encryption.

OMEMO does not treat chats with more than two parties any other than one-to-one chats. The sender simply has to manage a lot more keys and the amount of required trust decisions grows by a factor roughly equal to the number of chat participants.

Yep, this is a mess but luckily XMPP isn’t a very popular chat protocol so there are no large encrypted group chats ;P

So how does Matrix solve the issue?

When a user joins a group chat, they generate a session for that chat. This session consists of an Ed25519 SigningKey and a single ratchet which gets initialized randomly.

The public part of the signing key and the state of the ratchet are then shared with each participant of the group chat. This is done via an encrypted channel (using Olm encryption). Note, that this session is also shared between the devices of the user. Contrary to Olm, where every device has its own Olm session, there is only one Megolm session per user per group chat.

Whenever the user sends a message, the encryption key is generated by forwarding the ratchet and deriving a symmetric encryption key for the message from the ratchets output. Signing is done using the SigningKey.

Recipients of the message can decrypt it by forwarding their copy of the senders ratchet the same way the sender did, in order to retrieve the same encryption key. The signature is verified using the public SigningKey of the sender.

There are some pros and cons to this approach, which I briefly want to address.

First of all, you may find that this protocol is way less elegant compared to Olm/Omemo/Signal. It poses some obvious limitations and security issues. Most importantly, if an attacker gets access to the ratchet state of a user, they could decrypt any message that is sent from that point in time on. As there is no new randomness introduced, as is the case in the other protocols, the attacker can gain access by simply forwarding the ratchet thereby generating any decryption keys they need. The protocol defends against this by requiring the user to generate a new random session whenever a new user joins/leaves the room and/or a certain number of messages has been sent, whereby the window of possibly compromised messages gets limited to a smaller number. Still, this is equivalent to having a single key that decrypts multiple messages at once.

The Megolm specification lists a number of other caveats.

On the pro side of things, trust management has been simplified as the user basically just has to decide whether or not to trust each group member instead of each participating device – reducing the complexity from a multiple of n down to just n. Also, since there is no new randomness being introduced during ratchet forwarding, messages can be decrypted multiple times. As an effect devices do not need to store the decrypted messages. Knowledge of the session state(s) is sufficient to retrieve the message contents over and over again.

By sharing older session states with own devices it is also possible to read older messages on new devices. This is a feature that many users are missing badly from OMEMO.

On the other hand, if you really need true future secrecy on a message-by-message base and you cannot risk that an attacker may get access to more than one message at a time, you are probably better off taking the bitter pill going through the fingerprint mess and stick to normal Olm/OMEMO (see Updates for remarks on this statement).

Note: End-to-end encryption does not really make sense in big, especially public chat rooms, since an attacker could just simply join the room in order to get access to ongoing communication. Thanks to Florian Schmaus for pointing that out.

I hope I could give a good overview of the different encryption mechanisms in XMPP and Matrix. Hopefully I did not make any errors, but if you find mistakes, please let me know, so I can correct them asap 🙂

Happy Hacking!

Sources

Updates:

Thanks for Matthew Hodgson for pointing out, that Olm/OMEMO is also effectively using a symmetric ratchet when multiple consecutive messages are sent without the receiving device sending an answer. This can lead to loss of future secrecy as discussed in the OMEMO protocol audit.

Also thanks to Hubert Chathi for noting, that Megolm is also used in one-to-one chats, as matrix doesn’t have the same distinction between group and single chats. He also pointed out, that the security level of Megolm (the criteria for regenerating the session) can be configured on a per-chat basis.

Unified Encrypted Payload Elements for XMPP

Letter in an envelope

Requirements on encryption change from time to time. New technologies pop up and crypto protocols get replaced by new ones. There are also different use-cases that require different encryption techniques.

For that reason there is a number of encryption protocols specified for XMPP, amongst them OMEMO and OpenPGP for XMPP.

Most crypto protocols share in common, that they all aim at encrypting certain parts of the message that is being sent, so that only the recipient(s) can read the encrypted content.

OMEMO is currently only capable to encrypt the messages body. For that reason the body of the message is being encrypted and stored in a <payload/> element, which is added to the message. This is inconvenient, as it makes OMEMO quite inflexible. The protocol cannot be used to secure arbitrary extension elements, which might contain sensitive content as well.

<message to='juliet@capulet.lit' from='romeo@montague.lit' id='send1'>
  <encrypted xmlns='eu.siacs.conversations.axolotl'>
    <header>...</header>
    <!-- the payload contains the encrypted content of the body -->
    <payload>BASE64ENCODED</payload>
  </encrypted>
</message>

The modern OpenPGP for XMPP XEP also uses <payload/> elements, but to transport arbitrary extension elements. The difference is, that in OpenPGP, the payload elements contain the actual payload as plaintext. Those <payload/> elements are embedded in either a <crypt/> or <signcrypt/> element, depending on whether or not the message will be signed and then passed through OpenPGP encryption. The resulting ciphertext is then appended to the message element in form of a <openpgp/> element.

<signcrypt xmlns='urn:xmpp:openpgp:0'>
  <to jid='juliet@example.org'/>
  <time stamp='...'/>
  <rpad>...</rpad>
  <payload>
    <body xmlns='jabber:client'>
      This is a secret message.
    </body>
  </payload>
</signcrypt>

<!-- The above element is passed to OpenPGP and the resulting ciphertext is included in the actual message as an <openpgp/> element -->

<message to='juliet@example.org'>
  <openpgp xmlns='urn:xmpp:openpgp:0'>
    BASE64_OPENPGP_MESSAGE
  </openpgp>
</message>

Upon receiving a message containing an <openpgp/> element, the receiver decrypts the content of it, does some verity checks and then replaces the <openpgp/> element of the message with the extension elements contained in the <payload/> element. That way the original, unencrypted message is constructed.

The benefit of this technique is that the <payload/> element can in fact contain any number of arbitrary extension elements. This makes OpenPGP for XMPPs take on encrypting message content way more flexible.

A logical next step would be to take OpenPGP for XMPPs <payload/> elements and move them to a new XEP, which specifies their use in a unified way. This can then be used by OMEMO and any other encryption protocol as well.

The motivation behind this is, that it would broaden the scope of encryption to cover more parts of the message, like read markers and other metadata.

It could also become easier to implement end-to-end encryption in other scenarios such as Jingle file transfer. Even though there is Jingle Encrypted Transports, this protocol only protects the stream itself and leaves the metadata such as filename, size etc. in the clear. A unified <encrypted/> element would make it easier to encrypt such metadata and could be the better approach to the problem.

Future of OMEMO

OMEMO is an XMPP extension protocol, which specifies end-to-end encryption for XMPP clients using the double ratchet algorithm of the Signal protocol. Introduced back in 2015 by GSoC student Andreas Straub in the Conversations client, OMEMO had a lot of press coverage and many privacy and security oriented websites praise XMPP clients that do support it. Its beyond debate, that OMEMO brought many new faces to XMPP. For many users, having end-to-end encryption built into their chat client is a must. Today OMEMO is implemented in a range of clients on different platforms. While Conversations, ChatSecure and Dino support it out of the box, there is a series of plugins that teach OMEMO to other clients such as Gajim, Pidgin and Miranda NG.

However, there is quite a lot of controversy around OMEMO. Part of it are technical discussions, others are more or less of a political nature. Let me list some of them for you.

Some users and client developers see no value in OMEMOs forward secrecy (the fact, that messages can only be decrypted once per device, so new devices do not have access to the chat history of the user). That is a fair point. Especially webclients have a hard time implementing OMEMO in a sensible way. Also the average user is probably having a hard time understanding what exactly forward secrecy is and what the consequences are. Communicating to the user, that not having access to past messages is actually a feature might be a hard task for a client developer.

OMEMOs trust management (still) sucks. One architectural key feature of OMEMO is, that every device does have its own identity key. If a user wants to send a message to one of their contacts, they’re presented with a list of all of their identity keys. Now they have to decide, which keys to trust and which not by comparing fingerprints (seamingly arbitrary strings of 64 characters). This is not a very comfortable thing to do. Some clients encourage the user to verify your contacts devices by scanning QR-Codes, which is way more comfortable, users do however have to meet up in person or share the QR code on another channel.
But what if you get a new device or just reinstall your chat application? Suddenly all your contacts have to decide whether to trust your new fingerprint or not. In the long run this will lead to the user just being annoyed and blindly accepting new fingerprints, ruining the concept of end-to-end encryption.

Daniel Gultsch introduced the concept of BTBV (Blind Trust Before Verification) which can be summed up as “do not bother the user with encryption and hope everything goes well until the user explicitly states that they are interested in having good security”. The principle is, that clients blindly trust any OMEMO identity keys until the user commits to verifying them manually. This makes OMEMO easy to use for those who don’t really care about it, while offering serious users who depend on it the needed security.

But what do you do, if suddenly a rogue fingerprint appears? Do you panic and message all your contacts not to trust the stranger key? In theory any device which has access to the users account (the users server too) can just add another OMEMO identity key to the users list of devices and there is not really anything the user can do about it. OMEMO does not have a “blacklist”-feature or a signed list of trusted keys. It would however be possible to implement such thing in the future by combining OMEMO with OpenPGP for example. Of course, if some stranger has access to your account, it is time to change the password/server anyways.

Another weakness of OMEMO is, that it is currently only usable for encrypting the messages body. Since XMPP is an extensible protocol with other use cases than messaging, it would be nice to have support for arbitrary extension element encryption. There is however the extension protocol “OX” (XEP-0373: OpenPGP for XMPP), which has such capabilities. This feature can be extracted from OX and reused in OMEMO relatively easy.

Lets now focus on the “political” controversies around OMEMO.

In 2016/2017 there has been a lot of discussions, whether or not OMEMO should become a standard in the first place. The problem is, that in it’s current form (which has not really changes since its introduction), OMEMO depends on the wire format used by libsignal (the Signal protocol library used by conversations). That library however is licensed under the GPLv3 license, preventing permissively licensed and closed source applications from implementing OMEMO. While the Signal protocol itself is openly documented, the wire format used by libsignal is not, so any implementations which want to be compatible to current OMEMO clients must implement the same wire format by looking into the libsignal source code, which in turn makes the implementation a derivative of libsignal, which must be licensed under the GPL as well. There has been a pull request against the OMEMO XEP which addressed this issue by specifying an independent wire format for OMEMO, however that pull request was more or less rejected due to inactivity of the author.

During the phases of hot debates around OMEMO, it was discussed to base the protocol on Olm instead of the Signal protocol. Olm is the encryption protocol used by matrix.org. However, up to this point there is no Olm based OMEMO implementation, that I know of, neither have there been any experiments in that direction from people that proposed the idea (again – not that I know of).

Another idea was to completely redesign OMEMOs double ratchet specification as OMEMO-NEXT from ground up without even mentioning a specific library as the foundation. This would obviously be a way more complex XEP, as all the cryptographic operations and primitives which are currently abstracted and hidden away behind the Signal protocol, would have to be written down in that XEP. However, recently the IETF announced that work is done to create MLS (Message Layer Security), which does exactly that. It specifies a completely open version of the double ratchet algorithm along with infrastructure to share key material and so on. I’m not sure whether this is a coincidence, or if some of those who proposed OMEMO-NEXT are directly involved with the development of MLS. We’ll see, when MLS is ready and whether it will compete against OMEMO. I’d really love to see a cross-protocol encryption algorithm btw 😉 #bridges #federateEverything

Now lets talk about the biggest problem of OMEMO. Currently the XEP does not have an active “legal guardian”. The author has been inactive for an extended period of time, ignoring requests for comments, causing a total halt in the development of the standard (making changes to the XEP needs the authors approval). Things like specifying a new wire protocol are possible and reasonably easy to do. However not having changes written down in the XEP makes it nearly impossible to make coordinated changes. I’m sure there is a ton of potential for OMEMO and it is sad to see its (protocol-) development having come to a halt.

I’m sure that many of its current issues can be addressed by changes to the XEP. This is what I think needs to be done to OMEMO to make it more usable:

  • Specify a new wire protocol: This could make OMEMO accesible for commercial applications and allow independent implementations -> broader deployment
  • Specify a general payload encryption scheme: This could benefit other encryption protocols as well and would make it possible to apply end-to-end encryption to a wider variety of use cases.
  • Reuse the payload encryption scheme in OMEMO: Utilize OMEMO for things besides body encryption.
  • Specify a way to sign device lists with “persistent key” algorithms like OpenPGP: This could simplify trust management.
  • Specify a way to backup the identity key: This could reduce the identity key chaos, since the key could be reused on new devices/installations. However clients would have to make it clear to the user, not to use the same key on multiple devices.

This have been my thoughts about OMEMOs current state. What do you think about it?

Happy Hacking!

Summer of Code: Smack has OpenPGP Support!

I am very proud to announce, that Smack got support for OpenPGP for XMPP!

Today the Pull Request I worked on during my GSoC project was merged into Smacks master branch. Admittedly it will take a few months until smack-openpgp will be included in a Smack release, but that gives me time to further finalize the code and iron out any bugs that may be in there. If you want to try smack-openpgp for yourself, let me know of any issues you encounter 🙂

(Edit: There are snapshot releases of Smack available for testing)

Now Smack does support two end-to-end encryption methods, which complement each other perfectly. OMEMO is best for people that want to be safe from future attacks, while OpenPGP is more suited for users who appreciate being able to access their chat history at any given time. OpenPGP is therefore the better choice for web based applications, although it is perfectly possible to implement web based clients that do OMEMO (see for example the Wire web app, which does ratcheting similar to OMEMO).

What’s left to do now is updating smack-openpgp due to updates made to XEP-0373 and extensive testing against other implementations.

Happy Hacking!

Summer of Code: Finalizing the PR

Quick update:

Only a few days are left until the last and final Evaluation Phase.

I spent the week opening my pull request against Smacks master branch and adding a basic trust management implementation. Now the user is required to make decisions whether to trust a contacts key or not. However, the storage implementation is kept very modular, so an implementor can easily create a trust store implementation that realizes custom behaviour.

Smack-openpgp now allows users which did not subscribe to one another to exchange encryption keys quite easily. If a user receives an encrypted message, the implementation automatically fetches the senders keys to allow signature verification.

Furthermore there are more JUnit tests now, so that Smacks total test coverage actually increases when my PR gets merged 😀

Happy Hacking!

Summer of Code: First PGPainless Release!

I’m very happy and proud to announce the first alpha release of PGPainless!

PGPainless 0.0.1-alpha1 is the first non-snapshot release and is available from maven central. It was an interesting experience to go through the process of creating a release and I’m looking forward to have many more releases in the future 🙂

The current release contains a workaround for the bug I described in an earlier blog post. The issue was, that bouncycastle wouldn’t mark the public sub keys of a secret key ring as sub keys, which results in loss of keys if the user tries to create a public key ring from the exported public keys. My workaround fixes the issue by iterating through all sub keys of an existing key ring and converting the key packages of subkeys to subkey packages. The code is also available as a gist.

Ironically I had some issues related to OpenPGP during the release process. Releases to maven central have to be signed with an OpenPGP key, so I created a dedicated signing key pair using GnuPG, which I wanted to put into a separate GPG key box. After creating the key, I exported it using

gpg --armor --export-secret-keys [key-id] > pgpainless-singing-key.asc

imported it into a dedicated key box using

gpg --no-default-keyring --keyring pgpainless.gpg --import pgpainless-signing-key.asc

and deleted the key from my old key box, as well as the .asc-file. But when I tried to sign my release, I got the error, that a secret key would be needed. After checking the key box, I noticed, that only a public key was present.

Unfortunately that key had already been published to the key servers and I have no way to revoke it, due to lack of a secret key. I have no idea, what exactly happened or how it could happen, but its too late to recover the key.

Edit: I found my error: Importing a secret key is only possible with the flag `–allow-secret-key-import`, eg

gpg --import --allow-secret-key-import key.asc

So in the end I had to create a new OpenPGP key, which I now carefully backed up on a freshly bought USB stick which will be locked away for the event that I lose the copy on my work station. Better safe than sorry.

Happy Hacking!