A look at Matrix.org’s OLM | MEGOLM encryption protocol

Everyone who knows and uses XMPP is probably aware of a new player in the game. Matrix.org is often recommended as a young, arising alternative to the aging protocol behind the Jabber ecosystem. However the founders do not see their product as a direct competitor to XMPP as their approach to the problem of message exchanging is quite different.

An open network for secure, decentralized communication.

matrix.org

During his talk at the FOSDEM in Brussels, matrix.org founder Matthew Hodgson roughly compared the concept of matrix to how git works. Instead of passing single messages between devices and servers, matrix is all about synchronization of a shared state. A chat room can be seen as a repository, which is shared between all servers of the participants. As a consequence communication in a chat room can go on, even when the server on which the room was created goes down, as the room simultaneously exists on all the other servers. Once the failed server comes back online, it synchronizes its state with the others and retrieves missed messages.

Matrix in the French State

Olm, Megolm – What’s the deal?

Matrix introduced two different crypto protocols for end-to-end encryption. One is named Olm, which is used in one-to-one chats between two chat partners (this is not quite correct, see Updates for clarifying remarks). It can very well be compared to OMEMO, as it too is an adoption of the Signal Protocol by OpenWhisperSystems. However, due to some differences in the implementation Olm is not compatible with OMEMO although it shares the same cryptographic properties.

The other protocol goes by the name of Megolm and is used in group chats. Conceptually it deviates quite a bit from Olm and OMEMO, as it contains some modifications that make it more suitable for the multi-device use-case. However, those modifications alter its cryptographic properties.

Comparing Cryptographic Building Blocks

ProtocolOlmOMEMO (Signal)
IdentityKeyCurve25519X25519
FingerprintKey⁽¹⁾Ed25519none
PreKeysCurve25519X25519
SignedPreKeys⁽²⁾noneX25519
Key Exchange
Algorithm⁽³⁾
Triple Diffie-Hellman
(3DH)
Extended Triple
Diffie-Hellman (X3DH)
Ratcheting AlgoritmDouble RatchetDouble Ratchet
  1. Signal uses a Curve X25519 IdentityKey, which is capable of both encrypting, as well as creating signatures using the XEdDSA signature scheme. Therefore no separate FingerprintKey is needed. Instead the fingerprint is derived from the IdentityKey. This is mostly a cosmetic difference, as one less key pair is required.
  2. Olm does not distinguish between the concepts of signed and unsigned PreKeys like the Signal protocol does. Instead it only uses one type of PreKey. However, those may be signed with the FingerprintKey upon upload to the server.
  3. OMEMO includes the SignedPreKey, as well as an unsigned PreKey in the handshake, while Olm only uses one PreKey. As a consequence, if the senders Olm IdentityKey gets compromised at some point, the very first few messages that are sent could possibly be decrypted.

In the end Olm and OMEMO are pretty comparable, apart from some simplifications made in the Olm protocol. Those do only marginally affect its security though (as far as I can tell as a layman).

Megolm

The similarities between OMEMO and Matrix’ encryption solution end when it comes to group chat encryption.

OMEMO does not treat chats with more than two parties any other than one-to-one chats. The sender simply has to manage a lot more keys and the amount of required trust decisions grows by a factor roughly equal to the number of chat participants.

Yep, this is a mess but luckily XMPP isn’t a very popular chat protocol so there are no large encrypted group chats ;P

So how does Matrix solve the issue?

When a user joins a group chat, they generate a session for that chat. This session consists of an Ed25519 SigningKey and a single ratchet which gets initialized randomly.

The public part of the signing key and the state of the ratchet are then shared with each participant of the group chat. This is done via an encrypted channel (using Olm encryption). Note, that this session is also shared between the devices of the user. Contrary to Olm, where every device has its own Olm session, there is only one Megolm session per user per group chat.

Whenever the user sends a message, the encryption key is generated by forwarding the ratchet and deriving a symmetric encryption key for the message from the ratchets output. Signing is done using the SigningKey.

Recipients of the message can decrypt it by forwarding their copy of the senders ratchet the same way the sender did, in order to retrieve the same encryption key. The signature is verified using the public SigningKey of the sender.

There are some pros and cons to this approach, which I briefly want to address.

First of all, you may find that this protocol is way less elegant compared to Olm/Omemo/Signal. It poses some obvious limitations and security issues. Most importantly, if an attacker gets access to the ratchet state of a user, they could decrypt any message that is sent from that point in time on. As there is no new randomness introduced, as is the case in the other protocols, the attacker can gain access by simply forwarding the ratchet thereby generating any decryption keys they need. The protocol defends against this by requiring the user to generate a new random session whenever a new user joins/leaves the room and/or a certain number of messages has been sent, whereby the window of possibly compromised messages gets limited to a smaller number. Still, this is equivalent to having a single key that decrypts multiple messages at once.

The Megolm specification lists a number of other caveats.

On the pro side of things, trust management has been simplified as the user basically just has to decide whether or not to trust each group member instead of each participating device – reducing the complexity from a multiple of n down to just n. Also, since there is no new randomness being introduced during ratchet forwarding, messages can be decrypted multiple times. As an effect devices do not need to store the decrypted messages. Knowledge of the session state(s) is sufficient to retrieve the message contents over and over again.

By sharing older session states with own devices it is also possible to read older messages on new devices. This is a feature that many users are missing badly from OMEMO.

On the other hand, if you really need true future secrecy on a message-by-message base and you cannot risk that an attacker may get access to more than one message at a time, you are probably better off taking the bitter pill going through the fingerprint mess and stick to normal Olm/OMEMO (see Updates for remarks on this statement).

Note: End-to-end encryption does not really make sense in big, especially public chat rooms, since an attacker could just simply join the room in order to get access to ongoing communication. Thanks to Florian Schmaus for pointing that out.

I hope I could give a good overview of the different encryption mechanisms in XMPP and Matrix. Hopefully I did not make any errors, but if you find mistakes, please let me know, so I can correct them asap 🙂

Happy Hacking!

Sources

Updates:

Thanks for Matthew Hodgson for pointing out, that Olm/OMEMO is also effectively using a symmetric ratchet when multiple consecutive messages are sent without the receiving device sending an answer. This can lead to loss of future secrecy as discussed in the OMEMO protocol audit.

Also thanks to Hubert Chathi for noting, that Megolm is also used in one-to-one chats, as matrix doesn’t have the same distinction between group and single chats. He also pointed out, that the security level of Megolm (the criteria for regenerating the session) can be configured on a per-chat basis.

11 thoughts on “A look at Matrix.org’s OLM | MEGOLM encryption protocol

  1. Hi Paul,

    Thanks for your look at olm and megolm.

    There are a few things to clarify. First of all, in the first paragraph of “Olm, Megolm — What’s the deal?”, you say that olm is used in one-to-one chats, and megolm is used in group chats. In fact, megolm is used even in one-to-one chats; Matrix doesn’t have the same distinction between one-to-one chats and group chats as XMPP does, and one-to-one chats can become group chats, so we use the same protocol for both.

    Regarding the sentence, “The protocol defends against this by requiring the user to generate a new random session whenever a new user joins/leaves the room and/or a certain number of messages has been sent, whereby the window of possibly compromised messages gets limited to a smaller number”, megolm sessions are also rotated after a certain amount of time. The suggested number of messages or time after which megolm sessions are rotated is configurable per-room, and if you want to avoid having the same key decrypt multiple messages, you could set it so that the megolm session is rotated for every message.

    About “On the pro side of things, trust management has been simplified as the user basically just has to decide whether or not to trust each group member instead of each participating device – reducing the complexity from a multiple of n down to just n”, megolm still encrypts to each device separately, so you still need to trust each device separately. We are currently working on cross-signing (https://github.com/matrix-org/matrix-doc/pull/1756) in order to solve this issue, but it is not complete yet.

    Again, thank you for your blog post.

    1. Hi Hubert!

      “In fact, megolm is used even in one-to-one chats; Matrix doesn’t have the same distinction between one-to-one chats and group chats as XMPP does, and one-to-one chats can become group chats, so we use the same protocol for both.”

      Ah thank you very much for the hint, I couldn’t read that out of the specifications alone, but I’ll correct that soon.

      “megolm still encrypts to each device separately, so you still need to trust each device separately.”

      That’s a bummer, but I’m glad to see that you are working on a solution to simplify trust management. I had a talk at FOSDEM with some of you guys and it’s really interesting to see that the problems of OMEMO and Olm are very similar. I really hope that OMEMO can benefit from some of the solutions you find (and the other way round or course :D)

  2. > On the other hand, if you really need true future secrecy on a message-by-message basis […] stick to normal Olm/OMEMO.

    Doesn’t Olm/OMEMO also use a simple hash ratchet when sending consecutive messages from the same sending device? From memory, it only does X3DH when the flow of conversation changes direction.

    It’s worth noting that if you really do want Megolm to provide full PFS on a message-by-message basis, you can just set the session duration to 1 message – except this will force a X3DH over the full mesh of devices in the room, which isn’t going to be very performant. So it ends up being a trade-off between security and usability, for a change. MLS could be a much nicer way of getting better forward secrecy without the scalability challenges.

    In other news, thank you for relaying that Matrix is fundamentally different to XMPP (in terms of being a conversation history syncing protocol rather than a messaging protocol) – it’s a very welcome change to see the distinction being made by a jabber-head 🙂

    1. “Doesn’t Olm/OMEMO also use a simple hash ratchet when sending consecutive messages from the same sending device?”

      That’s true as far as I remember. This also means that under some circumstances (if a device is not used for some time), OMEMO/Olm lose the property of future secrecy.
      So if you *really* want to have future secrecy, you should not send to inactive devices. Instead only send to the recipients active device. I should add this remark to the post.

      In OMEMO we are currently considering to suggest to implementors to utilize the ratchet message counter to determine staleness of devices in order to force a maximum length of the sending chain to counter this.

      “if you really do want Megolm to provide full PFS on a message-by-message basis, you can just set the session duration to 1 message – except this will force a X3DH over the full mesh of devices in the room”

      So – to make sure that I understand it correctly – in that case you’d fetch a new PreKey from the recipient for each message? I haven’t thought about establishing new sessions using a new PreKey yet, but that might be worth a look, although my first bet would be that at least in OMEMO the sender would run out of recipient PreKeys rather quickly 😀

      “MLS could be a much nicer way of getting better forward secrecy without the scalability challenges.”

      Yeah, looks like I’ll have to look into MLS sooner or later 😉 It would be really *really* nice to have bridgeable e2ee using MLS, although that’ll probably be hard due to different message formatting inside the payload. Maybe we could come up with a XEP/MSC to have a shared message format? 😉

      “In other news, thank you for relaying that Matrix is fundamentally different to XMPP (in terms of being a conversation history syncing protocol rather than a messaging protocol) – it’s a very welcome change to see the distinction being made by a jabber-head”

      Hehe, thank you! I’m really glad to see that matrix is pushing forward in the world of federated messaging in form of a healthy level of competition/completion 🙂 In the end we are working on the same problem of closed silos and I’m very thankful that both XMPP and matrix exist to solve this issue.

      1. > So – to make sure that I understand it correctly – in that case you’d fetch a new PreKey from the recipient for each message?

        Actually, it wouldn’t – thinking it through further, it would effectively generate a random message key (a single use megolm session) for each new message. This session key would then be shared over Olm, and so use the normal Double Ratchet of a combination of hash ratchet and 3DH to encrypt the keyshare. So in practice it would just fall back to being as good/bad as normal Olm/OMEMO. The problem is that each new message would require the new megolm key to be shared over the full mesh (potentially also with a 3DH handshake too), which is much heavier than Megolm’s default behaviour of relying on the session being relatively longlived, and only having to share new Megolm session details every 100 msgs.

        > It would be really *really* nice to have bridgeable e2ee using MLS, although that’ll probably be hard due to different message formatting inside the payload.

        The way to do this might be to tunnel the MLS payloads inside the XMPP or Matrix (or SMTP etc) framing – more like OTR. It means that all the metadata would be unencrypted of course, but at least the plaintext payloads could get through intact. In practice this would be Hard with MLS though, which requires a centralised application server to keep track of which devices are currently in the conversation, which is of course bad news for decentralised or bridged conversations.

  3. “Actually, it wouldn’t – thinking it through further, it would effectively generate a random message key (a single use megolm session) for each new message. This session key would then be shared over Olm, and so use the normal Double Ratchet of a combination of hash ratchet and 3DH to encrypt the keyshare. So in practice it would just fall back to being as good/bad as normal Olm/OMEMO.”

    Ah, I thought you were talking about refreshing the Olm-level ratchets. Yes, resetting the session of the Megolm ratchet would effectively transform the encryption to plain Olm.

    “In practice this would be Hard with MLS though, which requires a centralised application server to keep track of which devices are currently in the conversation, which is of course bad news for decentralised or bridged conversations.”

    Ah, right, MLS has its own infrastructure, didn’t think of that.

    Thank you for the discussion 🙂

  4. hi,I want to know the group chat performance of the megolm protocol. How many people can chat in each group at the same time? Can it support 10,000 people? The signaling protocol encrypts each message individually for each member,so its performance is slow,looking forward to your reply.

Leave a Reply

Your email address will not be published.