OMEMO Specification Sprint

This past weekend some members of the XMPP community gathered in Düsseldorf to work on the next iteration of the OMEMO Encryption Specification. All of us agree that the result – version 0.4 of XEP-0384 – is a huge step forward and better than ever!

On Saturday morning we met up at the Chaosdorf, a local Hacker Space whose members kindly hosted the sprint. Huge thanks to them for having us!

Prior to the sprint we had collected a list of topics we wanted to discuss. Among the more urgent topics were proper specification of OMEMO for group chats, support for encrypting extension elements other than the body, and clarification on how to implement OMEMO without having to use libsignal. While the latter was technically already possible, having clear written documentation on how to do it is very important.

We spent most of the first day discussing several changes, features and problems and later started writing down the solutions we found. In between – in true Düsseldorf fashion – we snacked on some Onigiri and later went for some nice Ramen together. Saturday afternoon we started working in smaller groups on different parts of the specification. I’m amazed by the know-how and technical understanding that people brought to the table!

On the second day I had to leave relatively early after lunchtime due to family commitments, so I could only follow the remaining development of the XEP via git commits on the train.

Apart from further clarification, the updated spec now contains some additional features and tweaks. It is now possible to encrypt nearly arbitrary contents of messages with the help of Stanza Content Encryption. OMEMO now defines its own SCE profile. This enables workflows like fully end-to-end encrypted read markers and reactions. Thanks to Marvin and Klaus, the specification now also contains a section about how to opt out of OMEMO encryption, both completely and on a per-conversation basis. Now you no longer have to manually disable OMEMO for that one contact on EVERY device you own.

The biggest part of the discussions went into properly specifying the cryptographic primitives for use with the Double Ratchet Algorithm. Tim and Andy did a great job of describing how to use hash functions and cipher algorithms, so that it is possible to re-implement OMEMO without having to rely on libsignal alone. Klaus and Marvin figured out some sane rules that help to decide when a device becomes active / inactive / stale. This should preserve the cryptographic guarantees of the encryption even if you don’t use one of your devices for a longer time.

Daniel properly described the workflow of recovering from broken sessions. This should improve OMEMO session stability. He also defined the exact form of OMEMO-related XML elements. One notable feature from a user’s perspective is human-readable labels for identity keys. This should make it easier for you to distinguish keys from one another.

I’m really excited about the changes and can’t wait to see the first implementations in the real world!

One thing that’s left to do for now is to determine a smooth upgrade path. Clients will probably have to use both the new and old OMEMO in parallel for some time, as the changes are not backwards compatible. This would mean that we cannot immediately benefit from stanza content encryption and are bound to body-only encryption for some more time.

How to Implement a XEP for Smack.

Smack is a FLOSS XMPP client library for Java and Android app development. It takes away much of the burden a developer of a chat application would normally have to carry, so the developer can spend more time working on nice stuff like features instead of having to deal with the protocol stack.

Many (80+ and counting) XMPP Extension Protocols (XEPs) are already implemented in Smack. Today I want to bring you along with me and add support for one more.

What Smack does very well is to follow the Open-Closed Principle of software architecture. That means while Smack’s classes are closed for modification by the developer, it is pretty easy to extend Smack to add support for custom features. If Smack doesn’t fit your needs, don’t change it, extend it!

The most important class in Smack is probably the XMPPConnection, as this is where messages come from and go to. However, even more important for the developer is what is being sent.

XMPP’s strength comes from the fact that arbitrary XML elements can be exchanged by clients and servers. Heck, the server doesn’t even have to understand what two clients are sending each other. That means that if you need to send some form of data from one device to another, you can simply use XMPP as the transport protocol, serialize your data as XML elements with a namespace that you control and send it off! It doesn’t matter which XMPP server software you choose, as the server more or less just forwards the data from the sender to the receiver. Awesome!

So let’s see how we can extend Smack to add support for a new feature without changing (and therefore potentially breaking) any existing code!

For this article, I chose XEP-0428: Fallback Indication as an example protocol extension. The goal of Fallback Indication is to explicitly mark <body/> elements in messages as fallback. For example some end-to-end encryption mechanisms might still add a body with an explanation that the message is encrypted, so that older clients that cannot decrypt the message due to lack of support still display the explanation text instead. This enables the user to switch to a better client 😛 Another example would be an emoji in the body as fallback for a reaction.

XEP-0428 does this by adding a fallback element to the message:

<message from="alice@example.org" to="bob@example.net" type="chat">
  <fallback xmlns="urn:xmpp:fallback:0"/>  <!-- THIS HERE -->
  <encrypted xmlns="urn:example:crypto">Rgreavgl vf abg n irel ybat gvzr nccneragyl.</encrypted>
  <body>This message is encrypted.</body>
</message>

If a client or server encounters such an element, it can be certain that the body of the message is intended to be a fallback for legacy clients and act accordingly. So how do we get this feature into Smack?

After the XMPPConnection, the most important types in Smack are the ExtensionElement interface and the ExtensionElementProvider class. The latter defines a class responsible for deserializing or parsing incoming XML into an object of the former type.

The ExtensionElement is itself an empty interface in that it does not declare anything new, but it is composed from a hierarchy of other interfaces from which it inherits some methods. One notable super interface is NamedElement, more on that in just a second. If we start our XEP-0428 implementation by creating a class that implements ExtensionElement, our IDE would create this class body for us:

package tk.jabberhead.blog.wow.nice;

import org.jivesoftware.smack.packet.ExtensionElement;
import org.jivesoftware.smack.packet.XmlEnvironment;

public class FallbackIndicationElement implements ExtensionElement {
    
    @Override
    public String getNamespace() {
        return null;
    }

    @Override
    public String getElementName() {
        return null;
    }

    @Override
    public CharSequence toXML(XmlEnvironment xmlEnvironment) {
        return null;
    }
}

The first thing we should do is to change the return type of the toXML() method to XmlStringBuilder, as that is more performant and gains us a nice API to work with. We could also leave it as is, but it is generally recommended to return an XmlStringBuilder instead of a boring old CharSequence.

Secondly we should take a look at the XEP to identify what to return in getNamespace() and getElementName().

<fallback xmlns="urn:xmpp:fallback:0"/>
[   ^    ]      [        ^          ]
element name          namespace

In XML, the part right after the opening bracket is the element name. The namespace follows as the value of the xmlns attribute. An element that has both an element name and a namespace is called fully qualified. That’s why ExtensionElement inherits from FullyQualifiedElement. In contrast, a NamedElement only has an element name, but no explicit namespace. In good object-oriented manner, Smack’s ExtensionElement inherits from FullyQualifiedElement, which in turn inherits from NamedElement but also introduces the getNamespace() method.
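
If you want to convince yourself where the two values come from, here is a minimal sketch that uses only the JDK's DOM parser (not Smack) to pull the element name and namespace out of the snippet above. The class and method names are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical helper for illustration only; it is not part of Smack.
final class QualifiedNameSketch {

    /** Returns { element name, namespace } of the root element of the given XML. */
    static String[] nameAndNamespace(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Without this flag the JDK parser does not report namespaces at all.
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Element root = builder
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        return new String[] { root.getLocalName(), root.getNamespaceURI() };
    }
}
```

Feeding it the fallback element yields "fallback" as the element name and "urn:xmpp:fallback:0" as the namespace, which are exactly the two strings our ExtensionElement has to return.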

So let’s turn our new knowledge into code!

package tk.jabberhead.blog.wow.nice;

import org.jivesoftware.smack.packet.ExtensionElement;
import org.jivesoftware.smack.packet.XmlEnvironment;
import org.jivesoftware.smack.util.XmlStringBuilder;

public class FallbackIndicationElement implements ExtensionElement {
    
    @Override
    public String getNamespace() {
        return "urn:xmpp:fallback:0";
    }

    @Override
    public String getElementName() {
        return "fallback";
    }

    @Override
    public XmlStringBuilder toXML(XmlEnvironment xmlEnvironment) {
        return null;
    }
}

Hm, now what about this toXML() method? At this point it makes sense to follow good old test driven development practices and create a JUnit test case that verifies the correct serialization of our element.

package tk.jabberhead.blog.wow.nice;

import static org.jivesoftware.smack.test.util.XmlUnitUtils.assertXmlSimilar;

import org.junit.jupiter.api.Test;

// FallbackIndicationElement lives in the same package, so no import is needed.
public class FallbackIndicationElementTest {

    @Test
    public void serializationTest() {
        FallbackIndicationElement element = new FallbackIndicationElement();

        assertXmlSimilar("<fallback xmlns=\"urn:xmpp:fallback:0\"/>",
                element.toXML());
    }
}

Now we can tweak our code until the output of toXML() is just right and we can be sure that if at some point someone starts messing with the code the test will inform us of any breakage. So what now?

Well, we said it is better to use XmlStringBuilder instead of CharSequence, so let’s create an instance. Oh! XmlStringBuilder can take an ExtensionElement as constructor argument! Let’s do it! What happens if we return new XmlStringBuilder(this); and run the test case?

<fallback xmlns="urn:xmpp:fallback:0"

Almost! The test fails, but the builder already constructed most of the element for us. It prints an opening bracket, followed by the element name and adds an xmlns attribute with our namespace as value. This is typically the “head” of any XML element. What it forgot is to close the element. Let’s see… Oh, there’s a closeElement() method that again takes our element as its argument. Let’s try it out!

<fallback xmlns="urn:xmpp:fallback:0"</fallback>

Hm, this doesn’t look right either. It’s not even valid XML! (ノಠ益ಠ)ノ彡┻━┻ Normally you’d use such a sequence to close an element which contained some child elements, but this one is an empty element. Oh, there it is! closeEmptyElement(). Perfect!

<fallback xmlns="urn:xmpp:fallback:0"/>

package tk.jabberhead.blog.wow.nice;

import org.jivesoftware.smack.packet.ExtensionElement;
import org.jivesoftware.smack.packet.XmlEnvironment;
import org.jivesoftware.smack.util.XmlStringBuilder;

public class FallbackIndicationElement implements ExtensionElement {
    
    @Override
    public String getNamespace() {
        return "urn:xmpp:fallback:0";
    }

    @Override
    public String getElementName() {
        return "fallback";
    }

    @Override
    public XmlStringBuilder toXML(XmlEnvironment xmlEnvironment) {
        return new XmlStringBuilder(this).closeEmptyElement();
    }
}
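
To make the difference between the three attempts more tangible, here is a toy builder that reproduces the same outputs. It is a stand-in written purely for this article: it only borrows some method names from Smack’s XmlStringBuilder and says nothing about how the real class is implemented.

```java
// Toy stand-in for illustration; NOT Smack's XmlStringBuilder.
final class TinyXmlBuilder {
    private final StringBuilder sb = new StringBuilder();
    private final String elementName;

    /** Writes the "head" of the element: opening bracket, name and xmlns attribute. */
    TinyXmlBuilder(String elementName, String namespace) {
        this.elementName = elementName;
        sb.append('<').append(elementName)
          .append(" xmlns=\"").append(namespace).append('"');
    }

    /** Ends the head of an element that will contain children or text. */
    TinyXmlBuilder rightAngleBracket() {
        sb.append('>');
        return this;
    }

    /** Writes a closing tag; only valid after the head was ended with '>'. */
    TinyXmlBuilder closeElement() {
        sb.append("</").append(elementName).append('>');
        return this;
    }

    /** Ends the head and closes the element in one go, for empty elements. */
    TinyXmlBuilder closeEmptyElement() {
        sb.append("/>");
        return this;
    }

    @Override
    public String toString() {
        return sb.toString();
    }
}
```

Calling closeElement() directly after the constructor gives the broken output from above, while closeEmptyElement() gives the self-closing element the test expects.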

We can now serialize our ExtensionElement into valid XML! At this point we could start sending around FallbackIndications to all our friends and family by adding it to a message object and sending that off using the XMPPConnection. But what is sending without receiving? For this we need to create an implementation of the ExtensionElementProvider custom to our FallbackIndicationElement. So let’s start.

package tk.jabberhead.blog.wow.nice;

import org.jivesoftware.smack.packet.XmlEnvironment;
import org.jivesoftware.smack.provider.ExtensionElementProvider;
import org.jivesoftware.smack.xml.XmlPullParser;

public class FallbackIndicationElementProvider
        extends ExtensionElementProvider<FallbackIndicationElement> {
    
    @Override
    public FallbackIndicationElement parse(XmlPullParser parser,
            int initialDepth, XmlEnvironment xmlEnvironment) {
        return null;
    }
}

Normally, implementing the deserialization part in the form of an ExtensionElementProvider is tiring enough for me to always do it last, but luckily this is not the case with Fallback Indications. Every FallbackIndicationElement always looks the same. There are no special attributes or – shudder – nested named child elements that need special treatment.

Our implementation of the FallbackIndicationElementProvider looks simply like this:

package tk.jabberhead.blog.wow.nice;

import org.jivesoftware.smack.packet.XmlEnvironment;
import org.jivesoftware.smack.provider.ExtensionElementProvider;
import org.jivesoftware.smack.xml.XmlPullParser;

public class FallbackIndicationElementProvider
        extends ExtensionElementProvider<FallbackIndicationElement> {
    
    @Override
    public FallbackIndicationElement parse(XmlPullParser parser,
            int initialDepth, XmlEnvironment xmlEnvironment) {
        return new FallbackIndicationElement();
    }
}

Very nice! Let’s finish the element part by creating another JUnit test that makes sure that our provider does as it should. Obviously we have done that before writing any code, right? We can simply put this test method into the same test class as the serialization test.

    @Test
    public void deserializationTest()
            throws XmlPullParserException, IOException, SmackParsingException {
        String xml = "<fallback xmlns=\"urn:xmpp:fallback:0\"/>";
        FallbackIndicationElementProvider provider =
                new FallbackIndicationElementProvider();
        XmlPullParser parser = TestUtils.getParser(xml);

        FallbackIndicationElement element = provider.parse(parser);

        // FallbackIndicationElement does not override equals(), so
        // assertEquals() against a fresh instance would compare identities
        // and fail. Check for a successfully parsed element instead.
        assertNotNull(element);
    }

Boom! Working, tested code!

But how does Smack learn about our shiny new FallbackIndicationElementProvider? Internally Smack uses a Manager class to keep track of registered ExtensionElementProviders to choose from when processing incoming XML. Spoiler alert: Smack uses Manager classes for everything!

If we have no way of modifying Smack’s code base, we have to manually register our provider by calling

ProviderManager.addExtensionProvider("fallback", "urn:xmpp:fallback:0",
        new FallbackIndicationElementProvider());

Element providers that are part of Smack’s codebase, however, are registered using a providers.xml file instead, but the concept stays the same.

Now when receiving a stanza containing a fallback indication, Smack will parse said element into an object that we can acquire from the message object by calling

FallbackIndicationElement element = message.getExtension("fallback",
        "urn:xmpp:fallback:0");

You should have noticed by now that the element name and namespace are used and referred to in a number of places, so it makes sense to replace all the occurrences with references to a constant. We will put these into the FallbackIndicationElement where they are easy to find. Additionally we should provide a handy method to extract fallback indication elements from messages.

...

public class FallbackIndicationElement implements ExtensionElement {
    
    public static final String NAMESPACE = "urn:xmpp:fallback:0";
    public static final String ELEMENT = "fallback";

    @Override
    public String getNamespace() {
        return NAMESPACE;
    }

    @Override
    public String getElementName() {
        return ELEMENT;
    }

    ...

    public static FallbackIndicationElement fromMessage(Message message) {
        return message.getExtension(ELEMENT, NAMESPACE);
    }
}

Did I say Smack uses Managers for everything? Where is the FallbackIndicationManager then? Well, let’s create it!

package tk.jabberhead.blog.wow.nice;

import java.util.Map;
import java.util.WeakHashMap;

import org.jivesoftware.smack.Manager;
import org.jivesoftware.smack.XMPPConnection;

public class FallbackIndicationManager extends Manager {

    private static final Map<XMPPConnection, FallbackIndicationManager>
            INSTANCES = new WeakHashMap<>();

    public static synchronized FallbackIndicationManager
            getInstanceFor(XMPPConnection connection) {
        FallbackIndicationManager manager = INSTANCES.get(connection);
        if (manager == null) {
            manager = new FallbackIndicationManager(connection);
            INSTANCES.put(connection, manager);
        }
        return manager;
    }

    private FallbackIndicationManager(XMPPConnection connection) {
        super(connection);
    }
}

Woah, what happened here? Let me explain.

Smack uses Managers to provide the user (the developer of an application) with easy access to the functionality they expect. In order to use some feature, the first thing the user does is to acquire an instance of the respective Manager class for their XMPPConnection. The returned instance is unique for the provided connection, meaning a different connection would get a different instance of the manager class, but the same connection will get the same instance any time getInstanceFor(connection) is called.
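
The "one instance per connection" idea can be tried out without Smack at all. The following sketch uses a plain Object as a stand-in for the connection; the class name is hypothetical and the code only demonstrates the lookup pattern, not Smack’s Manager base class.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Stand-alone sketch of the per-connection instance pattern (not Smack code).
final class PerConnectionManagerSketch {

    // Weak keys: an entry can disappear once nobody else references the connection.
    private static final Map<Object, PerConnectionManagerSketch> INSTANCES =
            new WeakHashMap<>();

    // Held weakly on purpose. A strong reference from the map's value back to
    // its key would keep the entry alive forever.
    private final WeakReference<Object> connection;

    private PerConnectionManagerSketch(Object connection) {
        this.connection = new WeakReference<>(connection);
    }

    /** Returns the manager for this connection, creating it on first use. */
    static synchronized PerConnectionManagerSketch getInstanceFor(Object connection) {
        return INSTANCES.computeIfAbsent(connection, PerConnectionManagerSketch::new);
    }

    /** May return null if the connection has already been garbage collected. */
    Object connection() {
        return connection.get();
    }
}
```

Asking twice for the same connection object returns the identical manager, while a second connection object gets its own.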

Now what does the user expect from the API we are designing? Probably being able to send fallback indications and being notified whenever we receive one. Let’s do sending first!

    ...

    private FallbackIndicationManager(XMPPConnection connection) {
        super(connection);
    }

    public MessageBuilder addFallbackIndicationToMessage(
            MessageBuilder message, String fallbackBody) {
        return message.setBody(fallbackBody)
                .addExtension(new FallbackIndicationElement());
    }

Easy!

Now, in order to listen for incoming fallback indications, we have to somehow tell Smack to notify us whenever a FallbackIndicationElement comes in. Luckily there is a rather nice way of doing this.

    ...

    private FallbackIndicationManager(XMPPConnection connection) {
        super(connection);
        registerStanzaListener();
    }

    private void registerStanzaListener() {
        StanzaFilter filter = new AndFilter(StanzaTypeFilter.MESSAGE, 
                new StanzaExtensionFilter(FallbackIndicationElement.ELEMENT, 
                        FallbackIndicationElement.NAMESPACE));
        connection().addAsyncStanzaListener(stanzaListener, filter);
    }

    private final StanzaListener stanzaListener = new StanzaListener() {
        @Override
        public void processStanza(Stanza packet)
                throws SmackException.NotConnectedException, InterruptedException,
                SmackException.NotLoggedInException {
            Message message = (Message) packet;
            FallbackIndicationElement fallbackIndicator =
                    FallbackIndicationElement.fromMessage(message);
            String fallbackBody = message.getBody();
            onFallbackIndicationReceived(message, fallbackIndicator,
                    fallbackBody);
        }
    };

    private void onFallbackIndicationReceived(Message message,
            FallbackIndicationElement fallbackIndicator, String fallbackBody) {
        // do something, eg. notify registered listeners etc.
    }
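
What "notify registered listeners" could look like is sketched below. This is one possible design, not what Smack ships: the FallbackIndicationListener interface and the registry class are invented for this example, and the listener only receives the fallback body to keep the sketch free of Smack types.

```java
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

// Hypothetical listener bookkeeping that a manager could delegate to.
final class FallbackListenerRegistry {

    /** Callback interface an application would implement (assumed name). */
    interface FallbackIndicationListener {
        void onFallbackIndicationReceived(String fallbackBody);
    }

    // Copy-on-write so that listeners can be added or removed while another
    // thread is in the middle of notifying.
    private final Set<FallbackIndicationListener> listeners =
            new CopyOnWriteArraySet<>();

    boolean addListener(FallbackIndicationListener listener) {
        return listeners.add(listener);
    }

    boolean removeListener(FallbackIndicationListener listener) {
        return listeners.remove(listener);
    }

    /** Would be called from onFallbackIndicationReceived() in the manager. */
    void notifyListeners(String fallbackBody) {
        for (FallbackIndicationListener listener : listeners) {
            listener.onFallbackIndicationReceived(fallbackBody);
        }
    }
}
```

The manager would then expose addListener() and removeListener() to the application and call notifyListeners() from its stanza listener.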

Now that’s nearly it. One last, very important thing is left to do. XMPP is known for its extensibility (for better or worse). If your client supports some feature, it is a good idea to announce this somehow, so that the other end knows about it. That way features can be negotiated so that the sender doesn’t try to use some feature that the other client doesn’t support.

Features are announced by using XEP-0115: Entity Capabilities, which is based on XEP-0030: Service Discovery. Smack supports this using the ServiceDiscoveryManager. We can announce support for Fallback Indications by letting our manager call

ServiceDiscoveryManager.getInstanceFor(connection)
        .addFeature(FallbackIndicationElement.NAMESPACE);

somewhere, for example in its constructor. Now the world knows that we know what Fallback Indications are. We should however also provide our users with the possibility to check if their contacts support that feature as well! So let’s add a method for that to our manager!

    public boolean userSupportsFallbackIndications(EntityBareJid jid)
            throws XMPPException.XMPPErrorException,
            SmackException.NotConnectedException, InterruptedException,
            SmackException.NoResponseException {
        return ServiceDiscoveryManager.getInstanceFor(connection())
                .supportsFeature(jid, FallbackIndicationElement.NAMESPACE);
    }

Done!

I hope this little article brought you some insights into the XMPP protocol and especially into the development process of protocol libraries such as Smack, even though the demonstrated feature was not very spectacular.

Quick reminder that the next Google Summer of Code is coming soon and the XMPP Standards Foundation got accepted 😉
Check out the project ideas page!

Happy Hacking!

A look at Matrix.org’s OLM | MEGOLM encryption protocol

Everyone who knows and uses XMPP is probably aware of a new player in the game. Matrix.org is often recommended as a young, up-and-coming alternative to the aging protocol behind the Jabber ecosystem. However, the founders do not see their product as a direct competitor to XMPP, as their approach to the problem of message exchange is quite different.

An open network for secure, decentralized communication.

matrix.org

During his talk at the FOSDEM in Brussels, matrix.org founder Matthew Hodgson roughly compared the concept of matrix to how git works. Instead of passing single messages between devices and servers, matrix is all about synchronization of a shared state. A chat room can be seen as a repository, which is shared between all servers of the participants. As a consequence communication in a chat room can go on, even when the server on which the room was created goes down, as the room simultaneously exists on all the other servers. Once the failed server comes back online, it synchronizes its state with the others and retrieves missed messages.

Matrix in the French State

Olm, Megolm – What’s the deal?

Matrix introduced two different crypto protocols for end-to-end encryption. One is named Olm, which is used in one-to-one chats between two chat partners (this is not quite correct, see Updates for clarifying remarks). It can very well be compared to OMEMO, as it too is an adaptation of the Signal Protocol by OpenWhisperSystems. However, due to some differences in the implementation, Olm is not compatible with OMEMO, although it shares the same cryptographic properties.

The other protocol goes by the name of Megolm and is used in group chats. Conceptually it deviates quite a bit from Olm and OMEMO, as it contains some modifications that make it more suitable for the multi-device use-case. However, those modifications alter its cryptographic properties.

Comparing Cryptographic Building Blocks

Protocol                   | Olm                         | OMEMO (Signal)
IdentityKey                | Curve25519                  | X25519
FingerprintKey⁽¹⁾          | Ed25519                     | none
PreKeys                    | Curve25519                  | X25519
SignedPreKeys⁽²⁾           | none                        | X25519
Key Exchange Algorithm⁽³⁾  | Triple Diffie-Hellman (3DH) | Extended Triple Diffie-Hellman (X3DH)
Ratcheting Algorithm       | Double Ratchet              | Double Ratchet

  1. Signal uses a Curve X25519 IdentityKey, which is capable of both encrypting, as well as creating signatures using the XEdDSA signature scheme. Therefore no separate FingerprintKey is needed. Instead the fingerprint is derived from the IdentityKey. This is mostly a cosmetic difference, as one less key pair is required.
  2. Olm does not distinguish between the concepts of signed and unsigned PreKeys like the Signal protocol does. Instead it only uses one type of PreKey. However, those may be signed with the FingerprintKey upon upload to the server.
  3. OMEMO includes the SignedPreKey, as well as an unsigned PreKey in the handshake, while Olm only uses one PreKey. As a consequence, if the sender’s Olm IdentityKey gets compromised at some point, the very first few messages that are sent could possibly be decrypted.
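
To illustrate the difference between the two handshakes from footnote 3, here is a rough sketch using the X25519 key agreement that ships with Java 11 and later. It follows my reading of the Olm and X3DH specifications and leaves out a lot: there is no signature check on the signed PreKey and no key derivation function on top of the concatenated secrets. All class and method names are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import javax.crypto.KeyAgreement;

// Simplified handshake sketch; do not use as a real implementation.
final class TripleDhSketch {

    static KeyPair newKeyPair() throws GeneralSecurityException {
        return KeyPairGenerator.getInstance("X25519").generateKeyPair();
    }

    /** A single Diffie-Hellman operation between our private and their public key. */
    static byte[] dh(PrivateKey ours, PublicKey theirs) throws GeneralSecurityException {
        KeyAgreement agreement = KeyAgreement.getInstance("X25519");
        agreement.init(ours);
        agreement.doPhase(theirs, true);
        return agreement.generateSecret();
    }

    static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : parts) {
            out.write(part, 0, part.length);
        }
        return out.toByteArray();
    }

    /** 3DH as seen by the initiator: three DH results over one PreKey of the responder. */
    static byte[] initiator3dh(KeyPair identityA, KeyPair ephemeralA,
            PublicKey identityB, PublicKey preKeyB) throws GeneralSecurityException {
        return concat(
                dh(identityA.getPrivate(), preKeyB),
                dh(ephemeralA.getPrivate(), identityB),
                dh(ephemeralA.getPrivate(), preKeyB));
    }

    /** The responder computes the same three values from their side. */
    static byte[] responder3dh(KeyPair identityB, KeyPair preKeyB,
            PublicKey identityA, PublicKey ephemeralA) throws GeneralSecurityException {
        return concat(
                dh(preKeyB.getPrivate(), identityA),
                dh(identityB.getPrivate(), ephemeralA),
                dh(preKeyB.getPrivate(), ephemeralA));
    }

    /** X3DH: 3DH over the signed PreKey plus a fourth DH with a one-time PreKey. */
    static byte[] initiatorX3dh(KeyPair identityA, KeyPair ephemeralA, PublicKey identityB,
            PublicKey signedPreKeyB, PublicKey oneTimePreKeyB) throws GeneralSecurityException {
        return concat(
                initiator3dh(identityA, ephemeralA, identityB, signedPreKeyB),
                dh(ephemeralA.getPrivate(), oneTimePreKeyB));
    }

    static byte[] responderX3dh(KeyPair identityB, KeyPair signedPreKeyB, KeyPair oneTimePreKeyB,
            PublicKey identityA, PublicKey ephemeralA) throws GeneralSecurityException {
        return concat(
                responder3dh(identityB, signedPreKeyB, identityA, ephemeralA),
                dh(oneTimePreKeyB.getPrivate(), ephemeralA));
    }
}
```

Both sides end up with the same byte string, which a real implementation would feed into a key derivation function to seed the Double Ratchet.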

In the end Olm and OMEMO are pretty comparable, apart from some simplifications made in the Olm protocol. Those only marginally affect its security though (as far as I can tell as a layman).

Megolm

The similarities between OMEMO and Matrix’ encryption solution end when it comes to group chat encryption.

OMEMO does not treat chats with more than two parties any differently from one-to-one chats. The sender simply has to manage a lot more keys, and the number of required trust decisions grows by a factor roughly equal to the number of chat participants.

Yep, this is a mess but luckily XMPP isn’t a very popular chat protocol so there are no large encrypted group chats ;P

So how does Matrix solve the issue?

When a user joins a group chat, they generate a session for that chat. This session consists of an Ed25519 SigningKey and a single ratchet which gets initialized randomly.

The public part of the signing key and the state of the ratchet are then shared with each participant of the group chat. This is done via an encrypted channel (using Olm encryption). Note, that this session is also shared between the devices of the user. Contrary to Olm, where every device has its own Olm session, there is only one Megolm session per user per group chat.

Whenever the user sends a message, the encryption key is generated by forwarding the ratchet and deriving a symmetric encryption key for the message from the ratchet’s output. Signing is done using the SigningKey.

Recipients of the message can decrypt it by forwarding their copy of the sender’s ratchet the same way the sender did, in order to retrieve the same encryption key. The signature is verified using the public SigningKey of the sender.
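
The ratchet forwarding can be sketched in a few lines. Beware that this is a heavily simplified single hash ratchet, not the actual Megolm ratchet (which, per its specification, consists of several parts and uses a proper key derivation step). The labels "message-key" and "advance" are arbitrary choices for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Simplified hash ratchet for illustration; not the real Megolm ratchet.
final class HashRatchetSketch {

    private byte[] state;

    HashRatchetSketch(byte[] initialState) {
        this.state = initialState.clone();
    }

    private static byte[] hmac(byte[] key, String label) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(label.getBytes(StandardCharsets.UTF_8));
    }

    /** Derives the key for the current message, then forwards the ratchet one step. */
    byte[] nextMessageKey() throws GeneralSecurityException {
        byte[] messageKey = hmac(state, "message-key");
        // The old state is overwritten. Going forward is easy, going back
        // would require inverting the hash function.
        state = hmac(state, "advance");
        return messageKey;
    }

    /** Whoever holds a copy of the state can derive every following key. */
    HashRatchetSketch copy() {
        return new HashRatchetSketch(state);
    }
}
```

A recipient who was handed the initial state derives the same key sequence as the sender. The copy() method shows the flip side: anyone who obtains the state at some point can keep forwarding it and derive all later keys as well, since no new randomness enters the ratchet.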

There are some pros and cons to this approach, which I briefly want to address.

First of all, you may find that this protocol is way less elegant than Olm/OMEMO/Signal. It poses some obvious limitations and security issues. Most importantly, if an attacker gets access to the ratchet state of a user, they could decrypt any message that is sent from that point in time on. As there is no new randomness introduced, as is the case in the other protocols, the attacker can gain access by simply forwarding the ratchet, thereby generating any decryption keys they need. The protocol defends against this by requiring the user to generate a new random session whenever a new user joins/leaves the room and/or a certain number of messages has been sent, whereby the window of possibly compromised messages gets limited to a smaller number. Still, this is equivalent to having a single key that decrypts multiple messages at once.

The Megolm specification lists a number of other caveats.

On the pro side of things, trust management has been simplified as the user basically just has to decide whether or not to trust each group member instead of each participating device – reducing the complexity from a multiple of n down to just n. Also, since there is no new randomness being introduced during ratchet forwarding, messages can be decrypted multiple times. As an effect devices do not need to store the decrypted messages. Knowledge of the session state(s) is sufficient to retrieve the message contents over and over again.

By sharing older session states with own devices it is also possible to read older messages on new devices. This is a feature that many users are missing badly from OMEMO.

On the other hand, if you really need true future secrecy on a message-by-message basis and you cannot risk that an attacker may get access to more than one message at a time, you are probably better off swallowing the bitter pill, going through the fingerprint mess and sticking to normal Olm/OMEMO (see Updates for remarks on this statement).

Note: End-to-end encryption does not really make sense in big, especially public chat rooms, since an attacker could just simply join the room in order to get access to ongoing communication. Thanks to Florian Schmaus for pointing that out.

I hope I could give a good overview of the different encryption mechanisms in XMPP and Matrix. Hopefully I did not make any errors, but if you find mistakes, please let me know, so I can correct them asap 🙂

Happy Hacking!

Updates:

Thanks to Matthew Hodgson for pointing out that Olm/OMEMO is also effectively using a symmetric ratchet when multiple consecutive messages are sent without the receiving device sending an answer. This can lead to loss of future secrecy as discussed in the OMEMO protocol audit.

Also thanks to Hubert Chathi for noting that Megolm is also used in one-to-one chats, as Matrix doesn’t have the same distinction between group and single chats. He also pointed out that the security level of Megolm (the criteria for regenerating the session) can be configured on a per-chat basis.

Unified Encrypted Payload Elements for XMPP

Requirements on encryption change from time to time. New technologies pop up and crypto protocols get replaced by new ones. There are also different use-cases that require different encryption techniques.

For that reason there are a number of encryption protocols specified for XMPP, amongst them OMEMO and OpenPGP for XMPP.

Most crypto protocols have in common that they aim at encrypting certain parts of the message that is being sent, so that only the recipient(s) can read the encrypted content.

OMEMO is currently only capable of encrypting the message’s body. For that reason the body of the message is encrypted and stored in a <payload/> element, which is added to the message. This is inconvenient, as it makes OMEMO quite inflexible. The protocol cannot be used to secure arbitrary extension elements, which might contain sensitive content as well.

<message to='juliet@capulet.lit' from='romeo@montague.lit' id='send1'>
  <encrypted xmlns='eu.siacs.conversations.axolotl'>
    <header>...</header>
    <!-- the payload contains the encrypted content of the body -->
    <payload>BASE64ENCODED</payload>
  </encrypted>
</message>

The modern OpenPGP for XMPP XEP also uses <payload/> elements, but to transport arbitrary extension elements. The difference is that in OpenPGP, the payload elements contain the actual payload as plaintext. Those <payload/> elements are embedded in either a <crypt/> or <signcrypt/> element, depending on whether or not the message will be signed, and then passed through OpenPGP encryption. The resulting ciphertext is then appended to the message element in the form of an <openpgp/> element.

<signcrypt xmlns='urn:xmpp:openpgp:0'>
  <to jid='juliet@example.org'/>
  <time stamp='...'/>
  <rpad>...</rpad>
  <payload>
    <body xmlns='jabber:client'>
      This is a secret message.
    </body>
  </payload>
</signcrypt>

<!-- The above element is passed to OpenPGP and the resulting ciphertext is included in the actual message as an <openpgp/> element -->

<message to='juliet@example.org'>
  <openpgp xmlns='urn:xmpp:openpgp:0'>
    BASE64_OPENPGP_MESSAGE
  </openpgp>
</message>

Upon receiving a message containing an <openpgp/> element, the receiver decrypts its content, does some verification checks and then replaces the <openpgp/> element of the message with the extension elements contained in the <payload/> element. That way the original, unencrypted message is reconstructed.
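
The replacement step on the receiving side could look roughly like the following sketch, which uses the JDK's DOM API. It assumes that decryption has already happened and that the decrypted <signcrypt/> element is available as XML. The checks on the recipient and timestamp are left out, and the class and method names are invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Sketch of the "unwrap the payload" step only; decryption is out of scope.
final class PayloadUnwrapSketch {

    static final String OPENPGP_NS = "urn:xmpp:openpgp:0";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /**
     * Replaces the openpgp child of the message with the extension elements
     * found inside the payload of the already decrypted content element.
     */
    static Element unwrap(Element message, Element decryptedContent) {
        Node openpgp = message.getElementsByTagNameNS(OPENPGP_NS, "openpgp").item(0);
        Node payload = decryptedContent.getElementsByTagNameNS(OPENPGP_NS, "payload").item(0);

        NodeList children = payload.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                // importNode() copies the element into the message's document.
                Node copy = message.getOwnerDocument().importNode(child, true);
                message.insertBefore(copy, openpgp);
            }
        }
        message.removeChild(openpgp);
        return message;
    }
}
```

After unwrap() the message no longer contains the <openpgp/> element but carries the <body/> (or any other extension elements) that were hidden in the payload.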

The benefit of this technique is that the <payload/> element can in fact contain any number of arbitrary extension elements. This makes OpenPGP for XMPPs take on encrypting message content way more flexible.

A logical next step would be to take OpenPGP for XMPPs <payload/> elements and move them to a new XEP, which specifies their use in a unified way. This can then be used by OMEMO and any other encryption protocol as well.

The motivation behind this is that it would broaden the scope of encryption to cover more parts of the message, like read markers and other metadata.

It could also become easier to implement end-to-end encryption in other scenarios such as Jingle file transfer. Even though there is Jingle Encrypted Transports, this protocol only protects the stream itself and leaves the metadata such as filename, size etc. in the clear. A unified <encrypted/> element would make it easier to encrypt such metadata and could be the better approach to the problem.

Future of OMEMO

OMEMO is an XMPP extension protocol which specifies end-to-end encryption for XMPP clients using the double ratchet algorithm of the Signal protocol. Introduced back in 2015 by GSoC student Andreas Straub in the Conversations client, OMEMO has had a lot of press coverage, and many privacy- and security-oriented websites praise XMPP clients that support it. It’s beyond debate that OMEMO brought many new faces to XMPP. For many users, having end-to-end encryption built into their chat client is a must. Today OMEMO is implemented in a range of clients on different platforms. While Conversations, ChatSecure and Dino support it out of the box, there is a series of plugins that add OMEMO to other clients such as Gajim, Pidgin and Miranda NG.

However, there is quite a lot of controversy around OMEMO. Some of it is technical, some of it is more or less political in nature. Let me list some of the issues for you.

Some users and client developers see no value in OMEMO’s forward secrecy (the fact that messages can only be decrypted once per device, so new devices do not have access to the chat history of the user). That is a fair point. Web clients in particular have a hard time implementing OMEMO in a sensible way. The average user also probably has a hard time understanding what exactly forward secrecy is and what its consequences are. Communicating to the user that not having access to past messages is actually a feature might be a hard task for a client developer.

OMEMO’s trust management (still) sucks. One key architectural feature of OMEMO is that every device has its own identity key. If a user wants to send a message to one of their contacts, they’re presented with a list of all of that contact’s identity keys. Now they have to decide which keys to trust and which not by comparing fingerprints (seemingly arbitrary strings of 64 characters). This is not a very comfortable thing to do. Some clients encourage users to verify their contacts’ devices by scanning QR codes, which is far more comfortable. Users do, however, have to meet up in person or share the QR code over another channel.
But what if you get a new device or just reinstall your chat application? Suddenly all your contacts have to decide whether to trust your new fingerprint or not. In the long run this will lead to users just being annoyed and blindly accepting new fingerprints, ruining the concept of end-to-end encryption.

Daniel Gultsch introduced the concept of BTBV (Blind Trust Before Verification), which can be summed up as “do not bother the user with encryption and hope everything goes well until the user explicitly states that they are interested in having good security”. The principle is that clients blindly trust any OMEMO identity key until the user commits to verifying keys manually. This makes OMEMO easy to use for those who don’t really care about it, while offering serious users who depend on it the security they need.
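
The BTBV decision rule fits in a few lines. The following Python sketch is only my reading of the principle, not code from any client: the class ContactTrust and its methods are hypothetical, and it assumes that as soon as the user has verified at least one key of a contact, unverified keys of that contact are no longer trusted.

```python
from dataclasses import dataclass, field


@dataclass
class ContactTrust:
    """Hypothetical per-contact trust state for Blind Trust Before Verification."""

    # Fingerprints the user has verified manually (e.g. via QR code).
    verified: set[str] = field(default_factory=set)

    def verify(self, fingerprint: str) -> None:
        """Record that the user verified this fingerprint themselves."""
        self.verified.add(fingerprint)

    def is_trusted(self, fingerprint: str) -> bool:
        """Decide whether to encrypt to the device with this fingerprint."""
        if not self.verified:
            # Blind trust: the user never verified anything for this contact.
            return True
        # Assumption: after the first verification, only verified keys count.
        return fingerprint in self.verified
```

A contact without any verified keys behaves exactly as if trust management did not exist, which is the whole point for users who do not care about it.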

But what do you do if a rogue fingerprint suddenly appears? Do you panic and message all your contacts not to trust the stranger’s key? In theory any device which has access to the user’s account (and the user’s server too) can just add another OMEMO identity key to the user’s list of devices, and there is not really anything the user can do about it. OMEMO does not have a “blacklist” feature or a signed list of trusted keys. It would, however, be possible to implement such a thing in the future, for example by combining OMEMO with OpenPGP. Of course, if some stranger has access to your account, it is time to change the password/server anyway.

Another weakness of OMEMO is that it is currently only usable for encrypting the message body. Since XMPP is an extensible protocol with use cases other than messaging, it would be nice to have support for encrypting arbitrary extension elements. There is, however, the extension protocol “OX” (XEP-0373: OpenPGP for XMPP), which has such capabilities. This feature can be extracted from OX and reused in OMEMO relatively easily.

Let’s now focus on the “political” controversies around OMEMO.

In 2016/2017 there was a lot of discussion about whether OMEMO should become a standard in the first place. The problem is that in its current form (which has not really changed since its introduction), OMEMO depends on the wire format used by libsignal (the Signal protocol library used by Conversations). That library, however, is licensed under the GPLv3, which prevents permissively licensed and closed source applications from implementing OMEMO. While the Signal protocol itself is openly documented, the wire format used by libsignal is not. Any implementation that wants to be compatible with current OMEMO clients must therefore implement the same wire format by looking into the libsignal source code, which in turn makes the implementation a derivative of libsignal that must be licensed under the GPL as well. There was a pull request against the OMEMO XEP which addressed this issue by specifying an independent wire format for OMEMO; however, that pull request was more or less rejected due to inactivity of the author.

During the heated debates around OMEMO, it was proposed to base the protocol on Olm instead of the Signal protocol. Olm is the encryption protocol used by matrix.org. However, up to this point there is no Olm-based OMEMO implementation that I know of, nor have there been any experiments in that direction from the people who proposed the idea (again – not that I know of).

Another idea was to completely redesign OMEMO’s double ratchet specification from the ground up as OMEMO-NEXT, without even mentioning a specific library as the foundation. This would obviously be a far more complex XEP, as all the cryptographic operations and primitives which are currently abstracted and hidden away behind the Signal protocol would have to be written down in that XEP. However, the IETF recently announced that work is under way on MLS (Messaging Layer Security), which does exactly that. It specifies a completely open version of the double ratchet algorithm along with infrastructure to share key material and so on. I’m not sure whether this is a coincidence, or if some of those who proposed OMEMO-NEXT are directly involved with the development of MLS. We’ll see when MLS is ready and whether it will compete against OMEMO. I’d really love to see a cross-protocol encryption algorithm btw 😉 #bridges #federateEverything

Now let’s talk about the biggest problem of OMEMO. Currently the XEP does not have an active “legal guardian”. The author has been inactive for an extended period of time, ignoring requests for comments, which has brought the development of the standard to a total halt (making changes to the XEP requires the author’s approval). Things like specifying a new wire protocol are possible and reasonably easy to do. However, not having changes written down in the XEP makes it nearly impossible to make them in a coordinated way. I’m sure there is a ton of potential for OMEMO, and it is sad to see its protocol development having come to a halt.

I’m sure that many of its current issues can be addressed by changes to the XEP. This is what I think needs to be done to OMEMO to make it more usable:

  • Specify a new wire protocol: This could make OMEMO accessible for commercial applications and allow independent implementations, leading to broader deployment.
  • Specify a general payload encryption scheme: This could benefit other encryption protocols as well and would make it possible to apply end-to-end encryption to a wider variety of use cases.
  • Reuse the payload encryption scheme in OMEMO: Utilize OMEMO for things besides body encryption.
  • Specify a way to sign device lists with “persistent key” algorithms like OpenPGP: This could simplify trust management.
  • Specify a way to back up the identity key: This could reduce the identity key chaos, since the key could be reused on new devices/installations. However, clients would have to make it clear to the user not to use the same key on multiple devices.

These have been my thoughts on OMEMO’s current state. What do you think about it?

Happy Hacking!