Install Jitsi-Meet alongside ejabberd

Image of an office conference room with empty chairs

Since the corona virus is forcing many of us into home office there is a high demand for video conference solutions. A popular free and open source tool for creating video conferences similar to Google’s hangouts is Jitsi Meet. It enables you to create a conference room from within your browser for which you can then share a link to your coworkers. No client software is needed at all (except mobile devices).

The installation of Jitsi Meet is super straight forward – if you have a dedicated server sitting around. Simply add the jitsi repository to your package manager and (in case of debian based systems) type

sudo apt-get install jitsi-meet

The installer will guide you through most of the process (setting up nginx / apache, installing dependencies, even do the letsencrypt setup) and in the end you can start video calling! The quick start guide does a better job explaining this than I do.

Jitsi Meet is a suite of different components that all play together (see Jitsi Meet manual). Part of the mix is a prosody XMPP server that is used for signalling. That means if you want to have the simple easy setup experience, your server must not already run another XMPP server. Otherwise you’ll have to do some manual configuration ahead of you.

I did that.

Since I already run a personal ejabberd XMPP server and don’t have any virtualization tools at hands, I wanted to make jitsi-meet use ejabberd instead of prosody. In the end both should be equally suited for the job.

Looking at the prosody configuration file that comes with Jitsi’s bundled prosody we can see that Jitsi Meet requires the XMPP server to serve two different virtual hosts.
The file is located under /etc/prosody/conf.d/meet.example.org.cfg.lua

VirtualHost "meet.example.org"
        authentication = "anonymous"
        ssl = {
                ...
        }
        modules_enabled = {
            "bosh";
            "pubsub";
            "ping";
        }
        c2s_require_encryption = false

Component "conference.meet.example.org" "muc"
    storage = "memory"
admins = { "focus@auth.meet.example.org" }

Component "jitsi-videobridge.meet.example.org"
    component_secret = "SECRET1"

VirtualHost "auth.meet.example.org"
    ssl = {
        ...
    }
    authentication = "internal_plain"

Component "focus.meet.example.org"
    component_secret = "SECRET2"

Remember to replace SECRET1 and SECRET2 with secure secrets! There are also some external components that need to be configured. This is where Jitsi Meet plugs into the XMPP server.

In my case I don’t want to server 3 virtual hosts with my ejabberd, so I decided to replace auth.meet.jabberhead.tk with my already existing main domain jabberhead.tk which already uses internal authentication. So all I had to do is to add the virtual host meet.jabberhead.tk to my ejabberd.yml and configure it to use anonymous authentication.
The ejabberd config file is located under /etc/ejabberd/ejabberd.yml or /opt/ejabberd/conf/ejabberd.yml depending on your ejabberd distribution.

hosts:
    ## serves as main host, as well as auth.meet.jabberhead.tk for focus user
  - "jabberhead.tk"
    ## serves as anonymous authentication host for meet.jabberhead.tk
  - "meet.jabberhead.tk"
...
host_config:
  meet.jabberhead.tk:
    auth_method: anonymous
    allow_multiple_connections: true
    anonymous_protocol: both

The syntax for external components is quite different for ejabberd than it is for prosody, so it took me some time to get it working.

listen:
 -
    port: 5280
    ip: "::"
    module: ejabberd_http
    request_handlers:
      ## Not sure if this is needed, but by default jitsi-meet uses http-bind for bosh
      "/http-bind": mod_bosh
      "/bosh": mod_bosh
    tls: true
    protocol_options: 'TLS_OPTIONS'
  -
    port: 5275
    ip: "::"
    module: ejabberd_service
    access: all
    shaper: fast
    hosts:
      "jitsi-videobridge.jabberhead.tk":
        password: "SECRET1"
  -
    port: 5347
    module: ejabberd_service
    hosts:
      "focus.jabberhead.tk":
        password: "SECRET2"

By re-reading the config files now, I wonder why I ended up placing the focus component under the host focus.jabberhead.tk and not focus.meet.jabberhead.tk, but hey – it works and I’m too scared to touch it again 😛

The configuration of the modules was a bit trickier on ejabberd, as the ejabberd config syntax seems to disallow duplicate entries. In this case I had to configure mod_muc for my main domain different than for the meet.jabberhead.tk domain, so I had to move the original mod_muc and mod_muc_admin configuration out of the modules: block and into an append_host_config: block with different settings per domain.

append_host_config:
  jabberhead.tk:
    modules:
        ## This is the original muc configuration I used before
        mod_muc:
          access:
            - allow
          access_admin:
            - allow: admin
          access_create: muc_create
          access_persistent: muc_create
          access_mam:
            - allow
          default_room_options:
            allow_private_messages: true
            mam: true
            persistent: true
        mod_muc_admin: {}
  meet.jabberhead.tk:
    modules:
      ## This is the config only for meet.jabberhead.tk
      mod_muc:
        host: conference.meet.jabberhead.tk
      mod_muc_admin: {}

mod_pubsub, mod_ping and mod_bosh all have to be enabled, but can stay in the global modules: block.

Last but not least we have to add the focus user as an admin and also generate (not discussed here) and add certificates for the meet.jabberhead.tk subdomain.

certfiles:
  - ...
  - "/etc/ssl/meet.jabberhead.tk/cert.pem"
  - "/etc/ssl/meet.jabberhead.tk/fullchain.pem"
  - "/etc/ssl/meet.jabberhead.tk/privkey.pem"
...
acl:
  admin:
    user:
      - "focus@jabberhead.tk"

That’s it for the ejabberd configuration. Now we have to configure the other Jitsi Meet components. Lets start with jicofo, the Jitsi Conference Focus component.

My /etc/jitsi/jicofo/config file looks as follows.

JICOFO_HOST=jabberhead.tk
JICOFO_HOSTNAME=jabberhead.tk
JICOFO_SECRET=SECRET2
JICOFO_PORT=5347
JICOFO_AUTH_DOMAIN=jabberhead.tk
JICOFO_AUTH_USER=focus
JICOFO_AUTH_PASSWORD=SECRET3
JICOFO_OPTS=""
# Below can be left as is.
JAVA_SYS_PROPS=...

Respectively the videobridge configuration (/etc/jitsi/videobridge/config) looks like this:

JVB_HOSTNAME=jabberhead.tk
JVB_HOST=localhost
JVB_PORT=5275
JVB_SECRET=SECRET1
## Leave below as originally was
JAVA_SYS_PROPS=...

Some changes had to be made to /etc/jitsi/videobridge/sip-communicator.properties:

org.jitsi.videobridge.AUTHORIZED_SOURCE_REGEXP=focus@jabberhead.tk/.*
org.ice4j.ice.harvest.NAT_HARVESTER_LOCAL_ADDRESS=<IP-OF-YOUR-SERVER>
org.ice4j.ice.harvest.NAT_HARVESTER_PUBLIC_ADDRESS=<IP-OF-YOUR-SERVER>
org.jitsi.videobridge.TCP_HARVESTER_PORT=4443

Now we can wire it all together by modifying the Jitsi Meet config file found under /etc/jitsi/meet/meet.example.org-config.js:

var config = {
    hosts: {
        domain: 'jabberhead.tk',
        anonymousdomain: 'meet.jabberhead.tk',
        authdomain: 'jabberhead.tk',
        bridge: 'jitsi-videobridge.meet.jabberhead.tk',
        focus: 'focus.jabberhead.tk',
        muc: 'conference.meet.jabberhead.tk'
    },
    bosh: '//meet.jabberhead.tk/http-bind',
    clientNode: 'http://jitsi.org/jitsimeet',
    focusUserJid: 'focus@jabberhead.tk',

    testing: {
    ...
    }
...
}

Last but not least my nginx host configuration (/etc/nginx/sites-available/meet.example.org.conf) where all I changed was the bosh configuration (went from http to https, I heard people say this is not necessary though).

server {
    listen 80;
    server_name meet.jabberhead.tk;
    return 301 https://$host$request_uri;
}
server {
    listen 443 ssl;
    server_name meet.jabberhead.tk;
    ...
    # BOSH
    location = /http-bind {
        proxy_pass      https://localhost:5280/http-bind;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header Host $http_host;
    }
    ...
}

Finally of course, I also had to register the focus user as an XMPP account:

ejabberdctl register focus jabberhead.tk SECRET3

Remember to use a safe password instead of SECRET3 and also stop and disable the bundled prosody! That’s it!

I hope this lowers the bar for some to deploy Jitsi Meet next to their already existing ejabberd. Lastly please do not ask me for support, as I barely managed to get this working for myself 😛

OMEMO Specification Sprint

Photo of an orange and white clown fish inside an anemone

The past weekend some members of the XMPP community gathered in Düsseldorf to work on the next iteration of the OMEMO Encryption Specification. All of us agree that the result – version 0.4 of XEP-0384 – is a huge step forward and better than ever!

On Saturday morning we met up at the Chaosdorf, a local Hacker Space who’s members kindly hosted the sprint. Huge thanks to them for having us!

Prior to the sprint we had collected a list of bullet points of topics we wanted to discuss. Among the more urging topics was proper specification of OMEMO for group chats, support for encrypting extension elements other than the body, as well as clarification on how to implement OMEMO without having to use libsignal. While the latter was technically already possible having a clear written documentation on how to do it is very important.

We spent most of the first day discussing several changes, features and problems and later started writing down the solutions we found. In between – in true Düsseldorf fashion – we snacked on some Onigiri and later went for some nice Ramen together. Saturday afternoon we started working in smaller groups on different parts of the specification. I’m amazed by the know-how and technical understanding that people brought to the table!

On the second day I had to leave relatively early after lunchtime due to family commitments, so I could only follow the remaining development of the XEP via git commits on the train.

Apart from further clarification, the updated spec now contains some additional features and tweaks. It is now possible to encrypt near arbitrary contents of messages with the help of Stanza Content Encryption. OMEMO now defines its own SCE profile. This enables workflows like fully end-to-end encrypted read markers and reactions. Thanks to Marvin and Klaus, the specification now also contains a section about how to opt-out of OMEMO encryption, both completely as well as on a per-conversation basis. Now you no longer have to manually disable OMEMO for that one contact on EVERY device you own.

The biggest part of the discussions went into properly specifying the cryptographic primitives for use with the Double Ratchet Algorithm. Tim and Andy did a great job of describing how to use hash functions and cipher algorithms to keep be able to re-implement OMEMO without having to rely on libsignal alone. Klaus and Marvin figured out some sane rules that help to decide when a device becomes active / inactive / stale. This should preserve the cryptographic guarantees of the encryption even if you don’t use one of your devices for a longer time.

Daniel properly described the workflow of recovering from broken sessions. This should improve OMEMO session stability. He also defined the exact form of OMEMO related XML elements. One notable feature from a users perspective are human readable labels for identity keys. This should make it easier for you to distinguish keys from another.

I’m really excited about the changes and can’t wait to see the first implementations in the real world!

One thing that’s left to do for now is to determine a smooth upgrade path. Clients will probably have to use both the new and old OMEMO in parallel for some time, as the changes are not backwards compatible. This would mean that we cannot immediately benefit from stanza content encryption and are bound to body-only encryption for some more time.

How to Implement a XEP for Smack.

Photo of the Earth taken from Space

Smack is a FLOSS XMPP client library for Java and Android app development. It takes away much of the burden a developer of a chat application would normally have to carry, so the developer can spend more time working on nice stuff like features instead of having to deal with the protocol stack.

Many (80+ and counting) XMPP Extension Protocols (XEPs) are already implemented in Smack. Today I want to bring you along with me and add support for one more.

What Smack does very well is to follow the Open-Closed-Principle of software architecture. That means while Smacks classes are closed for modification by the developer, it is pretty easy to extend Smack to add support for custom features. If Smack doesn’t fit your needs, don’t change it, extend it!

The most important class in Smack is probably the XMPPConnection, as this is where messages coming from and going to. However, even more important for the developer is what is being sent.

XMPP’s strength comes from the fact that arbitrary XML elements can be exchanged by clients and servers. Heck, the server doesn’t even have to understand what two clients are sending each other. That means that if you need to send some form of data from one device to another, you can simply use XMPP as the transport protocol, serialize your data as XML elements with a namespace that you control and send if off! It doesn’t matter, which XMPP server software you choose, as the server more or less just forwards the data from the sender to the receiver. Awesome!

So lets see how we can extend Smack to add support for a new feature without changing (and therefore potentially breaking) any existing code!

For this article, I chose XEP-0428: Fallback Indication as an example protocol extension. The goal of Fallback Indication is to explicitly mark <body/> elements in messages as fallback. For example some end-to-end encryption mechanisms might still add a body with an explanation that the message is encrypted, so that older clients that cannot decrypt the message due to lack of support still display the explanation text instead. This enables the user to switch to a better client 😛 Another example would be an emoji in the body as fallback for a reaction.

XEP-0428 does this by adding a fallback element to the message:

<message from="alice@example.org" to="bob@example.net" type="chat">
  <fallback xmlns="urn:xmpp:fallback:0"/>  <-- THIS HERE
  <encrypted xmlns="urn:example:crypto">Rgreavgl vf abg n irel ybat
gvzr nccneragyl.</encrypted>
  <body>This message is encrypted.</body>
</message>

If a client or server encounter such an element, they can be certain that the body of the message is intended to be a fallback for legacy clients and act accordingly. So how to get this feature into Smack?

After the XMPPConnection, the most important types of classes in Smack are the ExtensionElement interface and the ExtensionElementProvider class. The later defines a class responsible for deserializing or parsing incoming XML into the an object of the former class.

The ExtensionElement is itself an empty interface in that it does not provide anything new, but it is composed from a hierarchy of other interfaces from which it inherits some methods. One notable super class is NamedElement, more on that in just a second. If we start our XEP-0428 implementation by creating a class that implements ExtensionElement, our IDE would create this class body for us:

package tk.jabberhead.blog.wow.nice;

import org.jivesoftware.smack.packet.ExtensionElement;
import org.jivesoftware.smack.packet.XmlEnvironment;

public class FallbackIndicationElement implements ExtensionElement {
    
    @Override
    public String getNamespace() {
        return null;
    }

    @Override
    public String getElementName() {
        return null;
    }

    @Override
    public CharSequence toXML(XmlEnvironment xmlEnvironment) {
        return null;
    }
}

The first thing we should do is to change the return type of the toXML() method to XmlStringBuilder, as that is more performant and gains us a nice API to work with. We could also leave it as is, but it is generally recommended to return an XmlStringBuilder instead of a boring old CharSequence.

Secondly we should take a look at the XEP to identify what to return in getNamespace() and getElementName().

<fallback xmlns="urn:xmpp:fallback:0"/>
[   ^    ]      [        ^          ]
element name          namespace

In XML, the part right after the opening bracket is the element name. The namespace follows as the value of the xmlns attribute. An element that has both an element name and a namespace is called fully qualified. That’s why ExtensionElement is inheriting from FullyQualifiedElement. In contrast, a NamedElement does only have an element name, but no explicit namespace. In good object oriented manner, Smacks ExtensionElement inherits from FullyQualifiedElement which in term is inheriting from NamedElement but also introduces the getNamespace() method.

So lets turn our new knowledge into code!

package tk.jabberhead.blog.wow.nice;

import org.jivesoftware.smack.packet.ExtensionElement;
import org.jivesoftware.smack.packet.XmlEnvironment;

public class FallbackIndicationElement implements ExtensionElement {
    
    @Override
    public String getNamespace() {
        return "urn:xmpp:fallback:0";
    }

    @Override
    public String getElementName() {
        return "fallback";
    }

    @Override
    public XmlStringBuilder toXML(XmlEnvironment xmlEnvironment) {
        return null;
    }
}

Hm, now what about this toXML() method? At this point it makes sense to follow good old test driven development practices and create a JUnit test case that verifies the correct serialization of our element.

package tk.jabberhead.blog.wow.nice;

import static org.jivesoftware.smack.test.util.XmlUnitUtils.assertXmlSimilar;
import org.jivesoftware.smackx.pubsub.FallbackIndicationElement;
import org.junit.jupiter.api.Test;

public class FallbackIndicationElementTest {

    @Test
    public void serializationTest() {
        FallbackIndicationElement element = new FallbackIndicationElement();

        assertXmlSimilar("<fallback xmlns=\"urn:xmpp:fallback:0\"/>",
element.toXML());
    }
}

Now we can tweak our code until the output of toXml() is just right and we can be sure that if at some point someone starts messing with the code the test will inform us of any breakage. So what now?

Well, we said it is better to use XmlStringBuilder instead of CharSequence, so lets create an instance. Oh! XmlStringBuilder can take an ExtensionElement as constructor argument! Lets do it! What happens if we return new XmlStringBuilder(this); and run the test case?

<fallback xmlns="urn:xmpp:fallback:0"

Almost! The test fails, but the builder already constructed most of the element for us. It prints an opening bracket, followed by the element name and adds an xmlns attribute with our namespace as value. This is typically the “head” of any XML element. What it forgot is to close the element. Lets see… Oh, there’s a closeElement() method that again takes our element as its argument. Lets try it out!

<fallback xmlns="urn:xmpp:fallback:0"</fallback>

Hm, this doesn’t look right either. Its not even valid XML! (ノಠ益ಠ)ノ彡┻━┻ Normally you’d use such a sequence to close an element which contained some child elements, but this one is an empty element. Oh, there it is! closeEmptyElement(). Perfect!

<fallback xmlns="urn:xmpp:fallback:0"/>
package tk.jabberhead.blog.wow.nice;

import org.jivesoftware.smack.packet.ExtensionElement;
import org.jivesoftware.smack.packet.XmlEnvironment;

public class FallbackIndicationElement implements ExtensionElement {
    
    @Override
    public String getNamespace() {
        return "urn:xmpp:fallback:0";
    }

    @Override
    public String getElementName() {
        return "fallback";
    }

    @Override
    public XmlStringBuilder toXML(XmlEnvironment xmlEnvironment) {
        return new XmlStringBuilder(this).closeEmptyElement();
    }
}

We can now serialize our ExtensionElement into valid XML! At this point we could start sending around FallbackIndications to all our friends and family by adding it to a message object and sending that off using the XMPPConnection. But what is sending without receiving? For this we need to create an implementation of the ExtensionElementProvider custom to our FallbackIndicationElement. So lets start.

package tk.jabberhead.blog.wow.nice;

import org.jivesoftware.smack.packet.XmlEnvironment;
import org.jivesoftware.smack.provider.ExtensionElementProvider;
import org.jivesoftware.smack.xml.XmlPullParser;

public class FallbackIndicationElementProvider
extends ExtensionElementProvider<FallbackIndicationElement> {
    
    @Override
    public FallbackIndicationElement parse(XmlPullParser parser,
int initialDepth, XmlEnvironment xmlEnvironment) {
        return null;
    }
}

Normally implementing the deserialization part in form of a ExtensionElementProvider is tiring enough for me to always do that last, but luckily this is not the case with Fallback Indications. Every FallbackIndicationElement always looks the same. There are no special attributes or – shudder – nested named child elements that need special treating.

Our implementation of the FallbackIndicationElementProvider looks simply like this:

package tk.jabberhead.blog.wow.nice;

import org.jivesoftware.smack.packet.XmlEnvironment;
import org.jivesoftware.smack.provider.ExtensionElementProvider;
import org.jivesoftware.smack.xml.XmlPullParser;

public class FallbackIndicationElementProvider
extends ExtensionElementProvider<FallbackIndicationElement> {
    
    @Override
    public FallbackIndicationElement parse(XmlPullParser parser,
int initialDepth, XmlEnvironment xmlEnvironment) {
        return new FallbackIndicationElement();
    }
}

Very nice! Lets finish the element part by creating a test that makes sure that our provider does as it should by creating another JUnit test. Obviously we have done that before writing any code, right? We can simply put this test method into the same test class as the serialization test.

    @Test
    public void deserializationTest()
throws XmlPullParserException, IOException, SmackParsingException {
        String xml = "<fallback xmlns=\"urn:xmpp:fallback:0\"/>";
        FallbackIndicationElementProvider provider =
new FallbackIndicationElementProvider();
        XmlPullParser parser = TestUtils.getParser(xml);

        FallbackIndicationElement element = provider.parse(parser);

        assertEquals(new FallbackIndicationElement(), element);
    }

Boom! Working, tested code!

But how does Smack learn about our shiny new FallbackIndicationElementProvider? Internally Smack uses a Manager class to keep track of registered ExtensionElementProviders to choose from when processing incoming XML. Spoiler alert: Smack uses Manager classes for everything!

If we have no way of modifying Smacks code base, we have to manually register our provider by calling

ProviderManager.addExtensionProvider("fallback", "urn:xmpp:fallback:0",
new FallbackIndicationElementProvider());

Element providers that are part of Smacks codebase however are registered using an providers.xml file instead, but the concept stays the same.

Now when receiving a stanza containing a fallback indication, Smack will parse said element into an object that we can acquire from the message object by calling

FallbackIndicationElement element = message.getExtension("fallback",
"urn:xmpp:fallback:0");

You should have noticed by now, that the element name and namespace are used and referred to in a number some places, so it makes sense to replace all the occurrences with references to a constant. We will put these into the FallbackIndicationElement where it is easy to find. Additionally we should provide a handy method to extract fallback indication elements from messages.

...

public class FallbackIndicationElement implements ExtensionElement {
    
    public static final String NAMESPACE = "urn:xmpp:fallback:0";
    public static final String ELEMENT = "fallback";

    @Override
    public String getNamespace() {
        return NAMESPACE;
    }

    @Override
    public String getElementName() {
        return ELEMENT;
    }

    ...

    public static FallbackIndicationElement fromMessage(Message message) {
        return message.getExtension(ELEMENT, NAMESPACE);
    }
}

Did I say Smack uses Managers for everything? Where is the FallbackIndicationManager then? Well, lets create it!

package tk.jabberhead.blog.wow.nice;

import java.util.Map;
import java.util.WeakHashMap;

import org.jivesoftware.smack.Manager;
import org.jivesoftware.smack.XMPPConnection;

public class FallbackIndicationManager extends Manager {

    private static final Map<XMPPConnection, FallbackIndicationManager>
INSTANCES = new WeakHashMap<>();

    public static synchronized FallbackIndicationManager
getInstanceFor(XMPPConnection connection) {
        FallbackIndicationManager manager = INSTANCES.get(connection);
        if (manager == null) {
            manager = new FallbackIndicationManager(connection);
            INSTANCES.put(connection, manager);
        }
        return manager;
    }

    private FallbackIndicationManager(XMPPConnection connection) {
        super(connection);
    }
}

Woah, what happened here? Let me explain.

Smack uses Managers to provide the user (the developer of an application) with an easy access to functionality that the user expects. In order to use some feature, the first thing the user does it to acquire an instance of the respective Manager class for their XMPPConnection. The returned instance is unique for the provided connection, meaning a different connection would get a different instance of the manager class, but the same connection will get the same instance anytime getInstanceFor(connection) is called.

Now what does the user expect from the API we are designing? Probably being able to send fallback indications and being notified whenever we receive one. Lets do sending first!

    ...

    private FallbackIndicationManager(XMPPConnection connection) {
        super(connection);
    }

    public MessageBuilder addFallbackIndicationToMessage(
MessageBuilder message, String fallbackBody) {
        return message.setBody(fallbackBody)
                .addExtension(new FallbackIndicationElement());
}

Easy!

Now, in order to listen for incoming fallback indications, we have to somehow tell Smack to notify us whenever a FallbackIndicationElement comes in. Luckily there is a rather nice way of doing this.

    ...

    private FallbackIndicationManager(XMPPConnection connection) {
        super(connection);
        registerStanzaListener();
    }

    private void registerStanzaListener() {
        StanzaFilter filter = new AndFilter(StanzaTypeFilter.MESSAGE, 
                new StanzaExtensionFilter(FallbackIndicationElement.ELEMENT, 
                        FallbackIndicationElement.NAMESPACE));
        connection().addAsyncStanzaListener(stanzaListener, filter);
    }

    private final StanzaListener stanzaListener = new StanzaListener() {
        @Override
        public void processStanza(Stanza packet) 
throws SmackException.NotConnectedException, InterruptedException,
SmackException.NotLoggedInException {
            Message message = (Message) packet;
            FallbackIndicationElement fallbackIndicator =
FallbackIndicationElement.fromMessage(message);
            String fallbackBody = message.getBody();
            onFallbackIndicationReceived(message, fallbackIndicator,
fallbackBody);
        }
    };

    private void onFallbackIndicationReceived(Message message,
FallbackIndicationElement fallbackIndicator, String fallbackBody) {
        // do something, eg. notify registered listeners etc.
    }

Now that’s nearly it. One last, very important thing is left to do. XMPP is known for its extensibility (for the better or the worst). If your client supports some feature, it is a good idea to announce this somehow, so that the other end knows about it. That way features can be negotiated so that the sender doesn’t try to use some feature that the other client doesn’t support.

Features are announced by using XEP-0115: Entity Capabilities, which is based on XEP-0030: Service Discovery. Smack supports this using the ServiceDiscoveryManager. We can announce support for Fallback Indications by letting our manager call

ServiceDiscoveryManager.getInstanceFor(connection)
        .addFeature(FallbackIndicationElement.NAMESPACE);

somewhere, for example in its constructor. Now the world knows that we know what Fallback Indications are. We should however also provide our users with the possibility to check if their contacts support that feature as well! So lets add a method for that to our manager!

    public boolean userSupportsFallbackIndications(EntityBareJid jid) 
            throws XMPPException.XMPPErrorException,
SmackException.NotConnectedException, InterruptedException, 
            SmackException.NoResponseException {
        return ServiceDiscoveryManager.getInstanceFor(connection())
                .supportsFeature(jid, FallbackIndicationElement.NAMESPACE);
    }

Done!

I hope this little article brought you some insights into the XMPP protocol and especially into the development process of protocol libraries such as Smack, even though the demonstrated feature was not very spectacular.

Quick reminder that the next Google Summer of Code is coming soon and the XMPP Standards Foundation got accepted 😉
Check out the project ideas page!

Happy Hacking!

Smack: Some more busy nights and 12 bytes of IV

Decorative image of an architecture blueprint

In the last months I stayed up late some nights, so I decided to add some additional features to Smack.

Among the additions is support for some new XEPs, namely:

I also started working on an implementation of XEP-0245: Message Moderation, but that one is not yet finished and needs more work.

Direct MUC invitations are a method to invite users to a group chat. Smack already had support for another similar mechanism, but this one is recommended by the XMPP Compliance Suites 2020.

Message Fastening is a generalized mechanism to add information to messages. That might be a reaction, eg. a thumbs up which is added to a previous message.

Message Retraction is used to retract previously sent messages. Internally it is based on Message Fastening.

The Stanza Content Encryption pull request only teaches Smack what SCE elements are, but it doesn’t yet teach it how to use them. That is partly due to no E2EE specification actually using them yet. That will hopefully change soon 😉

Anu brought up the fact that the OMEMO XEP is not totally clear on the length of initialization vectors used for message encryption. Historically most clients use 16 bytes length, while normally you would want to use 12. Apparently some AES-GCM libraries on iOS only support 12 bytes length, so using 12 bytes is definitely desirable. Most OMEMO implementations already support receiving 12 bytes as well as 16 bytes IV.

That’s why Smack will soon also start sending OMEMO messages with 12 bytes IV.

Re: The Ecosystem is Moving

Moxie Marlinspike, the creator of Signal gave a talk at 36C3 on Saturday titled “The ecosystem is moving”.

The Fahrplan description of that talk reads as follows:

Considerations for distributed and decentralized technologies from the perspective of a product that many would like to see decentralize.

Amongst an environment of enthusiasm for blockchain-based technologies, efforts to decentralize the internet, and tremendous investment in distributed systems, there has been relatively little product movement in this area from the mobile and consumer internet spaces.

This is an exploration of challenges for distributed technologies, as well as some considerations for what they do and don’t provide, from the perspective of someone working on user-focused mobile communication. This also includes a look at how Signal addresses some of the same problems that decentralized and distributed technologies hope to solve.

https://fahrplan.events.ccc.de/congress/2019/Fahrplan/events/11086.html

Basically the talk is a reiteration of some arguments from a blog post with the same title he posted back in 2016.

In his presentation, Marlinspike basically states that federated systems have the issue of being frozen in time while centralized systems are flexible and easy to change.

As an example, Marlinspike names HTTP/1.1, which was released in 1999 and on which we are stuck on ever since. While it is true that a huge part of the internet is currently running on HTTP 1.0 and 1.1, one has to consider that its successor HTTP/2.0 was only released in 2015. 4 / 5 years are not a long time to update the entirety of the internet, especially if you consider the fact that the big browser vendors announced to only make their browsers work with HTTP/2.0 sites when they are TLS encrypted.

Marlinspike then goes on listing 4 expectations that advocates of federated systems have, namely privacy, censorship resistance, availability and control. This is pretty accurate and matches my personal expectations pretty well. He then argues, that Signal as a centralized application can fulfill those expectations as well, if not better than a decentralized system.

Privacy

Privacy is often expected to be provided by the means of data ownership, says Marlinspike. As an example he mentions email. He argues that even though he is self-hosting his emails, “each and every mail has GMail at the other end”.

I agree with this observation and think that this is a real problem. But the answer to this problem would logically be that we need to increase our efforts to change that by reducing the number of GMail accounts and increasing the number of self-hosted email servers, right? This is not really an argument for centralization, where each and every message is guaranteed to have the same service at the other end.

I also agree with his opinion that a more effective tool to gain privacy is good encryption. He obviously brings the point that email encryption is unusable, (hinting to PGP probably), totally ignoring modern approaches to email encryption like autocrypt.

Censorship resistance

Federated systems are censorship resistant. At least that is the expectation that advocates of federated systems have. Every time a server gets blocked, the user just simply switches to another server. The issue that Marlinspike points out is, that every time this happens, the user loses his entire social graph. While this is an issue, there are solutions to this problem, one being nomadic identities. If some server goes down the user simply migrates to another server, taking his contacts with him. Hubzilla does this for example. There are also import/export features present in most services nowadays thanks to the GDPR. XMPP offers such a solution using XEP-0277.

But lets take a look at how Signal circumvents censorship according to Marlinspike. He proudly presents Domain Fronting as the solution. With domain fronting, the client connects to some big service which is costly to block for a censor and uses that as a proxy to connect to the actual server. While this appears to be a very elegant solution, Marlinspike conceals the fact that Google and Amazon pretty quickly intervened and stopped Signal from using their domains.

With Google Cloud and AWS out of the picture, it seems that domain fronting as a censorship circumvention technique is now largely non-viable in the countries where Signal had enabled this feature.

https://signal.org/blog/looking-back-on-the-front/

Notice that above quote was posted by Marlinspike himself more than one and a half years ago. Why exactly he brings this as an argument remains a mystery to me.

Update: Apparently Signal still successfully uses Domain Fronting, just with content delivery networks other than Google and Amazon.

And even if domain fronting was an effective way to circumvent censorship, it could also be applied to federated servers as well, adding an additional layer of protection instead of solely relying on it.

But what if the censor is not a foreign nation, but instead the nation where your servers are located? What if the US decides to shutdown signal.org for some reason? No amount of domain fronting can protect you from police raiding your server center. Police confiscating each and every server of a federated system (or even a considerable fraction of it) on the other hand is unlikely.

Availability

This brings us nicely to the next point on the agenda, availability.

If you have a centralized service than you want to move that centralized service into two different data centers. And the way you did that was by splitting the data up between those data centers and you just halved your availability, because the mean time between failures goes up since you have two different data centers which means that it is more likely to have an outage in one of those data centers in any given moment.

Moxie Marlinspike in his 36c3 talk “The Ecosystem is Moving”

For some reason Marlinspike confuses a decentralized system with a centralized, but distributed system. It even reads “Centralized Service” on his slides… Decentralization does not equal distribution.

A federated system would obviously not be fault free, as servers naturally tend to go down, but an outage only causes a small fraction of the network to collapse, contrary to a total outage of centralized systems. There even are techniques to minimize the loss of functionality further, for example distributed chat rooms in the matrix protocol.

Control

The advocates argument of control says that if a service provider behaves undesirably, you simply switch to another service provider. Marlinspike rightfully asks the question how it then can be that many people still use Yahoo as their mail provider. Indeed that is a good question. I guess the only answer I can come up with is that most people probably don’t care enough about their email to make the switch. To be honest, email is kind of boring anyways 😉

XMPP

Next Marlinspike talks about XMPP. He (rightfully) notes that due to XMPPs extensibility there is a morass of XEPs and that those don’t really feel consistent.

The XMPP community already recognized the problem that comes with having that many XEPs and tries to solve this issue by introducing so called compliance suites. These are annually published documents that contain a list of XEPs that are considered vitally important for clients or servers. These suites act as maps that point a way through the XEP jungle.

Next Marlinspike states that the XMPP protocol still fails to be a suitable option for mobile devices. This statement is plain wrong and was already debunked in a blog post by Daniel Gultsch back in 2016. Gultsch develops an XMPP client for Android which is totally usable and generally has lower battery consumption than Signal has. Conversations implements all of the XEPs listed in the compliance suites to be required for mobile clients. This shows that implementing a decent mobile client for a federated system can be done and there is a recipe for it.

What Marlinspike could have pointed out instead is that the XMPP community struggles to come up with a decent iOS client. That would have been a fair argument, but spreading FUD about the XMPP protocol as a whole is unfair and dishonest.

Luckily the audience of the talk didn’t fully buy into Marlinspikes weaker arguments as demonstrated by some entertaining questions during the QA afterwards.

What Marlinspike is right about though is that developing a federated system is harder than doing a centralized service. You as the developer have control over the whole system and subsequently over the users. However this is actually the reason why we, the community of decentralized systems and federated protocols do what we do. In the words of J.F. Kennedy, we do these things…

…not because they are easy, but because they are hard…

… or simply because they are right.

Pitfalls for OMEMO Implementations – Part 1: Inactive Devices

Smack’s OMEMO implementation received a security audit a while ago (huge thanks to the Guardian Project for providing the funding!). Radically Open Security, a non-profit pentesting group from the Netherlands focused on free software and ethical hacking went through the code in great detail to check its correctness and to search for any vulnerabilities. In the end they made some findings, although I wouldn’t consider them catastrophically bad (full disclosure – its my code, so I might be biased :D). In this post I want to go over two of the finding and discuss, what went wrong and how the issue was fixed.

Finding 001: Inactive Devices break Forward Secrecy

Inactive devices which no longer come online retain old chaining keys, which, if compromised, break
confidentiality of all messages from that key onwards.

Penetration Test Report – Radically Open Security
  • Vulnerability Type: Loss of Forward Secrecy.
  • Threat Level: High

Finding 003: Read-Only Devices Compromise Forward Secrecy

Read-only devices never update their keys and thus forfeit forward security.

Penetration Test Report – Radically Open Security
  • Vulnerability Type: Loss of Forward Secrecy.
  • Threat Level: Elevated

These vulnerabilities have the same cause and can be fixed by the same patch. They can happen, if a contact has a device which doesn’t come online (Finding 001), or just doesn’t respond to messages (Finding 003) for an extended period of time. To understand the issue, we have to look deeper into how the cryptography behind the double ratchet works.

Background

In OMEMO, your device has a separate session with every device it sends messages to. In case of one-to-one chat, your phone may have a session with your computer (you can read messages you sent from your phone on your computer too), with your contacts phone, your contacts computer and so on.

In each session, the cryptography happens in a kind of a ping pong way. Lets say you send two messages, then your contact responds with 4 messages, then its your turn again to send some messages and so on and so forth.

To better illustrate whats happening behind the scenes in the OMEMO protocol, lets compare an OMEMO chat with a discussion which is moderated using a “talking stick” (or “speachers staff”). Every time you want to send a message, you acquire the talking stick. Now you can send as many messages as you want. If another person wants to send you a message, you’ve got to hand them the stick first.

Now, cryptographically OMEMO is based on the double ratchet. As the name suggests, the double ratchet consists of two ratchets; the Diffie-Hellman-Ratchet and the Symmetric-Key-Ratchet.

The Symmetric-Key-Ratchet is responsible to encrypt every message you send with a new key. Roughly speaking, this is done by deriving a fresh encryption key from the key which was used to encrypt the previous message you sent. This procedure is called Key Derivation Function Chain (KDF-Chain). The benefit of this technique is, that if an attacker manages to break the key of a message, they cannot use that key to decrypt any previous messages (this is called “forward secrecy“). They can however use that key to derive the message keys of following messages, which is bad.

Functional Principle of a Key Derivation Function Chain

This is where the Diffie-Hellman-Ratchet comes into play. It replaces the symmetric ratchet key every time the “talking stick” gets handed over. Whenever you pass it to your contact (by them sending a message to you), your device executes a Diffie-Hellman key exchange with your contact and the result is used to generate a new key for the symmetric ratchet. The consequence is, that whenever the talking stick gets handed over, the attacker can no longer decrypt any following messages. This is why the protocol is described as “self healing”. The cryptographic term is called “future secrecy“.

Functional Principle of the Diffie-Hellman Ratchet Algorithm

Vulnerability

So, what’s wrong with this technique?

Well, lets say you have a device that you used only once or twice with OMEMO and which is now laying in the drawer, collecting dust. That device might still have active sessions with other devices. The problem is, that it will never acquire the talking stick (it will never send a message). Therefore the Diffie-Hellman-Ratchet will never be forwarded, which will lead to the base key of the Symmetric-Key-Ratchet never being replaced. The consequence is, that once an attacker manages to break one key of the chain, they can use that key to derive all following keys, which will get them access to all messages that device receives in that session. The property of “forward secrecy” is lost.

Smack-omemo already kind of dealt with that issue by removing inactive devices from the device list after a certain amount of time (two weeks). This does however only include own devices, not inactive devices of a contact. In the past I tried to fix this by ignoring inactive devices of a contact, by refusing to encrypt message for them. However, this lead to deadlocks, where two devices were mutually ignoring each other, resulting in a situation where no device could send a message to the other one to recover. As a consequence, Smack omitted deactivating stale contacts devices, which resulted in the vulnerability.

The Patch

This problem could not really be solved without changing the XEP (I’d argue). In my opinion the XEP must specify an explicit way of dealing with inactive devices. Therefore I proposed to use message counters, which gets rid of the deadlock problem. A device is considered as read-only, if the other party sent n messages without getting a reply. That way we can guarantee, that in a worst case scenario, an attacker can get access to n messages. That magical number n would need to be determined though.
I opened a pull request against XEP-0384 which introduces these changes.

Mitigation for the issue has been merged into Smack’s source code already 🙂

As another prevention against this vulnerability, clients could send empty ping messages to forward the Diffie-Hellman-Ratchet. This does however require devices to take action and does not work with clients that are permanently offline though. Both changes would however go hand in hand to improve the OMEMO user experience and security. Smack-omemo already offers the client developer a method to send ping messages 🙂

I hope I could help any fellow developers to avoid this pitfall and spark some discussion around how to address this issue in the XEP.

Happy (ethical) Hacking!

On “Clean Architecture”

Some skyscrapers photographed from the bottom to the top.

I recently did what I rarely do: buy and read an educational book. Shocking, I know, but I can assure you that I’m fine 😉

The book I ordered is Clean Architecture – A Craftsman’s Guide to Software Structure and Design by Robert C. Martin. As the title suggests it is about software architecture.

I’ve barely read half of the book, but I’ve already learned a ton! I find it curious that as a halfway decent programmer, I often more or less know what Martin is talking about when he is describing a certain architectural pattern, but I often didn’t know said pattern had a name, or what consequences using said pattern really implied. It is really refreshing to see the bigger picture and having all the pros and cons of certain design decisions layed out in an overview.

One important point the book is trying to put across is how important it is to distinguish between important things like business rule, and not so important things as details. Let me try to give you an example.

Lets say I want to build a reactive Android XMPP chat application using Smack (foreshadowing? 😉 ). Lets identify the details. Surely Smack is a detail. Even though I’d be using Smack for some of the core functionalities of the app, I could just as well chose another XMPP library like babbler to get the job done. But there are even more details, Android for example.

In fact, when you strip out all the details, you are left with reactive chat application. Even XMPP is a detail! A chat application doesn’t care, what protocol you use to send and receive messages, heck it doesn’t even care if it is run on Android or on any other device (that can run java).

I’m still not quite sure, if the keyword reactive is a detail, as I’d say it is a more a programming paradigm. Details are things that can easily be switched out and/or extended and I don’t think you can easily replace a programming paradigm.

The book does a great job of identifying and describing simple rules that, when applied to a project lead to a cleaner, more structured architecture. All in all it teaches how important software architecture in general is.

There is however one drawback with the book. It constantly wants to make you want to jump straight into your next big project with lots of features, so it is hard to keep reading while all that excited ;P

If you are a software developer – no matter whether you work on small hobby projects or big enterprise products, whether or not you pursue to become a Software Architect – I can only recommend reading this book!

One more thought: If you want to support a free software project, maybe donating books like this is a way to contribute?

Happy Hacking!

Dealing with Colors in lower Android versions

I’m currently working on a project which requires me to dynamically set the color of certain UI elements to random RGB values.

Unfortunately and surprisingly, handy methods for dealing with colors in Android are only available since API level 26 (Android O). Luckily though, the developer reference specifies, how colors are encoded internally in the Android operating system, so I was able to create a class with the most important color-related (for my use-case) methods which is able to function in lower Android versions as well.

Link to the gist

I hope I can save some of you from headaches by sharing the code. Feel free to reuse as you please 🙂

Happy Hacking!

PS: Is there a way to create Github-like gists with Gitea?

Closer Look at the Double Ratchet

In the last blog post, I took a closer look at how the Extended Triple Diffie-Hellman Key Exchange (X3DH) is used in OMEMO and which role PreKeys are playing. This post is about the other big algorithm that makes up OMEMO. The Double Ratchet.

The Double Ratchet algorithm can be seen as the gearbox of the OMEMO machine. In order to understand the Double Ratchet, we will first have to understand what a ratchet is.

Before we start: This post makes no guarantees to be 100% correct. It is only meant to explain the inner workings of the Double Ratchet algorithm in a (hopefully) more or less understandable way. Many details are simplified or omitted for sake of simplicity. If you want to implement this algorithm, please read the Double Ratchet specification.

A ratchet tool can only turn in one direction, hence it is eponymous for the algorithm.
Image by Benedikt.Seidl [Public domain]

A ratchet is a tool used to drive nuts and bolts. The distinctive feature of a ratchet tool over an ordinary wrench is, that the part that grips the head of the bolt can only turn in one direction. It is not possible to turn it in the opposite direction as it is supposed to.

In OMEMO, ratchet functions are one-way functions that basically take input keys and derives a new keys from that. Doing it in this direction is easy (like turning the ratchet tool in the right direction), but it is impossible to reverse the process and calculate the original key from the derived key (analogue to turning the ratchet in the opposite direction).

Symmetric Key Ratchet

One type of ratchet is the symmetric key ratchet (abbrev. sk ratchet). It takes a key and some input data and produces a new key, as well as some output data. The new key is derived from the old key by using a so called Key Derivation Function. Repeating the process multiple times creates a Key Derivation Function Chain (KDF-Chain). The fact that it is impossible to reverse a key derivation is what gives the OMEMO protocol the property of Forward Secrecy.

A Key Derivation Function Chain or Symmetric Ratchet

The above image illustrates the process of using a KDF-Chain to generate output keys from input data. In every step, the KDF-Chain takes the input and the current KDF-Key to generate the output key. Then it derives a new KDF-Key from the old one, replacing it in the process.

To summarize once again: Every time the KDF-Chain is used to generate an output key from some input, its KDF-Key is replaced, so if the input is the same in two steps, the output will still be different due to the changed KDF-Key.

One issue of this ratchet is, that it does not provide future secrecy. That means once an attacker gets access to one of the KDF-Keys of the chain, they can use that key to derive all following keys in the chain from that point on. They basically just have to turn the ratchet forwards.

Diffie-Hellman Ratchet

The second type of ratchet that we have to take a look at is the Diffie-Hellman Ratchet. This ratchet is basically a repeated Diffie-Hellman Key Exchange with changing key pairs. Every user has a separate DH ratcheting key pair, which is being replaced with new keys under certain conditions. Whenever one of the parties sends a message, they include the public part of their current DH ratcheting key pair in the message. Once the recipient receives the message, they extract that public key and do a handshake with it using their private ratcheting key. The resulting shared secret is used to reset their receiving chain (more on that later).

Once the recipient creates a response message, they create a new random ratchet key and do another handshake with their new private key and the senders public key. The result is used to reset the sending chain (again, more on that later).

Principle of the Diffie-Hellman Ratchet.
Image by OpenWhisperSystems (modified by author)

As a result, the DH ratchet is forwarded every time the direction of the message flow changes. The resulting keys are used to reset the sending-/receiving chains. This introduces future secrecy in the protocol.

The Diffie-Hellman Ratchet

Chains

A session between two devices has three chains – a root chain, a sending chain and a receiving chain.

The root chain is a KDF chain which is initialized with the shared secret which was established using the X3DH handshake. Both devices involved in the session have the same root chain. Contrary to the sending and receiving chains, the root chain is only initialized/reset once at the beginning of the session.

The sending chain of the session on device A equals the receiving chain on device B. On the other hand, the receiving chain on device A equals the sending chain on device B. The sending chain is used to generate message keys which are used to encrypt messages. The receiving chain on the other hand generates keys which can decrypt incoming messages.

Whenever the direction of the message flow changes, the sending and receiving chains are reset, meaning their keys are replaced with new keys generated by the root chain.

The full Double Ratchet Algorithms Ratchet Architecture

An Example

I think this rather complex protocol is best explained by an example message flow which demonstrates what actually happens during message sending / receiving etc.

In our example, Obi-Wan and Grievous have a conversation. Obi-Wan starts by establishing a session with Grievous and sends his initial message. Grievous responds by sending two messages back. Unfortunately the first of his replies goes missing.

Session Creation

In order to establish a session with Grievous, Obi-Wan has to first fetch one of Grievous key bundles. He uses this to establish a shared secret S between him and Grievous by executing a X3DH key exchange. More details on this can be found in my previous post. He also extracts Grievous signed PreKey ratcheting public key. S is used to initialize the root chain.

Obi-Wan now uses Grievous public ratchet key and does a handshake with his own ratchet private key to generate another shared secret which is pumped into the root chain. The output is used to initialize the sending chain and the KDF-Key of the root chain is replaced.

Now Obi-Wan established a session with Grievous without even sending a message. Nice!

The session initiator prepares the sending chain.
The initial root key comes from the result of the X3DH handshake.
Original image by OpenWhisperSystems (modified by author)

Initial Message

Now the session is established on Obi-Wans side and he can start composing a message. He decides to send a classy “Hello there!” as a greeting. He uses his sending chain to generate a message key which is used to encrypt the message.

Principle of generating message keys from the a KDF-Chain.
In our example only one message key is derived though.
Image by OpenWhisperSystems

Note: In the above image a constant is used as input for the KDF-Chain. This constant is defined by the protocol and isn’t important to understand whats going on.

Now Obi-Wan sends over the encrypted message along with his ratcheting public key and some information on what PreKey he used, the current sending key chain index (1), etc.

When Grievous receives Obi-Wan’s message, he completes his X3DH handshake with Obi-Wan in order to calculate the same exact shared secret S as Obi-Wan did earlier. He also uses S to initialize his root chain.

Now Grevious does a full ratchet step of the Diffie-Hellman Ratchet: He uses his private and Obi-Wans public ratchet key to do a handshake and initialize his receiving chain with the result. Note: The result of the handshake is the same exact value that Obi-Wan earlier calculated when he initialized his sending chain. Fantastic, isn’t it? Next he deletes his old ratchet key pair and generates a fresh one. Using the fresh private key, he does another handshake with Obi-Wans public key and uses the result to initialize his sending chain. This completes the full DH ratchet step.

Full Diffie-Hellman Ratchet Step
Image by OpenWhisperSystems

Decrypting the Message

Now that Grievous has finalized his side of the session, he can go ahead and decrypt Obi-Wans message. Since the message contains the sending chain index 1, Grievous knows, that he has to use the first message key generated from his receiving chain to decrypt the message. Because his receiving chain equals Obi-Wans sending chain, it will generate the exact same keys, so Grievous can use the first key to successfully decrypt Obi-Wans message.

Sending a Reply

Grievous is surprised by bold actions of Obi-Wan and promptly goes ahead to send two replies.

He advances his freshly initialized sending chain to generate a fresh message key (with index 1). He uses the key to encrypt his first message “General Kenobi!” and sends it over to Obi-Wan. He includes his public ratchet key in the message.

Unfortunately though the message goes missing and is never received.

He then forwards his sending chain a second time to generate another message key (index 2). Using that key he encrypt the message “You are a bold one.” and sends it to Obi-Wan. This message contains the same public ratchet key as the first one, but has the sending chain index 2. This time the message is received.

Receiving the Reply

Once Obi-Wan receives the second message and does a full ratchet step in order to complete his session with Grevious. First he does a DH handshake between his private and the Grevouos’ public ratcheting key he got from the message. The result is used to setup his receiving chain. He then generates a new ratchet key pair and does a second handshake. The result is used to reset his sending chain.

Obi-Wan notices that the sending chain index of the received message is 2 instead of 1, so he knows that one message must have been missing or delayed. To deal with this problem, he advances his receiving chain twice (meaning he generates two message keys from the receiving chain) and caches the first key. If later the missing message arrives, the cached key can be used to successfully decrypt the message. For now only one message arrived though. Obi-Wan uses the generated message key to successfully decrypt the message.

Conclusions

What have we learned from this example?

Firstly, we can see that the protocol guarantees forward secrecy. The KDF-Chains used in the three chains can only be advanced forwards, and it is impossible to turn them backwards to generate earlier keys. This means that if an attacker manages to get access to the state of the receiving chain, they can not decrypt messages sent prior to the moment of attack.

But what about future messages? Since the Diffie-Hellman ratchet introduces new randomness in every step (new random keys are generated), an attacker is locked out after one step of the DH ratchet. Since the DH ratchet is used to reset the symmetric ratchets of the sending and receiving chain, the window of the compromise is limited by the next DH ratchet step (meaning once the other party replies, the attacker is locked out again).

On top of this, the double ratchet algorithm can deal with missing or out-of-order messages, as keys generated from the receiving chain can be cached for later use. If at some point Obi-Wan receives the missing message, he can simply use the cached key to decrypt its contents.

This self-healing property was eponymous to the Axolotl protocol (an earlier name of the Signal protocol, the basis of OMEMO).

Acknowledgements

Thanks to syndace and paul for their feedback and clarification on some points.

Shaking Hands With OMEMO: X3DH Key Exchange

This is the first part of a small series about the cryptographic building blocks of OMEMO. This post is about the Extended Triple Diffie Hellman Key Exchange Algorithm (X3DH) which is used to establish a session between OMEMO devices.
Part 2: Closer Look at the Double Ratchet

In the past I have written some posts about OMEMO and its future and how it does compare to the Olm encryption protocol used by matrix.org. However, some readers requested a closer, but still straightforward look at how OMEMO and the underlying algorithms work. To get started, we first have to take a look at its past.

OMEMO was implemented in the Android Jabber Client Conversations as part of a Google Summer of Code project by Andreas Straub in 2015. The basic idea was to utilize the encryption library used by Signal (formerly TextSecure) for message encryption. So basically OMEMO borrows almost all the cryptographic mechanisms including the Double Ratchet and X3DH from Signals encryption protocol, which is appropriately named Signal Protocol. So to begin with, lets look at it first.

The Signal Protocol

The famous and ingenious protocol that drives the encryption behind Signal, OMEMO, matrix.org, WhatsApp and a lot more was created by Trevor Perrin and Moxie Marlinspike in 2013. Basically it consists of two parts that we need to further investigate:

  • The Extended Triple-Diffie-Hellman Key Exchange (X3DH)
  • The Double Ratchet Algorithm

One core principle of the protocol is to get rid of encryption keys as soon as possible. Almost every message is encrypted with another fresh key. This is a huge difference to other protocols like OpenPGP, where the user only has one key which can decrypt all messages ever sent to them. The later can of course also be seen as an advantage OpenPGP has over OMEMO, but it all depends on the situation the user is in and what they have to protect against.

A major improvement that the Signal Protocol introduced compared to encryption protocols like OTRv3 (Off-The-Record Messaging) was the ability to start a conversation with a chat partner in an asynchronous fashion, meaning that the other end didn’t have to be online in order to agree on a shared key. This was not possible with OTRv3, since both parties had to actively send messages in order to establish a session. This was okay back in the days where people would start their computer with the intention to chat with other users that were online at the same time, but it’s no longer suitable today.

Note: The recently worked on OTRv4 will not come with this handicap anymore.

The X3DH Key Exchange

Let’s get to it already!

X3DH is a key agreement protocol, meaning it is used when two parties establish a session in order to agree on a shared secret. For a conversation to be confidential we require, that only sender and (intended) recipient of a message are able to decrypt it. This is possible when they share a common secret (eg. a password or shared key). Exchanging this key with one another has long been kind of a hen and egg problem: How do you get the key from one end to the other without an adversary being able to get a copy of the key? Well, obviously by encrypting it, but how? How do you get that key to the other side? This problem has only been solved after the second world war.

The solution is a so called Diffie-Hellman-Merkle Key Exchange. I don’t want to go into too much detail about this, as there are really great resources about how it works available online, but the basic idea is that each party possesses an asymmetric key pair consisting of a public and a private key. The public key can be shared over insecure networks while the
private key must be kept secret. A Diffie-Hellman key exchange (DH) is the process of combining a public key A with a private key b in order to generate a shared secret. The essential trick is, that you get the same exact secret if you combine the secret key a with the public key B. Wikipedia does a great job at explaining this using an analogy of mixing colors.

Deniability and OTR

In normal day to day messaging you don’t always want to commit to what you said. Especially under oppressive regimes it may be a good idea to be able to deny that you said or wrote something specific. This principle is called deniability.

Note: It is debatable, whether cryptographic deniability ever saved someone from going to jail, but that’s not scope of this blog post.

At the same time you want to be absolutely sure that you are really talking to your chat partner and not to a so called man in the middle. These desires seem to be conflicting at first, but the OTR protocol featured both. The user has an IdentityKey, which is used to identify the user by means of a fingerprint. The (massively and horribly simplified) procedure of creating a OTR session is as follows: Alice generates a random session key and signs the public key with her IdentityKey. She then sends that public key over to Bob, who generates another random session key with which he executes his half of the DH handshake. He then sends the public part of that key (again, signed) back to Alice, who does another DH to acquire the same shared secret as Bob. As you can see, in order to establish a session, both parties had to be online. Note: The signing part has been oversimplified for sake of readability.

Normal Diffie-Hellman Key Exchange

From DH to X3DH

Perrin and Marlinspike improved upon this model by introducing the concept of PreKeys. Those basically are the first halves of a DH-handshake, which can – along with some other keys of the user – be uploaded to a server prior to the beginning of a conversation. This way another user can initiate a session by fetching one half-completed handshake and completing it.

Basically the Signal protocol comprises of the following set of keys per user:

IdentityKey (IK)Acts as the users identity by providing a stable fingerprint
Signed PreKey (SPK)Acts as a PreKey, but carries an additional signature of IK
Set of PreKeys ({OPK})Unsigned PreKeys

If Alice wants to start chatting, she can fetch Bobs IdentityKey, Signed PreKey and one of his PreKeys and use those to create a session. In order to preserve cryptographic properties, the handshake is modified like follows:

DH1 = DH(IK_A, SPK_B)
DH2 = DH(EK_A, IK_B)
DH3 = DH(EK_A, SPK_B)
DH4 = DH(EK_A, OPK_B)

S = KDF(DH1 || DH2 || DH3 || DH4)

EK_A denotes an ephemeral, random key which is generated by Alice on the fly. Alice can now derive an encryption key to encrypt her first message for Bob. She then sends that message (a so called PreKeyMessage) over to Bob, along with some additional information like her IdentityKey IK, the public part of the ephemeral key EK_A and the ID of the used PreKey OPK.

Visual representation of the X3DH handshake

Once Bob logs in, he can use this information to do the same calculations (just with swapped public and private keys) to calculate S from which he derives the encryption key. Now he can decrypt the message.

In order to prevent the session initiation from failing due to lost messages, all messages that Alice sends over to Bob without receiving a first message back are PreKeyMessages, so that Bob can complete the session, even if only one of the messages sent by Alice makes its way to Bob. The exact details on how OMEMO works after the X3DH key exchange will be discussed in part 2 of this series 🙂

X3DH Key Exchange TL;DR

X3DH utilizes PreKeys to allow session creation with offline users by doing 4 DH handshakes between different keys.

A subtle but important implementation difference between OMEMO and Signal is, that the Signal server is able to manage the PreKeys for the user. That way it can make sure, that every PreKey is only used once. OMEMO on the other hand solely relies on the XMPP servers PubSub component, which does not support such behavior. Instead, it hands out a bundle of around 100 PreKeys. This seems like a lot, but in reality the chances of a PreKey collision are pretty high (see the birthday problem).

OMEMO does come with some counter measures for problems and attacks that arise from this situation, but it makes the protocol a little less appealing than the original Signal protocol.

Clients should for example keep used PreKeys around until the end of catch -up of missed message to allow decryption of messages that got sent in sessions that have been established using the same PreKey.