PGPainless 0.2 Released!

I’m very proud and excited to announce the release of PGPainless version 0.2! Since the last stable release of my OpenPGP library for Java and Android 9 months ago, a lot has changed and improved! Most importantly, development on PGPainless is now being financially sponsored by FlowCrypt, so I was able to put a lot more energy into working on the library. I’m very grateful for this opportunity 🙂

A red letter, sealed with a wax seal
Photo by Natasya Chen on Unsplash

PGPainless is using Bouncycastle, but aims to save developers from the pain of writing lots of boilerplate code, while at the same time using the BC API properly. The new release is available on Maven Central; the source code can be found on GitHub and Codeberg.

PGPainless now depends on Bouncycastle 1.68, and the minimum Android API level has been raised to 10 (Android 2.3.3). Let me bring you up to speed about some of its features and the recent changes!

Inspect Keys!

Back in the last stable release, PGPainless could already be used to generate keys. It offered some shortcut methods to quickly generate archetypal keys, such as simple RSA keys, or key rings based on elliptic curves. In version 0.2, support for additional key algorithms was added, such as EdDSA or XDH.

The new release introduces PGPainless.inspectKeyRing(keyRing), which allows you to quickly access information about a key, such as its user-ids, which subkeys are capable of encryption and which can be used to sign data, their expiration dates, algorithms, etc.

Furthermore, this feature can be used to evaluate a key at a certain point in time. That way you can quickly check which key flags or algorithm preferences applied to the key 3 weeks ago, when it was used to create that signature you care about, or which user-ids your key had 5 years ago.
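
Here is a minimal sketch of how that looks. Note that the accessor names on the returned info object are assumptions from memory and may differ slightly between PGPainless versions:

// Inspect a key ring
KeyRingInfo info = PGPainless.inspectKeyRing(secretKeys);

// Accessors such as getUserIds() are assumptions and may be named differently.
for (String userId : info.getUserIds()) {
    System.out.println("User-ID: " + userId);
}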

Edit Keys!

Do you already have a key, but want to extend its expiration date? Do you have a new email address and need to add it as a user-id to your key? PGPainless.modifyKeyRing(keyRing) allows basic modification of a key. You can add additional user-ids, adopt subkeys into your key, or expire/revoke existing subkeys.

secretKeys = PGPainless.modifyKeyRing(secretKeys)
                       .setExpirationDate(expirationDate, keyRingProtector)
                       .addSubKey(subkey, subkeyProtector, keyRingProtector)
                       .addUserId(UserId.onlyEmail("alice@pgpainless.org"), keyRingProtector)
                       .deleteUserId("alice@pgpainful.org", keyRingProtector)
                       .revokeSubkey(subKeyId, keyRingProtector)
                       .revokeUserId("alice@pgpaintrain.org", keyRingProtector)
                       .changePassphraseFromOldPassphrase(oldPass)
                           .withSecureDefaultSettings().toNewPassphrase(newPass)
                       .done();

Encrypt and Sign Effortlessly!

PGPainless 0.2 comes with an improved, simplified encryption/signing API. While the old API was already quite intuitive, I was focusing too much on the code being “autocomplete-friendly”. My vision was that the user could encrypt a message without ever having to read a bit of documentation, simply by typing PGPainless and then following the autocomplete suggestions of the IDE. While the result achieved that goal, the code was not very easy to integrate into real-world applications, as there was not one builder class, but several (one for each “step”). As a result, if a user wanted to configure the encryption dynamically, they would have to keep track of different builder objects and cope with casting madness.

// Old API
EncryptionStream encryptionStream = PGPainless.createEncryptor()
        .onOutputStream(targetOutputStream)
        .toRecipient(aliceKey)
        .and()
        .toRecipient(bobsKey)
        .and()
        .toPassphrase(Passphrase.fromPassword("password123"))
        .usingAlgorithms(SymmetricKeyAlgorithm.AES_192, HashAlgorithm.SHA256, CompressionAlgorithm.UNCOMPRESSED)
        .signWith(secretKeyDecryptor, aliceSecKey)
        .asciiArmor();

Streams.pipeAll(plaintextInputStream, encryptionStream);
encryptionStream.close();

The new API is still intuitive, but at the same time it is flexible enough to be extended with future features. Furthermore, since the configuration is now split into separate option objects, it is easier to integrate PGPainless dynamically.

// New shiny 0.2 API
EncryptionStream encryptionStream = PGPainless.encryptAndOrSign()
        .onOutputStream(outputStream)
        .withOptions(
                ProducerOptions.signAndEncrypt(
                        new EncryptionOptions()
                                .addRecipient(aliceKey)
                                .addRecipient(bobsKey)
                                // optionally encrypt to a passphrase
                                .addPassphrase(Passphrase.fromPassword("password123"))
                                // optionally override symmetric encryption algorithm
                                .overrideEncryptionAlgorithm(SymmetricKeyAlgorithm.AES_192),
                        new SigningOptions()
                                // Sign in-line (using one-pass-signature packet)
                                .addInlineSignature(secretKeyDecryptor, aliceSecKey, signatureType)
                                // Sign using a detached signature
                                .addDetachedSignature(secretKeyDecryptor, aliceSecKey, signatureType)
                                // optionally override hash algorithm
                                .overrideHashAlgorithm(HashAlgorithm.SHA256)
                ).setAsciiArmor(true) // Ascii armor
        );

Streams.pipeAll(plaintextInputStream, encryptionStream);
encryptionStream.close();

Verify Signatures Properly!

The biggest improvement in PGPainless 0.2 is proper signature verification. Prior to this release, PGPainless was doing what probably every other Bouncycastle-based OpenPGP library was doing:

PGPSignature signature = [...];
// Initialize the signature with the public signing key
signature.init(pgpContentVerifierBuilderProvider, signingKey);

// Update the signature with the signed data
int read;
while ((read = signedDataInputStream.read()) != -1) {
    signature.update((byte) read);
}

// Verify signature correctness
boolean signatureIsValid = signature.verify();

The point is that the code above only verifies signature correctness (that the signing key really made the signature and that the signed data is intact). It does however not check if the signature is valid.

Signature validation goes far beyond plain signature correctness and entails a whole suite of checks that need to be performed. Is the signing key expired? Was it revoked? If it is a subkey, is it bound to its primary key correctly? Has the primary key expired or been revoked? Does the signature contain unknown critical subpackets? Is it using acceptable algorithms? Does the signing key carry the SIGN_DATA flag? You can read more about why signature verification is hard in my previous blog post.

After implementing all those checks in PGPainless, the library now scores second place on Sequoia’s OpenPGP Interoperability Test Suite!

Lastly, support for verification of cleartext-signed messages such as emails was added.

New SOP module!

Also included in the new release is a shiny new module: pgpainless-sop

This module is an implementation of the Stateless OpenPGP Command Line Interface specification. It basically allows you to use PGPainless as a command line application to generate keys, encrypt/decrypt, sign and verify messages etc.

$ # Generate secret key
$ java -jar pgpainless-sop-0.2.0.jar generate-key "Alice <alice@pgpainless.org>" > alice.sec
$ # Extract public key
$ java -jar pgpainless-sop-0.2.0.jar extract-cert < alice.sec > alice.pub
$ # Sign some data
$ java -jar pgpainless-sop-0.2.0.jar sign --armor alice.sec < message.txt > message.txt.asc
$ # Verify signature
$ java -jar pgpainless-sop-0.2.0.jar verify message.txt.asc alice.pub < message.txt 
$ # Encrypt some data
$ java -jar pgpainless-sop-0.2.0.jar encrypt --sign-with alice.sec alice.pub < message.txt > message.txt.asc
$ # Decrypt ciphertext
$ java -jar pgpainless-sop-0.2.0.jar decrypt --verify-with alice.pub --verify-out=verif.txt alice.sec < message.txt.asc > message.txt

The primary reason for creating this module though was that it enables PGPainless to be plugged into the interoperability test suite mentioned above. This test suite uncovered a ton of bugs and shortcomings and helped me massively to understand and interpret the OpenPGP specification. I can only urge other developers who work on OpenPGP libraries to implement the SOP specification!

Upstreamed changes

Even if you are not yet convinced to switch to PGPainless and want to keep using vanilla Bouncycastle, you might still benefit from some bugfixes that were upstreamed to Bouncycastle.

Every now and then, for example, BC would fail to do some internal conversions of elliptic curve encryption keys. The source of this issue was that BC was converting keys from BigIntegers to byte arrays, which could end up with an invalid length when the encoding had leading zeros, thus omitting one byte. Fixing this was easy, but finding the bug took quite some time.
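
To illustrate the class of bug (a sketch, not the actual Bouncy Castle code): BigInteger.toByteArray() produces an encoding whose length depends on the value, so a coordinate whose most significant byte happens to be zero comes out one byte short unless it is padded to a fixed length, e.g. with BC’s BigIntegers utility class.

// A coordinate whose most significant byte of a 32-byte encoding is zero.
BigInteger coordinate = BigInteger.ONE.shiftLeft(240);

byte[] tooShort = coordinate.toByteArray();                      // 31 bytes – one byte "missing"
byte[] padded = BigIntegers.asUnsignedByteArray(32, coordinate); // org.bouncycastle.util.BigIntegers: always 32 bytes, left-padded with zeros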

Another bug caused decryption of messages which were encrypted for more than one key/passphrase to fail, when BC tried to decrypt a Symmetrically Encrypted Session Key Packet with the wrong key/passphrase first. The cause of this issue was that BC was not properly rewinding the decryption stream after reading a checksum, thus corrupting decryption for subsequent attempts with the correct passphrase or key. The fix was to mark and rewind the stream properly before the next decryption attempt.
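
The general shape of such a fix, in plain java.io terms, looks like this (illustrative only; tryDecrypt and the key variables are hypothetical placeholders, not BC API):

InputStream in = new BufferedInputStream(encryptedIn); // a stream that supports mark/reset
in.mark(readLimit);              // remember the position before the first attempt
if (!tryDecrypt(in, wrongKey)) { // the failed attempt has already consumed bytes...
    in.reset();                  // ...so rewind before the next attempt reads the same data
    tryDecrypt(in, correctKey);
}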

Lastly some methods in BC have been modernized by adding generics to Iterators. Chances are if you are using BC 1.68, you might recognize some changes once you bump the dependency to BC 1.69 (once it is released of course).
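
For example, iterating over the keys of a key ring no longer requires raw-type casts (a sketch, assuming the generified signatures land in 1.69 as described):

// With the generified methods the loop becomes type-safe:
for (Iterator<PGPPublicKey> it = publicKeyRing.getPublicKeys(); it.hasNext(); ) {
    PGPPublicKey key = it.next();
    System.out.println(Long.toHexString(key.getKeyID()));
}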

Thank you!

I would like to thank anyone who contributed to the new release in any way or form for their support. Special thanks goes to my sponsor FlowCrypt for giving me the opportunity to spend so much time on the library! Furthermore I’d like to thank all the amazing folks over at Sequoia-PGP for their efforts to improve the OpenPGP ecosystem and for patiently helping me understand the (at times a bit muddy) OpenPGP specification.

Why Signature Verification in OpenPGP is hard

An Enigma cipher machine which is less secure but also easier to understand than OpenPGP.
Photo by Mauro Sbicego on Unsplash.

When I first thought about signature verification in OpenPGP I thought “well, it cannot be that hard, right?”. In the end, all you have to do is check whether a signature was made by the given key and whether that signature checks out (is cryptographically correct). Oh boy, was I wrong.

The first major realization that struck me was that there are more than just two factors involved in the signature verification process. While the first two are pretty obvious – the signature itself and the key that created it – another major role is played by the point in time at which a signature is being verified. OpenPGP keys change over the course of their lifespan. Subkeys may expire or be revoked for different reasons. A subkey might be eligible to create valid signatures until its binding signature expires; from that point in time, all new signatures created with that key must be considered invalid. Keys can be rebound, so an expired key might become valid again at some point, which would also make (new) signatures created with it valid once again.

But what does it mean to rebind a key? How are expiration dates set on keys, and what role does the reason of a revocation play?

The answer to the first two questions is – Signatures!

OpenPGP keys consist of a set of keys and subkeys, user-id packets (which contain email addresses and names and so forth) and lastly a bunch of signatures which tie all this information together. The root of this bunch is the primary key – a key with the ability to create signatures, or rather certifications. Signatures and certifications are basically the same; they just have a different semantic meaning, which is quite an important detail. More on that later.

The main use of the primary key is to bind additional data to it. First and foremost user-id packets, which can (along with the primary key’s key-ID) be used to identify the key. Alice might for example have a user-id packet on her key which contains her name and email address. Keys can have more than one user-id, so Alice might also have an additional user-id packet with her work email or her chat address added to the key.

But simply adding the packet to the key is not enough. An attacker might simply take her key, change the email address and hand the modified key to Bob, right? Wrong. Signatures – or rather certifications – to the rescue!

   Primary-Key
      [Revocation Self Signature]
      [Direct Key Signature...]
      [User ID [Signature ...] ...]
      [User Attribute [Signature ...] ...]
      [[Subkey [Binding-Signature-Revocation]
              Subkey-Binding-Signature ...] ...]

Information is not just loosely appended to the key. Instead it is cryptographically bound to it by the help of a certification. Certifications are signatures which can only be created by a key which is allowed to create certifications. If you take a look at any OpenPGP (v4) key, you will see that most likely every single primary key will be able to create certifications. So basically the primary key is used to certify that a piece of information belongs to the key.

The same goes for subkeys. They are also bound to the primary key with the help of a certification. Here, the certification has a special type and is called “subkey binding signature”. The concept though is mostly the same. The primary key certifies that a subkey belongs to it by help of a signature.

Now it slowly becomes complicated. As you can see, up to this point the binding relations have been uni-directional. The primary key claims to be the dominant part of the relation. This might however introduce the risk of an attacker using a primary key to claim ownership of a subkey which was used to make a signature over some data. It would then appear as if the attacker is also the owner of that signature. That’s the reason why a signing-capable subkey must somehow prove that it belongs to its primary key. Again, signatures to the rescue! A subkey binding signature that binds a signing capable subkey MUST contain a primary key binding signature made by the subkey over the primary key. Now the relationship is bidirectional and attacks such as the one mentioned above are mitigated.

So, about certifications – when is a key allowed to create certifications? How can we specify what capabilities a key has?

The answer is: Signature… Subpackets!
Those are some pieces of information that are added into a signature that give it more semantic meaning aside from the signature type. Examples for signature subpackets are key flags, signature/key creation/expiration times, preferred algorithms and many more. Those subpackets can reside in two areas of the signature. The unhashed area is not covered by the signature itself, so here packets can be added/removed without breaking the signature. The hashed area on the other hand gets its name from the fact that subpackets placed here are taken into consideration when the signature is being calculated. They cannot be modified without invalidating the signature.

So the unhashed area shall only contain advisory information or subpackets which are “self-authenticating” (meaning information which is validated as a side-effect of validating the signature). An example of a self-authenticating subpacket would be the issuer subpacket, which contains the key-ID of the key that created the signature. This piece of information can be verified by checking if the denominated key really created the signature. There is no need to cover this information by the signature itself.

Another really important subpacket type is the key flags packet. It contains a bit-mask that declares what purpose a key can be used for, or rather what purpose the key is ALLOWED to be used for. Such purposes are encryption of data at rest, encryption of data in transit, signing data, certifying data, authentication. Additionally there are key flags indicating that a key has been split by a key-splitting mechanism or that a key is being shared by more than one entity.

Each signature MUST contain a signature creation time subpacket, which states at which date and time a signature was created. Optionally a signature might contain a signature expiration time subpacket which denotes at which point in time a signature expires and becomes invalid. So far so good.
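
With Bouncy Castle, these subpackets (and the key flags mentioned above) can be read from the hashed area roughly like this (a sketch; note that both expiration values are stored as a number of seconds, relative to the signature and key creation time respectively):

PGPSignatureSubpacketVector hashed = signature.getHashedSubPackets();

Date created = hashed.getSignatureCreationTime();         // MUST be present
long sigExpiresIn = hashed.getSignatureExpirationTime();  // seconds after signature creation, 0 = no expiration
long keyExpiresIn = hashed.getKeyExpirationTime();        // seconds after key creation, 0 = no expiration
boolean maySignData = (hashed.getKeyFlags() & KeyFlags.SIGN_DATA) != 0;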

Now, those subpackets can also be placed on certifications, e.g. subkey binding signatures. If a subkey binding signature contains a key expiration time subpacket, this indicates that the subkey expires at a certain point in time. An expired subkey must not be used anymore and signatures created by it after it has expired must be considered invalid. It gets even more complicated if you consider that a subkey binding signature might contain a key expiration time subpacket along with a signature expiration time subpacket. That could lead to funny situations. For example a subkey might have two subkey binding signatures: one simply binds the key indefinitely, while the second one has an expiration time. Here the latest binding signature takes precedence, meaning the subkey might expire at, let’s say, t+3, while at t+5 the signature itself expires, meaning that the key regains validity, as now the former binding signature is “active” again.

Not yet complicated enough? Consider this: Whether or not a key is eligible to create signatures is denoted by the key flags subpacket which again is placed in a signature. So when verifying a signature, you have to consult self-signatures on the signing key to see if it carries the sign-data key flag. Furthermore you have to validate that self-signature and check if it was created by a key carrying the certify-other key flag. Now again you have to check if that signature was created by a key carrying the certify-other key flag (given it is not the same (primary) key). Whew.

Lastly there are key revocations. If a key gets lost or stolen or is simply retired, it can be revoked. Now it depends on the revocation reason what impact the revocation has on past and/or future signatures. If the key was revoked using a “soft” revocation reason (the key has not been compromised), the revocation is mostly handled as if it were an expiration: past signatures are still good, but the key must not be used anymore. If it however has a “hard” revocation reason (or no reason at all) this could mean that the key has been lost or compromised. This means that any signature (future and past) that was made by this key has now to be considered invalid, since an attacker might have forged it.

Now, a revocation can only be created by a certification capable key, so in order to check if a revocation is valid, we have to check if the revoking key is allowed to revoke this specific subkey. Permitted revocation keys are either the primary key, or an external key denoted in a revocation key subpacket on a self-signature. Can you see why this introduces complexity?

Revocation signatures have to be handled differently from other signatures, since if the primary key is revoked, is it eligible to create revocation signatures in the first place? What if an external revocation key has been revoked and is now used to revoke another key?

I believe the correct way to tackle signature validity is to first evaluate the key (primary and subkeys) at signature creation time. Evaluating the key at a given point in time tn means we reject all signatures made after tn (except hard revocations) as those are not yet valid. Furthermore we reject all signatures that are expired at tn as those are no longer valid. Furthermore we remove all signatures that are superseded by another, more recent signature. We do this for all signatures on all keys in the “correct” order. What we are left with is a canonicalized key ring, which we can now use to verify the signature in question.

So let’s try to summarize every step that we have to take in order to verify a signature’s validity.

  • First we have to check if the signature contains a creation time subpacket. If it does not, we can already reject it.
  • Next we check if the signature is expired by now. If it is, we can again reject.
  • Now we have to evaluate the key ring that contains the signature’s signing key at the time at which the signature was created.
    • Is the signing key properly bound to the key ring?
      • Was it created before the signature?
      • Was it bound to the key ring before the signature was made?
      • Is the binding signature not expired?
      • Is the binding signature not revoked?
      • Is the subkey binding signature carrying a valid primary key binding signature?
      • Are the binding signatures using acceptable algorithms?
    • Is the subkey itself not expired?
    • Is the primary key not expired?
    • Is the primary key not revoked?
    • Is the subkey not revoked?
  • Is the signing key capable of creating signatures?
  • Was the signature created with acceptable algorithms? Reject weak algorithms like SHA-1.
  • Is the signature correct?

Lastly of course, the user has to decide if the signing key is trustworthy or not, but luckily we can leave this decision up to the user.

As you can see, this is not at all trivial and I’m sure I missed some steps and/or odd edge cases. What makes implementing this even harder is that the specification is deliberately sparse in places. What subpackets are allowed to be placed in the unhashed area? What MUST be placed in the hashed area instead? Furthermore the specification contains errors which make it even harder to get a good picture of what is allowed and what isn’t. I believe what OpenPGP needs is a document that acts as a guide to implementors. That guide needs to specify where and where not certain subpackets are to be expected, how a certain piece of semantic meaning can be represented and how signature verification is to be conducted. It is not desirable that each and every implementor has to digest the whole specification multiple times in order to understand what steps are necessary to verify a signature or to select a valid key for signature creation.

Did you implement signature verification in OpenPGP? What are your thoughts on this? Did you go through the same struggles that I did?

Lastly I want to give a shout-out to the devs of Sequoia-PGP, who have a pretty awesome test suite that covers lots and lots of edge cases and interoperability concerns of implementations. I definitely recommend everyone who needs to work with OpenPGP to throw their implementation against the suite to see if there are any shortcomings and problems with it.

PGPainless 0.1.0 released

Image of a lock

After two years and a dozen alpha versions I am very glad to announce the first stable release of PGPainless! The release is available on maven central.

PGPainless aims to make using OpenPGP with Bouncycastle fun again by abstracting away most of the complexity and overhead that normally comes with it. At the same time PGPainless remains configurable by making heavy use of the builder pattern for almost everything.

Let’s take a look at how to create a fresh OpenPGP key:

        PGPKeyRing keyRing = PGPainless.generateKeyRing()
                .simpleEcKeyRing("alice@wonderland.lit", "password123");

That is all it takes to generate an OpenPGP keypair that uses ECDH+ECDSA keys for encryption and signatures! You can of course also configure a more complex key pair with different algorithms and attributes:

        PGPainless.generateKeyRing()
                .withSubKey(KeySpec.getBuilder(RSA_ENCRYPT.withLength(RsaLength._4096))
                        .withKeyFlags(KeyFlag.ENCRYPT_COMMS, KeyFlag.ENCRYPT_STORAGE)
                        .withDefaultAlgorithms())
                .withSubKey(KeySpec.getBuilder(ECDH.fromCurve(EllipticCurve._P256))
                        .withKeyFlags(KeyFlag.ENCRYPT_COMMS, KeyFlag.ENCRYPT_STORAGE)
                        .withDefaultAlgorithms())
                .withSubKey(KeySpec.getBuilder(RSA_SIGN.withLength(RsaLength._4096))
                        .withKeyFlags(KeyFlag.SIGN_DATA)
                        .withDefaultAlgorithms())
                .withMasterKey(KeySpec.getBuilder(RSA_SIGN.withLength(RsaLength._8192))
                        .withKeyFlags(KeyFlag.CERTIFY_OTHER)
                        .withDetailedConfiguration()
                        .withPreferredSymmetricAlgorithms(SymmetricKeyAlgorithm.AES_256)
                        .withPreferredHashAlgorithms(HashAlgorithm.SHA512)
                        .withPreferredCompressionAlgorithms(CompressionAlgorithm.BZIP2)
                        .withFeature(Feature.MODIFICATION_DETECTION)
                        .done())
                .withPrimaryUserId("alice@wonderland.lit")
                .withPassphrase(new Passphrase("password123".toCharArray()))
                .build();

The API is designed in a way that makes it hard for the user to make mistakes. Inputs are typed, so that, for example, the user cannot input a wrong key length for an RSA key. The “shortcut” methods (e.g. withDefaultAlgorithms()) use sane, secure defaults.

Now that we have a key, let’s encrypt some data!

        byte[] secretMessage = message.getBytes(UTF8);
        ByteArrayOutputStream envelope = new ByteArrayOutputStream();

        EncryptionStream encryptor = PGPainless.createEncryptor()
                .onOutputStream(envelope)
                .toRecipients(recipientPublicKey)
                .usingSecureAlgorithms()
                .signWith(keyDecryptor, senderSecretKey)
                .asciiArmor();

        Streams.pipeAll(new ByteArrayInputStream(secretMessage), encryptor);
        encryptor.close();
        byte[] encryptedSecretMessage = envelope.toByteArray();

As you can see there is almost no boilerplate code! At the same time, the above code will create a stream that will encrypt and sign all the data that is passed through. In the end the envelope stream will contain an ASCII armored message that can only be decrypted by the intended recipients and that is signed using the sender’s secret key.

Decrypting data and/or verifying signatures works very similarly:

        ByteArrayInputStream envelopeIn = new ByteArrayInputStream(encryptedSecretMessage);
        DecryptionStream decryptor = PGPainless.createDecryptor()
                .onInputStream(envelopeIn)
                .decryptWith(keyDecryptor, recipientSecretKey)
                .verifyWith(senderPublicKey)
                .ignoreMissingPublicKeys()
                .build();

        ByteArrayOutputStream decryptedSecretMessage = new ByteArrayOutputStream();

        Streams.pipeAll(decryptor, decryptedSecretMessage);
        decryptor.close();
        OpenPgpMetadata metadata = decryptor.getResult();

The decryptedSecretMessage stream now contains the decrypted message. The metadata object can be used to get information about the message, e.g. which keys and algorithms were used to encrypt/sign the data and whether those signatures were valid.

In summary, PGPainless is now able to create different types of keys, read encrypted and unencrypted keys, encrypt and/or sign data streams as well as decrypt and/or verify signatures. The latest additions to the API contain support for creating and verifying detached signatures.

PGPainless is already in use in Smack’s OpenPGP module which implements XEP-0373: OpenPGP for XMPP, and it has been designed primarily with the instant messaging use case in mind. So if you want to add OpenPGP support to your application, feel free to give PGPainless a try!

Install Jitsi-Meet alongside ejabberd

Image of an office conference room with empty chairs

Since the coronavirus is forcing many of us into home office, there is a high demand for video conferencing solutions. A popular free and open source tool for creating video conferences, similar to Google Hangouts, is Jitsi Meet. It enables you to create a conference room from within your browser, for which you can then share a link with your coworkers. No client software is needed at all (except on mobile devices).

The installation of Jitsi Meet is super straightforward – if you have a dedicated server sitting around. Simply add the Jitsi repository to your package manager and (in case of Debian-based systems) type

sudo apt-get install jitsi-meet

The installer will guide you through most of the process (setting up nginx / apache, installing dependencies, even doing the Let’s Encrypt setup) and in the end you can start video calling! The quick start guide does a better job explaining this than I do.

Jitsi Meet is a suite of different components that all play together (see the Jitsi Meet manual). Part of the mix is a prosody XMPP server that is used for signalling. That means if you want the simple, easy setup experience, your server must not already run another XMPP server. Otherwise you’ll have some manual configuration ahead of you.

I did that.

Since I already run a personal ejabberd XMPP server and don’t have any virtualization tools at hand, I wanted to make jitsi-meet use ejabberd instead of prosody. In the end both should be equally suited for the job.

Looking at the configuration file that comes with Jitsi’s bundled prosody, we can see that Jitsi Meet requires the XMPP server to serve two different virtual hosts.
The file is located under /etc/prosody/conf.d/meet.example.org.cfg.lua

VirtualHost "meet.example.org"
        authentication = "anonymous"
        ssl = {
                ...
        }
        modules_enabled = {
            "bosh";
            "pubsub";
            "ping";
        }
        c2s_require_encryption = false

Component "conference.meet.example.org" "muc"
    storage = "memory"
admins = { "focus@auth.meet.example.org" }

Component "jitsi-videobridge.meet.example.org"
    component_secret = "<VIDEOBRIDGE_COMPONENT_SECRET>"

VirtualHost "auth.meet.example.org"
    ssl = {
        ...
    }
    authentication = "internal_plain"

Component "focus.meet.example.org"
    component_secret = "<JICOFO_COMPONENT_SECRET>"

There are also some external components that need to be configured. This is where Jitsi Meet plugs into the XMPP server.

In my case I don’t want to serve 3 virtual hosts with my ejabberd, so I decided to replace auth.meet.jabberhead.tk with my already existing main domain jabberhead.tk which already uses internal authentication. So all I had to do was add the virtual host meet.jabberhead.tk to my ejabberd.yml.
The ejabberd config file is located under /etc/ejabberd/ejabberd.yml or /opt/ejabberd/conf/ejabberd.yml depending on your ejabberd distribution.

hosts:
    ## serves as main host, as well as auth.meet.jabberhead.tk for focus user
  - "jabberhead.tk"
    ## serves as anonymous authentication host for meet.jabberhead.tk
  - "meet.jabberhead.tk"

The syntax for external components is quite different for ejabberd than it is for prosody, so it took me some time to get it working.

listen:
  -
    port: 5280
    ip: "::"
    module: ejabberd_http
    request_handlers:
      "/http-bind": mod_bosh
    tls: true
    protocol_options: 'TLS_OPTIONS'
  -
    port: 5347
    module: ejabberd_service
    hosts:
      "focus.meet.jabberhead.tk":
        password: "<JICOFO_COMPONENT_SECRET>"

The configuration of the modules was a bit trickier on ejabberd, as the ejabberd config syntax seems to disallow duplicate entries. I ended up moving everything from the existing main modules: block into a separate host_config: for my existing domain. That way I could separate the configuration of my main domain from the config of the meet subdomain.

Update: I migrated my server (and this tutorial) to the new videobridge2 component. This means that we no longer need to configure the videobridge as an XMPP component in ejabberd’s configuration.

host_config:
  ## Already existing vhost.
  jabberhead.tk:
    s2s_access: s2s
    ## former main modules block, now further indented
    modules:
      mod_adhoc: {}
      mod_admin_extra: {}
      ...

  ## ADD THIS: New meeting host with anonymous authentication and no s2s
  meet.jabberhead.tk:
    ## Disable s2s to prevent spam
    s2s_access: none
    auth_method: anonymous
    allow_multiple_connections: true
    anonymous_protocol: both
    modules:
      mod_bosh: {}
      mod_disco: {}
      mod_muc:
        access: all
        access_create: local
        access_persistent: local
        access_admin: admin
      mod_muc_admin: {}
      mod_ping: {}
      mod_pubsub:
        access_createnode: local

As you can see I only enabled required modules for the meet.jabberhead.tk service and even disabled s2s to prevent the anonymous Jitsi Meet users from contacting users on other servers.

Last but not least we have to add the focus user as an admin and also generate (not discussed here) and add certificates for the meet.jabberhead.tk subdomain. This step is not necessary if the meet domain is already covered by the certificate in use.

certfiles:
  - ...
  - "/etc/ssl/meet.jabberhead.tk/cert.pem"
  - "/etc/ssl/meet.jabberhead.tk/fullchain.pem"
  - "/etc/ssl/meet.jabberhead.tk/privkey.pem"
...
acl:
  admin:
    user:
      - "focus@jabberhead.tk"
      ...

That’s it for the ejabberd configuration. Now we have to configure the other Jitsi Meet components. Let’s start with jicofo, the Jitsi Conference Focus component.

My /etc/jitsi/jicofo/config file looks as follows.

JICOFO_HOST=localhost
JICOFO_HOSTNAME=meet.jabberhead.tk
JICOFO_SECRET=<JICOFO_COMPONENT_SECRET>
JICOFO_PORT=5347
JICOFO_AUTH_DOMAIN=jabberhead.tk
JICOFO_AUTH_USER=focus
JICOFO_AUTH_PASSWORD=<FOCUS_USER_SECRET>
JICOFO_OPTS=""
# Below can be left as is.
JAVA_SYS_PROPS=...

The videobridge configuration (/etc/jitsi/videobridge/config), respectively, looks as follows.

JVB_SECRET=<VIDEOBRIDGE_COMPONENT_SECRET>
## Leave below as originally was
JAVA_SYS_PROPS=...

Note that since the update to videobridge2, most config options have become obsolete and were replaced by options in the sip-communicator.properties file. So let’s change /etc/jitsi/videobridge/sip-communicator.properties:

org.ice4j.ice.harvest.DISABLE_AWS_HARVESTER=true
org.ice4j.ice.harvest.STUN_MAPPING_HARVESTER_ADDRESSES=meet-jit-si-turnrelay.jitsi.net:443
org.jitsi.videobridge.ENABLE_STATISTICS=true
org.jitsi.videobridge.STATISTICS_TRANSPORT=muc
org.jitsi.videobridge.xmpp.user.shard.HOSTNAME=localhost
org.jitsi.videobridge.xmpp.user.shard.DOMAIN=jabberhead.tk
org.jitsi.videobridge.xmpp.user.shard.USERNAME=jvb
org.jitsi.videobridge.xmpp.user.shard.PASSWORD=<JVB_USER_SECRET>
org.jitsi.videobridge.xmpp.user.shard.MUC_JIDS=JvbBrewery@conference.meet.jabberhead.tk
org.jitsi.videobridge.xmpp.user.shard.MUC_NICKNAME=<some UUID, just leave the original value>

Now we can wire it all together by modifying the Jitsi Meet config file found under /etc/jitsi/meet/meet.example.org-config.js:

var config = {
    hosts: {
        domain: 'meet.jabberhead.tk',
        anonymousdomain: 'meet.jabberhead.tk',
        authdomain: 'jabberhead.tk',
        focus: 'focus.meet.jabberhead.tk',
        muc: 'conference.meet.jabberhead.tk'
    },
    bosh: '//meet.jabberhead.tk/http-bind', // NOTE: or /bosh, depending on your setup
    clientNode: 'http://jitsi.org/jitsimeet',
    focusUserJid: 'focus@jabberhead.tk',

    testing: {
    ...
    }
...
}

Finally of course, I also had to register the focus and jvb users as XMPP accounts:

ejabberdctl register focus jabberhead.tk <FOCUS_USER_SECRET>
ejabberdctl register jvb jabberhead.tk <JVB_USER_SECRET>

Remember to replace the placeholders with strong passwords, and also stop and disable the bundled prosody! That’s it!

I hope this lowers the bar for some to deploy Jitsi Meet next to their already existing ejabberd. Lastly please do not ask me for support, as I barely managed to get this working for myself 😛

Update (11.04.2020)

With feedback from Holger I reworked my ejabberd config and disabled s2s for the meet vhost, see above.

Someone also pointed out that it may be a good idea to substitute prosody with a dummy package to save disk space and possible attack surface.

Update (16.06.2020)

I migrated this guide to use the new videobridge2 package. The changes in a gist:

  • Remove videobridge component configuration from ejabberd config
  • Register new jvb@yourdomain user for the videobridge
  • Update videobridge config and sip-communicator.properties files (see further up)

What I might look into next is how to use ejabberd’s bundled (since 20.04) STUN/TURN server instead of the default one…

Note: The Jitsi Meet components do some regular health checks. They do this by (as far as I understand) creating temporary MUC rooms every few seconds. Since ejabberd stores entity capabilities indefinitely, this may cause the caps_features table of ejabberd’s database to quickly fill up with lots of junk. Apparently this was fixed in later versions of ejabberd, however, if you still need a workaround, you may try deleting unused caps entries periodically.

Smack: Some more busy nights and 12 bytes of IV

Decorative image of an architecture blueprint

In the last months I stayed up late some nights, so I decided to add some additional features to Smack.

Among the additions is support for some new XEPs, namely:

  • XEP-0249: Direct MUC Invitations
  • XEP-0422: Message Fastening
  • XEP-0424: Message Retraction
  • XEP-0420: Stanza Content Encryption

I also started working on an implementation of XEP-0425: Message Moderation, but that one is not yet finished and needs more work.

Direct MUC invitations are a method to invite users to a group chat. Smack already had support for another similar mechanism, but this one is recommended by the XMPP Compliance Suites 2020.

Message Fastening is a generalized mechanism to add information to messages. That might be a reaction, e.g. a thumbs-up which is added to a previous message.

Message Retraction is used to retract previously sent messages. Internally it is based on Message Fastening.

The Stanza Content Encryption pull request only teaches Smack what SCE elements are, but it doesn’t yet teach it how to use them. That is partly due to no E2EE specification actually using them yet. That will hopefully change soon 😉

Anu brought up the fact that the OMEMO XEP is not totally clear on the length of initialization vectors used for message encryption. Historically most clients use a length of 16 bytes, while normally you would want to use 12. Apparently some AES-GCM libraries on iOS only support a length of 12 bytes, so using 12 bytes is definitely desirable. Most OMEMO implementations already support receiving both 12-byte and 16-byte IVs.
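
For illustration, this is what AES-GCM encryption with a 12-byte IV looks like with plain JCA (a generic sketch, not Smack’s actual OMEMO code):

SecureRandom random = new SecureRandom();
byte[] iv = new byte[12];                // 12-byte IV
random.nextBytes(iv);

KeyGenerator keyGen = KeyGenerator.getInstance("AES");
keyGen.init(128);
SecretKey messageKey = keyGen.generateKey();

Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
cipher.init(Cipher.ENCRYPT_MODE, messageKey, new GCMParameterSpec(128, iv)); // 128-bit auth tag
byte[] ciphertext = cipher.doFinal("Hello OMEMO!".getBytes(StandardCharsets.UTF_8));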

That’s why Smack will soon also start sending OMEMO messages with a 12-byte IV.

Re: The Ecosystem is Moving

Moxie Marlinspike, the creator of Signal, gave a talk at 36C3 on Saturday titled “The ecosystem is moving”.

The Fahrplan description of that talk reads as follows:

Considerations for distributed and decentralized technologies from the perspective of a product that many would like to see decentralize.

Amongst an environment of enthusiasm for blockchain-based technologies, efforts to decentralize the internet, and tremendous investment in distributed systems, there has been relatively little product movement in this area from the mobile and consumer internet spaces.

This is an exploration of challenges for distributed technologies, as well as some considerations for what they do and don’t provide, from the perspective of someone working on user-focused mobile communication. This also includes a look at how Signal addresses some of the same problems that decentralized and distributed technologies hope to solve.

https://fahrplan.events.ccc.de/congress/2019/Fahrplan/events/11086.html

Basically the talk is a reiteration of some arguments from a blog post with the same title he posted back in 2016.

In his presentation, Marlinspike basically states that federated systems have the issue of being frozen in time while centralized systems are flexible and easy to change.

As an example, Marlinspike names HTTP/1.1, which was released in 1999 and which we have been stuck on ever since. While it is true that a huge part of the internet is currently running on HTTP 1.0 and 1.1, one has to consider that its successor HTTP/2.0 was only released in 2015. Four to five years is not a long time to update the entirety of the internet, especially if you consider the fact that the big browser vendors announced that their browsers would only support HTTP/2.0 for sites that are TLS encrypted.

Marlinspike then goes on to list four expectations that advocates of federated systems have, namely privacy, censorship resistance, availability and control. This is pretty accurate and matches my personal expectations pretty well. He then argues that Signal, as a centralized application, can fulfill those expectations as well as, if not better than, a decentralized system.

Privacy

Privacy is often expected to be provided by the means of data ownership, says Marlinspike. As an example he mentions email. He argues that even though he is self-hosting his emails, “each and every mail has GMail at the other end”.

I agree with this observation and think that this is a real problem. But the answer to this problem would logically be that we need to increase our efforts to change that by reducing the number of GMail accounts and increasing the number of self-hosted email servers, right? This is not really an argument for centralization, where each and every message is guaranteed to have the same service at the other end.

I also agree with his opinion that a more effective tool to gain privacy is good encryption. He obviously brings up the point that email encryption is unusable (probably hinting at PGP), totally ignoring modern approaches to email encryption like Autocrypt.

Censorship resistance

Federated systems are censorship resistant. At least that is the expectation that advocates of federated systems have. Every time a server gets blocked, the user simply switches to another server. The issue that Marlinspike points out is that every time this happens, the user loses his entire social graph. While this is an issue, there are solutions to this problem, one being nomadic identities. If some server goes down the user simply migrates to another server, taking his contacts with him. Hubzilla does this for example. There are also import/export features present in most services nowadays thanks to the GDPR. XMPP offers such a solution using XEP-0227.

But let’s take a look at how Signal circumvents censorship according to Marlinspike. He proudly presents Domain Fronting as the solution. With domain fronting, the client connects to some big service which is costly for a censor to block and uses that as a proxy to connect to the actual server. While this appears to be a very elegant solution, Marlinspike conceals the fact that Google and Amazon pretty quickly intervened and stopped Signal from using their domains.

With Google Cloud and AWS out of the picture, it seems that domain fronting as a censorship circumvention technique is now largely non-viable in the countries where Signal had enabled this feature.

https://signal.org/blog/looking-back-on-the-front/

Notice that the above quote was posted by Marlinspike himself more than one and a half years ago. Why exactly he brings this up as an argument remains a mystery to me.

Update: Apparently Signal still successfully uses Domain Fronting, just with content delivery networks other than Google and Amazon.

And even if domain fronting were an effective way to circumvent censorship, it could just as well be applied to federated servers, adding an additional layer of protection instead of relying on it alone.

But what if the censor is not a foreign nation, but instead the nation where your servers are located? What if the US decides to shut down signal.org for some reason? No amount of domain fronting can protect you from police raiding your server center. Police confiscating each and every server of a federated system (or even a considerable fraction of it) on the other hand is unlikely.

Availability

This brings us nicely to the next point on the agenda, availability.

If you have a centralized service than you want to move that centralized service into two different data centers. And the way you did that was by splitting the data up between those data centers and you just halved your availability, because the mean time between failures goes up since you have two different data centers which means that it is more likely to have an outage in one of those data centers in any given moment.

Moxie Marlinspike in his 36c3 talk “The Ecosystem is Moving”

For some reason Marlinspike confuses a decentralized system with a centralized, but distributed system. It even reads “Centralized Service” on his slides… Decentralization does not equal distribution.

A federated system would obviously not be fault free, as servers naturally tend to go down, but an outage only causes a small fraction of the network to collapse, contrary to a total outage of a centralized system. There are even techniques to minimize the loss of functionality further, for example distributed chat rooms in the Matrix protocol.

Control

The advocates’ argument about control says that if a service provider behaves undesirably, you simply switch to another service provider. Marlinspike rightfully asks how it can then be that many people still use Yahoo as their mail provider. Indeed that is a good question. I guess the only answer I can come up with is that most people probably don’t care enough about their email to make the switch. To be honest, email is kind of boring anyways 😉

XMPP

Next Marlinspike talks about XMPP. He (rightfully) notes that due to XMPP’s extensibility there is a morass of XEPs and that those don’t really feel consistent.

The XMPP community already recognized the problem that comes with having that many XEPs and tries to solve this issue by introducing so-called compliance suites. These are annually published documents that contain a list of XEPs that are considered vitally important for clients or servers. These suites act as maps that point a way through the XEP jungle.

Next Marlinspike states that the XMPP protocol still fails to be a suitable option for mobile devices. This statement is plain wrong and was already debunked in a blog post by Daniel Gultsch back in 2016. Gultsch develops an XMPP client for Android which is totally usable and generally has lower battery consumption than Signal has. Conversations implements all of the XEPs listed in the compliance suites to be required for mobile clients. This shows that implementing a decent mobile client for a federated system can be done and there is a recipe for it.

What Marlinspike could have pointed out instead is that the XMPP community struggles to come up with a decent iOS client. That would have been a fair argument, but spreading FUD about the XMPP protocol as a whole is unfair and dishonest.

Luckily the audience of the talk didn’t fully buy into Marlinspike’s weaker arguments, as demonstrated by some entertaining questions during the Q&A afterwards.

What Marlinspike is right about though is that developing a federated system is harder than developing a centralized service, where you as the developer have control over the whole system and subsequently over the users. However, this is actually the reason why we, the community around decentralized systems and federated protocols, do what we do. In the words of J.F. Kennedy, we do these things…

…not because they are easy, but because they are hard…

… or simply because they are right.

On “Clean Architecture”

Some skyscrapers photographed from the bottom to the top.

I recently did what I rarely do: buy and read an educational book. Shocking, I know, but I can assure you that I’m fine 😉

The book I ordered is Clean Architecture – A Craftsman’s Guide to Software Structure and Design by Robert C. Martin. As the title suggests it is about software architecture.

I’ve barely read half of the book, but I’ve already learned a ton! I find it curious that as a halfway decent programmer, I often more or less know what Martin is talking about when he is describing a certain architectural pattern, but I often didn’t know that said pattern had a name or what consequences using it really implied. It is really refreshing to see the bigger picture and to have all the pros and cons of certain design decisions laid out in an overview.

One important point the book is trying to put across is how important it is to distinguish between important things like business rules, and not so important things like details. Let me try to give you an example.

Let’s say I want to build a reactive Android XMPP chat application using Smack (foreshadowing? 😉 ). Let’s identify the details. Surely Smack is a detail. Even though I’d be using Smack for some of the core functionalities of the app, I could just as well choose another XMPP library like babbler to get the job done. But there are even more details, Android for example.

In fact, when you strip out all the details, you are left with a reactive chat application. Even XMPP is a detail! A chat application doesn’t care what protocol you use to send and receive messages; heck, it doesn’t even care whether it runs on Android or on any other device (that can run Java).

I’m still not quite sure if the keyword reactive is a detail, as I’d say it is more of a programming paradigm. Details are things that can easily be switched out and/or extended, and I don’t think you can easily replace a programming paradigm.

The book does a great job of identifying and describing simple rules that, when applied to a project, lead to a cleaner, more structured architecture. All in all it teaches how important software architecture in general is.

There is, however, one drawback to the book: it constantly makes you want to jump straight into your next big project with lots of features, so it is hard to keep reading while being that excited ;P

If you are a software developer – no matter whether you work on small hobby projects or big enterprise products, whether or not you pursue to become a Software Architect – I can only recommend reading this book!

One more thought: If you want to support a free software project, maybe donating books like this is a way to contribute?

Happy Hacking!

Closer Look at the Double Ratchet

In the last blog post, I took a closer look at how the Extended Triple Diffie-Hellman Key Exchange (X3DH) is used in OMEMO and what role PreKeys play. This post is about the other big algorithm that makes up OMEMO: the Double Ratchet.

The Double Ratchet algorithm can be seen as the gearbox of the OMEMO machine. In order to understand the Double Ratchet, we will first have to understand what a ratchet is.

Before we start: This post makes no guarantees to be 100% correct. It is only meant to explain the inner workings of the Double Ratchet algorithm in a (hopefully) more or less understandable way. Many details are simplified or omitted for sake of simplicity. If you want to implement this algorithm, please read the Double Ratchet specification.

A ratchet tool can only turn in one direction, hence it lends the algorithm its name.
Image by Benedikt.Seidl [Public domain]

A ratchet is a tool used to drive nuts and bolts. The distinctive feature of a ratchet tool over an ordinary wrench is that the part that grips the head of the bolt can only turn in one direction; it is not possible to turn it in the opposite direction.

In OMEMO, ratchet functions are one-way functions that basically take an input key and derive a new key from it. Doing it in this direction is easy (like turning the ratchet tool in the right direction), but it is impossible to reverse the process and calculate the original key from the derived key (analogous to turning the ratchet in the opposite direction).

Symmetric Key Ratchet

One type of ratchet is the symmetric key ratchet (abbrev. sk ratchet). It takes a key and some input data and produces a new key, as well as some output data. The new key is derived from the old key by using a so called Key Derivation Function. Repeating the process multiple times creates a Key Derivation Function Chain (KDF-Chain). The fact that it is impossible to reverse a key derivation is what gives the OMEMO protocol the property of Forward Secrecy.

A Key Derivation Function Chain or Symmetric Ratchet

The above image illustrates the process of using a KDF-Chain to generate output keys from input data. In every step, the KDF-Chain takes the input and the current KDF-Key to generate the output key. Then it derives a new KDF-Key from the old one, replacing it in the process.

To summarize once again: Every time the KDF-Chain is used to generate an output key from some input, its KDF-Key is replaced, so if the input is the same in two steps, the output will still be different due to the changed KDF-Key.

One issue of this ratchet is that it does not provide future secrecy. That means once an attacker gets access to one of the KDF-Keys of the chain, they can use that key to derive all following keys in the chain from that point on. They basically just have to turn the ratchet forwards.
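
A single step of such a chain can be sketched with HMAC-SHA256 as the key derivation function (a simplification of what the actual specification does; chainKey is a byte array holding the current KDF-Key):

Mac hmac = Mac.getInstance("HmacSHA256");
hmac.init(new SecretKeySpec(chainKey, "HmacSHA256"));

byte[] messageKey   = hmac.doFinal(new byte[] { 0x01 }); // output key of this step
byte[] nextChainKey = hmac.doFinal(new byte[] { 0x02 }); // replaces the current KDF-Key

// The old chainKey is discarded. Deriving it back from nextChainKey would
// require inverting HMAC, which is what makes the ratchet one-way.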

Diffie-Hellman Ratchet

The second type of ratchet that we have to take a look at is the Diffie-Hellman Ratchet. This ratchet is basically a repeated Diffie-Hellman Key Exchange with changing key pairs. Every user has a separate DH ratcheting key pair, which is being replaced with new keys under certain conditions. Whenever one of the parties sends a message, they include the public part of their current DH ratcheting key pair in the message. Once the recipient receives the message, they extract that public key and do a handshake with it using their private ratcheting key. The resulting shared secret is used to reset their receiving chain (more on that later).

Once the recipient creates a response message, they create a new random ratchet key and do another handshake with their new private key and the senders public key. The result is used to reset the sending chain (again, more on that later).

Principle of the Diffie-Hellman Ratchet.
Image by OpenWhisperSystems (modified by author)

As a result, the DH ratchet is forwarded every time the direction of the message flow changes. The resulting keys are used to reset the sending-/receiving chains. This introduces future secrecy in the protocol.

The Diffie-Hellman Ratchet

Chains

A session between two devices has three chains – a root chain, a sending chain and a receiving chain.

The root chain is a KDF chain which is initialized with the shared secret which was established using the X3DH handshake. Both devices involved in the session have the same root chain. Contrary to the sending and receiving chains, the root chain is only initialized/reset once at the beginning of the session.

The sending chain of the session on device A equals the receiving chain on device B. On the other hand, the receiving chain on device A equals the sending chain on device B. The sending chain is used to generate message keys which are used to encrypt messages. The receiving chain on the other hand generates keys which can decrypt incoming messages.

Whenever the direction of the message flow changes, the sending and receiving chains are reset, meaning their keys are replaced with new keys generated by the root chain.

The full Double Ratchet Algorithm’s ratchet architecture
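
A single DH ratchet step (the half that resets the sending chain) can be sketched with the X25519 API that the JCA provides since Java 11. Here theirRatchetPublicKey and rootKdf are placeholders for the remote party’s current ratchet key and the root chain derivation, not real library calls:

// Generate a fresh ratchet key pair...
KeyPairGenerator kpg = KeyPairGenerator.getInstance("X25519");
KeyPair myNewRatchetKeyPair = kpg.generateKeyPair();

// ...and do a handshake with the other party's current ratchet public key.
KeyAgreement dh = KeyAgreement.getInstance("X25519");
dh.init(myNewRatchetKeyPair.getPrivate());
dh.doPhase(theirRatchetPublicKey, true);
byte[] sharedSecret = dh.generateSecret();

// The shared secret is pumped into the root chain; its output resets the sending chain.
byte[] newSendingChainKey = rootKdf(sharedSecret);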

An Example

I think this rather complex protocol is best explained by an example message flow which demonstrates what actually happens during message sending / receiving etc.

In our example, Obi-Wan and Grievous have a conversation. Obi-Wan starts by establishing a session with Grievous and sends his initial message. Grievous responds by sending two messages back. Unfortunately the first of his replies goes missing.

Session Creation

In order to establish a session with Grievous, Obi-Wan first has to fetch one of Grievous’ key bundles. He uses this to establish a shared secret S between him and Grievous by executing an X3DH key exchange. More details on this can be found in my previous post. He also extracts Grievous’ signed PreKey ratcheting public key. S is used to initialize the root chain.

Obi-Wan now takes Grievous' public ratchet key and does a handshake with his own private ratchet key to generate another shared secret, which is fed into the root chain. The output is used to initialize the sending chain, and the KDF-Key of the root chain is replaced.

Obi-Wan has now established a session with Grievous without even sending a message. Nice!

The session initiator prepares the sending chain.
The initial root key comes from the result of the X3DH handshake.
Original image by OpenWhisperSystems (modified by author)
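Expressed in terms of the earlier sketches, Obi-Wan's side of the session setup could look roughly like this (purely illustrative, all names are mine):

import java.security.PublicKey;

// s is the shared secret from the X3DH handshake,
// grievousRatchetKey is Grievous' signed PreKey acting as his public ratchet key.
SessionState initiateSession(byte[] s, PublicKey grievousRatchetKey) throws Exception {
    SessionState session = new SessionState();
    session.rootChain = new KdfChain(s);                        // root chain from the X3DH secret
    DhRatchet ratchet = new DhRatchet();                        // Obi-Wan's own ratchet key pair
    byte[] dhSecret = ratchet.sendingStep(grievousRatchetKey); // handshake with Grievous' key
    session.resetSendingChain(dhSecret);                        // sending chain is now ready
    return session;
}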

Initial Message

Now the session is established on Obi-Wan's side and he can start composing a message. He decides to send a classy “Hello there!” as a greeting. He uses his sending chain to generate a message key which is used to encrypt the message.

Principle of generating message keys from a KDF-Chain.
In our example only one message key is derived though.
Image by OpenWhisperSystems

Note: In the above image a constant is used as input for the KDF-Chain. This constant is defined by the protocol and isn't important for understanding what's going on.
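In terms of the sketches above, deriving the key for this message is just a single step of the sending chain, with the constant standing in for the protocol-defined input:

// Illustrative fragment: one sending chain step yields the message key.
byte[] constant = new byte[] {0x02};                      // stand-in for the protocol constant
byte[] messageKey = session.sendingChain.step(constant);  // key for "Hello there!"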

Now Obi-Wan sends over the encrypted message, along with his ratcheting public key and some information such as which PreKey he used, the current sending chain index (1), etc.

When Grievous receives Obi-Wan’s message, he completes his X3DH handshake with Obi-Wan in order to calculate the same exact shared secret S as Obi-Wan did earlier. He also uses S to initialize his root chain.

Now Grievous does a full ratchet step of the Diffie-Hellman Ratchet: He uses his private and Obi-Wan's public ratchet key to do a handshake and initializes his receiving chain with the result. Note: The result of the handshake is the exact same value that Obi-Wan calculated earlier when he initialized his sending chain. Fantastic, isn't it? Next he deletes his old ratchet key pair and generates a fresh one. Using the fresh private key, he does another handshake with Obi-Wan's public key and uses the result to initialize his sending chain. This completes the full DH ratchet step.

Full Diffie-Hellman Ratchet Step
Image by OpenWhisperSystems

Decrypting the Message

Now that Grievous has finalized his side of the session, he can go ahead and decrypt Obi-Wan's message. Since the message contains the sending chain index 1, Grievous knows that he has to use the first message key generated from his receiving chain to decrypt the message. Because his receiving chain equals Obi-Wan's sending chain, it will generate the exact same keys, so Grievous can use the first key to successfully decrypt Obi-Wan's message.

Sending a Reply

Grievous is surprised by Obi-Wan's bold actions and promptly sends two replies.

He advances his freshly initialized sending chain to generate a fresh message key (with index 1). He uses the key to encrypt his first message “General Kenobi!” and sends it over to Obi-Wan. He includes his public ratchet key in the message.

Unfortunately though the message goes missing and is never received.

He then forwards his sending chain a second time to generate another message key (index 2). Using that key he encrypts the message “You are a bold one.” and sends it to Obi-Wan. This message contains the same public ratchet key as the first one, but has the sending chain index 2. This time the message is received.

Receiving the Reply

Once Obi-Wan receives the second message, he does a full ratchet step in order to complete his session with Grievous. First he does a DH handshake between his private key and Grievous' public ratcheting key that he got from the message. The result is used to set up his receiving chain. He then generates a new ratchet key pair and does a second handshake. The result is used to reset his sending chain.

Obi-Wan notices that the sending chain index of the received message is 2 instead of 1, so he knows that one message must be missing or delayed. To deal with this problem, he advances his receiving chain twice (meaning he generates two message keys from the receiving chain) and caches the first key. If the missing message arrives later, the cached key can be used to decrypt it. For now only one message has arrived though, so Obi-Wan uses the second generated message key to successfully decrypt it.
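A hypothetical sketch of this bookkeeping, again based on the KdfChain class from earlier: the receiving side advances its chain up to the index announced in the message and caches the keys of the messages it skipped along the way.

import java.util.HashMap;
import java.util.Map;

public class ReceivingChain {

    private final KdfChain chain;    // the session's receiving chain
    private int index = 0;           // index of the last derived message key
    private final Map<Integer, byte[]> skippedKeys = new HashMap<>();

    // Stand-in for the protocol-defined constant fed into the chain.
    private static final byte[] MESSAGE_KEY_SEED = new byte[] {0x02};

    public ReceivingChain(KdfChain chain) {
        this.chain = chain;
    }

    public byte[] keyForMessage(int messageIndex) throws Exception {
        byte[] cached = skippedKeys.remove(messageIndex);
        if (cached != null) {
            return cached; // a delayed message finally arrived
        }
        while (index < messageIndex - 1) {
            index++;
            skippedKeys.put(index, chain.step(MESSAGE_KEY_SEED)); // cache keys of skipped messages
        }
        index++;
        return chain.step(MESSAGE_KEY_SEED); // key for the message that actually arrived
    }
}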

Conclusions

What have we learned from this example?

Firstly, we can see that the protocol guarantees forward secrecy. The KDF-Chains backing the three chains can only be advanced forwards; it is impossible to turn them backwards to generate earlier keys. This means that if an attacker manages to get access to the state of the receiving chain, they cannot decrypt messages sent prior to the moment of attack.

But what about future messages? Since the Diffie-Hellman ratchet introduces new randomness in every step (new random keys are generated), an attacker is locked out after one step of the DH ratchet. Since the DH ratchet is used to reset the symmetric ratchets of the sending and receiving chains, the window of compromise is limited by the next DH ratchet step (meaning once the other party replies, the attacker is locked out again).

On top of this, the double ratchet algorithm can deal with missing or out-of-order messages, as keys generated from the receiving chain can be cached for later use. If at some point Obi-Wan receives the missing message, he can simply use the cached key to decrypt its contents.

This self-healing property is what gave the Axolotl protocol (an earlier name of the Signal protocol, the basis of OMEMO) its name: axolotls are able to regenerate lost body parts.

Acknowledgements

Thanks to syndace and paul for their feedback and clarification on some points.

Shaking Hands With OMEMO: X3DH Key Exchange

This is the first part of a small series about the cryptographic building blocks of OMEMO. This post is about the Extended Triple Diffie Hellman Key Exchange Algorithm (X3DH) which is used to establish a session between OMEMO devices.
Part 2: Closer Look at the Double Ratchet

In the past I have written some posts about OMEMO and its future and how it compares to the Olm encryption protocol used by matrix.org. However, some readers requested a closer, but still straightforward look at how OMEMO and the underlying algorithms work. To get started, we first have to take a look at its past.

OMEMO was implemented in the Android Jabber client Conversations as part of a Google Summer of Code project by Andreas Straub in 2015. The basic idea was to utilize the encryption library used by Signal (formerly TextSecure) for message encryption. OMEMO basically borrows almost all of its cryptographic mechanisms, including the Double Ratchet and X3DH, from Signal's encryption protocol, which is appropriately named the Signal Protocol. So let's look at that first.

The Signal Protocol

The famous and ingenious protocol that drives the encryption behind Signal, OMEMO, matrix.org, WhatsApp and a lot more was created by Trevor Perrin and Moxie Marlinspike in 2013. Basically it consists of two parts that we need to further investigate:

  • The Extended Triple-Diffie-Hellman Key Exchange (X3DH)
  • The Double Ratchet Algorithm

One core principle of the protocol is to get rid of encryption keys as soon as possible. Almost every message is encrypted with another fresh key. This is a huge difference to other protocols like OpenPGP, where the user only has one key which can decrypt all messages ever sent to them. The latter can of course also be seen as an advantage OpenPGP has over OMEMO, but it all depends on the situation the user is in and what they have to protect against.

A major improvement that the Signal Protocol introduced compared to encryption protocols like OTRv3 (Off-The-Record Messaging) was the ability to start a conversation with a chat partner in an asynchronous fashion, meaning that the other end didn't have to be online in order to agree on a shared key. This was not possible with OTRv3, since both parties had to actively send messages in order to establish a session. That was okay back in the day, when people would start their computer with the intention of chatting with other users who were online at the same time, but it's no longer suitable today.

Note: OTRv4, which is currently being worked on, will no longer come with this handicap.

The X3DH Key Exchange

Let’s get to it already!

X3DH is a key agreement protocol, meaning it is used when two parties establish a session in order to agree on a shared secret. For a conversation to be confidential we require that only the sender and the (intended) recipient of a message are able to decrypt it. This is possible when they share a common secret (e.g. a password or shared key). Exchanging this key with one another has long been a chicken-and-egg problem: How do you get the key from one end to the other without an adversary being able to get a copy of it? Well, obviously by encrypting it, but how? How do you get that key to the other side? This problem was only solved after the Second World War.

The solution is a so-called Diffie-Hellman-Merkle key exchange. I don't want to go into too much detail about this, as there are really great resources on how it works available online, but the basic idea is that each party possesses an asymmetric key pair consisting of a public and a private key. The public key can be shared over insecure networks, while the private key must be kept secret. A Diffie-Hellman key exchange (DH) is the process of combining a public key A with a private key b in order to generate a shared secret. The essential trick is that you get the exact same secret if you combine the private key a with the public key B. Wikipedia does a great job at explaining this using an analogy of mixing colors.
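If you want to see this in action: modern Java (11 or newer) ships with X25519 support, so a minimal demonstration that both sides end up with the same secret can look like this (the names are mine, and this is of course not OMEMO code):

import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.util.Arrays;
import javax.crypto.KeyAgreement;

public class DiffieHellmanDemo {

    public static void main(String[] args) throws Exception {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("X25519");
        KeyPair alice = generator.generateKeyPair();
        KeyPair bob = generator.generateKeyPair();

        byte[] aliceSecret = agree(alice, bob.getPublic()); // combine a with B
        byte[] bobSecret = agree(bob, alice.getPublic());   // combine b with A

        System.out.println(Arrays.equals(aliceSecret, bobSecret)); // prints "true"
    }

    static byte[] agree(KeyPair own, PublicKey remote) throws Exception {
        KeyAgreement agreement = KeyAgreement.getInstance("X25519");
        agreement.init(own.getPrivate());
        agreement.doPhase(remote, true);
        return agreement.generateSecret();
    }
}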

Deniability and OTR

In normal day to day messaging you don’t always want to commit to what you said. Especially under oppressive regimes it may be a good idea to be able to deny that you said or wrote something specific. This principle is called deniability.

Note: It is debatable whether cryptographic deniability has ever saved someone from going to jail, but that's out of scope for this blog post.

At the same time you want to be absolutely sure that you are really talking to your chat partner and not to a so-called man-in-the-middle. These desires seem to be conflicting at first, but the OTR protocol featured both. The user has an IdentityKey, which is used to identify the user by means of a fingerprint. The (massively and horribly simplified) procedure of creating an OTR session is as follows: Alice generates a random session key and signs its public part with her IdentityKey. She then sends that public key over to Bob, who generates another random session key with which he executes his half of the DH handshake. He then sends the public part of that key (again, signed) back to Alice, who does another DH to acquire the same shared secret as Bob. As you can see, in order to establish a session, both parties had to be online. Note: The signing part has been oversimplified for the sake of readability.

Normal Diffie-Hellman Key Exchange

From DH to X3DH

Perrin and Marlinspike improved upon this model by introducing the concept of PreKeys. Those basically are the first halves of a DH-handshake, which can – along with some other keys of the user – be uploaded to a server prior to the beginning of a conversation. This way another user can initiate a session by fetching one half-completed handshake and completing it.

Basically the Signal protocol comprises the following set of keys per user:

  • IdentityKey (IK): Acts as the user's identity by providing a stable fingerprint
  • Signed PreKey (SPK): Acts as a PreKey, but carries an additional signature of the IK
  • Set of PreKeys ({OPK}): Unsigned PreKeys

If Alice wants to start chatting, she can fetch Bob's IdentityKey, Signed PreKey and one of his PreKeys and use those to create a session. In order to preserve cryptographic properties, the handshake is modified as follows:

DH1 = DH(IK_A, SPK_B)
DH2 = DH(EK_A, IK_B)
DH3 = DH(EK_A, SPK_B)
DH4 = DH(EK_A, OPK_B)

S = KDF(DH1 || DH2 || DH3 || DH4)

EK_A denotes an ephemeral, random key which is generated by Alice on the fly. From S, Alice can now derive an encryption key to encrypt her first message for Bob. She then sends that message (a so-called PreKeyMessage) over to Bob, along with some additional information like her IdentityKey IK, the public part of the ephemeral key EK_A and the ID of the used PreKey OPK.
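For illustration, here is a hypothetical sketch of Alice's side of this computation, reusing the agree() helper from the Diffie-Hellman demo above. I treat all keys as plain X25519 keys and reduce the KDF to a single SHA-256 over the concatenated secrets; the real protocol derives S via HKDF with additional parameters:

import java.security.KeyPair;
import java.security.MessageDigest;
import java.security.PublicKey;

// ikA/ekA are Alice's identity and ephemeral key pairs,
// ikB/spkB/opkB are Bob's public keys from the fetched bundle.
byte[] x3dhAlice(KeyPair ikA, KeyPair ekA,
                 PublicKey ikB, PublicKey spkB, PublicKey opkB) throws Exception {
    byte[] dh1 = agree(ikA, spkB); // DH1 = DH(IK_A, SPK_B)
    byte[] dh2 = agree(ekA, ikB);  // DH2 = DH(EK_A, IK_B)
    byte[] dh3 = agree(ekA, spkB); // DH3 = DH(EK_A, SPK_B)
    byte[] dh4 = agree(ekA, opkB); // DH4 = DH(EK_A, OPK_B)

    MessageDigest kdf = MessageDigest.getInstance("SHA-256"); // simplified stand-in for the KDF
    kdf.update(dh1);
    kdf.update(dh2);
    kdf.update(dh3);
    kdf.update(dh4);
    return kdf.digest(); // S = KDF(DH1 || DH2 || DH3 || DH4)
}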

Visual representation of the X3DH handshake

Once Bob logs in, he can use this information to do the same calculations (just with swapped public and private keys) to calculate S from which he derives the encryption key. Now he can decrypt the message.
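Bob's side is the mirror image of the sketch above: he combines his private keys with Alice's public keys in the same order, which yields the same four shared secrets and therefore the same S:

// ikB/spkB/opkB are Bob's key pairs, ikA/ekA are Alice's public keys from the PreKeyMessage.
byte[] x3dhBob(KeyPair ikB, KeyPair spkB, KeyPair opkB,
               PublicKey ikA, PublicKey ekA) throws Exception {
    byte[] dh1 = agree(spkB, ikA); // same secret as DH(IK_A, SPK_B)
    byte[] dh2 = agree(ikB, ekA);  // same secret as DH(EK_A, IK_B)
    byte[] dh3 = agree(spkB, ekA); // same secret as DH(EK_A, SPK_B)
    byte[] dh4 = agree(opkB, ekA); // same secret as DH(EK_A, OPK_B)

    MessageDigest kdf = MessageDigest.getInstance("SHA-256");
    kdf.update(dh1);
    kdf.update(dh2);
    kdf.update(dh3);
    kdf.update(dh4);
    return kdf.digest(); // the same S that Alice calculated
}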

In order to prevent the session initiation from failing due to lost messages, all messages that Alice sends over to Bob without receiving a first message back are PreKeyMessages, so that Bob can complete the session, even if only one of the messages sent by Alice makes its way to Bob. The exact details on how OMEMO works after the X3DH key exchange will be discussed in part 2 of this series 🙂

X3DH Key Exchange TL;DR

X3DH utilizes PreKeys to allow session creation with offline users by doing 4 DH handshakes between different keys.

A subtle but important implementation difference between OMEMO and Signal is that the Signal server is able to manage the PreKeys for the user. That way it can make sure that every PreKey is only used once. OMEMO on the other hand solely relies on the XMPP server's PubSub component, which does not support such behavior. Instead, it hands out a bundle of around 100 PreKeys. This seems like a lot, but in reality the chances of a PreKey collision are pretty high (see the birthday problem): with roughly 100 keys to choose from, already around a dozen independently initiated sessions give about a 50% chance that two of them picked the same PreKey.

OMEMO does come with some countermeasures for problems and attacks that arise from this situation, but this makes the protocol a little less appealing than the original Signal protocol.

Clients should, for example, keep used PreKeys around until the catch-up of missed messages has finished, in order to allow decryption of messages that were sent in sessions established using the same PreKey.

A look at Matrix.org’s OLM | MEGOLM encryption protocol

Everyone who knows and uses XMPP is probably aware of a new player in the game. Matrix.org is often recommended as a young, rising alternative to the aging protocol behind the Jabber ecosystem. However, the founders do not see their product as a direct competitor to XMPP, as their approach to the problem of message exchange is quite different.

An open network for secure, decentralized communication.

matrix.org

During his talk at the FOSDEM in Brussels, matrix.org founder Matthew Hodgson roughly compared the concept of matrix to how git works. Instead of passing single messages between devices and servers, matrix is all about synchronization of a shared state. A chat room can be seen as a repository, which is shared between all servers of the participants. As a consequence communication in a chat room can go on, even when the server on which the room was created goes down, as the room simultaneously exists on all the other servers. Once the failed server comes back online, it synchronizes its state with the others and retrieves missed messages.


Olm, Megolm – What’s the deal?

Matrix introduced two different crypto protocols for end-to-end encryption. One is named Olm, which is used in one-to-one chats between two chat partners (this is not quite correct, see the Updates below for clarifying remarks). It can very well be compared to OMEMO, as it too is an adaptation of the Signal Protocol by OpenWhisperSystems. However, due to some differences in the implementation, Olm is not compatible with OMEMO, although it shares the same cryptographic properties.

The other protocol goes by the name of Megolm and is used in group chats. Conceptually it deviates quite a bit from Olm and OMEMO, as it contains some modifications that make it more suitable for the multi-device use-case. However, those modifications alter its cryptographic properties.

Comparing Cryptographic Building Blocks

Protocol                  | Olm                         | OMEMO (Signal)
IdentityKey               | Curve25519                  | X25519
FingerprintKey⁽¹⁾         | Ed25519                     | none
PreKeys                   | Curve25519                  | X25519
SignedPreKeys⁽²⁾          | none                        | X25519
Key Exchange Algorithm⁽³⁾ | Triple Diffie-Hellman (3DH) | Extended Triple Diffie-Hellman (X3DH)
Ratcheting Algorithm      | Double Ratchet              | Double Ratchet
  1. Signal uses an X25519 IdentityKey, which is capable of both encrypting and creating signatures using the XEdDSA signature scheme. Therefore no separate FingerprintKey is needed. Instead the fingerprint is derived from the IdentityKey. This is mostly a cosmetic difference, as one less key pair is required.
  2. Olm does not distinguish between the concepts of signed and unsigned PreKeys like the Signal protocol does. Instead it only uses one type of PreKey. However, those may be signed with the FingerprintKey upon upload to the server.
  3. OMEMO includes the SignedPreKey as well as an unsigned PreKey in the handshake, while Olm only uses one PreKey. As a consequence, if the sender's Olm IdentityKey gets compromised at some point, the very first few messages that are sent could possibly be decrypted.

In the end Olm and OMEMO are pretty comparable, apart from some simplifications made in the Olm protocol. Those only marginally affect its security though (as far as I can tell as a layman).

Megolm

The similarities between OMEMO and Matrix’ encryption solution end when it comes to group chat encryption.

OMEMO does not treat chats with more than two parties any differently than one-to-one chats. The sender simply has to manage a lot more keys, and the number of required trust decisions grows by a factor roughly equal to the number of chat participants.

Yep, this is a mess but luckily XMPP isn’t a very popular chat protocol so there are no large encrypted group chats ;P

So how does Matrix solve the issue?

When a user joins a group chat, they generate a session for that chat. This session consists of an Ed25519 SigningKey and a single ratchet which gets initialized randomly.

The public part of the signing key and the state of the ratchet are then shared with each participant of the group chat. This is done via an encrypted channel (using Olm encryption). Note that this session is also shared between the devices of the user. Contrary to Olm, where every device has its own Olm session, there is only one Megolm session per user per group chat.

Whenever the user sends a message, the encryption key is generated by forwarding the ratchet and deriving a symmetric encryption key for the message from the ratchet's output. Signing is done using the SigningKey.

Recipients of the message can decrypt it by forwarding their copy of the sender's ratchet the same way the sender did, in order to retrieve the same encryption key. The signature is verified using the public SigningKey of the sender.
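To illustrate the idea (and only the idea), here is a heavily simplified, hypothetical sketch of an outbound group session in Java (15 or newer for Ed25519). The real Megolm ratchet is a four-part hash ratchet and the key derivation differs, so this only shows the "forward the ratchet, derive a message key, sign the message" flow:

import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.SecureRandom;
import java.security.Signature;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class OutboundGroupSession {

    private byte[] ratchet = new SecureRandom().generateSeed(32); // randomly initialized ratchet
    private final KeyPair signingKey =
            KeyPairGenerator.getInstance("Ed25519").generateKeyPair(); // per-session SigningKey

    // Forward the ratchet and derive a symmetric message key from its new state.
    public byte[] nextMessageKey() throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(ratchet, "HmacSHA256"));
        ratchet = mac.doFinal(new byte[] {0x00});   // advance the ratchet (forward-only)
        mac.init(new SecretKeySpec(ratchet, "HmacSHA256"));
        return mac.doFinal(new byte[] {0x01});      // derive the message key
    }

    // Messages are signed with the session's Ed25519 SigningKey.
    public byte[] sign(byte[] message) throws Exception {
        Signature signature = Signature.getInstance("Ed25519");
        signature.initSign(signingKey.getPrivate());
        signature.update(message);
        return signature.sign();
    }
}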

There are some pros and cons to this approach, which I briefly want to address.

First of all, you may find that this protocol is way less elegant compared to Olm/OMEMO/Signal. It poses some obvious limitations and security issues. Most importantly, if an attacker gets access to the ratchet state of a user, they can decrypt any message sent from that point in time onwards. Since no new randomness is introduced (as is the case in the other protocols), the attacker can retain access by simply forwarding the ratchet themselves, thereby generating any decryption keys they need. The protocol defends against this by requiring the user to generate a new random session whenever a user joins/leaves the room and/or a certain number of messages has been sent, which limits the window of possibly compromised messages. Still, this is equivalent to having a single key that decrypts multiple messages at once.

The Megolm specification lists a number of other caveats.

On the pro side of things, trust management has been simplified, as the user basically just has to decide whether or not to trust each group member instead of each participating device – reducing the complexity from a multiple of n down to just n. Also, since there is no new randomness being introduced during ratchet forwarding, messages can be decrypted multiple times. As a result, devices do not need to store the decrypted messages. Knowledge of the session state(s) is sufficient to retrieve the message contents over and over again.

By sharing older session states with one's own devices it is also possible to read older messages on new devices. This is a feature that many users badly miss in OMEMO.

On the other hand, if you really need true future secrecy on a message-by-message basis and you cannot risk an attacker getting access to more than one message at a time, you are probably better off taking the bitter pill, going through the fingerprint mess, and sticking to normal Olm/OMEMO (see the Updates below for remarks on this statement).

Note: End-to-end encryption does not really make sense in big, especially public chat rooms, since an attacker could just simply join the room in order to get access to ongoing communication. Thanks to Florian Schmaus for pointing that out.

I hope I could give a good overview of the different encryption mechanisms in XMPP and Matrix. Hopefully I did not make any errors, but if you find mistakes, please let me know, so I can correct them asap 🙂

Happy Hacking!


Updates:

Thanks to Matthew Hodgson for pointing out that Olm/OMEMO also effectively use a symmetric ratchet when multiple consecutive messages are sent without the receiving device sending an answer. This can lead to a loss of future secrecy, as discussed in the OMEMO protocol audit.

Also thanks to Hubert Chathi for noting that Megolm is also used in one-to-one chats, as Matrix doesn't have the same distinction between group and single chats. He also pointed out that the security level of Megolm (the criteria for regenerating the session) can be configured on a per-chat basis.