A bit over a year ago I started working on smack-omemo as part of my bachelor thesis. Looking back at the past year, I can say there could have hardly been a better topic for my thesis. Working with Smack brought me deep into the XMPP world, got me in contact with a lot of cool people and taught me a lot. Especially the past Google Summer of Code improved my skills substantially. During said event, I took a break from working on smack-omemo, while focussing on a Jingle implementation instead. After the 3 months were over, I dedicated my time to smack-omemo again and realized, that there were some points that needed improvements.
One major issue was, that my “OmemoStore” class, which is responsible for storing keys, sessions, etc. was not having access to the users data before the user logged in. The reason for that is, that my implementation allows multiple OMEMO instances to be running on the same connection. That requires the OmemoStore to store keys for multiple instances (devices), which I distinguished based on the Jid and deviceId of the user. The problem here is, that the Jid is unknown before the user logged in (they might use a burner jid for example, or use an authentication system with username and password which differ from the jid).
While this is an edgecase, it lead to issues. I implemented a workaround for that problem (using the username instead of BareJid in case the connection is not authenticated), which caused numerous problems.
I thought about replacing the Jid as an identifier with something else, but nothing was suitable, so I started a major rework of the implementation as a whole. One important aspect I wanted to preserve is that smack-omemo should still be somewhat usable even when the connection is not authenticated (ie. the user should still be able to scan qr codes and make trust decisions).
The result of my work (so far) is a diff of “+6,300 −5,361”, and a modified API (sorry to all those who already use smack-omemo :O). One major change is, that the OmemoStore no longer stores trust decisions. Instead those decisions are now made by the client itself, who must implement a OmemoTrustCallback. That way trust decisions can be made while the OmemoManager is offline. Everything else what remained in the OmemoStore is only needed when the connection is authenticated and messages are received.
Furthermore I got rid of the OmemoSession class. Session handling is done in libsignal already, so why would I want to have a session related class as well (especially since libsignal doesn’t give you any feedback about what happens with the session, so you have to keep sessions in sync manually)? I recommend everyone who wants to implement OMEMO themselves not to create a “OmemoSession” class and instead rely on libsignals session management.
OMEMO sessions are somewhat brittle. You can never know, whether a recipient received your message, or if it failed to decrypt for some reason. There is no signalling to provide feedback about the sessions state. Because of the fact that even message encryption can go wrong, the old API was very ugly. Originally I first checked, whether there are devices which still need a trust decision to be made and threw an exception if that was the case. Then I tried to build sessions for devices without session and threw an exception when session negotiation failed. Then I tried to encrypt the message for all recipients and threw an exception if something went wrong… Oh and the exception I threw when sessions could not be negotiated contained a list of all devices with intact sessions, so the user could retry to encrypt the message, only for all devices which had a session.
The new API is much cleaner. I still throw an exception when there are undecided devices, but otherwise I always return an OmemoMessage object. That object has a map of OmemoDevices for which message encryption failed, alongside the respective exceptions, so the client can check if and what went wrong.
Also sessions are now “completed” whenever a preKeyMessage arrives.
Prior to this change it could happen, that two senders chose the same PreKey from a bundle in order to create a session. That could cause on of both session to break which lead to message loss. Now whenever smack-omemo receives a preKeyMessage, it instantly responds with an empty message to make the session stable.
This was proposed by Philipp Hörist.
Other changes include a new OmemoStore implementation, the CachingOmemoStore, which can either wrap other OmemoStores to provide a caching layer, or can be used standalone as an ephemeral store for testing purposes.
Also the integration tests were improved and are much simpler and more readable now.
All in all the code got much cleaner now and I hope that at some point it will be audited to find all the bugs I oversaw 😀 (everyone who wants to take a look for themselves, the code can currently be found at Smacks Repository. I’m always thankful for any types of feedback)
I hope this changes will make it to Smack 4.2.3, even though here are still some things I have to do, but all in all I’m already pretty satisfied with how smack-omemo turned out so far.