Getting on the syncing ship

For a couple of years now we’ve been interested in trying this Firefox sync thing. It always sounded like a pretty nice idea, and being able to host your own storage server that holds all your data sounded very nice indeed. Unfortunately the same cannot be said about the accounts server that is also needed to get sync working: while we can keep control of our data just fine, the metadata would always live on Mozilla servers in the Mozilla accounts system. This obviously would not do.

Fortunately Mozilla publishes their code for the accounts server complex in their fxa repository. It’s pretty easy to clone this repository to look at what the accounts server does (although it’s still not obvious, the accounts server being a massive pile of javascript), and the repository even contains a number of dockerfiles for self-hosters. Awesome! Right?

Yeah, not so much. Mozilla, as we’ve learned, does not have any resources to implement features for a self-hosted setup. While this may not sound bad at first, consider that fxa is built to handle every Firefox on the planet. It’s made to be very scalable, and to this end integrates pretty tightly with both Google and Amazon cloud services. Obviously neither of those is necessary for development of fxa because emulators for all those cloud services are available, but fxa very much assumes that if it is deployed “properly”, it’ll be deployed in a cloud.

This is obviously not great, and a weak point of the sole remaining self-hosting guide we know of. There have been other guides in the past, all using heaps of docker containers (and this one is no different), but all except this one have faded or gone away. And even using this guide has its problems: fxa just remains a huge thing, requiring at least 2GB of memory to run at all. It won’t even run “properly” by some definition of the word since using cloud emulators in production setups is not expected and only possible in test mode.

This obviously would not do.

if you don’t like the $X, make your own $X!

After having spent a couple of days trying to figure out how all this sync stuff goes together and reading through the (actually pretty good) documentation of the protocols we decided to just make our own. It’s just a REST API that manages very little data, the cryptography is fully specified, we don’t even have to implement most of the endpoints. It should be easy, right? Right?

Right.

insert rant about how mozilla violates their own specs a lot

Getting something to demonstrate that going down this badger den is feasible wasn’t all that hard. The javascript auth client from fxa is sufficiently disconnected from the rest of fxa that we could just use it with no modification, and the cryptography used is well specified. Creating accounts and having Firefox log in wasn’t hard to achieve. Pretty soon there was support for sending tabs between Firefox instances. It should’ve been easy to add the rest.

And then the fun started.

As it turns out we have an Android phone, and wanted to connect Fenix (the Android version of Firefox) to sync as well. But it couldn’t connect to sync. At all. We double- and triple-checked that the webchannel protocol was correctly implemented, and it was! Desktop Firefox worked just fine, but Fenix could not even query the current account status. Something was wrong.

After some questioning and searching we found out that what was wrong is that Fenix still has a bug that was allegedly fixed, but either the fix was incomplete from day one or it was broken again in the interim. It’s hard to tell and doesn’t much matter, because now we had to figure out how to interact with the login system at all when the only documented way to do so did not work.

Luckily that same bug report, and the bugfix pull request it linked to, provided a clue: the protocol can be implemented, since there’s an extension that handles the protocol and translates it into a different protocol that Fenix actually uses internally. And as it turns out it is possible to send and receive the necessary messages within that extension. Problem solved, right?

Not really, since both sending and receiving require debug access to the protocol translator extension. This is of course achievable with adb and a desktop Firefox instance, but it makes for a very bad user experience to require two debuggers to log in to your sync account on a phone.

Naturally that is how it works to this day because the bug remains unfixed (and possibly ignored).

UPDATE (7.8.2022): the above is not a truthful representation of things. The bug has indeed been fixed with the pull request linked in the issue. The problem we had lies somewhere else entirely. Fenix does not seem to have an fxa-over-http mode like desktop Firefox has and also seems to reject fxa over HTTPS if the certificate is not permanently trusted. When run “properly”, with a certificate signed by a trusted authority, it still replies to every account management request with an error. After that error it replies with the expected response, which totally breaks an application that treats an error as fatal on Fenix because it is also fatal on Firefox. As it turns out all that’s necessary is to ignore the error, and things just work as expected.
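
The "ignore the error" workaround can be sketched roughly as follows. This is a minimal illustration, not the actual code: the reply shape (dicts with `messageId`, `error` and `data` keys) and the function name are assumptions made for the example.

```python
from typing import Iterable, Optional


def first_non_error_reply(replies: Iterable[dict], message_id: str) -> Optional[dict]:
    """Return the first reply to message_id that is not an error, or None."""
    for reply in replies:
        if reply.get("messageId") != message_id:
            # Reply belongs to some other request.
            continue
        if "error" in reply:
            # Assumption: an error may precede the real reply, so it is
            # skipped instead of being treated as fatal.
            continue
        return reply
    return None
```

The obvious downside is that a genuine error now looks the same as no reply at all, so a real client would probably want a timeout on top of this.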

so login works now!

Except it didn’t. Sending the login messages worked, but login itself still failed. Instead of logging in properly Fenix just crashed. Digging through the debug logs of the phone for a bit we found out that Fenix absolutely requires a certain field in a certain fxa API response to be set, even though this field is marked as optional in the API specification. If the field is not present, Fenix crashes with a null reference exception. So we just send some placeholder data with no actual information content to make Fenix not crash now.
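
As a rough sketch, the placeholder trick amounts to something like the following. The field name `some_optional_field` and the placeholder value are hypothetical stand-ins; the actual field and response are not named here.

```python
def fill_optional_field(response: dict, field: str, placeholder) -> dict:
    """Return a copy of response in which field is always present."""
    patched = dict(response)
    if patched.get(field) is None:
        # The spec allows the field to be absent, but the client is assumed
        # to crash without it, so a dummy value is sent instead.
        patched[field] = placeholder
    return patched


# Hypothetical usage: the key and value are made up for illustration.
reply = fill_optional_field({"uid": "abc"}, "some_optional_field", "")
```

A value that is already set is left untouched, so the placeholder only appears where the response would otherwise omit the field.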

pushing through

With logins now working (albeit very painfully) we could finally test tab sending from Firefox to Fenix. Which worked fine, but didn’t immediately pop up a notification on the phone in the way it did when sending to a desktop browser. Receiving the notification (and thus tab data) required opening the tab menu on the phone, which should not actually be necessary.

As it turns out, Fenix does not support push notifications for tab sending, at least when run from an ungoogled phone. Technically it should support this, but after more digging through the logs we found out that it simply doesn’t register itself for push notifications because it’s missing some Google cloud API config, presumably because that will only be available with an active Google account on the phone. No idea. At that point we just stopped asking and accepted that push notifications will not work. (Which naturally also includes all other push notifications fxa sends, such as “display name has changed” or “your device has been removed from the account”.)

at least it’s not an NFT

After having figured out how to do stuff that works fully within the database of the account server, the natural next step is to attach the storage server and see what happens there. And, naturally, nothing worked. And, naturally, it was because Mozilla does not implement exactly what they specified.

One of the great features of Firefox sync in particular is that Firefox encrypts all encryptable data before it leaves your computer. All data put into the sync storage server is encrypted with a key only your browsers know, and command payloads (like the URL of the tab you just sent to another device) are encrypted with a key only the target browser knows. It’s actually quite good. There’s even a specification on how to derive key material for services, and one would expect that the specified procedure would also be used for sync itself. Right?

… Right.

As it turns out, the procedure for deriving sync keys is very different from the procedure for deriving all other keys. But that, too, is a problem that can be solved with sufficient amounts of special-casing.
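
To give a rough idea of what such special-casing might look like, here is a minimal sketch built on HKDF (RFC 5869). Everything specific in it is an assumption for illustration: the scope name, the info strings, the use of a salt and the key lengths are placeholders, not the values from the specifications.

```python
import hashlib
import hmac

# Hypothetical labels; the real scope and info strings differ.
SYNC_SCOPE = "example:sync"
SYNC_INFO = b"example/sync-key"
SCOPED_INFO_PREFIX = b"example/scoped-key\n"


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256, using only the standard library."""
    # Extract step: an empty salt is replaced by a block of zero bytes.
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    # Expand step: chain HMAC blocks until enough output is available.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_key(scope: str, master_key: bytes, salt: bytes = b"") -> bytes:
    """Derive key material for a scope, with a separate path for sync."""
    if scope == SYNC_SCOPE:
        # Special case (assumed shape): fixed info string, no salt, longer output.
        return hkdf_sha256(master_key, b"", SYNC_INFO, 64)
    # Generic path (assumed shape): per-scope info string and salt.
    return hkdf_sha256(master_key, salt, SCOPED_INFO_PREFIX + scope.encode(), 32)
```

The point is only the branch on the scope: every service goes through one derivation path except sync, which gets its own.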

… and other problems

Needless to say, it did not end there. Instead there are a surprisingly large number of other bugs and problems. Some because Mozilla does not follow their own specifications, some probably due to configurations Mozilla does not think are possible, some because they seem to not implement other specifications correctly either (relying on their own servers behaving as they expect instead). It’s all reasonable from the position of Mozilla, but it does not shine the best light on this part of the ecosystem.

it lives!

But after all this trouble we now have a small API server that implements just enough of fxa to make sync possible. It does not implement any payment processing for VPN services nobody asked for. It can’t integrate with third-party to-do list applications. It just syncs. And nothing more. If you’re interested and want to try self-hosting a complete sync stack you can check out the code.

But beware: this is unsupported software. We can’t provide support for this if something breaks; use it at your own risk. It may not even work for you if the one thing you need for your sync usage hasn’t made it into the code yet. It hasn’t been tested on anything older than Firefox 101 and almost certainly won’t render the UI correctly on anything older than 98. Emailed patches may or may not be accepted.

Best of luck!