I Want Decentralized Version Control for Structured Data!
How do I synchronize application data between devices and people? That's the question I was unable to answer when thinking about ideas for new applications I could develop. I wanted the synchronization to be:
- decentralized: All participants have a complete copy of the data set
- offline-first: You can work offline for as long as you want and then resync when you're online again
- reliable: Conflicts are handled properly and there is no random data loss [1]
- private: All data is end-to-end-encrypted, if desired
- efficient: Only changes to the data set are transmitted between participants
- collaborative: Multiple people can work on the same data set
Many applications nowadays go the easy SaaS route: there is one central database behind a web service, and every frontend only displays an instantaneous view of some part of the data set. This is reasonably efficient and collaborative, but it breaks all the other requirements. Other protocols like CalDAV/CardDAV seem very ad hoc and unprincipled, and they work properly only if one is online all the time, so that conflicts never arise.
On the other hand, there are great decentralized version control systems like Pijul, Git, Fossil, and many more that check every requirement I have, but they are built to work with textual data and are therefore unsuited as the database backend of a graphical application.
I basically want a DVCS that doesn't operate on text files, but on a proper data model like relational algebra or algebraic datatypes. I started searching but couldn't find anything like it, so I decided to build my own. What inspired me to do this, and what made me feel like I could even accomplish this task, was the DVCS Pijul. Pijul takes a new perspective on patches, borrowed from category theory [2]. This results in a beautiful model of how history and conflicts are represented and handled. This insight is going to be the foundation of my explorations in this area.
The first thing I did was model such a system for relational algebra, because that is a nice and simple way to model data. I was surprised by how quickly I came up with a data model for patches and relations based on Pijul's patch theory, but that is a story for another blog post ;)
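To make the idea a bit more tangible in the meantime, here is a minimal Python sketch of what a patch on a relation could look like. It is not the Pijul-based model mentioned above, just an illustration under simplifying assumptions: a relation is a plain set of rows, a patch only deletes and inserts whole rows, and the names `Patch`, `apply`, and `commute` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """Hypothetical patch: whole rows to delete and whole rows to insert."""
    deletes: frozenset = frozenset()
    inserts: frozenset = frozenset()


def apply(relation: frozenset, patch: Patch) -> frozenset:
    """Return a new relation with the patch applied; the input is not modified."""
    missing = patch.deletes - relation
    if missing:
        # Deleting a row that is not there is treated as a conflict here.
        raise ValueError(f"cannot delete absent rows: {sorted(missing)}")
    return (relation - patch.deletes) | patch.inserts


def commute(p: Patch, q: Patch) -> bool:
    """Assumed independence rule: the patches touch disjoint sets of rows."""
    return (p.deletes | p.inserts).isdisjoint(q.deletes | q.inserts)


# Two participants edit the same contact list while offline.
contacts = frozenset({("alice", "alice@example.org"), ("bob", "bob@example.org")})
add_carol = Patch(inserts=frozenset({("carol", "carol@example.org")}))
drop_bob = Patch(deletes=frozenset({("bob", "bob@example.org")}))

# Independent patches can be applied in either order with the same result.
one_order = apply(apply(contacts, add_carol), drop_bob)
other_order = apply(apply(contacts, drop_bob), add_carol)
```

Because the two patches touch different rows, the order in which they arrive does not matter, which is the property that makes offline-first synchronization workable. Two patches that touch the same row would need real conflict handling, which this sketch deliberately leaves out.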
[1] I'm looking at you, CalDAV 😠
[2] Relevant paper; this blog series does a great job of explaining how Pijul works on a practical level.