Stream of Consciousness (SOC) #2018-05-17


There are three experience-related problems with fully peer-to-peer approaches (e.g., scuttlebutt, etc.):

  1. Discovery/availability: They rely on discovery mechanisms that are not as performant as centralised discovery. Availability is not guaranteed as nodes exist on sometimes-connected clients.

  2. Greedy replication: They’re either based on data types that require a complete merge/feed or attempt (in the background) to replicate as much of the full dataset on every node as possible. This leads to long (especially initial) sync times, and can lead to huge amounts of bandwidth usage (My Scuttlebutt client was transferring gigabytes of data within the first few hours.)

  3. Ignoring social aspects: e.g., Lack of ability to delete posts. While tombstone-based systems are easier to replicate, they do lead to social issues. When you share something on Scuttlebutt, for example, that’s it. You cannot undo.

A fourth issue (related to #3) with some systems (e.g., DAT, IPFS), is that their problem domain is more about protecting against censorship (a worthy and important goal) than ensuring individual sovereignty / protecting privacy.

How Indie Site will attempt to address these issues:

  1. Discovery/availabiltiy: Of course, we don’t want centralised discovery in a peer to peer system so a core tenet of Indie is that everyone should have their own highly available and easily findable node on the Internet. Let’s call it an Always On Node (AON). Within the context of the current web, you can think of it as a web site. And, indeed, it will fulfil the role of a personal web site. As much as possible, this node should not be privileged over other nodes in its capabilities beyond having the two traits mentioned above. Addressability via domain name makes it findable within the mainstream Internet and third-party hosting makes it highly available. I see this as a necessary bridge between the client/server (centralised) web and a fully peer-to-peer topology in the future.

  2. Lazy/partial replication: When you visit a web site, you do not download all of the data from that web site. The experience of the Web would be markedly worse if this was the case. Instead, you only download the content you’re interested in at that time. P2P systems should be the same. The difference being that every node is a web server. (This is possible to implement even between two web clients using WebRTC.) The always-on node (AON) would be the fallback source of truth and, by its highly available and always findable nature, will most likely – but not necessarily – contain the most information about the system at any one time (especially when we factor in server-to-server federation). We should embrace the fact that different nodes will most likely rarely if ever converge to contain a complete copy of all of the data but that any node can diverge on the same view of the data for whatever transaction it is currently involved in (e.g., viewing the latest posts or a private conversation).

  3. Focus on social aspects: We must be aware of and sensitive to the social aspects of the system we’re designing. Eugen is a shining example of this with his approach to Mastodon. When it comes to issues like “delete”, for example, while we cannot violate the laws of physics (what is made public cannot provably be made private again), we can be as sensitive as possible in how we design and expose such functionality.

My research over the last week into CRDTs (conflict-free/commutative data types) as well as Directed Acyclic Graph / Merkle Tree-based data types, etc., and some of the latest turn-key replicated data type solutions out there (like Automerge) as well as the more Web-centric/offline-first approaches to solve similar problems (e.g., Logux) lead me to believe that we need our own data types and protocols for Indie Site as they, more than anything else, will determine the “shape” of this platform and what it can do (and whether it will be fit for purpose both today as a federated personal web site system and, possibly, tomorrow as a bridge to an entirely p2p system).

Some initial quick thoughts on the core principles of our data type(s)/replication strategy, authorisation, etc:

  • All nodes do not have to have the same data (messages) but, given the same data, will converge on the same ordering (commute) and views that are rendered on the same subset of the data should match.

  • Views are never exchanged, only messages (the loading of a server-side rendered web view might be seen as an exception to this as the view is reduced from the data on the server-side node and communicated to the client but it is really, if anything, an experience-based optimisation on the first render before the view is cached by the service worker. After which, every further render doesn’t violate this principle.)

  • Conflicts cannot happen (we use conflict-free data types)

  • Support paging and streaming

  • Work offline

  • All messages are signed

  • Messages have partial order and refer to the hash of their parent (messages may only have a single parent). A special root node marks the start of each tree.

  • Messages contain their own list of authorised appenders (via their public keys)

  • Access control is managed via end-to-end encryption of the messages.

  • Messages are both URL-addressable (via the AON’s domain) and content addressable by their hash.

Some implementation thoughts:

  • The persistence/replication interface for the view layer should be entirely transparent and expose reactive model objects. For the web client, initial views should be server-side rendered. (See Nuxt/VueJS)

  • Messages received that reference parent messages that have not been received are kept in a pool (e.g, see the WOOT implementation). Owners may or may not be alerted to the presence of these messages depending on use case.

  • Domain-specific (non-replicated) objects should be accessible via API (e.g., posts for embedding in other sites). These collections should be considered rendered views of the original messages and should not otherwise be exchanged between nodes.

  • Regarding the conflict-free/commutative structure of the data type(s), there is much to be liked about the approaches of Logoot/LSEQ (see LSEQTree) and Casual Trees.

Note: some of these thoughts should probably be cleaned up and transferred to at some point.