Async Data Discovery and Interchange (ADDI), but first some background...
This is the first in a series of blog posts chronicling the creation of a new protocol for linking and exchanging data across the digital world. That protocol is called ADDI - Async Data Discovery and Interchange.
Despite many great advances in data interchange protocols, our data is still stored in silos. They may be semantic silos, silos with good APIs, or silos in the cloud... but they're still silos. At least, they are by my definition: a silo is any system whose boundaries are not transparent to the end user. OK, so they're silos. So what? They work now, don't they?
That's just it. They work now. The amount of data available to us keeps growing, and so does the amount of processing organizations need to do to distill knowledge from it. With '* As A Service' models, our data flows are also becoming more interconnected. Yet many of us are still using the client-server, request-response model.
Some enlightened souls are doing REST; more are doing 'RESTful', whatever that is (for more, see a discussion of the Richardson maturity model of REST systems and why levels 1 and 2 aren't really REST, and Roy Fielding on why REST APIs must be hypertext-driven). According to Roy Fielding's thesis, which defined REST, REST is not tied to HTTP. In practice, though, just about everybody implements it that way. Well, maybe not me. I had to be different and do REST + SPARQL/SPARUL over XMPP (yes, Virginia, it IS more than a chat protocol), but that is the subject of a future blog post. HTTP is great if you are doing synchronous data exchanges.
People have invented all sorts of techniques to approximate async over HTTP (long-polling, HTTP streaming, WebSockets), but none of them is truly async. By that I mean they all share two traits:
- The protocol is connection-oriented
- The protocol simulates async communication between two parties.
To me async data exchange requires at least the following traits:
- The protocol is not connection-oriented, or at least does not require a connection (i.e., the UDP test: can you implement the protocol over UDP without jumping through hoops to re-invent TCP on top of it?).
- The protocol can communicate with N parties, whatever the value of N (setting aside implementation limitations; i.e., no, I have NOT invented a way to communicate with 1 gazillion nodes simultaneously).
- The protocol uses an architectural style derived exclusively from EBI (event-based integration) or C2, mostly as described by Fielding, and ideally also derived from the Actor model.
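To make the UDP test a little more concrete, here is a minimal Python sketch of the first two traits (Python only for brevity; it is not the planned Erlang reference implementation). One unconnected UDP socket sends a self-contained event datagram to N peers with no handshake or session. The helper names (`make_receivers`, `broadcast_event`, `receive_event`), the JSON message shape, and the `=example/+email` address are all made up for illustration; none of this is ADDI wire format.

```python
import json
import socket


def make_receivers(n):
    """Bind n UDP sockets on localhost; port 0 lets the OS pick free ports."""
    receivers = []
    for _ in range(n):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(("127.0.0.1", 0))
        s.settimeout(2.0)  # don't block forever if a datagram is lost
        receivers.append(s)
    return receivers


def broadcast_event(sender, peers, event):
    """Send one self-contained event datagram to every peer address.

    No connection is opened: each sendto() is independent and the sender
    does not wait for replies. UDP gives no delivery guarantee, so any
    acknowledgement or retry would have to be an explicit protocol message
    rather than hidden connection state.
    """
    payload = json.dumps(event).encode("utf-8")
    for addr in peers:
        sender.sendto(payload, addr)


def receive_event(sock):
    """Read one datagram and decode it back into a dict."""
    data, _addr = sock.recvfrom(65535)
    return json.loads(data.decode("utf-8"))


if __name__ == "__main__":
    receivers = make_receivers(3)
    peers = [r.getsockname() for r in receivers]
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    broadcast_event(sender, peers, {"from": "node-a", "event": "changed",
                                    "address": "=example/+email"})
    for r in receivers:
        print(receive_event(r))
    sender.close()
    for r in receivers:
        r.close()
```

Because every message carries everything the receiver needs, adding a fourth or a thousandth peer only means another `sendto()`; nothing about the sender's state grows per connection.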
Now the REST proponents (I'm one, by the way, IF the constraints warrant it) might be saying "Pfft... EBI is dead, Roy said so." Well, he makes a good case against EBI, but things have changed since 2008. Our systems are more networked, languages using the Actor model routinely handle millions of concurrent requests, and the benefits of graph-based models have been demonstrated. In addition, ADDI data exchanges would be governed by link contracts, giving fine-grained control over what is shared, what isn't, and the rules that trigger events.
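As a rough illustration of what a link contract governing such exchanges might hold, here is a toy Python model. It is only a sketch under my own assumptions: the `LinkContract` and `Graph` classes, the slash-separated addresses, and the prefix-based permission and trigger rules are hypothetical, not XDI's link contract syntax and not a settled ADDI design.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


def covers(prefix: str, address: str) -> bool:
    """True if `address` is `prefix` itself or lies beneath it segment-wise."""
    return address == prefix or address.startswith(prefix + "/")


@dataclass
class LinkContract:
    """Hypothetical link contract granted to one party (the grantee)."""
    grantee: str
    readable: List[str]  # address prefixes the grantee may read
    # (address prefix, handler) pairs; a handler receives (address, new_value)
    triggers: List[Tuple[str, Callable[[str, object], None]]] = field(default_factory=list)

    def permits(self, address: str) -> bool:
        return any(covers(p, address) for p in self.readable)


class Graph:
    """Toy graph: a flat mapping from address strings to values, plus contracts."""

    def __init__(self) -> None:
        self.values: Dict[str, object] = {}
        self.contracts: List[LinkContract] = []

    def set(self, address: str, value: object) -> None:
        self.values[address] = value
        # Fire a trigger only if the contract also permits reading the address,
        # so events never leak data the grantee could not read directly.
        for contract in self.contracts:
            if not contract.permits(address):
                continue
            for prefix, handler in contract.triggers:
                if covers(prefix, address):
                    handler(address, value)

    def read(self, grantee: str, address: str) -> object:
        if any(c.grantee == grantee and c.permits(address) for c in self.contracts):
            return self.values[address]
        raise PermissionError(f"{grantee} may not read {address}")


if __name__ == "__main__":
    inbox = []
    g = Graph()
    g.contracts.append(LinkContract(
        grantee="=bob",
        readable=["=alice/+email"],
        triggers=[("=alice/+email", lambda addr, val: inbox.append((addr, val)))],
    ))
    g.set("=alice/+email", "alice@example.org")  # shared with =bob and triggered
    g.set("=alice/+phone", "555-0100")           # not shared, so no event
    print(inbox)  # [('=alice/+email', 'alice@example.org')]
```

Tying triggers to read permission is one design choice among several; the point is only that "what is shared" and "what fires an event" can live in the same contract.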
The kind reader (i.e., you): OK, so when are you going to explain what ADDI is?
I'm almost there. The protocol that I think will come closest to these goals is one still in development, called XRI Data Interchange (XDI). XDI is being developed by the OASIS XDI Technical Committee. XDI has a lot of benefits, but it is still designed to be mostly request-response (though Phil Windley is adding some evented support). It is also designed to align closely with both REST and HTTP (other protocols will hopefully be possible), yet it is not REST by Fielding's definition. Instead, XDI attempts to get as close to REST as it can while still achieving its other goals. XDI has done some great work in creating a graph-based model and the access control mechanisms to secure it, which it calls link contracts. I've been on the XDI TC since it was founded in January 2004, nine years ago. I've tried to support both async and sync design goals in XDI, and I have now realized it would be better to support async in a different protocol and let XDI continue to support synchronous-style interchange, which it will do well.
There's also OData. In some ways OData is very similar to XDI, but it is tied to HTTP, has no link contracts or XRI addressing, and is not bidirectional. OData is also a Microsoft creation, and while I think Microsoft has been changing greatly for the better recently, I have to be honest: I think that's a concern.
Async Data Discovery and Interchange (ADDI), for real this time...
This brings us to ADDI. ADDI is intended to co-exist with XDI once XDI is released. Where XDI primarily targets REST- and HTTP-style (i.e., synchronous) systems, ADDI targets systems designed to be asynchronous. ADDI will leverage several concepts developed by XDI, including link contracts, some of the graph model concepts, and parts of the data dictionary. A spec separate from ADDI will describe how ADDI and XDI can exchange data and how they can be used together. ADDI will not reuse all of XDI's design concepts, because those are geared towards XDI's design constraints, some of which differ from ADDI's.
The goals of ADDI
Updates: These goals are still evolving. As they change, I will update them and note the updates here.
- must not be connection-oriented
- must have an architectural lineage derived from EBI as well as from the Actor model (and document it like Roy Fielding did for REST)
- must have a reference implementation in Erlang
- must be capable of supporting millions of concurrent data interchanges
- must model interchange as a bidirectional state exchange between two graphs or sub-graphs
- must enable addressing any data in its graph model with an XRI (and, by extension, a URI)
- must enable access control of every addressable piece of data
- must initially be developed and implemented by me, and then opened up for collaboration, possibly in a new OASIS TC
- must enable my vision of the datasea - the entire digital world interlinked securely and asynchronously into one vast ocean of data (think datapool++) that is made available to every living human - young or old, rich or broke, first-world or third.
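To give a feel for the bidirectional-state-exchange and access-control goals together, here is a minimal Python sketch in which each side shares only the addresses it has granted and sends only what the other side is missing. It models the semantics only, not the asynchronous transport; the flat address-to-value `GraphState`, the `exchange` function, the example addresses, and the choice to reject conflicting values are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

# Toy model of a (sub-)graph: a flat mapping from address strings to values.
GraphState = Dict[str, object]


def exportable(graph: GraphState, allowed: Set[str]) -> GraphState:
    """The part of `graph` its owner has granted to the other party."""
    return {addr: val for addr, val in graph.items() if addr in allowed}


def delta(ours: GraphState, theirs: GraphState) -> GraphState:
    """Entries in `ours` that `theirs` is missing or holds a different value for."""
    return {addr: val for addr, val in ours.items() if theirs.get(addr) != val}


def exchange(a: GraphState, a_allows: Set[str],
             b: GraphState, b_allows: Set[str]) -> Tuple[GraphState, GraphState]:
    """One round of bidirectional state exchange between graphs `a` and `b`.

    Both deltas are computed before either graph is updated, so the result
    does not depend on which direction is applied first.
    """
    a_to_b = delta(exportable(a, a_allows), b)
    b_to_a = delta(exportable(b, b_allows), a)
    conflicts = a_to_b.keys() & b_to_a.keys()
    if conflicts:
        # Conflict resolution is out of scope for this sketch.
        raise ValueError(f"conflicting values for {sorted(conflicts)}")
    b.update(a_to_b)
    a.update(b_to_a)
    return a_to_b, b_to_a


if __name__ == "__main__":
    alice = {"=alice/+email": "alice@example.org", "=alice/+phone": "555-0100"}
    bob = {"=bob/+email": "bob@example.org"}
    sent, received = exchange(alice, {"=alice/+email"}, bob, {"=bob/+email"})
    print(sent)      # {'=alice/+email': 'alice@example.org'}
    print(received)  # {'=bob/+email': 'bob@example.org'}
    print("=alice/+phone" in bob)  # False: never granted, never sent
```

Running a second round immediately afterwards sends nothing in either direction, which is the property that lets exchanges be repeated asynchronously without re-sending unchanged state.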
