2/13/2013

Announcing ADDI

Async Data Discovery and Interchange (ADDI), but first some background...

This is the first in a series of blog posts chronicling the creation of a new protocol for linking and exchanging data across the digital world. That protocol is called ADDI - Async Data Discovery and Interchange.

Despite many great advances in data interchange protocols, our data is still stored in silos. They may be semantic silos, silos with good APIs, or silos in the cloud... but they're still silos. Well, that's if you go by what I mean by a silo. For me a silo is any system whose boundaries are not transparent to the end user. OK, they're silos... so what? They work now, right?

That's just it. They work now. The amount of data available to us is ever-increasing. So is the amount of processing that organizations need to do in order to distill knowledge from that data. And with '* As A Service' models our data flows are becoming more connected. Yet many of us are still using the client-server request-response model.

Some enlightened souls are doing REST, more are doing 'RESTful', whatever that is (see a discussion of the Richardson maturity model of REST systems and why levels 1 and 2 aren't really REST, and Roy Fielding on why REST APIs must be hypertext driven for more). REST is not married to HTTP according to Roy Fielding's REST-defining thesis. In practice, though, it is implemented that way by just about everybody. Well, maybe not me... I had to be different and do REST + SPARQL/SPARUL over XMPP (yes, Virginia, it IS more than a chat protocol), but that is the subject of a future blog post. HTTP is great if you are doing synchronous data exchanges.

People have invented all sorts of things to approximate async over HTTP (long-polling, HTTP streaming, WebSockets), but none of them are really async. By that I mean all of them have two traits:

  1. The protocol is connection-oriented
  2. The protocol simulates async communication between two parties.

To me async data exchange requires at least the following traits:

  1. The protocol is not connection-oriented, or at least does not require it (i.e. the UDP test - can you implement the protocol in UDP without going through hoops to re-invent TCP over UDP?).
  2. The protocol can communicate with N parties, whatever the value of N (implementation limitations not being considered here - i.e., no, I have NOT invented a way to communicate with 1 gazillion nodes simultaneously).
  3. The protocol exclusively uses an architectural style derived from EBI or C2 (mostly as described by Fielding), and ideally also derived from the Actor model.
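
As a rough illustration of the "UDP test" and the N-party trait above, here is a minimal Python sketch. It is not ADDI, just plain UDP; the names send_to_all and open_receiver and the loopback peers are assumptions made up for the example. One unconnected datagram socket sends the same message to any number of peers, with no handshake and no per-peer connection state.

```python
import socket


def send_to_all(message: bytes, peers: list[tuple[str, int]]) -> int:
    """Send one datagram to each peer from a single unconnected UDP socket.

    Returns the number of datagrams handed to the OS. There is no
    handshake and no per-peer connection; delivery is not guaranteed.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for host, port in peers:
            sock.sendto(message, (host, port))
            sent += 1
    return sent


def open_receiver() -> socket.socket:
    """Bind a UDP socket to an OS-chosen loopback port (a hypothetical peer)."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(2.0)  # avoid blocking forever if a datagram is lost
    return receiver
```

A protocol that passes the UDP test should need nothing more from the transport than this; anything like ordering or retries would have to be optional, not assumed.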

Now the REST proponents (I'm one btw, IF the constraints warrant it) might be saying "Pfft... EBI is dead, Roy said so". Well, he makes a good case against EBI and has some good arguments, but things have changed since 2008. Our systems are more networked, languages using the Actor model are routinely handling millions of concurrent requests, and the benefits of graph-based models have been shown. Also, ADDI data exchanges would be governed by link contracts, with fine-grained control over what is shared, what isn't, and event triggering rules.

The kind reader (i.e., you): Ok, so when are you going to explain what ADDI is?

I'm almost there. The protocol that I think comes closest is one that is still in development, called XRI Data Interchange (XDI). XDI is being developed by the OASIS XDI Technical Committee. XDI has a lot of benefits, but is still designed to be mostly request-response (though Phil Windley is adding some evented support). It is also heavily designed to align with both REST and HTTP, though other protocols will hopefully be possible, and it is not REST by the definition. Instead XDI attempts to be as close as it can to REST while still achieving its other goals. XDI has done some great work in creating a graph-based model and the access control mechanisms to secure it, which they call link contracts. I've been on the XDI TC since it was founded 9 years ago, in January 2004. I've tried to support both async and sync design goals in XDI, and have now realized it would be better to support async in a different protocol and let XDI continue to support synchronous-style interchange, which it will do well.

There's also OData. OData is very similar in some ways to XDI but it is tied to HTTP, does not have link contracts, does not have XRI addressing, and is not bidirectional. OData is also a Microsoft creation, and while I think Microsoft has been changing greatly for the better recently I still have to be honest...I think it's a concern.

Async Data Discovery and Interchange (ADDI), for real this time...

This brings us to ADDI. ADDI is intended to co-exist with XDI, once XDI comes out. Where XDI targets primarily the REST, HTTP style of systems (i.e., synchronous), ADDI targets systems designed to be asynchronous. ADDI will leverage several of the concepts developed by XDI, including link contracts, some of the graph model concepts, and some of the data dictionary. A separate spec from ADDI will describe how ADDI and XDI can exchange data, and how they can be used together. ADDI will not use all of the same design concepts as XDI, because those design concepts are geared towards XDI's design constraints, some of which differ from ADDI's.

The goals of ADDI

Updates : These goals are still evolving. As they evolve I will update them, and note the updates here.

  1. must not be connection-oriented
  2. must have an architectural lineage derived from EBI as well as from the Actor model (and document it like Roy Fielding did for REST)
  3. must have a reference implementation in Erlang
  4. must be capable of supporting millions of concurrent data interchanges
  5. must model interchange as a bi-directional state exchange between two graphs or sub-graphs
  6. must enable addressing with an XRI (and by extension a URI) any data in its graph model
  7. must enable access control of every addressable piece of data
  8. must initially be developed and implemented by myself, then I will open it up for collaboration, possibly in a new OASIS TC
  9. must enable my vision of the datasea - the entire digital world interlinked securely and asynchronously into one vast ocean of data (think datapool++) that is made available to every living human - young or old, rich or broke, first-world or third.
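
Goals 5 through 7 can be pictured with a toy Python sketch. Everything in it is hypothetical: the Graph class, the exchange function, and the plain string addresses (stand-ins, not real XRIs) are invented for illustration and are not part of ADDI or XDI. Each piece of data has an address and its own set of allowed readers, and an exchange moves state in both directions, limited to what each side permits.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy graph: address -> value, plus per-address read permissions."""
    data: dict[str, str] = field(default_factory=dict)
    readers: dict[str, set[str]] = field(default_factory=dict)

    def put(self, address: str, value: str, allowed: set[str]) -> None:
        """Store a value and record which parties may read it."""
        self.data[address] = value
        self.readers[address] = set(allowed)

    def shareable_with(self, party: str) -> dict[str, str]:
        """Return only the entries this party is allowed to read."""
        return {
            address: value
            for address, value in self.data.items()
            if party in self.readers.get(address, set())
        }


def exchange(a: Graph, a_name: str, b: Graph, b_name: str) -> None:
    """Bi-directional state exchange: each side receives what the other permits.

    Both views are computed before either graph is changed. There is no
    conflict resolution; a received value overwrites a local one at the
    same address.
    """
    from_a = a.shareable_with(b_name)
    from_b = b.shareable_with(a_name)
    for address, value in from_a.items():
        b.put(address, value, {b_name})
    for address, value in from_b.items():
        a.put(address, value, {a_name})
```

A real link contract would carry far more than a set of reader names, but the shape is the same: access control is attached to every address rather than to a service endpoint.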

1/30/2013

The Data-Centric Spectrum

The Question

I was recently asked the question ‘What is the relationship between data and services?’. The answer is bigger than it might seem at first.

The Data-Centricity Spectrum

Essentially there is an axis of data exchange patterns running from completely service-centric to completely data-centric. What follows is an attempt to outline that spectrum. It does not capture every pattern, but does capture the more common ones. Each of these patterns represents a different point of view on the answer to the above question.

Service-Centric

On one end you have completely service-centric, where the focus is on the service location, the service (and possibly method) being called, and the parameters of the call. Everything is in terms of a service call. This type of data exchange is often called Remote Procedure Call, or RPC. This can be a binary service call, such as CORBA, or a Web Service using the RPC-literal or RPC-encoded SOAP message formats. This could be summarized as “Calling a service’s method with a set of parameters”. The relationship here is that data is either what the service operates on behind the scenes, a parameter to a service method, or the result of invoking a service method. Data is not addressable except as a URL to external data.

Envelope, Method, and Document

Along this axis a bit more towards the data-centric are envelope-based data exchange patterns, where the data exchanged is split into two parts. The first part is an envelope that specifies the service and method. The second part is a document in some format that represents the data to be passed to the service’s method. The key difference here is that the data exchange is no longer viewed as calling the service with parameters, but with a document model. An example is a Web Service using the Document Literal SOAP message format. This could be summarized as “Calling a service’s method with a model”. The relationship here is that data is either what the service operates on behind the scenes, the model input into a service method, or the model output by a service method invocation. Data is still not addressable except as a URL to external data.

Document Based Services

Further along the axis towards data-centric are services using the Document-Literal Bare SOAP message format. These services no longer really use the service and method model, everything is simply services. This can be summarized as “Passing a model to a service in order to perform an operation and getting the resulting model”. The relationship here is that data is either what the service operates on behind the scenes, the model input to the service, or the model output by the service. Data is still not addressable except as a URL to external data.

REST

Even further along towards data-centricity is Representational State Transfer (REST), created by Roy Fielding. REST is not just about moving toward a data-centric view, but also involves (a) viewing a system as a set of collections of addressable resource models; (b) interacting with those models using a common mapping of HTTP methods to the SCRUD operations (Search, Create, Read, Update, Delete); (c) using HTTP URLs to change system state. This can be summarized as “Changing state by operating on state using synchronous HTTP methods on resources”. The relationship here is that there is no difference between services and resources, i.e. that all services are accessed via one of the SCRUD operations on a resource (or resource collection), so data is either a representation of current state, representation of a new state, or a representation of a state change. Data is addressable now, at the document level, through resource URLs. It’s worth noting that there are many interpretations of what REST is, and people have tried translating it to non-HTTP message exchange transports. Also, REST does not support the event-driven style of services, where service execution is modeled as triggering events, side effects, and fired events.

Data-Centric (aka Data-First)

Furthest along the service-centric/data-centric axis are new specifications under development, called data-first, such as OpenData, Extensible Data Interchange, and others. The key features of these are that every piece of data is addressable, both at the document level and within the document, and that the event-driven style of services is supported. This can be summarized as “Publishing data and services operating on that data when it matches a filter of interest, possibly then publishing new data”. The relationship here is that data containers listen to events, which can address any piece of data, and operate on that data according to the event data. Data is addressable at every level.
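
The data-first summary above ("publishing data and services operating on that data when it matches a filter of interest, possibly then publishing new data") can be sketched in a few lines of Python. This is a toy in-process model, not how OpenData or XDI actually work; the DataBus class, the event shape, and the /orders and /invoices addresses are all assumptions made up for the example.

```python
from typing import Callable

Event = dict[str, str]  # assumed shape: {"address": ..., "value": ...}


class DataBus:
    """Toy bus: services subscribe with a filter over published data."""

    def __init__(self) -> None:
        self._subscribers: list[
            tuple[Callable[[Event], bool], Callable[[Event], list[Event]]]
        ] = []
        self.log: list[Event] = []  # every event published, in order

    def subscribe(
        self,
        matches: Callable[[Event], bool],
        handler: Callable[[Event], list[Event]],
    ) -> None:
        """Register a service: a filter of interest plus a handler."""
        self._subscribers.append((matches, handler))

    def publish(self, event: Event) -> None:
        """Record the event, then hand it to every service whose filter matches.

        Events returned by a handler are published in turn, so a handler
        whose output matches its own filter would loop forever.
        """
        pending = [event]
        while pending:
            current = pending.pop(0)
            self.log.append(current)
            for matches, handler in self._subscribers:
                if matches(current):
                    pending.extend(handler(current))


# A hypothetical invoicing service reacts to new order data by publishing
# invoice data of its own.
bus = DataBus()
bus.subscribe(
    lambda e: e["address"].startswith("/orders/"),
    lambda e: [{"address": "/invoices/" + e["address"].split("/")[-1], "value": "due"}],
)
bus.publish({"address": "/orders/42", "value": "new"})
```

Note that nothing calls the invoicing service by name. The publisher only knows about data and its address, which is the essential difference from the service-centric end of the axis.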

So Which Should You Use?

Different points along the axis provide different benefits.

No generalization will hold for all cases; in other words, you will always need to decide for yourself in light of what you need and when you need it. I'll still offer some generalizations. For example, service-centric tends to be easier for your average developer to secure out of the box, but data-centric tends to be more scalable and flexible. That said, either can be secured and either can scale - it is just a matter of how much work it will take and how complex it will be to accomplish.

My personal preference is for data-first. That is because I like constructing systems as pipelines of secure flowing data that together make a large graph (or web) of data, with addresses representing taps into the data web. This was one of the core concepts behind the founding of the OASIS XDI Technical Committee, and is one of the key principles behind OpenData.

1/23/2013

Planning for the next six months

I love it when a plan comes together

I need to get several things done over the next six months:

  • Finish the XDI spec, or an alternative of my own making
  • Finish my non-fiction book “Where’s the spec?”, which describes a new process for creating quality standards documents
  • Finish my fiction book “Wonderland Sprawl”
  • Finish my fiction story submission for Heroes in Hell
  • Finish the initial Communitivity product line: a mobile app, and two web apps. More details on these later.

Oh...and continue doing quality work at my day job.

That means I need to be spending more time outside of work focused on the above goals.

The plan

To do that I am starting on the following, albeit ambitious, plan:

  • 1 hr each night for non-fiction writing
  • 1 hr each night for fiction writing
  • 1 hr each night for Communitivity development
  • 1 hr each night for spec work
  • In addition, each night choosing a focus from the above and working an extra 1-2 hours on that

All of the above, except for development, I can do while sitting downstairs and periodically talking with my wife. For me development involves building up a mental model, almost like building a house of cards, and then realizing, as close as I can, that model in the code. For that I need to hyper-focus. It’s an ambitious plan, and means family time will mostly be on the weekends, but it is what I need to do to get things done.

Wishful thinking?

Also, I expect to be done with the spec work and the non-fiction writing well short of the six months. Those two hours will likely then go toward Communitivity development.

10/07/2012

Top 10 Resources For Getting Started with Erlang

These are the top ten resources that helped me when I started learning Erlang.  I've not ordered them within the list because they complement each other. Your mileage may vary.  Other Erlangers' opinions may differ. These helped me greatly though. I cannot stress enough that there is no substitute for making.  The resources below will help you much more if you start out by trying to solve some problem you know about with Erlang.  Decompose it into very small pieces and take each piece one at a time.  For some more info on that see the post introducing the concept of Deliberate Practice.

1. Learn You Some Erlang

I started learning Erlang back in 2009, at about the same time Learn You Some Erlang chapters started being posted by Fred Trottier-Hebert.  This is the first place I'd recommend someone go when they are learning Erlang, especially if they have good prior software writing experience.  I found LYSE easy to understand and packed with detail.  Sometimes it goes into details that are too advanced for a beginner, but you can skip these and come back to them when you are ready.

2. Joe Armstrong's Thesis

Reading this thesis is drinking direct from the source.  Joe Armstrong is one of the three fathers of Erlang.  The other two are Bjarne Däcker and Robert Virding.  Reading A History of Erlang (PDF) by Joe Armstrong won't teach you Erlang, but it's a very interesting read.

3. The Erlang Web Site

Erlang.org has some good examples, the OTP Design Principles User Guide, the Erlang Reference Manual, the documentation for the Erlang OTP libraries (Erlang's stdlib), and an online self-paced course.

4. The Erlang-Questions Mailing List

The people on this list are very helpful.  Chances are that your question has been asked before, so search the archives first.  As with any mailing list it can take time to get an answer, so I tend to just use the archive.  The FAQ, a link to the archives and instructions for subscribing are on their web site:
http://erlang.org/mailman/listinfo/erlang-questions.

5. Free E-Books and Other Web Resources

The following books are good to use for times when you don't have time to get into the coding zone, or when you want to deep dive into a particular topic.
I remember going through some others, but it's been a long time and I don't remember them.  If you read this and know of a good (and legal) e-book link then please post it in the comments and I'll add it below here, with credit.

6. Commercial Books and Podcasts

I bought some of the books on this list, borrowed others.  They are all recommended, but in this economy it's important to stress you don't need them to learn Erlang, but they will make your learning Erlang easier.

7. Online Q&A sites

If you have a question while learning Erlang the chances are someone else has had it as well.  It's also good when you don't have a specific question but you have some time to spend on learning.  When that happens go to one of these Q&A sites and search for unanswered Erlang questions, pick one, and then research it until you have an answer.   Once you have an answer you can go back to the question and post your answer if there isn't one yet.


Stack Overflow is probably the best known Q&A site.  Its Erlang questions are a great source of detailed Erlang information, and there is a decent number of unanswered Erlang questions at any one time.

Another less well-known site is Quora (requires login via Twitter or Facebook).  They are getting more popular and are more focused on social connections, whereas Stack Overflow is focused more on score.  They have a tag so you can find Erlang questions, but I've not found a link for unanswered questions, only for open Erlang questions.

8. Twitter

Twitter is always a great source for information, and for a dialogue with people that may be able to answer your questions.  Feel free to follow me and tweet me if you have a problem.  I also try to retweet anything on Erlang that I find interesting.  My Twitter id is BillBarnhill

9. Source Code

The best way to learn a language is by making useful software using that language.  The second best way is to read good code, trace how it runs, and re-read until you understand what it does.  Often this will lead into writing code to get the software to scratch a particular itch you have.  The best source code to read in my opinion is the OTP sources, for the sole reason that they will be what you interact with the most.  The second best is the Github account of Basho and the repos in it, because these folks know their stuff.

10. Erlang Projects I Recommend

This one isn't a resource so much as some recommendations from me on projects to learn about by looking at the materials the developers publish, reading the source code, and building them.

Web servers and frameworks

For web serving I recommend Cowboy.  I started out on Yaws, then switched to Misultin for most things and Mochiweb for some others.  When Misultin went away I switched to Cowboy and haven't looked back.  There are some caveats though.  Riak uses Webmachine and Mochiweb, and if you want to use Erlang professionally you need to know Riak, so you need to at least be comfortable with Webmachine and Mochiweb.  The web framework Nitrogen lets you use Mochiweb or Yaws, and you should learn Nitrogen.  Once you learn Nitrogen I suggest you learn Zotonic, which is a Content Management System (CMS) like Drupal and is built on top of Nitrogen.  If you are coming from Rails, or want something that feels similar, then I suggest checking out the Chicago Boss web framework.

I'll add subsections here later as I get time.


9/19/2012

Quick and dirty fix for wrong architecture error from node-waf

I needed to use Cloud9 behind a firewall tonight.  I followed the instructions, using Node v0.8.9 and npm 1.1.61, on a MacBook Air running 10.7.3.  They didn't work. I tried both cloning from git then running sm,  and just running sm.  So I set about trying to build from scratch.

Everything was chugging along until I hit libxml, a dependency of jsDAV.  Node extensions used to be built with node-waf, which has lots of problems building the right architecture on Macs. The recommended procedure now is to use node-gyp.  But...libxml does not.

So of course I got the dreaded error:
Error: dlopen(/Users/foo/cloud9/node_modules/jsDAV/node_modules/libxml/lib/libxml/o3.node, 1): no suitable image found.  Did find:
/Users/foo/cloud9/node_modules/jsDAV/node_modules/libxml/lib/libxml/o3.node: mach-o, but wrong architecture

Here is the quick and dirty fix.  Change to your Cloud9 directory:
cd /Users/foo/cloud9

Create the following file: ./node_modules/jsDAV/node_modules/libxml/support/o3/build/build_raw.sh

Enter the following into it:
/usr/bin/g++ -g -O3 -msse2 -ffast-math -fPIC -compatibility_version 1 -current_version 1 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -arch x86_64 -IRelease/include -I../include -IRelease/hosts -I../hosts -IRelease/modules -I../modules -IRelease/deps -I../deps -I/usr/local/include/node ../hosts/node-o3/sh_node.cc -c -o Release/hosts/node-o3/sh_node_1.o 
/usr/bin/g++ -g -O3 -msse2 -ffast-math -fPIC -compatibility_version 1 -current_version 1 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -arch x86_64 -IRelease/include -I../include -IRelease/hosts -I../hosts -IRelease/modules -I../modules -IRelease/deps -I../deps -I/usr/local/include/node ../hosts/node-o3/sh_node_libs.cc -c -o Release/hosts/node-o3/sh_node_libs_1.o 
/usr/bin/g++ Release/hosts/node-o3/sh_node_1.o Release/hosts/node-o3/sh_node_libs_1.o -o /Users/foo/cloud9/node_modules/jsDAV/node_modules/libxml/support/o3/build/Release/o3.node -arch x86_64 -bundle -undefined dynamic_lookup -L/usr/local/lib -lxml2

Make it executable:
chmod +x ./node_modules/jsDAV/node_modules/libxml/support/o3/build/build_raw.sh
Change to the build directory:
pushd ./node_modules/jsDAV/node_modules/libxml/support/o3/build
Build it:
./build_raw.sh
Return to the Cloud9 directory and copy it so it's usable:
popd
cp ./node_modules/jsDAV/node_modules/libxml/support/o3/build/Release/o3.node /Users/foo/cloud9/node_modules/jsDAV/node_modules/libxml/lib/libxml/o3.node
Now you can move on.  This weekend I'll post the commands to build cloud9 one module at a time, in case you really need to.  Remember to always try the well-travelled path first, it might work for your configuration.

8/25/2012

How to wrangle cowboy_client

Cowboy is a great networking framework written in Erlang by the folks at Nine Nines. Cowboy is primarily designed for creating a web server. It is easy to use, robust, and fast. However, it doesn't have enough documentation to suit me, and I don't have the time to fix that myself (yet). Despite that I heartily recommend Cowboy for anyone doing web server work in Erlang.

cowboy_client is a little-known, undocumented module within Cowboy that allows you to send HTTP requests and parse the results. It works well, but without documentation it took me a bit of experimentation, in conjunction with source code diving, to figure out. The module below shows how to get the status, headers, and body resulting from an HTTP GET request. There is no error handling in the code; it's just an experiment.

1/28/2012

How to start over in a Github repo

So I am cleaning up some source code and putting it onto Github. I made some mistakes when I made my first commit, so I needed to back that out. The rest of this short post describes how I did it.

But first, a warning: if other people are already using the code in your repo, do not do this. It will completely mess them up. It should only be done when you are first committing a project, or at least only when no one has forked the project.

First you scrap your commits in your local repo...

# discard the existing local history, then start a fresh repo
rm -rf .git
git init
git add .

Next make sure there are no extraneous files...

git status

If there are files you don't mean to add then remove them...

git rm --cached FileToRemove

Next do the commit and force it onto master...

git commit -a -m 'first commit'
git remote add origin git@github.com:Owner/Repo.git
git push origin +master