Since this article takes the scenic route in getting to the point, I thought I'd excerpt a preview of that point here at the top, so that you can be sure what you're going to be reading about:
The goal is provider agnosticism, enabling users to manipulate their digital lives without much care as to what specific services and data sources they're dealing with. In the future, this intervening software layer will master all other web-based services users have traditionally patronized in a more manual fashion, by giving users the power to view, describe, and publish their digital lives in ways they get to choose, independently of which service providers they use. People should be able to discuss, blog, post photos, post bookmarks, and manipulate social graph data and any other arbitrary data in a pure, notional form, facilitated by an intervening layer that, based on the user's preferences and settings, ought to be able to automatically figure out which endpoints to publish the user's data to, and what algorithms to use to reconcile potential discrepancies between the capabilities of those different services.

And now, on with the show...
The issues of data portability and decentralization have been on my mind for many years. I've focused a great deal of my time on trying to find an elegant solution to the problem of ubiquitous casual censorship, as well as the surprisingly related problem of information overload.
I'd like to take a moment to distinguish between casual and systematic censorship: systematic censorship is an attempt to eradicate information, whereas casual censorship is more often seen as an attempt to mitigate the challenge viewers face of wading through uselessness to get to what they really want to see. (Comment "moderation" is a good example of where casual censorship shows up.)
One obvious challenge of attempting to decentralize data storage and retrieval is that there will always be agents out there (be it an e-pharmacy, Google search results, or a neo-Nazi who has just discovered The Internet) misleading you or wasting your time, whether deliberately or inadvertently.
Another big reason data decentralization hasn't happened yet lies in the economics of the forces that typically challenge existing centralized data services: their goal is usually to recentralize a data service around themselves. That is to say, many of these efforts have set themselves against particular existing data monoliths, not against the popularity of monolithic services in general.

I've probably been pretty vague until now, so here's what I consider a very good example:
Gabbly is one in a long lineage of services that provide a forum or chat room associated with an arbitrary URL. In fact, *any* chat or forum service allowing sufficient channel/thread descriptor precision (so that a URL can, in some form or encoding, be made into the name/handle/identifier of the channel/thread you wish to use) could fulfill this basic premise. If you use an external URL->channel mapping service, the need for channel descriptor precision would scale only with usage (think tinyurl.)
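To make the tinyurl comparison concrete, here's a minimal Python sketch of a hypothetical URL->channel mapping service. It assumes channel names are handed out sequentially and encoded in base 62 (my own choice for illustration, not anything Gabbly actually does), so a channel name's length grows with the number of URLs mapped rather than with the length of any URL:

```python
import string

# Hypothetical alphabet for compact channel names: 0-9, a-z, A-Z (base 62).
ALPHABET = string.digits + string.ascii_letters


def encode(n: int) -> str:
    """Encode a non-negative integer as a base-62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, len(ALPHABET))
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


class ChannelMap:
    """A hypothetical tinyurl-style URL -> channel-name mapping service."""

    def __init__(self) -> None:
        self._by_url = {}  # URL -> channel name already issued for it

    def channel_for(self, url: str) -> str:
        # Reuse the existing name so every visitor to this URL joins the same channel.
        if url not in self._by_url:
            # Names are issued sequentially, so their length grows only with
            # the number of URLs mapped, never with the length of the URL.
            self._by_url[url] = encode(len(self._by_url))
        return self._by_url[url]


channels = ChannelMap()
print(channels.channel_for("http://example.com/2008/01/a-very-long-article-title.html"))  # 0
print(channels.channel_for("http://example.org/another/page"))  # 1
print(channels.channel_for("http://example.com/2008/01/a-very-long-article-title.html"))  # 0
```

Because the mapping service remembers which name it issued for each URL, every visitor to the same page lands in the same channel, and the chat service itself only ever has to support names a few characters long.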
But as I already mentioned, Gabbly isn't the only service like this, and it doesn't cooperate with other chat systems like it. It's worth noting that Gabbly, like many of its competitors, provides client-side software for facilitating Gabbly discussions, and a means for developers to embed that software in their websites. The ability to embed the software in your website provides something very important, but it isn't ideal.
Embedding Gabbly or one of its competitors' chat widgets on your pages provides an endorsement of that service as the means of chatting about the contents of those pages. It's important to note that this kind of endorsement effectively exists on almost every web-based publishing platform today. If you want to comment on the article I'm writing right now, you'll have to make a choice, and my implicit endorsement of Blogger's commenting subsystem is what will ultimately make that choice for you, just as a site's endorsement makes it for 99% of visitors to any website.
But of course, content providers don't always know what's best for users. Even if they make a great choice, there are so many choices available that it's unlikely their pick is what every visitor would prefer; we're still centralized, and still locking our users into whichever monolith we happened to stumble into.
Naturally, then, some Gabblies attempt to mitigate the pains of data centralization by wrangling together data from other sources: co.mments.com is a service that will monitor the progress of online discussions that can be pulled via RSS (read more about it at TechCrunch.)
Unfortunately, co.mments.com isn't much more than an aggregator: a user interface for pulling data and viewing what was pulled. co.mments.com could be a great means of decentralizing the task of reading the web, but presently it does nothing for writing.
What's worse, the decentralization offered by co.mments.com is only skin deep: it's just a place to view monolithic discussions in a recentralized way. Because co.mments.com's notion of a discussion is only what can be pulled from a single source, escaping the monolith depends entirely on the sources co.mments.com pulls from. co.mments.com does nothing to liberate commenters from the "moderation" of a discussion, nor does it do anything to wrangle together discussions of the same content hosted by different service providers.
Granted: I don't really think this would be a good feature, yet.
Let me explain a little more clearly. Just what am I proposing as the first improvement co.mments.com could make? That co.mments.com's notion of a conversation needs to be abstracted away from the monolithic model that dictates one data source per discussion, and become capable of merging multiple data sources into a single conversation? Yes, but I don't believe even 0.001% of discussions tracked by co.mments.com users would benefit from this feature.
So why would I suggest such a costly architectural "improvement" if I didn't think anyone would use it? Because we have a chicken-and-egg problem: the first and last word on which monolithic discussion service readers choose when they want to discuss something on the web is the endorsement I mentioned earlier. Users will only use the endorsed discussion system, no matter how monolithic, limiting, or even censored it is (think of YouTube videos uploaded by multinational corporations,) because they're afraid of being cut off from the other 99.999% of users participating in the discussion. Nobody will take their comments elsewhere until something can show those comments alongside the endorsed discussion, and nobody will bother building that something while everyone stays put.

The second reason co.mments.com and similar services should abstract discussions away from data sources is that content and discussions sometimes legitimately get published to multiple endpoints (URLs), and if you're going to offer push services (I'll get to that later) you'll need a means of keeping discussions from fracturing when users inevitably click "discuss" or "post comment" while viewing the same content through different URLs.
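One cheap way to keep a discussion from fracturing across URLs is to key it on a canonical form of the URL rather than on the raw string. The Python sketch below is only an illustration under my own assumptions: it treats scheme and host case, a leading "www.", trailing slashes, fragments, query-parameter order, and `utm_` tracking parameters as irrelevant to which content you're looking at, and it accepts a hypothetical hand-maintained alias table for mirrors that no normalization rule could detect:

```python
from typing import Dict, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: query parameters with these prefixes never change the content shown.
TRACKING_PREFIXES = ("utm_",)


def canonical_key(url: str, aliases: Optional[Dict[str, str]] = None) -> str:
    """Reduce a URL to a key naming the content it shows, so one discussion serves every variant."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.rstrip("/") or "/"
    # Drop tracking parameters and sort the rest so parameter order doesn't matter.
    query = sorted(
        (k, v) for k, v in parse_qsl(parts.query) if not k.startswith(TRACKING_PREFIXES)
    )
    # The fragment ("#comments" and the like) is discarded entirely.
    key = urlunsplit((parts.scheme.lower(), host, path, urlencode(query), ""))
    # A hand-maintained alias table covers mirrors that no rule could detect.
    return (aliases or {}).get(key, key)


print(canonical_key("HTTP://www.Example.com/post/?b=2&a=1&utm_source=feed#comments"))
# http://example.com/post?a=1&b=2
```

Real sites would break some of those assumptions (plenty of pages *do* change with the query string), which is why an alias table, or something smarter, has to exist alongside the rules.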
So now you're starting to get the picture (I hope.) Finally, we can focus on the subject that inspired this article's title, "The Opposite of Aggregation," and on the future model of data portability on the web.
I've made it clear that in order for co.mments.com to really fight centralization and everything centralization carries with it, it needs to abstract notional conversations away from conversation data sources, so that they're no longer limited to a one-to-one mapping between the two. As I also said, there won't be a user base for this right away, because there's no one-stop shop for viewing this strange meta-conversation that must be pulled and stitched together from multiple data sources. But that's exactly what co.mments.com is: a user interface for viewing conversations from different sources.
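Here's roughly what dropping the one-to-one mapping could look like, as a Python sketch: a hypothetical `Comment` record that remembers which service it came from, and a function that stitches several feeds into one notional conversation ordered by time. Deduplicating on author plus text is an assumption on my part, a crude heuristic for spotting the same comment surfacing through more than one service; a real implementation would need something sturdier:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class Comment:
    source: str  # hypothetical name of the discussion service it came from
    author: str
    posted_at: datetime
    text: str


def merge_conversation(*feeds: Iterable[Comment]) -> List[Comment]:
    """Stitch comments from several data sources into one notional conversation."""
    seen = set()
    merged = []
    for feed in feeds:
        for comment in feed:
            # Heuristic (an assumption): the same author posting the same text on
            # several services is one cross-posted comment, so show it only once.
            key = (comment.author.lower(), comment.text.strip())
            if key not in seen:
                seen.add(key)
                merged.append(comment)
    # Present the merged conversation chronologically, regardless of source.
    merged.sort(key=lambda c: c.posted_at)
    return merged


noon = datetime(2008, 1, 1, 12, 0)
blog = [Comment("blog-comments", "ann", noon, "Great post"),
        Comment("blog-comments", "bob", noon.replace(minute=5), "Disagree")]
chat = [Comment("chat-room", "Ann", noon.replace(minute=1), "Great post"),
        Comment("chat-room", "cy", noon.replace(minute=2), "Why?")]
for c in merge_conversation(blog, chat):
    print(c.posted_at.strftime("%H:%M"), c.source, c.author, c.text)
```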
The next step for true discussion decentralization, however (ignoring, for now, automatic discovery of discussion data sources, and assuming each user will define which data sources he deems worth pulling and stitching together), is a push mechanism. That is, co.mments.com should implement the capability of posting comments as well. co.mments.com users ought to be able to post to a notional conversation stitched together from multiple data sources, and in doing so, have their post appear on multiple discussion services. (There are some obvious gotchas involving feedback loops that will invariably crop up once, some day, there exists more than one co.mments.com with comment push capability, but that's a discussion for another day.)
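As a rough Python sketch of the push half, assuming each discussion service is wrapped in a hypothetical adapter exposing a single `post(thread_id, text)` method (no real service's API is implied), posting to a stitched-together conversation is a fan-out over every (service, thread) pair in it. `InMemoryEndpoint` is a stand-in for trying it out, not a real service:

```python
from typing import Dict, List, Protocol, Tuple


class DiscussionEndpoint(Protocol):
    """Hypothetical adapter interface; each real service would need its own implementation."""

    name: str

    def post(self, thread_id: str, text: str) -> str:
        """Publish text to the given thread and return the service's ID for the new post."""
        ...


def push_comment(
    threads: List[Tuple[DiscussionEndpoint, str]], text: str
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Post one comment to every (service, thread) pair making up a notional conversation."""
    posted: Dict[str, str] = {}
    failed: Dict[str, str] = {}
    for endpoint, thread_id in threads:
        try:
            posted[endpoint.name] = endpoint.post(thread_id, text)
        except Exception as exc:  # one unavailable service shouldn't block the rest
            failed[endpoint.name] = str(exc)
    return posted, failed


class InMemoryEndpoint:
    """Stand-in service used only to exercise push_comment."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail
        self.posts: List[Tuple[str, str]] = []

    def post(self, thread_id: str, text: str) -> str:
        if self.fail:
            raise ConnectionError(f"{self.name} is unavailable")
        self.posts.append((thread_id, text))
        return f"{self.name}-{len(self.posts)}"
```

Collecting failures instead of aborting is the design choice that matters here: one service being down shouldn't stop the comment from reaching the others, and the failures can be retried later.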
Now, dealing with censorship may be one of my personal goals, but this article goes far beyond that, and thus far beyond moderated discussions and co.mments.com. The greater goal here is monolith agnosticism, enabling users to manipulate their digital lives without much care as to what specific services and data sources they're dealing with. People should be able to discuss, blog, post photos, post bookmarks, and manipulate social graph data and any other arbitrary data in a pure, notional form, facilitated by an intervening software and user interface layer similar to my future vision for co.mments.com. That layer, based on the user's preferences and settings, ought to be able to automatically figure out which endpoints to publish the user's data to, and what algorithms to use to reconcile potential discrepancies between the capabilities of those different services.
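Deciding where to publish could start out as simply as the Python sketch below. The service names and the kinds of data each accepts are made up for illustration, and the rule (publish to the user's preferred services that accept this kind of data, falling back to every service that accepts it if none of the preferred ones do) is just one plausible policy a user might configure:

```python
from typing import Dict, List, Set

# Hypothetical services and the kinds of data each one accepts.
SERVICES: Dict[str, Set[str]] = {
    "photo-site": {"photo"},
    "bookmark-site": {"bookmark"},
    "blog-host": {"post", "photo"},
    "microblog": {"post", "bookmark"},
}


def choose_endpoints(kind: str, preferred: List[str],
                     services: Dict[str, Set[str]] = SERVICES) -> List[str]:
    """Pick where to publish an item of `kind` under one possible user policy."""
    capable = [name for name, kinds in services.items() if kind in kinds]
    # First choice: the user's preferred services that can accept this kind of data.
    chosen = [name for name in preferred if name in capable]
    # Fallback: if none of them can, publish everywhere that can (possibly nowhere).
    return chosen or capable


print(choose_endpoints("photo", ["blog-host", "microblog"]))  # ['blog-host']
print(choose_endpoints("bookmark", ["photo-site"]))  # ['bookmark-site', 'microblog']
```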
In case it's unclear, a very naive and simple example of those sorts of discrepancies is the difference between rich text and plain text. By far the most common way that software applies automatic "downsampling" from rich to plain text is to simply remove the rich formatting elements from the text. This is bogus. There are all *sorts* of ways to cope with the _underwhelming_ collection of text decorations not available in plain text.
This is of course an extremely simple example, and in no way involves reconciling formal formatting differences, but I hope it gives you an idea of what I meant when I said that users might have preferences about how differences in capabilities are reconciled.
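To show what *not* throwing the formatting away might look like, here's a small Python sketch that downsamples a handful of HTML tags into plain-text conventions instead of deleting them. The particular mapping (`*bold*`, `_italic_`, and links written as "text (url)") is my own assumption, and precisely the kind of thing a user ought to be able to set in their preferences:

```python
from html.parser import HTMLParser

# Assumed plain-text conventions for rich formatting that plain text lacks.
MARKERS = {"b": "*", "strong": "*", "i": "_", "em": "_"}


class PlainTextDownsampler(HTMLParser):
    """Convert a small subset of HTML to plain text, keeping decorations as markers."""

    def __init__(self) -> None:
        super().__init__()
        self.out = []
        self.hrefs = []  # stack of link targets waiting for their closing </a>

    def handle_starttag(self, tag, attrs):
        if tag in MARKERS:
            self.out.append(MARKERS[tag])
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href"))
        elif tag == "br":
            self.out.append("\n")

    def handle_endtag(self, tag):
        if tag in MARKERS:
            self.out.append(MARKERS[tag])
        elif tag == "a" and self.hrefs:
            href = self.hrefs.pop()
            if href:
                self.out.append(f" ({href})")  # keep the link target visible

    def handle_data(self, data):
        self.out.append(data)


def downsample(html: str) -> str:
    parser = PlainTextDownsampler()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


print(downsample('There are all <b>sorts</b> of ways, <i>really</i>: <a href="http://example.com/">see</a>'))
# There are all *sorts* of ways, _really_: see (http://example.com/)
```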
In the future, this intervening software layer will master all other web-based services users have traditionally patronized in a more manual fashion, by giving users the power to view, describe, and publish their digital lives in ways they get to choose, independently of which service providers they use.
I won't go into too much detail, but another big selling point for this is that many websites that provide a great and extremely popular service in one facet of their offerings simply provide grossly inadequate service in others. My favorite example is YouTube, plagued not only by comment moderation, but by a lack of comment moderation! How many YouTube videos have 900+ comments on them? Thanks to a poor sorting system that gives you no reasonable means of finding the most valuable and relevant comments, YouTube has become a terrible way to discuss videos.
The holy grail of this notion of "data abstraction" software that abstracts your digital life away from the limitations of existing service providers may just come in the form of a proxy or a browser plugin capable of intercepting all interaction (both ways) with websites and doing a number of great things with it:
- Extracting and digesting idioms of interest; things like comments, related blog posts, related videos, related URLs, keywords and tags, favorites / bookmarks / score, etc.
- Merging those idioms pulled from the default data source (the website you're visiting) with data pulled from other data sources.
- Reconciling the differences between the capabilities of the services those idioms are pulled from, and potentially upgrading the user interface provided by the website you're visiting to support their richest potential format (or whatever the user has specified in their preferences.)
- Accepting content pushed by the user in the upgraded format; since the idioms being displayed have been abstracted away from the website you're viewing and its inherent limitations, there's no reason you need to publish to that specification either (I really *really* hate the character limit on YouTube comments)
- Pushing the content to other data sources. Maybe the same video is even on three different video sites; why should the user be prevented from viewing the aggregate data of all three at the same time in the same interface, and from participating in discussion with users of all three sites?
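Reconciling capabilities before pushing might include something as mundane as respecting each service's comment length limit. This Python sketch, under my own assumptions, splits an over-long comment on word boundaries into numbered parts that each fit a given limit; the per-service limits you pass in are placeholders, not any real site's numbers:

```python
import textwrap
from typing import Dict, List, Optional

SUFFIX_ROOM = len(" (99/99)")  # assumes a comment never needs more than 99 parts


def fit_to_limit(text: str, limit: int) -> List[str]:
    """Split text on word boundaries into numbered parts no longer than `limit` characters."""
    if len(text) <= limit:
        return [text]
    if limit <= SUFFIX_ROOM:
        raise ValueError("limit too small to hold a part number")
    # textwrap breaks on whitespace (and splits over-long words) so each chunk fits.
    chunks = textwrap.wrap(text, width=limit - SUFFIX_ROOM)
    n = len(chunks)
    return [f"{chunk} ({i}/{n})" for i, chunk in enumerate(chunks, start=1)]


def adapt_for_endpoints(text: str, limits: Dict[str, Optional[int]]) -> Dict[str, List[str]]:
    """Prepare one comment for several services; None means the service has no limit."""
    return {
        name: [text] if limit is None else fit_to_limit(text, limit)
        for name, limit in limits.items()
    }


print(fit_to_limit("alpha beta gamma delta", 20))
# ['alpha beta (1/2)', 'gamma delta (2/2)']
```

Splitting is only one policy; truncating with a link back to the full comment would be another, and which one to apply is exactly the sort of thing that belongs in the user's preferences.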
One day, when this software exists, deals between Facebook and Microsoft to "share" your social information will be a laugh we'll all share when we look back on a dated and monolithic proto-information age where users weren't the ones in charge.