Sunday, March 9, 2008

MicroID: Verifiable? Really?

I'm going to cut right to the chase: is MicroID really "decentralized verifiable identity?" After giving a thorough reading to the "press literature" (at microid.org), I could have honestly said that I didn't believe MicroID did anything at all. Here are some excerpts from MicroID's official website:

MicroID enables anyone to claim verifiable ownership over content hosted anywhere on the web... MicroID is not an authentication or single-sign-on service, just a straightforward method for identifying content ownership...


Identity verification sans authentication? What?

Due to the mild headache I sustained every time I read or heard mention of MicroID, I was absolutely repulsed by the idea of reading the specification, which, at first glance, appeared to be nothing other than a much more intimidating representation of the "assembly procedure," which involves splicing, hashing, and concatenating several strings. It just seemed so ill-conceived that I was not even sure where to begin dismantling it; it was like trying to decide how to react, during a debate, to a complete non sequitur.

I suppose I might have begun here: You cannot have verification without a secret.

The reason I'm writing this now, after having read the specification, is because MicroID is being grossly misrepresented. If you don't believe me, just read the press literature:

To verify a user's membership in any (trusted) 3rd party site...


Those parentheses around the word "trusted" are not mine. Why there are parentheses at all is extremely strange to me, as the trust it describes is not optional.

This is the crux of my distaste for MicroID's advertising. There is no verification at all. You must trust the source of the MicroID annotation in order for the annotation to mean anything at all.

Finally I'd like to point out some key parts of the technical specification:

By itself, a MicroID has no inherent meaning, since it is simply a string created from two URIs. Any entity can generate a MicroID even if it has not verified the identity of the resources associated with one or both URIs. Furthermore, a MicroID is easily copied by an entity that did not generate it. Finally, a MicroID is not digitally signed by the entity that generated it and therefore cannot be cryptographically associated with the generating entity.
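To make that paragraph concrete, here is a minimal sketch of the "assembly procedure" as I understand it from the specification: hash each URI with SHA-1, concatenate the two hex digests, and hash the result. The function names, the example addresses, and the exact prefix format are my own assumptions for illustration, not a reference implementation.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Hex-encoded SHA-1 digest of a UTF-8 string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def microid(identity_uri: str, content_uri: str) -> str:
    """Build a MicroID-style string from two public URIs (assumed format)."""
    id_scheme = identity_uri.split(":", 1)[0]
    content_scheme = content_uri.split(":", 1)[0]
    # Hash each URI, concatenate the hex digests, and hash again.
    digest = sha1_hex(sha1_hex(identity_uri) + sha1_hex(content_uri))
    return f"{id_scheme}+{content_scheme}:sha1:{digest}"


# The "owner" and an impostor both know the same two public strings...
owner_claim = microid("mailto:alice@example.com", "http://example.com/post/1")
impostor_claim = microid("mailto:alice@example.com", "http://example.com/post/1")

# ...so they compute exactly the same value. No secret is involved.
print(owner_claim == impostor_claim)
```

Nothing in the computation depends on anything only the owner knows, which is the whole problem: the value proves nothing unless you already trust whoever published it.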


The conclusion you should walk away from this article with is two-fold: First, MicroID is not verifiable. Second, MicroID does not enable anyone to claim verifiable ownership over content hosted anywhere on the web. It does do one useful thing: it could potentially serve as a simple standard for machines to infer authorship from trusted sources. It would be very cheap and low on resource consumption because it requires none of the CPU-intensive work that real cryptographic verification would call for, and it would be low on human resources because it's trivial to implement. But since the system is designed with a trust dependency on the service provider (i.e., the website), if you're targeting humans with a technology for verifying authorship, why not just stamp content with a clear visual indication of who authored it? But obviously, everyone already does that...

Monday, March 3, 2008

The Opposite of Aggregation

Since this article takes the scenic route in getting to the point, I thought I'd excerpt a preview of that point here at the top of the article, so that you could be sure what it is you were going to be reading about:

The goal is provider agnosticism, enabling users to manipulate their digital lives without much care as to what specific services and data sources they're dealing with. In the future, this intervening software layer will master all other web-based services users traditionally patronized in a more manual fashion, by giving the user the power to view, describe, and publish one's digital life in ways the user gets to choose independently of which service providers they choose. People should be able to discuss, blog, post photos, post bookmarks, manipulate social graph data, and any other arbitrary data, in a pure, notional form, facilitated by an intervening layer that based on the user's preferences and settings, ought to be able to automatically figure out which endpoints to publish the user's data to, and what algorithms to use to reconcile potential discrepancies between the capabilities of those different services.

And now, on with the show...

The issues of data portability and decentralization have been on my mind for a lot of years. I've focussed a great deal of my time on trying to find an elegant solution to the problems of ubiquitous casual censorship, as well as the surprisingly related problem of information overload.

I'd like to take a moment to distinguish between casual and systematic censorship: systematic censorship is an attempt at eradicating information, whereas casual censorship is more often seen as an attempt to mitigate the challenge viewers face of wading through uselessness to get to what they really want to see. (Comment "moderation" is a good example of where this problem shows up.)

One obvious challenge of attempting to decentralize data storage and retrieval is that there will always be agents (be it an e-pharmacy, Google search results, or a neo-nazi who has just discovered The Internet) out there, misleading you or wasting your time, either deliberately or inadvertently.

Another big reason data decentralization hasn't happened yet lies in the economic forces which typically challenge existing centralized data services: the goal is often to recentralize a data service. That is to say, the enemy of many of these efforts has been existing monoliths of data services, not the popularity of monolithic services in general.

I've probably been pretty vague until now, so here's what I consider a very good example:

Gabbly is a service in a long lineage of services that provide a forum or chat room associated with an arbitrary URL. *Any* chat or forum service allowing sufficient channel/thread descriptor precision (so that a URL can, in some form or encoding, be made the name/handle/identifier of the channel/thread you wish to use) could fulfill this basic premise. If you use an external URL->channel mapping service, the need for channel descriptor precision would scale only with usage (think tinyurl).
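A rough sketch of what such an external URL->channel mapping could look like, tinyurl-style: URLs are normalized so that trivially different spellings of the same address land in the same channel, and each new URL gets the next short identifier. The class name, the `#c<number>` channel format, and the normalization rules are all assumptions made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonical_url(url: str) -> str:
    """Normalize case, default ports, empty paths, and drop the fragment."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, parts.query, ""))


class ChannelRegistry:
    """Hypothetical mapping service: one short channel name per distinct URL."""

    def __init__(self):
        self._ids = {}

    def channel_for(self, url: str) -> str:
        key = canonical_url(url)
        if key not in self._ids:
            # Identifiers only grow as new URLs are actually discussed.
            self._ids[key] = len(self._ids) + 1
        return f"#c{self._ids[key]}"


registry = ChannelRegistry()
first = registry.channel_for("HTTP://Example.com:80/page#comments")
same = registry.channel_for("http://example.com/page")
other = registry.channel_for("http://example.com/another-page")
print(first, same, other)
```

Because identifiers are handed out on demand, the channel names stay short no matter how long the URLs are, which is the sense in which the required precision scales only with usage.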

But as I already mentioned, Gabbly isn't the only service like this, and it doesn't cooperate with other chat systems like it. It's worth noting that Gabbly, like many of its competitors, provides client-side software for facilitating Gabbly discussions, and a means for developers to embed that software in their websites. The ability to embed the software in your website provides something very important, but it isn't ideal.

Embedding Gabbly or one of its competitors' chat widgets on your pages provides endorsement of Gabbly as the means of chatting about the contents of that page. It's important to note, however, that this endorsement effectively exists on almost every web-based publishing platform today. But if you want to comment on the article I'm writing right now, you'll have to make a choice, and my implicit endorsement of Blogger's commenting subsystem is what will ultimately make that choice for 99% of visitors to any website.

But of course, content providers don't always know what's best for users. Even if they make a great choice, there are still so many choices available, it's unlikely that the choice is what every visitor will prefer; we're still centralized, and still locking our users into the monolith we happen to have stumbled into.

Naturally, then, some Gabblies attempt to mitigate the pains of data centralization by wrangling together data from other sources:

co.mments.com is a service that will monitor the progress of online discussions that can be pulled via RSS (read more about it at TechCrunch).

Unfortunately, co.mments.com isn't much other than an aggregator: a user interface for pulling data and viewing what was pulled. co.mments.com could be a great means of decentralizing the task of reading the web, but presently it does nothing for writing.

What's worse is this: the decentralization offered by co.mments.com is only skin deep: it's only a place to view monolithic discussions in a recentralized way. Because co.mments.com's notion of a discussion is only what can be pulled from a single source, escaping the monolith is dependent on the sources that co.mments.com pulls from. co.mments.com does nothing to liberate commenters from the "moderation" of the discussion, nor does it do anything to wrangle the discussion of the same content as provided by different service providers.

Granted: I don't really think this would be a good feature, yet.

Let me explain a little more clearly: Just what is it that I am proposing as the first improvement co.mments.com could make? That co.mments.com's notion of a conversation needs to be abstracted away from the monolithic model that fates one data source per discussion, and become capable of merging multiple data sources into a single conversation? Yes, but I don't even believe that 0.001% of discussions tracked by co.mments.com users would benefit from this feature.

So why would I suggest such a costly architectural "improvement" if I didn't think anyone would use it? Because we have a chicken-and-egg problem: the first and last word influencing the decision of which monolithic discussion service readers will choose when they want to participate in discussion of something on the web is the endorsement I mentioned earlier. Users will only use the endorsed discussion system, no matter how monolithic, limiting, or even censored it is (think of YouTube videos uploaded by multinational corporations,) because they're afraid they won't get lumped into the same system as the other 99.999% of users who are participating in the discussion. The second reason co.mments.com and similar services should abstract discussions away from data sources is that sometimes content and discussions legitimately get published to multiple endpoints (URLs) and if you're going to offer push services (I'll get to that later) you'll need a means of preventing discussions from getting fractured by the inevitable event that users click "discuss" or "post comment" when viewing the same content through different URLs.

So now you're starting to get the picture (I hope.) Finally, we can focus on the subject that inspired the title of this article "The Opposite of Aggregation," and the future model of data portability on the web.

I've made it clear that in order for co.mments.com to really fight centralization and all that centralization carries with it, it needs to abstract notional conversations away from conversation data sources so that there is no longer the limitation of a one-to-one mapping between the two. Like I also said, there won't be a user base for this immediately because there exists no one-stop-shop for viewing this strange meta-conversation that must be pulled and stitched together from multiple data sources. But that's exactly what co.mments.com is: a user interface for viewing conversations from different sources.
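As a very naive illustration of what breaking that one-to-one mapping might mean in practice, here is a sketch that stitches comments pulled from several sources into one notional conversation. The `Comment` fields, the source names, and the rule for spotting duplicates (same author and same text) are assumptions invented for the example; a real system would need something far less crude.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Comment:
    source: str          # which service the comment was pulled from
    author: str
    posted_at: datetime
    text: str


def stitch(*feeds):
    """Merge several per-source comment lists into one chronological thread."""
    everything = sorted(
        (comment for feed in feeds for comment in feed),
        key=lambda comment: comment.posted_at,
    )
    seen = set()
    thread = []
    for comment in everything:
        # Naive duplicate check: the same person saying the same thing.
        key = (comment.author, comment.text.strip())
        if key in seen:
            continue
        seen.add(key)
        thread.append(comment)
    return thread


blogger_feed = [
    Comment("blogger", "alice", datetime(2008, 3, 3, 9, 0), "First!"),
    Comment("blogger", "carol", datetime(2008, 3, 3, 9, 30), "Nice article."),
]
gabbly_feed = [
    Comment("gabbly", "bob", datetime(2008, 3, 3, 9, 15), "I disagree."),
    Comment("gabbly", "alice", datetime(2008, 3, 3, 9, 45), "First!"),
]

conversation = stitch(blogger_feed, gabbly_feed)
print([(c.source, c.author) for c in conversation])
```

The viewer sees one thread; which monolith each comment originally lived in becomes a detail rather than a boundary.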

The next step, however, for true discussion decentralization (ignoring, for now, automatic discussion data source discovery, and assuming that each user will define the data sources he deems worthy of pulling and stitching together into discussions) is a push mechanism. That is, co.mments.com should implement the capability of posting comments as well. co.mments.com users ought to be able to post to a notional conversation stitched together from multiple data sources, and in doing so, have their post appear on multiple discussion services. (There are some obvious gotchas that will invariably crop up from this involving feedback loops whenever, some day, there exists more than one co.mments.com with comment push capability, but that's a discussion for another day.)
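A minimal sketch of the push side, including one possible (and certainly incomplete) guard against the feedback loops mentioned above: every comment carries an identifier, and a comment that has already been pushed is never relayed again. The `Endpoint` and `Pusher` classes and their methods are hypothetical stand-ins, not the API of any real discussion service.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    """Stand-in for one discussion service that accepts posted comments."""
    name: str
    posts: list = field(default_factory=list)

    def publish(self, comment_id: str, text: str) -> None:
        self.posts.append((comment_id, text))


class Pusher:
    def __init__(self, endpoints):
        self.endpoints = endpoints
        self._handled = set()  # ids of comments already pushed or relayed

    def post(self, text: str) -> str:
        """Publish a new comment to every endpoint in the conversation."""
        comment_id = uuid.uuid4().hex
        self._handled.add(comment_id)
        for endpoint in self.endpoints:
            endpoint.publish(comment_id, text)
        return comment_id

    def relay(self, comment_id: str, text: str, origin: str) -> bool:
        """Copy a comment pulled from `origin` to the other endpoints, once."""
        if comment_id in self._handled:
            return False  # we have seen this one; relaying again would loop
        self._handled.add(comment_id)
        for endpoint in self.endpoints:
            if endpoint.name != origin:
                endpoint.publish(comment_id, text)
        return True


blogger = Endpoint("blogger")
gabbly = Endpoint("gabbly")
pusher = Pusher([blogger, gabbly])

my_id = pusher.post("Posted once, visible on both services.")
echoed = pusher.relay(my_id, "Posted once, visible on both services.", "gabbly")
relayed = pusher.relay("abc123", "Written directly on Blogger.", "blogger")
print(len(blogger.posts), len(gabbly.posts), echoed, relayed)
```

This only works if the identifier survives the round trip through each service, which is itself an assumption; with several independent pushers the problem gets considerably harder.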

Now, dealing with censorship may be one of my personal goals, but this article goes far beyond that, and thus, far beyond moderated discussions and co.mments.com. The greater goal here is monolith agnosticism, enabling users to manipulate their digital lives without much care as to what specific services and data sources they're dealing with. People should be able to discuss, blog, post photos, post bookmarks, manipulate social graph data, and any other arbitrary data, in a pure, notional form, facilitated by an intervening software and user interface layer similar to my future vision for co.mments.com. That layer then, based on the user's preferences and settings, ought to be able to automatically figure out which endpoints to publish the user's data to, and what algorithms to use to reconcile potential discrepancies between the capabilities of those different services.

In case it's unclear, a very naive and simple example of those sorts of discrepancies is the difference between rich text and plain text. By far the most common way that software applies automatic "downsampling" from rich to plain text is to simply remove the rich formatting elements from the text. This is bogus. There are all *sorts* of ways to cope with the _underwhelming_ collection of text decorations not available in plain text.

This is of course an extremely simple example, and in no way involves reconciling formal formatting differences, but I hope it gives you an idea of what I was talking about when I said that users might have preferences about how differences in capabilities are reconciled with one another.
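Here is one way that kind of downsampling might be sketched, assuming the rich text arrives as simple HTML (an assumption; the same idea applies to any rich format). Instead of dropping the formatting, it is translated into plain-text conventions, and the `markers` argument stands in for the user's preferences. The class and function names are made up for this example.

```python
from html.parser import HTMLParser

DEFAULT_MARKERS = {"b": "*", "strong": "*", "i": "_", "em": "_", "u": "_"}


class PlainTextDownsampler(HTMLParser):
    """Rewrite simple HTML as plain text, keeping emphasis and link targets."""

    def __init__(self, markers=None):
        super().__init__()
        self.markers = DEFAULT_MARKERS if markers is None else markers
        self.out = []
        self._hrefs = []  # stack of link targets for currently open <a> tags

    def handle_starttag(self, tag, attrs):
        if tag in self.markers:
            self.out.append(self.markers[tag])
        elif tag == "a":
            self._hrefs.append(dict(attrs).get("href"))

    def handle_endtag(self, tag):
        if tag in self.markers:
            self.out.append(self.markers[tag])
        elif tag == "a" and self._hrefs:
            href = self._hrefs.pop()
            if href:
                self.out.append(f" <{href}>")

    def handle_data(self, data):
        self.out.append(data)


def downsample(html, markers=None):
    parser = PlainTextDownsampler(markers)
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.out)


rich = 'There are all <b>sorts</b> of <i>ways</i> to <a href="http://example.com/">cope</a>.'
print(downsample(rich))
print(downsample(rich, markers={"b": "**", "i": "/"}))
```

A user who dislikes asterisks can swap the markers without touching anything else, which is the kind of preference I have in mind.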

In the future, this intervening software layer will master all other web-based services users traditionally patronized in a more manual fashion, by giving the user the power to view, describe, and publish one's digital life in ways the user gets to choose independently of which service providers they choose.

I won't go into too much detail, but another big selling feature for this is that many websites that provide great and extremely popular service in one facet of their offerings simply provide grossly inadequate offerings in others. My favorite example is YouTube, plagued not only by comment moderation, but by a lack of comment moderation! How many YouTube videos have 900+ comments on them? Due to a poor sorting system which fails to give you any reasonable means for finding the most valuable and relevant comments, YouTube has become a horrific way to discuss videos.

The holy grail of this notion of "data abstraction" software that abstracts your digital life away from the limitations of existing service providers may just come in the form of a proxy or a browser plugin capable of intercepting all interaction (both ways) with websites and doing a number of great things with it:

  • Extracting and digesting idioms of interest; things like comments, related blog posts, related videos, related URLs, keywords and tags, favorites / bookmarks / score, etc.
  • Merging those idioms pulled from the default data source (the website you're visiting) with data pulled from other data sources.
  • Reconciling the differences between the capabilities of the services those idioms are pulled from, and potentially upgrading the user interface provided by the website you're visiting to support their richest potential format (or whatever the user has specified in their preferences.)
  • Accepting content pushed by the user in the upgraded format; since the idioms being displayed have been abstracted away from the website you're viewing and its inherent limitations, there's no reason you need to publish to that specification either (I really *really* hate the character limit on YouTube comments)
  • Pushing the content to other data sources -- maybe the same video is even on three different video sites -- why should the user be limited from viewing the aggregate data of all three at the same time in the same interface, and participating in discussion with users of all three sites?

One day, when this software exists, deals between Facebook and Microsoft to "share" your social information will be a laugh we'll all share when we look back on a dated and monolithic proto-information age where users weren't the ones in charge.


Thursday, September 20, 2007

New Applications, and Public Registration

New Applications

A recent idea I hope to roll out within the next two months is a system which will leverage SocSurveys.org's existing infrastructure for a second purpose. Rather than using the data only for conducting research, the new proposal would allow survey-takers to take a new kind of survey, one meant to inform as much as to research.

The new approach employs a highly dynamic survey interface which constructs new questions for users based on their input to previous questions, but with a twist: the questions are as informational for the user as they are for research purposes.

The challenge of "information overload" affects us all, including our ability to participate in a democracy. If we are not sufficiently abreast of relevant information, democracy becomes subservient to the dynamics of memetics and the media. Although for many, it seems that the choices they're given at the polls are obvious, many Americans don't pay a great deal of attention to important intermediate steps in determining candidacy, namely primaries. When also taking into consideration that democratic participation involves more than simply casting a ballot, it becomes quite clear that the difficulty of staying informed directly undermines democracy.

The new system that may be added to SocSurveys.org would ask users questions that have been automatically identified as effective metrics for determining what news information each user would like to hear about, but also what information they are likely to not already know.

Public Registration

Although most of our resources are currently being dedicated to the development of our versatile multimedia coding software (affectionately referred to as RIAS Killer) we should be opening public registration before October is over.

At the same time we hope to roll out versatile survey continuations, as well as Standard Information Batteries, which together help to make research done with SocSurveys.org more valuable, speed up survey development, and incentivize participation for survey-takers who have used SocSurveys.org before.

That's all for now!


Wednesday, September 12, 2007

Blog Integration, New Workstation, and "RIAS Killer"

BLOG

Didn't think I'd be making another post on this blog before I integrated the blog with the rest of the website, which is what I had set out to do just a minute ago.

In previous generations of my server-side employment of RSS-pull I was satisfied to simply allow items in the long-term RSS stream to expire when the RSS server stopped serving them. Eventually I knew my software would be tagging, organizing, and storing all kinds of items from all sorts of sources / syndication formats / request methods.

I set out to implement this for the first time with the SocSurveys.org Blog. My primary reason for this is that I'd like it to be simple to change what sort of Blog you see at SocSurveys.org by switching various tags. I obviously want to be able to blog about general issues pertaining to the website, but it's also important to me that I be able to post sometimes in a more candid manner or on more personal subjects than what's typically appropriate for the SocSurveys.org blog.
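A minimal sketch of the idea, assuming a plain RSS 2.0 feed: every item pulled is stored permanently and tagged, so it survives after the feed stops serving it, and the blog view is just a query by tag. It uses Python's built-in sqlite3 purely as a stand-in for the real database, and the table layout, function names, and sample feeds are all invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def archive_feed(db, rss_xml, tags):
    """Store every <item> in the feed and attach the given tags to it."""
    db.execute("CREATE TABLE IF NOT EXISTS items "
               "(guid TEXT PRIMARY KEY, title TEXT, link TEXT)")
    db.execute("CREATE TABLE IF NOT EXISTS item_tags "
               "(guid TEXT, tag TEXT, UNIQUE (guid, tag))")
    for item in ET.fromstring(rss_xml).iter("item"):
        link = item.findtext("link", default="")
        guid = item.findtext("guid", default=link)  # fall back to the link
        title = item.findtext("title", default="")
        # Items already archived are left alone; nothing ever expires.
        db.execute("INSERT OR IGNORE INTO items VALUES (?, ?, ?)",
                   (guid, title, link))
        for tag in tags:
            db.execute("INSERT OR IGNORE INTO item_tags VALUES (?, ?)",
                       (guid, tag))
    db.commit()


def items_tagged(db, tag):
    """Titles of all archived items carrying the given tag."""
    rows = db.execute(
        "SELECT title FROM items JOIN item_tags USING (guid) "
        "WHERE tag = ? ORDER BY title", (tag,))
    return [row[0] for row in rows]


def feed(*titles):
    """Build a tiny RSS document for the demonstration."""
    items = "".join(
        f"<item><title>{t}</title><link>http://example.com/{i}</link>"
        f"<guid>{t}</guid></item>" for i, t in enumerate(titles))
    return f"<rss version='2.0'><channel>{items}</channel></rss>"


db = sqlite3.connect(":memory:")
archive_feed(db, feed("First post", "Second post"), ["site"])
# Later the server has dropped "First post" from the feed.
archive_feed(db, feed("Second post", "Third post"), ["site", "personal"])

print(items_tagged(db, "site"))
print(items_tagged(db, "personal"))
```

Switching what the blog page shows then comes down to changing which tag is queried.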

At any rate, work on that has halted for a moment. Our server doesn't appear to have any way to access our MySQL database from Python, the language in which I prefer to manipulate RSS. (I actually considered a two-stage process that piped output from a Python script into a PHP script.. momentarily.)

New Workstation

Courtesy of my good friend Tom, I'll be borrowing his old server with 6 times as much RAM as the laptop I've been working on (among other improvements.) I'm going to postpone editor improvements until I am able to pick it up. Expect much more frequent updates when that happens. I expect to open public registration soon after getting my hands on it.

"RIAS Killer"

This is the first mention of this project on the web. For those of you unfamiliar with RIAS, it is a patented coding convention used primarily (as far as I know) for quantifying doctor-patient interaction. (Don't be surprised at all if it is far more generalized than that; I have a very limited perspective on this part of it.)

Casually referred to by the same name is software provided by the company that you go to if you want to license the rights to use RIAS.

It's terrible.

Purportedly, this software hasn't experienced any kind of update in years, frequently crashes or produces fatal run-time errors, is limited to the Windows platform, and appears to be written in Visual Basic.

If I recall correctly, it costs $2,000 per seat? (I'll have to get verification for this.)

This is a problem that often seems to pervade not just a single industry, but every industry. Allow me first to relate a common musing of mine about the relationship that information technology has with an invisible, but financially substantial slice of all IT users. I think it's quite relevant.

After seeing a Super Bowl ad for IBM, or after reading the list of technologies used by a prospective employer, or after personally experiencing the horror of some marketing executive misappropriating (or, to use a more specific mathematical analogy, reciprocating) the phrase "embedded Python," it is often a musing of mine that, business success notwithstanding, whether or not Company X makes sound technology decisions involves a great deal of luck.

One of my favorite research subjects is how people make decisions without knowing every detail of every fact that is relevant to that decision. People instead perform a sort of compromised evaluation process that relies heavily upon building a mental network of interrelating evaluations that include people, publications, institutions, factoids, technologies, and more, to try and glean an edgewise view of far-removed factoids with which they have no direct "contact." For example, I might be led to believe that because Alice uses Product X, it must be a good product. My evaluation process might be clearer, however, if I knew why Alice chose it to begin with: perhaps an intimate knowledge of Product X and its competitors is responsible, or perhaps she chose it because Bob did. But I'd better move on before this digression overtakes the rest of this post..

One of the tenets that capitalism depends on is the claim that people who make the most money are also the people who are the best at making money. The more true this statement is for a given society, the better it is for the economy. There are of course just as many counterexamples to this tenet as there are ways to come into money without earning it, but the truly interesting question is this: does a good businessman necessarily mean a good IT man?

The answer is "No!" of course. While many highly successful businessmen are so, this is almost always because they had the inside scoop on their own upcoming technological R&D and the lowest-to-the-ground-floor investment opportunity that ever exists. In place of actual technical knowledge, good businessmen must be masters at manipulating and utilizing their mental network of evaluations of people and factoids and all other sources of information. Underlying the businessman's methodology for solving technological problems -- and all problems outside their own realms of expertise -- is the ability to grok the strength and reliability of others' technical aptitudes. Building and maintaining a network of reliable information sources is of the utmost importance to a person trying to tackle diverse problems they are not immediately equipped to deal with.

Now, if you're thinking to yourself this describes more people than just businessmen, then you've gotten back on track before I have. As I mentioned before, this applies to everybody. This methodology of compromised evaluation is the fundamental basis of decision making employed by people making decisions 99% of the time. You don't need to be a biologist to decide to eat an apple.

But do you need to be a computer programmer to decide what technology is best for your company?

The phenomenon whose description I have been leading up to for about ten paragraphs might be aptly described as niche isolation. When a particular cultural niche is particularly devoid of specific technical aptitudes upon which it comes to depend, the problem can go completely unnoticed. Professionals in such a niche will pay far too much for products and services worth far too little, and the niche can be too obscure to be noticed by potential competitors who would help drive prices down and improve the quality of solutions in that niche. What's worse is that the poor products which have already established themselves in the niche can hold on to it through mechanisms such as buyer's remorse, and wariness toward change. If not those, then there's always the simple inhibitory factor of we've already paid for this one, we're not paying for another one. Mechanisms which cement previous solutions in their place deter competitors from entering the market.

This finally brings us back to the RIAS coding software I mentioned before. Living and working with a sociologist has exposed me to a plethora of IT horrors which seem to result from the niche isolation phenomenon, and that really grinds my gears.

Therefore, we are entering our own competitor to this software niche. If you know of any existing solutions we might not have heard of, please let us know!


Saturday, September 8, 2007

Blog

SocSurveys.org now has a blog!

Right now you're actually looking at a website called BlogSpot, a publishing medium endorsed (and integrated to an unknown degree) by Blogger.com. I've chosen to use Blogger for their front-end interfaces for posting new content, and for their somewhat spam-moderated user database (owned by Google.)

Soon, however, the blog page will be integrated into the rest of our website.

See you then.
