Filtering the Real-Time Web

With the emergence and spread of a variety of real-time communications protocols and channels – from IM clients to RSS and enterprise applications – the useful half-life of information has shrunk significantly, in many cases to mere seconds. PubSubHubbub, RSSCloud, ping servers, and a dozen other technologies are also transforming RSS into a near real-time medium. However, content and information are not one and the same. Looking at our overflowing inboxes, it is clear that instant delivery is but one of many applicable filters. Having dozens or hundreds of stories delivered to you in real time is not terribly useful if the context is wrong or the content is irrelevant. Which raises a great question: how do we filter the real-time web?

Information vs. Attention Scarcity

Delivering highly personalized, context-aware and timely information should be the goal of any publisher and application. There is simply so much information online that it’s impossible to be competitive without those deliverables. The push towards real-time technologies addresses the factor of time, but it also places an even higher value on context and relevance. If a breaking story requires immediate attention, should the application interrupt your current task or pull you out of a meeting? Herbert Simon identified this problem decades ago:

Many designers of information systems incorrectly represent their design problems as information scarcity rather than attention scarcity, and as a result they built systems that excel at providing more and more information to people, when what is really needed are systems that excel at filtering out unimportant or irrelevant information.

Or, as Clay Shirky put it, “It’s not information overload. It’s filter failure.”

Context, Relevance and Rich Metadata

Context and relevance are subjective and require explicit knowledge about the user. Personalized news, targeted content and recommendation systems can now be found in most applications as they try to help the user navigate the continuously expanding media landscape. Metadata about the user is a scarce and valuable resource: if the application knows your preferences, it can deliver useful information. But before it knows your preferences, you first have to go through an often painful training phase during which it knows laughably little about you. In machine learning parlance, this is known as the “cold start” problem.

However, user metadata is only one part of the equation. A rich description of the content itself can be an enormous help in determining its context and relevance to the user. For example: what is the content about, who is the author, how was it classified, how big is the audience? At PostRank, we’re trying to answer all of those questions and more. We leverage the real-time web to guarantee timely delivery, but as each story passes through the PostRank stack, it is also enriched with as much metadata as possible:

  • Language Analysis: Each story is run through a machine learning algorithm to determine the language. As it turns out, most publishers either forget to provide this data or have it set incorrectly.
  • Semantic Analysis: Each story is categorized for general sentiment (positive, neutral, or negative), the overall emotional score, and a detailed score for each of Paul Ekman’s categories (anger, disgust, fear, happiness, sadness, surprise).
  • Real-time Feed Engagement Score: How has this feed performed within the last 30 days? A real-time engagement score is injected into every feed, allowing you to easily gauge the size of the engaged audience.
  • Topic & Category: Leveraging the PostRank index and topic classification, each feed is enriched with data about areas of coverage and their popularity.
  • And more
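
To make the idea concrete, here is a minimal sketch of how an application might use this kind of metadata to filter a stream. It is not the PostRank API: the `EnrichedStory` and `Preferences` types, their field names, and the 0 to 10 engagement scale are all assumptions made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EnrichedStory:
    """Hypothetical record for a story after metadata enrichment."""
    title: str
    language: str                 # detected language code, e.g. "en"
    sentiment: str                # "positive", "neutral", or "negative"
    engagement: float = 0.0       # assumed scale: 0.0 (none) to 10.0 (highest)
    topics: set[str] = field(default_factory=set)


@dataclass
class Preferences:
    """Hypothetical description of what one user wants to see."""
    languages: set[str]
    topics: set[str]
    min_engagement: float = 0.0


def is_relevant(story: EnrichedStory, prefs: Preferences) -> bool:
    """A story passes if it matches the language, clears the engagement
    threshold, and shares at least one topic with the user's interests."""
    return (
        story.language in prefs.languages
        and story.engagement >= prefs.min_engagement
        and bool(story.topics & prefs.topics)
    )


def filter_stream(stories: list[EnrichedStory], prefs: Preferences) -> list[EnrichedStory]:
    """Keep only the relevant stories, most engaging first."""
    relevant = [s for s in stories if is_relevant(s, prefs)]
    return sorted(relevant, key=lambda s: s.engagement, reverse=True)


# Example data, invented for this sketch.
stories = [
    EnrichedStory("Realtime feeds explained", "en", "neutral", 7.5, {"rss", "realtime"}),
    EnrichedStory("Celebrity gossip roundup", "en", "positive", 9.0, {"celebrity"}),
    EnrichedStory("PubSubHubbub in der Praxis", "de", "neutral", 6.0, {"realtime"}),
    EnrichedStory("Minor ping server update", "en", "neutral", 1.2, {"realtime"}),
]
prefs = Preferences(languages={"en"}, topics={"realtime", "rss"}, min_engagement=5.0)

for story in filter_stream(stories, prefs):
    print(story.title)
```

Even a simple rule like this only works because the language, engagement, and topic fields are present and trustworthy on every story, which is why the enrichment step matters as much as the delivery speed.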

Attention is scarce, and as designers of these applications we have to leverage every bit of information available to help users make the best use of their time and attention. At PostRank our goal is to help the user “Find and Read What Matters”. As real-time protocols gain wider adoption, we believe that rich metadata will be crucial to a good user experience, and to the next step toward better, faster dissemination of content and news.

Building or thinking of a Real-Time web application? Come by and talk to us at the ReadWriteWeb Real-Time Summit this week (October 15th) in Mountain View, or ping us directly at any time.

From PostRank
