Uncov, buzz aggregation, and the dumbness of crowds
Written on January 17, 2008
Although I am not a big fan of most Web2.0-preaching blogs and bloggers (think Mike Arrington, Scoble, and the gang), I never miss an update from Uncov. With its poignant humor, gung-ho attitude, and abundant cynicism, it always gets to the heart of bad code, poor vision, and over-optimistic venture capitalists. In other words, Uncov is that rare idiot in the Valley village who dares speak truth to power. Plus, the pictures that accompany their posts are surely for the LULZ.
How about this little gem they dropped a few days ago while reviewing a company that sells t-shirts with Web2.0 logos:
Just as hosting conferences is the only way to make money from Facebook applications, this seems to be the only way to make money from web 2.0.
...yes, and I've been to a few conferences like that -- and they were truly horrible. And yes, I feel that a lot of money was made from talking about Facebook applications. I even remember programming one Facebook application while listening to somebody from Facebook talking about programming Facebook applications -- that was very meta.
On a serious note, what I like most about Uncov is that they are not in permanent Web2.0-denial, unlike the rest of the tech scene in the US. Sometimes, when a service is useless it's, well, just "useless", not "promising".
Recently, though, I asked myself why there is only one Uncov for a few dozen Techcrunches. Okay, this can be part of that "counterculture" thing -- but there is surely space for more than one cynical, Web2.0-bashing blog that isn't afraid to use expletives when everybody else on the block is extremely upbeat, ambitious, and polite. The number of contrarian voices covering the Web seems to be inversely proportional to how much hype it gets in the mainstream media.
...And then I stumbled upon a post that Tim O'Reilly wrote a few months ago: "Facebook, the Quant Fund Meltdown, and the Techmeme Leaderboard", in which he suggested that the discourse in the tech blogosphere is slowly degenerating under the pressure of buzz aggregators like TechMeme and specifically its Leaderboard. What O'Reilly stated wasn't really a big secret: the surest way to be on the front page of TechMeme, and thus part of the Leaderboard, is to comment on existing stories, so the discourse on tech blogs quickly becomes self-referential, rarely looking outside of the mainstream tech subjects (hence the enormous amount of time spent on discussing Scoble's Facebook problems as opposed to Kenya's crisis, something that Ethan captured very nicely with his post).
Above is the front page of TECHMEME. If you want to see how quickly it's changing, check this video: somebody made screenshots of the front page every 5 minutes for 50 hours.
I've always felt indifferent to the skepticism towards the impact of Web2.0 on democracy and the objectivity in online debates espoused by the likes of Cass Sunstein: in most cases that he documents, there is an inherent correction mechanism which helps us to avoid the "extremist enclaves" he fears so much. Social media aside, I've been even less willing to seriously doubt the value of buzz aggregation, particularly in the non-English blogospheres.
Consider the new addition to the family of interesting aggregators of local content -- a blog aggregator from Belarus called 101blog (disclosure: I've been closely involved with it in its early conceptual stage). It launched a few weeks ago and is very similar to BuzzFeed in its set-up and vision: there is a BuzzMonitor installation (with a lot of tweaks and custom scripts, as Yahoo Terms, the system that BuzzMonitor relies on for meta-data extraction, does not handle Cyrillic symbols well) that helps to detect the buzz; the front end is provided by a heavily modified premium WordPress theme called Revolution News Theme. There is a group of "editors" who follow the buzz in the Belarusian blogosphere and then wrap it up in a few sentences with great context.
Screenshot of a new buzz aggregator of Belarusian blogs, 101blog.net
To me, there is no doubt that they help to enrich the existing conversations by providing context and also by making the blogosphere more accessible to non-geeky audiences. In the case of 101blog, there is also the first blog search engine for Belarus (built in Google Co-Op) -- so unless you believe in the "Google Is Making Us Dumber" argument, it's hard to argue that this service is not adding tremendous value to the blogosphere on the ground. At minimum, it helps to consolidate the national blogosphere.
Unfortunately, this "consolidation" issue is often overlooked by traditional analysis of non-English blogging: bloggers in Ukraine, Belarus, and Russia would often blog in Russian, for example, but their posts will be on local subjects -- or they will blog in Belarusian and Ukrainian, and the Russians will get xenophobic all of a sudden, as their local "diggs" and "technoratis" will be full of links to posts in languages they don't understand that well. These tensions exist, and are not really going away.
I also believe there is as much value in regional blog aggregators like Afrigator -- they at least put everybody on the map and enable some peer-learning if not outright real-life collaboration.
Yet I am hesitant to draw the same conclusion about the value of buzz aggregation to the US tech blogosphere. This community is already extremely incestuous. Adding more incest by virtue of clustering similar stories together is not going to make it any better (thus, the recent craze with social networks centered around DNA adds a quirky sublime twist to this :-). Buzz aggregation is at its most helpful when the tool helps to discover new people, blogs, and ideas; it's least useful when its use is limited to denoting the relative importance of one famous blogger to another one. All explicit compilations of such data -- like the Leaderboard -- turn the beautiful idea of helping bloggers discover other bloggers into a celebrity contest (I have a much stronger expression I am tempted to use as a substitute ;-)
What got me thinking about this topic again recently is a Sept 2007 research paper about the impact of online recommendation systems on sales diversity. Amazon is, perhaps, a pioneer in making recommendation systems work. Theirs and similar systems rely on the concept of "collaborative filtering": analyzing the purchases of other buyers who have bought the same product to predict what other products may be of interest to you. The model is largely computational -- it relies on historical data to calculate the similarity between you and others.
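To make the idea concrete, here is a minimal sketch of the "people who bought this also bought" flavor of collaborative filtering. It is not Amazon's actual algorithm: the buyers, titles, and function names are all hypothetical, and similarity is reduced to a plain co-purchase count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories (buyer -> titles bought), invented for illustration.
purchases = {
    "ann": {"Blockbuster", "Hit Sequel", "Niche Gem"},
    "bob": {"Blockbuster", "Hit Sequel"},
    "cat": {"Blockbuster", "Hit Sequel"},
    "dan": {"Blockbuster", "Niche Gem"},
    "eve": {"Blockbuster"},
}
# "Unread Masterpiece" is in the catalog, but nobody has bought it yet.
catalog = {"Blockbuster", "Hit Sequel", "Niche Gem", "Unread Masterpiece"}


def co_purchase_counts(histories):
    """Count how often each ordered pair of titles was bought by the same buyer."""
    counts = Counter()
    for basket in histories.values():
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(title, histories, top_n=3):
    """Return the titles most often bought together with `title`, best first."""
    counts = co_purchase_counts(histories)
    scored = [(other, n) for (t, other), n in counts.items() if t == title]
    # Highest co-purchase count first; ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in scored[:top_n]]


print(recommend("Blockbuster", purchases))
print(recommend("Unread Masterpiece", purchases))
```

With this toy data, a title that nobody has bought can never show up in anyone's recommendations, because every score is built from past purchases.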
The paper that sheds more light on how collaborative filtering impacts our exposure to diverse information is "Blockbuster Culture's Next Rise or Fall: The Impact of Recommender Systems on Sales Diversity"; it's written by Daniel Fleder and Kartik Hosanagar, both at the Wharton School at the University of Pennsylvania. There is a shorter version of the paper available at Wharton Knowledge. And to be fair to Sunstein, he does spend a good deal of time in his books talking about "collaborative filtering" from a theoretical perspective.
Conventional "long-tail" logic will lead us to believe that as the online world makes space constraints irrelevant and provides for an infinite number of titles, we would be exposed to more products and thus more titles, many of them in niches that are not currently served by big brick-and-mortar retailers. While this is certainly true in theory, it doesn't address the crucial question of which titles we are actually going to buy and what factors are going to influence our decision. If the choice of titles available to us increases from a few thousand handled by our local bookstore to millions of titles handled by Amazon, how would we go about buying new titles that we may not have even heard of? Are we likely to buy many more titles that were unknown to us before -- some of them obscure -- or would we settle on blockbusters?
Screenshot of Amazon's recommendation system
This is one of the questions that Fleder and Hosanagar try to answer in their paper. One shouldn't underestimate this matter: the logic of collaborative filtering, in its pure or somewhat mutated form, has now won over most social media web-sites, including some serving "hard" news; thus, if you get fired up about the control and business pressure that the modern corporate world gets to exercise over the editorial side of things, you should be just as fired up about the effect that collaborative filtering will have on what we read, how we read, and when we read.
The core argument advanced by Fleder and Hosanagar is that because recommendation systems like Amazon's calculate the relevance of one title to another from historical sales data, we are much more likely to consume related titles that already sell well and avoid titles that have no solid historical track record. Thus, books that may be of terrific value but have not yet been read by the rest of the community would always remain obscure, while we are pushed to buy more popular titles that have already been discovered and bought by others.
If their findings go against your own experiences with purchasing stuff online (and they certainly go against mine -- I know that, thanks to Amazon's book-recommending feature, I bought some terrific books I never expected to buy), this is because, often, the aggregate diversity decreases even though individual diversity increases (in the authors' own words, "recommenders can push each person to new products, but they often push us toward the same new products"). Although a bit uncanny, it's logical: there is no way for Amazon to recommend me the next good book if nobody has bought it yet (and thus it has never been given a quantifiable attribute). I hope there is a special term for this in social science, but I can only think of "engineered serendipity" -- we stumble upon something we didn't expect to find, but that something got there precisely because we did something earlier to trigger that reaction.
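The gap between individual and aggregate diversity is easy to see with made-up numbers. The sketch below is only an illustration under loud assumptions: the buyers and titles are invented, a crude "best-selling title you don't own yet" rule stands in for a real collaborative filter, every buyer is assumed to follow the recommendation, and the Gini coefficient is my assumed measure of how concentrated sales are (0 means sales are spread perfectly evenly).

```python
from collections import Counter

# Hypothetical catalog and purchase histories, invented for illustration.
catalog = ["Title A", "Title B", "Title C", "Title D", "Hit"]
before = {
    "ann": {"Title A", "Hit"},
    "bob": {"Title B", "Hit"},
    "cat": {"Title C"},
    "dan": {"Title D"},
}


def sales_counts(baskets):
    """Units sold per catalog title, in catalog order."""
    counts = Counter(title for basket in baskets.values() for title in basket)
    return [counts[title] for title in catalog]


def gini(sales):
    """Gini coefficient of non-negative sales counts (0 = perfectly even)."""
    values = sorted(sales)
    n, total = len(values), sum(values)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


def recommend_most_popular(baskets):
    """For each buyer, pick the best-selling title they do not own yet.

    Ties are broken alphabetically; titles with zero sales are never picked.
    """
    counts = Counter(title for basket in baskets.values() for title in basket)
    picks = {}
    for buyer, basket in baskets.items():
        candidates = [t for t in catalog if t not in basket and counts[t] > 0]
        picks[buyer] = min(candidates, key=lambda t: (-counts[t], t), default=None)
    return picks


picks = recommend_most_popular(before)
# Assumption: every buyer follows the recommendation exactly once.
after = {
    buyer: basket | ({picks[buyer]} if picks[buyer] else set())
    for buyer, basket in before.items()
}

for label, baskets in (("before", before), ("after", after)):
    sizes = sorted(len(basket) for basket in baskets.values())
    print(label, "basket sizes:", sizes, "Gini:", round(gini(sales_counts(baskets)), 3))
```

In this toy data every basket grows by one title, so each buyer sees something new, yet the Gini coefficient rises from about 0.13 to 0.28 because the extra purchases pile up on titles that were already selling.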
Can we easily jump from critiquing Amazon-like systems to critiquing social news sites and aggregators, be they Digg or Techmeme? It's too early to tell: there's been very little scholarship on the subject, and I feel there is no other way to answer this question but to get a lot of user data from both sites and start modeling it (also, I haven't seen much academic scholarship on Techmeme in particular -- it would be great to see it, if it exists). The operating environment is slightly different: we are still not close to fine-tuned "Daily Me"-like newspapers that can change their aggregation and presentation algorithm based on user activities (Findory was the closest and, curiously enough, it was conceived by Greg Linden, one of the key architects of collaborative filtering at Amazon -- but the site is no longer operational). There are a few newer projects out there -- see Spotback, for example, which offers almost instantaneous customization, and Newser, where you can easily adjust the content of the main page by indicating your preference for "hard" vs "soft" news.
FRONTPAGE OF SPOTBACK CHANGES IN REAL-TIME BASED ON HOW YOU RANK EACH STORY YOU READ
WITH NEWSER, YOU CAN CHANGE WHETHER TO SEE MORE "HARD" OR "SOFT" STORIES (IN REAL-TIME)
Community-driven sites (or, if we want to take it broader, sub-sets of the global or a national blogosphere) are, of course, considerably different from aggregators like Newser. Yet, for them to work and take full advantage of the wisdom of crowds, the community members have to be diverse and their opinions have to be independent of each other (and those are just two of the four requirements to "form a wise crowd").
The problem is that we see a growing number of motivational distortions -- think the Leaderboard at TechMeme, the front page of Digg, or the Technorati Index -- which alter that diversity and independence in dangerous ways, modifying writing and linking habits. Factor in an online advertising system like Google Adsense, which puts a premium monetary value on the above distortions, and there emerges an eco-system where the wisdom of crowds can quickly degenerate into its opposite. Thus, it's no surprise that some stories always make it to the top spot on all web-sites (again, the ubiquitous Scoble with his Facebook), while more news-worthy ones are consistently skirted both by bloggers and mainstream media.
This is probably a good enough reason to stick to Uncov!

