Posts filed under ‘Projects’
The most difficult and expensive phase is sending the satellite into the ether and even more important is what you do with it after because otherwise, it is just space junk. I rest my case. Let me be and ponder why that chicken is crossing the road.
Where were we? Ah yes, my research. That ship has sailed but the light is being carried by Jim Gemmell and Gordon Bell. They recently brought out a book called “Total Recall” and their blog has some wonderful pointers on how we are already on the path to creating digital surrogates on the web. Our bookmarks, history, thoughts, expertise, appointments, events, friends, bits, interests, locations, places, reminders, TV shows and artifacts like photos are all being archived and made available on the web and, with the right aggregator and linking services, one can pull together a fairly accurate digital version of oneself. Irrespective of all this progress, from my early days of internet access and even today, I am aware of the vastness of the WWW, which overwhelms and underwhelms me at the same time because it is massive and gives me exposure to many brilliant people and ideas. As the narrator emphasized in “The Hitchhiker’s Guide to the Galaxy”, it is just unbelievably vast, huge and mind-bogglingly big. Whenever I go online, I feel like my neurons are connecting to the collective sentient consciousness of an entire species (well, those who have connectivity) residing on a little blue rock…
December 14, 2009: From the early days of computers, people have speculated that computers would be used to supplement our intelligence. Extended stores of knowledge, memories once forgotten, computational feats, and expert advice would all be at our fingertips. In the last decades, most of the work toward this dream has been in the form of trying to build artificial intelligence. By carefully encoding expert knowledge into a refined and well-pruned database, researchers strove to build a reliable assistant to help with tasks. Sadly, this effort was always thwarted by the complexity of the system and environment, too many variables and uncertainty for any small team to fully anticipate. (cue: ode to Vannevar Bush and “Memex”)
Success now is coming from an entirely unexpected source, the chaos of the internet. Google (and smart search engines of tomorrow) has become our external brain, sifting through the extended stores of knowledge offered by multitudes, helping us remember what we once found, and locating advice from people who have been where we now go. For example, the other day, I was trying to describe to someone how mitochondria oddly have a separate genome, but could not recall the details. A search for [mitochondria] yielded a Wikipedia page that refreshed my memory. Later, I was wondering whether taking the train or flying between Venice and Rome was the better choice; advice arrived immediately on a search for [train flying venice rome]. Recently, I had forgotten the background of a colleague, restored again with a quick search on her name. Hundreds of times, I access this external brain, supplementing what is lost or incomplete in my own. This external brain is not programmed with knowledge, at least not in the sense we expected. There is no system of rules, no encoding of experts, no logical reasoning. There is precious little understanding of information, at least not in the search itself. There is knowledge in the many voices that make up the data on the Web, but no synthesis of those voices.
Perhaps we should have expected this. Our brains, after all, are a controlled storm of competing patterns and signals, a mishmash of evolutionary agglomeration that is barely functional and easily fooled. From this chaos can come brilliance, but also superstition, illusion, and psychosis. While early studies of the brain envisioned it as a disciplined and orderly structure, deeper investigation has proved otherwise. And so it is fitting that the biggest progress on building an external brain also comes from chaos. Search engines pick out the gems in a democratic sea of competing signals, helping us find brilliance. Occasionally, our external brain leads us astray, as does our internal brain, but therein lies both the risk and beauty of building a brain on disorder. I have seen (and played with) the future and it is not classical AI.
To illustrate it more clearly, let us have a peek into what Collecta (a self-proclaimed ‘real-time search engine’) is doing. It primarily scours blogs, tweets, comments and media from the social media landscape and exposes a simple search box on top of the index, with results scrolling in on every tick of time. While Collecta still does not crawl other activity streams like Delicious, Evernote, Hooeey, Digg etc. (due to lack of API or traction or both), very quickly, I was inundated by opinions of people talking about something that I was interested in knowing, although I suspect that what was happening in the background was a very simple keyword play using data stream algorithms. In other words, I was firing a query and, figuratively, getting people and their noisy thoughts, not documents per se, as results. I was pretty amused (even the UI is cute) but not impressed. Quite. There should be a semantic-people-web out there. As for the object-centric web (or 3.0 or 4.0) of the future, someone has to invent it, but there are some chains of thought on what searching on that web will look like. I have some ideas too (hey, they’re free) but let us not get ahead of ourselves and stick to the real-time search topic and come back to the Battelle article, shall we? Till then, here are some choice quotes from the article(s) and available commentary. It is at best incomplete because a lot of debate must have ensued, ideas spawned, hundreds of blog posts (and tweets and comments) written yada yada yada in the downstream, but we will never know about ALL of it even if citations and trackbacks are supposed to be transitive. If B mentions A and C mentions B, then A should carry over to C (or the commentary should attach to A). This is what I want to work on using the multitude of available APIs (ThoughtReactions seems to be a good name for such an endeavour, no?) but that is a project for the proverbial another day which never ever seems to dawn. Am rattling again. Out with the commentary…
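The transitive-citation idea above (B mentions A, C mentions B, so C's commentary should attach to A) is easy to sketch as a walk over a graph of mentions. What follows is only an illustration and not how any of the services named here work: the `mentions` dictionary is hypothetical data, and the function names are made up for this sketch. A real version would have to build that dictionary from trackback and search APIs.

```python
from collections import defaultdict, deque


def build_reverse_index(mentions):
    """Map each post to the set of posts that mention it directly."""
    mentioned_by = defaultdict(set)
    for source, targets in mentions.items():
        for target in targets:
            mentioned_by[target].add(source)
    return mentioned_by


def downstream_commentary(origin, mentions):
    """Return every post that mentions `origin` directly or via a chain."""
    mentioned_by = build_reverse_index(mentions)
    seen = set()
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for commenter in mentioned_by.get(current, ()):
            # Skip posts already collected, and the origin itself (cycles).
            if commenter not in seen and commenter != origin:
                seen.add(commenter)
                queue.append(commenter)
    return seen


# Hypothetical data: B mentions A, C mentions B, D mentions C and A.
mentions = {"B": ["A"], "C": ["B"], "D": ["C", "A"], "E": ["F"]}
reactions = downstream_commentary("A", mentions)
print(sorted(reactions))
```

A breadth-first walk is used so that the same post is never visited twice, which matters because blog posts happily link to each other in circles.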
Google was the ultimate interface for stuff that had already been said – a while ago. When you queried Google, you got the popular wisdom – but only after it was uttered, edited into HTML format, published on the web, and then crawled and stored by Google’s technology. It’s inarguable that the web is shifting into a new time axis. Blogging was the first real indication of this. Technorati tried to be the search engine for the “live web” but failed. Twitter can succeed because it is quickly gaining critical mass as a conversation hub. But there is ambient data more broadly, in particular as described by John Markoff’s article (posted here). All of us are creating fountains of ambient data, from our phones, our web surfing, our offline purchasing, our interactions with tollbooths, you name it. Combine that ambient data (the imprint we leave on the digital world from our actions) with declarative data (what we proactively say we are doing right now) and you’ve got a major, delicious, wonderful, massive search problem, er, opportunity.
Let’s say you are in the market to buy something – anything. You get a list of top pages for “Canon EOS”, and you are off on a major research project. Imagine a service that feels just like Google, but instead of gathering static-web results, it gathers live-web results – what people are saying, right now, about “Canon EOS”? And/or, you could post your query to that engine, and you could get real-time results that were created – by other humans – directly in response to you? Add in your social graph (what your friends, and your friends’ friends, are saying), far more sophisticated algorithms and a critical mass of data – and those results could be truly game changing. OneRiot just launched and I believe we’re taking a piece of the problem by finding the pulse of the web. We find the content people are talking about today by having over 2 million people share their activity data, processing it in real-time and creating the first real-time index. The web as it is today, now, tackling news first followed by videos and products next. And therefore, each pulse.
How much journalism these days is spotting patterns from the real-time web? How much is mining the static web? There is another form of journalism, which involves spending time in the real world, but it may be falling out of fashion. I’m not sure that there’s a huge great wobbly lump of wondermoney sitting at the end of the real-time web search rainbow. And if there is, I wonder if it’s much bigger than the one sitting a day further down the line, where the massive outpouring of us auto-digitising hominids has been filtered by the mechanisms we have, more or less, in place now. Google’s big problem isn’t that it can’t be Google a day earlier, it’s that it can’t be cleverer about imparting meaning to what it filters. For now, and until AI gets a lot better, the new worth of the Web is how we humans organise, rank and connect it. The good stuff takes time and thought, and so far nobody’s built an XML-compliant thought accelerator – Rupert Goodwins
Do you think live feeds should be treated similar to how newswires were 30 years ago – considered a pay-for service? You’ve described how Twitter could start making money (via its search) and made me think of the possibility of Google buying Twitter. How different from Twitter should Google’s indexing of Twitter be? Their blog search is dismal because they’re mixing good with junk. Look at those Twitter results, I am wondering exactly what utility they actually bring? I mean what value to the user? To be frank, I care less about what my friends think of the Canon EOS than about the opinions of professional photographers. In that regard there needs to be some method for improving authority. My social graph is my social graph – it’s of dubious value to me for making buying decisions. All the same, great post as it continues to generate lots of discussion in our office. The point you raise about what this feels like to users is especially near to me – it’s one thing to bring back real time results, and another thing entirely to present them in dynamic, useful ways.
I’m not all that concerned about what twit Twittered what in the last 24 hours, and I think that most of the people that do are twits. For instance, if I was researching a camera or a car, I’d be interested in the best stuff written about it in the last year or so, not in the last five minutes. Sure, a public relations flack might want to keep track of bad things people say on Twitter so they can have their lawyers send them nastygrams, but for ordinary people, it’s just a waste of time. Entertaining maybe, but a waste of time. Right on. It is not just real-time search, there’s a lot more that can cash in on this (and provide great user experience in the process). There will also be a goodly sum of what Rupert calls “wondermoney” racing at lightspeed toward the bank account of the company that will best provide the means to protect privacy of hundreds of millions who have absolutely no need nor any desire to see the dots of their every action and comment connected and delivered to “the matrix”.
This is definitely the next big thing in search. Your articulation of it is perfect. I say this, because I experienced this same thing over the last several weeks when I created a new Twitter account for our new products and wanted to track what people are saying. A quick Twitter search was the answer and a few replies later I had some conversations going and new followers as well. The real-time web will far outweigh the benefits of the archived web, at least for certain types of information. Journalism was the original search engine, albeit with a rather baroque query interface. It tends to adopt the most efficient use of people and technology to produce good data, being a notoriously Darwinist entity, and it’s quite good at adapting quickly – it hasn’t taken long for blogs to make their mark. It’s a good thing to track if you want to sniff out utility on the Web – after all, journalism is the first draft of history.
Marketers would love that ambient data but that is a backwards approach to search. I don’t see the usefulness or appetite for people to query about what their friends are doing – especially when it’s already being delivered to them. You really need to see what’s going on in FriendFeed more to grok the real time nature of the web. Look at my realtime feed here for just a small taste – that’s 4,800 hand-picked people being displayed in real-time here. So, I think evolution is the wrong word. Perhaps the right word is “rediscovery”, or “mass public revelation” or “adoption” or something like that. The future was here 15 to 50 years ago. It just wasn’t (to quote the popular phrase) evenly distributed. So maybe all you’re saying is that this particular aspect of search, i.e. routing and filtering, or SDI, or whatever we may call it, is finally “growing” or “spreading”. But “growing” != “evolving”. Search is not evolving; what you are speaking of already exists and has existed.
We are talking about “text filtering” which sounds exactly like an idea that has been around for 40+ years. Here is a description of the problem from http://trec.nist.gov/pubs/trec11/papers/OVER.FILTERING.pdf – a text filtering system sifts through a stream of incoming information to find documents relevant to a set of user needs represented by profiles. Unlike the traditional search query, user profiles are persistent, and tend to reflect a long term information need. With user feedback, the system can learn a better profile, and improve its performance over time. The TREC filtering track tries to simulate on-line time-critical text filtering applications, where the value of a document decays rapidly with time. This means that potentially relevant documents must be presented immediately to the user. There is no time to accumulate and rank a set of documents. Evaluation is based only on the quality of the retrieved set. Filtering differs from search in that documents arrive sequentially over time. This overview paper was from 2002, but the TREC track itself goes back to the 90s and the idea goes back even further. In fact, now that I think of it, I remember talking with a friend at Radio Free Europe (anyone else remember that?) in Prague back in 1995, and he was describing a newswire system that they had, that did this online, real-time filtering. So maybe there’s a shift from static to real-time search in the public, consumer web. But there have been systems (and research) around in other circles that have been doing this for a while.
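The filtering setup quoted above (a persistent profile, documents arriving one at a time, an immediate accept-or-reject decision, and feedback that improves the profile) can be sketched in a few lines. This is a toy under loud assumptions: the term-weight scoring, the `threshold` and `learning_rate` values and the `ProfileFilter` class itself are all invented for illustration and are not taken from the TREC paper or from any real system.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ProfileFilter:
    """A persistent user profile that judges each document on arrival."""

    def __init__(self, seed_terms, threshold=0.2, learning_rate=0.5):
        # Assumed scoring scheme: one weight per term, seeded at 1.0.
        self.weights = Counter({term.lower(): 1.0 for term in seed_terms})
        self.threshold = threshold
        self.learning_rate = learning_rate

    def score(self, text):
        """Average profile weight over the tokens of the document."""
        tokens = tokenize(text)
        if not tokens:
            return 0.0
        return sum(self.weights.get(tok, 0.0) for tok in tokens) / len(tokens)

    def accept(self, text):
        """Decide immediately; there is no batch to accumulate and rank."""
        return self.score(text) >= self.threshold

    def feedback(self, text, relevant):
        """Nudge the weights of the document's terms up or down."""
        delta = self.learning_rate if relevant else -self.learning_rate
        for tok in set(tokenize(text)):
            self.weights[tok] += delta


profile = ProfileFilter(["canon", "eos"])
stream = [
    "Canon EOS review just posted",
    "Lunch was great today",
    "New EOS firmware released",
]
delivered = [doc for doc in stream if profile.accept(doc)]
print(delivered)
```

The point of the sketch is the shape of the loop rather than the scoring: unlike a search query, the profile outlives any single document, and a call such as `profile.feedback("New EOS firmware released", relevant=True)` changes what gets through later.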
You may note that the link refers to a machine called ‘Memex’, which Vannevar Bush (one of the first visionaries of “automated” information storage and retrieval schemes) wrote about decades before Luhn wrote about SDI. But you could go back a couple of millennia, too – for example: the ancient Greeks argued whether words were real or ideal, representations or hoaxes for “actual observation” (and such disputation persisted throughout the Middle Ages [Occam’s Razor] to this very day [one of the most renowned philosophers of the 20th Century – Ludwig Wittgenstein – probably immensely influenced the AI community without their even being “aware” of it]). The issue that “gizmos” such as SDI and/or AI in general cannot deal with is that the world keeps changing: change is the only constant. Everything is in flux – always! As it always has been, no? The ideas and technology for all search were around way before Alta Vista popularized them, and Google.
Technorati is a cautionary tale but then, most blog search engines (Technorati, Icerocket, Tailrank) have not made an impact because the value of pure play search is in doubt. No one wants to go to a search box when there is the triumvirate of Google, Wikipedia and Browser Search Bar. Even Google is neglecting the area (cue: Google Blog Search sucks). Sad really because I feel that blogs empowered the first and therefore, the impressionable pioneering wave of citizen journalism and democratization of media phenomenon (Podcasts, YouTube, Seesmic, Qik etc. followed) that is a promising and enticing field which got washed away while still raw by Twitter (which can still be seen as lazy blogging if one is really looking hard) and the search companies the statusosphere spawned (OneRiot, Topsy, Collecta, Scoopler, TweetMeme). Maybe it is the ‘path of least resistance’ or ‘journalism is not for everybody’ at play here or just that something might be missing like say, attention data that can today be sucked from various places (eg. “implicit web”). Some blogosphere companies still exist and have survived, nay, thrived because they were smart enough to change their technology, business and operational model, like Sphere (where I worked) and another promising company, Twingly (working on ideas such as ‘Channels’ and integrating with the rest of mainstream Web 2.0). Am not a betting person but if life depended on it, I predict a revenge of the hybridosphere (blogs plus history, status and trails) when the Twitter fad cools down as well, just another phase of tripe (Facebook has 40 times more updates). We are already seeing it because Twitter is becoming yet another ego-URL store and copy-cat social network where it is becoming increasingly difficult to separate the genuine article from the millions of pretenders, spammers and worst of all, marketeers.
Between the extremes of organized mainstream professional media and the unstructured freestyle frivolous noise of jibber-jabber, there is a small, yet significant band of people-centric web which offers a truly multi-opinionated clairvoyance to the world. An analogy is ye faithful human eye which can only see a very small portion of the electromagnetic spectrum. Sure, it would be nice to be able to see the ultraviolet and infrared frequencies but the most interesting things happen in the visible band because it is so colourful and vibrant. There has to be an evolutionary benefit for the eye to have settled into its current state. Getting out of the metaphor, this narrow band of semi-professional passionate implicit-explicit human generated content (you call it ‘hybridosphere’ if you like), if captured and processed intelligently, can be made to do some very magical and wonderful things (search, direct and indirect such as ‘related articles’, is just one of the many applications that can be built on top of this foundation and as proof look at crowd powered news site Insttant and sentiment analysis companies like Clara and Infegy) for all stakeholders but most of all, for the general public who just want to see the web as a collective of nice people living harmoniously in a wee global village free from the shackles of big media, opening up a world of discovery from all parts of our little blue marble in the sky. It is a matter of time and effort (luck is to work on RSSCloud, ThoughtReactions, Histosphere and other neologisms) when we will see such Webfountain’ish hybrid companies (data mix of blogs, status, history, conversation, bookmarks, attention, trails, media, objects etc.) claiming their rightful place in the Web 2.0 (or 3.0 or 4.0) pecking order, bringing to the fore badly needed innovation to excavate the people-centric web diamond mine. In my vision, searching in such a world looks figuratively like this…
This is inspired by a scene in “The Time Machine” (2002) where the protagonist Alexander encounters the Vox System in the early 21st century. The virtual assistant (played by Orlando Jones), is seen on a series of glass fibre screens offering to help the hero using a “photonic memory core” linked to every database in the world effectively making it into a compendium of all human knowledge. Since this scene must have been thoroughly researched, it is safe to rip it and suffice to say that an immersive search experience is one where the searcher is virtually forwarded to experts in the area who might have the answer he/she seeks. [edit: 20091214] Apparently, such a thing has been pondered before. Obvious really. It is Battelle again writing for BingTweets Blog, “Decisions are Never Easy – So Far. Part-3” –
Normally a 30 minute conversation is a whole lot better for any kind of complex question. What is it about a conversation? Why can we, in 30 minutes or less, boil down what otherwise might be a multi-day quest into an answer that addresses nearly all our concerns? And what might that process teach us about what the Web lacks today and might bring us tomorrow? The answer is at once simple and maddeningly complex. Our ability to communicate using language is the result of millions of years of physical and cultural evolution, capped off by 15-25 years of personal childhood and early adult experience. But it comes so naturally, we forget how extraordinary this simple act really is. I once asked Larry Page of Google what his dream search engine looked like. His answer: Computer from Star Trek – an omnipresent, all knowing machine with which you could converse. We’re a long way from that – and when we do get there, we’re bound to arrive with a fair amount of trepidation – after all, every major summer blockbuster seems to burst with the bad narrative of machines that out-think humans (Terminator, Battlestar Galactica, 2001: A Space Odyssey, Matrix, I, Robot… you get the picture).
Allow me to wax a bit philosophical. While the search and Internet industry focus almost exclusively on leveraging technology to get to better answers, we might take another approach. Perhaps instead of scaling machines to the point where they can have a “human” conversation with us (a la Turing), perhaps instead (or, as well), we might leverage machines to help connect us to just the right human with whom we might have that conversation? Let me go back to my classic car question to explain – and this will take something of a leap of faith, in that it will require that we, as a collective culture, adapt to the web platform as a place where we’re perfectly comfortable having conversations with complete strangers. Imagine I have at my fingertips a service that allows me to ask a question about which classic car to buy and how, and that engine instantly connects me to an expert – or a range of experts that can be filtered by criteria I and others can choose (collective intelligence and feedback loops are integrated, naturally). Imagine Mahalo crossed with Aardvark and Squidoo, at Google and Facebook scale.
An ‘expert’ of course is still undefined and the jury is still out on what such an entity constitutes. Hey! I never said I have all the answers. Besides, don’t things like call centres, web sites with live chat etc. already handle this rant of human-on-line? And communication is always a problem. So, good luck with that. Live long and prosper.
Let us talk about the histosphere. The concept is fairly simple. There are several companies (Hooeey, Google, Infoaxe, Thumbstrips, WebMynd, Iterasi, Timelope, Cluztr, Wowd, Nebulus etc.) that are collecting the browsing history of users, mainly through the mechanism of toolbars. On an individual basis, ‘web memory’ has utility and so, users can be convinced that it is a good tool to have and that it is a good idea to share the surf logs with the public at large, not very unlike the case made for social bookmarking. This collective social history (also count Opera Mini logs, whose web proxy server is collecting 500 million URLs per day, and Mozilla Weave which will have similar numbers soon) is what I call the ‘histosphere’ (a parallel word being the blogosphere and the criminally underexploited bookmarkosphere). A simple theory is that the histosphere is a proper superset of the blogosphere and bookmarkosphere and hence it is at least as useful as both combined. There is a trickle effect at play here. Not all history gets bookmarked and not all bookmarks get blogged. So, the narrow band we talked about above is really narrow but as any signal processing engineer would vouch, we should also count the haze or radiation to make sense of the quasar. Therefore, the same business and technology models of the blogosphere (example, Sphere) and bookmarkosphere (example, Digg) can be replicated for the histosphere but, given the noisy nature of surf logs, one should apply filters (like ‘engagement metrics’) and use properties of attention data (like ‘observer neutrality’) to deliver better experiences. Google is already trying to do this with personalized search results if one is logged in, but the results suck in the rare one-off cases where they are visible. A use-case is to combine web memory with the side-effect of identity provided by toolbars to customize the whole web experience. Everywhere you go, the web memory follows, sifting through the cacophony.
For example, if I am using Infoaxe and go to NYT or WSJ, the publishers will detect that it is me@infoaxe and deliver relevant content (and also ads, sic). Whichever search engine (reference or blog or real-time) loads history (and other streams) onto its cart will no doubt upset the shifting gravy train. Go Hybridosphere!
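To make the ‘engagement metrics’ filter on noisy surf logs a little more concrete, here is a small sketch. Everything in it is an assumption made for illustration: the `Visit` record, the use of dwell time as the only engagement signal, the 30-second cut-off and the two-user minimum are invented, and none of the companies named above are known to work this way.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    """One row of a hypothetical shared surf log."""
    user: str
    url: str
    dwell_seconds: float


def engaged_urls(visits, min_dwell=30.0, min_users=2):
    """Rank URLs by how many distinct users actually lingered on them."""
    users_by_url = defaultdict(set)
    for visit in visits:
        # Engagement filter: drop drive-by visits shorter than min_dwell.
        if visit.dwell_seconds >= min_dwell:
            users_by_url[visit.url].add(visit.user)
    ranked = [
        (url, len(users))
        for url, users in users_by_url.items()
        if len(users) >= min_users
    ]
    # Most engaged users first; ties broken alphabetically by URL.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


visits = [
    Visit("ann", "nyt.com/a", 120),
    Visit("bob", "nyt.com/a", 45),
    Visit("cat", "nyt.com/a", 5),
    Visit("ann", "wsj.com/b", 200),
    Visit("bob", "spam.example/x", 2),
    Visit("cat", "spam.example/x", 3),
    Visit("cat", "wsj.com/b", 40),
    Visit("bob", "wsj.com/b", 90),
]
print(engaged_urls(visits))
```

Counting distinct users rather than raw hits is a deliberate choice in this sketch, since one person reloading a page all day should not look like a crowd.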
[edit – 20091119] While I initially did this in jest, it never escaped my purview that “running water” is still a dream chased by over 90% of the world’s population. There is just not enough water (the future wars will be fought over water and all that) and plumbing and there are just too many people. Why, just today, Thomas L. Friedman in his latest piece, “Americans Living in Fools Paradise” quips –
people in the developing world are very happy being poor – just give them a little running water and electricity and they’ll be fine, no worries at all for us
Just gives more weight to the pondering, ain’t it? I guess we just have to live with the knowledge that those of us (believe me, it was a fight to get it working in my house) who have running water are the chosen lucky buggers and could do with a little more modesty in complaining about our pampered and mundane lives. If you are statistics/story inclined, you might want to go see some numbers and realities on Red Button Design (disclaimer: co-founder of this company, James Brown, was a club-mate of mine at University of Glasgow) making acclaimed Reverse Osmosis Sanitation Systems (ROSS) based water purifiers aimed at BoP of the third world.
Srinath Ramakkrushnan and his IIT-Madras team, who call themselves ‘Graminavitas’, are a far more ambitious lot, proposing an integrated solution that spans rice de-husking in Natham, a 300-household village 60 km north of Chennai (with a de-husking machine he himself made after a two-year stay in Ujire in Karnataka) to building a micro-grid architecture that would partly use biogas produced from the husk to produce power to providing a workable public toilet system to improve rural sanitation to using the waste from the toilet to produce biogas to replace the need for LPG… phew.
Neha (chirpy 20-something Punjabi kudi in pink tees and blue jeans) and team from Sri Venkateswara College of Engineering are trying to produce electricity using local resources in a village in Tamil Nadu so they can have power supply round-the-clock, instead of just two hours a day. The ‘Energy Boosters’ chose Kaliyapettai village near Chennai, which has a textile mill nearby discharging industrial effluents. Neha and friends used the effluents as nutrients to grow algae on. Algae convert carbon dioxide absorbed from the atmosphere into lipids, which are then converted into biodiesel to generate electricity in a diesel generator. The team grew algae in a tank and have sent in the oil they produced for analysis of its power potential. Neha says the oil produced in 5 days can power lighting for the village’s 600 families through the day, for an initial cost of as little as Rs. 1 lakh (or 2000$).
Shashikant Burnwal, Arnab Chatterjee and Ashim Sardar of IIT-Kharagpur have built a pot-in-pot storage system that helps store vegetables and cooked food at temperatures as low as 8 to 10 degrees Celsius, using nothing more than two earthen pots and a fan picked up from the insides of a desktop computer. Refrigeration with minimal electricity, necessitated by global warming. They have also designed a home cooling system in which sunlight falls on a PVC roof and heats it up, causing airflow between low pressure and high pressure areas, cooling homes – again, no electricity used.
Are these ideas, and those of the other 15 teams, practical, scalable and worth the trouble? Well, the judges went around grilling the participants on the economics, the scientific principles and technology and the novelty of the ideas. GE and the Indian government’s Department of Science and Technology (through the DSIR TEPP program) have already sweetened the deal. Each of the 18 finalist teams will take home Rs. 20,000. In addition, GE will award the winning team, to be announced on Friday, Rs. 5 lakh and a runner-up Rs. 1 lakh. And, to boot, the DST will consider funding their ideas so they can turn them into reality. While I feel that I have seen some if not all of these ideas during the days when there was only one TV channel in India (so the whole family watched just about everything from cheesy Mahabharatha to agricultural programs on biogas and mushroom farming), I suppose there are some positives. At least it got some people thinking, even if it is heavily incentivized.