3 March, 2005 at 19:45

Paper/Poster: Total Search – Where Art Thou?
Over the last few months there has been a spike in the literature and tools addressing desktop search and its oxymoronic counterpart, ‘total search’. The current buzz is pervasive search and its oxymoronic counterpart, ‘ubiquitous search’.
But seriously, what exactly is the status quo of search? Have we made any progress towards the Memex vision? Or does that vision need to change in light of the way technology and user attitudes are evolving?
Srikant Jakilinki
IR Group, Department of Computing Science, University of Glasgow
Accepted for poster display/talk in PREP’05


Introduction
In this paper, we describe the requirements analysis and architectural design of a total-search (TS) system called “MemNet”. As part of this process we tried and tested a number of existing TS systems (also called desktop search systems) and identified issues that effectively render the term TS an oxymoron. We designed “MemNet” by taking into account these inadequacies, the reasons for their presence, and the changes already visible in how users deal with information today. We briefly describe our findings, justifications and ideas in the current implementation of “MemNet”, which is an incremental system: it uses the API of an existing total-search tool and plugs in intelligence on top of it.

The TS Landscape
In the last 18 months or so, the landscape of TS has changed quite drastically. Whereas before users had to rely on slow operating-system find/locate services or on expensive tools to search for files, today there is a wealth of free and commercial TS tools[1] to choose from (mainly from rival search companies competing on the desktop), and many more are in the pipeline. However, it is striking that all such TS tools have taken a crude, brute-force retrieval approach: keyword-driven queries and a results list ordered by time. Striking, because earlier commercial and research systems have already shown such attempts to be inadequate. The TS problem has to be handled differently, as expectations and usage are markedly different from those of a keyword IR system. We note that with these tools users are today better equipped to find things on their burgeoning systems, provided they remember the keywords in the files they forgot. That is a paradox, but it is also a small step in the right direction.

Inadequacies
Each of the tools we reviewed has its own advantages and disadvantages on the four axes of speed, size, status and style. At the end of the analysis there is no clear winner, and consumer choice is never a bad thing. However, we have identified six main problems, each concerning a capability that is highly desired but was found lacking irrespective of the tool –
1) Multimedia Retrieval: Little progress has been made in this field and most systems rely on filenames alone. ASR, OCR and CBIR are nowhere on the horizon. TV, SMS and telephone integration (among others) are not even on the blue-sky agenda of these systems
2) Availability Problem: Some media are inherently volatile, like DVDs, music players, subscribed access, streams, radio, network disks etc. They are not supported at present unless the tools are tweaked heavily
3) Information Distribution: A typical user’s information is distributed over several devices (like PDAs) and services (web-based email) as per a ‘cloud storage’ paradigm. This also raises issues of privacy and security
4) Application/Format Locking: Even though these tools only handle text, they still cannot provide access to all the file formats in vogue today. Furthermore, applications (like email clients) lock up their information, making it unavailable
5) State/Context Loss: While users interact with information through myriad applications, there is no framework to capture what exactly is happening at the metacognitive and task level. All this richness is being lost
6) Relevance Ranking: In spite of having much information in their indexes, these tools rank by reverse chronology by default, which does not do justice to the value of the index. We also note that the index itself is relatively unused for interesting tasks
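
To make problem 6 concrete, the following is a minimal sketch, not taken from any of the reviewed tools, that contrasts the reverse-chronological default with even a very simple content-based ordering. The `Doc` type, the `modified_day` field and the term-frequency score are all illustrative assumptions; a real system would use a proper index and a richer relevance model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Doc:
    name: str
    modified_day: int  # hypothetical: day number of the last modification
    text: str


def reverse_chronological(docs, query):
    """Return matching documents newest first (the default criticised above)."""
    terms = query.lower().split()
    hits = [d for d in docs if any(t in d.text.lower().split() for t in terms)]
    return sorted(hits, key=lambda d: d.modified_day, reverse=True)


def by_term_frequency(docs, query):
    """Return matching documents ordered by how often the query terms occur."""
    terms = query.lower().split()
    scored = []
    for d in docs:
        counts = Counter(d.text.lower().split())
        score = sum(counts[t] for t in terms)
        if score > 0:
            scored.append((score, d))
    # Highest score first; ties fall back to the more recent document.
    scored.sort(key=lambda pair: (pair[0], pair[1].modified_day), reverse=True)
    return [d for _, d in scored]


# Toy collection, invented for illustration only.
DOCS = [
    Doc("notes.txt", 10, "memex memex vision memex search"),
    Doc("todo.txt", 30, "buy milk and read memex paper"),
    Doc("draft.txt", 20, "search search tools"),
]

if __name__ == "__main__":
    print([d.name for d in reverse_chronological(DOCS, "memex")])
    print([d.name for d in by_term_frequency(DOCS, "memex")])
```

On this toy collection the two orderings disagree: the recency-based list puts the shopping note first simply because it is newer, while the content-based list puts the document that is actually about the query first.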

Enter MemNet
MemNet is a network/graph of documents using a discretized, update-per-day, retain-state/change approach. It is a verbatim translation of our ‘digital pond’ metaphor, viz. “all documents are alive, interacting as ripples in a pond, intertwingled and emergently related in a small world” –

ripples
Fig 1. Science-Art impression of “Life as Digital Pond”. Each ripple is a document affecting every other. The diagram is a Moiré pattern, which is itself interesting in the present context

MemNet is a very minimal implementation of the “Strands Theory” and “Lifesearch Platform”[2] that the authors are proposing to the M4L[3] DTI project. Using the API of a total-search system, we have designed a plug-in which adds an intelligent layer on top of the index so accrued. By simple reasoning on this index, and by matching it with the clues captured from interactions with applications, we build a simple graph and hypothesize that MemNet will exhibit emergent and small-world properties. The design overview of MemNet is shown below –

memnet-dashboard-framework
Fig 2. MemNet follows the basic Cluepacket architecture of the Dashboard/Beagle project[4], but with a very simple parasitical IPC mechanism and heavy post-processing.
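
The small-world hypothesis can be checked on any document graph with two standard measures: the average clustering coefficient and the average shortest-path length. A small-world graph combines high clustering with short paths, compared with a random graph of the same size. The sketch below is an illustration only; the adjacency-set representation and the toy `GRAPH` are assumptions, not MemNet's actual data structures.

```python
from collections import deque


def clustering_coefficient(graph):
    """Average local clustering coefficient of an undirected graph.

    `graph` maps each node to the set of its neighbours.
    """
    total = 0.0
    for neighbours in graph.values():
        k = len(neighbours)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        # Count the links that exist among this node's neighbours.
        links = sum(
            1 for a in neighbours for b in neighbours if a < b and b in graph[a]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def average_path_length(graph):
    """Mean shortest-path length over all reachable ordered pairs (via BFS)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy document graph, invented for illustration: a triangle plus one pendant.
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

if __name__ == "__main__":
    print(clustering_coefficient(GRAPH))
    print(average_path_length(GRAPH))
```

To judge whether a real graph is small-world, these two numbers would be compared against the same measures on a random graph with the same number of nodes and edges.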

It may appear that MemNet leaves the bulk of the processing to the applications, and this is partially true. But the application dependence is only virtual. MemNet works with application-neutral C/A/M (create/access/modify) timestamps and uses the principles of “resonance” and “co-activation” to update itself. By quantizing activity, MemNet is in a position to reason about any document which has been “touched” by the user at any point, by analyzing the traces the document has left over its life –

memnet-document-traces
Fig 3. Document Traces help MemNet reason about the actual utility of a document vis-a-vis the user
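
One possible reading of “co-activation” over quantized C/A/M timestamps is sketched below: timestamps are bucketed into days, and two documents are linked whenever they were touched on the same day, with the edge weight counting the shared days. This is an assumption made for illustration, not the MemNet implementation; the function names and the `TRACES` data are hypothetical. In practice the timestamps could be read with Python's `os.stat` (`st_ctime`, `st_atime`, `st_mtime`), bearing in mind that on Unix `st_ctime` records the last metadata change rather than creation.

```python
from collections import defaultdict
from itertools import combinations

SECONDS_PER_DAY = 86_400


def active_days(timestamps):
    """Quantize raw C/A/M timestamps (in seconds) into a set of day numbers."""
    return {int(ts // SECONDS_PER_DAY) for ts in timestamps}


def co_activation_graph(traces):
    """Link documents touched on the same day; weight = number of shared days.

    `traces` maps a document name to a list of its C/A/M timestamps.
    Returns a dict mapping sorted (doc_a, doc_b) pairs to an integer weight.
    """
    by_day = defaultdict(set)
    for doc, stamps in traces.items():
        for day in active_days(stamps):
            by_day[day].add(doc)

    weights = defaultdict(int)
    for docs in by_day.values():
        for a, b in combinations(sorted(docs), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical traces, in seconds since an arbitrary epoch.
TRACES = {
    "paper.tex": [0, 90_000, 180_000],  # touched on days 0, 1 and 2
    "refs.bib": [100, 95_000],          # touched on days 0 and 1
    "holiday.jpg": [200_000],           # touched on day 2
}

if __name__ == "__main__":
    for pair, weight in sorted(co_activation_graph(TRACES).items()):
        print(pair, weight)
```

Here `paper.tex` and `refs.bib` end up strongly linked because they were active on two common days, without any knowledge of which application touched them, which is the sense in which the application dependence is only virtual.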

MemNet tries to capture the interactions of users with their documents, and we believe that this paradigm will solve most, if not all, of the issues identified in existing systems and research efforts. At present, MemNet is in the middle of implementation and we are working towards demonstrating it at PREP’2005.

Conclusion
We reviewed total-search systems and identified many inadequacies, and found that capturing rich application interactions could be critical to overcoming them. MemNet is designed to address these problems.

Main References
[1] Systems Reviewed –
Google, MSN, Yahoo, Ask, SIS, Copernic, Enfish, Blinkx, HotBot, Beagle, Spotlight, DocSearcher, Chandler, Haystack, Xanadu, Scopeware, Forget-me-Not, X1, SoomSoom etc.

[2] Strands Architecture and Lifesearch Platform –
Closed direct access only. Please contact for further details

[3] Memories for Life – Computing Science Grand Challenge and a European DTI Project –
http://www.csd.abdn.ac.uk/~ereiter/memories.html

[4] GNOME Beagle Project –
http://www.planetbeagle.org


Key words to describe this work:
Personal, Information, Retrieval, Life, Search, Review, Distributed, Architecture, Network, Emergence, Cluepackets
Key Results:
Existing ‘total search’ (TS) tools are reviewed and compared, both with one another and in the context of past attempts, identifying issues that are then reflected in the design of a new architecture
How does the work advance the state-of-the-art?
Recommends a light, open, application-conversing architecture to solve the most challenging problems that present research and commercial TS tools have not yet been able to address
Motivation (Problems addressed):
Memories for life. The possibility of storing every piece of data in an individual’s life, previously largely unknown, has clearly shown the inadequacy of available total-search tools and paradigms for managing, archiving and retrieving such huge amounts of data, and has ushered in an express need to explore fresh ways of thinking and novel architectures
