This blog covers my thoughts and observations about organizing, retrieving, and understanding the structure of the live web. I use the phrase “live web” to cover freshly published content, including blogs, news feeds, and actions in social networks.
author: Paul Ogilvie
I’m a Ph.D. candidate at the Language Technologies Institute at Carnegie Mellon University, studying how document structure can be used by search engines that serve human users as well as other natural language processing applications. I’ll be finishing my dissertation in that area soon.
Since September 2007, I have also been working full time as the Principal Scientist at mSpoke. mSpoke provides personalization, filtering, and metadata services; our product FeedHub demonstrates these technologies by filtering RSS feeds. My work at mSpoke has exposed me to a very broad set of information retrieval problems, including language identification, duplicate detection, large-scale text categorization, clustering, cluster labeling, filtering, and the use of implicit feedback.