The What?

I have never heard the Who’s music; scientists have always wondered about the Why’s; science mostly explains the How’s. Today’s question is the What. At first glance, the What appears tame compared to its illustrious peers. While beginning Jaynes’s tome on probability, I came across this quote, attributed to von Neumann and dating from around 1948: “For those who believe that computers cannot do all that we humans can; please explain in finite, precise steps, what it is that you can do; I will make the computer do precisely that.” ...

Anchor Text and Focused Crawling

It's been a while since I have blogged anything technical. These days, I am working on the open source search engine Nutch. Before I get into what I am doing, let me explain why I made the phrase “open source search engine” in the previous sentence part of a link's anchor text. Search engines use anchor text extensively to figure out what a page is about. For example, the home page of Tejaswi doesn’t contain the phrase “home page” anywhere, yet it can still be found with that query because the links pointing to it use those words. By looking at the anchor text of all the in-links to a page, a search engine can infer what the page is about. This is a latent way of identifying a page's content: by looking at what its in-links call it. So when I put “the open source search engine Nutch” in the anchor text of a link to nutch.org, that phrase gets associated with the site, which helps someone who is searching for an open source search engine but has never heard of Nutch. ...
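To make the in-link idea concrete, here is a toy sketch of how anchor text could be attached to the pages it points to and then used for retrieval. It is not how Nutch implements this; real engines also weight anchors, discount links from the same site, and combine anchor evidence with the page's own text. The link records, the `tejaswi.example` URL, and the helper names `build_anchor_index` and `search` are all hypothetical, chosen for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical link records: (source page, anchor text, target URL).
links = [
    ("blog.example/post", "the open source search engine Nutch", "http://nutch.org"),
    ("other.example/list", "Nutch", "http://nutch.org"),
    ("friend.example", "home page of Tejaswi", "http://tejaswi.example"),
]

def tokenize(text):
    # Naive lowercase whitespace tokenizer; real engines do far more.
    return text.lower().split()

def build_anchor_index(links):
    """Map each target URL to a Counter of terms taken from its in-link anchor text."""
    index = defaultdict(Counter)
    for _source, anchor, target in links:
        index[target].update(tokenize(anchor))
    return index

def search(index, query):
    """Rank targets by how often the query terms occur in their in-link anchors."""
    terms = tokenize(query)
    scores = {url: sum(counts[t] for t in terms) for url, counts in index.items()}
    # Keep only pages with some anchor evidence, highest score first.
    return sorted((url for url, s in scores.items() if s > 0), key=lambda u: -scores[u])

index = build_anchor_index(links)
print(search(index, "open source search engine"))  # nutch.org matches via anchor text only
print(search(index, "home page"))                  # the home page matches without containing the phrase
```

Neither target page's own content is consulted here: the match for “open source search engine” comes entirely from what other pages call nutch.org, which is the latent signal the paragraph above describes.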