the What?

I have never heard the Who’s music; and scientists have always wondered about the Why’s; Science mostly explains the How’s; Today’s question is the What.

At first glance, the What appears to be tame compared to its illustrious peers. While beginning Jaynes’s tome on probability, I came across this quote said around 1948 and attributed to von Neumann – “For those who believe that computers cannot do all that we humans can; please explain in finite, precise steps, what it is that you can do; I will make the computer do precisely that.”

This is precisely the importance of the What. What is it that we are doing here? What is it that we want? What is it that happens when we breathe? What is emotion? What is thought? What is pride? What is light? What is everything….

A what can mostly be answered by enumerating all the members of the answer set, and hopefully some abstraction will come out of it so that the next time, we use the abstraction instead of enumeration. But the first time, or till the time the abstraction comes out, we are stuck with the enumeration of all members of the answer set. Here is precisely where things might go out of hand. We might not be able to enumerate all that we think contribute to the answer to a single What. There might be just too many of them.

Let me elaborate that with an example. If someone asks me – What is poetry? – I can either come up with an abstract answer which encapsulates all forms of poetry, all poems ever written, all poems that will be written in the future, all poems’ purposes, all this, for all languages. The abstract encapsulation of these needs to be simple, concise, and should give a precise description of poetry.
Or I can simply enumerate ALL possible poems there are, all of them, in a set, and give the enumeration set as the answer to “What is poetry?”.

It seems that for all the NP-complete problems, we are stuck with the enumeration till the abstraction for all enumerations is found. We don’t even know if such an abstraction exists. This is the famous P vs NP question, and well, there is a chance that this question itself is beyond answer.

My next post will elaborate on my current thoughts on the relevance of the P vs NP question in the eternal human-vs-machine debate.

ps – while writing about that poetry example, I remembered that there was something called Information Complexity which I had read about somewhere, and here it is. It goes by the name of Kolmogorov Complexity.
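To make the idea a little more concrete: Kolmogorov complexity is the length of the shortest description of an object, and it cannot be computed in general. A very rough stand-in, assumed here purely for illustration, is the size of the data after a general-purpose compressor has had a go at it; that size is only a crude upper-bound proxy. The helper name `compressed_size` and the two sample byte strings are made up for this sketch.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (a crude proxy, not the true complexity)."""
    return len(zlib.compress(data, 9))


# 1000 bytes with an obvious short description: "repeat 'ab' 500 times".
regular = b"ab" * 500

# 1000 pseudo-random bytes; the fixed seed keeps the run repeatable.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(1000))

print(compressed_size(regular), compressed_size(noisy))
```

The repetitive string shrinks to a handful of bytes, while the noisy one does not shrink at all. Note the catch: the "noisy" bytes come from a seeded generator, so they do have a short description (the generator plus the seed) that zlib simply cannot find. That gap is exactly why compression is only a proxy.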

What started out as a blog….

ended up being this. Do check out Cosma Shalizi’s Notebooks. What amazes me is the amount of reading a person can do. And the diverse range of reading interests anyone can hold. This is by far the widest range of reading that I have seen anyone accomplish.

I wonder where time passes us by. Passed me by. I need to rework my schedule.

In other news, Aishwarya Rai has been asked to present an Oscar. Why her? And why this? Of course, she is beautiful, artsy, speaks good English, hazel eyes, brown hair, high cheek bones, the works. First Cannes, and now the Oscars. I guess she is just what the media wants; here, as well as in the west. The clueless reader from Bangalore will be proud about how India has finally arrived on the World Stage, fold her Bangalore Times, finish her donut, exit a Starbucks-like coffee shop, don her headphones, and become Melanie again.

I am really not sure what I am lashing out against, but somewhere, it hurts. It also feels bad cuz the hurt is not fully backed by logic. And the hurt is not strong enough to change the way I am. I can’t find a consistent set of theories regarding why things are happening the way they are. I can’t reconcile my own desire for pizzas with my empathy for the Tsunami victims. I hate it that I take refuge in the fact that Gandhi used to be a fashionable dandy during his London years, and only became the Gandhi we know after his stint in South Africa. It’s a screwed mix of thoughts, and needs a lot of figuring out before real plans of action are charted out. Of course, I can’t wait for all that…..I need some food now, and that’s where Pizza Hut comes in. I am waiting for the sensualist in me to die….

Manufacture of Consent?

Currently a very heated discussion is on in the IITB.general newsgroup here at IIT-Bombay. The topic pertains to the lack of academic and research interest among undergraduate (UG) students (as compared to postgraduate (PG) students), and their growing interest in extra-curricular activities. Replying to the “IIT-B.Tech-praised-by-press-across-the-world” comment by some UG, one of the professors replied that both the Indian and the US presses are biased and self serving, or serving some kind of big-brother of sorts. These presses want to glorify BTech education in IITs, and not bother about MTech and PhDs, and esp. any form of research conducted in IISc and IITs. Why is it that we never hear about IISc in the Indian Media? Is it cuz it is sub-standard? Think again!

I have put the entire text of the professor’s IIT newsgroup posting in this blog entry. Make sure you read the last section about Indian Media, and self-censorship. The posting was titled “Mysterious sedition on part of Indian Press,” and here is the full text.

After reading that, followed by summaries and excerpts of Manufacturing Consent, I was put into some deep thought regarding what I personally think is cool in this world. Here are a few questions that I asked myself:

– Why do Star Movies/HBO rarely play non-American movies (with English subtitles) ?

– Same applies to TV series on AXN/StarWorld/etc

– Why do I know the Fifth Amendment of the US constitution? (courtesy TV series – The Practice)

– Why do I care about Mardi Gras?

– Why am I charmed by NYC so much?

– Why do I understand the American accent better than I understand, say British?

– Whatever happened to Varsha Bhosle and Rajeev Sreenivasan and other columnists of that ilk on

– How come European sites never get listed as frequently on Google’s search results as American sites? Trust me, this is more common than you’d notice, and you wouldn’t notice it probably cuz of what you are reading right now.

– etc. etc.

I do not deny the non-American influence on me, and of course, the native Indian/Kannada/Bangalore/Family influence on me is perhaps the strongest; but the proactive pro-American influence on my thinking, actions, personality and what I spread as my sphere of influence on others, is incredibly high, done subtly and is very effective – startling!!

I really need to rework myself. Never knew that I, as an individual, had a ghost of an elder brother.

The Jackfruit Letter

The Indian Railways is so fascinating. Mesmerising legacy, mindboggling scale, unbelievable efficiency and of course, the romance of a train journey. My earliest memory of train fascination is that of Gopi: a very close friend from high school days. He used to rattle off the starting and ending stations of _all_ the trains named in the Railway Timetable. Believe me, I used to actually quiz him with that book.

There is this very cool urban legend about Gopi that made its rounds in our high school corridors (I think it’s due to Dhruva, but he might claim innocence, like always ;-> ). Legend has it that when some friends went to Gopi’s place to call him for a customary game of cricket, Gopi and his brother were all dressed up and ready to go somewhere. They were pestering their father about how they would be late for some train and would miss it and all. The unsuspecting friends assumed that their pal was going out of station or something, and decided not to include him in the next day’s game. But it so turned out later (much to their amazement) that the party was going to the Bangalore City Railway Station not to travel somewhere, but to check out some train that would halt there for some half an hour, and had some “cool” technical specification!!! Whoa, now, that is a true enthusiast, considering that we were around 12 at that time, and his brother was barely 10. Imagine being 10 and being interested in Railway Gauges. Anyways, that’s just the legend. I wonder if Gopi still has his passion for Indian Rail.

Of all the longest-fastest-oldest of the Indian Railways, I find Bangalore only once; The Bangalore Mail has supposedly been running since 1864! If you did visit that page, do check out the listing of speciality food items in specific stations. I am pleased to see the very delicious Maddur Vade listed (and not pathetically called “vada”, but sticking to the Kannada “vade”). I also feel quite sad that I missed out on all the Lonavala and Khandala station goodies during my train rides between Mumbai and Bangalore on the very normal and quaint and non-famous Lokamanya Tilak Express. Will make it a point to eat the Chikis and the Burfis the next time around. Gosh, in retrospect, flights are somewhat boring.

As for the title of this posting, it’s worth checking out the Jackfruit Letter. Quite funny.

Anchor Text and Focused Crawling

It’s been a while since I have blogged anything technical.

These days, I am working on the open source search engine, Nutch. Before I get into what I am doing, let me explain why, in the last sentence, I put the phrase “open source search engine” as a part of the href tag. Search engines use anchor text extensively to figure out what a page is about. For example, the home page of Tejaswi doesn’t have the phrase “home page” anywhere. So, by looking at the anchor text of all the in-links to a page, the search engine figures out what the content of the page might be about. This is a latent way of identifying the content of a page: by looking at what in-links call it. Now, when I say “the open source search engine Nutch” in the anchor text and link to, that phrase gets associated with the site, and helps someone who is searching for an open source search engine but has no clue about Nutch itself.
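Here is a minimal sketch of that idea, not how Nutch or any real search engine does it: collect the anchor text of every link, file it under the page the link points to, and answer queries from that index alone. The class and function names, the two sample pages and the example.org URLs are all made up for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # normalise whitespace
            self.links.append((self._href, text))
            self._href = None


def build_anchor_index(pages):
    """Map each target URL to the anchor texts of the links pointing at it."""
    index = defaultdict(list)
    for html in pages:
        parser = AnchorCollector()
        parser.feed(html)
        parser.close()
        for href, text in parser.links:
            if text:
                index[href].append(text.lower())
    return index


def search(index, query):
    """Return URLs whose in-link anchor text contains every query word."""
    words = query.lower().split()
    return sorted(
        url for url, texts in index.items()
        if all(w in " ".join(texts).split() for w in words)
    )


# Hypothetical pages; note that the targets themselves are never fetched.
pages = [
    '<p>I work on the <a href="http://example.org/nutch">open source '
    'search engine Nutch</a>.</p>',
    '<p>See the <a href="http://example.org/tejaswi">home page</a> of Tejaswi.</p>',
]
index = build_anchor_index(pages)
print(search(index, "open source search engine"))
print(search(index, "home page"))
```

The point of the toy is that the target pages are never read: a query for "home page" finds the second URL purely because somebody else called it that.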

Currently, I am working on the crawler part of the search engine. The crawler/spider is an offline process that goes all over the web and gets pages for the search engine to index. The idea is to start the crawler with a set of seed pages. The crawler then starts indexing the textual content of each page, and recursively crawls each page’s out-links. This goes on ad infinitum. This part is pretty standard, and is already implemented. My job is to ensure that the crawl is not ad-hoc, i.e. not all out-links are crawled. I am trying to “focus” the crawl so that only pages pertinent to certain topics get crawled, and subsequently indexed. Topics like “cycling”, “art cinema”, “photography”, “BDSM” etc. Why do we need to focus a crawl?

Google currently claims that it indexes 8 billion webpages. According to recent estimates, un-indexed pages outnumber indexed pages by a factor of 4-5. This means that there are at least 32 billion pages out there that Google can index, but is not indexing. Why not? Well, for one, more pages don’t necessarily mean better search results. A good number of pages representing a broad range of topics means better search results. This is where a focused crawl might be preferred over an ad-hoc crawl. If you are really interested, take a look at my advisor’s Focused Crawling page for more information.
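The difference between an ad-hoc crawl and a focused one can be sketched on a toy, in-memory "web" (no network involved). This is only an illustration under loud assumptions: the pages, the URLs and the keyword-overlap `relevance` score are all made up, and a real focused crawler would typically use a trained topic classifier rather than counting keywords.

```python
from collections import deque

# A toy "web": URL -> (page text, out-links). Every entry is invented.
TOY_WEB = {
    "seed": ("cycling news and bike reviews", ["bikes", "recipes"]),
    "bikes": ("road bike cycling gear", ["tours"]),
    "recipes": ("pasta and pizza recipes", ["desserts"]),
    "tours": ("cycling tours in the hills", []),
    "desserts": ("cakes and sweets", []),
}


def relevance(text, topic_words):
    """Fraction of the topic words that occur in the page text."""
    words = set(text.lower().split())
    return sum(w in words for w in topic_words) / len(topic_words)


def focused_crawl(web, seeds, topic_words, threshold=0.5):
    """Breadth-first crawl that indexes, and expands, only on-topic pages."""
    queue = deque(seeds)
    seen = set(seeds)
    indexed = []
    while queue:
        url = queue.popleft()
        if url not in web:
            continue                      # dead link in the toy web
        text, out_links = web[url]
        if relevance(text, topic_words) < threshold:
            continue                      # off-topic: neither index nor follow
        indexed.append(url)
        for link in out_links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed


crawled = focused_crawl(TOY_WEB, ["seed"], ["cycling", "bike"])
print(crawled)  # ['seed', 'bikes', 'tours']
```

The "recipes" page is fetched, judged off-topic and dropped, so "desserts" behind it is never even visited. That pruning is where the savings come from, and also the risk: an on-topic page reachable only through off-topic ones would be missed by this simple rule.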

In other news, read Jeremy Zawodny’s post on Mark Jen to know about the Google employee who got fired for blogging some company internals. All corporate bloggers out there….you reading this?


While browsing through someone’s website, I came across this thought provoking take on nostalgia –

“I’ve always viewed nostalgia as a heresy, but it becomes increasingly harder to fight it off as one grows older. Perhaps it is part of the mechanism we use to cope with regret: when enough patina accumulates, mistakes can be viewed as formative experiences, and switch from being sources of regret to being key moments that contributed to the development of one’s present self. Viewed in that light, nostalgia is a form of self-deception, which doesn’t make it any easier to accept.”

I pondered over it for a while: in the abstract, through the light of my own experiences, my own ideas on it. I conclude (at least for now) that for me, nostalgia is prevalent, but not important/controlling/mood-altering. As I write this, a nagging doubt that nostalgia is indeed mood altering, though not controlling, is creeping in. Why do I say this? As my mood….

I lost a few “intense-nostalgia-triggers” with my wallet a few days back. The first thought was of sadness at losing these nostalgia-triggers. And now, I can think of the sadness these triggers themselves used to bring about each time by making me take nostalgic trips down memory lane. Now, as I think of the lost triggers, I am struck that I now have memories of my memories. Time to move on.

H. W. Longfellow thought that Nostalgia is a feeling of sadness and longing that is not akin to pain, and resembles sorrow only as the mist resembles the rain. Poetic eh?

Blogger Help

I check around 4-5 different blogs a day, and all of them are on blogspot. But, I get kind of irritated each day because I have to remember all 5 of them, type in their URLs on the browser and check whether they have updated their blogs. I am wondering if blogspot has some feature similar to livejournal, where you can add people as your friends and read all their blog updates on a single page. If you want to know what I mean, visit . Anyone knows?

JEE, GMAT, CAT – Harnessing efforts

Here is my interpretation of one of Samba’s various bursts of inspiration. An abstract idea whose viability, logistics, implementation etc. need to be worked out.

We know that a lot of effort goes into the preparation for these competitive exams, out of which only the top 2% or so make it into IITs, IIMs etc. Out of say every 100 candidates that take each exam, 2 of them actually make it in, and so, in some sense, their efforts are not wasted. And I will assume that around 40 of them just took it up as a part of their regular path, and weren’t really serious about them. These numbers can be inaccurate; but bear with me.

The real wasted effort lies in those 50 of them who actually put in a lot of effort, and out of them, there are at least 10 who are almost as good as the 2 that get in, but just cannot make it in because the number of seats is limited. Now, the question is: can this effort be harnessed to do something good? Something profitable? Can the exam structure or the interview structure be changed to do this?

One concrete implementation of this idea. The top few percentile people from CAT are called in for a GD/PI stage, where they are put against each other, and are evaluated for managerial potential, stress, ability to think on their feet etc. One idea is to divide them into chunks, and assign them to random villages in the rural heartland of India. Their job is to stay in these villages for a week (along with an official from the IIMs), and involve themselves in some constructive activity. Call this a real case study as opposed to the arm-chair variety that gets done right now.

This suggestion is not all that preposterous for a few reasons:
– Candidates are quite serious about their admissions and an IIM admit is a great incentive.
– It’s more realistic than an arm-chair case study because real results can be evaluated, giving better candidates to these institutes.
– The candidates are smart and some real work that might be useful to these villages can be carried out.
– As CAT filtering has already been carried out, we have a more manageable number of people.

Of course, all these details, and the basic idea itself are up for debate, and that’s why the posting 🙂 The rationale behind this thinking is to somehow harness the efforts and the incentives that are a part of this big circus.

Any ideas? problems? possibilities?

Playing the Devil’s Advocate

My usual gang here at IIT, we hang out at Nishant’s room (H5, #76), be it after dinner, after lunch, just bored, just about anytime. And occasionally, we end up having disagreements. And today was one such. It was the usual Capitalism vs. Socialism debate that’s been haunting me for a while now. And this time, I tried playing the devil’s advocate and tried defending Capitalism; tried everything I had in my arsenal: globalization, trickle-down effect, jungle-culture, primal-instinct, and myriad other theories. When Amit was here a few weeks back, we had had the same argument through his entire stay in Bombay, and I tried to remember what he had used then, and tried in vain to use it now. But after an hour of heated discussions, I realized that it is incredibly hard to defend something you truly don’t believe in.

I salute this aspect of defence lawyers, esp. the ones who know whether their clients are guilty. I wonder how many of them go through conscience turmoils when their professional ethics force them to defend a guilty client and their personal morals abhor the same client.

Speaking of dilemmas, I watched and loved Swades. As Upperstall put it, it’s one of the few good big movies which has some social message. I will watch it again in a theatre.

Speaking of academics, I am working on the open source search engine, Nutch these days. I don’t know whether it will result in any solid contribution to the community, but I do hope that I get the required grades :)). Will get back to that now…..


Bombay is an experience; and esp. after living in a hamlet like Bangalore, Bombay hits me each time I venture out of the campus. And this weekend was especially severe. Two friends (Amit Rathore from ThoughtWorks and Akshay from IIM-Lucknow) were visiting, one for the exclusive purpose of “chilling out” and the other had some official work. But the three of us hit the yuppie circuit here, in full blast. And as Amit said before leaving, it was some severe shit. Restaurants, cafes, bars, beaches, local trains, never ending taxi drives, late late night chilly auto rides, malls, and all the other elements which every urban jungle has. But its sheer scale, and the way things are intertwined here; that is the difference. Amidst all our chaotic travelling and indulgence, there was a lot of talk on economics, India, Kannada, Bombay, mis-adventures with women, books and so on. Coffeehouse philosophy at its very best; three geeks, what else can you get?!

Among other things, I got my wallet picked during a local train ride. The irony of it was, this was right after an eye-opening RSS meeting/lunch on Republic Day. Done with blocking all my bank-cards, still have to get new ones, get duplicate identity card, driver’s licence, etc. But, more significantly, all my other personal belongings in the wallet are gone now, and all of them are beyond replacement. All of them, beyond replacement.

Bombay is a cauldron. Lot of heat, tons of volume. A single wallet?