SoundEx

A couple of days ago it was the 70th birthday of Donald Knuth (or rather, his webpage), who, as you may have found out, is a hero to many programmers. I think he’s still working on completing his masterwork, ‘The Art of Computer Programming’. Anyway, his birthday has not gone unnoticed in the blogosphere. It’s almost like everybody knows each other. No, really. Really.

That brings me to something (slightly) related: I was using DailyMotion to find older clips of the 80’s band Blondie, and the search results on that site always seem to include references to movies with blondes; some of them quite, let’s say, exquisite. I won’t link to any of them, but I encourage you to try it out. Obviously, DailyMotion is using a ‘soundex’ routine: an algorithm that indexes keywords by how they sound, based on (language-specific) rules. This works brilliantly for looking up people’s last names (or even first names), but not for searching for specific terms like the one I just mentioned.
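For the curious, here is a minimal sketch of the classic American Soundex rules in Python. To be clear, this is the textbook variant and not whatever DailyMotion actually runs (I have no idea what their search does internally); the names `CODES` and `soundex` are my own. Each word becomes its first letter plus three digits, where similar-sounding consonants share a digit and vowels are dropped.

```python
# Classic American Soundex: consonants that sound alike share a digit.
CODES = {}
for letters, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character Soundex code of `word` ("" if it has no A-Z letters)."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]                # the first letter is kept as-is
    prev = CODES.get(letters[0], "")   # its code still suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                   # H and W do not separate equal codes
        code = CODES.get(ch, "")       # vowels (and Y) have no code
        if code and code != prev:
            result += code
        prev = code                    # a vowel resets prev, so repeats count again
    return (result + "000")[:4]        # pad with zeros, cut to 4 characters


for name in ("Blondie", "Blond", "Blonde", "Blondes", "Bland"):
    print(name, soundex(name))         # all five print B453
```

Under these rules ‘Blondie’, ‘Blond’, ‘Blonde’, ‘Blondes’ and ‘Bland’ all collapse into the same key, B453, which is exactly the kind of over-matching I’m complaining about.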

Seriously: if I’m searching for ‘Blondie’, I’m expecting results for Blondie and not ‘Blond’, ‘Blonde’, ‘Blondes’, ‘Bland’ or ‘Britney Spears’. And definitely not ‘2 h0t bl0nd3s k1ss1ng 3ach 0ther’1.

1 Obviously, I used ‘some encryption’ there to ensure that your kids don’t end up on this kid-safe website when they google for ‘Blondie’. You’re welcome.