Searching for the next Google
The Internet search industry woke from its slumber last week when Majestic 12, an experimental British search engine, smashed the one billion page mark.
You may have thought that innovative searching was over. That Google had sewn up the market. After all, according to Internet marketers Nielsen/Netratings, Google corners almost 50 per cent of all searches.
But not so, says one Birmingham-based programmer. Alex Chudnovsky, a Russian national and regular of the Contractor UK bulletin boards, has started an Internet search company that harnesses the power of distributed computing.
Like many great inventions, Chudnovsky's innovation stems from necessity. "Indexing a billion pages is a huge barrier. Bandwidth is the bottleneck," he says, explaining how entrants to the search engine market are faced with huge bills for hardware and data pipes.
His alternative is to employ idling computers while their participating owners are busy doing something else. It's an example of distributed computing; contributors download a small program – from www.majestic12.co.uk – which runs in the background, crawling the Internet and sending back results in a massively compressed format, neatly sidestepping the bandwidth issue.
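As a rough illustration of the idea, the sketch below shows how a participant's machine might boil each fetched page down to a small record and compress a batch of records before uploading it. It is not Majestic 12's actual client or wire format: the function names, the record fields and the choice of JSON plus zlib are all assumptions made for the example.

```python
import json
import zlib


def summarise_page(url: str, html: str) -> dict:
    """Reduce a fetched page to a small record (hypothetical fields)."""
    return {
        "url": url,
        "word_count": len(html.split()),
        "bytes": len(html.encode("utf-8")),
    }


def pack_results(records: list) -> bytes:
    """Serialise a batch of records and compress it for upload."""
    payload = json.dumps(records, separators=(",", ":")).encode("utf-8")
    return zlib.compress(payload, 9)  # 9 = strongest compression level


def unpack_results(blob: bytes) -> list:
    """Reverse pack_results on the receiving side."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    # Assumed sample data: 100 near-identical pages on a placeholder host.
    records = [
        summarise_page(f"http://example.com/{i}", "<html>hello world</html>")
        for i in range(100)
    ]
    blob = pack_results(records)
    raw_size = len(json.dumps(records).encode("utf-8"))
    print(f"raw: {raw_size} bytes, compressed: {len(blob)} bytes")
```

The saving comes from two places: the worker sends back a summary rather than the page itself, and a batch of similar records compresses well.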
It would take a single PC years to index a billion pages. With just 70 participants, Chudnovsky has created what is, to his knowledge, the largest index of web pages owned by a UK-founded company.
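The scale of that claim can be checked with some back-of-the-envelope arithmetic. The sketch below is only that: the crawl rate of 10 pages per second per machine is an assumed figure, not one supplied by Chudnovsky, and the calculation assumes every machine runs continuously at the same rate with no coordination overhead.

```python
SECONDS_PER_DAY = 86_400


def crawl_days(pages: int, pages_per_second: float, machines: int) -> float:
    """Days needed to crawl `pages` when the work is split evenly."""
    total_seconds = pages / (pages_per_second * machines)
    return total_seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    billion = 1_000_000_000
    assumed_rate = 10.0  # pages per second per machine (assumption)
    print(f"1 machine:   {crawl_days(billion, assumed_rate, 1):.0f} days")
    print(f"70 machines: {crawl_days(billion, assumed_rate, 70):.1f} days")
```

Under those assumptions a lone PC needs a little over three years, while 70 machines sharing the load finish in under three weeks.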
Not only does the distribution of web-crawling allow a small player to join the big boys' game, it also enables individuals to tailor the websites included in the crawl. This advantage is not lost on non-English speakers fed up with the US/English prejudice in current web indexes, and Chudnovsky believes it may lead to a niche status for his search engine.
In addition, users are able to define their own ranking formula – the set of rules that determines in which order results are displayed. "Relevance is a big problem: there can be millions of matches for a search query, and only the top 10 are shown," he says.
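To make the notion of a user-defined ranking formula concrete, here is a minimal sketch in which the formula is simply a function that scores each match, and the engine shows the ten highest-scoring results. The match fields ("term_hits", "inbound_links") and the weights are hypothetical; the article does not describe how Majestic 12 lets users express their formulas.

```python
from typing import Callable, Dict, List

Match = Dict[str, object]  # hypothetical fields: url, term_hits, inbound_links


def rank(matches: List[Match], formula: Callable[[Match], float],
         limit: int = 10) -> List[Match]:
    """Order matches by the user's formula, best first, and keep the top few."""
    return sorted(matches, key=formula, reverse=True)[:limit]


def favour_links(match: Match) -> float:
    """Example formula: pages with many inbound links rise to the top."""
    return match["term_hits"] + 2.0 * match["inbound_links"]


def favour_text(match: Match) -> float:
    """Example formula: pages that mention the query most rise to the top."""
    return 3.0 * match["term_hits"] + match["inbound_links"]


if __name__ == "__main__":
    matches = [
        {"url": "a", "term_hits": 5, "inbound_links": 1},
        {"url": "b", "term_hits": 1, "inbound_links": 4},
    ]
    print([m["url"] for m in rank(matches, favour_links)])
    print([m["url"] for m in rank(matches, favour_text)])
```

The same set of matches comes back in a different order depending on which formula the user supplies, which is the point of letting users choose their own.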
Chudnovsky has no end of ideas and is understandably enthusiastic about his work, having devoted himself to it full-time for over 18 months after resigning from his previous position. "A search engine is the last thing I could sacrifice," he says. "People don't realise it is such a luxury that was not even present a few years ago. It's amazing."
His amazement is what drives him, and the presence of Google doesn't cause any concerns, only a vision of possibilities. "How many products to do with search have Google released recently?" he asks. "Have they stopped? Is Google perfect?"
Virtual skies are the limit, and Chudnovsky seems destined to discover just how big the Internet is by indexing the whole lot: "With just 10,000 users, we can be bigger than Google," he says. In a distributed environment this figure is by no means beyond reach. The BBC climate change experiment has clocked up nearly 60,000 participants in the UK alone.
But participants are not contributing much in the way of hard currency, and to date, Chudnovsky has relied on self-funding, a status he is keen to change.
He will shortly release an enterprise search engine based on the technology developed for the distributed search, and he is in no doubt as to the quality of such an enterprise search tool. Making a product run successfully on the Internet requires bullet-proof code because the test data is so vast, and so full of junk, that errors in the software will be found out. By comparison, an enterprise presents a relatively coherent set of data.
Testing has also been helped by the CUK community, and Chudnovsky stresses his appreciation.
"They were the first people to join in," he says. "They helped me big time with the project when it started. They took it through the hardest stage."
"Truth is born in argument," he adds, alluding to the combative nature of some discussions on the bulletin boards.
But given there is no sexy screen saver, like the one that comes with the BBC's climate change experiment, what do participants in the distributed search gain for donating their computer time?
Easy: a chance to be involved at the beginning of a technically fascinating project; the opportunity to sit at the same table as Google; and, perhaps, even to steal a delicious slice of the Internet giant's favourite pie.