
Serious Question for AtW



    Q. How many pages have you indexed now?

    If distributed processing can search for aliens, why not web pages?

    Inspired by projects that harness spare time on PCs, one programmer wants to hand back control of internet searching to users

    Michael Pollitt
    Thursday March 23, 2006
    The Guardian


    Are you worried about Google's growing dominance of the internet search market? Alex Chudnovsky certainly is. To develop a community-led alternative, the Birmingham-based Russian programmer is building a new type of search engine. By harnessing the power of distributed computing, he's already managed to build an index that covers 1bn web pages.
    He has called his venture Majestic-12, possibly a reference to the alleged secret committee formed after the 1947 Roswell UFO incident. His passion for technically challenging work stems from worries about Google's iron grip on the market, its tight control of search results, and even whether some sites are indexed at all.

    "Because of their success, they have effectively created a monopoly in the virtual world. Monopolies never end up well for consumers," says Chudnovsky, who has developed search engines and other software for leading UK retailers. "I want to build the biggest UK search index."

    He has a challenge. The market research firm Nielsen/NetRatings says Google has a UK market share of more than 60%, with Yahoo and MSN trailing by a substantial margin. "To many, search and Google are synonymous. Its dominance increases bit by bit each month," says a spokesman.

    Chudnovsky, though, wants to use the technique that has worked so well - at least for recruitment - for the Search for Extraterrestrial Intelligence (Seti), cancer searches, climate change modelling and, most recently, cracking a set of encrypted messages sent from a submarine during the second world war. Distributed computing lets many participants do little bits of work to create a huge result, using spare time on their computers.

    Big shoes to fill

    Chudnovsky has a huge result to follow. Google stopped publicising the size of its search index when it reached 8bn pages four months ago. "We maintain the largest collection of documents searchable on the web," says a Google spokeswoman. "We estimate this expanded search index to be more than three times as large as any other search engine. We update the entire index about once a month, and some areas more frequently."

    In his latest book "An Introduction to Search Engines and Web Navigation" Mark Levene, the professor of computer science at Birkbeck College, says Google has more than 15,000 servers and "crawls" - examines for indexing - 3,000 URLs per second. (Other estimates have ranged from 31,000 to 79,000 servers.)

    Your home PC is clearly no match. For example, you cannot crawl more than 1m pages a day on a 2Mbps broadband connection. It will take you 8,000 days (about 22 years) to acquire a Google-sized but hopelessly out-of-date index.

    The solution? Recruit like-minded people who donate computer time, as they do for Seti@home and other projects. "Google's database is about 8bn pages, so fewer than 10,000 people taking part in this project can recrawl the whole of Google's database every single day," says Chudnovsky.
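The figures quoted above can be checked with a few lines of arithmetic. This is only a rough sketch: the constants come straight from the article (1m pages a day per home connection, an 8bn-page index), and the function names are made up for illustration.

```python
# Figures taken from the article; nothing here is measured.
PAGES_PER_PC_PER_DAY = 1_000_000      # stated ceiling for one 2Mbps connection
GOOGLE_INDEX_PAGES = 8_000_000_000    # last index size Google publicised


def days_to_crawl(total_pages, pages_per_day):
    """Days one crawler needs to fetch total_pages at a fixed daily rate."""
    return total_pages / pages_per_day


def crawlers_needed(total_pages, pages_per_crawler_per_day):
    """Crawlers needed to fetch total_pages in a single day (rounded up)."""
    return -(-total_pages // pages_per_crawler_per_day)  # ceiling division


solo_days = days_to_crawl(GOOGLE_INDEX_PAGES, PAGES_PER_PC_PER_DAY)
print(f"One PC: {solo_days:.0f} days, about {solo_days / 365:.0f} years")
print(f"Crawlers for a daily recrawl: "
      f"{crawlers_needed(GOOGLE_INDEX_PAGES, PAGES_PER_PC_PER_DAY)}")
```

At the stated rate a daily recrawl needs 8,000 participants running flat out, which is consistent with the "fewer than 10,000 people" quoted above.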

    A large-scale distributed crawling project has been attempted and involved thousands at its peak. Danny Sullivan, the editor-in-chief of Search Engine Watch, points to Looksmart's Grub project (http://grub.looksmart.com) of 2003, which is no longer operational.

    Majestic-12's volunteers - 60 so far - are crawling about 50m pages a day using unlimited broadband connections and software that runs in the background. Over the past few months, 7bn pages have been crawled although, at 1bn pages, the completed index lags behind for now. This is stored centrally to enable the Majestic-12 distributed search engine (via majestic12.co.uk) to return fast, relevant results.

    "Ideally, I'd like to distribute the search index," says Chudnovsky. This is a challenging proposition that would see duplicate chunks of a huge index distributed between broadband-connected PCs. There are also parallels with peer-to-peer systems such as Gnutella, which share music, films and software. A small-scale experiment with one country, perhaps Finland, may happen later this year.
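The article gives no detail on how duplicate chunks of the index would be spread across PCs, so the following is purely an illustrative sketch and not Majestic-12's design. It assumes a simple round-robin placement with a hypothetical replication factor; `assign_chunks` and `peers_for_lookup` are invented names.

```python
def assign_chunks(num_chunks, peers, replicas=2):
    """Place each index chunk on `replicas` distinct peers, round-robin.

    Hypothetical scheme: chunk i goes to peers i, i+1, ... (wrapping around).
    """
    if replicas > len(peers):
        raise ValueError("need at least as many peers as replicas")
    placement = {}
    for chunk in range(num_chunks):
        placement[chunk] = [
            peers[(chunk + r) % len(peers)] for r in range(replicas)
        ]
    return placement


def peers_for_lookup(placement, chunk, offline):
    """Return the peers still able to serve a chunk when some are offline."""
    return [p for p in placement[chunk] if p not in offline]


placement = assign_chunks(6, ["pc-a", "pc-b", "pc-c"], replicas=2)
# Chunk 0 stays searchable through its second copy when pc-a drops out.
print(peers_for_lookup(placement, 0, offline={"pc-a"}))
```

Keeping more than one copy of every chunk is what lets a search still complete when a home PC is switched off, at the cost of storing the index several times over.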

    Professor Jon Crowcroft of Cambridge University says this type of collaborative web crawling and indexing is very reasonable. "Many search engines do this to reduce the traffic load returning to a single central site - distributing the index itself is OK, so long as you have an efficient mechanism to search the index."

    These efforts also interest Professor Levene. "I hope the project succeeds. People finding novel ways of doing crawling or search is good for the competition," he says. Should Google, Yahoo, and MSN be worried? "It would be hard to push Google out of the way - they're just going to buy you out."

    Chudnovsky's aspirations are more community-minded, helping to develop a search engine that users control. Nevertheless, his innovative code might revitalise searches on corporate websites or, more controversially, assist with search engine optimisation. But as video, images and music are added to burgeoning search engine indices, crawling and search tasks will need to become more distributed.
    What happens in General, stays in General.
    You know what they say about assumptions!

    #2
    You hit the national news, congrats AtW



    Oh you hit it a while ago (only just seen date), still congrats for then



      #3
      Yeah, and he's been very quiet about his 'SKA Updates' ever since

      Has it all gone tits up or has he found a good woman to take care of his needs?



        #4
        "or has he found a good woman to take care of his needs ?" ROTFLMAO

        Yeh Mrs Palmer



          #5
          I missed the original of this back in March. Mordac mentioned it, saying to me 'it was a few weeks back'. The poor dear, I think his age is addling his brain.



            #6
            Alex

            "monkey trading" ltd - crashed the search engine.

            fyi.



              #7
              so did game server uk, looks like it's a bit b0rked atm



                #8
                Just got back from work, fking train was stuck midway with broken brakes and some small fire was in 1st class. How do I know? I was standing in the corridor between 1st and normal class because the train was overcrowded, ffs!

                Of course I took that as an opportunity to save a few drop dead gorgeous girls and acted like a knight, saying no to coming to their bedrooms because it would not be honourable to expect that from a woman whose life you just saved.

                As for SKA, stay tuned for big news - I just finished a contract and a good amount of cash is earmarked for very big hardware upgrades - tens of terabytes galore.



                  #9
                  "Note: Word com was ignored because its too common. "

                  AtW, I don't think your search routine should tell me what to leave out!!!!!



                    #10
                    Good God!!!
                    Just did a bit of snooping around on Google on the bold Alexei, and discovered that we have the same birthday!! To think I not only share it with Joan Collins and Marvin Hagler, but also AtW. I'm off to lie down!!
                    “The period of the disintegration of the European Union has begun. And the first vessel to have departed is Britain”

