
Technical challenge



    OK, I appreciate that this will be beyond the abilities of most people on here, but at least give it a try.

    Task: you need to estimate the hardware requirements for software that enables fast searching of an index holding lots and lots of unique strings: at least 100 mln of them, but the solution should really scale to 1 bln and higher. A search returns either -1 for not found, or the positive unique number associated with the matching string. String lengths could differ; let's say 20 bytes each.

    Perfect scalability (onto multiple servers) of solution is a big plus as it would increase capacity and provide for redundancy of your (no doubt) crappy code.
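One common way to spread such an index over several servers is to pick the server from a hash of the string. The following is only a minimal in-process sketch of that idea, not anybody's actual implementation: `ShardedIndex` and `shard_for` are hypothetical names, each "server" is just a Python dict, and CRC32 is an arbitrary choice of stable hash.

```python
import zlib


def shard_for(key: bytes, num_shards: int) -> int:
    """Pick a shard from a stable hash of the key (CRC32 is an assumption)."""
    return zlib.crc32(key) % num_shards


class ShardedIndex:
    """Hypothetical stand-in for N servers; each shard is a local dict."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]
        self.next_id = 1  # IDs start at 1 so that -1 can mean "not found"

    def add(self, key: bytes) -> int:
        """Store the key if it is new and return its ID either way."""
        shard = self.shards[shard_for(key, len(self.shards))]
        if key not in shard:
            shard[key] = self.next_id
            self.next_id += 1
        return shard[key]

    def lookup(self, key: bytes) -> int:
        """Return the ID for the key, or -1 if it is not in the index."""
        shard = self.shards[shard_for(key, len(self.shards))]
        return shard.get(key, -1)


index = ShardedIndex(num_shards=4)
first_id = index.add(b"example string")
```

Each lookup touches exactly one shard, so adding shards adds capacity. Redundancy would need each shard replicated as well, which this sketch leaves out.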

    What hardware resources do you estimate it will take to provide at least 50 searches per second? Your implementation quote would also be useful.

    Go on, I dare you: prove that you are not all talk.
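As a starting point for the estimate, here is a rough back-of-envelope calculation of the raw data size. It assumes 20 bytes per string as stated above plus a 4-byte ID per string (the ID width is an assumption), and it ignores any overhead from the data structure itself, so real memory use would be higher.

```python
def raw_index_bytes(num_strings: int, string_bytes: int = 20, id_bytes: int = 4) -> int:
    """Raw payload size only: string bytes plus an assumed 4-byte ID each."""
    return num_strings * (string_bytes + id_bytes)


# Time budget per search at the requested 50 searches per second,
# if searches are handled one at a time.
budget_ms = 1000 / 50

for n in (100_000_000, 1_000_000_000):
    print(f"{n:>13,} strings: {raw_index_bytes(n) / 1e9:.1f} GB raw")
print(f"budget per search: {budget_ms:.0f} ms")
```

Under these assumptions the raw payload is 2.4 GB for 100 mln strings and 24 GB for 1 bln.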

    #2
    That's what the hired help are for.

    HTH

    FFS Get a life ....
    Hard Brexit now!
    #prayfornodeal



      #3
      That's AtW's new contract, he needs some help with it!!!!



        #4
        Why doesn't he do his job himself?



          #5
          That's correct - this is the contract (mini SKA license actually) work I am about to finish. Now that it's almost done, I want to know if any of you jokers could have done better than me, i.e. a more efficient implementation that would reduce Total Cost of Ownership (TCO) for the good client of SKA Ltd.



            #6
            I thought I'd be a smart ass and chip in to this one :-p

            Firstly, there's no such thing as 'perfect scalability' (not yet, anyway; it's not impossible in future, just unlikely), so anyone who speaks of it is clearly bs'ing their way through their job.

            Secondly, the hardware is as important as the software for this.

            The question of software brings me on to what's on the market right now. There are quite a few search solutions; Autonomy and ALUI Grid Search come to mind. Both of these can handle 1 billion strings. However, anyone who knows anything about search will know that relevance is more important, so most engines do stemming, tokenising and other things to build an index. Anyone who says otherwise has either invented something new or is, again, bs'ing their way through their job.

            Once the software aspect has been dealt with and user expectations have been set correctly with regard to what a search engine is and what it can do, then we can look at hardware.

            Anyone who'll give you hardware specs without going through all of the above is bs'ing their way through their job...

            Happy?



              #7
              Originally posted by AtW
              That's correct - this is the contract (mini SKA license actually) work I am about to finish. Now that it's almost done, I want to know if any of you jokers could have done better than me, i.e. a more efficient implementation that would reduce Total Cost of Ownership (TCO) for the good client of SKA Ltd.

              Wow, TCO?! Have you been reading Basic Business for Dummies?
              Hard Brexit now!
              #prayfornodeal



                #8
                Not happy - please try harder. I am finishing the work and will benchmark speed in a few hours; I will use a 100 mln index for that, while testing is done on a 10 mln one.

                It would be wise to read the original challenge carefully. There are lots of unique strings (well, you need to remove dups). You need to either confirm that a given search string exists in the index, and if so return the integer number associated with that string (let's say its unique RowID), or return -1 if that string does not exist. Hence there is no issue with relevancy here: the search answer is either yes or no.

                Never used Autonomy stuff, never heard of ALUI Grid Search - if you know either, then please tell me the server requirements (they do not have to be exact) and put a price on such server(s).
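To make the required behaviour concrete, here is a minimal sketch using a plain in-memory dictionary. It only illustrates the contract (dedup, positive RowID, -1 for a miss) and is not the implementation discussed in this thread; `build_index` and `lookup` are hypothetical names, and a dict would need far more memory per entry than a purpose-built structure at 100 mln strings.

```python
def build_index(strings):
    """Map each distinct string to a positive RowID, dropping duplicates."""
    index = {}
    for s in strings:
        if s not in index:
            index[s] = len(index) + 1  # RowIDs start at 1 so they stay positive
    return index


def lookup(index, query):
    """Exact match only: return the RowID, or -1 if the string is absent."""
    return index.get(query, -1)


index = build_index([b"alpha", b"beta", b"alpha", b"gamma"])
found = lookup(index, b"beta")
missing = lookup(index, b"delta")
```

There is no ranking or stemming involved: a query either matches a stored string byte for byte or it does not.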



                  #9
                  Originally posted by AtW
                  Never used Autonomy stuff, never heard of ALUI Grid Search - if you know either, then please tell me the server requirements (they do not have to be exact) and put a price on such server(s).
                  Never heard of your putative competitors? Hmmm, methinks you need to re-read that Basic Business for Dummies book.
                  Hard Brexit now!
                  #prayfornodeal



                    #10
                    Just try to answer the question - I know it is hard to keep the whole picture in that huge managerial brain of yours, but at least make an effort.

                    I wonder if my .NET code will meet the minimum requirement of 100 searches per second on a database of 100 mln unique strings.
