Web scraping

Thread: Web scraping

  1. #1

    Double Godlike!

    original PM's Avatar
    Join Date
    Apr 2008
    Location
    Cheshire
    Posts
    12,020

    Default Web scraping

    We are trying to use a service or some software to trawl the web to obtain competitor prices

    Has anyone ever done anything like this?

    Any suggestions/recommendations?

    TIA!!

  2. #2

    TPAFKAk2p2

    mudskipper's Avatar
    Join Date
    Sep 2009
    Location
    Null island
    Posts
    25,985

    Default

    Originally Posted by original PM:
    We are trying to use a service or some software to trawl the web to obtain competitor prices

    Has anyone ever done anything like this?

    Any suggestions/recommendations?

    TIA!!
    Services out there that do this.

    e.g. Datafiniti | Intelligent Web Data for Data-Driven Businesses (might be mainly US, but they'll add in stuff to your requirements)

    What product line?

  3. #3

    Double Godlike!

    original PM's Avatar
    Join Date
    Apr 2008
    Location
    Cheshire
    Posts
    12,020

    Default

    Originally Posted by mudskipper:
    Services out there that do this.

    e.g. Datafiniti | Intelligent Web Data for Data-Driven Businesses (might be mainly US, but they'll add in stuff to your requirements)

    What product line?
    Thanks I'll give them a try - and I cannot tell you that because it would probably be too obvious where I work!

  4. #4

    My post count is Majestic

    northernladuk's Avatar
    Join Date
    Mar 2009
    Posts
    39,816

    Default

    Originally Posted by original PM:
    Thanks I'll give them a try - and I cannot tell you that because it would probably be too obvious where I work!
    Hee hee.
    'CUK forum personality of 2011 - Winner - Yes really!!!!

  5. #5

    My post count is Majestic

    NickFitz's Avatar
    Join Date
    Jun 2007
    Location
    Your local branch
    Posts
    45,524

    Default

    I've done a fair bit of ad hoc web scraping, usually involving a bit of Python knocked together in an hour or so. For example, I've got a script I run occasionally to archive Monday Links to AWS S3, then parse and extract them to a database, so I can easily search to make sure I'm not posting the same thing twice. (I recently caught one I'd already posted about seven years ago.)

    If I'd realised people were willing to pay for that kind of thing, I would have made it a plan B ages ago
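
    A minimal sketch of the "parse and extract to a database" step, not the poster's actual script. It assumes the archived page is already available as an HTML string (the S3 upload is left out) and uses only the Python standard library. The names `LinkExtractor`, `extract_links`, `store_links` and the `links` table are hypothetical, chosen for illustration.

```python
import sqlite3
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (url, text) pairs for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def store_links(conn, links):
    """Insert new links; return the URLs that were already stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links (url TEXT PRIMARY KEY, title TEXT)"
    )
    duplicates = []
    for url, title in links:
        seen = conn.execute(
            "SELECT 1 FROM links WHERE url = ?", (url,)
        ).fetchone()
        if seen:
            duplicates.append(url)
        else:
            conn.execute(
                "INSERT INTO links (url, title) VALUES (?, ?)", (url, title)
            )
    conn.commit()
    return duplicates
```

    Running `store_links` on each newly archived page returns any URLs that have been posted before, which is the "same thing twice" check described above.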

  6. #6

    I Am Legend

    BrilloPad


    Join Date
    Aug 2006
    Posts
    102,704

    Default

    Originally Posted by NickFitz:
    I've done a fair bit of ad hoc web scraping, usually involving a bit of Python knocked together in an hour or so. For example, I've got a script I run occasionally to archive Monday Links to AWS S3, then parse and extract them to a database, so I can easily search to make sure I'm not posting the same thing twice. (I recently caught one I'd already posted about seven years ago.)

    If I'd realised people were willing to pay for that kind of thing, I would have made it a plan B ages ago
    Didn't you do tpd too?

  7. #7

    Super poster

    Hobosapien's Avatar
    Join Date
    Feb 2016
    Location
    LA - la la fantasy land
    Posts
    2,533

    Default

    AtW should have been able to sell you the data, already scraped via his backlink scanning service, but last time I raised this as a potential additional income stream he said they didn't retain all the scraped data. Scrapes the whole internet and throws out the majority of the data.

    Plenty of web scraping tools out there, and sure you can scrape a competitor's site, but if they catch you and have said not to do such a thing (in their robots.txt and/or T&Cs page or copyright notices) then you risk being 'done'. I suppose the risk increases if you make the data public rather than capture it for internal use only, but if they are clued up they will be checking their web logs for obvious competitor activity.
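
    The robots.txt part of that check can be automated. A minimal sketch using Python's standard `urllib.robotparser`; the helper name `is_allowed`, the user agent `price-bot` and the example rules are assumptions for illustration. T&Cs and copyright notices are not machine-readable, so this only covers one of the three signals mentioned above.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that asks all crawlers to stay out of /prices/.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /prices/
"""

if __name__ == "__main__":
    print(is_allowed(EXAMPLE_ROBOTS, "price-bot", "https://example.com/prices/widget"))
    print(is_allowed(EXAMPLE_ROBOTS, "price-bot", "https://example.com/about"))
```

    Against a live site, `RobotFileParser.set_url()` followed by `read()` fetches the real file instead of parsing a string.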
    Maybe tomorrow, I'll want to settle down. Until tomorrow, I'll just keep moving on.

  8. #8

    Super poster

    woohoo's Avatar
    Join Date
    Nov 2007
    Location
    In the country
    Posts
    4,222

    Default

    Originally Posted by NickFitz:
    I've done a fair bit of ad hoc web scraping, usually involving a bit of Python knocked together in an hour or so. For example, I've got a script I run occasionally to archive Monday Links to AWS S3, then parse and extract them to a database, so I can easily search to make sure I'm not posting the same thing twice. (I recently caught one I'd already posted about seven years ago.)

    If I'd realised people were willing to pay for that kind of thing, I would have made it a plan B ages ago
    I worked with a company that used a 3rd party company to scrape prices (won't say more as it would identify the business).

    Apparently, it's a constant battle to keep the scraping software working because the sites being scraped are constantly changing things around to stop it. I was told the 3rd party employed people to continuously keep the software up to date. Doesn't sound fun at all.
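
    One common way to make that maintenance less painful is to try several extraction patterns in order and fail loudly when none match, so a layout change shows up as an error rather than as silently missing prices. A rough sketch, assuming prices appear in one of two made-up markup styles; `PRICE_PATTERNS`, `LayoutChanged` and `extract_price` are hypothetical names, and a real scraper would need patterns written for each target site.

```python
import re
from decimal import Decimal

# Hypothetical patterns for illustration only; real sites need their own.
PRICE_PATTERNS = [
    re.compile(r'itemprop="price"\s+content="(\d+(?:\.\d{2})?)"'),
    re.compile(r'class="price"[^>]*>\s*£(\d+(?:\.\d{2})?)'),
]


class LayoutChanged(Exception):
    """Raised when no known pattern matches, i.e. the scraper needs updating."""


def extract_price(html):
    """Return the first price found by any known pattern, as a Decimal."""
    for pattern in PRICE_PATTERNS:
        match = pattern.search(html)
        if match:
            return Decimal(match.group(1))
    raise LayoutChanged("no known price pattern matched")
```

    When the target site changes its markup, `extract_price` raises `LayoutChanged`, and adding a new entry to `PRICE_PATTERNS` is the only change needed.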

  9. #9

    Super poster

    Hobosapien's Avatar
    Join Date
    Feb 2016
    Location
    LA - la la fantasy land
    Posts
    2,533

    Default

    Originally Posted by woohoo:
    I worked with a company that used a 3rd party company to scrape prices (won't say more as it would identify the business).

    Apparently, it's a constant battle to keep the scraping software working because the sites being scraped are constantly changing things around to stop it. I was told the 3rd party employed people to continuously keep the software up to date. Doesn't sound fun at all.
    Seems many companies are wasting resource scraping or preventing scraping when the better solution would be for the target to provide an API to sell the info. If it's getting 'stolen' anyway, they may as well make money from it. Also makes it more solid in court if the data is licensed via an appropriate channel and others are stealing the data to avoid paying the licence.

  10. #10

    My post count is Majestic

    NickFitz's Avatar
    Join Date
    Jun 2007
    Location
    Your local branch
    Posts
    45,524

    Default

    Originally Posted by BrilloPad:
    Didn't you do tpd too?
    Yes - that was in PHP rather than Python, though
