• Visitors can check out the Forum FAQ by clicking this link. You have to register before you can post: click the REGISTER link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. View our Forum Privacy Policy.
  • Want to receive the latest contracting news and advice straight to your inbox? Sign up to the ContractorUK newsletter here. Every sign up will also be entered into a draw to WIN £100 Amazon vouchers!

Help wanted - extracting data from the web using Excel 2007

Collapse
X
  •  
  • Filter
  • Time
  • Show
Clear All
new posts

    Help wanted - extracting data from the web using Excel 2007

    I use Excel web queries to pull in data from the web. Unfortunately more and more sites are desisting from using tables to structure their web pages, making it hard to just pull in the data I want. For example, I want the Bank of England base rate, which is published on their home page. But not in a table. And I've yet to discover a site where it is published in a reasonably small table.

    What's the best way forward if one can't find a site that publishes desired data in a table?

    #2
    There's a thing here that I use to check my saved lottery tickets against latest results on the web. Uses oHTTP to get the whole webpage as string, then you will have to use specific VBA to extract the values you want:

    http://www.xoggoth.org/temp/lottery.xlsm

    An alternative in VBA is to put a webbrowser on a sheet and then get the document contents as string from that:

    temp = WebBrowser1.Document.body.innerhtml

    Forget details, will be on Google, but you can also access specific element Ids in VBA, eg something like:

    rate = WebBrowser1.Document.currentrate.innerhtml
    Last edited by xoggoth; 19 November 2011, 11:15.
    bloggoth

    If everything isn't black and white, I say, 'Why the hell not?'
    John Wayne (My guru, not to be confused with my beloved prophet Jeremy Clarkson)

    Comment


      #3
      Originally posted by IR35 Avoider View Post
      I use Excel web queries to pull in data from the web. Unfortunately more and more sites are desisting from using tables to structure their web pages, making it hard to just pull in the data I want. For example, I want the Bank of England base rate, which is published on their home page. But not in a table. And I've yet to discover a site where it is published in a reasonably small table.

      What's the best way forward if one can't find a site that publishes desired data in a table?
      Google finance api

      It is deprecated but they say they have no plans to remove the functionality
      Knock first as I might be balancing my chakras.

      Comment


        #4
        Have you thought of offshoring this function to a team in Bangalore. They could do this for you for half your current costs.
        What happens in General, stays in General.
        You know what they say about assumptions!

        Comment


          #5
          I downloaded that, thanks. I will take a look later, but I was hoping for a solution that doesn't involve writing lines of code. My spreadsheet extracts 50 or so different values from about 10 different websites, which change their layouts from time to time, so I'm hoping for something not much more complicated than what I do currently, clicking on the table icon in the web query wizard. (I have a vague notion that it may be possible to write a one-line XML query to extract data from a web page, however even if that's possible I don't know if it's a straight-forward way to pull data into Excel.)

          Maybe I should write to BOE and suggest they publish their data in machine-readable form. (Perhaps they do already. I find that some sites do publish data as .xls files, but in many cases you can't automate use of the data because it's a temporary file name that changes every time the data changes, or even worse, you can only navigate to the data by manually invoking some javascript on a web page.)

          Update: I've found some accessible .xls files on BOE web site, just need to work out how to automatically import data.

          Comment


            #6
            Originally posted by suityou01 View Post
            Google finance api

            It is deprecated but they say they have no plans to remove the functionality
            I will look into Google Finance, I suppose they should do interest rates and not just stock quotes.

            Comment


              #7
              Tabular info should appear in tables. Unfortunately the 'thou shalt not use tables for layout' has been taken to mean that you shouldn't use tables at all.
              Excel web queries are a POS, btw. That horrible browser/wizard <shudder>
              +50 Xeno Geek Points
              Come back Toolpusher, scotspine, Voodooflux. Pogle
              As for the rest of you - DILLIGAF

              Purveyor of fine quality smut since 2005

              CUK Olympic University Challenge Champions 2010/2012

              Comment


                #8
                Originally posted by IR35 Avoider View Post
                I use Excel web queries to pull in data from the web. Unfortunately more and more sites are desisting from using tables to structure their web pages, making it hard to just pull in the data I want. For example, I want the Bank of England base rate, which is published on their home page. But not in a table. And I've yet to discover a site where it is published in a reasonably small table.

                What's the best way forward if one can't find a site that publishes desired data in a table?
                This is what you need:

                https://scraperwiki.com/

                Have a look at some of the existing examples, and from the code it should be pretty easy to write your own. You only need to know a subset of your chosen language (personally I prefer ruby) to write a scraper.

                The output can be viewed as HTML (in a table), or output as a CSV. You need to create a username to be able to run the scrapers.
                "A life, Jimmy, you know what that is? It’s the s*** that happens while you’re waiting for moments that never come." -- Lester Freamon

                Comment


                  #9
                  The best solution so far, pull in the following table I've just found, then use vlookup(today(),...) to locate the current value.

                  Bank of England|Statistics|Statistical Interactive Database

                  (I don't know yet how stable that link is going to be.)

                  Comment


                    #10
                    It looks like one can also extract a dynamically created cut-down version of that table in html (or excel or csv or xml) format with a link with from and to dates in it. Not actually any easier than just downloading the whole table, but fairly impressive. I take back some of the rude things I thought about BOE. But I still think they should provide an easier route to statistics important enough to feature on their home page.

                    Comment

                    Working...
                    X