How to get the most out of Web.BrowserContents()
Back in my post about my submission to a London Power BI Challenge, I mentioned that I had issues with the page I was trying to web scrape. The links that I wanted didn’t load immediately but only after a short delay. At the time I used a Python script to work around this. That worked for a one-off report, but because Power BI lacks support for refreshing Python code in the service, the solution could not be used for a regularly occurring report.
However, I was looking through the Power Query documentation for Web.BrowserContents() and noticed that its options record allows you to set a delay before scraping the HTML. Combining this functionality with the Html.Table() example from Chris Webb achieves the same results as my script, and is better in several ways: it is faster, refreshes in the service, and does not require a gateway to refresh. I just wish I had known about this a few months ago when I needed it.
Hopefully you can now avoid the same detour and learn from my mistake. I have included some example code below as a starting point. The options record offers a lot of possibilities, such as using CSS selectors. If all goes well, there will be another blog post going into further detail soon.
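let
    // Wait 10 seconds after the page loads so the delayed links are present in the HTML
    Source = Web.BrowserContents("http://cycling.data.tfl.gov.uk/", [WaitFor = [Timeout = #duration(0, 0, 0, 10)]]),
    // Return the href of every link whose address starts with "http"
    Links = Html.Table(Source, {{"Link", "a[href^=""http""]", each [Attributes][href]}})
in
    Links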
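As a rough sketch of the CSS selector possibility mentioned above, the WaitFor record can also take a Selector field, so the query waits for a specific element instead of a fixed delay. This assumes the links on the page render as anchors whose href starts with "http", as in the example above. The LinkSelector step name and the 30 second limit are my own choices for illustration. As I understand the documentation, when both fields are given the query throws an error if the selector has not appeared by the time the timeout elapses.

```powerquery
let
    // Assumption: the delayed links are <a> elements whose href starts with "http"
    LinkSelector = "a[href^=""http""]",
    // Wait until at least one matching link exists, up to 30 seconds (illustrative value)
    Source = Web.BrowserContents(
        "http://cycling.data.tfl.gov.uk/",
        [WaitFor = [Selector = LinkSelector, Timeout = #duration(0, 0, 0, 30)]]
    ),
    // Same extraction as before: one row per matching link, keeping its href
    Links = Html.Table(Source, {{"Link", LinkSelector, each [Attributes][href]}})
in
    Links
```

Waiting on a selector should let the query continue once the content is there rather than always sitting through the full delay, though I have only sketched it here rather than timed it.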
3 Comments
Mark Thim · 2019-03-06 at 20:37
Thanks Thomas, I will check this out
Mark Thim · 2019-03-05 at 18:37
Hi Thomas, great post! I’m wondering whether it would be possible to manipulate the table selector using Html.Table. I’m trying to get data from the Yahoo Finance income statement table. I need the quarterly income statement data for Apple from this link: https://finance.yahoo.com/quote/AAPL/financials?p=AAPL&.tsrc=fin-srch
Since the page address does not change, I have not managed to extract the quarterly data instead of the annual data, which appears on the initial page.
Do you have a suggestion on how to manipulate the table selector (Annual/Quarterly)?
Regards,
Mark
Thomas foster · 2019-03-06 at 19:52
Hi Mark,
Unfortunately that website looks very difficult to web scrape because of how it is structured. Have you already looked into finance APIs such as https://eodhistoricaldata.com/ to see if they would help solve your problem?