TFL cycling data Power BI Report
- 4 minutes read - 682 wordsThis Power BI Report was created for the fourth London Power BI User Group [PUG] challenge. This is the second time I have done this challenge but the first I feel I have been pleased with my submission
If you are interested in Power BI I would really recommend going to a PUG. It is great to meet and learn from people who have a similar passion. If you come to the London PUG, take part in the challenges! They are every other month so you have two months to work on the challenge. They get announced at the user groups but also on Prathy’s blog.
The next London PUG is on October 23rd with Susie Bentley, hopefully I will see you there!
The Power Query and the Python
Edit: I have since found a way to do the link scraping entirely with Power Query. I have posted about it here
The dataset/theme for this challenge was cycling in London. There was a suggested data source of http://cycling.data.tfl.gov.uk/ which I ended up using by extracting the utilization statistics. This was far more of an involved challenge than I had initially expected. This was due to the TFL web page not loading all the links to the files instantly but delayed.
This meant that the in built web scraping in Power Query would not work. The solution I went with in the end was a using the relatively new feature of Python integration in Power BI through a Python package called Selenium. Selenium allows you to use chrome or firefox to load up web pages and then perform automated tasks. I used the python code below to get a table containing the links I needed.
(I’ve included the python code in case anyone else finds it useful. I am by no means an expert python programmer so there may well be a better way to do this.)
from selenium import webdriver
import time
import pandas as pd
chromedriver = #location of your chrome driver
browser = webdriver.Chrome(executable_path=chromedriver)
options = webdriver.ChromeOptions()
options.add_argument('headless')
options.add_argument('window-size=1200x600') # optional
browser = webdriver.Chrome(executable_path=chromedriver, options=options)
browser.get('http://cycling.data.tfl.gov.uk/')
time.sleep(5)
d = []
for a in browser.find_elements_by_xpath('.//a'):
link = a.get_attribute('href')
d.append(link)
df = pd.DataFrame({'Links':d})
Once I had this table I used Power Query function to first obtain the file contained within the link and then to transform it into a table. From this point on it was all simple transformations such as splitting the date time columns into separate date and time columns.
The final Power Query Step was to get the geographic info about the stations. For this I tried a variety of services from Google Maps to Bing, however I settled on Postcodes.io as it seemed to offer the best results, first I got the Lat and Long for each station from the TFL API and then parsed these into Postcodes.io to get the postcode, district, and parliamentary constituency.
The visualizations
For the data visualization I used three key custom visuals, the flow map which allowed me to show how many people were going along certain routes, the synoptic panel (which is a fantastic visual even if it does take some getting used to), and the pulse chart which allowed for a more interesting take on how the cycles are used through the course of a day.
The slicer pane as an expandable tab was taken a Power BI report that was featured in the September Power BI release blog post, I think it is a very creative solution and I have used it several times now.
Another trick that I find very helpful are dynamic titles, in this report the title will change depending on the date range selected. This is just done with a very simple DAX formula yet helps reports users a lot.
Title =
"Historical Santander Cycle Usage: "
& FORMAT ( MIN ( DateTable[Date] ), "MMM YYYY" )
& " to "
& FORMAT ( MAX ( DateTable[Date] ), "MMM YYYY" )
If you have any questions feel free to ask me either at the user group in person or in the comments of this post.