CIV1498 - Introduction to Data Science

Project - Toronto Bike Share

Team https://xkcd.com/1838/

Exploratory Data Analysis

Setup Notebook

Read the cleaned dataset

Read the last DataFrame from the "Data Wrangling and Cleaning" part into this Notebook as trips_data. For future convenience, a list of column names in this DataFrame will be printed.

1. Annual Pattern of Bike Share Usage

As a starting point, this section provides a simple and intuitive visualization of the data set: the daily number of bike share trips and the distribution of trip durations are plotted for each year (2017, 2018, 2019, 2020). To reduce the size of the DataFrame, a new DataFrame trips_time_info will be created by copying the date and time information from trips_data.

1.1 Daily numbers of bike share trips

First, since the analysis will look at the daily occurrence of trips, a Date column will be created in trips_time_info to make things easier.

A function named plot_daily_ride is created to plot the daily number of trips using bike share. To better demonstrate the change in bike share usage over the years, all graphs will be plotted on a scale of 0 to 25000 trips.
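The daily counts behind these plots can be computed along the following lines. This is a minimal sketch rather than the notebook's actual code: the column name Start Time and the helper daily_trip_counts are assumptions made for illustration, and the sample rows are made up.

```python
import pandas as pd

def daily_trip_counts(trips, time_col="Start Time"):
    """Return the number of trips per calendar day, indexed by date."""
    # Keep only the date part of each trip's start timestamp.
    trips_time_info = pd.DataFrame({"Date": pd.to_datetime(trips[time_col]).dt.date})
    # One row per trip, so the group size is the daily trip count.
    return trips_time_info.groupby("Date").size()

# Made-up sample: two trips on one day, one trip on the next.
sample = pd.DataFrame(
    {"Start Time": ["2019-07-01 08:05", "2019-07-01 17:30", "2019-07-02 09:00"]}
)
daily = daily_trip_counts(sample)
```

The resulting Series can then be passed to any plotting routine with a fixed y-axis range of 0 to 25000.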

It is evident that bike share usage has increased significantly over the years, especially in the summer and early fall months: May, June, July, August, September, and October. These months also tend to have much higher bike share usage than the other months. In addition to the seasonal trend, the number of daily bike share trips also shows what appear to be weekly ups and downs, forming many sharp "spikes".

The impact of the COVID-19 pandemic can also be seen in these graphs. In the other years, March was usually when the number of daily bike share trips started to increase after the snow season. For 2020, instead of an increase, the plot shows fewer daily trips in the second half of March and the following months than before March. Recalling that March 17, 2020 was the day when a state of emergency regarding the disease was first declared in Ontario, this pattern corresponds to the timeline of the COVID-19 response. However, the number of daily trips returned to normal, or even exceeded the levels of previous years, after the first round of lockdown. This is probably because, after staying at home for so long, people wanted to leave their houses and enjoy the early summer breeze.

1.2 Distribution of trip durations

A function named plot_trip_duration is created to plot the distribution of trip durations. Unlike plot_daily_trips, this function does not set a fixed scale, as the objective here is to look at the shape of the distribution curve.

An interesting trip duration distribution for 2020! The other three years all have a nicely right-skewed distribution, while the distribution for 2020 looks like a right-skewed distribution with an extra chunk stacked on the right limb. The reason for this strangely shaped distribution of bike share trip durations in 2020 might be the COVID-19 pandemic: to maintain social distancing, people may have switched to riding a bike from public transportation such as buses and subways, where keeping distance from others is nearly impossible.

2. Seasonal Trend in Bike Share Usage

The previous section showed a seasonal trend in how often people use bike share. This section will analyze that seasonal trend for both trip frequency and trip duration by comparing the monthly averages over the four years.

2.1 Monthly average of bike share trips

The first thing to analyze is the number of trips that occurred in each month. The monthly average of bike share usage between 2017 and 2020 will be plotted to provide some insight into how bike share usage varies across the months.
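One way to obtain such monthly averages is to average the daily trip counts within each calendar month, pooling all years. The sketch below assumes a Series of daily counts indexed by date; the helper name monthly_average_daily_trips and the sample values are hypothetical.

```python
import pandas as pd

def monthly_average_daily_trips(daily_counts):
    """Average the daily trip counts within each calendar month (1-12), pooling all years."""
    months = pd.to_datetime(daily_counts.index).month
    return daily_counts.groupby(months).mean()

# Made-up daily counts spanning two years.
daily_counts = pd.Series(
    [10, 20, 30, 100],
    index=pd.to_datetime(["2019-01-01", "2019-01-02", "2020-01-01", "2019-07-01"]),
)
monthly = monthly_average_daily_trips(daily_counts)
```

The same grouping applied to a trip duration column instead of daily counts would give the average trip duration for each month.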

This graph clearly demonstrates that bike share usage between May and October is much higher than between November and April. Intuitively, the major reason is most likely the weather. The summer and early fall months (May to October) are usually warm and free of snow, in contrast to the other half of the year, when cold to sometimes freezing temperatures and poor road conditions due to snow make travelling by bike much harder.

2.2 Average trip duration for each month

It is possible that trip duration also varies between months. To check this, the average bike share trip duration for each month will be plotted.

Compared with the monthly average bike share trip count, the trip duration graph does not show a dramatic difference between months. However, the average trip duration for the months between May and September is still slightly higher than for the rest of the year, which loosely corresponds to the pattern of monthly average bike share usage.

3. Statutory Holidays and Bike Share Demand

Holidays may impact the bike share demand on those days. To take holidays into consideration, the bike share usage on Ontario statutory holidays will be compared with the usage on non-holiday days. Several bar plots will be created to visualize the difference in bike share usage among the holidays and average days in the months where the holidays are located. Here is a list of statutory holidays in Ontario:

Holiday          Date
New Year's Day   January 1
Family Day       Third Monday in February
Good Friday      Friday before Easter Sunday
Victoria Day     Last Monday before May 25
Canada Day       July 1
Labour Day       First Monday in September
Thanksgiving     Second Monday in October
Christmas Day    December 25
Boxing Day       December 26

Since quite a few holidays do not have a fixed date each year, a DataFrame containing the exact dates for these holidays between 2017 and 2020 will be created and combined with trips_data. A list of codes will be used to denote what holiday the date is.
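The floating dates can be derived from the rules in the list above rather than typed in by hand. The following is a sketch using only the standard library; the helper names are hypothetical, and Easter Sunday is computed with the standard anonymous Gregorian algorithm, which is an assumption about method, not necessarily what the notebook does.

```python
from datetime import date, timedelta

MONDAY = 0  # date.weekday() convention: Monday is 0

def nth_weekday(year, month, weekday, n):
    """Return the n-th given weekday of a month (n starts at 1)."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))

def easter_sunday(year):
    """Easter Sunday in the Gregorian calendar (anonymous Gregorian algorithm)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)

def ontario_holidays(year):
    """Map each statutory holiday in the list above to its date in the given year."""
    may_24 = date(year, 5, 24)
    return {
        "New Year's Day": date(year, 1, 1),
        "Family Day": nth_weekday(year, 2, MONDAY, 3),
        "Good Friday": easter_sunday(year) - timedelta(days=2),
        # Last Monday before May 25: step back from May 24 to its Monday.
        "Victoria Day": may_24 - timedelta(days=may_24.weekday()),
        "Canada Day": date(year, 7, 1),
        "Labour Day": nth_weekday(year, 9, MONDAY, 1),
        "Thanksgiving": nth_weekday(year, 10, MONDAY, 2),
        "Christmas Day": date(year, 12, 25),
        "Boxing Day": date(year, 12, 26),
    }

holidays_2017_2020 = {year: ontario_holidays(year) for year in range(2017, 2021)}
```

Flattening this dictionary into rows of (date, holiday) gives a table that can be merged with trips_data on the date.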

Starting with January, the major statutory holiday is New Year's Day on January 1. From the plot below, bike share usage on New Year's Day is significantly lower than usual.

Moving to February, the major statutory holiday is Family Day on the third Monday. Similar to New Year's Day, bike share usage on Family Day is also much lower than average.

Good Friday is a little tricky, since its date depends on another holiday, Easter Sunday. In general, Good Friday falls either in late March or in April, so bike share usage on Good Friday will be compared with the average daily usage in March and April combined. It can be seen that bike share usage on Good Friday is lower than average, but the difference between Good Friday and average days is much smaller than that for New Year's Day and Family Day.

Victoria Day is the last Monday before May 25, so bike share usage on Victoria Day is compared with the other days in May. Similar to Good Friday, Victoria Day experiences bike share usage slightly lower than average.

There is no statutory holiday in June, so July is the next subject. Canada Day is the first day of July, and unlike the other holidays, bike share usage on this day is almost the same as the average days.

Labour Day is the first Monday in September. Bike share usage on Labour Day is lower than but very close to the daily average of the month.

Thanksgiving is the second Monday in October. The difference in bike share usage between Thanksgiving and average days is similar to Good Friday and Victoria Day.

December has two statutory holidays, Christmas Day and Boxing Day, adjacent to each other at the end of the month. Bike share usage on these two holidays is significantly lower than the average, just like on the New Year's Day that follows.

In conclusion, based on the current data set, almost all holidays experience lower bike share usage than average, though to different extents. The exception is Canada Day, when bike share usage is marginally higher than on average days in July. New Year's Day, Family Day, Christmas Day, and Boxing Day are statutory holidays with markedly lower bike share demand, less than half of the average. Bike share demand on Good Friday, Victoria Day, Labour Day, and Thanksgiving is lower than the average but still more than half of the demand on average days.
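The "less than half of the average" comparison can be expressed as a ratio of holiday trips to the mean of the other days in the same month. The sketch below is a simplified single-year version with made-up numbers; the helper holiday_vs_month_average is hypothetical, and the input is assumed to be a Series of daily trip counts indexed by date.

```python
import pandas as pd

def holiday_vs_month_average(daily_counts, holiday):
    """Ratio of trips on `holiday` to the mean of the other days in the same month and year."""
    idx = pd.to_datetime(daily_counts.index)
    day = pd.Timestamp(holiday)
    same_month = (idx.year == day.year) & (idx.month == day.month)
    # Exclude the holiday itself from the comparison baseline.
    others = daily_counts[same_month & (idx != day)]
    return daily_counts[idx == day].iloc[0] / others.mean()

# Made-up counts for the first five days of a January.
daily_counts = pd.Series(
    [40, 100, 120, 80, 100], index=pd.date_range("2019-01-01", periods=5)
)
ratio = holiday_vs_month_average(daily_counts, "2019-01-01")
```

A ratio below 0.5 corresponds to the first group of holidays described above, and a ratio between 0.5 and 1 to the second.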

4. Weekly Pattern of Bike Share Usage

Now that a panoramic view of the data set is established, this section will further analyze the pattern of daily bike share trips within a week. To make things easier, a Day of Week column will be created in trips_time_info to show on which day of the week each bike share trip occurred. Then, the average number of daily bike share trips for each day of the week between 2017 and 2020 will be plotted. Considering the potential impact of the COVID-19 pandemic on bike share usage in 2020, the weekly pattern will be plotted separately for each year with a function plot_trips_week.

Another interesting pattern for 2020! The weekly bike share usage pattern is quite constant throughout the first three years, with weekends having lower demand than weekdays. This pattern is completely reversed in 2020, with more demand on weekends than on weekdays. One speculation is that a portion of bike share users rode to commute before 2020 but had to switch to working from home in 2020, so bike share became a mode of transport for weekend leisure rides.
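A possible way to build the per-year weekly averages is sketched below. The column name Start Time and the helper average_trips_by_weekday are assumptions; only the Day of Week column name comes from the text, and the sample trips are made up.

```python
import pandas as pd

DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def average_trips_by_weekday(trips, time_col="Start Time"):
    """Average number of trips per day for each weekday, with one column per year."""
    start = pd.to_datetime(trips[time_col])
    info = pd.DataFrame(
        {"Year": start.dt.year, "Date": start.dt.date, "Day of Week": start.dt.day_name()}
    )
    # First count trips per calendar day, then average those counts per weekday and year.
    daily = info.groupby(["Year", "Date", "Day of Week"]).size()
    average = daily.groupby(level=["Year", "Day of Week"]).mean()
    return average.unstack("Year").reindex(DAY_ORDER)

# Made-up sample: two Mondays (2 and 4 trips) and one Saturday (1 trip) in July 2019.
sample = pd.DataFrame(
    {
        "Start Time": ["2019-07-01 08:00", "2019-07-01 09:00"]
        + ["2019-07-08 08:00"] * 4
        + ["2019-07-06 12:00"]
    }
)
weekly = average_trips_by_weekday(sample)
```

Weekdays with no trips in the sample appear as missing values, which keeps the Monday-to-Sunday order intact for plotting.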

5. Daily Pattern of Bike Share Usage

To zoom in a bit further, the daily pattern of bike share usage between 2017 and 2020 will be analyzed. The hourly average of bike share trips in a day will be plotted. Again, considering the impact of the pandemic in 2020, plots will be created for each year separately with a function plot_trips_day. For this part, bike share usage by annual members and casual members will be added to the graphs for a sneak peek at user type differences. Therefore, a new DataFrame user_daily_trips will be created.
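The hourly averages by user type could be assembled roughly as follows. The column names Start Time and User Type, the labels Annual Member and Casual Member, and the helper name are assumptions for illustration. Note that this sketch divides by the number of distinct days that appear in the data, so days without any trips would not be counted.

```python
import pandas as pd

def hourly_average_by_user_type(trips, time_col="Start Time", user_col="User Type"):
    """Average trips per hour of day, with one column per user type."""
    start = pd.to_datetime(trips[time_col])
    n_days = start.dt.date.nunique()
    # Count trips for every (hour, user type) pair, then average over the observed days.
    counts = trips.groupby([start.dt.hour.rename("Hour"), trips[user_col]]).size()
    return (counts / n_days).unstack(fill_value=0)

# Made-up sample covering two days.
sample = pd.DataFrame(
    {
        "Start Time": ["2019-07-01 08:00", "2019-07-02 08:30", "2019-07-01 17:00"],
        "User Type": ["Annual Member", "Annual Member", "Casual Member"],
    }
)
user_daily_trips = hourly_average_by_user_type(sample)
```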

The power of COVID-19 strikes again! The hourly bike share trip patterns for 2017, 2018, and 2019 are almost identical. Annual member usage peaks around the morning and evening rush hours with a local maximum at lunch time, and casual member usage changes minimally throughout the day. This probably indicates that a lot of annual members use bike share as a way to commute. The highest peak for both annual members and casual members occurs during the evening rush hours. One possibility is that the evening rush hours are usually less time-sensitive than the morning rush hours (i.e. being late getting home is less concerning than being late for work or school), and bikes are limited in speed. So some annual members may take faster modes of transport during the morning rush hours, while casual members may just take bike share after work or school for an occasional casual ride.

Meanwhile, this pattern changed drastically in 2020. In the graph for 2020, the bike share usage peak around the morning rush hours is barely visible, which makes the peak around the evening rush hours the only major peak of the day. Another significant change is the rise of casual member usage. The peak hourly casual member usage skyrockets from about 150 in 2019 to almost 300 in 2020. This change likely reflects the way people live and work during the pandemic. The morning rush hour is diminishing, but people would still like a casual trip after a day of working at home.

6. Weather Condition and Bike Share Demand

Intuitively, it is reasonable to think that weather conditions could change people's travel patterns. To explore this further, the relationship between weather and bike share usage will be discussed. The original DataFrame trips_data contains some weather information. To reduce the size of the DataFrame, a new DataFrame trips_weather_info will be created by copying the columns containing the weather information from trips_data. Considering the aspects of weather that may influence people's choice of whether to use the bike share system, this section will look at weather type, temperature, humidity, wind speed, and visibility.

6.1 Type of weather

Here is a list of the types of weather recorded in the data set and the number of trips that occurred under each.

Most bike share trips occurred under clear conditions, and this number is on a much larger scale than the others. To accommodate the difference in scale, all weather types other than Clear, Fog, Rain, and Snow will be grouped into Others. A pie chart will be created to visualize the bike share trip count under different types of weather.

The pie chart shows an absolute dominance of bike share trips under clear conditions. However, it is not appropriate to conclude from this alone that the type of weather is correlated with bike share usage, as clear conditions are the most common type of weather one can encounter at an arbitrary time. To determine the relationship between types of weather and bike share usage, this pie chart needs to be compared with the distribution over time of the different types of weather in Toronto. To do so, trips_weather_info needs to be grouped by hour, as weather data is recorded hourly.

The proportion of time when the weather is clear is actually lower than the proportion of bike share trips that occur under clear conditions, while for the other four types of weather the proportion of time is higher than the proportion of trips. This means that people tend to use the bike share system more often when the weather is clear than under other weather conditions. This suggests that there is some association between weather and bike share usage.
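The comparison between the share of trips and the share of time for each weather type can be sketched as below. The column names Hour (the date and hour of the weather record) and Weather, and the helper weather_shares, are assumptions. Because the sketch derives the hourly records from the trip table itself, hours in which no trip took place are not counted; a separate hourly weather table would avoid that limitation.

```python
import pandas as pd

def weather_shares(trips_weather_info, hour_col="Hour", weather_col="Weather"):
    """Compare the share of trips with the share of recorded hours for each weather type."""
    # Share of trips: every row is one trip.
    trip_share = trips_weather_info[weather_col].value_counts(normalize=True)
    # Share of time: keep one row per hourly weather record.
    hourly = trips_weather_info.drop_duplicates(subset=hour_col)
    time_share = hourly[weather_col].value_counts(normalize=True)
    return pd.DataFrame({"Trip share": trip_share, "Time share": time_share}).fillna(0)

# Made-up sample: three trips in a clear hour, one trip in a rainy hour.
sample = pd.DataFrame(
    {
        "Hour": ["2019-07-01 08:00"] * 3 + ["2019-07-01 09:00"],
        "Weather": ["Clear", "Clear", "Clear", "Rain"],
    }
)
shares = weather_shares(sample)
```

A weather type whose trip share exceeds its time share, as Clear does in this toy sample, is one under which trips are over-represented.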

6.2 Temperature

Bike share usage is very possibly related to temperature because people tend to stay inside when the temperature is not comfortable for a bike ride. The relationship between temperature and bike share usage is plotted as below. It is quite obvious that a bike share trip is much more likely to take place when the temperature is in the range between 15 °C and 25 °C.

6.3 Humidity

Humidity is a weather parameter that can affect how people feel the heat. The higher the humidity, the harder it is for the sweat on our skin to evaporate, and thus the harder it is for our body to get rid of excess heat. Put simply, humidity changes people's perception of the current temperature, so it is usually used in combination with temperature to provide a "feels like" temperature. According to National Geographic, the human body is most comfortable at 45% relative humidity.

To check if humidity plays a role in the bike share usage pattern, the trip distribution under different humidity levels is plotted below. It looks like a bike share trip is most likely to happen when relative humidity is in the 60% to 80% range.

6.4 Wind speed

Wind speed can affect whether people are willing to use bikes for transport. When the wind speed is too high, the resistance from the air is higher than normal, and it can be quite difficult to keep one's balance on the bike and ride safely and comfortably. Below is the relationship between wind speed and bike share usage.

6.5 Visibility

Visibility is in general an important weather parameter for travelling safely. Low visibility due to fog or other causes may obscure vision and increase the risk of traffic accidents. A density plot of visibility and bike share usage is shown below. Toronto rarely experiences weather with low visibility, so the probability density for bike share trips under high visibility conditions is extremely high. However, this might not be an indication that bike share usage is related to visibility.

7. Spatial Analysis

7.1 Neighbourhood differences

In this section, we will look at any patterns or disparities in the neighbourhoods where users are taking trips. Questions to explore include:

First, import the shapefile for City of Toronto neighbourhood boundaries.

Import Toronto subway stations and the bicycle network as well.

Generate a map of bikeshare and subway station locations in Toronto.

Find the number of trips departing from and arriving at each bikeshare station and add them to new columns in 'bikeshare_stations'.
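A sketch of this counting step is shown below. The column names Station Id, Start Station Id, and End Station Id, the new column names Departures and Arrivals, and the helper add_trip_counts are all assumptions; only the name bikeshare_stations comes from the text.

```python
import pandas as pd

def add_trip_counts(bikeshare_stations, trips, id_col="Station Id",
                    start_col="Start Station Id", end_col="End Station Id"):
    """Return a copy of the stations table with departure and arrival counts added."""
    stations = bikeshare_stations.copy()
    departures = trips[start_col].value_counts()
    arrivals = trips[end_col].value_counts()
    # Stations that never appear in the trips get a count of zero.
    stations["Departures"] = stations[id_col].map(departures).fillna(0).astype(int)
    stations["Arrivals"] = stations[id_col].map(arrivals).fillna(0).astype(int)
    return stations

# Made-up sample: three stations and three trips.
bikeshare_stations = pd.DataFrame({"Station Id": [1, 2, 3]})
trips = pd.DataFrame({"Start Station Id": [1, 1, 2], "End Station Id": [2, 2, 1]})
bikeshare_stations = add_trip_counts(bikeshare_stations, trips)
```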

Next, find the neighbourhoods with the largest number of total rides departing from bike stations located within them.

Neighbourhoods with frequent departures also tend to have frequent arrivals, so for the purposes of visualization, only departures will be used to represent activity at each station. Visualize the number of departure trips in each neighbourhood on a map.

The large majority of trips are concentrated in a few neighbourhoods located in downtown Toronto. The large majority of bikeshare stations are also located in the downtown core, with a much lower density of stations elsewhere.



7.2 Proximity to Transit

In this section, the impact of proximity to public transit on ridership at bikeshare stations will be analyzed. Questions to explore include:

Subway Station Access

First, look at the proximity to TTC Subway stations to see if there is a relationship with bikeshare station ridership.

Plot bikeshare stations on a map.

Bikeshare stations will be considered near to TTC stations if they are within 200 meters, to account for the TTC standard of 300-400 meter stop spacing.

First, create a 200 meter radius buffer around each subway station. Create a new column in subway_stations that is true if the station is within 200 meters of a bikeshare station.
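The 200 meter test can also be illustrated without any GIS library by computing great-circle distances directly. This is a stand-in for the buffer approach described above, not the same technique: it uses the haversine formula on latitude and longitude pairs, and the coordinates and helper names below are made up for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (latitude, longitude) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def has_neighbour_within(point, others, radius_m=200):
    """True if any point in `others` lies within `radius_m` meters of `point`."""
    return any(haversine_m(*point, *other) <= radius_m for other in others)

# Illustrative coordinates only, not real station locations.
subway_station = (43.645, -79.380)
near_bike_stations = [(43.646, -79.380)]  # about 111 m to the north
far_bike_stations = [(43.655, -79.380)]   # about 1.1 km to the north

near_flag = has_neighbour_within(subway_station, near_bike_stations)
far_flag = has_neighbour_within(subway_station, far_bike_stations)
```

With projected geometries in a metric coordinate system, a buffer followed by an intersection test expresses the same 200 meter criterion.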

Plot on a map of Toronto.

Subway stations located within central and downtown Toronto all had easy access to bikeshare stations.

Similarly, identify bikeshare stations that are located within 200 meters of a TTC station to see if access to the subway is associated with higher bikeshare ridership at that station.

Again, departures and arrivals at each station are generally similar, so departures are used to compare ridership between stations. We can confirm this assumption by plotting departure and arrival trips at different bikeshare stations.

Departures and arrivals at different stations are generally seen to be equal, although a few stations deviate from this norm. Certain stations, particularly those without subway access, saw significantly more departures than arrivals. Nonetheless, proximity to a subway station generally did not meaningfully affect arrivals compared to departures, likely because many riders make "2-way" commute trips.

As such, compare the number of trips departing from bikeshare stations with subway access to the number departing from stations without.

Bikeshare stations without subway access have a higher proportion of stations with few departure trips. In other words, they are more likely to be comparatively underutilized. Additionally, the bikeshare stations that have historically been the most popular are those with subway access.


7.3 Proximity to TTC Stations

Explore if proximity to TTC stops (including bus and streetcar stops) has an impact on ridership at bikeshare stations. TTC stop data is available at https://open.toronto.ca/dataset/ttc-routes-and-schedules/, specifically 'stops.txt', which contains information on stop locations.

Plot these stations on a map of Toronto.

Determine if each bikeshare station is located within 200 meters of a TTC stop. Following a similar procedure as in the previous section, create a buffer around each TTC stop.

Similar to the subway access analysis, compare trips from bikeshare stations with TTC stop access to those without. As with subway station access, bikeshare stations with TTC stop access saw a higher average number of daily departures, although the difference between the distributions of departures was much less pronounced. In other words, subway station access was found to have a stronger influence on bikeshare station popularity than TTC stop access.

8. Ridership Programs and Initiatives

In this section, explore some of the programs and initiatives run by Bike Share Toronto to determine their popularity and success in attracting riders. These programs include:

8.1 Annual and Casual Membership

User Type impacts on usage behaviour

Is there a difference in usage behaviour between Casual and Annual Member riders? First, we look at the overall trend in daily ridership over this period by making a new DataFrame where each row corresponds to one day, with columns for the total number of trips, annual member trips, and casual member trips.
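A daily table of this shape can be produced with a cross-tabulation. The sketch below assumes columns named Start Time and User Type with the labels Annual Member and Casual Member; those names, the Trips column, and the helper daily_trips_by_user_type are illustrative assumptions.

```python
import pandas as pd

def daily_trips_by_user_type(trips, time_col="Start Time", user_col="User Type"):
    """One row per day with the total number of trips and the trips of each user type."""
    dates = pd.to_datetime(trips[time_col]).dt.date.rename("Date")
    daily = pd.crosstab(dates, trips[user_col])
    # Put the total across user types in the first column.
    daily.insert(0, "Trips", daily.sum(axis=1))
    return daily

# Made-up sample covering two days.
sample = pd.DataFrame(
    {
        "Start Time": ["2019-07-01 08:00", "2019-07-01 09:00",
                       "2019-07-01 17:00", "2019-07-02 12:00"],
        "User Type": ["Annual Member", "Annual Member", "Casual Member", "Casual Member"],
    }
)
daily = daily_trips_by_user_type(sample)
```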

The total count of trips per day is increasing over time, with peaks in ridership during the summer months. When we plot the trips taken each day separated by user type, we can see that annual and casual members are both increasing in ridership over this period.

The increase is more obvious when seasonal fluctuations are ignored. Plot the total annual number of trips by user type to see if Annual and Casual members are both increasing in ridership.

From the yearly data, the increase in total ridership over time is more obvious. However, 2020 saw a decrease in trips made by annual members. Given that overall ridership continues to increase, this means that the number of trips made by casual members has increased significantly in the past year. This may reflect how travel patterns have changed due to COVID-19 restrictions. Annual members tend to make commute trips to work, and as fewer people are making regular commute trips to their place of work, it follows that there is a decrease in annual membership rides. People are still choosing to take bike trips with a casual membership, indicating that their usage is likely more sporadic.

2020 data only goes up to the end of October, so actual ridership for 2020 is expected to be even higher than shown in the chart here.

Trip Duration

Next, see if user type resulted in any differences in trip duration. Plot the distribution of trip durations by user type, and find the mean trip durations for each.
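The mean trip duration per user type reduces to a single group-by. The sketch assumes a Trip Duration column in seconds and a User Type column; both names, the helper, and the sample values are made up for illustration and do not reproduce the notebook's results.

```python
import pandas as pd

def mean_duration_minutes(trips, duration_col="Trip Duration", user_col="User Type"):
    """Mean trip duration in minutes for each user type (durations assumed to be in seconds)."""
    return trips.groupby(user_col)[duration_col].mean() / 60

# Made-up durations in seconds.
sample = pd.DataFrame(
    {
        "User Type": ["Annual Member", "Annual Member", "Casual Member", "Casual Member"],
        "Trip Duration": [600, 720, 1200, 1800],
    }
)
mean_minutes = mean_duration_minutes(sample)
```

Replacing mean() with std() on the same grouping gives a simple measure of the spread in trip duration for each user type.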

Annual members in general take shorter trips than casual members, with a mean duration of 11.6 minutes. Casual members take longer trips on average and have a larger spread in trip duration. Given that annual members tend to be commuters, it is reasonable that their trip durations tend to be both shorter and more consistent.



Weekdays vs. Weekends

Next, look at the days of the week on which members take rides.

Annual members make more trips on workdays than on the weekend. On the other hand, there are many workdays with few Casual member trips, and there is no significant difference between the number of Casual member trips taken on workdays and on the weekend.

As the previous sections found that restrictions brought on by COVID-19 had an impact on ridership in 2020, plot the daily trips taken in 2019 and 2020 separately. In 2019, Annual member trips show a clear distinction between workdays and weekends, with workdays tending to have much higher ridership.

Compared to 2019, 2020 shows less of a distinction between workdays and weekends, likely due to the impact of the pandemic on commuter trips taken.

8.2 Free Ride Wednesdays

Next, we will look at the Free Ride Wednesday (FRW) initiative. Free Ride Wednesdays give customers unlimited 30-minute trips. The program was run in July 2017, June 2018, August 2019, and September 2020. To determine the success of the program we will look at the number of trips taken on Wednesdays of FRW months compared to the number of trips taken on other weekdays. We will compare this to non-FRW months to see if there is an increase in ridership.
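Flagging the FRW days and comparing their average ridership with other days can be sketched as follows. The FRW months are taken from the paragraph above; the helper names, the labels, and the sample counts are hypothetical, and the input is assumed to be a Series of daily trip counts indexed by date.

```python
import pandas as pd

# (year, month) pairs in which Free Ride Wednesdays ran, as stated in the text.
FRW_MONTHS = {(2017, 7), (2018, 6), (2019, 8), (2020, 9)}
WEDNESDAY = 2  # weekday() convention: Monday is 0

def is_free_ride_wednesday(day):
    """True if `day` is a Wednesday in one of the FRW months."""
    return day.weekday() == WEDNESDAY and (day.year, day.month) in FRW_MONTHS

def frw_average_comparison(daily_counts):
    """Mean daily trips on FRW Wednesdays compared with all other days."""
    idx = pd.to_datetime(daily_counts.index)
    labels = pd.Series(
        ["FRW Wednesday" if is_free_ride_wednesday(d) else "Other day" for d in idx],
        index=daily_counts.index,
    )
    return daily_counts.groupby(labels).mean()

# Made-up counts for Monday to Friday of one week in August 2019.
daily_counts = pd.Series(
    [100, 100, 160, 100, 100], index=pd.date_range("2019-08-05", periods=5)
)
comparison = frw_average_comparison(daily_counts)
```

Restricting daily_counts to weekdays, or to a window of a few weeks around each FRW month, before calling the helper gives the narrower comparisons discussed in this section.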

The average number of daily trips taken on FRW days is noticeably higher than on non-FRW Wednesdays and other days of the week.

However, we know that there are seasonal trends in bicycle ridership. To determine if days running the FRW initiative see a higher ridership than other days in a similar period, we can plot the number of daily trips taken each day.

We can see that in 2017, 2018, and 2019, the FRW initiative did result in higher than normal daily ridership compared to other days in the same year and season. This was not the case in 2020, however, which saw even fewer daily rides on FRW days than in the previous year, despite a growth in ridership overall.

This can likely be attributed to the month in which the FRW initiative was run. In 2017, 2018, and 2019, the program was run during the summer (in July, June, and August respectively), in months that already see higher ridership. In 2020, the program was run during September, when bicycle ridership begins to decline.

Taking a closer look at 2020, as in previous years, September is past the summer peak in ridership. Even within September and the nearby time frame, the FRW initiative does not show an increase in ridership. This is likely because, past the summer months, the number of elective and recreational cycling trips decreases.

8.3 Free Ride Weekends

Bike Share Toronto ran a Free Ride Weekend, which offered users unlimited free rides under 30 minutes, on February 29 and March 1, 2020. Investigate if the initiative was popular.

The Free Ride Weekend did not see any increase in ridership compared to other weekend days in a similar time period. As with the 2020 FRW initiative, this offer was run during a comparatively low-ridership period, at the end of February. As such, users may not have been as willing to take elective trips to take advantage of the program, due to the temperature, road conditions, or other factors that discourage riding in the colder months. Additionally, early concern about the COVID-19 pandemic in late February and early March 2020 may have further limited the number of trips people chose to make.

The purpose of these initiatives is to encourage usage of the bike share program and attract new riders. One-day and weekend programs are unlikely to attract ridership from commuters, as commuter travel behaviour is much more regular, and commuters are more likely to hold annual memberships, which already offer unlimited 30-minute trips. Based on these observations, it may be more beneficial to promote free ride initiatives during times when people choose to take elective or non-commute trips, i.e. during the summer months.