Exploratory Data Analysis
# Import standard and 3rd party libraries
import warnings
import os
import json
import requests
import pandas as pd
import numpy as np
import seaborn as sns
import geopandas as gpd
import pickle
from datetime import date
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import matplotlib.dates as mdates
#!pip install descartes
#!pip install folium
import folium
from folium import Choropleth
from folium import GeoJson
from folium import Marker
from folium.plugins import MarkerCluster
from folium.plugins import HeatMap
from shapely.geometry import MultiPoint
# Configure Notebook
%matplotlib inline
plt.style.use('fivethirtyeight')
sns.set_context("notebook")
warnings.filterwarnings('ignore')
Read the final DataFrame from the "Data Wrangling and Cleaning" part into this notebook as trips_data. For future convenience, the list of column names in this DataFrame is printed as well.
trips_data = pd.read_pickle('trips_data_final.pkl')
trips_data.head()
Trip Id | Start Time | End Time | Trip Duration | Start Station Id | Start Station Name | End Station Id | End Station Name | User Type | Subscription Id | ... | Wind Spd Flag | Visibility (km) | Visibility Flag | Stn Press (kPa) | Stn Press Flag | Hmdx | Hmdx Flag | Wind Chill | Wind Chill Flag | Weather | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 712431 | 2016-12-31 23:43:00-05:00 | 2016-12-31 23:51:00-05:00 | 494 | 7163 | Yonge St / Wood St | 7634 | University Ave / Gerrard St W (West Side) | Annual Member | NaN | ... | NaN | 16.1 | NaN | 99.81 | NaN | NaN | NaN | NaN | NaN | Clear |
1 | 712432 | 2016-12-31 23:43:00-05:00 | 2016-12-31 23:50:00-05:00 | 425 | 7163 | Yonge St / Wood St | 7634 | University Ave / Gerrard St W (West Side) | Annual Member | NaN | ... | NaN | 16.1 | NaN | 99.81 | NaN | NaN | NaN | NaN | NaN | Clear |
2 | 712433 | 2016-12-31 23:44:00-05:00 | 2016-12-31 23:50:00-05:00 | 388 | 7163 | Yonge St / Wood St | 7634 | University Ave / Gerrard St W (West Side) | Annual Member | NaN | ... | NaN | 16.1 | NaN | 99.81 | NaN | NaN | NaN | NaN | NaN | Clear |
3 | 712435 | 2016-12-31 23:48:00-05:00 | 2017-01-01 00:02:00-05:00 | 851 | 7284 | University Ave / King St W - SMART | 7046 | Niagara St / Richmond St W | Annual Member | NaN | ... | NaN | 16.1 | NaN | 99.81 | NaN | NaN | NaN | NaN | NaN | Clear |
4 | 712436 | 2016-12-31 23:48:00-05:00 | 2017-01-01 00:00:00-05:00 | 693 | 7070 | 25 York St – Union Station South | 7172 | Strachan Ave / Princes' Blvd | Annual Member | NaN | ... | NaN | 16.1 | NaN | 99.81 | NaN | NaN | NaN | NaN | NaN | Clear |
5 rows × 43 columns
# List the column names to see what data the DataFrame contains
trips_data.columns.values.tolist()
['Trip Id', 'Start Time', 'End Time', 'Trip Duration', 'Start Station Id', 'Start Station Name', 'End Station Id', 'End Station Name', 'User Type', 'Subscription Id', 'Bike Id', 'Trip Duration (mins)', 'Start Point', 'End Point', 'geometry', 'merge_time', 'Longitude (x)', 'Latitude (y)', 'Station Name', 'Climate ID', 'Year', 'Month', 'Day', 'Time', 'Temp (°C)', 'Temp Flag', 'Dew Point Temp (°C)', 'Dew Point Temp Flag', 'Rel Hum (%)', 'Rel Hum Flag', 'Wind Dir (10s deg)', 'Wind Dir Flag', 'Wind Spd (km/h)', 'Wind Spd Flag', 'Visibility (km)', 'Visibility Flag', 'Stn Press (kPa)', 'Stn Press Flag', 'Hmdx', 'Hmdx Flag', 'Wind Chill', 'Wind Chill Flag', 'Weather']
As a starting point, this section provides a simple, intuitive view of the data set by plotting the daily number of bike share trips and the distribution of trip durations for each year (2017, 2018, 2019, and 2020). To reduce the size of the DataFrame, a new DataFrame trips_time_info is created by copying the date and time information from trips_data.
trips_time_info = trips_data[['Start Time','End Time','Trip Duration (mins)','Year','Month','Day','Time']].copy()
trips_time_info.head()
Start Time | End Time | Trip Duration (mins) | Year | Month | Day | Time | |
---|---|---|---|---|---|---|---|
0 | 2016-12-31 23:43:00-05:00 | 2016-12-31 23:51:00-05:00 | 8.233333 | 2017 | 1 | 1 | 00:00 |
1 | 2016-12-31 23:43:00-05:00 | 2016-12-31 23:50:00-05:00 | 7.083333 | 2017 | 1 | 1 | 00:00 |
2 | 2016-12-31 23:44:00-05:00 | 2016-12-31 23:50:00-05:00 | 6.466667 | 2017 | 1 | 1 | 00:00 |
3 | 2016-12-31 23:48:00-05:00 | 2017-01-01 00:02:00-05:00 | 14.183333 | 2017 | 1 | 1 | 00:00 |
4 | 2016-12-31 23:48:00-05:00 | 2017-01-01 00:00:00-05:00 | 11.550000 | 2017 | 1 | 1 | 00:00 |
First, since the analysis looks at the daily occurrence of trips, a Date column is created in trips_time_info to make grouping by day easier.
trips_time_info['Date'] = pd.to_datetime(trips_time_info[['Year','Month','Day']],format='%Y%m%d')
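As a quick illustration of how the Date column is assembled, the sketch below runs the same kind of pd.to_datetime call on a small made-up frame. The toy DataFrame and its values are assumptions for illustration only and are not part of the data set. When a DataFrame with Year, Month, and Day columns is passed, pandas builds the dates from the column names, so the sketch leaves out the format string.

```python
import pandas as pd

# Toy frame with the same Year/Month/Day layout as trips_time_info (made-up values)
toy = pd.DataFrame({'Year': [2017, 2018, 2020],
                    'Month': [1, 7, 3],
                    'Day': [1, 1, 17]})

# pandas assembles one datetime per row from the Year, Month, and Day columns
toy['Date'] = pd.to_datetime(toy[['Year', 'Month', 'Day']])
print(toy)
```

The resulting column has a datetime dtype, which is what the later groupby and merge steps on Date rely on.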
A function named plot_daily_trips is created to plot the daily number of bike share trips. To make the change in usage over the years easier to compare, all graphs are plotted on the same scale of 0 to 25000 trips.
def plot_daily_trips(year):
    """Plot how many bike share trips occurred each day in the given year."""
    # Obtain daily number of trips for the target year
    daily_trips = trips_time_info[trips_time_info['Year']==year].groupby(['Date']).size().reset_index(name='Trips Count')
    # Plotting
    fig,ax = plt.subplots(figsize=(14,7))
    sns.lineplot(ax=ax,x='Date',y='Trips Count',data=daily_trips)
    # Place one major tick at the first day of each month
    ax.xaxis.set_major_locator(mdates.MonthLocator(bymonthday=1))
    # Use the same y-axis range for every year so the plots are comparable
    ax.set(ylim=(0,25000))
    ax.set_title('Daily Number of Bike Share Trips in {}'.format(year))
    return
plot_daily_trips(2017)
plot_daily_trips(2018)
plot_daily_trips(2019)
plot_daily_trips(2020)
Bike share usage has clearly increased over the years, especially in the summer and early fall months (May through October). These months also have much higher usage than the rest of the year. In addition to the seasonal trend, the number of daily trips shows what appear to be weekly ups and downs, forming many sharp "spikes".
The impact of the COVID-19 pandemic can also be seen in these graphs. In the other years, March was usually when the number of daily trips started to increase after the snow season. In 2020, the plot instead shows fewer daily trips in the second half of March and the subsequent months than before March. Since a state of emergency was first declared in Ontario on March 17, 2020, this pattern matches the timeline of the COVID-19 response. However, the number of daily trips later returned to normal, or even exceeded previous years, after the first round of lockdown. A possible explanation is that after staying at home for so long, people wanted to get outside and enjoy the early summer weather.
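One way to separate the weekly "spikes" from the slower seasonal trend is a 7-day rolling mean. This is a minimal sketch on made-up daily counts, not a step of the original analysis; the series and its values are assumptions chosen so the effect is easy to see.

```python
import pandas as pd

# Made-up daily counts: two weeks starting on Monday 2019-07-01,
# with 100 trips on weekdays and 30 on weekends (illustrative values only)
dates = pd.date_range('2019-07-01', periods=14, freq='D')
counts = [100 if d.dayofweek < 5 else 30 for d in dates]
daily_trips = pd.Series(counts, index=dates, name='Trips Count')

# Every full 7-day window contains 5 weekdays and 2 weekend days,
# so the weekly up-and-down pattern averages out
smoothed = daily_trips.rolling(window=7).mean()
print(smoothed.dropna())
```

If the spikes in the real plots disappear under the same smoothing, that would support reading them as a weekly cycle rather than as noise.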
A function named plot_trip_duration is created to plot the distribution of trip durations. Unlike plot_daily_trips, this function does not set a fixed scale, as the objective here is to look at the shape of the distribution curve.
def plot_trip_duration(year):
    """Plot the distribution of trip durations for the given year."""
    fig,ax = plt.subplots(figsize=(9,5))
    # Histogram with 1-minute bins and a kernel density estimate overlaid
    sns.histplot(ax=ax,data=trips_time_info[trips_time_info['Year']==year], x='Trip Duration (mins)',kde=True,binwidth=1,stat='count')
    ax.set_title('Distribution of Trip Durations in {}'.format(year))
    ax.set_xlabel('Trip Duration (mins)')
    ax.set_ylabel('Trips Count')
    return
plot_trip_duration(2017)
plot_trip_duration(2018)
plot_trip_duration(2019)
plot_trip_duration(2020)
An interesting trip duration distribution for 2020! The other three years all have a clean right-skewed distribution, while the distribution for 2020 looks like a right-skewed distribution with an extra chunk stacked on its right tail. One possible reason for this unusual shape is the COVID-19 pandemic: to maintain social distancing, people may have switched to riding a bike from public transportation such as buses and subways, where keeping distance from others is nearly impossible.
The previous section showed a seasonal trend in how often people use bike share. This section analyzes that seasonal pattern for both frequency and duration by comparing the monthly averages over the four years.
The first thing to analyze is the number of trips that occurred in each month. The monthly average of bike share usage between 2017 and 2020 is plotted to show how usage varies across the months.
# Count trips per (Year, Month), then average the counts of each calendar month across the years
count_average = trips_time_info.groupby(['Year','Month']).size().groupby(level='Month').mean().reset_index(name='Trips Count')
# Plotting
fig,ax = plt.subplots(figsize=(11,6))
sns.barplot(ax=ax,x='Month',y='Trips Count',data=count_average)
ax.set_title('Monthly Average of Bike Share Usage between 2017 and 2020')
plt.ticklabel_format(style='plain',axis='y')
plt.show()
This graph clearly shows that bike share usage between May and October is much higher than between November and April. Intuitively, the main reason is most likely the weather. The summer and early fall months (May to October) are usually warm and free of snow, whereas in the other half of the year, cold to freezing temperatures and poor road conditions due to snow make travelling by bike much harder.
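Because the data covers several years, a per-month average takes two steps: count the trips in each (year, month) pair, then average those counts for each calendar month. The sketch below contrasts this with a pooled total on a made-up frame; the toy data is an assumption for illustration only.

```python
import pandas as pd

# Made-up trip records: one row per trip, with the year and month it falls in
toy = pd.DataFrame({'Year':  [2017, 2017, 2017, 2018, 2018, 2018, 2018, 2018],
                    'Month': [1,    1,    7,    1,    7,    7,    7,    7]})

# Total trips per calendar month, pooled over all years
monthly_total = toy.groupby('Month').size()

# Average trips per calendar month: count per (Year, Month) first,
# then average those counts over the years
monthly_average = toy.groupby(['Year', 'Month']).size().groupby(level='Month').mean()

print(monthly_total)
print(monthly_average)
```

When every month appears in the same number of years, the two results differ only by a constant factor. If some months are missing in some years, the pooled total understates those months, while the average does not.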
It is possible that the trip duration also varies between months. To check this, the average bike share trip duration for each month is plotted.
# Obtain the average trip duration for each month
duration_average = trips_time_info.groupby(['Month'])['Trip Duration (mins)'].mean().reset_index(name='Trip Duration (mins)')
# Plotting
fig,ax = plt.subplots(figsize=(11,6))
sns.barplot(ax=ax,x='Month',y='Trip Duration (mins)',data=duration_average)
ax.set_title('Monthly Average of Bike Share Trip Duration between 2017 and 2020')
plt.ticklabel_format(style='plain',axis='y')
plt.show()
Compared with the monthly average trip count, the trip duration graph does not show a dramatic difference across months. Still, the average trip duration between May and September is slightly higher than in the rest of the year, which loosely corresponds to the pattern of monthly average bike share usage.
Holidays may affect bike share demand. To take holidays into consideration, bike share usage on Ontario statutory holidays is compared with usage on non-holiday days. Several bar plots are created to visualize the difference in usage between each holiday and the average day in the month in which the holiday falls. Here is a list of statutory holidays in Ontario:
Holidays | Date |
---|---|
New Year's Day | January 1 |
Family Day | Third Monday in February |
Good Friday | Friday before Easter Sunday |
Victoria Day | Last Monday before May 25 |
Canada Day | July 1 |
Labour Day | First Monday in September |
Thanksgiving | Second Monday in October |
Christmas Day | December 25 |
Boxing Day | December 26 |
Since quite a few holidays do not have a fixed date each year, a DataFrame containing the exact dates of these holidays between 2017 and 2020 is created and combined with trips_time_info. A short code is used to denote which holiday each date is.
# Make a dictionary of all statutory holidays in Ontario between 2017 and 2020
holidays = { pd.Timestamp('2017-01-01'):"NY",
pd.Timestamp('2017-02-20'):"FD",
pd.Timestamp('2017-04-14'):"GF",
pd.Timestamp('2017-05-22'):"VD",
pd.Timestamp('2017-07-01'):"CA",
pd.Timestamp('2017-09-04'):"LD",
pd.Timestamp('2017-10-09'):"TG",
pd.Timestamp('2017-12-25'):"CM",
pd.Timestamp('2017-12-26'):"BD",
pd.Timestamp('2018-01-01'):'NY',
pd.Timestamp('2018-02-19'):'FD',
pd.Timestamp('2018-03-30'):'GF',
pd.Timestamp('2018-05-21'):'VD',
pd.Timestamp('2018-07-01'):'CA',
pd.Timestamp('2018-09-03'):'LD',
pd.Timestamp('2018-10-08'):'TG',
pd.Timestamp('2018-12-25'):'CM',
pd.Timestamp('2018-12-26'):'BD',
pd.Timestamp('2019-01-01'):'NY',
pd.Timestamp('2019-02-18'):'FD',
pd.Timestamp('2019-04-19'):'GF',
pd.Timestamp('2019-05-20'):'VD',
pd.Timestamp('2019-07-01'):'CA',
pd.Timestamp('2019-09-02'):'LD',
pd.Timestamp('2019-10-14'):'TG',
pd.Timestamp('2019-12-25'):'CM',
pd.Timestamp('2019-12-26'):'BD',
pd.Timestamp('2020-01-01'):'NY',
pd.Timestamp('2020-02-17'):'FD',
pd.Timestamp('2020-04-10'):'GF',
pd.Timestamp('2020-05-18'):'VD',
pd.Timestamp('2020-07-01'):'CA',
pd.Timestamp('2020-09-07'):'LD',
pd.Timestamp('2020-10-12'):'TG' }
# Convert to DataFrame
holidays = pd.Series(holidays).to_frame().reset_index()
holidays.columns = ['Date','Holiday']
# View the DataFrame
holidays.head()
Date | Holiday | |
---|---|---|
0 | 2017-01-01 | NY |
1 | 2017-02-20 | FD |
2 | 2017-04-14 | GF |
3 | 2017-05-22 | VD |
4 | 2017-07-01 | CA |
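The movable holidays in the dictionary above were typed in by hand, so a short cross-check against the rules in the holiday table can catch typos. The sketch below uses only the standard library; the helper names (nth_weekday, victoria_day, good_friday, movable_holidays) are hypothetical, and the Easter date comes from the widely used anonymous Gregorian algorithm rather than from anything in the original notebook.

```python
from datetime import date, timedelta

def nth_weekday(year, month, weekday, n):
    """Return the n-th occurrence of a weekday (Monday=0) in the given month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))

def victoria_day(year):
    """Return the last Monday before May 25 (the Monday on or before May 24)."""
    may_24 = date(year, 5, 24)
    return may_24 - timedelta(days=may_24.weekday())

def good_friday(year):
    """Return Good Friday, two days before Easter Sunday (Gregorian calendar)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    easter = date(year, month, day + 1)
    return easter - timedelta(days=2)

def movable_holidays(year):
    """Return the movable Ontario statutory holidays, keyed by the codes used above."""
    return {'FD': nth_weekday(year, 2, 0, 3),   # Family Day: third Monday in February
            'GF': good_friday(year),
            'VD': victoria_day(year),
            'LD': nth_weekday(year, 9, 0, 1),   # Labour Day: first Monday in September
            'TG': nth_weekday(year, 10, 0, 2)}  # Thanksgiving: second Monday in October

for year in range(2017, 2021):
    print(year, movable_holidays(year))
```

The computed dates can then be compared with the keys of the holidays dictionary before it is merged into the trip data.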
# Combine holidays with trips_time_info
trips_time_info = pd.merge(trips_time_info,holidays,how='left',left_on='Date',right_on='Date')
# Fill the blanks in the "Holiday" column with "NA" (assign back instead of an in-place fill on a column)
trips_time_info['Holiday'] = trips_time_info['Holiday'].fillna('NA')
# View the DataFrame
trips_time_info.head()
Start Time | End Time | Trip Duration (mins) | Year | Month | Day | Time | Date | Holiday | |
---|---|---|---|---|---|---|---|---|---|
0 | 2016-12-31 23:43:00-05:00 | 2016-12-31 23:51:00-05:00 | 8.233333 | 2017 | 1 | 1 | 00:00 | 2017-01-01 | NY |
1 | 2016-12-31 23:43:00-05:00 | 2016-12-31 23:50:00-05:00 | 7.083333 | 2017 | 1 | 1 | 00:00 | 2017-01-01 | NY |
2 | 2016-12-31 23:44:00-05:00 | 2016-12-31 23:50:00-05:00 | 6.466667 | 2017 | 1 | 1 | 00:00 | 2017-01-01 | NY |
3 | 2016-12-31 23:48:00-05:00 | 2017-01-01 00:02:00-05:00 | 14.183333 | 2017 | 1 | 1 | 00:00 | 2017-01-01 | NY |
4 | 2016-12-31 23:48:00-05:00 | 2017-01-01 00:00:00-05:00 | 11.550000 | 2017 | 1 | 1 | 00:00 | 2017-01-01 | NY |
Starting with January, the major statutory holiday is New Year's Day on January 1. The plot below shows that bike share usage on New Year's Day is significantly lower than usual.
NY_comparison = trips_time_info[trips_time_info['Month']==1].groupby(['Holiday','Date']).size()\
.groupby(level=0).mean().reset_index(name='Trips Count')
fig,ax = plt.subplots(figsize=(5,4))
sns.barplot(ax=ax,x='Holiday',y='Trips Count',data=NY_comparison,color='goldenrod',order=['NY','NA'])
ax.set_title('Bike Share Usage on Holidays and Average Days in January')
plt.show()
Moving on to February, the major statutory holiday is Family Day on the third Monday. Similar to New Year's Day, bike share usage on Family Day is also much lower than average.
FD_comparison = trips_time_info[trips_time_info['Month']==2].groupby(['Holiday','Date']).size()\
.groupby(level=0).mean().reset_index(name='Trips Count')
fig,ax = plt.subplots(figsize=(5,4))
sns.barplot(ax=ax,x='Holiday',y='Trips Count',data=FD_comparison,color='skyblue',order=['FD','NA'])
ax.set_title('Bike Share Usage on Holidays and Average Days in February')
plt.show()
Good Friday is a little tricky, since its date depends on another holiday, Easter Sunday. In general, Good Friday falls either in late March or somewhere in April, so bike share usage on Good Friday is compared with the average daily usage in March and April combined. Usage on Good Friday is lower than average, but the gap between Good Friday and average days is much smaller than for New Year's Day and Family Day.
GF_comparison = trips_time_info[trips_time_info['Month'].isin([3,4])].groupby(['Holiday','Date']).size()\
.groupby(level=0).mean().reset_index(name='Trips Count')
fig,ax = plt.subplots(figsize=(5,4))
sns.barplot(ax=ax,x='Holiday',y='Trips Count',data=GF_comparison,color='orchid',order=['GF','NA'])
ax.set_title('Bike Share Usage on Holidays and Average Days in March and April')
plt.show()
Victoria Day is the last Monday before May 25, so bike share usage on Victoria Day is compared with the other days in May. Similar to Good Friday, Victoria Day experiences bike share usage slightly lower than average.
VD_comparison = trips_time_info[trips_time_info['Month']==5].groupby(['Holiday','Date']).size()\
.groupby(level=0).mean().reset_index(name='Trips Count')
fig,ax = plt.subplots(figsize=(5,4))
sns.barplot(ax=ax,x='Holiday',y='Trips Count',data=VD_comparison,color='yellowgreen',order=['VD','NA'])
ax.set_title('Bike Share Usage on Holidays and Average Days in May')
plt.show()
There is no statutory holiday in June, so July is the next subject. Canada Day is the first day of July, and unlike the other holidays, bike share usage on this day is almost the same as on average days.
CA_comparison = trips_time_info[trips_time_info['Month']==7].groupby(['Holiday','Date']).size()\
.groupby(level=0).mean().reset_index(name='Trips Count')
fig,ax = plt.subplots(figsize=(5,4))
sns.barplot(ax=ax,x='Holiday',y='Trips Count',data=CA_comparison,color='indianred',order=['CA','NA'])
ax.set_title('Bike Share Usage on Holidays and Average Days in July')
plt.show()
Labour Day is the first Monday in September. Bike share usage on Labour Day is lower than but very close to the daily average of the month.
LD_comparison = trips_time_info[trips_time_info['Month']==9].groupby(['Holiday','Date']).size()\
.groupby(level=0).mean().reset_index(name='Trips Count')
fig,ax = plt.subplots(figsize=(5,4))
sns.barplot(ax=ax,x='Holiday',y='Trips Count',data=LD_comparison,color='steelblue',order=['LD','NA'])
ax.set_title('Bike Share Usage on Holidays and Average Days in September')
plt.show()
Thanksgiving is the second Monday in October. The difference in bike share usage between Thanksgiving and average days is similar to Good Friday and Victoria Day.
TG_comparison = trips_time_info[trips_time_info['Month']==10].groupby(['Holiday','Date']).size()\
.groupby(level=0).mean().reset_index(name='Trips Count')
fig,ax = plt.subplots(figsize=(5,4))
sns.barplot(ax=ax,x='Holiday',y='Trips Count',data=TG_comparison,color='sandybrown',order=['TG','NA'])
ax.set_title('Bike Share Usage on Holidays and Average Days in October')
plt.show()
December has two statutory holidays, Christmas Day and Boxing Day, adjacent to each other at the end of the month. Bike share usage on these two holidays is significantly lower than the average, just like on the New Year's Day that follows.
CM_BD_comparison = trips_time_info[trips_time_info['Month']==12].groupby(['Holiday','Date']).size()\
.groupby(level=0).mean().reset_index(name='Trips Count')
fig,ax = plt.subplots(figsize=(7,4))
sns.barplot(ax=ax,x='Holiday',y='Trips Count',data=CM_BD_comparison,color='turquoise',order=['CM','BD','NA'])
ax.set_title('Bike Share Usage on Holidays and Average Days in December')
plt.show()
In conclusion, based on the current data set, almost all holidays see lower bike share usage than average days, though to different extents. The exception is Canada Day, where usage is marginally higher than on average days in July. New Year's Day, Family Day, Christmas Day, and Boxing Day have markedly lower demand, less than half of the average. Demand on Good Friday, Victoria Day, Labour Day, and Thanksgiving is also lower than the average, but still more than half of the demand on average days.
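The holiday comparisons above all repeat the same steps: filter by month, count trips per day, then average the daily counts per holiday code. Those steps could be wrapped in one helper, sketched below on made-up data. The function name holiday_comparison and the toy frame are assumptions for illustration; the helper expects Date, Month, and Holiday columns like those in trips_time_info at this point of the notebook.

```python
import pandas as pd

def holiday_comparison(trips, months):
    """Average daily trips per holiday code ('NA' = non-holiday) within the given months."""
    in_months = trips[trips['Month'].isin(months)]
    # Number of trips on each date, keyed by the holiday code of that date
    daily_counts = in_months.groupby(['Holiday', 'Date']).size()
    # Average the daily counts within each holiday code
    return daily_counts.groupby(level='Holiday').mean().reset_index(name='Trips Count')

# Made-up example: 2 trips on a holiday, then 4 and 2 trips on two ordinary days
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2019-07-01'] * 2 + ['2019-07-02'] * 4 + ['2019-07-03'] * 2),
    'Month': 7,
    'Holiday': ['CA'] * 2 + ['NA'] * 6,
})
result = holiday_comparison(toy, [7])
print(result)
```

With such a helper, the March and April case for Good Friday would only differ in the list of months passed in.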
Now that an overall view of the data set is established, this section further analyzes the pattern of daily bike share trips within a week. To make things easier, a Day of Week column is created in trips_time_info to show on which day of the week each trip occurred. Then, the daily average of bike share trips in a week between 2017 and 2020 is plotted. Considering the potential impact of the COVID-19 pandemic on bike share usage in 2020, the weekly pattern is plotted separately for each year with a function plot_trips_week.
trips_time_info = trips_time_info.set_index('Date')
trips_time_info['Day of Week'] = trips_time_info.index.day_name()
trips_time_info.head()
Start Time | End Time | Trip Duration (mins) | Year | Month | Day | Time | Holiday | Day of Week | |
---|---|---|---|---|---|---|---|---|---|
Date | |||||||||
2017-01-01 | 2016-12-31 23:43:00-05:00 | 2016-12-31 23:51:00-05:00 | 8.233333 | 2017 | 1 | 1 | 00:00 | NY | Sunday |
2017-01-01 | 2016-12-31 23:43:00-05:00 | 2016-12-31 23:50:00-05:00 | 7.083333 | 2017 | 1 | 1 | 00:00 | NY | Sunday |
2017-01-01 | 2016-12-31 23:44:00-05:00 | 2016-12-31 23:50:00-05:00 | 6.466667 | 2017 | 1 | 1 | 00:00 | NY | Sunday |
2017-01-01 | 2016-12-31 23:48:00-05:00 | 2017-01-01 00:02:00-05:00 | 14.183333 | 2017 | 1 | 1 | 00:00 | NY | Sunday |
2017-01-01 | 2016-12-31 23:48:00-05:00 | 2017-01-01 00:00:00-05:00 | 11.550000 | 2017 | 1 | 1 | 00:00 | NY | Sunday |
def plot_trips_week(year):
    """Plot the weekly pattern of bike share trip usage for the given year."""
    # Get the weekly pattern of bike share usage: count trips per date,
    # then average the daily counts for each day of the week
    cats = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
    weekly_pattern = trips_time_info[trips_time_info['Year']==year].groupby(['Day of Week','Date']).size().groupby(level=0)\
        .mean().reindex(cats).reset_index(name='Trips Count')
    # Plotting
    fig,ax = plt.subplots(figsize=(8,5))
    sns.barplot(ax=ax,x='Day of Week',y='Trips Count',data=weekly_pattern)
    ax.set_title('Average Daily Bike Share Trips in a Week in {}'.format(year))
    return
plot_trips_week(2017)
plot_trips_week(2018)
plot_trips_week(2019)
plot_trips_week(2020)
Another interesting pattern for 2020! The weekly bike share usage pattern is quite constant throughout the first three years, with lower demand on weekends than on weekdays. This pattern is completely reversed in 2020, with more demand on weekends than on weekdays. One possible explanation is that a portion of bike share users rode to commute before 2020 but had to switch to working from home in 2020, so bike share became a means of transport for weekend leisure rides.
To zoom in a bit further, the daily pattern of bike share usage between 2017 and 2020 is analyzed by plotting the hourly average of bike share trips in a day. Again, considering the impact of the pandemic in 2020, plots are created for each year separately with a function plot_trips_day. For this part, the usage of annual members and casual members is shown separately in the graphs to give a first look at the difference between user types. Therefore, a new DataFrame user_daily_trips is created.
user_daily_trips = trips_data[['Start Time','Year','Month','Day','User Type']].copy()
user_daily_trips['Date'] = pd.to_datetime(user_daily_trips[['Year','Month','Day']],format='%Y%m%d')
user_daily_trips.head()
Start Time | Year | Month | Day | User Type | Date | |
---|---|---|---|---|---|---|
0 | 2016-12-31 23:43:00-05:00 | 2017 | 1 | 1 | Annual Member | 2017-01-01 |
1 | 2016-12-31 23:43:00-05:00 | 2017 | 1 | 1 | Annual Member | 2017-01-01 |
2 | 2016-12-31 23:44:00-05:00 | 2017 | 1 | 1 | Annual Member | 2017-01-01 |
3 | 2016-12-31 23:48:00-05:00 | 2017 | 1 | 1 | Annual Member | 2017-01-01 |
4 | 2016-12-31 23:48:00-05:00 | 2017 | 1 | 1 | Annual Member | 2017-01-01 |
def plot_trips_day(year):
    """Plot the average hourly bike share trips by annual members and casual members for the given year."""
    # Obtain the hourly usage of bike share by each type of user
    daily_pattern = user_daily_trips[user_daily_trips['Year']==year].set_index('Start Time')\
        .groupby(pd.Grouper(level='Start Time',freq='H'))['User Type']\
        .agg(['count',lambda x:(x=='Annual Member').sum(),lambda x:(x=='Casual Member').sum()])
    daily_pattern.columns = ['rides','annual_members','casual_members']
    # Average the hourly counts by hour of day
    daily_pattern = daily_pattern.groupby(daily_pattern.index.hour).mean()
    # Plotting
    fig,ax = plt.subplots(figsize=(8,5))
    ax.plot(daily_pattern['annual_members'],label='Annual Members')
    ax.plot(daily_pattern['casual_members'],label='Casual Members')
    ax.set_xticks(np.arange(0,23,step=2))
    ax.set_xlabel('Hour of Day')
    ax.set_ylabel('Trips Count')
    ax.set_title('Average Hourly Bike Share Trips by Each Type of Users in {}'.format(year))
    ax.legend(loc="upper left")
    return
plot_trips_day(2017)
plot_trips_day(2018)
plot_trips_day(2019)
plot_trips_day(2020)
The power of COVID-19 strikes again! The hourly bike share trip patterns for 2017, 2018, and 2019 are almost identical. Annual member usage peaks around the morning and evening rush hours with a local maximum at lunch time, while casual member usage changes minimally throughout the day. This probably indicates that many annual members use bike share to commute. The highest peak for both annual members and casual members occurs during the evening rush hours. One possibility is that the evening rush is usually less time-sensitive than the morning rush (i.e. arriving home late is less of a concern than arriving late to work or school), and bikes are limited in speed. Some annual members may therefore take faster means of transport in the morning, while casual members may simply take a bike after work or school for an occasional casual ride.
This pattern changed drastically in 2020. In the graph for 2020, the usage peak around the morning rush hours is barely visible, which leaves the peak around the evening rush hours as the only major peak of the day. Another significant change is the rise in casual member usage: the peak hourly casual member usage jumps from about 150 in 2019 to almost 300 in 2020. This change likely reflects the way people lived and worked during the pandemic. The morning rush hour diminished, but people still wanted a casual trip after a day of working at home.
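The hourly averages discussed above come from a two-step aggregation: count trips per calendar hour and user type, then average those counts by hour of day. The sketch below reproduces the idea on a few made-up trips; the toy data is an assumption, and the grouping uses dt.floor and unstack rather than the pd.Grouper approach in plot_trips_day.

```python
import pandas as pd

# Made-up trips on two consecutive days (illustrative values only)
trips = pd.DataFrame({
    'Start Time': pd.to_datetime(['2019-06-03 08:10', '2019-06-03 08:40', '2019-06-03 17:05',
                                  '2019-06-04 08:20', '2019-06-04 17:15', '2019-06-04 17:45']),
    'User Type': ['Annual Member', 'Annual Member', 'Casual Member',
                  'Annual Member', 'Casual Member', 'Annual Member'],
})

# Step 1: count trips per calendar hour and user type
hour_bucket = trips['Start Time'].dt.floor('h')
hourly_counts = trips.groupby([hour_bucket, 'User Type']).size().unstack(fill_value=0)

# Step 2: average those counts by hour of day
hourly_average = hourly_counts.groupby(hourly_counts.index.hour).mean()
print(hourly_average)
```

One difference worth noting: this variant only sees hours with at least one trip, whereas grouping with a fixed hourly frequency also includes empty hours as zeros, which lowers the averages for quiet hours.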
Intuitively, it is reasonable to think that weather conditions could change how people travel. To explore this, the relationship between weather and bike share usage is discussed next. The original DataFrame trips_data contains some weather information. To reduce the size of the DataFrame, a new DataFrame trips_weather_info is created by copying the columns containing the weather information from trips_data. Considering the aspects of weather that may influence whether people choose to use the bike share system, this section looks at weather type, temperature, humidity, wind speed, and visibility.
trips_weather_info = trips_data[['Start Time','Year','Month','Day','Time',
'Weather','Temp (°C)','Rel Hum (%)','Wind Spd (km/h)','Visibility (km)']].copy()
trips_weather_info.head()
Start Time | Year | Month | Day | Time | Weather | Temp (°C) | Rel Hum (%) | Wind Spd (km/h) | Visibility (km) | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-12-31 23:43:00-05:00 | 2017 | 1 | 1 | 00:00 | Clear | 1.5 | 69.0 | 39.0 | 16.1 |
1 | 2016-12-31 23:43:00-05:00 | 2017 | 1 | 1 | 00:00 | Clear | 1.5 | 69.0 | 39.0 | 16.1 |
2 | 2016-12-31 23:44:00-05:00 | 2017 | 1 | 1 | 00:00 | Clear | 1.5 | 69.0 | 39.0 | 16.1 |
3 | 2016-12-31 23:48:00-05:00 | 2017 | 1 | 1 | 00:00 | Clear | 1.5 | 69.0 | 39.0 | 16.1 |
4 | 2016-12-31 23:48:00-05:00 | 2017 | 1 | 1 | 00:00 | Clear | 1.5 | 69.0 | 39.0 | 16.1 |
Here is a list of the weather types recorded in the data set and the number of trips that occurred under each of them.
weather = trips_weather_info.groupby('Weather').size()
weather
Weather
Clear                               7439574
Fog                                  143329
Freezing Rain                            73
Freezing Rain,Fog                       206
Freezing Rain,Snow                      251
Haze                                  32804
Haze,Blowing Snow                        82
Heavy Rain,Fog                         2111
Heavy Snow                               57
Moderate Rain                          6048
Moderate Rain,Fog                      6802
Moderate Snow                           304
Rain                                 233180
Rain,Fog                              57136
Rain,Snow                              2065
Snow                                  84656
Snow,Blowing Snow                       109
Thunderstorms                          7293
Thunderstorms,Fog                      1403
Thunderstorms,Heavy Rain               1237
Thunderstorms,Heavy Rain,Fog           4182
Thunderstorms,Moderate Rain            2444
Thunderstorms,Moderate Rain,Fog         152
Thunderstorms,Rain                    13392
Thunderstorms,Rain,Fog                  476
dtype: int64
Most bike share trips occurred under clear conditions, and this count is on a much larger scale than the others. To accommodate the difference in scale, all weather types other than Clear, Fog, Rain, and Snow are grouped into Others. A pie chart is then created to visualize the bike share trip counts under different types of weather.
# Weather types that keep their own category
main_types = ['Clear','Fog','Rain','Snow']
# Sum the trips under all non-listed weather types
Others = weather[~weather.index.isin(main_types)].sum()
# Keep the listed weather types and append the combined count
# (pd.concat is used because Series.append was removed in pandas 2.0)
weather_trip = pd.concat([weather[main_types], pd.Series({'Others': Others})])
weather_trip
Clear     7439574
Fog        143329
Rain       233180
Snow        84656
Others     138627
dtype: int64
# Prepare the variables for the chart
name = np.array(weather_trip.keys().tolist())
count = np.array(weather_trip.tolist())
percent = 100.*count/count.sum()
# Plot the pie chart
colors = ['tab:green','tab:orange','tab:blue','tab:red','tab:gray']
pie,texts = plt.pie(count,colors=colors,explode=[0.1]*len(weather_trip),startangle=90,radius=1.1)
# Add the legends
labels = ['{0} - {1:1.1f} %'.format(i,j) for i,j in zip(name,percent)]
plt.legend(pie,labels,loc='best', bbox_to_anchor=(-0.1, 1.),fontsize=10)
plt.title('Bike Share Usage under Different Types of Weather')
plt.show()
The pie chart shows an absolute dominance of bike share trips under clear conditions. However, it is not appropriate to conclude from this alone that weather type is correlated with bike share usage, as clear conditions are the most common weather one can encounter at an arbitrary time. To determine the relationship between weather type and bike share usage, this pie chart needs to be compared with the distribution over time of the different weather types in Toronto. To do so, trips_weather_info needs to be grouped by hour, as the weather data is recorded hourly.
# Group trips data by hours
trips_weather_hourly = trips_weather_info.groupby(pd.Grouper(key='Start Time',freq='H')).agg(Trips_Count=('Day','count')
,Weather=('Weather','first'),Temp=('Temp (°C)','max'),Rel_Hum=('Rel Hum (%)','max')
,Wind_Spd=('Wind Spd (km/h)','max'),Visibility=('Visibility (km)','max'))
# View the DataFrame
trips_weather_hourly.head()
Trips_Count | Weather | Temp | Rel_Hum | Wind_Spd | Visibility | |
---|---|---|---|---|---|---|
Start Time | ||||||
2016-12-31 23:00:00-05:00 | 9 | Clear | 1.5 | 69.0 | 39.0 | 16.1 |
2017-01-01 00:00:00-05:00 | 18 | Clear | 1.5 | 69.0 | 39.0 | 16.1 |
2017-01-01 01:00:00-05:00 | 13 | Clear | 1.5 | 68.0 | 35.0 | 16.1 |
2017-01-01 02:00:00-05:00 | 15 | Clear | 1.2 | 68.0 | 37.0 | 16.1 |
2017-01-01 03:00:00-05:00 | 10 | Clear | 1.3 | 67.0 | 37.0 | 16.1 |
# Get weather distribution
trips_weather_hourly = trips_weather_hourly.replace(to_replace=[x for x in trips_weather_hourly['Weather'].unique().tolist()
if x not in ('Clear','Fog','Rain','Snow')],value='Others')
weather_dist = trips_weather_hourly.groupby('Weather').size().reindex(index = ['Clear','Fog','Rain','Snow','Others'])
weather_dist
Weather
Clear     28179
Fog        1163
Rain       1648
Snow       1320
Others     1291
dtype: int64
# Prepare the variables for the chart
name = np.array(weather_dist.keys().tolist())
count = np.array(weather_dist.tolist())
percent = 100.*count/count.sum()
# Plot the pie chart
colors = ['tab:green','tab:orange','tab:blue','tab:red','tab:gray']
pie,texts = plt.pie(count,colors=colors,explode=[0.1]*len(weather_dist),startangle=90,radius=1.1)
# Add the legends
labels = ['{0} - {1:1.1f} %'.format(i,j) for i,j in zip(name,percent)]
plt.legend(pie,labels,loc='best', bbox_to_anchor=(-0.1, 1.),fontsize=10)
plt.title('Distribution of Different Weather Types')
plt.show()
The proportion of time with clear weather is actually lower than the proportion of bike share trips that occur under clear weather, while for the other four weather types the proportion of time is higher than the proportion of trips. This means that people tend to use the bike share system more often when the weather is clear than under other conditions. The data therefore suggests an association between weather and bike share usage.
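The comparison between the two pie charts can be made explicit by dividing each weather type's share of trips by its share of hours. The sketch below uses the trip counts and hour counts printed in the outputs above; the variable names are assumptions for illustration. A ratio above 1 means a weather type attracts more trips than its share of time would suggest.

```python
# Trip counts and hour counts per weather type, copied from the outputs above
trips_by_weather = {'Clear': 7439574, 'Fog': 143329, 'Rain': 233180,
                    'Snow': 84656, 'Others': 138627}
hours_by_weather = {'Clear': 28179, 'Fog': 1163, 'Rain': 1648,
                    'Snow': 1320, 'Others': 1291}

total_trips = sum(trips_by_weather.values())
total_hours = sum(hours_by_weather.values())

ratios = {}
for weather_type in trips_by_weather:
    trip_share = trips_by_weather[weather_type] / total_trips
    hour_share = hours_by_weather[weather_type] / total_hours
    # Ratio of the share of trips to the share of time for this weather type
    ratios[weather_type] = trip_share / hour_share
    print(f'{weather_type:<7} trips {trip_share:6.1%}  hours {hour_share:6.1%}  '
          f'ratio {ratios[weather_type]:.2f}')
```

Only Clear has a ratio above 1; the other four types fall below it. Note that this comparison does not control for season or time of day, so it shows an association rather than a causal effect.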
Bike share usage is likely related to temperature, because people tend to stay inside when the temperature is not comfortable for a bike ride. The distribution of trips over temperature is plotted below. Trips are most concentrated in the range between 15 °C and 25 °C.
fig,ax = plt.subplots(figsize=(9,5))
ax = sns.kdeplot(data=trips_weather_info,x='Temp (°C)',bw_adjust=2)
ax.set_title('Distribution of Bike Share Trips at Different Temperature')
ax.set_xlabel('Temperature (°C)')
ax.set_ylabel('Probability Density')
plt.show()
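A density of trips over temperature also reflects how often each temperature occurs. One way to separate the two effects, sketched here on synthetic data, is to bin the temperature and compute the average number of trips per hour in each bin. The `hourly` frame and its values are hypothetical; `Temp` and `Trips_Count` mirror the columns of `trips_weather_hourly`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic hourly data: temperatures spread evenly, trips peaking near 20 °C.
temp = rng.uniform(-15, 35, size=2000)
trips = rng.poisson(lam=10 + 90 * np.exp(-((temp - 20) / 8) ** 2))
hourly = pd.DataFrame({'Temp': temp, 'Trips_Count': trips})

# Bin temperature into 5-degree intervals and average the hourly trip count per bin.
bins = np.arange(-15, 40, 5)
hourly['Temp_bin'] = pd.cut(hourly['Temp'], bins=bins)
trips_per_hour = hourly.groupby('Temp_bin', observed=True)['Trips_Count'].mean()
print(trips_per_hour.round(1))
```

Because each bin is averaged over its own hours, a bin that occurs rarely is not penalized for being rare.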
Humidity is a weather parameter that affects how people feel heat. The higher the humidity, the harder it is for sweat on the skin to evaporate, and thus the harder it is for the body to shed excess heat. Put simply, humidity changes people's perception of the current temperature, so it is usually combined with temperature to give a "feels like" temperature. According to National Geographic, the human body is most comfortable at 45% relative humidity.
To check whether humidity plays a role in bike share usage, the distribution of trips over relative humidity is plotted below. Trips are most likely to take place when relative humidity is in the 60% to 80% range.
fig,ax = plt.subplots(figsize=(9,5))
ax = sns.kdeplot(data=trips_weather_info,x='Rel Hum (%)',bw_adjust=2)
ax.set_title('Distribution of Bike Share Trips at Different Relative Humidity Level')
ax.set_xlabel('Relative Humidity (%)')
ax.set_ylabel('Probability Density')
plt.show()
Wind speed can affect whether people are willing to travel by bike. When the wind speed is too high, air resistance is higher than normal, and it can be difficult to keep one's balance and ride safely and comfortably. The relationship between wind speed and bike share usage is plotted below.
fig,ax = plt.subplots(figsize=(9,5))
ax = sns.kdeplot(data=trips_weather_info,x='Wind Spd (km/h)',bw_adjust=4)
ax.set_title('Distribution of Bike Share Trips at Different Wind Speed')
ax.set_xlabel('Wind Speed (km/h)')
ax.set_ylabel('Probability Density')
plt.show()
Visibility is in general an important weather parameter for travelling safely. Low visibility due to fog or other causes can obscure vision and increase the risk of a traffic accident. A density plot of visibility and bike share usage is shown below. Toronto rarely experiences low visibility, so the probability density of bike share trips at high visibility is extremely high. However, this does not necessarily indicate that bike share usage is related to visibility.
fig,ax = plt.subplots(figsize=(9,5))
ax = sns.kdeplot(data=trips_weather_info,x='Visibility (km)',bw_adjust=2)
ax.set_title('Distribution of Bike Share Trips at Different Visibility')
ax.set_xlabel('Visibility (km)')
ax.set_ylabel('Probability Density')
plt.show()
In this section, we will look at any patterns or disparities in the neighbourhoods where users are taking trips. Questions to explore include:
First, import the shapefile for City of Toronto neighbourhood boundaries.
# Import City of Toronto neighbourhood boundaries.
neighbourhoods = gpd.read_file('toronto_neighbourhoods.shp')
# Drop all columns except the neighbourhood name and geometry. Change the name of the neighbourhood column to name.
neighbourhoods = neighbourhoods[['FIELD_8', 'geometry']].rename(columns={'FIELD_8': 'name'})
# Remove neighbourhood id from name.
neighbourhoods['name'] = neighbourhoods['name'].str.replace(r'\(.*?\)', '', regex=True).str.strip()
print('There are {} neighbourhoods'.format(len(neighbourhoods)))
neighbourhoods.head()
neighbourhoods.crs
There are 140 neighbourhoods
<Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
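The name clean-up relies on a regular expression that removes a parenthesized id. A minimal sketch of that step on its own, assuming raw names of the form "Name (id)"; the ids below are illustrative, not taken from the shapefile. From pandas 2.0 onward `str.replace` treats the pattern as a literal string unless `regex=True` is passed, so the flag is stated explicitly.

```python
import pandas as pd

# Hypothetical raw names in "Name (id)" form; the ids are made up for illustration.
raw = pd.Series(['Wychwood (94)', 'Yonge-Eglinton (100)', 'Yonge-St.Clair (97)'])

# The raw string keeps the backslashes literal; '.*?' matches lazily up to the first ')'.
clean = raw.str.replace(r'\(.*?\)', '', regex=True).str.strip()
print(clean.tolist())
```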
Import Toronto subway stations and the bicycle network as well.
# Import City of Toronto subway stations.
subway_stations = gpd.read_file('subway_stations.shp')
# Keep only the station name, line, and geometry columns.
subway_stations = subway_stations[['STATION', 'LINE', 'geometry']]
# Convert to coordinate system EPSG 4326.
subway_stations = subway_stations.to_crs(epsg=4326)
print('There are {} subway stations'.format(len(subway_stations)))
subway_stations.head()
There are 65 subway stations
STATION | LINE | geometry | |
---|---|---|---|
0 | Kipling | Bloor-Danforth | POINT (-79.53583 43.63734) |
1 | Islington | Bloor-Danforth | POINT (-79.52462 43.64537) |
2 | Royal York | Bloor-Danforth | POINT (-79.51133 43.64825) |
3 | Old Mill | Bloor-Danforth | POINT (-79.49510 43.65010) |
4 | Jane | Bloor-Danforth | POINT (-79.48446 43.64978) |
# Import bikeshare station data.
bikeshare_stations = pd.read_csv('bikeshare_stations.csv')
# Add geometry as a point from lat/lon
bikeshare_stations = gpd.GeoDataFrame(bikeshare_stations, geometry=gpd.points_from_xy(bikeshare_stations.lon, bikeshare_stations.lat))
# Set the coordinate system to EPSG 4326.
bikeshare_stations = bikeshare_stations.set_crs(epsg=4326)
print('There are {} bikeshare stations'.format(len(bikeshare_stations)))
bikeshare_stations.head()
There are 610 bikeshare stations
Station Id | Station Name | lat | lon | capacity | geometry | |
---|---|---|---|---|---|---|
0 | 7000 | Fort York Blvd / Capreol Ct | 43.639832 | -79.395954 | 35 | POINT (-79.39595 43.63983) |
1 | 7001 | Lower Jarvis St / The Esplanade | 43.647830 | -79.370698 | 15 | POINT (-79.37070 43.64783) |
2 | 7002 | St. George St / Bloor St W | 43.667333 | -79.399429 | 19 | POINT (-79.39943 43.66733) |
3 | 7003 | Madison Ave / Bloor St W | 43.667158 | -79.402761 | 15 | POINT (-79.40276 43.66716) |
4 | 7004 | University Ave / Elm St | 43.656518 | -79.389099 | 11 | POINT (-79.38910 43.65652) |
Generate a map of bikeshare and subway station locations in Toronto.
ax = neighbourhoods.plot(figsize = (15, 8), edgecolor='w', alpha=0.75)
bikeshare_stations.plot(ax = ax, color='green', edgecolor = 'k', label='Bike Stations')
subway_stations.plot(ax = ax, color='red', edgecolor='k', label='Subway Stations')
plt.legend(fontsize=16, loc=2)
plt.xlabel('Longitude (degrees)', fontsize=18)
plt.ylabel('Latitude (degrees)', fontsize=18)
plt.show()
Find the number of trips departing from and arriving at each bikeshare station and add them to new columns in 'bikeshare_stations'.
# For simplicity in this section, take only columns with location and geometry data from trips_data.
geo_data = trips_data[['Trip Id', 'Start Time', 'End Time', 'Trip Duration',
'Start Station Id', 'Start Station Name', 'End Station Id',
'End Station Name', 'User Type', 'Trip Duration (mins)', 'Start Point', 'End Point',
'merge_time', 'Station Name', 'Year', 'Month', 'Day', 'Time', 'geometry']]
geo_data.head()
Trip Id | Start Time | End Time | Trip Duration | Start Station Id | Start Station Name | End Station Id | End Station Name | User Type | Trip Duration (mins) | Start Point | End Point | merge_time | Station Name | Year | Month | Day | Time | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 712431 | 2016-12-31 23:43:00-05:00 | 2016-12-31 23:51:00-05:00 | 494 | 7163 | Yonge St / Wood St | 7634 | University Ave / Gerrard St W (West Side) | Annual Member | 8.233333 | (-79.3825, 43.6622222) | (-79.3899265, 43.6578449) | 2017-01-01 00:00:00-05:00 | TORONTO CITY CENTRE | 2017 | 1 | 1 | 00:00 | MULTIPOINT (-79.38250 43.66222, -79.38993 43.6... |
1 | 712432 | 2016-12-31 23:43:00-05:00 | 2016-12-31 23:50:00-05:00 | 425 | 7163 | Yonge St / Wood St | 7634 | University Ave / Gerrard St W (West Side) | Annual Member | 7.083333 | (-79.3825, 43.6622222) | (-79.3899265, 43.6578449) | 2017-01-01 00:00:00-05:00 | TORONTO CITY CENTRE | 2017 | 1 | 1 | 00:00 | MULTIPOINT (-79.38250 43.66222, -79.38993 43.6... |
2 | 712433 | 2016-12-31 23:44:00-05:00 | 2016-12-31 23:50:00-05:00 | 388 | 7163 | Yonge St / Wood St | 7634 | University Ave / Gerrard St W (West Side) | Annual Member | 6.466667 | (-79.3825, 43.6622222) | (-79.3899265, 43.6578449) | 2017-01-01 00:00:00-05:00 | TORONTO CITY CENTRE | 2017 | 1 | 1 | 00:00 | MULTIPOINT (-79.38250 43.66222, -79.38993 43.6... |
3 | 712435 | 2016-12-31 23:48:00-05:00 | 2017-01-01 00:02:00-05:00 | 851 | 7284 | University Ave / King St W - SMART | 7046 | Niagara St / Richmond St W | Annual Member | 14.183333 | (-79.3847928, 43.6477785) | (-79.409597, 43.64534) | 2017-01-01 00:00:00-05:00 | TORONTO CITY CENTRE | 2017 | 1 | 1 | 00:00 | MULTIPOINT (-79.38479 43.64778, -79.40960 43.6... |
4 | 712436 | 2016-12-31 23:48:00-05:00 | 2017-01-01 00:00:00-05:00 | 693 | 7070 | 25 York St – Union Station South | 7172 | Strachan Ave / Princes' Blvd | Annual Member | 11.550000 | (-79.380414, 43.643667) | (-79.4088888888889, 43.635) | 2017-01-01 00:00:00-05:00 | TORONTO CITY CENTRE | 2017 | 1 | 1 | 00:00 | MULTIPOINT (-79.38041 43.64367, -79.40889 43.6... |
# Functions that return the number of trips departing from and arriving at a given station
bikeshare_station_data = bikeshare_stations
def stn_departure_counter(station):
    departures = sum(geo_data['Start Station Id'] == station)
    return departures
def stn_arrival_counter(station):
    arrivals = sum(geo_data['End Station Id'] == station)
    return arrivals
# Add columns for the number of trips departing from and arriving at each bikeshare station
bikeshare_station_data['Departures'] = bikeshare_station_data['Station Id'].apply(stn_departure_counter)
bikeshare_station_data['Arrivals'] = bikeshare_station_data['Station Id'].apply(stn_arrival_counter)
# The five stations with the greatest number of departing trips.
print('The Stations with the greatest number of departing trips are')
bikeshare_station_data.sort_values(by = ['Departures'], ascending = False)[['Station Name', 'Departures']].head()
# View GeoDataFrame
bikeshare_station_data.head()
The Stations with the greatest number of departing trips are
Station Id | Station Name | lat | lon | capacity | geometry | Departures | Arrivals | |
---|---|---|---|---|---|---|---|---|
0 | 7000 | Fort York Blvd / Capreol Ct | 43.639832 | -79.395954 | 35 | POINT (-79.39595 43.63983) | 54239 | 50856 |
1 | 7001 | Lower Jarvis St / The Esplanade | 43.647830 | -79.370698 | 15 | POINT (-79.37070 43.64783) | 28791 | 34739 |
2 | 7002 | St. George St / Bloor St W | 43.667333 | -79.399429 | 19 | POINT (-79.39943 43.66733) | 42372 | 38269 |
3 | 7003 | Madison Ave / Bloor St W | 43.667158 | -79.402761 | 15 | POINT (-79.40276 43.66716) | 26538 | 22901 |
4 | 7004 | University Ave / Elm St | 43.656518 | -79.389099 | 11 | POINT (-79.38910 43.65652) | 23454 | 22748 |
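Counting trips with one full scan of the trip table per station becomes slow as the number of stations grows. An alternative that should give the same counts is to tally each station id once with `value_counts` and then look the totals up per station. The sketch below uses small hypothetical `trips` and `stations` frames that reuse the notebook's column names.

```python
import pandas as pd

# Hypothetical trips and stations, using the notebook's column names.
trips = pd.DataFrame({
    'Start Station Id': [7000, 7000, 7001, 7002, 7000],
    'End Station Id':   [7001, 7002, 7000, 7000, 7001],
})
stations = pd.DataFrame({'Station Id': [7000, 7001, 7002, 7003]})

# One pass over the trips per column, then a lookup per station.
departure_counts = trips['Start Station Id'].value_counts()
arrival_counts = trips['End Station Id'].value_counts()

# Stations that never appear in the trips get 0 instead of NaN.
stations['Departures'] = stations['Station Id'].map(departure_counts).fillna(0).astype(int)
stations['Arrivals'] = stations['Station Id'].map(arrival_counts).fillna(0).astype(int)
print(stations)
```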
Next, find the neighbourhoods with the largest number of total rides departing from bike stations located within them.
# Functions that return the total number of rides from bikeshare stations located within the neighbourhood.
neighbourhood_data = neighbourhoods
def departure_counter(neighbourhood):
    departures = sum(bikeshare_station_data.loc[bikeshare_station_data.geometry.apply(lambda x: x.within(neighbourhood)), 'Departures'])
    return departures
def arrival_counter(neighbourhood):
    arrivals = sum(bikeshare_station_data.loc[bikeshare_station_data.geometry.apply(lambda x: x.within(neighbourhood)), 'Arrivals'])
    return arrivals
# Add columns for the number of trips departing from and arriving in each neighbourhood
neighbourhood_data['Departures'] = neighbourhood_data.geometry.apply(departure_counter)
neighbourhood_data['Arrivals'] = neighbourhood_data.geometry.apply(arrival_counter)
neighbourhood_data.head()
name | geometry | Departures | Arrivals | |
---|---|---|---|---|
0 | Wychwood | POLYGON ((-79.43592 43.68015, -79.43492 43.680... | 45089 | 30790 |
1 | Yonge-Eglinton | POLYGON ((-79.41096 43.70408, -79.40962 43.704... | 28606 | 24093 |
2 | Yonge-St.Clair | POLYGON ((-79.39119 43.68108, -79.39141 43.680... | 6005 | 4631 |
3 | York University Heights | POLYGON ((-79.50529 43.75987, -79.50488 43.759... | 5292 | 5255 |
4 | Yorkdale-Glen Park | POLYGON ((-79.43969 43.70561, -79.44011 43.705... | 0 | 0 |
# Summarize the results
# The neighbourhoods with the greatest and least number of departing trips.
print('The neighbourhoods with the greatest number of departing trips are {}.'.format(', '.join(
neighbourhood_data.sort_values(by = ['Departures'], ascending = False)['name'][0:5].tolist())))
print('The neighbourhoods with the least number of departing trips are {}. \n'.format(', '.join(
neighbourhood_data.sort_values(by = ['Departures'])['name'][0:5].tolist())))
# The neighbourhoods with the greatest and least number of trips arriving.
print('The neighbourhoods with the greatest number of trips arriving are {}.'.format(', '.join(
neighbourhood_data.sort_values(by = ['Arrivals'], ascending = False)['name'][0:5].tolist())))
print('The neighbourhoods with the least number of trips arriving are {}. \n'.format(', '.join(
neighbourhood_data.sort_values(by = ['Arrivals'])['name'][0:5].tolist())))
# The number of neighbourhoods with active bikeshare stations.
print('There are {} neighbourhoods with active bikeshare stations.'.format(
sum((neighbourhood_data['Departures'] > 0) | (neighbourhood_data['Arrivals'] > 0))))
The neighbourhoods with the greatest number of departing trips are Waterfront Communities-The Island, Bay Street Corridor, Church-Yonge Corridor, Kensington-Chinatown, Niagara.
The neighbourhoods with the least number of departing trips are Willowdale West, Black Creek, Birchcliffe-Cliffside, Bendale, Beechborough-Greenbrook.

The neighbourhoods with the greatest number of trips arriving are Waterfront Communities-The Island, Bay Street Corridor, Church-Yonge Corridor, Kensington-Chinatown, Niagara.
The neighbourhoods with the least number of trips arriving are Willowdale West, Black Creek, Birchcliffe-Cliffside, Bendale, Beechborough-Greenbrook.

There are 71 neighbourhoods with active bikeshare stations.
Neighbourhoods with frequent departures also tend to have frequent arrivals, so for the purposes of visualization only departures are used to represent activity at each station. Visualize the number of departing trips in each neighbourhood on a map.
# Create a GeoDataFrame with the geometry, using the neighbourhood name as index.
plot_geography = neighbourhood_data[['name', 'geometry']]
plot_geography = plot_geography.set_index('name')
# Base map
map_1 = folium.Map(location=[43.720, -79.3871], tiles='cartodbpositron', zoom_start=11)
# Add a choropleth map to the base map
Choropleth(geo_data = plot_geography.__geo_interface__,
columns = ['name', 'Departures'],
data = neighbourhood_data,
key_on='feature.id',
fill_color='YlOrRd',
legend_name='Bicycle Trip Departures'
).add_to(map_1)
# Save to a file
map_1.save("Neighbourhood Departures.html")
# Display the map
map_1
The large majority of trips are concentrated in a few neighbourhoods in downtown Toronto. The large majority of bikeshare stations are also located in the downtown core, with a much lower density of stations elsewhere.
In this section, the impact of proximity to public transit on ridership at bikeshare stations will be analyzed. Questions to explore include:
Subway Station Access
First, look at proximity to TTC subway stations to see if there is a relationship with bikeshare station ridership.
Plot bikeshare stations on a map.
# Plot bikeshare stations on the map, first as individual markers and then using the MarkerCluster plugin
# Create base map
map_2 = folium.Map(location=[43.7100, -79.3871], tiles='cartodbpositron', zoom_start=11)
# Add points to the map
for idx, row in bikeshare_station_data.to_crs(epsg=4326).iterrows():
    Marker([row.geometry.y, row.geometry.x]).add_to(map_2)
# Cluster points on new map
map_3 = folium.Map(location=[43.7100, -79.3871], tiles='cartodbpositron', zoom_start=11)
mc = MarkerCluster()
for idx, row in bikeshare_station_data.to_crs(epsg=4326).iterrows():
    # Add each marker to the cluster only; the cluster is then added to map_3.
    mc.add_child(Marker([row.geometry.y, row.geometry.x]))
map_3.add_child(mc)
# Display map
map_3
Bikeshare stations will be considered near to TTC stations if they are within 200 meters, to account for the TTC standard of 300-400 meter stop spacing.
First, create a 200 meter buffer around each bikeshare station. Then create a new column in subway_stations_data that is true if the subway station is within 200 meters of a bikeshare station.
# Create a 200m buffer around each bikeshare station.
bikeshare_stations_buffer = bikeshare_station_data.to_crs(epsg=26917).geometry.buffer(200)
# Combine all buffers into one multipolygon for simplicity.
bike_station_union = bikeshare_stations_buffer.geometry.unary_union
# Convert to GeoDataFrame
bike_station_union = gpd.GeoDataFrame(geometry=[bike_station_union], crs='EPSG:26917')
# Create a column for each subway station that indicates if it has bikeshare station access within 200m.
subway_stations_data = subway_stations.to_crs(epsg=26917)
subway_stations_data['Bikeshare Access'] = subway_stations_data.geometry.apply(
lambda x: bike_station_union.geometry.iloc[0].contains(x))
subway_stations_data.head()
STATION | LINE | geometry | Bikeshare Access | |
---|---|---|---|---|
0 | Kipling | Bloor-Danforth | POINT (618101.613 4832636.300) | False |
1 | Islington | Bloor-Danforth | POINT (618990.613 4833544.113) | False |
2 | Royal York | Bloor-Danforth | POINT (620056.496 4833882.764) | False |
3 | Old Mill | Bloor-Danforth | POINT (621361.678 4834111.901) | False |
4 | Jane | Bloor-Danforth | POINT (622220.664 4834091.381) | True |
# Find percent of subway stations with bikeshare access.
print('{} % of subways stations are within 200 meters of a bike station.'.format(
100*sum(subway_stations_data['Bikeshare Access']) / len(subway_stations_data['Bikeshare Access'])))
67.6923076923077 % of subways stations are within 200 meters of a bike station.
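As a rough cross-check on the buffer result, the 200 meter test can also be approximated directly from latitude and longitude with the haversine great-circle distance. This is a different technique from the projected buffer used in the notebook, and the coordinates and the `haversine_m` helper below are hypothetical.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (an approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

# Hypothetical coordinates: two subway stations and two bikeshare stations.
subway_lat = np.array([43.6500, 43.7000])
subway_lon = np.array([-79.4000, -79.5000])
bike_lat = np.array([43.6510, 43.6600])
bike_lon = np.array([-79.4000, -79.3900])

# Broadcasting builds a (subway x bike) matrix of distances in metres.
dist = haversine_m(subway_lat[:, None], subway_lon[:, None],
                   bike_lat[None, :], bike_lon[None, :])

# A subway station has access if any bikeshare station is within 200 m.
has_access = (dist <= 200).any(axis=1)
print(has_access)
```

The distance matrix grows with the product of the two station counts, which is manageable for a few hundred stations but would call for a spatial index on much larger inputs.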
Plot on a map of Toronto.
# Create a map showing stations with bikeshare access.
map_4 = folium.Map(location=[43.6426, -79.3871],
tiles='cartodbpositron',
zoom_start=10)
# Plot each polygon on the map
GeoJson(bike_station_union.to_crs(epsg=4326)).add_to(map_4)
# Add points to the map
for idx, row in subway_stations_data.to_crs(epsg=4326).iterrows():
    if row['Bikeshare Access']:
        Marker([row.geometry.y, row.geometry.x],
               icon=folium.Icon(color='green'),
               popup=row['STATION']).add_to(map_4)
    else:
        Marker([row.geometry.y, row.geometry.x],
               icon=folium.Icon(color='red'),
               popup=row['STATION']).add_to(map_4)
# Show the map
map_4
Subway stations located within central and downtown Toronto all had easy access to bikeshare stations.
Similarly, identify bikeshare stations located within 200 meters of a subway station to see whether subway access is associated with higher ridership at those bikeshare stations.
# Create a 200m buffer around each subway station.
subway_stations_buffer = subway_stations_data.to_crs(epsg=26917).geometry.buffer(200)
# Combine all buffers into one multipolygon for simplicity.
subway_stations_union = subway_stations_buffer.geometry.unary_union
# Convert to GeoDataFrame
subway_stations_union = gpd.GeoDataFrame(geometry=[subway_stations_union], crs='EPSG:26917')
# Create a column for each bikeshare station that indicates if it has subway station access within 200m.
bikeshare_station_data = bikeshare_station_data.to_crs(epsg=26917)
bikeshare_station_data['Subway Access'] = bikeshare_station_data.geometry.apply(
lambda x: subway_stations_union.geometry.iloc[0].contains(x))
bikeshare_station_data.head()
Station Id | Station Name | lat | lon | capacity | geometry | Departures | Arrivals | Subway Access | |
---|---|---|---|---|---|---|---|---|---|
0 | 7000 | Fort York Blvd / Capreol Ct | 43.639832 | -79.395954 | 35 | POINT (629379.266 4833121.140) | 54239 | 50856 | False |
1 | 7001 | Lower Jarvis St / The Esplanade | 43.647830 | -79.370698 | 15 | POINT (631398.949 4834049.105) | 28791 | 34739 | False |
2 | 7002 | St. George St / Bloor St W | 43.667333 | -79.399429 | 19 | POINT (629040.059 4836170.077) | 42372 | 38269 | True |
3 | 7003 | Madison Ave / Bloor St W | 43.667158 | -79.402761 | 15 | POINT (628771.800 4836145.462) | 26538 | 22901 | True |
4 | 7004 | University Ave / Elm St | 43.656518 | -79.389099 | 11 | POINT (629896.208 4834985.048) | 23454 | 22748 | True |
# Find the number and percent of bikeshare stations with subway access.
print('There are {} bikeshare stations within 200 meters of a subway station.'.format(sum(bikeshare_station_data['Subway Access'])))
print('{} % of bikeshare stations have subway access.'.format(
100*sum(bikeshare_station_data['Subway Access']) / len(bikeshare_station_data['Subway Access'])))
There are 102 bikeshare stations within 200 meters of a subway station.
16.721311475409838 % of bikeshare stations have subway access.
Again, departures and arrivals at each station are generally similar, so departures are used to compare ridership between stations. We can confirm this assumption by plotting departure and arrival trips at different bikeshare stations.
sns.lmplot(x = 'Departures', y = 'Arrivals', hue = 'Subway Access', data=bikeshare_station_data)
Departures and arrivals at each station are generally close to equal, although a few stations deviate from this norm. Certain stations, particularly those without subway access, saw significantly more departures than arrivals. Nonetheless, proximity to a subway station generally did not meaningfully change arrivals relative to departures, likely because many riders make "2-way" commute trips.
Next, compare the number of trips departing from bikeshare stations with subway access to the number departing from stations without.
# Plot the distribution of total departures per station, split by subway access.
# sns.distplot is deprecated, so sns.histplot with a KDE overlay is used instead.
f, ax = plt.subplots(figsize = (6, 5))
ax = sns.histplot(bikeshare_station_data.loc[bikeshare_station_data['Subway Access'] == True, 'Departures'], kde=True, stat='density', label = 'Subway Access', ax = ax)
ax = sns.histplot(bikeshare_station_data.loc[bikeshare_station_data['Subway Access'] == False, 'Departures'], kde=True, stat='density', label = 'No Subway Access', ax = ax)
ax.set_title('Distribution of Station Departures by Subway Access', size = 14)
ax.set_xlabel('Total Departures', size = 14)
ax.set_ylabel('Probability Density', size = 14)
ax.set(xlim=(0, 90000))
ax.legend()
# Departures holds each station's total over the whole period, so these are average totals per station, not daily averages.
print('The average number of total trips departing from bikeshare stations with subway access is {}.'.format(bikeshare_station_data.loc[bikeshare_station_data['Subway Access'] == True, 'Departures'].mean()))
print('The average number of total trips departing from bikeshare stations without subway access is {}.'.format(bikeshare_station_data.loc[bikeshare_station_data['Subway Access'] == False, 'Departures'].mean()))
The average number of total trips departing from bikeshare stations with subway access is 15698.960784313726.
The average number of total trips departing from bikeshare stations without subway access is 12673.370078740158.
Among bikeshare stations without subway access, a higher proportion of stations have few departing trips. In other words, they are more likely to be comparatively underutilized. Additionally, the bikeshare stations that have historically been the most popular are those with subway access.
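The claim about underutilized stations can be quantified by fixing a cut-off for "few departures" and computing the share of stations below it in each group. The sketch below is one way to do that; the lower-quartile cut-off, the `Low Usage` column, and all values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stations with the notebook's 'Subway Access' and 'Departures' columns.
stations = pd.DataFrame({
    'Subway Access': [True, True, True, False, False, False, False, False],
    'Departures': [42000, 26000, 9000, 54000, 3000, 1200, 800, 15000],
})

# Assumed cut-off: the lower quartile of departures across all stations.
threshold = stations['Departures'].quantile(0.25)
stations['Low Usage'] = stations['Departures'] <= threshold

# The mean of a boolean column is the share of True values in each group.
summary = stations.groupby('Subway Access').agg(
    stations=('Departures', 'size'),
    median_departures=('Departures', 'median'),
    low_usage_share=('Low Usage', 'mean'))
print(summary)
```

The median is reported alongside the share because a handful of very busy stations can pull the mean far from a typical station.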
Explore whether proximity to TTC stops (including bus and streetcar stops) has an impact on ridership at bikeshare stations. TTC stop data is available at https://open.toronto.ca/dataset/ttc-routes-and-schedules/, specifically 'stops.txt', which contains information on stop locations.
# Import TTC stop data.
ttc_stations = pd.read_csv('stops.txt')
# Add geometry as a point from lat/lon
ttc_stations = gpd.GeoDataFrame(ttc_stations, geometry = gpd.points_from_xy(ttc_stations.stop_lon, ttc_stations.stop_lat))
# Drop empty columns
ttc_stations = ttc_stations[['stop_id', 'stop_code', 'stop_name', 'stop_lat', 'stop_lon', 'wheelchair_boarding', 'geometry']]
# Set the coordinate system to EPSG 4326.
ttc_stations = ttc_stations.set_crs(epsg=4326)
print('There are {} TTC stations'.format(len(ttc_stations)))
ttc_stations.head()
There are 9496 TTC stations
stop_id | stop_code | stop_name | stop_lat | stop_lon | wheelchair_boarding | geometry | |
---|---|---|---|---|---|---|---|
0 | 262 | 662 | DANFORTH RD AT KENNEDY RD | 43.714379 | -79.260939 | 2 | POINT (-79.26094 43.71438) |
1 | 263 | 929 | DAVENPORT RD AT BEDFORD RD | 43.674448 | -79.399659 | 1 | POINT (-79.39966 43.67445) |
2 | 264 | 940 | DAVENPORT RD AT DUPONT ST | 43.675511 | -79.401938 | 2 | POINT (-79.40194 43.67551) |
3 | 265 | 1871 | DAVISVILLE AVE AT CLEVELAND ST | 43.702088 | -79.378112 | 1 | POINT (-79.37811 43.70209) |
4 | 266 | 11700 | DISCO RD AT ATTWELL DR | 43.701362 | -79.594843 | 1 | POINT (-79.59484 43.70136) |
Plot these stations on a map of Toronto.
ax = neighbourhoods.plot(figsize = (15, 8), edgecolor='w', alpha=0.75)
ttc_stations.plot(ax = ax, color='red', label='TTC Stops', markersize = 2)
bikeshare_stations.plot(ax = ax, color='green', edgecolor = 'k', label='Bike Stations', markersize = 10)
plt.legend(fontsize=16, loc=2)
plt.xlabel('Longitude (degrees)', fontsize=18)
plt.ylabel('Latitude (degrees)', fontsize=18)
plt.show()
Determine if each bikeshare station is located within 200 meters of a TTC stop. Following a similar procedure as in the previous section, create a buffer around each TTC stop.
# Create a 200m buffer around each TTC stop.
ttc_stations_buffer = ttc_stations.to_crs(epsg=26917).geometry.buffer(200)
# Combine all buffers into one multipolygon for simplicity.
ttc_stations_union = ttc_stations_buffer.geometry.unary_union
# Convert to GeoDataFrame
ttc_stations_union = gpd.GeoDataFrame(geometry=[ttc_stations_union], crs='EPSG:26917')
# See if each bikeshare station has TTC stop access.
bikeshare_station_data['TTC Access'] = bikeshare_station_data.geometry.apply(
lambda x: ttc_stations_union.geometry.iloc[0].contains(x))
print('There are {} bikeshare stations with TTC access.'.format(sum(bikeshare_station_data['TTC Access'])))
print('{} % of bikeshare stations have TTC access'.format(
100 * sum(bikeshare_station_data['TTC Access']) / len(bikeshare_station_data['TTC Access'])))
There are 545 bikeshare stations with TTC access.
89.34426229508196 % of bikeshare stations have TTC access
As with subway access, compare trips from bikeshare stations with TTC access to those without. Bikeshare stations with TTC access saw a higher average number of departures, although the difference between the two distributions was much less pronounced. In other words, subway station access was found to be a stronger influence on bikeshare station popularity than TTC access.
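The TTC access comparison is described here without its code. A minimal sketch of how it could be computed, assuming a frame with the boolean `TTC Access` column and the total `Departures` column built earlier; the values are hypothetical and the departures are totals per station over the whole period.

```python
import pandas as pd

# Hypothetical stations; column names follow bikeshare_station_data.
stations = pd.DataFrame({
    'TTC Access': [True, True, True, True, False, False],
    'Departures': [54000, 28000, 12000, 6000, 9000, 3000],
})

# Boolean masks select each group; '~' negates the mask.
has_ttc = stations['TTC Access']
with_access = stations.loc[has_ttc, 'Departures'].mean()
without_access = stations.loc[~has_ttc, 'Departures'].mean()

print('Mean departures with TTC access: {:.0f}'.format(with_access))
print('Mean departures without TTC access: {:.0f}'.format(without_access))
```

With roughly 89% of stations having TTC access, the group without access is small, so its mean is sensitive to individual stations.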
In this section, explore some of the programs and initiatives run by Bike Share Toronto to determine their popularity and success in attracting riders. These programs include:
User Type impacts on usage behaviour
Is there a difference in usage behaviour between Casual and Annual Member riders? First, look at the overall trend in daily ridership over this period by making a new DataFrame where each row corresponds to one day, with columns for the number of total trips, annual member trips, and casual member trips.
# Group so each row corresponds to a day. Create a new DataFrame with the count of total, annual, and casual rides taken each day.
daily_data = trips_data.groupby(trips_data['Start Time'].dt.floor('D')).agg(
rides = ('Trip Id', 'count'),
Annual_members = ('User Type', lambda x : sum(x == 'Annual Member')),
Casual_members = ('User Type', lambda x : sum(x == 'Casual Member')))
daily_data.head()
rides | Annual_members | Casual_members | |
---|---|---|---|
Start Time | |||
2016-12-31 00:00:00-05:00 | 9 | 9 | 0 |
2017-01-01 00:00:00-05:00 | 482 | 412 | 70 |
2017-01-02 00:00:00-05:00 | 826 | 756 | 70 |
2017-01-03 00:00:00-05:00 | 871 | 853 | 18 |
2017-01-04 00:00:00-05:00 | 1395 | 1361 | 34 |
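The same daily table can also be built with `pd.crosstab`, which counts rows for each combination of day and user type without a lambda per column. The sketch below uses a small hypothetical `trips` frame with the notebook's column names and, for simplicity, timestamps without a time zone.

```python
import pandas as pd

# Hypothetical trips; column names follow trips_data, timestamps are timezone-naive.
trips = pd.DataFrame({
    'Trip Id': [1, 2, 3, 4, 5],
    'Start Time': pd.to_datetime([
        '2017-01-01 08:15', '2017-01-01 17:40', '2017-01-01 18:05',
        '2017-01-02 09:00', '2017-01-02 12:30']),
    'User Type': ['Annual Member', 'Casual Member', 'Annual Member',
                  'Annual Member', 'Annual Member'],
})

# Count trips per day and user type in one step.
day = trips['Start Time'].dt.floor('D')
daily = pd.crosstab(day, trips['User Type'])

# Keep both user types as columns even if one never occurs in the data.
daily = daily.reindex(columns=['Annual Member', 'Casual Member'], fill_value=0)
daily['rides'] = daily.sum(axis=1)
print(daily)
```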
The total count of trips per day is increasing over time, with peaks in ridership during the summer months. When we plot the trips taken each day separated by user type, we can see that annual and casual members are both increasing in ridership over this period.
# Take the rolling mean for the previous 7 days to simplify plotting.
data = daily_data.rolling(7).mean()
#plt.figure()
fig, ax = plt.subplots(figsize = (8, 5))
ax = sns.lineplot(data = data, dashes = False)
ax.set_title('Daily Ridership from 2017 to 2020', size = 14)
ax.set_xlabel('Date-Time', size = 14)
ax.set_ylabel('Daily Rides', size = 14)
ax.legend(['Total Rides', 'Annual members', 'Casual members'])
plt.show()
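A 7-day rolling mean needs seven observations before it produces a value, so the first six days of the smoothed series are NaN and do not appear in a plot. The sketch below shows this on a hypothetical `daily` series, along with the `min_periods` option that fills the start of the series from shorter windows.

```python
import pandas as pd

# Hypothetical daily ride counts for eight consecutive days.
daily = pd.Series([10, 20, 30, 40, 50, 60, 70, 80],
                  index=pd.date_range('2017-01-01', periods=8, freq='D'),
                  name='rides')

# Default: a value appears only once the window holds 7 observations.
strict = daily.rolling(7).mean()

# min_periods=1 averages over however many days are available so far.
lenient = daily.rolling(7, min_periods=1).mean()

print(pd.DataFrame({'rides': daily, 'rolling_7': strict, 'rolling_7_min1': lenient}))
```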
The increase is more obvious when seasonal fluctuations are ignored. Plot the total annual number of trips by user type to see if Annual and Casual members are both increasing in ridership.
# Look at the yearly total rides by user type.
# Group by year of start time to find total annual trips.
yearly_data = daily_data.groupby(daily_data.index.year).sum()
# Drop incomplete data from 2016.
yearly_data = yearly_data[yearly_data.index != 2016]
yearly_data.head()
rides | Annual_members | Casual_members | |
---|---|---|---|
Start Time | |||
2017 | 1392020 | 1142621 | 249399 |
2018 | 1840223 | 1556369 | 283854 |
2019 | 2338532 | 1834708 | 503824 |
2020 | 2468582 | 1633170 | 835412 |
# Plot yearly data.
fig, ax = plt.subplots(figsize = (7, 5))
ax = sns.lineplot(data = yearly_data, dashes = False, linewidth = 2)
ax.set_title('Annual Ridership from 2017 to 2020', size = 14)
ax.set_xlabel('Year', size = 14)
ax.set_ylabel('Total Rides', size = 14)
ax.legend(['Total Rides', 'Annual members', 'Casual members'])
plt.show()
From the yearly data, the increase in total ridership over time is more obvious. However, 2020 saw a decrease in trips made by annual members. Given that overall ridership continued to increase, trips made by casual members must have increased significantly in the past year. This may reflect how travel patterns changed due to COVID-19 restrictions. Annual members tend to make commute trips to work, and as fewer people are commuting regularly to their place of work, it follows that annual member rides decreased. People are still choosing to take bike trips with a casual membership, indicating that their usage is likely more sporadic.
2020 data only goes up to the end of October, so actual ridership for 2020 is expected to be even higher than shown in the chart here.
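Because the final year is incomplete, a like-for-like comparison restricts every year to the months that the last year covers. The sketch below illustrates the idea on a hypothetical `daily` frame with constant ride counts; it assumes the data ends on the last day of a month.

```python
import pandas as pd

# Hypothetical daily rides: 100 per day in 2019, 120 per day in 2020, ending 31 October 2020.
days = pd.date_range('2019-01-01', '2020-10-31', freq='D')
daily = pd.DataFrame({'rides': 100}, index=days)
daily.loc[daily.index.year == 2020, 'rides'] = 120

# Naive yearly totals compare a full 2019 with a partial 2020.
full_year = daily.groupby(daily.index.year)['rides'].sum()

# Like-for-like totals keep only the months covered by the final year.
last_month = daily.index[-1].month
same_period = daily[daily.index.month <= last_month]
like_for_like = same_period.groupby(same_period.index.year)['rides'].sum()

print(full_year)
print(like_for_like)
```

In this toy data the naive totals look almost flat, while the January to October totals show the 20% increase that was built in.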
Trip Duration
Next, see if user type resulted in any differences in trip duration. Plot the distribution of trip durations by user type, and find the mean trip durations for each.
# Plot distribution of trip duration, split by user type.
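# sns.distplot is deprecated, so sns.histplot with a KDE overlay is used instead.
daily_rides = sns.histplot(trips_data.loc[trips_data['User Type'] == 'Casual Member', 'Trip Duration (mins)'], kde=True, stat='density', label = 'Casual')
daily_rides = sns.histplot(trips_data.loc[trips_data['User Type'] == 'Annual Member', 'Trip Duration (mins)'], kde=True, stat='density', label = 'Annual')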
daily_rides.set_title('Distribution of Trip Duration by User Type')
daily_rides.set_xlabel('Trip Duration (Minutes)')
daily_rides.set_ylabel('Probability Density')
daily_rides.legend()
# Find average trip durations.
mean_duration_casual = trips_data[trips_data['User Type'] == 'Casual Member']['Trip Duration (mins)'].mean()
mean_duration_annual = trips_data[trips_data['User Type'] == 'Annual Member']['Trip Duration (mins)'].mean()
print('Casual Members mean trip duration is {} minutes'.format(mean_duration_casual))
print('Annual Members mean trip duration is {} minutes'.format(mean_duration_annual))
Casual Members mean trip duration is 17.989849864716618 minutes Annual Members mean trip duration is 11.586441197600228 minutes
Annual members in general take shorter trips than casual members, with a mean duration of 11.6 minutes. Casual members take trips with a mean duration of 18.0 minutes. Additionally, casual members have a larger spread in trip duration. Given that annual members tend to be commuters, it is reasonable that their trip durations tend to be both shorter and more consistent.
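The difference in spread can be put into numbers by reporting the median and standard deviation next to the mean for each user type. The sketch below does this on a hypothetical `trips` frame; the durations are made up and only the column names follow `trips_data`.

```python
import pandas as pd

# Hypothetical trip durations in minutes for each user type.
trips = pd.DataFrame({
    'User Type': ['Annual Member'] * 5 + ['Casual Member'] * 5,
    'Trip Duration (mins)': [6, 9, 11, 12, 17, 8, 12, 18, 25, 47],
})

# Mean, median, and standard deviation of trip duration per user type.
summary = trips.groupby('User Type')['Trip Duration (mins)'].agg(['mean', 'median', 'std'])
print(summary.round(2))
```

A mean well above the median, as for the casual group in this toy data, indicates a long right tail of lengthy trips.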
Weekdays vs. Weekends
Next, look at the days of the week on which members take rides.
# Add column that is True if the day is a weekday (Monday = 0, ..., Friday = 4).
daily_data['workday'] = daily_data.index.weekday < 5
# Plot working and non working days for annual and casual members.
f, ax = plt.subplots(figsize = (10, 5))
working_days = sns.scatterplot(x = 'Casual_members', y = 'Annual_members', hue = 'workday', size = 'rides', sizes = (2, 60), alpha = 0.7, data = daily_data, ax = ax)
# Set labels and format
working_days.set_title('Comparison of casual members and annual members on working and non-working days')
working_days.set_xlabel('Casual Membership')
working_days.set_ylabel('Annual Membership')
plt.xlim(0, )
working_days.legend()
Annual members make more trips on workdays than on weekends. On the other hand, there are many workdays with few casual member trips, and there is no significant difference between casual trips taken on workdays and on weekends.
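The workday comparison depends on the pandas weekday convention, in which Monday is 0 and Sunday is 6, so Monday to Friday corresponds to `weekday < 5`. The sketch below shows the flag and a grouped mean on two synthetic weeks; the ride counts are invented, and the flag treats statutory holidays as workdays.

```python
import pandas as pd

# Two full weeks starting on a Monday (2019-07-01 was a Monday); counts are invented.
idx = pd.date_range('2019-07-01', periods=14, freq='D')
demo = pd.DataFrame({
    'Annual_members': [900, 950, 940, 930, 910, 400, 380] * 2,
    'Casual_members': [300, 280, 310, 290, 350, 600, 620] * 2,
}, index=idx)

# Monday..Friday map to 0..4, so Saturday (5) and Sunday (6) are excluded.
demo['workday'] = demo.index.weekday < 5

# Average daily trips per user type on workdays versus weekends.
means = demo.groupby('workday')[['Annual_members', 'Casual_members']].mean()
print(means)
```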
Because the previous sections showed that COVID-19 restrictions affected ridership in 2020, plot daily trips for 2019 and 2020 separately. In 2019, annual member trips show a clear distinction between workdays and weekends, with workdays tending to have much higher ridership.
# Plot 2019
f, ax = plt.subplots(figsize = (10, 5))
working_days = sns.scatterplot(x = 'Casual_members', y = 'Annual_members', hue = 'workday', size = 'rides', sizes = (2, 60), alpha = 1.0,
data = daily_data[(daily_data.index.year == 2019)],
ax = ax)
# Set labels and format
working_days.set_title('Comparison of casual members and annual members on working and non-working days in 2019')
working_days.set_xlabel('Casual Membership')
working_days.set_ylabel('Annual Membership')
plt.xlim(0, )
working_days.legend()
Compared to 2019, 2020 shows less of a distinction between workdays and weekends, likely due to the impact of the pandemic on commuter trips taken.
# Plot 2020
f, ax = plt.subplots(figsize = (10, 5))
working_days = sns.scatterplot(x = 'Casual_members', y = 'Annual_members', hue = 'workday', size = 'rides', sizes = (2, 60), alpha = 1.0,
data = daily_data[(daily_data.index.year == 2020)],
ax = ax)
# Set labels and format
working_days.set_title('Comparison of casual members and annual members on working and non-working days in 2020')
working_days.set_xlabel('Casual Membership')
working_days.set_ylabel('Annual Membership')
plt.xlim(0, )
# Let seaborn build the legend from the hue values, as in the 2019 plot, so labels cannot be mismatched.
working_days.legend()
Next, we will look at the Free Ride Wednesday (FRW) initiative. Free Ride Wednesdays give customers unlimited 30-minute trips. The program was run in July 2017, June 2018, August 2019, and September 2020. To determine the success of the program we will look at the number of trips taken on Wednesdays of FRW months compared to the number of trips taken on other weekdays. We will compare this to non-FRW months to see if there is an increase in ridership.
# Take daily_data from the previous section. Add a column for the day of the week, where Monday = 0. Add another column that is True if the FRW initiative was running on that date.
daily_data['Weekday'] = daily_data.index.weekday
daily_data['FRW Day'] = (daily_data['Weekday'] == 2) & (
((daily_data.index.year == 2020) & (daily_data.index.month == 9)) | ((daily_data.index.year == 2019) & (daily_data.index.month == 8)) |
((daily_data.index.year == 2018) & (daily_data.index.month == 6)) | ((daily_data.index.year == 2017) & (daily_data.index.month == 7))
)
# Check that Wednesdays during the initiatives in these four years are accounted for.
print('There are {} days running the Free Ride Wednesday Initiative with data available.'.format(daily_data['FRW Day'].sum()))
There are 17 days running the Free Ride Wednesday Initiative with data available.
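The count can be cross-checked against the calendar. The sketch below rebuilds the flag on a complete synthetic date range by testing each date's (year, month) pair against a set of campaign months; `FRW_MONTHS` and `demo` are illustrative names, and the approach is an alternative formulation, not the notebook's own code.

```python
import pandas as pd

# Campaign months taken from the text above: July 2017, June 2018, August 2019, September 2020.
FRW_MONTHS = {(2017, 7), (2018, 6), (2019, 8), (2020, 9)}

# Complete calendar over the study period (assumes no missing days).
idx = pd.date_range('2017-01-01', '2020-10-31', freq='D')
demo = pd.DataFrame(index=idx)

# True where the date falls in one of the campaign months.
in_frw_month = pd.Series(
    [(y, m) in FRW_MONTHS for y, m in zip(idx.year, idx.month)],
    index=idx,
)

# A Free Ride Wednesday is a Wednesday (weekday == 2) inside a campaign month.
demo['FRW Day'] = in_frw_month & (idx.weekday == 2)
print(demo['FRW Day'].sum())
```

On a complete calendar this gives 4 + 4 + 4 + 5 = 17 Wednesdays, which agrees with the count above and suggests no FRW day is missing from the data.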
# Group by the day of the week and by whether the day is running the FRW initiative. Return the average number of daily trips taken on each day of the week.
mean_rides = daily_data.groupby(['Weekday', 'FRW Day'])['rides'].mean()
print(mean_rides)
Weekday  FRW Day
0        False       5456.010000
1        False       5861.875000
2        False       5717.021858
         True       13070.294118
3        False       5892.405000
4        False       6033.365000
5        False       5536.741294
6        False       5046.700000
Name: rides, dtype: float64
The average number of daily trips taken on FRW days is noticeably higher than on non-FRW Wednesdays and other days of the week.
However, we know that there are seasonal trends in bicycle ridership. To determine if days running the FRW initiative see a higher ridership than other days in a similar period, we can plot the number of daily trips taken each day.
fig, ax = plt.subplots(figsize=(10, 5))
ax = sns.scatterplot(x = daily_data.index, y = "rides",
hue = "FRW Day", size = 'FRW Day', sizes = (30, 10),
data = daily_data, ax = ax)
ax.set_title('Daily ridership from 2017 to 2020')
We can see that in 2017, 2018, and 2019, the FRW initiative did result in higher than normal daily ridership compared to other days in the same year and season. This was not the case in 2020, however, which saw even fewer daily rides on FRW days than in the previous year, despite a growth in ridership overall.
This can likely be attributed to the month in which the FRW initiative was run. In 2017, 2018, and 2019, the program was run during the summer (in July, June, and August respectively), months that already see higher ridership. In 2020, the program was run during September, when bicycle ridership begins to decline.
Taking a closer look at 2020, as in previous years, September is past the summer peak in ridership. Even within September and the surrounding weeks, FRW days do not show an increase in ridership. This is likely because elective and recreational cycling trips decrease once the summer months have passed.
# Keep weekdays only (Monday = 0, ..., Friday = 4) from June 2020 onward.
daily_data_2020 = daily_data.loc[(daily_data.index.year == 2020) & (daily_data.index.month > 5) & (daily_data.index.weekday < 5)]
fig, ax = plt.subplots(figsize=(10, 5))
ax = sns.scatterplot(x = daily_data_2020.index, y = "rides", hue = "FRW Day",
data = daily_data_2020, ax = ax)
ax.set_title('Daily ridership in 2020')
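The seasonal effect can be controlled for numerically by comparing Wednesdays only with the other weekdays of the same month. The sketch below is one possible way to do this under that assumption; `wednesday_uplift` is a hypothetical helper and the ride counts are synthetic.

```python
import pandas as pd

# Synthetic month: flat 6000 rides per day, with Wednesdays raised to 9000.
idx = pd.date_range('2019-08-01', '2019-08-31', freq='D')
demo = pd.DataFrame({'rides': 6000.0}, index=idx)
demo.loc[demo.index.weekday == 2, 'rides'] = 9000.0

def wednesday_uplift(daily, year, month):
    """Mean Wednesday rides divided by mean rides on the other weekdays of the same month."""
    in_month = daily[(daily.index.year == year) & (daily.index.month == month)]
    weekdays = in_month[in_month.index.weekday < 5]   # drop Saturday and Sunday
    is_wed = weekdays.index.weekday == 2
    return weekdays.loc[is_wed, 'rides'].mean() / weekdays.loc[~is_wed, 'rides'].mean()

uplift = wednesday_uplift(demo, 2019, 8)
print(uplift)
```

A ratio well above 1 for a campaign month, next to a ratio near 1 for neighbouring non-campaign months, would point to an effect of the initiative rather than of the season.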
Bike Share Toronto ran a Free Ride Weekend on February 29 and March 1, 2020, which offered users unlimited free rides under 30 minutes. Investigate if the initiative was popular.
# Identify free ride weekend in 2020.
free_weekend = daily_data[(daily_data.index.date == date(2020, 2, 29)) | (daily_data.index.date == date(2020, 3, 1))]
# Keep only weekend days (Saturday = 5, Sunday = 6) in 2020; the month filter admits every month before December.
free_ride_period = daily_data[(daily_data.index.year == 2020) & (daily_data.index.month < 12) & (daily_data.index.weekday > 4)]
# Plot ridership on days running free ride weekends and those without.
fig, ax = plt.subplots(figsize=(8, 5))
# Plot all rides.
ax = sns.scatterplot(x = free_ride_period.index, y = "rides",
sizes = (30, 30),
data = free_ride_period, ax = ax, color='b')
# Plot rides on the Free Ride Weekend days.
ax = sns.scatterplot(x = free_weekend.index, y = "rides",
sizes = (30, 30), data = free_weekend, ax = ax, color = 'r')
ax.set_title('Daily Ridership on Weekends')
ax.legend(['Regular Day', 'Free Ride Weekend'])
plt.show()
The Free Ride Weekend did not see any increase in ridership compared to other weekend days in a similar time period. As with the 2020 FRW initiative, this offer was run during a comparatively low-ridership period at the end of February. As such, users may not have been as willing to take elective trips to take advantage of the program, due to the temperature, road conditions, or other factors that dissuade ridership in the colder months. Additionally, early concerns about COVID-19 in late February 2020 may have further limited the number of trips people chose to make.
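The phrase "a similar time period" can be made concrete by comparing the event days with the weekend days that fall within a fixed window around them. The sketch below uses a three-week window, which is an arbitrary assumption, and flat synthetic ride counts.

```python
import pandas as pd

# Synthetic daily counts around the event; every day has the same ridership.
idx = pd.date_range('2020-02-01', '2020-03-31', freq='D')
demo = pd.DataFrame({'rides': 1500.0}, index=idx)

event_days = pd.to_datetime(['2020-02-29', '2020-03-01'])
window = pd.Timedelta(days=21)   # assumed window size, not taken from the notebook

# Weekend days (Saturday = 5, Sunday = 6) within the window around the event.
weekends = demo[demo.index.weekday >= 5]
nearby = weekends[(weekends.index >= event_days.min() - window) &
                  (weekends.index <= event_days.max() + window)]

# Baseline excludes the event days themselves.
baseline = nearby.drop(event_days)['rides'].mean()
event_mean = demo.loc[event_days, 'rides'].mean()
ratio = event_mean / baseline
print(ratio)
```

A ratio close to 1 on the real data would support the reading that the Free Ride Weekend did not raise ridership above comparable weekends.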
The purpose of these initiatives is to encourage usage of the bike share program and attract new riders. One-day and weekend programs are unlikely to attract ridership from commuters, as travel behaviour for commuters is much more regular and commuters are more likely to hold annual memberships, which already offer unlimited 30-minute trips. Based on these observations, it may be more beneficial to promote free ride initiatives during times when people choose to take elective or non-commute trips, i.e., during the summer months.