sample_n(USAGE, 1000) |>
mutate(month=as.character(month)) |>
::datatable() DT
Introducing the United States’ Public Transit Network: Your Journey, Your Delay
From the bustling streets of New York City to the serene landscapes of the Pacific Northwest, America’s public transit systems offer a diverse range of options to get you where you need to go… eventually.
If the American transit systems were to have a commercial brochure it would look a little like the following:
Experience the convenience and efficiency of American Public Transit with…
Subways: Navigate the heart of major cities with ease and speed… or watch as the train you were running to catch closes its doors in front of your eyes.
Buses: Explore neighborhoods and connect to popular destinations… as long as they haven’t been rerouted due to unexpected construction or traffic.
Trains: Enjoy scenic journeys and efficient transportation between cities… unless there’s a mechanical issue or a signal malfunction.
Light rail: Experience modern, eco-friendly travel in urban areas… just be prepared to wait half an hour for the next one to arrive.
Streetcars: Discover historic charm and convenient transportation options… or experience the occasional track fire.
Although the US transit systems are prone to issues as highlighted in the free brochure, they also serve as a ‘reliable’, accessible, and affordable means of transportation. (Unless you live in NYC where they just hiked fares to $2.90)
As I learned more about this topic, I began to realize just how far-reaching the US transit system was. I mean, did you know that within the United States, there are 678 different transit systems? I would’ve never guessed.
Delving deeper into this train of thought, I pulled together data from 3 different sources on the National Transit Database and cranked out quite a few interesting insights on United States Transit systems.
1) The 2022 Fare Revenue table
2) The 2022 Operating Expenses reports
3) The latest Monthly Ridership tables
1) MTA New York City Transit is the most traveled US transit system in terms of Vehicle Revenue Miles.
As a New Yorker, I’m convinced that New York City is the best city in the world. This statement however isn’t only supported by our top ranking in rats per household but also manifests itself in the number of people that take public transportation.
After analyzing data for 678 different public transit agencies in the United States, and summing their vehicle revenue miles, we find that the NYC MTA takes the lead with 10.8B traveled miles. To put that into perspective, that’s the equivalent of three one-way trips to Pluto.
#This code creates a the function "Top Agency" that allows us to group our data by transit agency and find out which has the highest total vehicle miles
<- USAGE |>
topagency group_by(Agency) |>
summarize(total_vrm = sum(`Vehicle Revenue Miles`)) |>
arrange(desc(total_vrm))
head(topagency,n=5)
# A tibble: 5 × 2
Agency total_vrm
<chr> <dbl>
1 MTA New York City Transit 10832855350
2 New Jersey Transit Corporation 5645525525
3 Los Angeles County Metropolitan Transportation Authority 4354016659
4 Washington Metropolitan Area Transit Authority 2821950701
5 Chicago Transit Authority 2806202144
If that stat didn’t sell you on the MTA’s public transit dominance, if we look at the next two most utilized public transportation systems in the US, they don’t even come close in terms of miles traveled. (NJT coming in at 5.6B and LACTA coming in at 4.3B)
To be fair, these numbers make sense if we take a look at how many trips were taken on the NYC Subway (Heavy rail) in May of 2024 by itself. By filtering our data for NYC MTA, Heavy Rail, and 2024-05-01, we can see the following number of trips taken…
#This code changes our month column to characters so we can use a filter on it
$Month <- as.character(USAGE$month)
USAGE
#In order to find out the specific number of riders, we filtered our data for NYC MTA, Heavy rail , and the date.
<- USAGE |>
NYC_HR_MAY2024 filter(USAGE$Agency == "MTA New York City Transit", USAGE$Mode=="Heavy Rail", USAGE$Month=="2024-05-01")
print(NYC_HR_MAY2024$`Unlinked Passenger Trips`)
[1] 180458819
Extrapolating and coverting this to a yearly amount result in ~2,165,505,828 trips which is a heck of a lot. In fact, this number doesn’t even account for the impact COVID-19 had on subway ridership.
COVID-19 and its Impact on NYC MTA Ridership
To analyze the impact that COVID-19 had on the MTA, we can compare NYC MTA unlinked passenger trips pre-pandemic and MTA unlinked passenger trips during the pandemic. For this example, I pulled data for the NYC MTA heavy rail for April 2019 and April 2020 and this is how it came out.
#To measure the impact of covid we can compare pre-covid to during the pandemic by filtering for our data according to our specific requirements like NYC MTA & Heavy rail
<- USAGE |>
NYC_HR_APRIL2019 filter(USAGE$Agency=="MTA New York City Transit", USAGE$Mode=="Heavy Rail", USAGE$Month=="2019-04-01")
<- USAGE |>
NYC_HR_APRIL2020 filter(USAGE$Agency=="MTA New York City Transit", USAGE$Mode=="Heavy Rail", USAGE$Month=="2020-04-01")
print(NYC_HR_APRIL2019$`Unlinked Passenger Trips`)- print(NYC_HR_APRIL2020$`Unlinked Passenger Trips`)
[1] 232223929
[1] 20254269
[1] 211969660
The first number represents pre-pandemic activity of 232M unlinked passenger trips in April of 2019 which dropped significantly to 2M in April 2020 representing 99% drop! If we were to visualize that drop it would look something like this…
#This code helps us visualize the impact of Covid by summing unlinked passenger trips in relation to the time periods they were recorded.
<- USAGE |>
NYC_HR_Seasonality group_by(month)|>
filter(Agency=="MTA New York City Transit", Mode=="Heavy Rail")|>
summarize(total_UPT = sum(`Unlinked Passenger Trips`)) |>
arrange(desc(total_UPT))
plot(NYC_HR_Seasonality,main="New York City Ridership from 2002-2023", xlab="Time", ylab="Ridership",ylim = c(0, 400000000))
As you can see, MTA Transit ridership has steadily grown from 2002-2019 until the pandemic took place. Since then, MTA ridership has not fully recovered. If we were to take this into account and use pre-pandemic numbers as an estimate of yearly ridership, we could have seen ridership volumes of up to 2.75B!
Although I could talk about New York City all day, I’m sure you’d find it more interesting if I threw some other discoveries at you.
2) The United States Loves Buses
With all the talk about heavy rails and subways, you would figure that they’d be the most popular form of public transportation in the United States… Well, you guessed wrong.
After pulling ridership data, grouping it by transportation mode, and summing the total vehicle miles… Buses came out on top with 49,444,494,088 total miles traveled! (Approximately 13.75 Pluto trips!)
#Our code here sorts by mode of transportation and then sums up the total VRM to gauge which mode traveled the most
<- USAGE |>
topmode group_by(Mode) |>
summarize(total_vrm = sum(`Vehicle Revenue Miles`)) |>
arrange(desc(total_vrm))
head(topmode,n=1)
# A tibble: 1 × 2
Mode total_vrm
<chr> <dbl>
1 Bus 49444494088
If we parse a little deeper we can see that the top and bottom contributors for this stat were..
#This will display the top bus in terms of vehicle revenue miles
<- USAGE |>
toponebus group_by(Agency)|>
filter(Mode=="Bus")|>
summarize(total_VRM = sum(`Vehicle Revenue Miles`)) |>
arrange(desc(total_VRM))
head(toponebus,n =1)
# A tibble: 1 × 2
Agency total_VRM
<chr> <dbl>
1 New Jersey Transit Corporation 3781858802
#This will display the bottom bus in terms of vehicle revenue miles
<- USAGE |>
bottomonebus group_by(Agency)|>
filter(Mode=="Bus")|>
summarize(total_VRM = sum(`Vehicle Revenue Miles`)) |>
arrange(desc(total_VRM))
tail(bottomonebus,n =1)
# A tibble: 1 × 2
Agency total_VRM
<chr> <dbl>
1 Windham Region Transit District 21265
Well, what do you know! The NJT Corporation lead with a staggering total of 3,781,858,802 vehicle revenue miles while the Windham Region Transit dragged down the average with a measly 21,265.
3) The Award for least used Public Transportation goes to the Municipality of Carolina Demand Response
Fortunately for the Winham transit, the Municipality of Carolina demand response takes the award for the least used public transportation. Contrary to its name, this transit system does not serve communities in North Carolina or South Carolina but instead operates within the municipality of Carolina, Puerto Rico. After pulling and sorting the data for the municipality in R, I found that this transit system served a total whopping 225 unlinked passenger trips!
#This will display the least popular mode of tranpsortation in terms of UPT
<- USAGE |>
leastusedupt group_by(Agency,Mode)|>
summarize(total_UPT = sum(`Unlinked Passenger Trips`)) |>
arrange(desc(total_UPT))
tail(leastusedupt,n =1)
# A tibble: 1 × 3
# Groups: Agency [1]
Agency Mode total_UPT
<chr> <chr> <dbl>
1 Municipality of Carolina Demand Response 225
4) The United States Public Transportation System faces a Significant Funding Gap.
While the United States public transit systems provide a vital service for millions of commuters, operating costs often exceed revenues. Farebox revenue, typically the largest source of income for transit systems, often falls short of covering expenses which leads to unprofitable modes of public transportation.
For example, the New York City MTA had the most unlinked passenger trips at an astounding 1,793,073,801 in 2022 but it’s farebox recovery ratio was rather lackluster 0.325. This means that the NYC MTA is an extremely unprofitable transit system and for every dollar the NYC MTA spends, it only brings in $0.325 of revenue.
#This will display the top transit system in terms of UPT for the year 2022 by grouping agency and mode & summing total_upt. Additional sorts out for larger transit systems via total_UPT>400000
<- USAGE_AND_FINANCIALS |>
topUPT2022 group_by(Agency,Mode)|>
filter(total_UPT>400000)|>
summarize(total_UPT2022 = sum(total_UPT)) |>
arrange(desc(total_UPT2022))
head(topUPT2022,n =1)
# A tibble: 1 × 3
# Groups: Agency [1]
Agency Mode total_UPT2022
<chr> <chr> <dbl>
1 MTA New York City Transit Heavy Rail 1793073801
#This will display farebox recovery for NYC by dividing total fares by expenses.
<- USAGE_AND_FINANCIALS |>
nycmtafarebox filter(Agency == "MTA New York City Transit")|>
summarize(nycfarebox = sum(`Total Fares`)/sum(Expenses)) |>
arrange(desc(nycfarebox))
head(nycmtafarebox,n =1)
# A tibble: 1 × 1
nycfarebox
<dbl>
1 0.325
As a result of low farebox revenues, transit agencies rely heavily on government subsidies, including federal, state, and local funds. While these subsidies are essential for maintaining operations, it is interesting to take a look at which transit systems are the best self-sustaining and most efficient with their resources. This, however, is not an easy task as there are many metrics to measure efficiency.
For example:
The most profitable transit system in terms of profits/expenses is the Port Imperial Ferry Corporation which serves the NY/NJ region with a farebox recovery ratio of 1.423.
#This will display the top agency in terms of farebox recovery
<- USAGE_AND_FINANCIALS |>
topfarebox group_by(Agency,Mode)|>
filter(total_UPT>400000)|>
summarize(topfarebox = sum(`Total Fares`)/sum(Expenses)) |>
arrange(desc(topfarebox))
head(topfarebox,n =1)
# A tibble: 1 × 3
# Groups: Agency [1]
Agency Mode topfarebox
<chr> <chr> <dbl>
1 Port Imperial Ferry Corporation Ferryboat 1.43
The transit system with the lowest expenses per unlinked passenger trip is North Carolina’s State University Bus which spends an average $1.18 USD spent per trip.
#This will display the agency with lowest expeneses per upt
<- USAGE_AND_FINANCIALS |>
lowexpupt group_by(Agency,Mode)|>
filter(total_UPT>400000)|>
summarize(lowexp = sum(Expenses)/sum(total_UPT)) |>
arrange(desc(lowexp))
tail(lowexpupt,n =1)
# A tibble: 1 × 3
# Groups: Agency [1]
Agency Mode lowexp
<chr> <chr> <dbl>
1 North Carolina State University Bus 1.18
The transit system with the highest revenues per UPT is the Hampton Jitney Incorporated Commuter Bus with a solid $41.3 per trip.
#This will display the agency with highest rev per UPT
<- USAGE_AND_FINANCIALS |>
highexpupt group_by(Agency,Mode)|>
filter(total_UPT>400000)|>
summarize(highexp = sum(`Total Fares`)/sum(total_UPT)) |>
arrange(desc(highexp))
head(highexpupt,n =1)
# A tibble: 1 × 3
# Groups: Agency [1]
Agency Mode highexp
<chr> <chr> <dbl>
1 Hampton Jitney, Inc. Commuter Bus 41.3
The system with the lowest expenses per traveled mile is the Metropolitan Transportation Commission Vanpool with a total $0.44 spent per every mile traveled.
#This will display the agency with lowest expense per VRM
<- USAGE_AND_FINANCIALS|>
lowexpvrm group_by(Agency,Mode)|>
filter(total_UPT>400000)|>
summarize(lvrm = sum(Expenses)/sum(total_VRM))|>
arrange(desc(lvrm))
tail(lowexpvrm, n=1)
# A tibble: 1 × 3
# Groups: Agency [1]
Agency Mode lvrm
<chr> <chr> <dbl>
1 Metropolitan Transportation Commission Vanpool 0.445
The transit system with the highest total fares per VRM is the Jacksonville Transportation Authority ferry boat with a towering $157.7 per VRM.
#This will display the agency with highest fares per VRM
<- USAGE_AND_FINANCIALS |>
highfarevrm group_by(Agency,Mode)|>
filter(total_UPT>400000)|>
summarize(highfarepvrm = sum(`Total Fares`)/sum(total_VRM)) |>
arrange(desc(highfarepvrm))
head(highfarevrm,n =1)
# A tibble: 1 × 3
# Groups: Agency [1]
Agency Mode highfarepvrm
<chr> <chr> <dbl>
1 Jacksonville Transportation Authority Ferryboat 158.
Conclusion: Port Imperial is the most efficient in terms of profitability and the NYC MTA is the most efficient in terms of sheer transportation
While efficiency is hard to measure with one metric, by far the Port Imperial Ferry Corporation blows the competition out of the water in terms of pure profitability. The agency is so profitable that it has been able to fund itself without government subsidies while serving 7m+ unlinked trips annually.
Despite Port Imperial leading the group in profitability, it serves a much smaller range of constituents than its next-door neighbor,the MTA. The NYC MTA has the capability to help commuters make over 1.8B trips annually with the capacity to grow even more. In terms of sheer transportation, it is impossible to beat and deserves a spot as one of the most efficient transit systems due the scale that is has been able to obtain.