Introducing the United States’ Public Transit Network: Your Journey, Your Delay

From the bustling streets of New York City to the serene landscapes of the Pacific Northwest, America’s public transit systems offer a diverse range of options to get you where you need to go… eventually.

If American transit systems had a commercial brochure, it would look a little like the following:

Experience the convenience and efficiency of American Public Transit with…

Although US transit systems are prone to issues, as highlighted in the free brochure, they also serve as a ‘reliable’, accessible, and affordable means of transportation (unless you live in NYC, where they just hiked fares to $2.90).

As I learned more about this topic, I began to realize just how far-reaching the US transit system was. I mean, did you know that within the United States, there are 678 different transit systems? I would’ve never guessed.

Delving deeper into this train of thought, I pulled together three datasets from the National Transit Database (NTD) and cranked out quite a few interesting insights on United States transit systems:

1) The 2022 Fare Revenue table

2) The 2022 Operating Expenses reports

3) The latest Monthly Ridership tables
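The post doesn't show how these three tables were combined into the `USAGE` and `USAGE_AND_FINANCIALS` data frames used below. Here is a minimal dplyr sketch of one way to do it, assuming tiny hypothetical stand-in tables: the agency names and values are made up, and joining on `Agency` and `Mode` is an assumption (a stable agency ID column, if your extracts have one, would be a more robust key than the name).

```r
library(dplyr)

# Hypothetical monthly ridership rows (stand-in for the Monthly Ridership tables)
usage <- tibble(
  Agency = c("Agency A", "Agency A", "Agency B"),
  Mode = c("Bus", "Bus", "Ferryboat"),
  month = as.Date(c("2022-01-01", "2022-02-01", "2022-01-01")),
  `Unlinked Passenger Trips` = c(100, 120, 50),
  `Vehicle Revenue Miles` = c(1000, 1100, 400)
)

# Hypothetical 2022 financials (stand-in for the Fare Revenue and Operating Expenses tables)
financials <- tibble(
  Agency = c("Agency A", "Agency B"),
  Mode = c("Bus", "Ferryboat"),
  `Total Fares` = c(500, 900),
  Expenses = c(2000, 600)
)

# Aggregate 2022 monthly usage to annual totals per agency and mode,
# then attach the financial columns (join keys are an assumption)
usage_and_financials <- usage |>
  filter(format(month, "%Y") == "2022") |>
  group_by(Agency, Mode) |>
  summarize(
    total_UPT = sum(`Unlinked Passenger Trips`),
    total_VRM = sum(`Vehicle Revenue Miles`),
    .groups = "drop"
  ) |>
  inner_join(financials, by = c("Agency", "Mode"))

print(usage_and_financials)
```

Aggregating to annual agency-mode totals before joining leaves one row per agency and mode, matching the `total_UPT` and `total_VRM` columns used in the financial comparisons later on.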

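library(dplyr)
# Show an interactive table of 1,000 randomly sampled rows from the monthly ridership data
slice_sample(USAGE, n = 1000) |> 
  mutate(month = as.character(month)) |> 
  DT::datatable()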

1) MTA New York City Transit is the most traveled US transit system in terms of Vehicle Revenue Miles.

As a New Yorker, I’m convinced that New York City is the best city in the world. That claim, however, isn’t only supported by our top ranking in rats per household; it also shows up in how heavily our public transit gets used.

After analyzing data for 678 public transit agencies in the United States and summing their vehicle revenue miles (VRM) across every month in the dataset, we find that the NYC MTA takes the lead with 10.8B miles traveled. To put that into perspective, that’s the equivalent of about three one-way trips to Pluto.
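The Pluto comparison is just division. A minimal base R sketch, assuming a one-way Earth-to-Pluto distance of roughly 3.6 billion miles; that figure is an approximation chosen because it reproduces the comparisons in this post, and the real distance varies over Pluto's orbit.

```r
# Assumed one-way Earth-to-Pluto distance in miles (rough approximation)
pluto_miles <- 3.6e9

# Convert a mileage total into a number of one-way trips to Pluto
pluto_trips <- function(miles, trip_miles = pluto_miles) {
  miles / trip_miles
}

mta_vrm <- 10832855350   # MTA New York City Transit, cumulative VRM
bus_vrm <- 49444494088   # all bus service, cumulative VRM (reported later in the post)
round(pluto_trips(c(mta = mta_vrm, bus = bus_vrm)), 1)
```

That works out to about 3 trips for the MTA and about 13.7 for buses nationwide.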

# Group the data by transit agency and sum vehicle revenue miles (VRM)
# to find which agencies have the most total miles
topagency <- USAGE |>
  group_by(Agency) |>
  summarize(total_vrm = sum(`Vehicle Revenue Miles`)) |>
  arrange(desc(total_vrm))
head(topagency, n = 5)
# A tibble: 5 × 2
  Agency                                                     total_vrm
  <chr>                                                          <dbl>
1 MTA New York City Transit                                10832855350
2 New Jersey Transit Corporation                            5645525525
3 Los Angeles County Metropolitan Transportation Authority  4354016659
4 Washington Metropolitan Area Transit Authority            2821950701
5 Chicago Transit Authority                                 2806202144

If that stat didn’t sell you on the MTA’s public transit dominance, consider the next two systems on the list: they don’t even come close in miles traveled (New Jersey Transit at 5.6B and LA Metro, the Los Angeles County Metropolitan Transportation Authority, at 4.4B).
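To put "don't even come close" in numbers, here is a quick base R check of how many times larger the MTA's cumulative VRM is than each of the next two agencies' totals, using the values printed in the table above.

```r
# Total vehicle revenue miles for the top three agencies, copied from the output above
vrm <- c(
  "MTA New York City Transit" = 10832855350,
  "New Jersey Transit Corporation" = 5645525525,
  "Los Angeles County Metropolitan Transportation Authority" = 4354016659
)

# How many times larger the MTA's total is than each agency's total
round(vrm[1] / vrm, 1)
```

The MTA logged roughly 1.9 times New Jersey Transit's miles and about 2.5 times LA Metro's.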

To be fair, these numbers make sense once you look at how many trips were taken on the NYC Subway (heavy rail) in May 2024 alone. Filtering our data for NYC MTA, Heavy Rail, and 2024-05-01 gives the following number of trips…

# Convert the month column to character so we can filter on it by string
USAGE$Month <- as.character(USAGE$month)

# Filter for NYC MTA heavy rail (the subway) in May 2024
NYC_HR_MAY2024 <- USAGE |>
  filter(Agency == "MTA New York City Transit", Mode == "Heavy Rail", Month == "2024-05-01")
print(NYC_HR_MAY2024$`Unlinked Passenger Trips`)
[1] 180458819

Extrapolating this to a full year (multiplying by 12) results in ~2,165,505,828 trips, which is a heck of a lot. And that number is still depressed by the impact COVID-19 had on subway ridership.
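The yearly figure is a naive annualization of the single May 2024 value. A minimal base R sketch of that arithmetic; it assumes May is a typical month, and since ridership is seasonal, summing twelve actual months would give a more accurate annual total.

```r
# May 2024 unlinked passenger trips on NYC MTA heavy rail (output above)
may_2024_upt <- 180458819

# Naive annualization: assume every month looks like May
annual_estimate <- may_2024_upt * 12
format(annual_estimate, big.mark = ",")
```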

COVID-19 and its Impact on NYC MTA Ridership

To analyze the impact COVID-19 had on the MTA, we can compare unlinked passenger trips before and during the pandemic. For this example, I pulled NYC MTA heavy rail data for April 2019 and April 2020, and this is how it came out.

# To measure the impact of COVID-19, compare pre-pandemic (April 2019) and
# pandemic (April 2020) ridership for NYC MTA heavy rail

NYC_HR_APRIL2019 <- USAGE |>
  filter(Agency == "MTA New York City Transit", Mode == "Heavy Rail", Month == "2019-04-01")

NYC_HR_APRIL2020 <- USAGE |>
  filter(Agency == "MTA New York City Transit", Mode == "Heavy Rail", Month == "2020-04-01")

# Print April 2019 trips, April 2020 trips, and the difference between them
upt_2019 <- NYC_HR_APRIL2019$`Unlinked Passenger Trips`
upt_2020 <- NYC_HR_APRIL2020$`Unlinked Passenger Trips`
print(upt_2019)
print(upt_2020)
print(upt_2019 - upt_2020)
[1] 232223929
[1] 20254269
[1] 211969660
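The percent change follows directly from the two April values printed above. A short base R sketch of the calculation, relative to the April 2019 baseline:

```r
# April unlinked passenger trips on NYC MTA heavy rail (outputs above)
upt_april_2019 <- 232223929
upt_april_2020 <- 20254269

# Percent change relative to the pre-pandemic baseline
pct_change <- (upt_april_2020 - upt_april_2019) / upt_april_2019 * 100
round(pct_change, 1)
```

The result is a decline of about 91.3%.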

The first number is pre-pandemic ridership: 232M unlinked passenger trips in April 2019. That dropped to about 20M in April 2020, a decline of roughly 91%! If we were to visualize that drop, it would look something like this…

# Sum unlinked passenger trips by month for NYC MTA heavy rail to visualize
# ridership over time, including the COVID-19 drop
NYC_HR_Seasonality <- USAGE |>
  filter(Agency == "MTA New York City Transit", Mode == "Heavy Rail") |>
  group_by(month) |>
  summarize(total_UPT = sum(`Unlinked Passenger Trips`)) |>
  arrange(month)

plot(NYC_HR_Seasonality$month, NYC_HR_Seasonality$total_UPT, type = "l",
     main = "NYC MTA Heavy Rail Ridership by Month", xlab = "Time",
     ylab = "Unlinked Passenger Trips", ylim = c(0, 400000000))

As you can see, MTA heavy rail ridership grew steadily from 2002 to 2019, until the pandemic hit, and it has not fully recovered since. If we instead used pre-pandemic numbers to estimate yearly ridership, we could have seen volumes of up to 2.75B!
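One way to quantify "has not fully recovered" is to compare a recent month with the same calendar month before the pandemic, which keeps seasonal swings out of the comparison. A minimal base R sketch; the function name `recovery_ratio` and the sample numbers are hypothetical, and in practice you would pass it the `NYC_HR_Seasonality` table (columns `month` and `total_UPT`) built above.

```r
# Hypothetical monthly totals shaped like NYC_HR_Seasonality (month, total_UPT);
# these numbers are made up for illustration
monthly <- data.frame(
  month = as.Date(c("2019-05-01", "2024-05-01")),
  total_UPT = c(200, 150)
)

# Ridership in `current` as a fraction of the same calendar month in `baseline`
recovery_ratio <- function(df, current, baseline) {
  current_upt <- df$total_UPT[df$month == as.Date(current)]
  baseline_upt <- df$total_UPT[df$month == as.Date(baseline)]
  current_upt / baseline_upt
}

recovery_ratio(monthly, current = "2024-05-01", baseline = "2019-05-01")
```

A ratio below 1 means ridership is still under its pre-pandemic level for that month; with the made-up numbers here it returns 0.75.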

Although I could talk about New York City all day, I’m sure you’d find it more interesting if I threw some other discoveries at you.

2) The United States Loves Buses

With all the talk about heavy rails and subways, you would figure that they’d be the most popular form of public transportation in the United States… Well, you guessed wrong.

After pulling ridership data, grouping it by transportation mode, and summing the total vehicle revenue miles, buses came out on top with 49,444,494,088 total miles traveled! (Approximately 13.75 Pluto trips!)

# Group by mode of transportation, then sum total VRM to see which mode traveled the most
topmode <- USAGE |> 
  group_by(Mode) |> 
  summarize(total_vrm = sum(`Vehicle Revenue Miles`)) |> 
  arrange(desc(total_vrm)) 
head(topmode,n=1)
# A tibble: 1 × 2
  Mode    total_vrm
  <chr>       <dbl>
1 Bus   49444494088

Digging a little deeper, we can see that the top and bottom bus agencies for this stat were…

# This will display the top bus agency in terms of vehicle revenue miles
toponebus <- USAGE |>
  group_by(Agency)|>
  filter(Mode=="Bus")|>
  summarize(total_VRM = sum(`Vehicle Revenue Miles`)) |>
  arrange(desc(total_VRM))
head(toponebus,n =1)
# A tibble: 1 × 2
  Agency                          total_VRM
  <chr>                               <dbl>
1 New Jersey Transit Corporation 3781858802

# This will display the bottom bus agency in terms of vehicle revenue miles
bottomonebus <- USAGE |>
  group_by(Agency)|>
  filter(Mode=="Bus")|>
  summarize(total_VRM = sum(`Vehicle Revenue Miles`)) |>
  arrange(desc(total_VRM))
tail(bottomonebus,n =1)
# A tibble: 1 × 2
  Agency                          total_VRM
  <chr>                               <dbl>
1 Windham Region Transit District     21265

Well, what do you know! The New Jersey Transit Corporation led with a staggering 3,781,858,802 vehicle revenue miles, while the Windham Region Transit District dragged down the average with a measly 21,265.
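Arranging once and taking `head()` and `tail()` works, but dplyr's `slice_max()` and `slice_min()` can pull the top and bottom agencies straight from a single summary. A minimal sketch on hypothetical bus data (the agency names and mileages are made up):

```r
library(dplyr)

# Hypothetical bus rows: one row per agency per month
bus <- tibble(
  Agency = c("Agency A", "Agency A", "Agency B", "Agency C"),
  Mode = "Bus",
  `Vehicle Revenue Miles` = c(500, 700, 90, 3000)
)

# Total VRM per bus agency, computed once
bus_vrm <- bus |>
  filter(Mode == "Bus") |>
  group_by(Agency) |>
  summarize(total_VRM = sum(`Vehicle Revenue Miles`))

# Top and bottom agencies without sorting the table yourself
top_bus <- slice_max(bus_vrm, total_VRM, n = 1)
bottom_bus <- slice_min(bus_vrm, total_VRM, n = 1)
print(top_bus)
print(bottom_bus)
```

By default both functions keep ties (`with_ties = TRUE`), so two agencies sharing the lowest total would both be returned, whereas `tail(n = 1)` silently picks one.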

3) The Award for least used Public Transportation goes to the Municipality of Carolina Demand Response

Fortunately for Windham Region Transit, the Municipality of Carolina’s demand response service takes the award for the least used public transportation. Despite its name, this transit system doesn’t serve communities in North or South Carolina; it operates within the municipality of Carolina, Puerto Rico. After grouping and sorting the data in R, I found that this service carried a whopping total of 225 unlinked passenger trips!

# This will display the least used agency-mode pair in terms of UPT

leastusedupt <- USAGE |>
  group_by(Agency,Mode)|>
  summarize(total_UPT = sum(`Unlinked Passenger Trips`)) |>
  arrange(desc(total_UPT))
tail(leastusedupt,n =1)
# A tibble: 1 × 3
# Groups:   Agency [1]
  Agency                   Mode            total_UPT
  <chr>                    <chr>               <dbl>
1 Municipality of Carolina Demand Response       225
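The `# Groups:   Agency [1]` line in this output appears because `summarize()` on data grouped by two variables drops only the last grouping level by default, so the result stays grouped by `Agency`. A minimal sketch on hypothetical data (names and counts are made up) of passing `.groups = "drop"` to get a plain, ungrouped tibble, combined with `slice_min()` to pick the least used agency-mode pair:

```r
library(dplyr)

# Hypothetical ridership rows (agency names and trip counts are made up)
trips <- tibble(
  Agency = c("Agency A", "Agency A", "Agency B"),
  Mode = c("Bus", "Demand Response", "Bus"),
  `Unlinked Passenger Trips` = c(5000, 40, 800)
)

# Total UPT per agency-mode pair, ungrouped, then keep the smallest
least_used <- trips |>
  group_by(Agency, Mode) |>
  summarize(total_UPT = sum(`Unlinked Passenger Trips`), .groups = "drop") |>
  slice_min(total_UPT, n = 1)

print(least_used)
```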

4) The United States Public Transportation System faces a Significant Funding Gap.

While US public transit systems provide a vital service for millions of commuters, their operating costs often exceed their revenues. Farebox revenue, typically the largest source of income that transit systems generate themselves, often falls short of covering expenses, leaving many modes of public transportation unprofitable.

For example, MTA New York City Transit’s heavy rail (subway) service logged the most unlinked passenger trips of any agency and mode in 2022, an astounding 1,793,073,801, yet the agency’s farebox recovery ratio across all its modes was a rather lackluster 0.325. In other words, for every dollar the NYC MTA spends on operations, it brings in only about 32.5 cents in fares, which makes it an extremely unprofitable transit system.

# This will display the top agency-mode pair by 2022 unlinked passenger trips (UPT), grouping by agency and mode and summing total_UPT. The total_UPT > 400000 filter keeps only larger transit systems.

topUPT2022 <- USAGE_AND_FINANCIALS |>
  group_by(Agency,Mode)|>
  filter(total_UPT>400000)|>
  summarize(total_UPT2022 = sum(total_UPT)) |>
  arrange(desc(total_UPT2022))
head(topUPT2022,n =1)
# A tibble: 1 × 3
# Groups:   Agency [1]
  Agency                    Mode       total_UPT2022
  <chr>                     <chr>              <dbl>
1 MTA New York City Transit Heavy Rail    1793073801

# Farebox recovery for MTA New York City Transit across all of its modes:
# total fares divided by total operating expenses
nycmtafarebox <- USAGE_AND_FINANCIALS |>
  filter(Agency == "MTA New York City Transit") |>
  summarize(nycfarebox = sum(`Total Fares`) / sum(Expenses))
print(nycmtafarebox)
# A tibble: 1 × 1
  nycfarebox
       <dbl>
1      0.325
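A farebox recovery ratio is fare revenue divided by operating expenses, so the share of each operating dollar that fares leave uncovered is 1 minus the ratio. A small base R sketch using hypothetical round numbers (not the MTA's actual totals) that reproduce a 0.325 ratio:

```r
# Farebox recovery ratio: fare revenue divided by operating expenses
farebox_recovery <- function(total_fares, expenses) {
  total_fares / expenses
}

# Hypothetical figures chosen to reproduce a ratio of 0.325
ratio <- farebox_recovery(total_fares = 325, expenses = 1000)

# Share of each operating dollar that fares do not cover
gap_per_dollar <- 1 - ratio
c(ratio = ratio, gap_per_dollar = gap_per_dollar)
```

At 0.325, roughly 67.5 cents of every operating dollar has to come from somewhere other than fares, which is where subsidies come in.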

As a result of low farebox revenues, transit agencies rely heavily on government subsidies, including federal, state, and local funds. While these subsidies are essential for maintaining operations, it is interesting to look at which transit systems are the most self-sustaining and the most efficient with their resources. That isn’t an easy task, though, because there are many ways to measure efficiency.

For example:

The most profitable transit system by farebox recovery ratio (total fares divided by expenses) is the Port Imperial Ferry Corporation, which serves the NY/NJ region with a ratio of about 1.43.

#This will display the top agency in terms of farebox recovery 
topfarebox <- USAGE_AND_FINANCIALS |>
  group_by(Agency,Mode)|>
  filter(total_UPT>400000)|>
  summarize(topfarebox = sum(`Total Fares`)/sum(Expenses)) |>
  arrange(desc(topfarebox))
head(topfarebox,n =1)
# A tibble: 1 × 3
# Groups:   Agency [1]
  Agency                          Mode      topfarebox
  <chr>                           <chr>          <dbl>
1 Port Imperial Ferry Corporation Ferryboat       1.43

The transit system with the lowest expenses per unlinked passenger trip is North Carolina State University’s bus service, which spends an average of $1.18 per trip.

# This will display the agency with the lowest expenses per UPT
lowexpupt <- USAGE_AND_FINANCIALS |>
  group_by(Agency,Mode)|>
  filter(total_UPT>400000)|>
  summarize(lowexp = sum(Expenses)/sum(total_UPT)) |>
  arrange(desc(lowexp))
tail(lowexpupt,n =1)
# A tibble: 1 × 3
# Groups:   Agency [1]
  Agency                          Mode  lowexp
  <chr>                           <chr>  <dbl>
1 North Carolina State University Bus     1.18

The transit system with the highest fare revenue per UPT is Hampton Jitney, Inc.’s commuter bus, with a solid $41.3 per trip.

#This will display the agency with highest rev per UPT
highexpupt <- USAGE_AND_FINANCIALS |>
  group_by(Agency,Mode)|>
  filter(total_UPT>400000)|>
  summarize(highexp = sum(`Total Fares`)/sum(total_UPT)) |>
  arrange(desc(highexp))
head(highexpupt,n =1)
# A tibble: 1 × 3
# Groups:   Agency [1]
  Agency               Mode         highexp
  <chr>                <chr>          <dbl>
1 Hampton Jitney, Inc. Commuter Bus    41.3

The system with the lowest expenses per vehicle revenue mile is the Metropolitan Transportation Commission’s vanpool service, which spends about $0.445 per mile traveled.

#This will display the agency with lowest expense per VRM
lowexpvrm <- USAGE_AND_FINANCIALS|>
  group_by(Agency,Mode)|>
  filter(total_UPT>400000)|>
  summarize(lvrm = sum(Expenses)/sum(total_VRM))|>
  arrange(desc(lvrm))
tail(lowexpvrm, n=1)
# A tibble: 1 × 3
# Groups:   Agency [1]
  Agency                                 Mode     lvrm
  <chr>                                  <chr>   <dbl>
1 Metropolitan Transportation Commission Vanpool 0.445

The transit system with the highest total fares per VRM is the Jacksonville Transportation Authority’s ferryboat service, with a towering $157.7 per VRM.

#This will display the agency with highest fares per VRM
highfarevrm <- USAGE_AND_FINANCIALS |>
  group_by(Agency,Mode)|>
  filter(total_UPT>400000)|>
  summarize(highfarepvrm = sum(`Total Fares`)/sum(total_VRM)) |>
  arrange(desc(highfarepvrm))
head(highfarevrm,n =1)
# A tibble: 1 × 3
# Groups:   Agency [1]
  Agency                                Mode      highfarepvrm
  <chr>                                 <chr>            <dbl>
1 Jacksonville Transportation Authority Ferryboat         158.
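Since each metric above crowns a different winner, it can help to compute them side by side. A minimal sketch that mirrors the per-metric pipelines above in a single `summarize()` call, run on a hypothetical two-row stand-in for `USAGE_AND_FINANCIALS` (the values are made up; the column names follow the code above):

```r
library(dplyr)

# Hypothetical annual rows shaped like USAGE_AND_FINANCIALS (values are made up)
fin <- tibble(
  Agency = c("Agency A", "Agency B"),
  Mode = c("Ferryboat", "Bus"),
  total_UPT = c(500000, 2000000),
  total_VRM = c(20000, 900000),
  `Total Fares` = c(1400000, 1500000),
  Expenses = c(1000000, 3000000)
)

# Compute every efficiency metric discussed above in one pass
efficiency <- fin |>
  filter(total_UPT > 400000) |>
  group_by(Agency, Mode) |>
  summarize(
    farebox_recovery = sum(`Total Fares`) / sum(Expenses),
    expenses_per_upt = sum(Expenses) / sum(total_UPT),
    fares_per_upt = sum(`Total Fares`) / sum(total_UPT),
    expenses_per_vrm = sum(Expenses) / sum(total_VRM),
    fares_per_vrm = sum(`Total Fares`) / sum(total_VRM),
    .groups = "drop"
  )

print(efficiency)
```

From here, `arrange(efficiency, desc(farebox_recovery))` or `arrange(efficiency, expenses_per_upt)` sorts by whichever metric you care about.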

Conclusion: Port Imperial is the most efficient in terms of profitability and the NYC MTA is the most efficient in terms of sheer transportation

While efficiency is hard to capture with one metric, the Port Imperial Ferry Corporation blows the competition out of the water in terms of pure profitability. Its fares more than cover its operating expenses, which has let it fund itself without government subsidies while serving 7M+ unlinked trips annually.

Despite Port Imperial leading the group in profitability, it serves a much smaller range of constituents than its next-door neighbor, the MTA. The NYC MTA’s subway alone carried nearly 1.8B trips in 2022, with the capacity to grow even more. In terms of sheer transportation, it is impossible to beat, and it deserves a spot as one of the most efficient transit systems because of the scale it has achieved.