STAT 223 Final Capstone
Due Monday, May 6, 2024 at 11:59 p.m. EDT
PUT YOUR NAME HERE
Please complete this exam using RMarkdown. You are not permitted to communicate with
anyone other than the instructor regarding the capstone, nor are you permitted to post
questions to online forums, use LLM/AI tools, or consult any other outside resources. You may use
any resources from class (i.e., notes, textbooks, assignments, examples) in answering these
problems. Make sure to set a seed of 1234 so all results are reproducible (you should do this
once for the entire document in your first R chunk).
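As a minimal sketch of the seed requirement, the first R chunk of the document might contain only the following call (the comment wording is illustrative):

```r
# Set the seed once, at the top of the document, so every later
# random draw is reproducible when the file is knit from scratch.
set.seed(1234)
```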
Type out the academic honesty pledge, and “sign” it by typing your name after it: “I affirm
that I will not give or receive any unauthorized help on this exam and that all work will be
my own.”
Type out pledge here:
Sign by typing name here:
The Department of Transportation is interested in examining public transportation throughout
the United States. In particular, they have been hearing many complaints of bus delays
causing people to be late, and they want to investigate how long people are waiting for
the bus to arrive. The Department has looked into bus waiting times in 10 cities, collecting
a sample at each location and recording the average wait time. The data are presented below:
City          Average Wait Time (Minutes)   Number Sampled
NYC           12.1                          14
Boston         4.7                          16
Baltimore      8.6                           7
Charlotte      2.3                          10
Miami         14.7                          33
Denver         4.4                          21
Seattle        6.8                          14
San Diego     10.2                          13
Los Angeles   17.8                          18
Las Vegas      3.5                          10
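The table can be entered in R along the following lines. This is only a sketch: the object name `wait` and the column names are hypothetical choices, not part of the assignment. The last line recovers each city's total observed wait from its average and sample size, which is the quantity the exponential likelihood depends on.

```r
# Bus wait data transcribed from the table above.
# `wait`, `avg_wait`, `n`, and `total_wait` are illustrative names.
wait <- data.frame(
  city = c("NYC", "Boston", "Baltimore", "Charlotte", "Miami",
           "Denver", "Seattle", "San Diego", "Los Angeles", "Las Vegas"),
  avg_wait = c(12.1, 4.7, 8.6, 2.3, 14.7, 4.4, 6.8, 10.2, 17.8, 3.5),
  n = c(14, 16, 7, 10, 33, 21, 14, 13, 18, 10)
)

# Total wait per city: average = total / n, so total = average * n.
wait$total_wait <- wait$avg_wait * wait$n
```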
You’ve been hired to examine the trends and report back.
We can model the wait time using an exponential distribution:
$$y_{ij} \mid \theta_j \sim \text{Exp}(\theta_j)$$
Recall that the density function for the exponential distribution is as follows:
$$p(y_{ij} \mid \theta_j) = \theta_j e^{-\theta_j y_{ij}}, \quad \text{for } i = 1, \ldots, n_j$$
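Because the table reports only each city's average wait and sample size, it is worth noting that the likelihood for city $j$ depends on the data only through those two quantities. Writing $\bar{y}_j$ for the city-$j$ average (notation introduced here for illustration) and assuming the waits are conditionally independent given $\theta_j$:

$$p(y_{1j}, \ldots, y_{n_j j} \mid \theta_j) = \prod_{i=1}^{n_j} \theta_j e^{-\theta_j y_{ij}} = \theta_j^{n_j} \exp\left(-\theta_j n_j \bar{y}_j\right)$$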
Appropriate model priors to assign in this case are:
$$\theta_j \mid \alpha, \beta \sim \text{Gamma}(\alpha, \beta)$$
$$p(\alpha, \beta) \propto c$$
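One way these pieces might combine is sketched below. It assumes that $\text{Gamma}(\alpha, \beta)$ uses the shape-rate parameterization (the handout does not say which), that waits are conditionally independent given $\theta_j$, and it writes $\bar{y}_j$ for the city-$j$ average, so that the city-$j$ likelihood is proportional to $\theta_j^{n_j} \exp(-\theta_j n_j \bar{y}_j)$. Under those assumptions the prior density, the conditional posterior of each $\theta_j$, and the marginal posterior of $(\alpha, \beta)$ would be:

$$p(\theta_j \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \theta_j^{\alpha - 1} e^{-\beta \theta_j}$$

$$\theta_j \mid \alpha, \beta, y \sim \text{Gamma}\left(\alpha + n_j,\; \beta + n_j \bar{y}_j\right)$$

$$p(\alpha, \beta \mid y) \propto \prod_{j=1}^{10} \frac{\beta^{\alpha}\, \Gamma(\alpha + n_j)}{\Gamma(\alpha)\, \left(\beta + n_j \bar{y}_j\right)^{\alpha + n_j}}$$

The last expression follows from integrating each $\theta_j$ out of the joint posterior; under a rate-scale convention for the gamma prior the formulas would change accordingly.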
Your job is to analyze the data for interesting trends, providing a report to the Department
of Transportation. Your report should have two sections:
•Section 1: Technical details
•Section 2: Non-technical findings
Section 1 should be your main analysis and the majority of your work. You should draw
inference on all model parameters using a multiple chain approach. I am expecting this
section to include:
•The implementation of an MCMC method based on multiple chains.
–If needed, you can use a proposal variance of diag(4, 2).
–Your analysis should be reliable, so you will want things to have reasonably converged.
Make sure to choose an appropriate number of iterations for each chain.
•Assessment of multiple convergence diagnostics and relevant summaries.
•Overall discussion on whether you believe convergence has been met and any potential
shortcomings you see in this area.
•A nicely formatted table of inferential summaries for each of the different variables,
averaged across all four chains, with brief comments.
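The requirements above could be approached in several ways. The following is a rough sketch of one possibility, not the intended solution: a random-walk Metropolis step for $(\alpha, \beta)$ with the suggested proposal variance diag(4, 2), followed by direct gamma draws for each $\theta_j$, run as four chains. It assumes the shape-rate gamma parameterization and conditionally independent waits. The function names, starting values, and the 2000-iteration length are all hypothetical choices, and the `rhat` helper is a basic hand-written version of the potential scale reduction factor.

```r
# Data from the table: average wait (minutes) and sample size per city.
ybar <- c(12.1, 4.7, 8.6, 2.3, 14.7, 4.4, 6.8, 10.2, 17.8, 3.5)
n    <- c(14, 16, 7, 10, 33, 21, 14, 13, 18, 10)
s    <- n * ybar  # total observed wait per city

# Log marginal posterior of (alpha, beta) under the flat prior,
# assuming Gamma(alpha, beta) is shape-rate (an assumption).
log_post <- function(alpha, beta) {
  if (alpha <= 0 || beta <= 0) return(-Inf)
  sum(alpha * log(beta) - lgamma(alpha) + lgamma(alpha + n) -
        (alpha + n) * log(beta + s))
}

# One chain: Metropolis update for (alpha, beta), then gamma draws for theta.
run_chain <- function(n_iter, init, prop_sd = sqrt(c(4, 2))) {
  draws <- matrix(NA_real_, n_iter, 2 + length(n),
                  dimnames = list(NULL, c("alpha", "beta",
                                          paste0("theta", seq_along(n)))))
  cur <- init
  cur_lp <- log_post(cur[1], cur[2])
  for (t in seq_len(n_iter)) {
    prop <- cur + rnorm(2, 0, prop_sd)          # symmetric normal proposal
    prop_lp <- log_post(prop[1], prop[2])
    if (log(runif(1)) < prop_lp - cur_lp) {     # accept or keep current value
      cur <- prop
      cur_lp <- prop_lp
    }
    theta <- rgamma(length(n), shape = cur[1] + n, rate = cur[2] + s)
    draws[t, ] <- c(cur, theta)
  }
  draws
}

# Basic potential scale reduction factor; x is iterations x chains.
rhat <- function(x) {
  n_it <- nrow(x)
  W <- mean(apply(x, 2, var))    # within-chain variance
  B <- n_it * var(colMeans(x))   # between-chain variance
  sqrt(((n_it - 1) / n_it * W + B / n_it) / W)
}

set.seed(1234)
inits  <- list(c(1, 1), c(5, 20), c(10, 50), c(2, 10))  # illustrative starts
chains <- lapply(inits, run_chain, n_iter = 2000)

# R-hat for alpha, discarding the first half of each chain as burn-in.
alpha_draws <- sapply(chains, function(ch) ch[1001:2000, "alpha"])
alpha_rhat  <- rhat(alpha_draws)
```

Values of `alpha_rhat` near 1 would be consistent with convergence, but a single diagnostic is not conclusive; trace plots and other diagnostics from class would still need to be examined, and the chain length here is a placeholder rather than a recommendation.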
Section 2 should be a concise summary of what you found regarding waiting times. Discuss
any trends you notice and any areas of concern. This will likely be about a
half-page summary that avoids technical terms and can serve as a standalone
description of what you found for Department of Transportation policymakers who do
not have much formal statistics training.