This project measures how proximity to subway stations affects rental prices in Manhattan and Brooklyn, and how future station openings or closures could affect neighborhood prices.
We gathered data from four sources:
- Location data for station entrances and line access from MTA's API
- Apartment sales from the NYC Department of Finance
- Median neighborhood rental prices and sale-to-rent ratios from Zillow
- Apartment coordinates from the Google Maps API
Our apartment data consisted of sales rather than rentals, so we used Zillow's median neighborhood rental prices to convert each apartment's sale price into a rent estimate that better suited our goal. We focused on rentals rather than sales because the rental market is less static and should therefore respond more strongly to changes in subway access.
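The exact conversion is not spelled out here, so the following is only a minimal sketch of one way a sale price could be turned into a monthly rent estimate. It assumes the sale-to-rent ratio is defined as sale price divided by annual rent; the function name `estimate_monthly_rent` and the example values are hypothetical and are not taken from the project data.

```python
def estimate_monthly_rent(sale_price, sale_to_rent_ratio):
    """Estimate monthly rent from a sale price.

    Assumption: the ratio is sale price divided by ANNUAL rent,
    so dividing the price by the ratio gives a yearly rent figure.
    """
    if sale_to_rent_ratio <= 0:
        raise ValueError("sale_to_rent_ratio must be positive")
    annual_rent = sale_price / sale_to_rent_ratio
    return annual_rent / 12


# Hypothetical values for illustration only.
example_rent = estimate_monthly_rent(sale_price=900_000, sale_to_rent_ratio=25)
print(round(example_rent, 2))  # 3000.0
```

Applying a neighborhood-level ratio to every apartment in that neighborhood keeps the relative price differences between apartments while rescaling them to rent-like values.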
To get the distance from each apartment to each subway entrance, we used the Google Maps API to convert addresses into coordinates, then calculated the distance in miles using the Haversine formula. From there, we found for each apartment every station with unique subway access within 0.55 miles (roughly a 10-minute walk).
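The distance step can be sketched as follows. This is an illustrative version, not the project's code: the station records, the approximate coordinates, and the helper `nearest_access_by_line` are hypothetical, and "unique subway access" is interpreted here as keeping, for each line, only the closest station within the radius.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius
WALK_RADIUS_MILES = 0.55     # roughly a 10-minute walk


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def nearest_access_by_line(apartment, stations, radius=WALK_RADIUS_MILES):
    """Map each subway line to the distance of its closest station within radius."""
    nearest = {}
    for station in stations:
        d = haversine_miles(apartment[0], apartment[1], station["lat"], station["lon"])
        if d > radius:
            continue  # station is outside walking distance
        for line in station["lines"]:
            if line not in nearest or d < nearest[line]:
                nearest[line] = d
    return nearest


# Hypothetical stations with approximate coordinates.
stations = [
    {"name": "Union Sq", "lat": 40.7359, "lon": -73.9911, "lines": ["L", "N"]},
    {"name": "Times Sq", "lat": 40.7580, "lon": -73.9855, "lines": ["1", "N"]},
]
apartment = (40.7359, -73.9911)
access = nearest_access_by_line(apartment, stations)
print(sorted(access))  # lines reachable within the walking radius
```

The resulting per-line distances can be used directly as model features, with lines outside the radius treated as inaccessible.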
To get a sense of where our apartments were located, and to ensure that we were not focusing on only a few neighborhoods, we used GeoPandas to map each neighborhood, apartment, and subway line.
We used four different classification models to predict whether an apartment's rental price would be above or below its neighborhood median, given its access and proximity to different lines.
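As a rough illustration of the classification target, the sketch below labels each apartment 1 if its estimated rent is above its neighborhood median and 0 otherwise. The function name, the example figures, and the choice to treat a rent exactly equal to the median as "not above" are all assumptions.

```python
def label_above_median(apartments, neighborhood_medians):
    """Return 1 for each apartment whose rent exceeds its neighborhood median, else 0.

    apartments: iterable of (neighborhood, rent_estimate) pairs.
    neighborhood_medians: dict mapping neighborhood name to median rent.
    """
    labels = []
    for neighborhood, rent in apartments:
        median = neighborhood_medians[neighborhood]
        labels.append(1 if rent > median else 0)  # ties count as "not above"
    return labels


# Hypothetical values for illustration only.
medians = {"Williamsburg": 3200, "Harlem": 2400}
apartments = [("Williamsburg", 3500), ("Williamsburg", 2900), ("Harlem", 2400)]
labels = label_above_median(apartments, medians)
print(labels)  # [1, 0, 0]
```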
The four models we used were:
- Logistic Regression
- Random Forest
- Gradient Boosting
- AdaBoost
The best-performing model was the Random Forest Classifier, which had an accuracy of 74.52% and an AUC of 81.57%.
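A comparison like this can be sketched with scikit-learn as shown below. The data here is synthetic (generated with `make_classification`), the models use mostly default settings, and the train/test split is an assumption, so the printed numbers will not match the results reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the apartment features and above/below-median labels.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)          # hard 0/1 labels for accuracy
    scores = model.predict_proba(X_test)[:, 1]   # positive-class probability for AUC
    results[name] = {
        "accuracy": accuracy_score(y_test, predictions),
        "auc": roc_auc_score(y_test, scores),
    }

for name, metrics in results.items():
    print(f"{name}: accuracy={metrics['accuracy']:.4f}, AUC={metrics['auc']:.4f}")
```

Accuracy depends on the 0.5 decision threshold, while AUC measures how well the predicted probabilities rank positives above negatives, which is why the two are reported side by side.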
The optimal hyperparameters for the Random Forest Classifier, cross-validated using GridSearchCV, were:
- 250 estimators
- Gini impurity
- Minimum of 5 samples required to split a node
- Minimum of 5 samples per leaf
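A grid search of this kind could look like the sketch below. The candidate grid, the 3-fold cross-validation, the `roc_auc` scoring choice, and the synthetic data are assumptions made for illustration; only the four hyperparameter names and the reported best values come from the list above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; the real features are per-line access and distance.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Hypothetical candidate grid that includes the reported best values.
param_grid = {
    "n_estimators": [50, 250],
    "criterion": ["gini", "entropy"],
    "min_samples_split": [5],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    scoring="roc_auc",  # assumed metric; accuracy would also be reasonable
    cv=3,               # assumed number of folds
)
search.fit(X, y)
print(search.best_params_)

# The reported optimum, expressed as scikit-learn arguments.
reported_best = RandomForestClassifier(
    n_estimators=250,
    criterion="gini",
    min_samples_split=5,
    min_samples_leaf=5,
    random_state=0,
)
```

Requiring at least 5 samples per split and per leaf limits how finely each tree can partition the data, which tends to reduce overfitting on small neighborhoods.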
Future work we would like to pursue:
- Due to time constraints, we limited the scope of the project to Manhattan and Brooklyn; in the future, we would like to explore the Bronx and Queens as well
- Predict how the upcoming L train shutdown will affect rental prices in Williamsburg
- Add Citi Bike data to the project and measure the impact dock openings have had on rental prices
- Make the maps interactive so that hovering over an apartment shows its rental price, address, and neighborhood median price