Each year, Rensselaer files an IRS Form 990 that contains a large amount of financial data about the Institute. These forms, along with others filed with the government, have enabled some of my favorite reporting, but only because I knew where to look and had the patience to sift through them. The 990 data is available on ProPublica’s Nonprofit Explorer (every filing as a PDF, and some also as XML).
My primary goal is to make all of this 990 data a little more accessible to the RPI community. The filings date back to 2001, but everything before 2011 exists only as image-only PDFs, which makes it effectively unusable for large-scale analysis. I'm not yet sure exactly how I will accomplish this, but I'm going to start by putting the pre-XML data into a better format. Once all of the data is available in the same format, I would ideally like to create an API that helps community members access it in meaningful ways. More realistically, given my time frame, I'll probably end up making a static GitHub page with access to all of the new datasets and resources on how to use them.
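As a rough illustration of what "the same format" could look like, here is a minimal sketch that flattens one XML filing into a single CSV row. It is not a settled design. The sample XML is a made-up, heavily trimmed stand-in with invented figures, and the element names in `FIELDS` are assumptions; real filings are much larger and their element names differ between schema versions, so the list would need to be checked against actual files. The helpers `local_name`, `flatten_filing`, and `rows_to_csv` are hypothetical names introduced only for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up, heavily trimmed filing with invented figures (NOT real RPI data).
# Element names are assumptions and vary by schema version.
SAMPLE_XML = """<Return xmlns="http://www.irs.gov/efile">
  <ReturnHeader>
    <TaxYr>2015</TaxYr>
  </ReturnHeader>
  <ReturnData>
    <IRS990>
      <CYTotalRevenueAmt>1000</CYTotalRevenueAmt>
      <CYTotalExpensesAmt>900</CYTotalExpensesAmt>
    </IRS990>
  </ReturnData>
</Return>"""

# Hypothetical list of fields to pull out of each filing.
FIELDS = ["TaxYr", "CYTotalRevenueAmt", "CYTotalExpensesAmt"]


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def flatten_filing(xml_text, fields):
    """Return {field: text} for the first element matching each field name.

    Fields that do not appear in the filing are left as None.
    """
    root = ET.fromstring(xml_text)
    row = dict.fromkeys(fields)
    for element in root.iter():
        name = local_name(element.tag)
        if name in row and row[name] is None:
            row[name] = (element.text or "").strip()
    return row


def rows_to_csv(rows, fields):
    """Serialize a list of row dicts to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


row = flatten_filing(SAMPLE_XML, FIELDS)
print(rows_to_csv([row], FIELDS))
```

Matching on the local tag name rather than a full path keeps the sketch tolerant of layout differences between years, at the cost of taking only the first match when a name appears more than once. Data transcribed from the pre-2011 PDFs could be written into the same column layout so that every year ends up in one table.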
The "Work in progress" folder documents my progress and experimentation throughout the semester, for use in my updates.