This repository contains Python code that downloads FCC data and stores it in an Elasticsearch instance. An additional command tags and analyzes the data further.
After a first pass in a Jupyter Notebook, I used Kibana on AWS to do most of my digging.
To install the package and run tests:
$ pip install -e .
$ python setup.py test
To crawl the comments, make sure you have an Elasticsearch server set up and running, then run:
$ fcc index --endpoint=http://localhost:9200
This will take anywhere from 2 to 4 hours (or won't work at all, if the API is down).
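Since the crawl runs for hours, it can be worth confirming that the Elasticsearch endpoint responds before starting. The following is a minimal sketch using only the standard library; the helper name `endpoint_is_up` is hypothetical and not part of the `fcc` package.

```python
import urllib.request


def endpoint_is_up(url, timeout=5.0):
    """Return True if an HTTP GET to `url` succeeds with a 2xx status.

    Hypothetical pre-flight helper, not part of the fcc package.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and HTTP error
        # responses (urllib raises HTTPError, an OSError subclass, for
        # 4xx/5xx). ValueError covers malformed URLs.
        return False
```

Calling `endpoint_is_up("http://localhost:9200")` before `fcc index` should return `True` when a local server is answering on that port.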
I then take another pass on the data, appending "analysis" variables to all of the documents. This makes it a lot easier to spot trends in Kibana.
To analyze the comments:
$ fcc analyze --endpoint=http://localhost:9200
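To illustrate what appending "analysis" variables to a document can look like, here is a rough sketch. The fields that `fcc analyze` actually computes are not listed here, so the `text` field, the `word_count` and `fingerprint` names, and the `add_analysis` function are all assumptions made for the example.

```python
import hashlib


def add_analysis(doc):
    """Return a copy of `doc` with a hypothetical "analysis" sub-document."""
    # Assumes the comment body is stored under "text" (an illustrative name).
    text = doc.get("text") or ""
    # Lowercase and collapse whitespace so near-identical comments match.
    normalized = " ".join(text.lower().split())
    tagged = dict(doc)  # shallow copy; the input document is left unchanged
    tagged["analysis"] = {
        "word_count": len(normalized.split()),
        # Identical fingerprints point at repeated (e.g. form-letter) text.
        "fingerprint": hashlib.sha1(normalized.encode("utf-8")).hexdigest(),
    }
    return tagged
```

Precomputed fields like these can be filtered and aggregated directly in Kibana, which is what makes trends easier to spot than with raw text alone.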