Recent disasters and crises (such as Hurricanes Irma, Harvey and Maria) have affected many regions of the globe, causing billions of dollars in damage and claiming hundreds of lives. Communication technologies, such as social media, can play an important role in preventing some of this damage by helping first responders and humanitarian organizations mobilize resources effectively. Social media has emerged as a powerful tool for the crisis informatics community because it serves as a field-level, rich source of real-time information that would otherwise not be available.
Unfortunately, gaining situation awareness from social media content using semi-automatic methods is a difficult task due to the velocity, volume and noise associated with such data. In this tutorial, we will cover the primary steps for analysing crisis-related social media content in real-world crisis situations.
We will present a situation-awareness data analysis pipeline that addresses some of the challenges that occur when dealing with social media, with steps ranging from data collection and filtering to event extraction and visualization. The methods, tools and lessons covered will draw on our real-world experience in crisis informatics research, and will include hands-on interactions and code demonstrations.
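To make the pipeline concrete, here is a minimal, purely illustrative Python sketch of the stages covered in the tutorial (the stage functions and sample posts are hypothetical placeholders, not the tutorial's actual code):

```python
# Purely illustrative sketch of the situation-awareness pipeline
# (hypothetical stage functions and sample posts, not the tutorial's code).

def collect():
    # In the tutorial this stage queries social media APIs in real time.
    return [
        "Flooding reported near the river, families on rooftops",
        "Lovely weather for a picnic today!",
        "Power outage across the city after the storm",
    ]

def filter_posts(posts, keywords=("flood", "storm", "outage")):
    # Keep only posts mentioning crisis-related keywords.
    return [p for p in posts if any(k in p.lower() for k in keywords)]

def extract_events(posts):
    # A real pipeline would use NLP for concept/entity and event extraction;
    # here each post is simply paired with a placeholder event label.
    return [{"text": p, "event": "crisis-report"} for p in posts]

def visualise(events):
    # The tutorial uses maps and charts; printing stands in for that here.
    for e in events:
        print("[{}] {}".format(e["event"], e["text"]))

visualise(extract_events(filter_posts(collect())))
```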
- Grégoire Burel, KMi, The Open University, UK (@evhart)
- Mayank Kejriwal, University of Southern California, USA (@kejriwal_mayank)
- Prashant Khare, KMi, The Open University, UK (@prash_khare)
- Slides Part 1: Introduction: pptx / pdf
- Slides Part 2: Crisis Data Collection and Filtering: pptx / pdf
- Slides Part 3: Concepts and Entities Extraction: pptx / pdf
- Slides Part 4: Classification and Categorisation: pptx / pdf
- Slides Part 5: Event Extraction: pptx / pdf
- Slides Part 6: Visualisation: pptx / pdf
References for the key papers used in the slides and relevant to the tutorial are available as separate bib files (and as markdown for each presentation):
- References Part 1: Introduction: bib / md
- References Part 2: Crisis Data Collection and Filtering: bib / md
- References Part 3: Concepts and Entities Extraction: bib / md
- References Part 4: Classification and Categorisation: bib / md
- References Part 5: Event Extraction: bib / md
- References Part 6: Visualisation: bib / md
The tutorial hands-on sessions are mostly coded in Python and provided as Jupyter notebooks. The notebooks can be installed locally or viewed online.
There are different methods for running and studying the notebooks:
- Interactive notebook environment on mybinder
- Rendered HTML notebooks on nbviewer (non-interactive)
- Interactive local notebook environment:
- Docker build or Docker Hub image (see below)
- Native installation
- Notebooks: Data Collection: ipynb / nbviewer / mybinder / jupyter
- Notebooks: Entity Extraction:
- Notebooks: Classification: ipynb / nbviewer / mybinder / jupyter (an illustrative sketch of this topic follows below)
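As a flavour of the kind of material covered in the Classification notebook, the following is a minimal text-classification sketch using scikit-learn (the tiny hand-labelled sample and the model choices are illustrative assumptions, not the notebook's actual code):

```python
# Minimal crisis-tweet classification sketch using scikit-learn
# (illustrative sample data and baseline model, not the notebook's code).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hand-labelled sample: 1 = crisis-related, 0 = unrelated.
texts = [
    "Major flooding on 5th street, people trapped on rooftops",
    "Shelter open at the community centre for flood evacuees",
    "Can't wait for the concert tonight!",
    "New phone arrived today, so happy",
]
labels = [1, 1, 0, 0]

# TF-IDF features with a logistic regression classifier: a common,
# simple baseline for categorising short crisis-related texts.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression())
model.fit(texts, labels)

# Predict whether an unseen post is crisis-related.
print(model.predict(["Flooding reported near the shelter"]))
```

Under these assumptions, the pipeline learns keyword-like features (e.g. "flooding", "shelter") from the labelled sample and uses them to categorise unseen posts.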
You can either run the tutorial using Docker or install all the required software natively. The Docker installation is the recommended approach, as it is more likely to work without any configuration issues.
You can either pull the image directly from Docker Hub or build it from source.
To install the image from Docker Hub, execute the following command:
docker pull evhart/smasac-tutorial
If you prefer, you can clone this Git repository and then build the image from source using the following command:
docker build -t evhart/smasac-tutorial:latest .
The Jupyter server can be started by running the container with the following command:
docker run -it -p 8888:8888 -p 5000:5000 --name smasac-tutorial evhart/smasac-tutorial:latest
Then, the Jupyter server should start and output the URI that needs to be used for connecting to the server. For example:
Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
http://localhost:8888/?token=<SOME_TOKEN>
Simply copy the URI from your terminal into your web browser to access the Jupyter server.
If you need to edit the notebooks directly, or do not have Docker installed, you can set up the tutorial directly on your machine.
The code used for the tutorial requires Python 3 (tested on Python 3.5 and 3.7) and Jupyter. The different libraries used for the tutorial are listed in requirements.txt (you can install them using pip3).
There are some potential installation issues when using pip v10, so we recommend using pip 9.x and having git and Cython installed, as well as a C++11 compiler.
GeoPandas also requires some additional dependencies; please check the GeoPandas website.
Before installing the required libraries and starting the Jupyter server, it is recommended to create a virtual environment using venv.
Create a virtual environment for the tutorial in the current directory (you may need to install venv: e.g., apt-get install python3-venv):
python3 -m venv smasac-env
Then, activate the environment:
source smasac-env/bin/activate
Finally, install all the required libraries:
pip install -r requirements.txt
You will also need to activate the Jupyter widgets:
jupyter nbextension enable --py widgetsnbextension
After completing the previous steps, you can start the Jupyter notebook server using the following command:
jupyter notebook
Then, the Jupyter server should start and output the URI that needs to be used for connecting to the server. For example:
Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
http://localhost:8888/?token=<SOME_TOKEN>
Simply copy the URI from your terminal into your web browser to access the Jupyter server.
This tutorial has received support from the European Union's Horizon 2020 research and innovation programme under grant agreement No 687847 (COMRADES).