Splunk app to process NTSB Safety Data
This is a work in progress.
By far the easiest method is to install this app in a local Splunk Enterprise environment, but instructions for Splunk Cloud are also included.
- Create an index on your Splunk instance; I called mine "ntsb_csv".
- Head over to the NTSB and get the CSV data.
- Point your file input monitor at your data folder; select "ntsb_csv" as the sourcetype.
- Install the app.
- Navigate to $SPLUNK_HOME/etc/apps
- Execute:
git clone https://github.com/csyvenky/all_ntsb.git
- Restart Splunk via Splunk Web or any other method you prefer. Alternatively, use the Ansible playbook restart_splunk.yml, a quick-and-dirty restart tool for splunkd.
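If you prefer configuring by file rather than Splunk Web, the index and file monitor from the steps above might look roughly like the stanzas below. This is a sketch only: the monitored path is a placeholder, and the size and retention values mirror the Splunk Cloud instructions later in this README.

```ini
# indexes.conf -- sketch; values mirror the settings described in this README
[ntsb_csv]
homePath   = $SPLUNK_DB/ntsb_csv/db
coldPath   = $SPLUNK_DB/ntsb_csv/colddb
thawedPath = $SPLUNK_DB/ntsb_csv/thaweddb
maxTotalDataSizeMB = 35
frozenTimePeriodInSecs = 1261440000   # 14,600 days (40 years)

# inputs.conf -- /data/ntsb is a placeholder; point it at your download folder
[monitor:///data/ntsb]
index = ntsb_csv
sourcetype = ntsb_csv
disabled = false
```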
- If you don't have an account, open a free trial.
- Browse to your instance, then: Apps | Manage Apps | Create App
  - Name: "ntsb_csv"
  - Folder: "ntsb_csv"
  - Version: "1.0"
  - Visible: "Yes"
  - Description: "All NTSB Aviation Accident Database"
  - Template: "barebones"
- Access your instance, then: Settings | Indexes | New Index
  - Name: "ntsb_csv"
  - Size: "35 MB"
  - Retention: "14,600 days" (40 years)
- Head over to the NTSB and download the CSV data. Save a copy to your local workstation.
- After downloading the data from the NTSB, import it into Excel: Data | From Text/CSV.
- Delimit on the "|" char.
- Create four new columns before the "Country" column.
- Split the "Location" field into the new columns "State", "Location2", "Location3", and "last/blank" using the Text to Columns feature, delimiting on ",".
- Delete the unused "last/blank" field; it is likely still named "Column4".
- Save as AviationData.csv and move the file to a location where your Splunk file input configuration can pick it up. That's it for Excel.
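The Excel clean-up above can also be sketched as a script. This is illustrative only, with assumptions not taken from the NTSB spec: the raw export is pipe-delimited, a column named "Location" holds "City, State" values, and the column positions match the tiny sample below. The Excel steps split "Location" into four columns; this sketch keeps just two.

```shell
#!/bin/sh
# Tiny stand-in for the NTSB download (pipe-delimited, assumed layout):
cat > /tmp/raw_ntsb.psv <<'EOF'
EventId|EventDate|Location|Country
20230101X00001|2023-01-01|ANCHORAGE, AK|United States
EOF

# Split field 3 ("Location") on the comma and emit comma-delimited output.
awk -F'|' 'BEGIN { OFS = "," }
NR == 1 { print $1, $2, "Location", "State", $4; next }
{
    n = split($3, loc, ",[ ]*")              # "City, State" -> loc[1], loc[2]
    print $1, $2, loc[1], (n > 1 ? loc[2] : ""), $4
}' /tmp/raw_ntsb.psv > /tmp/AviationData.csv

cat /tmp/AviationData.csv
```

Adjust the field number and sample header to match the real download before using anything like this.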
- Access your instance, then: Settings | Source Types | New Source Type
  - Name: "ntsb_csv"
  - Description: "Process CSV for NTSB Aviation Accident Database"
  - Destination App: the app created above, "ntsb_csv"
  - Indexed Extractions: "csv"
  - Timestamp: Advanced | Format: "%Y-%m-%d" | Timestamp Field: "Event Date"
  - Delimit on the "," char.
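On Splunk Enterprise, the same source type can be defined in props.conf. This is a sketch of what the settings above translate to, not taken from the app itself:

```ini
# props.conf -- sketch of the source type described above
[ntsb_csv]
INDEXED_EXTRACTIONS = csv
FIELD_DELIMITER = ,
# The timestamp field name must match the CSV header exactly.
TIMESTAMP_FIELDS = Event Date
TIME_FORMAT = %Y-%m-%d
```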
This step may be required if you have already imported this dataset: with the current configuration, Splunk will happily import duplicate events. Since each download contains the entire dataset, it is safe to wipe the index and re-import the full dataset each time.
- Access the file upload dialog.
- Upload the CSV (alternatively, you could use a Splunk forwarder to push the file's contents). Next.
- Select "ntsb_csv" as the source type. The data preview on the right-hand side should look correct, with no warnings. Next.
- Input settings | Host can be left as-is.
- Input settings | Index should be set to "ntsb_csv". Review.
- Submit.
- Access the app homepage.
- Run the following search: "index=ntsb_csv sourcetype=ntsb_csv". In my instance I have > 82,000 events.
- After all that work cleaning up the data, we should make sure that state was parsed correctly. Run the following search: "index=ntsb_csv sourcetype=ntsb_csv | stats count by State | sort - count"
- Access the Dashboards page | Create New Dashboard. Title: "All NTSB Dashboard" Permissions: "Share in App"
- Immediately edit the Dashboard in "Source" mode.
- Copy and paste the dashboard source from the GitHub repo. Save.
- Download the data file. Upload it via the control in Splunk Web.
- Access the Lookups page. Add new. Destination App: "all_ntsb" Destnation FileName: "ourairports_data.csv"
- Access the Lookups page. Add new. Destination App: "all_ntsb" Name: "OurAirports.com" Lookup File: "ourairports_data.csv"
- Access the app homepage.
- Run the following search: "| inputlookup ourairports_data.csv".
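Once the lookup is in place, it can be used to enrich events. A sketch of such a search is below; "Airport Code" is a field in the NTSB data, but the "ident" and "name" column names in ourairports_data.csv are assumptions — check the file's header and adjust.

```
index=ntsb_csv sourcetype=ntsb_csv
| lookup OurAirports.com ident AS "Airport Code" OUTPUT name AS airport_name
| stats count by airport_name
```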
This project includes some utility Ansible playbooks for managing the app and its data.
Playbooks:
- get_data_file.yml - downloads the main data file from the NTSB; two formats are available (CSV and XML), and the CSV file is a much smaller payload.
ansible-playbook -i hosts get_data_file.yml --extra-vars "format=[csv|xml]" --ask-pass
- wipe_splunk_index.yml - since the NTSB provides the dataset as an all-in-one file, this utility wipes the index to prevent importing duplicate data.
ansible-playbook -i hosts wipe_splunk_index.yml --ask-pass
- restart_splunk.yml - a quick and dirty restart tool for splunkd.
ansible-playbook -i hosts restart_splunk.yml --ask-pass
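All of the playbooks are run with `-i hosts`. A minimal inventory might look like this; the hostname and user are placeholders, not values from this repo:

```ini
# hosts -- minimal Ansible inventory sketch
[splunk]
splunk.example.com ansible_user=splunkadmin
```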