#Data 1
Instructor: Tom Meagher, deputy managing editor at The Marshall Project
How to reach me: The best way to contact me is by email at tfm2101@columbia.edu. You can also find me on Twitter at @ultracasual.
###Course description
In this seven-week introduction to data journalism, you'll be exposed to the history of the craft of numerate reporting. You'll develop and fine-tune your bullshit detector for bad data reporting and writing. You'll be introduced to modern data practices, including software for examining spreadsheets, databases, PDFs and other common forms of data. You'll also become comfortable with the basic steps of an analysis that you can integrate into your reporting.
###Requirements
I work in a newsroom. I assume that you hope to do the same as quickly after graduation as possible. I'm going to treat you as I would any of my newsroom colleagues. I'll expect you to show up on time, do your homework, pay attention and ask questions. You can expect the same from me. When you hear something that doesn't sound right, or if something doesn't make sense or is unclear, which is entirely possible given the subject matter, ask me about it.
In addition to doing the assigned readings and participating in the weekly class discussions and exercises, you will be required to report and write two data memos and to file one request for data under state or federal Freedom of Information laws. We'll discuss those in our first couple of classes.
##Schedule, resources and readings
###Class 1 - Nov. 4
Let's get acquainted. What is data journalism? A little history.
Hands-on tutorials on Excel basics, sorting and filtering.
Sample data for exercises:
- Appointments.xls
- Crony.xls
- citybud.xls
- Patskuls.xls
- Voterregs.xls
- countyovertimes.xls
- 2010-2011hs-prresults.xlsx
- DemographicSnapshotEdited.xlsx
Tipsheets:
Assigned readings for next week:
- "The Myth of the Machine" by Michael Berens in "When Words & Nerds Collide," pp. 14-15, 1999, Poynter Institute.
- "Journalism and the Scientific Tradition," Chapter 1 of "The New Precision Journalism", 1991, by Philip Meyer.
- "You Must Learn; 5 lessons from the history of data journalism," lightning talk at IRE's 2014 CAR conference, Ben Welsh.
###Class 2 - Nov. 11
How do you find data? You need to develop your data state of mind.
Talk about the data memo assignment and the memo template.
Hands-on tutorial on pivot tables in Excel.
Guest speaker: MaryJo Webster, computer-assisted reporting editor at the Minneapolis Star Tribune
Sample data for exercises:
Tipsheets:
Assigned readings for next week:
- "Hundreds of Police Killings Are Uncounted in Federal Stats" by Rob Barry and Coulter Jones, Wall Street Journal
- "If A Data Point Has No Context, Does It Have Any Meaning?" by Erin Simpson.
- "The Five Stages of Terrible Data", lightning talk at IRE's 2015 CAR conference, Steven Rich.
###Class 3 - Nov. 18
Assignment due: Data memo #1
All Data Is Terrible; aka How the hell do you "clean" data?
Talk about the FOIA assignment and the sample FOIA letter.
Bulletproofing tips. How to avoid rookie mistakes.
Hands-on tutorials on Excel Magic and OpenRefine.
Sample data for exercises:
- ExcelMagic.xlsx
- payrolldelimited3.txt
- lotterywinners.csv
- nj-hospital-2012.xls
- nassau_police_union_contribs.xls
Tipsheets:
- MaryJo Webster's Excel "Magic"
- Key functions in OpenRefine
- How to avoid rookie mistakes
- Checklist to bulletproof your work
Guest speaker: Coulter Jones, data reporter at MedPage Today and project manager of FOIA Machine.
Assigned readings for next week:
- "Big Trouble in Little Data," by Noah Veltman, WNYC
- "Bulletproofing the data project," by Sarah Cohen, The New York Times
- "Using People, Not Numbers, To Tell The Story" by Sarah Cohen in "When Words and Nerds Collide," pp. 16-18, 1999, Poynter Institute.
###Class 4 - Nov. 24
FOIA strategies and avoiding rookie data mistakes.
Tutorials practicing Excel and OpenRefine.
Guest speaker: Stacy Jones, data editor at Fortune.com
Tipsheets:
- Jennifer LaFleur's advice on "Negotiating for Electronic Records" (pp. 4-11)
Sample data for exercises:
###Class 5 - Dec. 2
Assignment due: FOIA request and receipt
Moving to the next level: an introduction to R.
If you're using your own machine, be sure to install R and RStudio Desktop before we start the tutorial.
Guest speaker: Carla Astudillo, data journalist at the International Business Times.
Sample data for exercises:
Tipsheets:
- The official RStudio Data Wrangling cheat sheet
- A gentle introductory tutorial to R by Annie Waldman, Ryann Grochowski Jones and Coulter Jones
- Resources for doing data journalism with R
Assigned readings for next week:
- "Basic Steps in Working With Data," chapter by Steve Doig in the Data Journalism Handbook
- "Using Data Visualization to Find Insights in Data," chapter by Gregor Aisch in the Data Journalism Handbook
###Class 6 - Dec. 9
Assignment due: Data memo #2
More on working with R dataframes and joining them together.
Another hands-on tutorial on R and RStudio.
Guest speaker: Ryann Grochowski Jones, data reporter at ProPublica
Tipsheets:
- "Learn the Command Line the Hard Way," Zed Shaw's excellent class on getting comfortable with the command line
Assigned reading for next week:
###Class 7 - Dec. 16
More work with R and reporting with data from start to finish.
Tipsheets:
- "A Gentle Introduction to SQL" by Troy Thibodeaux
- "SQLBolt" Introduction to SQL
- "Data analysis and visualization using R" by Variance Explained
- "Try R" by Code School
- "Beginner's Guide to R" by Sharon Machlis
- "Scraping PDFs with Tabula"
- "R Course" by Paul Hiemstra