Pratt Institute, Center for Continuing and Professional Studies Spatial Analysis and Visualization Initiative (SAVI)
Instructor: Richard Dunks
Location: ISC Building, Lower Level, Room 003
Continuing Education Units (C.E.U.s): 3.0
- Course Overview
- Learning Objectives
- Course Requirements
- Course Readings
- Class Format
- Submitting Assignments
- Assessment
- Class Policies
- Resources
- Course Outline
- Suggested Reading
This course introduces the tools, techniques, and general approaches used to acquire, clean, analyze, and visualize open data, with particular emphasis on using web-based technologies and open-source tools at each step of the process.
We will be working with the community preservation group Save Harlem Now! to help collect, organize, and visualize data related to historic preservation in Harlem. There is no requirement to participate in this project and each student is free to pursue their own projects in class. The work with Save Harlem Now! is an opportunity to work on a real-world problem related to the collection, analysis, and visualization of data.
- You will learn to formulate and articulate a meaningful research question with public open data, as well as meaningfully critique the work of others
- You will learn how to acquire data through open data portals, application programming interfaces (APIs), and scraping data from websites
- You will learn how to clean data using open source tools in preparation for analysis and visualization
- You will learn how to conduct exploratory data analysis using descriptive statistics
- You will learn to visualize your analytical findings in meaningful and visually-engaging graphics, as well as meaningfully critique the work of others
- You will learn the basics of cartographic design as it relates to visualizing open data
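To make the cleaning and exploratory-analysis objectives above more concrete, here is a minimal Python sketch using only the standard library. The sample values and the `clean_values` and `describe` helpers are made up for illustration; they are not course data, and the class may use other tools (such as OpenRefine) for these steps.

```python
import statistics


def clean_values(raw_values):
    """Strip whitespace, drop blanks and non-numeric entries, return floats."""
    cleaned = []
    for value in raw_values:
        # Remove padding and thousands separators before converting.
        text = value.strip().replace(",", "")
        if not text:
            continue  # skip empty cells
        try:
            cleaned.append(float(text))
        except ValueError:
            continue  # skip entries such as "N/A"
    return cleaned


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Made-up sample column, as it might appear in a messy CSV export.
raw_column = [" 1,200", "850", "", "N/A", "1000 ", "950"]

values = clean_values(raw_column)
summary = describe(values)
print(summary)
```

Silently dropping unparseable entries keeps the sketch short; in a real analysis it is usually worth counting what was dropped, since missing values can skew the summary.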
All students will need to bring their own laptop for exercises during class. Time will be set aside to help install, configure, and run the programs necessary for all assignments, projects, and exercises. Where possible, all programs will be free and open-source. All assigned work using services hosted online can be run using free accounts. Please update your system to the latest version of your preferred operating system prior to the first day of class to ensure you're able to successfully install and use the tools in class.
You will be required to have free accounts with the following services:
Time will be set aside to help you register and set up these accounts, but please try to come to the first session having already registered for these services.
In addition, please install the following applications prior to class:
- Slack
- OpenRefine
- A free text editor of your choice, for example:
  - Sublime Text (All systems)
  - TextWrangler (Mac)
  - Notepad++ (Windows)
The required readings for this course consist of book chapters, newspaper articles, and short blog posts. The intention is to help give you a foundation in the critical skills ahead of class lectures. All required readings are available online or will be made available through the class portal. Recommended readings are suggestions if you wish to study further the topics covered in class. The books listed in the Suggested Readings section below offer even more depth and an extended discussion of the material we cover in class. Readings are due for the class under which they're listed.
Class runs from 6:30pm to 9:30pm, split into two 85-minute blocks separated by a single 10-minute break around the halfway point. Class will be a mix of lecture and practical exercise work, emphasizing the application of skills covered in the lecture portion of the class.
I will also be available for questions or further assistance before and after class. You will have ample time in class to work on practical exercises based on the information presented in lectures. When possible, the final half hour of class will be set aside for any additional questions or additional tutorials in tools, skills, or techniques. Please plan on attending the full class time.
All assignments will be submitted by adding your content to the class page and issuing a "pull request" in the class repository. All of this will be explained, set up, and otherwise clarified on the first day of class. Assignments aren't considered submitted until the pull request has been issued. We will have ample time in class to address any technical issues, and a reference guide for the process will be available.
Area | Total Points |
---|---|
Attendance | 20 |
Class Participation | 20 |
Visualization Critiques | 20 |
Visualizations | 20 |
Final Project | 20 |
Total | 100 |
I expect you to attend every class, arriving on time and staying for the entire duration of class. Daily attendance counts 2 points toward your final grade. Excused absences won't result in points being lost.
I expect you to be fully engaged while you’re in class. This means asking questions when necessary, engaging in class discussions, participating in class exercises, and completing all assigned work. Learning will occur in this class only when you actively use the tools, techniques, and skills described in the lectures. I will provide you ample time and resources to accomplish the goals of this course and expect you to take full advantage of what’s offered. Daily participation counts 2 points toward your final grade.
All assignments are due before the start of the class in which they are to be presented. Points will be deducted for late assignments.
I won’t be holding regular office hours, but I’m happy to set up a time to meet in person, over the phone, or via Skype/Google Hangout if you have any problems. Please use Slack to reach out to me. I will also be available before or after class to provide any assistance you may need.
- Technical
- Stack Overflow Q&A community of technology pros
- GIS Stack Exchange (same as above but just for mapping)
- (Some) Open Data Sources
- Visualizations
- Reference
Topics will be covered that day in class. Reading Assignments are to be read before class in preparation for the lecture and exercises. Assignments are due before the start of the next class and build on the information presented in class.
- What is open data?
- Data on the web
- Introduction to mapping
- Introduction to open source tools and services for mapping and visualization
- Complete the visualization started in class with data from an open data portal. Style the map in CartoDB and have it ready to present in class.
- Find an interesting or visually compelling visualization online and write 2-3 paragraphs on the visualization, discussing the data source(s), the visual style, and how well the data was represented. Feel free to use the visualization resources listed above. Submit your text to the class page following the example shown.
- Introduction to HTML and CSS
- Introduction to Git and GitHub
- Guest lecture on Save Harlem Now!
- Interactive Data Visualization for the Web, pg 15 – 23
- Matt A.V. Chaban, "Much to Save in Harlem, but Historic Preservation Lags, a Critic Says"
- Complete the online CartoDB “Online Mapping for Beginners” course.
- Create a second visualization or improve on your first, using new data or exploring a data set from the Save Harlem Now! project. Write 2-3 paragraphs discussing any challenges you encountered working with the data and/or creating your visualization in CartoDB.
- Codecademy HTML and CSS Course
- W3Schools HTML Tutorial
- A tutorial for getting started with Git and GitHub
- Web scraping
- Introduction to APIs
- Introduction to OpenRefine
- Chris Whong, "Foiling NYC's Taxi Trip Data"
- Thomas Levine, Introduction to web scraping
- Introduction to APIs ch 1-5
- Find an interesting or visually compelling visualization online and write 2-3 paragraphs on the visualization, discussing the data source(s), the visual style, and how well the data was represented. Feel free to use the visualization resources listed above. Submit your text to the class page following the example shown.
- Identify a question or topic you'd like to explore in this class, with the intention of creating a map related to that topic as part of your final project. Write 2-3 paragraphs on why the topic is interesting to you, what data you'd like to explore, and what you hope to contribute with your work.
- Overview of social media data
- Collecting social media data from APIs
- Introduction to Python for querying APIs
- TBD
- Using an API, either of an open data portal such as the NYC Open Data Portal or some other open data source, create a visualization of the data in CartoDB. Write a short (2-3 paragraph) description of the data, the API you used to access it, how you styled it, and the resulting visualization. Discuss other data you'd like to use or other techniques of cleaning the data to get your desired result. Submit your API code via the Slack channel in the format "lastname-assignment2.py" if you do your API query in Python or "lastname-assignment2.txt" if you do your query in OpenRefine.
- Update your project plan for your final project with additional questions, data sources, ideas for visualizing, or other issues/challenges you've discovered.
- CartoDB Academy
- McKinney, Wes. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly Media, Inc., 2012, "Appendix Python Language Essentials"
- Codecademy Python Course
- MIT Introduction to Computer Science and Programming with Python (free course)
- Codecademy Learn to Code for APIs
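
As a rough illustration of the API assignment described above, the following Python sketch builds a query URL for an open data portal and converts the JSON response to CSV text that could be uploaded to a mapping tool. It assumes a Socrata-style endpoint with `$limit` and `$where` query parameters; the dataset ID `xxxx-xxxx`, the `borough` filter field, and the helper names are hypothetical placeholders, not part of the course materials.

```python
import csv
import io
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical placeholder: replace xxxx-xxxx with the ID of a real dataset.
BASE_URL = "https://data.cityofnewyork.us/resource/xxxx-xxxx.json"


def build_query_url(base_url, limit=100, where=None):
    """Build a query URL with a row limit and an optional filter clause."""
    params = {"$limit": limit}
    if where:
        params["$where"] = where
    # Keep "$" readable in parameter names; other characters are percent-encoded.
    return base_url + "?" + urlencode(params, safe="$")


def fetch_rows(url):
    """Download the URL and parse the JSON response into a list of dicts."""
    with urlopen(url) as response:
        return json.load(response)


def rows_to_csv(rows, fieldnames):
    """Serialize the selected fields of each row to CSV text."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops any fields not listed in fieldnames.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# The "borough" field is an assumed column name for illustration only.
url = build_query_url(BASE_URL, limit=5, where="borough = 'MANHATTAN'")
print(url)

# With a real dataset ID you could then run, for example:
# rows = fetch_rows(url)
# print(rows_to_csv(rows, ["latitude", "longitude"]))
```

Keeping the URL construction separate from the download makes it easy to inspect the exact query before sending it, which helps when debugging a filter that returns no rows.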
- Introduction to SQL for cleaning data
- Cleaning Data with APIs
- Obe, Regina, and Leo Hsu. PostGIS in action. Manning Publications Co., 2011, Pg 3-8.
- Find an interesting or visually compelling visualization online and write 2-3 paragraphs on the visualization, discussing the data source(s), the visual style, and how well the data was represented. Feel free to use the visualization resources listed above. Submit your text to the class page following the example shown.
- Complete the SQL and PostGIS in CartoDB course.
- Python for querying Geoclient API
- SQL for cleaning and analysis
- TBD
- Create a new visualization or improve on your previous visualization with additional data and provide analysis of the data you've found. Write 2-3 paragraphs on the visualization, discussing the data source(s), the visual style, and how well the data was represented.
- A (re-)introduction to statistics
- Introduction to visualization design
- Hon, Keone. “An Introduction to Statistics.” Ch. 1 and 2.
- Ben Wellington, "Mapping the Sharing Economy"
- Find an interesting or visually compelling visualization online and write 2-3 paragraphs on the visualization, discussing the data source(s), the visual style, and how well the data was represented. Feel free to use the visualization resources listed above. Submit your text to the class page following the example shown.
- Advanced CartoDB (guest lecture)
- Heer, Jeffrey, Michael Bostock, and Vadim Ogievetsky. "A tour through the visualization zoo." Commun. ACM 53.6 (2010): 59-67.
- Munzner, Tamara. Chapter 27, “Visualization”, p 675-707, of Fundamentals of Computer Graphics, Third Edition, by Peter Shirley and Steve Marschner. AK Peters, 2009.
- CartoDB “Introduction to Map Design”
- Course review
- Advanced topics, to possibly include:
- Introduction to Interactive Visualization of Data with D3 and Leaflet
- Introduction to Spatial Databases
- Visualizing social media data
- Find an interesting or visually compelling visualization online and write 2-3 paragraphs on the visualization, discussing the data source(s), the visual style, and how well the data was represented. Feel free to use the visualization resources listed above. Submit your text to the class page following the example shown.
- Final presentations
- Fry, Ben. Visualizing Data: Exploring and Explaining Data with the Processing Environment. O'Reilly Media, Inc., 2007.
- Garrard, Chris. Geoprocessing with Python. Manning Publications Co., forthcoming.
- Janert, Philipp K. Data analysis with open source tools. O'Reilly Media, Inc., 2010.
- McCallum, Q. Ethan. Bad Data Handbook: Cleaning Up The Data So You Can Get Back To Work. O'Reilly Media, Inc., 2012.
- McKinney, Wes. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly Media, Inc., 2012.
- Munzner, Tamara. Visualization Analysis and Design. AK Peters, 2014.
- Murray, Scott. Interactive data visualization for the Web. O'Reilly Media, Inc., 2013.
- Tufte, Edward R., and P. R. Graves-Morris. The visual display of quantitative information. Vol. 2. Cheshire, CT: Graphics press, 1983.