- Objective: This course covers big data analysis and visualization, including their foundations, principles, and elements.
- Expectations: The class will include lectures on topics relating to big data technologies and supplemental paper-reading assignments. Programming assignments will augment the theoretical aspects of big data. Please understand that this is a graduate-level and upper-division engineering course. As such, students are expected to devote a large amount of time to the programming assignments and course project.
- Instructor: John J. Tran
- Lecture: Sunday @ ET 220
- Office Hours: Sunday 8:00 AM to 9:10 AM or by appointment
- Meeting Room: Students are welcome to participate in lively real-time discussion in a virtual conference center. Please note that this conference room requires a GitHub account and is open to the general public for viewing (that is, it is world-readable).
- Textbook: Doing Data Science: Straight Talk from the Frontline by Cathy O'Neil and Rachel Schutt. ISBN: 978-1449358655. Can be obtained from Amazon or the CSULA bookstore.
- Quizzes (4) - 20 points
- Homework (4) - 60 points
- Project - 15 points
- Data Science Recipe - 5 points
Final Project: The homework assignments culminate in a final project: a complete data science workflow. Students will work in teams or individually and may leverage open-source projects to facilitate their project development. Successful completion of the course project is a requirement for passing this course.
- A: 94 to 100
- A-: 90 to 93
- B+: 85 to 89
- B: 80 to 84. Graduate students will receive NC (No Credit) for scores below 80.
- B-: 77 to 79
- C+: 74 to 76
- C: 70 to 73. Undergraduates will receive NC (No Credit) for scores below 70.
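As an illustration only, the point breakdown and grading scale above can be sketched in Python. The names `WEIGHTS`, `CUTOFFS`, `total_score`, and `letter_grade` are hypothetical. How fractional scores are rounded is an assumption, not course policy, and the sketch does not model the requirement that the course project be completed.

```python
# Point weights from the grading breakdown above (total: 100 points).
WEIGHTS = {"quizzes": 20, "homework": 60, "project": 15, "recipe": 5}

# Letter-grade cutoffs from the scale above, highest first.
CUTOFFS = [(94, "A"), (90, "A-"), (85, "B+"), (80, "B"),
           (77, "B-"), (74, "C+"), (70, "C")]


def total_score(fractions):
    """Combine per-component fractions earned (0.0 to 1.0) into a 0-100 score."""
    return sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)


def letter_grade(score, graduate):
    """Map a 0-100 score to a letter grade, or "NC" below the passing floor."""
    # Assumption: fractional scores are rounded with Python's built-in round().
    points = round(score)
    # Graduate students need at least 80; undergraduates need at least 70.
    floor = 80 if graduate else 70
    if points < floor:
        return "NC"
    for cutoff, letter in CUTOFFS:
        if points >= cutoff:
            return letter
    return "NC"
```

For example, `letter_grade(total_score(fractions), graduate=True)` takes a dictionary of fractions earned per component and returns the letter grade for a graduate student.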
Reasonable accommodation will be provided to any student who is registered with the Office of Students with Disabilities and requests needed accommodation.
Cheating on assignments and exams will not be tolerated. All parties involved will receive a grade of F for the course and be reported to the Computer Science Department.
The schedule below is tentative and is subject to change.
- 4/5/2015 Introduction to Big Data. Reading Assignment: "Cognitive Augmentation" and "Data (Science) Pipelines" can be found in [Big Data Now: 2014 Edition, Current Perspectives](http://www.oreilly.com/data/free/big-data-now-2014-edition.csp) from O'Reilly Media. We will have a quiz on this reading on 4/12. Please be prepared to answer specific questions about the reading materials. Assignment: (1) get a GitHub account, (2) do the reading assignment, (3) form a team, and (4) research dashboards built with Bootstrap [note that this is an ungraded assignment].
- 4/12/2015 We will not meet today. There will be an online quiz (available at 10:00 Sunday morning, due by 23:59 Sunday evening). Please submit answer.txt and data_duck.py to CSNS. [Quiz 1]
- 4/19/2015 Data Acquisition. We will focus on the various methods of acquiring data. Please come to class with a domain you would like to explore for your data acquisition. Assignment: Data Acquisition.
- 4/26/2015 Data Storage. Technologies covered: SQL, NoSQL, JSON, and other storage technologies. Assignment: Data Storage.
- 5/3/2015 We will not meet today. Regarding lecture04.pdf, please ignore the "Learn Latex" directive on the last slide. BTW, you can get a free Graph Databases book from O'Reilly. You will not be quizzed on this book; however, do take advantage of the offer. Finally, please watch these two videos on data science:
- 5/10/2015 Data Analysis. We will have a short quiz [Quiz 2] on the videos (listed above) and the class notes lecture03.md and lecture04.pdf. The quiz will consist of 10 short-answer questions. Lecture: part 1. Technologies covered: MapReduce and Spark. Assignment: Data Storage.
- 5/17/2015 Data Analysis: part 2. We will have an in-class quiz on data analysis. Assignment: Data Analytic models.
- 5/24/2015 Data Visualization. Building the dashboard. Assignment: Visualization. [Quiz 3]
- 5/31/2015 Project Demonstration
- 6/7/2015 Final Exam [Quiz 4]