Here are some of the code and projects I made during my journey through the Udacity & Bertelsmann Data Science Scholarship 2018/2019.
r-and-python-in-rmarkdown - an example of combining R and Python in one R Markdown document, made for a forum discussion.
See output in html
ubdsc-group-projects - files and scripts created for group project activities. Some exploratory analysis was conducted on Boston fires data. The analysis of marketing freelancers offering their services online was submitted as a group project. My part was data cleaning with Python, data exploration with Python/R and data visualisations.
DFND Syllabus | DFND Certificate
After the challenge phase I was accepted to the full scholarship but, due to the sorting process, was placed in the Data Foundations Nanodegree (recently rebranded as the Business Analytics Nanodegree). It took me 5 days to finish it, so after graduation I was granted an upgrade to the Data Analyst Nanodegree. Here are the topics covered and the projects I made for DFND.
dfnd-descriptive-stats - the project required using spreadsheets to practice descriptive statistics and analyse Udacity student survey data. I cleaned the survey data in spreadsheets and performed exploratory analysis. I examined the characteristics of the respondents, their course preferences and the time they spent completing their projects.
dfnd-sql - the project required applying SQL to explore data in the Chinook sample database. I used SQL and R together in R Markdown to pull data from the database with queries and to use the query results for further analysis and visualisations (the output included in the final presentation can be seen here). I explored the distribution of music albums by price and genre, the popularity of different genres in the USA and the most favoured composers.
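As a rough illustration of the genre-popularity question, here is a minimal Python sketch (the project itself used SQL with R in R Markdown, so this is not the original code). It builds a toy in-memory SQLite database whose table and column names follow the public Chinook schema; the rows are invented for the example, and the exact query used in the project is an assumption.

```python
import sqlite3

# Toy in-memory stand-in for a few Chinook tables. The rows below are
# invented for illustration and are not taken from the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Genre (GenreId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, GenreId INTEGER);
CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, BillingCountry TEXT);
CREATE TABLE InvoiceLine (InvoiceLineId INTEGER PRIMARY KEY,
                          InvoiceId INTEGER, TrackId INTEGER, Quantity INTEGER);
INSERT INTO Genre VALUES (1, 'Rock'), (2, 'Jazz');
INSERT INTO Track VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO Invoice VALUES (1, 'USA'), (2, 'USA'), (3, 'Canada');
INSERT INTO InvoiceLine VALUES (1, 1, 1, 1), (2, 1, 3, 1),
                               (3, 2, 2, 1), (4, 3, 3, 1);
""")

# Count tracks sold per genre, restricted to invoices billed in the USA.
query = """
SELECT g.Name AS genre, SUM(il.Quantity) AS tracks_sold
FROM InvoiceLine AS il
JOIN Invoice AS i ON i.InvoiceId = il.InvoiceId
JOIN Track AS t ON t.TrackId = il.TrackId
JOIN Genre AS g ON g.GenreId = t.GenreId
WHERE i.BillingCountry = 'USA'
GROUP BY g.Name
ORDER BY tracks_sold DESC;
"""
genre_sales = conn.execute(query).fetchall()
conn.close()

for genre, tracks_sold in genre_sales:
    print(genre, tracks_sold)
```

The same query text can be run unchanged from an R Markdown SQL chunk or through an R database connection, which is closer to how the project was actually done.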
DFND Tableau project can be found here. I used the data of US Census 2015 to visualise regional differences in the United States in terms of population, income and other aspects and presented them as a Tableau story.
DAND Syllabus | DAND Certificate
dand-sql - the project required using SQL to export data for chosen cities from the average temperatures database in the student workspace as .csv files, and describing the trends. I recreated the database locally for the selected data to conduct EDA in R and produce the report using R Markdown.
dand-intro-to-python - the project required investigating US Bikeshare data using Python basics: built-in data structures, functions, libraries, etc. I explored the data in terms of time patterns, location differences and client status.
dand-data-investigation - the project required applying the methods of exploratory data analysis to chosen data from Gapminder.org, using the Python libraries pandas, numpy and matplotlib. I chose maternal mortality data for the data investigation project because of my previous experience in demographic studies. I collected Gapminder data on maternal mortality and several related topics, then wrangled and analysed them in Python using pandas, NumPy and matplotlib. I examined and visualised global tendencies in maternal mortality in 1980-2013 and identified regional and economic patterns as well as a number of possible influencing factors.
dand-practical-stats - the program included two projects which required applying statistical methods for hypothesis testing. The project structure was pre-set in both cases.
For the first project I performed A/B tests to compare conversion from the old and new versions of a web page. I also applied logistic regression to estimate the conversion rate depending on the page version and user location.
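One common way to run such an A/B comparison is a two-proportion z-test. The sketch below is an illustration under stated assumptions, not the project's actual code: the function name `two_proportion_z_test` is hypothetical, the counts are made up, and the one-sided alternative (new page converts better) is an assumption.

```python
import math


def two_proportion_z_test(conv_old, n_old, conv_new, n_new):
    """One-sided z-test of H1: new-page conversion rate > old-page rate."""
    p_old = conv_old / n_old
    p_new = conv_new / n_new
    # Pooled conversion rate under the null hypothesis of equal rates.
    p_pool = (conv_old + conv_new) / (n_old + n_new)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_old + 1 / n_new))
    z = (p_new - p_old) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Made-up counts, for illustration only.
z, p_value = two_proportion_z_test(conv_old=120, n_old=1000,
                                   conv_new=150, n_new=1000)
print(f"z = {z:.3f}, one-sided p-value = {p_value:.4f}")
```

A small p-value would suggest the new page converts better; with real data the same comparison can be cross-checked against the page-version coefficient of a logistic regression.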
For the second project I conducted statistical tests to draw conclusions about the results of an experiment that explored the perceptual phenomenon called the Stroop effect. It describes the delay in reaction when reading words whose ink colors don't match their meaning (e.g. "red" written in blue). The tests showed the difference in reading time for such words to be statistically significant.
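Since each participant in a Stroop experiment is typically timed under both conditions, a paired t-test is one natural choice here. The text above does not name the exact test, so treat this as an assumption. A minimal sketch, with a hypothetical helper `paired_t_statistic` and invented reading times in seconds:

```python
import math
import statistics


def paired_t_statistic(congruent, incongruent):
    """Return the paired t statistic and degrees of freedom."""
    if len(congruent) != len(incongruent):
        raise ValueError("both samples must have the same length")
    # Per-participant difference: incongruent time minus congruent time.
    diffs = [b - a for a, b in zip(congruent, incongruent)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Invented reading times, for illustration only.
congruent = [12.1, 16.8, 9.6, 14.2, 11.3]
incongruent = [19.3, 21.0, 17.0, 24.9, 15.6]
t_stat, dof = paired_t_statistic(congruent, incongruent)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

The statistic is then compared with the critical value of the t distribution for the chosen significance level (about 2.776 for a two-sided test at 0.05 with 4 degrees of freedom).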
dand-EDA-in-R - the project required performing exploratory data analysis on a chosen data set, using R and R Markdown. I did data cleaning in R and explored Prosper's loans data in different dimensions with univariate, bivariate and multivariate plots created with the ggplot2 library. I identified three stages in the company's performance, each with specific features, examined differences between loans based on loan terms, and explored borrowers' characteristics and the number of investors for different loan amounts.
dand-data-wrangling - the project required applying the methods of data gathering, assessing and cleaning to the data from the @dog_rates Twitter account, using Python, and performing exploratory data analysis on the cleaned data. For this project I gathered data programmatically from various sources, including the Twitter API, assessed the data to prepare a list of cleaning steps and performed the cleaning. After that I explored WeRateDogs tweets for popularity in terms of likes and retweets, dog ratings and dog stages. If you'd like to have a short break and some fun right now, check the project report for the most popular tweet of WeRateDogs. Don't forget to come back!
DAND Tableau project can be found here. It is based on the exploratory analysis of Prosper's loans data, which was the project for the EDA in R course (see above). The Tableau project write-up, which explains the design choices and also contains an EDA summary, is available here.