Easy to make read_csv typo results in different data #553

Closed
ebolyen opened this issue Jul 24, 2019 · 4 comments

@ebolyen

ebolyen commented Jul 24, 2019

This is really just a note for the future.

At STAMPS we were teaching section 3 of this lesson, and some people were getting a different number of rows after this command.

This is the result of a simple typo: using read.csv instead of read_csv produces a data frame with a different number of rows.
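
A minimal sketch of how the difference can arise, assuming the command in question drops rows with missing values (the issue does not name it). The toy data and the helper `keep()` are hypothetical and stand in for the lesson's dataset and command. Base R's `read.csv` keeps an empty field in a character column as `""`, while `readr::read_csv` treats it as `NA` by default:

```r
# Toy data (hypothetical, not the Portal dataset): row 2 has an empty sex field.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,sex,weight", "1,M,40", "2,,48", "3,F,NA", "4,M,35"), tmp)

base_df  <- read.csv(tmp, stringsAsFactors = FALSE)
readr_df <- suppressMessages(readr::read_csv(tmp))

# Keep rows with no missing sex or weight (assumed stand-in for the lesson's command).
keep <- function(df) df[!is.na(df$sex) & !is.na(df$weight), ]

nrow(keep(base_df))   # 3: the empty sex is "", which is not NA
nrow(keep(readr_df))  # 2: the empty sex was read as NA and is dropped
```

The same filter therefore returns a different number of rows depending only on which reader was typed.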

@ryanpeek
Contributor

Perhaps we could create a dataset that gives the same result whether we use read.csv or read_csv, particularly because it is a fairly easy fix and the discrepancy can really sidetrack instructors/students from getting at the meat of the lesson. I'll try to work on a PR that does this.

@tracykteal
Contributor

tracykteal commented Jul 24, 2019

Thanks @ryanpeek. If we could fix those entries in the dataset, that would be good and help with that confusion for instructors and students. It would be good to keep the overall number of rows the same because those are referenced in other parts of the lesson. We would need to update the dataset on Figshare, but we can do that.

@ryanpeek
Contributor

Hi folks (@tracykteal @fmichonneau),
Attached is the "revised" portal dataset. I've vetted it with both of the data lessons (02-starting-with-data.Rmd & 03-dplyr.Rmd). I replaced the blanks in the surveys$sex column with "blank", which then explicitly reads in as a character or a factor using either read_csv or read.csv. All rows/cols remain the same, and the section on factors with the barplots still works, since it's indexed by number and not by name.

The only thing that needs to be updated is the barplot figure in the "starting with data lesson". I'm not sure how the overall R-ecology site is built/knitted, but if someone reknits the Rmd it should take care of itself (no code changes needed).

The code I used to make this change and write it out is below, let me know if you all need anything else! :)

# Download the original dataset (assumes the data/ directory already exists)
download.file(url = "https://ndownloader.figshare.com/files/2292169",
              destfile = "data/portal_data_joined.csv")
surveys.csv <- read.csv("data/portal_data_joined.csv")

# Replace the empty strings ("") in the sex column with an explicit "blank" value.
# This still permits using barplot in the starting with data lesson.
library(dplyr)
surveys.csv.revised <- surveys.csv %>%
  mutate(
    sex = as.character(sex),
    sex = case_when(
      grepl("^$", sex) ~ "blank",
      TRUE ~ sex
    )
  )

str(surveys.csv.revised$sex)
summary(as.factor(surveys.csv.revised$sex))

write.csv(surveys.csv.revised, "data/portal_data_joined_revised.csv", row.names = FALSE, na = "")
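
One way to check that an explicit "blank" value reads identically with both functions is sketched below. It uses hypothetical toy data rather than the attached file, and the variable names are illustrative only:

```r
# Toy data (hypothetical): "blank" stands in for what used to be an empty field.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,sex", "1,M", "2,blank", "3,F", "4,blank"), tmp)

base_sex  <- read.csv(tmp, stringsAsFactors = FALSE)$sex
readr_sex <- suppressMessages(readr::read_csv(tmp))$sex

# Neither reader should produce missing values, and the columns should match.
anyNA(base_sex)
anyNA(readr_sex)
identical(as.character(base_sex), as.character(readr_sex))
table(base_sex)
```

Running the same comparison against the revised CSV would confirm that the row counts no longer depend on the reader.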

portal_data_joined_revised.csv.txt

@fmichonneau
Member

addressed with #663
