Easy to make read_csv typo results in different data #553

ebolyen · 2019-07-24T14:17:11Z

This is really just a note for the future.

At STAMPS we were teaching section 3 of this lesson, and some people were getting a different number of rows after this command.

This comes from a simple typo: using read.csv instead of read_csv produces a data frame with a different number of rows.
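One plausible mechanism, consistent with the blank `sex` entries discussed later in this thread, is that the two readers treat empty text fields differently: base `read.csv` keeps them as the empty string `""`, while readr's `read_csv` defaults to `na = c("", "NA")` and turns them into `NA`. The sketch below is only an illustration under that assumption. The three-row CSV and all variable names are hypothetical, and the readr part runs only if the package is installed.

```r
# Hypothetical three-row stand-in for the surveys data; row 2 has a blank `sex`.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("record_id,sex,weight", "1,M,40", "2,,48", "3,F,35"), csv_path)

# Base R keeps a blank text field as the empty string "", so nothing is NA.
base_df <- read.csv(csv_path, stringsAsFactors = FALSE)
base_kept <- base_df[!is.na(base_df$sex), ]
nrow(base_kept)  # 3

# Treating "" as missing mimics readr's default `na = c("", "NA")`.
na_df <- read.csv(csv_path, stringsAsFactors = FALSE, na.strings = c("", "NA"))
na_kept <- na_df[!is.na(na_df$sex), ]
nrow(na_kept)  # 2

# Optional: the same comparison with readr itself, if it is available.
if (requireNamespace("readr", quietly = TRUE)) {
  readr_df <- readr::read_csv(csv_path, col_types = readr::cols())
  print(sum(is.na(readr_df$sex)))  # number of blank `sex` fields read as NA
}
```

Any step that drops or counts missing `sex` values would therefore give different row counts depending on which function was typed.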


ryanpeek · 2019-07-24T14:43:02Z

Perhaps we could create a dataset where whether we use read.csv or read_csv we get the same solution...particularly because it is a fairly easy fix and it can really sidetrack instructors/students from getting at the meat of the lesson. I'll try to work on a PR that does this.

tracykteal · 2019-07-24T16:36:26Z

Thanks @ryanpeek. If we could fix those entries in the dataset, that would be good and help with that confusion for instructors and students. It would be good to keep the overall number of rows the same because those are referenced in other parts of the lesson. We would need to update the dataset on Figshare, but we can do that.

ryanpeek · 2019-07-24T20:08:26Z

Hi folks (@tracykteal @fmichonneau),
Attached is the "revised" portals dataset. I've vetted it with both data lessons (02-starting-with-data.Rmd & 03-dplyr.Rmd). I replaced the blanks in the surveys$sex column with "blank", which then explicitly reads in as a character or a factor using either read_csv or read.csv. All rows/cols remain the same, and the section on factors with the barplots still works, since it's indexed by number and not by name.

The only thing that needs to be updated is the barplot figure in the "starting with data lesson". I'm not sure how the overall R-ecology site is built/knitted, but if someone reknits the Rmd it should take care of itself (no code changes needed).

The code I used to make this change and write it out is below, let me know if you all need anything else! :)

download.file(url = "https://ndownloader.figshare.com/files/2292169",
              destfile = "data/portal_data_joined.csv")
surveys.csv <- read.csv("data/portal_data_joined.csv")

# Change the "" in the sex to a character level called "blank". This would still permit using barplot in the starting with data lesson
library(dplyr)
surveys.csv.revised <- surveys.csv %>%
  mutate(
    sex = as.character(sex),
    sex = case_when(
      grepl("^$", sex) ~ "blank",
      TRUE ~ sex
    )
  )

str(surveys.csv.revised$sex)
summary(as.factor(surveys.csv.revised$sex))

write.csv(surveys.csv.revised, "data/portal_data_joined_revised.csv", row.names = FALSE, na = "")

portal_data_joined_revised.csv.txt
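As a rough sanity check of the idea behind the revised file, the sketch below uses a hypothetical three-row CSV (not the real Portal data) in which the former blank is already recoded as "blank". It reads the file once with base defaults and once with `""` treated as missing, which is assumed here to stand in for readr's default `na = c("", "NA")`, and confirms both reads agree.

```r
# Hypothetical toy data after recoding: the former blank `sex` is now "blank".
revised_path <- tempfile(fileext = ".csv")
writeLines(
  c("record_id,sex,weight", "1,M,40", "2,blank,48", "3,F,35"),
  revised_path
)

# Base R defaults: empty text fields would stay "", but none remain.
base_rev <- read.csv(revised_path, stringsAsFactors = FALSE)

# Empty strings treated as NA, mimicking readr's default handling.
na_rev <- read.csv(revised_path, stringsAsFactors = FALSE,
                   na.strings = c("", "NA"))

# Neither read produces missing values in `sex`, so the data frames match.
sum(is.na(na_rev$sex))       # 0
identical(base_rev, na_rev)  # TRUE
table(base_rev$sex)          # one row each for "blank", "F", and "M"
```

With an explicit "blank" level, filtering or counting on `sex` gives the same number of rows whichever reader is used.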

fmichonneau · 2020-10-20T09:45:26Z

addressed with #663

Teebusch mentioned this issue Oct 2, 2020

Starting with data episode generates a few issues in R 4.0.0 #609

Closed

Teebusch mentioned this issue Oct 15, 2020

Use readr::read_csv() throughout the lesson #663

Merged

fmichonneau closed this as completed Oct 20, 2020

vanilink mentioned this issue May 9, 2021

read_csv vs read.csv #710

Closed
