Starting with data episode generates a few issues in R 4.0.0 #609

Closed
zenrabbit opened this issue Apr 29, 2020 · 6 comments

zenrabbit commented Apr 29, 2020

R version 4.0.0 (2020-04-24)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.4
Starting with data episode

  1. In the first challenge, 'Based on the output of str(surveys), can you answer the following questions?', there are 40 species, not 48 as stated in the answer.

  2. In the Renaming factors subsection, plot(surveys$sex) returns an error and warning messages because, in R 4.0.0, read.csv reads the sex column as character rather than factor. Subsequent code such as levels(sex) returns NULL, so the following fix would also make a good revision exercise:
    sex <- factor(surveys$sex)
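
The behaviour described in item 2 can be reproduced without the lesson data. The sketch below is only an illustration: `csv_text` and its values are made up rather than taken from the surveys file, and the commented results for the first read assume R >= 4.0.0, where `read.csv()` defaults to `stringsAsFactors = FALSE`.

```r
# Hypothetical three-row CSV standing in for the surveys data.
csv_text <- "record_id,sex\n1,M\n2,F\n3,M"

# In R >= 4.0.0 the default is stringsAsFactors = FALSE,
# so sex is read as a character vector.
surveys_chr <- read.csv(text = csv_text)
class(surveys_chr$sex)    # "character" in R >= 4.0.0
levels(surveys_chr$sex)   # NULL, because a character vector has no levels

# Converting explicitly gives a factor regardless of the R version.
sex <- factor(surveys_chr$sex)
levels(sex)               # "F" "M"

# Alternatively, request factors when reading the file.
surveys_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(surveys_fct$sex)    # "factor"
```

Either route restores the factor that the episode's later code expects. The explicit `factor()` call has the advantage of showing learners where the conversion happens.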

@zenrabbit changed the title from "Starting with data episode generates a few errors in R 4.0.0" to "Starting with data episode generates a few issues in R 4.0.0" on Apr 29, 2020

justinshaffer commented Jun 30, 2020

Hello!

Regarding (1): when categorical data are read in as characters rather than factors, the third question in the first challenge cannot be answered from str(surveys), because the output does not report the number of levels for surveys$species. The challenge could be reworked to account for this, or the difference between character and factor could be explored elsewhere, as you suggest in (2).

Also regarding (1), if you run the following code you will see that indeed there are 48 species:

surveys$species_factor <- as.factor(surveys$species)
str(surveys)

Thoughts?
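
To make the point about str() concrete, here is a minimal sketch on a toy data frame. The `df` object and its species values are hypothetical, chosen only to show how str() describes a character column compared with a factor column.

```r
# Hypothetical species values, made up for illustration.
df <- data.frame(
  species = c("albigula", "merriami", "albigula", "sp."),
  stringsAsFactors = FALSE
)

str(df)   # species is listed as chr, with no level count

# A factor copy makes str() report the number of levels.
df$species_factor <- as.factor(df$species)
str(df)                      # species_factor: Factor w/ 3 levels ...
nlevels(df$species_factor)   # 3
```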

jebyrnes commented Jul 9, 2020

Hrm....

surveys$species_factor <- as.factor(surveys$species)
str(surveys)

....

$ species_factor : Factor w/ 40 levels "albigula","audubonii",..: 1 1 1 1 1 1 1 1 1 1 ...

Also

> length(unique(surveys$species))
[1] 40

I think the issue is, there are 48 if you combine genus and species

> length(unique(paste(surveys$genus, surveys$species)))
[1] 48

jebyrnes commented Jul 9, 2020

OK, I'm an idiot, and I apologize - the species_id column has 48 unique entries - perhaps we can clarify which column they need to look at. HA!
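
The three counts discussed above differ because the same species name can occur under more than one genus. A minimal sketch of that effect, assuming a toy data frame (`toy` and every value in it are made up and are not rows from the surveys data):

```r
# Hypothetical rows: two genera share the placeholder species name "sp.",
# but each genus/species pair has its own species_id.
toy <- data.frame(
  species_id = c("AS", "BS", "GM", "GM"),
  genus      = c("Alpha", "Beta", "Gamma", "Gamma"),
  species    = c("sp.", "sp.", "merriami", "merriami"),
  stringsAsFactors = FALSE
)

length(unique(toy$species))                    # 2
length(unique(paste(toy$genus, toy$species)))  # 3
length(unique(toy$species_id))                 # 3
```

Counting on species alone merges the two "sp." rows, while species_id and the genus/species pair both keep them apart.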

Teebusch commented Oct 2, 2020

1 - Indeed, the species column has multiple entries that are "sp.". For non-ecologists, the hierarchy of genus and species may not be obvious. I think this issue can be solved easily by adding a short note to the challenge (e.g., "hint: use the species_id column to find all unique species").

2 - The issue with stringsAsFactors in R >= 4.0.0 is larger. One temporary and not too confusing workaround would be to first show table(surveys$sex) and then barplot(table(surveys$sex)). This works regardless of whether the column is a factor or a character vector, and in that sense it might be the most "future proof".
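
A minimal sketch of that workaround, assuming made-up values rather than the real surveys$sex column (`sex_chr` and `sex_fct` are hypothetical names). It shows table() and barplot() giving the same result for a character vector and for a factor:

```r
# Hypothetical values, made up for illustration.
sex_chr <- c("M", "F", "M", "", "F", "M")
sex_fct <- factor(sex_chr)

# table() gives the same counts for both representations.
counts_chr <- table(sex_chr)
counts_fct <- table(sex_fct)
counts_chr

# barplot() accepts the table directly. pdf(NULL) opens a device that
# writes no file, so the sketch also runs in a non-interactive session.
pdf(NULL)
barplot(counts_chr)
barplot(counts_fct)
dev.off()
```

In an interactive lesson the pdf(NULL) and dev.off() lines can be dropped so that the plots appear in the usual plotting window.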

Teebusch commented Oct 2, 2020

Related to Issues #471 #608 #553 #555 #559 #616

@fmichonneau

Addressed with #663.
