Merge pull request #139 from ErinBecker/gh-pages
Fix exercise formatting
tracykteal authored Apr 28, 2017
2 parents d4c6dbb + 1850798 commit 2e6f521
Showing 3 changed files with 23 additions and 4 deletions.
10 changes: 9 additions & 1 deletion _episodes/01-working-with-openrefine.md
@@ -65,7 +65,9 @@ along with a number representing how many times that value occurs in the column.
4. Try sorting this facet by name and by count. Do you notice any problems with the data? What are they?
5. Hover the mouse over one of the names in the `Facet` list. You should see that you have an `edit` function available.
6. You could use this to fix an error immediately, and OpenRefine will ask whether you want to make the same correction to every value it finds like that one. But OpenRefine offers even better ways to find and fix these errors, which we'll use instead. We'll learn about these when we talk about clustering.

> ## Solution
>
> There will be several near-identical entries in `scientificName`. For example, there is one entry for `Ammospermophilis harrisi` and
> one entry for `Ammospermophilus harrisii`. These are both misspellings of `Ammospermophilus harrisi`. We will see how to correct these
> misspelled and mistyped entries in a later exercise.
@@ -78,14 +80,17 @@ along with a number representing how many times that value occurs in the column.
> 2. Is the column formatted as Number, Date, or Text? How does changing the format change the faceting display?
>
> 3. Which years have the most and least observations?
>
> > ## Solution
> >
> > 1. For the column `yr` do `Facet` > `Text facet`. A box will appear in the left panel showing that there are 26 unique entries in
> > this column.
> > 2. By default, the column `yr` is formatted as Text. You can change the format by doing `Edit cells` > `Common transforms` >
> > `To number`. Doing `Facet` > `Numeric facet` creates a box in the left panel that shows a histogram of the number of
> > entries per year. Notice that the data is shown as a number, not a date. If you instead transform the column to a date, the
> > program will assume all entries are on January 1st of the year.
> > 3. After creating a facet, click `Sort by count` in the facet box. The year with the most observations is 1997. The least is 1977.
> >
> {: .solution}
{: .challenge}
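
As a rough analogy outside OpenRefine, a text facet amounts to counting how often each distinct value occurs in a column, and `Sort by count` orders those counts. The sketch below uses Python's standard library; `facet_counts` is a hypothetical helper name, and the `yr` values are made up for illustration rather than taken from the lesson's dataset.

```python
from collections import Counter


def facet_counts(values):
    """Return (value, count) pairs, most frequent first, like 'Sort by count'."""
    counts = Counter(values)
    # Sort by descending count, then by value so that ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Illustrative values only; these are not the lesson's data.
yr_column = ["1997", "1977", "1997", "1990", "1997", "1990"]

facet = facet_counts(yr_column)
most_common_year = facet[0][0]    # first entry has the highest count
least_common_year = facet[-1][0]  # last entry has the lowest count
```

The values are kept as text here, mirroring the default Text format of the `yr` column described in the solution.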

@@ -120,6 +125,7 @@ If data in a column needs to be split into multiple columns, and the parts are s
5. Click `OK`. You'll get some new columns called `scientificName 1`, `scientificName 2`, and so on.
6. Notice that in some cases `scientificName 1` and `scientificName 2` are empty. Why is this? What do you think we
can do to fix this?

> ## Solution
>
> The entries that have data in `scientificName 3` and `scientificName 4` but not the first two `scientificName` columns
@@ -131,6 +137,7 @@ can do to fix this?
> ## Exercise
>
> Try to change the name of the second new column to "species". How can you correct the problem you encounter?
>
> > ## Solution
> >
> > On the `scientificName 2` column, click the down arrow and then `Edit column` > `Rename this column`. Type "species" into the box
@@ -160,6 +167,7 @@ Words with spaces at the beginning or end are particularly hard for we humans to
1. In the header for the column `scientificName`, choose `Edit cells` > `Common transforms` > `Trim leading and trailing whitespace`.
2. Notice that the `Split` step has now disappeared from the `Undo / Redo` pane on the left and is replaced with a `Text transform on 3 cells` step.
3. Perform the same `Split` operation on `scientificName` that you undid earlier. This time you should only get two new columns. Why?

> ## Solution
>
> Removing the leading white spaces means that each entry in this column has exactly one space (between the genus and species names).
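
The effect of leading whitespace on a split can be sketched outside OpenRefine with plain Python string methods. This is only an analogy, not how OpenRefine implements `Split`; `split_name` is a hypothetical helper, and the two leading spaces are an assumed example of a mistyped entry.

```python
def split_name(name, separator=" "):
    """Split a cell on every occurrence of the separator, keeping empty parts."""
    return name.split(separator)


# Assumed example: a name that was typed with two leading spaces.
untrimmed = "  " + "Baiomys taylori"

# Each leading space produces an empty part, so the name lands in parts 3 and 4.
parts_before = split_name(untrimmed)

# After trimming, only the space between genus and species remains.
parts_after = split_name(untrimmed.strip())
```

With the whitespace trimmed, every entry yields exactly two parts, which corresponds to getting only two new columns.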
12 changes: 10 additions & 2 deletions _episodes/02-filter-exclude-sort.md
@@ -27,12 +27,14 @@ There are many entries in our data table. We can filter it to work on a subset o
>
> 1. What scientific names (genus and species) are selected by this procedure?
> 2. How would you restrict this to one of the species selected?
>
> > ## Solution
> > 1. Do `Facet` > `Text facet` on the `scientificName` column after filtering. This will show that
> > two names match your filter criteria. They are `Baiomys taylori` and `Chaetodipus baileyi`.
> > 2. To restrict to only one of these two species, you could make the search case sensitive or
> > you could split the `scientificName` column into species and genus before filtering or
> > you could include more letters in your filter.
> >
> {: .solution}
{: .challenge}
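
The options in the solution above (case sensitivity, more letters) can be illustrated with a small substring filter in Python. This is a sketch under stated assumptions: `text_filter` is a hypothetical helper, the query strings `bai`, `Bai`, and `baio` are assumed for illustration, and `Dipodomys merriami` is added only as a non-matching example.

```python
def text_filter(values, query, case_sensitive=False):
    """Keep the values that contain the query as a substring."""
    if case_sensitive:
        return [v for v in values if query in v]
    lowered = query.lower()
    return [v for v in values if lowered in v.lower()]


# Two names from the solution plus one illustrative non-match.
names = ["Baiomys taylori", "Chaetodipus baileyi", "Dipodomys merriami"]

both = text_filter(names, "bai")                         # case-insensitive
only_capital = text_filter(names, "Bai", case_sensitive=True)
more_letters = text_filter(names, "baio")                # longer query
```

A case-insensitive query matches both names, while a case-sensitive or longer query narrows the selection to one.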

@@ -56,7 +58,8 @@ is currently selected, while filtering allows you to select a subset of your dat
> > 2. Click `include`. This will explicitly include this species, and exclude others that are not explicitly included. Notice that the
> option now changes to `exclude`.
> > 3. Click `include` and `exclude` on the other species (`Chaetodipus baileyi`) and notice how the two entries appear and disappear
> > from the table.
> >
> {: .solution}
{: .challenge}

@@ -82,9 +85,12 @@ If you try to re-sort a column that you have already used, the drop-down menu ch
* > `Sort` > `Remove sort` - This option allows you to undo your sort.
> ## Exercise
>
> Sort the data by `plot`. In what year(s) were observations recorded for plot 1 in this filtered dataset?
>
> > ## Solution
> > In the `plot` column, select `Sort...` > `numbers` and select `smallest first`. The years represented are 1990 and 1995.
> >
> {: .solution}
{: .challenge}

@@ -98,14 +104,16 @@ You can sort by multiple columns by performing sort on additional columns. The s
> You might like to look for trends in your data by month of collection across years.
> 1. How do you sort your data by month?
> 2. How would you do this differently if you were instead trying to see all of your entries in chronological order?
>
> > ## Solution
> >
> > 1. For the `mo` column, click on `Sort...` and then `numbers`. This will group all entries made in, for example, January,
> > together, regardless of the year that entry was collected.
> > 2. For the `yr` column, click on `Sort` > `Sort...` > `numbers` and select `sort by this column alone`. This will undo the
> > sorting by month step. Once you've sorted by `yr` you can then apply another sorting step to sort by month within year. To do this
> > for the `mo` column, click on `Sort` > `numbers` but do not select `sort by this column alone`. To ensure that all entries are shown
> > chronologically, you will need to add a third sorting step by day within month.
> >
> {: .solution}
{: .challenge}
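
The difference between sorting by month alone and sorting chronologically can be sketched in Python with a tuple sort key. The records are made up, and the day field name `dy` is an assumption, since the text above only names the `yr` and `mo` columns.

```python
# Illustrative records only; `dy` is an assumed name for the day column.
records = [
    {"yr": 1995, "mo": 3, "dy": 12},
    {"yr": 1990, "mo": 7, "dy": 4},
    {"yr": 1995, "mo": 1, "dy": 30},
    {"yr": 1990, "mo": 3, "dy": 21},
]

# Sorting by month alone groups entries from the same month across years.
by_month = sorted(records, key=lambda r: r["mo"])

# Sorting by year, then month within year, then day within month
# puts the entries in chronological order.
chronological = sorted(records, key=lambda r: (r["yr"], r["mo"], r["dy"]))

months_in_order = [r["mo"] for r in by_month]
dates_in_order = [(r["yr"], r["mo"], r["dy"]) for r in chronological]
```

The tuple key plays the same role as adding sort steps on further columns without selecting `sort by this column alone`.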

5 changes: 4 additions & 1 deletion _episodes/03-numbers.md
@@ -26,10 +26,13 @@ To transform cells in the `recordID` column to numbers, click the down arrow for
> ## Exercise
>
> Transform three more columns, including `period`, from text to numbers. Can all columns be transformed to numbers?
>
> > ## Solution
> >
> > Only cells that contain nothing but numerals (0-9) can be transformed to numbers. If you apply a number transformation to
> > a column that doesn't meet this criterion, and then click the `Undo / Redo` tab, you will see a step that starts with
> > `Text transform on 0 cells`. This means that the data in that column was not transformed.
> >
> {: .solution}
{: .challenge}
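
The idea that non-numeric cells are left untouched can be sketched in Python. This mimics the behaviour described in the solution and is not OpenRefine's actual implementation; `to_number` and `transform_column` are hypothetical helpers, and the cell values are illustrative.

```python
def to_number(cell):
    """Convert a text cell to a number if possible; otherwise return it unchanged."""
    try:
        return int(cell)
    except ValueError:
        try:
            return float(cell)
        except ValueError:
            return cell


def transform_column(cells):
    """Return the converted cells and how many of them actually changed."""
    converted = [to_number(cell) for cell in cells]
    # A cell counts as changed when it is no longer a text value.
    changed = sum(1 for new in converted if not isinstance(new, str))
    return converted, changed


numeric_cells, numeric_changed = transform_column(["1", "2", "3"])
text_cells, text_changed = transform_column(["Baiomys taylori", "Chaetodipus baileyi"])
```

A change count of zero for the text column is the analogue of a `Text transform on 0 cells` step.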

@@ -59,7 +62,7 @@ Now that we have multiple columns representing numbers, we can see how they rela

## Examine pair of columns in detail

We can examine one pair of columns by clicking on its square in the `Scatterplot Matrix`. A new facet with only that pair will appear in the left margin.

> ## Exercise
>
