Commit

data set -> dataset, closes #75
mine-cetinkaya-rundel committed May 28, 2024
1 parent 9f0ce66 commit d2a9b72
Showing 123 changed files with 190 additions and 185 deletions.
6 changes: 3 additions & 3 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
Package: openintro
Title: Data Sets and Supplemental Functions from 'OpenIntro' Textbooks and Labs
Version: 2.4.0
Title: Datasets and Supplemental Functions from 'OpenIntro' Textbooks and Labs
Version: 2.5.0
Authors@R: c(
person("Mine", "\u00C7etinkaya-Rundel", email = "cetinkaya.mine@gmail.com", role = c("aut", "cre"), comment = c(ORCID = "0000-0001-6452-2420")),
person("David", "Diez", email = "david.m.diez@gmail.com", role = c("aut")),
@@ -13,7 +13,7 @@ Authors@R: c(
)
Description: Supplemental functions and data for 'OpenIntro' resources, which
includes open-source textbooks and resources for introductory statistics
(<https://www.openintro.org/>). The package contains data sets used in our
(<https://www.openintro.org/>). The package contains datasets used in our
open-source textbooks along with custom plotting functions for reproducing
book figures. Note that many functions and examples include color
transparency; some plotting elements may not show up properly (or at all)
6 changes: 3 additions & 3 deletions R/buildAxis.R
@@ -4,8 +4,8 @@
#' of labels on the axis. This function is still under development.
#'
#' The primary reason behind building this function was to allow a plot to be
#' created with similar features but with different data sets. For instance, if
#' a set of code was written for one data set and the function \code{axis} had
#' created with similar features but with different datasets. For instance, if
#' a set of code was written for one dataset and the function \code{axis} had
#' been utilized with pre-specified values, the axis may not match the plot of
#' a new set of data. The function \code{buildAxis} addresses this problem by
#' allowing the number of axis labels to be specified and controlled.
@@ -15,7 +15,7 @@
#' with the best score.
#'
#' @param side The side of the plot where to add the axis.
#' @param limits Either lower and upper limits on the axis or a data set.
#' @param limits Either lower and upper limits on the axis or a dataset.
#' @param n The preferred number of axis labels.
#' @param nMin The minimum number of axis labels.
#' @param nMax The maximum number of axis labels.
2 changes: 1 addition & 1 deletion R/data-absenteeism.R
@@ -20,7 +20,7 @@
#' @source Venables WN, Ripley BD. 2002. Modern Applied Statistics with S.
#' Fourth Edition. New York: Springer.
#'
#' Data can also be found in the R `MASS` package under the data set name
#' Data can also be found in the R `MASS` package under the dataset name
#' `quine`.
#' @keywords datasets
#' @examples
2 changes: 1 addition & 1 deletion R/data-ami_occurrences.R
@@ -1,6 +1,6 @@
#' Acute Myocardial Infarction (Heart Attack) Events
#'
#' This data set is simulated but contains realistic occurrences of AMI in NY
#' This dataset is simulated but contains realistic occurrences of AMI in NY
#' City.
#'
#'
2 changes: 1 addition & 1 deletion R/data-arbuthnot.R
@@ -15,7 +15,7 @@
#' \item{boys}{number of male christenings (births)}
#' \item{girls}{number of female christenings (births)}
#' }
#' @source These data are excerpted from the `Arbuthnot` data set in the
#' @source These data are excerpted from the `Arbuthnot` dataset in the
#' [HistData](https://CRAN.R-project.org/package=HistData) package.
#' @examples
#'
2 changes: 1 addition & 1 deletion R/data-association.R
@@ -1,6 +1,6 @@
#' Simulated data for association plots
#'
#' Simulated data set.
#' Simulated dataset.
#'
#'
#' @name association
2 changes: 1 addition & 1 deletion R/data-ball_bearing.R
@@ -1,6 +1,6 @@
#' Lifespan of ball bearings
#'
#' A simulated data set on lifespan of ball bearings.
#' A simulated dataset on lifespan of ball bearings.
#'
#'
#' @name ball_bearing
6 changes: 3 additions & 3 deletions R/data-births14.R
@@ -1,10 +1,10 @@
#' US births
#'
#' Every year, the US releases to the public a large data set containing
#' information on births recorded in the country. This data set has been of
#' Every year, the US releases to the public a large dataset containing
#' information on births recorded in the country. This dataset has been of
#' interest to medical researchers who are studying the relation between habits
#' and practices of expectant mothers and the birth of their children. This is a
#' random sample of 1,000 cases from the data set released in 2014.
#' random sample of 1,000 cases from the dataset released in 2014.
#'
#' @source United States Department of Health and Human Services.
#' Centers for Disease Control and Prevention.
2 changes: 1 addition & 1 deletion R/data-books.R
@@ -1,6 +1,6 @@
#' Sample of books on a shelf
#'
#' Simulated data set.
#' Simulated dataset.
#'
#'
#' @name books
2 changes: 1 addition & 1 deletion R/data-cars93.R
@@ -1,7 +1,7 @@
#' cars93
#'
#' A data frame with 54 rows and 6 columns. This data is a subset of the
#' \code{Cars93} data set from the \code{MASS} package.
#' \code{Cars93} dataset from the \code{MASS} package.
#'
#' These cars represent a random sample for 1993 models that were in both
#' \emph{Consumer Reports} and \emph{PACE Buying Guide}. Only vehicles of type
6 changes: 3 additions & 3 deletions R/data-children_gender_stereo.R
@@ -2,7 +2,7 @@
#'
#' Stereotypes are common, but at what age do they start? This study
#' investigates stereotypes in young children aged 5-7 years old. There are
#' four studies reported in the paper, and all four data sets are provided here.
#' four studies reported in the paper, and all four datasets are provided here.
#'
#' The structure of the data object is a little unusual, so we recommend
#' reviewing the Examples section before starting your analysis.
@@ -22,7 +22,7 @@
#' that are among the following:
#' \describe{
#' \item{subject}{Subject ID. Note that Subject 1 in the first data frame
#' (data set) does \bold{not} correspond to Subject 1 in the second data frame.}
#' (dataset) does \bold{not} correspond to Subject 1 in the second data frame.}
#' \item{gender}{Gender of the subject.}
#' \item{age}{Age of the subject, in years.}
#' \item{trait}{The trait that the children were making a judgement about,
@@ -55,7 +55,7 @@
#' @keywords datasets
#' @examples
#'
#' # This data set is a little funny to work with.
#' # This dataset is a little funny to work with.
#' # If wanting to review the data for a study, we
#' # recommend first assigning the corresponding
#' # data frame to a new variable. For instance,
2 changes: 1 addition & 1 deletion R/data-climate70.R
@@ -28,7 +28,7 @@
#' # Data sampled are from the US, Europe, and Australia.
#' # This geographic limitation may be due to the particular
#' # years considered, since locations without both 1948 and
#' # 2018 were discarded for this (simple) data set.
#' # 2018 were discarded for this (simple) dataset.
#' plot(climate70$longitude, climate70$latitude)
#'
#' plot(climate70$dx70_1948, climate70$dx70_2018)
4 changes: 2 additions & 2 deletions R/data-corr_match.R
@@ -1,4 +1,4 @@
#' Sample data sets for correlation problems
#' Sample datasets for correlation problems
#'
#' Simulated data.
#'
@@ -18,7 +18,7 @@
#' \item{y7}{a numeric vector}
#' \item{y8}{a numeric vector}
#' }
#' @source Simulated data set.
#' @source Simulated dataset.
#' @keywords datasets
#' @examples
#'
2 changes: 1 addition & 1 deletion R/data-cpr.R
@@ -1,4 +1,4 @@
#' CPR data set
#' CPR dataset
#'
#' These patients were randomly divided into a treatment group where they
#' received a blood thinner or the control group where they did not receive a
2 changes: 1 addition & 1 deletion R/data-credits.R
@@ -1,6 +1,6 @@
#' College credits.
#'
#' A simulated data set of number of credits taken by college students each
#' A simulated dataset of number of credits taken by college students each
#' semester.
#'
#'
2 changes: 1 addition & 1 deletion R/data-drone_blades.R
@@ -1,6 +1,6 @@
#' Quadcopter Drone Blades
#'
#' Quality control data set for quadcopter drone blades, where this data has
#' Quality control dataset for quadcopter drone blades, where this data has
#' been made up for an example.
#'
#'
2 changes: 1 addition & 1 deletion R/data-email50.R
@@ -1,6 +1,6 @@
#' Sample of 50 emails
#'
#' This is a subsample of the \code{\link{email}} data set.
#' This is a subsample of the \code{\link{email}} dataset.
#'
#'
#' @name email50
2 changes: 1 addition & 1 deletion R/data-env_regulation.R
@@ -12,7 +12,7 @@
#'
#' The actual sample size was 1012. However, the original data were not from a
#' simple random sample; after accounting for the design, the equivalent sample
#' size was about 705, which was what was used for the data set here to keep
#' size was about 705, which was what was used for the dataset here to keep
#' things simpler for intro stat analyses.
#'
#' @name env_regulation
2 changes: 1 addition & 1 deletion R/data-esi.R
@@ -1,6 +1,6 @@
#' Environmental Sustainability Index 2005
#'
#' This data set comes from the 2005 Environmental Sustainability Index:
#' This dataset comes from the 2005 Environmental Sustainability Index:
#' Benchmarking National Environmental Stewardship. Countries are given an
#' overall sustainability score as well as scores in each of several different
#' environmental areas.
2 changes: 1 addition & 1 deletion R/data-family_college.R
@@ -1,6 +1,6 @@
#' Simulated sample of parent / teen college attendance
#'
#' A simulated data set based on real population summaries.
#' A simulated dataset based on real population summaries.
#'
#'
#' @name family_college
2 changes: 1 addition & 1 deletion R/data-friday.R
@@ -1,6 +1,6 @@
#' Friday the 13th
#'
#' This data set addresses issues of how superstitions regarding Friday the
#' This dataset addresses issues of how superstitions regarding Friday the
#' 13th affect human behavior, and whether Friday the 13th is an unlucky day.
#' Scanlon, et al. collected data on traffic and shopping patterns and accident
#' frequency for Fridays the 6th and 13th between October of 1989 and November
4 changes: 2 additions & 2 deletions R/data-gradestv.R
@@ -1,10 +1,10 @@
#' Simulated data for analyzing the relationship between watching TV and grades
#'
#' This is a simulated data set to be used to estimate the relationship between
#' This is a simulated dataset to be used to estimate the relationship between
#' number of hours per week students watch TV and the grade they got in a
#' statistics class.
#'
#' There are a few potential outliers in this data set. When analyzing the data
#' There are a few potential outliers in this dataset. When analyzing the data
#' one should consider how (if at all) these outliers may affect the estimates
#' of correlation coefficient and regression parameters.
#'
2 changes: 1 addition & 1 deletion R/data-housing.R
@@ -1,4 +1,4 @@
#' Simulated data set on student housing
#' Simulated dataset on student housing
#'
#' Each observation represents a simulated rent price for a student.
#'
2 changes: 1 addition & 1 deletion R/data-ipod.R
@@ -1,6 +1,6 @@
#' Length of songs on an iPod
#'
#' A simulated data set on lengths of songs on an iPod.
#' A simulated dataset on lengths of songs on an iPod.
#'
#'
#' @name ipod
4 changes: 2 additions & 2 deletions R/data-jury.R
@@ -1,6 +1,6 @@
#' Simulated juror data set
#' Simulated juror dataset
#'
#' Simulated data set of registered voters proportions and representation on
#' Simulated dataset of registered voters proportions and representation on
#' juries.
#'
#'
4 changes: 2 additions & 2 deletions R/data-loans_full_schema.R
@@ -1,13 +1,13 @@
#' Loan data from Lending Club
#'
#' This data set represents thousands of loans made through the Lending Club
#' This dataset represents thousands of loans made through the Lending Club
#' platform, which is a platform that allows individuals to lend to other
#' individuals. Of course, not all loans are created equal. Someone who is a
#' essentially a sure bet to pay back a loan will have an easier time getting a
#' loan with a low interest rate than someone who appears to be riskier. And
#' for people who are very risky? They may not even get a loan offer, or they
#' may not have accepted the loan offer due to a high interest rate. It is
#' important to keep that last part in mind, since this data set only
#' important to keep that last part in mind, since this dataset only
#' represents loans actually made, i.e. do not mistake this data for loan
#' applications!
#'
2 changes: 1 addition & 1 deletion R/data-london_murders.R
@@ -4,7 +4,7 @@
#' recorded in the Greater London area by the Metropolitan Police from January
#' 1, 2006 to September 7, 2011.
#'
#' To visualize this data set using a map, see the
#' To visualize this dataset using a map, see the
#' \code{\link{london_boroughs}} dataset, which contains the latitude and
#' longitude of polygons that define the boundaries of the 32 boroughs of
#' Greater London.
2 changes: 1 addition & 1 deletion R/data-mammals.R
@@ -1,6 +1,6 @@
#' Sleep in Mammals
#'
#' This data set includes data for 39 species of mammals distributed over 13
#' This dataset includes data for 39 species of mammals distributed over 13
#' orders. The data were used for analyzing the relationship between
#' constitutional and ecological factors and sleeping in mammals. Two
#' qualitatively different sleep variables (dreaming and non dreaming) were
6 changes: 3 additions & 3 deletions R/data-mariokart.R
@@ -8,10 +8,10 @@
#' one should do when encountering an outlier: examine the data point and
#' remove it only if there is a good reason. In these two cases, we can see
#' from the auction titles that they included other items in their auctions
#' besides the game, which justifies removing them from the data set.
#' besides the game, which justifies removing them from the dataset.
#'
#' This data set includes all auctions for a full week in October 2009.
#' Auctions were included in the data set if they satisfied a number of
#' This dataset includes all auctions for a full week in October 2009.
#' Auctions were included in the dataset if they satisfied a number of
#' conditions. (1) They were included in a search for "wii mario kart" on
#' ebay.com, (2) items were in the Video Games > Games > Nintendo Wii section
#' of Ebay, (3) the listing was an auction and not exclusively a "Buy it Now"
6 changes: 3 additions & 3 deletions R/data-military.R
@@ -3,9 +3,9 @@
#' This dataset contains demographic information on every member of the US
#' armed forces including gender, race, and rank.
#'
#' The branches covered by this data set include the Army, Navy, Air Force, and
#' Marine Corps. Demographic information on the Coast Guard is contained in
#' the original data set but has not been included here.
#' The branches covered by this dataset include the Army, Navy, Air Force, and
#' Marine Corps. Demographic information on the Coast Guard is contained in
#' the original dataset but has not been included here.
#'
#' @name military
#' @docType data
2 changes: 1 addition & 1 deletion R/data-mlb_teams.R
@@ -1,7 +1,7 @@
#' Major League Baseball Teams Data.
#'
#' A subset of data on Major League Baseball teams from
#' Lahman's Baseball Database. The full data set is available
#' Lahman's Baseball Database. The full dataset is available
#' in the [Lahman R package](https://github.com/cdalzell/Lahman).
#'
#' @name mlb_teams
2 changes: 1 addition & 1 deletion R/data-movies.R
@@ -1,6 +1,6 @@
#' movies
#'
#' A data set with information about movies released in 2003.
#' A dataset with information about movies released in 2003.
#'
#' @name movies
#' @docType data
2 changes: 1 addition & 1 deletion R/data-mtl.R
@@ -48,7 +48,7 @@
#' \doi{10.1371/journal.pone.0195549}.
#'
#' Thank you to Professor Silas Bergen of Winona State University for pointing
#' us to this data set!
#' us to this dataset!
#' @keywords datasets
#' @examples
#'
2 changes: 1 addition & 1 deletion R/data-nba_finals.R
@@ -1,6 +1,6 @@
#' NBA Finals History
#'
#' This data set contains information about the teams who played in the NBA Finals from 1950 - 2022.
#' This dataset contains information about the teams who played in the NBA Finals from 1950 - 2022.
#'
#' @name nba_finals
#' @docType data
2 changes: 1 addition & 1 deletion R/data-nba_finals_teams.R
@@ -1,6 +1,6 @@
#' NBA Finals Team Summary
#'
#' A data set with individual team summaries for the NBA Finals series from 1950 to 2022. To win the Finals, a team must win 4 games. The maximum number of games in a series is 7.
#' A dataset with individual team summaries for the NBA Finals series from 1950 to 2022. To win the Finals, a team must win 4 games. The maximum number of games in a series is 7.
#'
#' Notes:
#' 1. The Chicago Stags folded in 1950, the Washington Capitols in 1951 and the Baltimore Bullets in 1954.
6 changes: 3 additions & 3 deletions R/data-ncbirths.R
@@ -1,10 +1,10 @@
#' North Carolina births, 1000 cases
#'
#' In 2004, the state of North Carolina released to the public a large data set
#' containing information on births recorded in this state. This data set has
#' In 2004, the state of North Carolina released to the public a large dataset
#' containing information on births recorded in this state. This dataset has
#' been of interest to medical researchers who are studying the relation
#' between habits and practices of expectant mothers and the birth of their
#' children. This is a random sample of 1,000 cases from this data set.
#' children. This is a random sample of 1,000 cases from this dataset.
#'
#' @name ncbirths
#' @docType data
4 changes: 3 additions & 1 deletion R/data-nyc.R
@@ -1,6 +1,8 @@
#' nyc
#'
#' Zagat is a public survey where anyone can provide scores to a restaurant. The scores from the general public are then gathered to produce ratings. This data set contains a list of 168 NYC restaurants and their Zagat Ratings.
#' Zagat is a public survey where anyone can provide scores to a restaurant.
#' The scores from the general public are then gathered to produce ratings.
#' This dataset contains a list of 168 NYC restaurants and their Zagat Ratings.
#'
#' For each category the scales are as follows:
#'
2 changes: 1 addition & 1 deletion R/data-outliers.R
@@ -1,4 +1,4 @@
#' Simulated data sets for different types of outliers
#' Simulated datasets for different types of outliers
#'
#' Data sets for showing different types of outliers
#'