data set -> dataset, closes #75

OpenIntroStat · May 28, 2024 · d2a9b72
1 parent 9f0ce66
commit d2a9b72
Showing 123 changed files with 190 additions and 185 deletions.
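
For readers who want to reproduce a rename like this one, the substitution can be sketched as a single regular expression. This is a hypothetical illustration, not the method used for this commit: the function name `join_data_set` and the pattern are assumptions. It only handles the two words separated by a single space on one line, so occurrences split across a wrapped comment line would still need a manual pass.

```python
import re

# Hypothetical pattern: "data set", "data sets", "Data set", "Data Sets", ...
# matched as whole words with exactly one space between them.
_DATA_SET = re.compile(r"\b([Dd])ata [Ss]et(s?)\b")


def join_data_set(text: str) -> str:
    """Rewrite 'data set(s)' as 'dataset(s)', keeping the case of the leading D."""
    return _DATA_SET.sub(r"\1ataset\2", text)


if __name__ == "__main__":
    # Sample lines taken from the removed side of the diff.
    for line in (
        "Title: Data Sets and Supplemental Functions",
        "#' This data set is simulated but contains realistic occurrences",
        "#' four studies reported in the paper, and all four data sets are provided here.",
    ):
        print(join_data_set(line))
```

The word boundaries keep the pattern from touching strings such as "data settings" or "metadata set".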
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: openintro
-Title: Data Sets and Supplemental Functions from 'OpenIntro' Textbooks and Labs
-Version: 2.4.0
+Title: Datasets and Supplemental Functions from 'OpenIntro' Textbooks and Labs
+Version: 2.5.0
 Authors@R: c(
     person("Mine", "\u00C7etinkaya-Rundel", email = "cetinkaya.mine@gmail.com", role = c("aut", "cre"), comment = c(ORCID = "0000-0001-6452-2420")),
     person("David", "Diez", email = "david.m.diez@gmail.com", role = c("aut")),
@@ -13,7 +13,7 @@ Authors@R: c(
     )
 Description: Supplemental functions and data for 'OpenIntro' resources, which 
     includes open-source textbooks and resources for introductory statistics 
-    (<https://www.openintro.org/>). The package contains data sets used in our 
+    (<https://www.openintro.org/>). The package contains datasets used in our 
     open-source textbooks along with custom plotting functions for reproducing 
     book figures. Note that many functions and examples include color 
     transparency; some plotting elements may not show up properly (or at all) 

diff --git a/R/buildAxis.R b/R/buildAxis.R
@@ -4,8 +4,8 @@
 #' of labels on the axis. This function is still under development.
 #'
 #' The primary reason behind building this function was to allow a plot to be
-#' created with similar features but with different data sets. For instance, if
-#' a set of code was written for one data set and the function \code{axis} had
+#' created with similar features but with different datasets. For instance, if
+#' a set of code was written for one dataset and the function \code{axis} had
 #' been utilized with pre-specified values, the axis may not match the plot of
 #' a new set of data. The function \code{buildAxis} addresses this problem by
 #' allowing the number of axis labels to be specified and controlled.
@@ -15,7 +15,7 @@
 #' with the best score.
 #'
 #' @param side The side of the plot where to add the axis.
-#' @param limits Either lower and upper limits on the axis or a data set.
+#' @param limits Either lower and upper limits on the axis or a dataset.
 #' @param n The preferred number of axis labels.
 #' @param nMin The minimum number of axis labels.
 #' @param nMax The maximum number of axis labels.

diff --git a/R/data-absenteeism.R b/R/data-absenteeism.R
@@ -20,7 +20,7 @@
 #' @source Venables WN, Ripley BD. 2002. Modern Applied Statistics with S.
 #' Fourth Edition. New York: Springer.
 #'
-#' Data can also be found in the R `MASS` package under the data set name
+#' Data can also be found in the R `MASS` package under the dataset name
 #' `quine`.
 #' @keywords datasets
 #' @examples

diff --git a/R/data-ami_occurrences.R b/R/data-ami_occurrences.R
@@ -1,6 +1,6 @@
 #' Acute Myocardial Infarction (Heart Attack) Events
 #'
-#' This data set is simulated but contains realistic occurrences of AMI in NY
+#' This dataset is simulated but contains realistic occurrences of AMI in NY
 #' City.
 #'
 #'

diff --git a/R/data-arbuthnot.R b/R/data-arbuthnot.R
@@ -15,7 +15,7 @@
 #'   \item{boys}{number of male christenings (births)}
 #'   \item{girls}{number of female christenings (births)}
 #' }
-#' @source These data are excerpted from the `Arbuthnot` data set in the
+#' @source These data are excerpted from the `Arbuthnot` dataset in the
 #' [HistData](https://CRAN.R-project.org/package=HistData) package.
 #' @examples
 #'

diff --git a/R/data-association.R b/R/data-association.R
@@ -1,6 +1,6 @@
 #' Simulated data for association plots
 #'
-#' Simulated data set.
+#' Simulated dataset.
 #'
 #'
 #' @name association

diff --git a/R/data-ball_bearing.R b/R/data-ball_bearing.R
@@ -1,6 +1,6 @@
 #' Lifespan of ball bearings
 #'
-#' A simulated data set on lifespan of ball bearings.
+#' A simulated dataset on lifespan of ball bearings.
 #'
 #'
 #' @name ball_bearing

diff --git a/R/data-births14.R b/R/data-births14.R
@@ -1,10 +1,10 @@
 #' US births
 #'
-#' Every year, the US releases to the public a large data set containing
-#' information on births recorded in the country. This data set has been of
+#' Every year, the US releases to the public a large dataset containing
+#' information on births recorded in the country. This dataset has been of
 #' interest to medical researchers who are studying the relation between habits
 #' and practices of expectant mothers and the birth of their children. This is a
-#' random sample of 1,000 cases from the data set released in 2014.
+#' random sample of 1,000 cases from the dataset released in 2014.
 #'
 #' @source United States Department of Health and Human Services.
 #' Centers for Disease Control and Prevention.

diff --git a/R/data-books.R b/R/data-books.R
@@ -1,6 +1,6 @@
 #' Sample of books on a shelf
 #'
-#' Simulated data set.
+#' Simulated dataset.
 #'
 #'
 #' @name books

diff --git a/R/data-cars93.R b/R/data-cars93.R
@@ -1,7 +1,7 @@
 #' cars93
 #'
 #' A data frame with 54 rows and 6 columns. This data is a subset of the
-#' \code{Cars93} data set from the \code{MASS} package.
+#' \code{Cars93} dataset from the \code{MASS} package.
 #'
 #' These cars represent a random sample for 1993 models that were in both
 #' \emph{Consumer Reports} and \emph{PACE Buying Guide}. Only vehicles of type

diff --git a/R/data-children_gender_stereo.R b/R/data-children_gender_stereo.R
@@ -2,7 +2,7 @@
 #'
 #' Stereotypes are common, but at what age do they start? This study
 #' investigates stereotypes in young children aged 5-7 years old. There are
-#' four studies reported in the paper, and all four data sets are provided here.
+#' four studies reported in the paper, and all four datasets are provided here.
 #'
 #' The structure of the data object is a little unusual, so we recommend
 #' reviewing the Examples section before starting your analysis.
@@ -22,7 +22,7 @@
 #' that are among the following:
 #' \describe{
 #'   \item{subject}{Subject ID. Note that Subject 1 in the first data frame
-#'   (data set) does \bold{not} correspond to Subject 1 in the second data frame.}
+#'   (dataset) does \bold{not} correspond to Subject 1 in the second data frame.}
 #'   \item{gender}{Gender of the subject.}
 #'   \item{age}{Age of the subject, in years.}
 #'   \item{trait}{The trait that the children were making a judgement about,
@@ -55,7 +55,7 @@
 #' @keywords datasets
 #' @examples
 #'
-#' # This data set is a little funny to work with.
+#' # This dataset is a little funny to work with.
 #' # If wanting to review the data for a study, we
 #' # recommend first assigning the corresponding
 #' # data frame to a new variable. For instance,

diff --git a/R/data-climate70.R b/R/data-climate70.R
@@ -28,7 +28,7 @@
 #' # Data sampled are from the US, Europe, and Australia.
 #' # This geographic limitation may be due to the particular
 #' # years considered, since locations without both 1948 and
-#' # 2018 were discarded for this (simple) data set.
+#' # 2018 were discarded for this (simple) dataset.
 #' plot(climate70$longitude, climate70$latitude)
 #'
 #' plot(climate70$dx70_1948, climate70$dx70_2018)

diff --git a/R/data-corr_match.R b/R/data-corr_match.R
@@ -1,4 +1,4 @@
-#' Sample data sets for correlation problems
+#' Sample datasets for correlation problems
 #'
 #' Simulated data.
 #'
@@ -18,7 +18,7 @@
 #'   \item{y7}{a numeric vector}
 #'   \item{y8}{a numeric vector}
 #'   }
-#' @source Simulated data set.
+#' @source Simulated dataset.
 #' @keywords datasets
 #' @examples
 #'

diff --git a/R/data-cpr.R b/R/data-cpr.R
@@ -1,4 +1,4 @@
-#' CPR data set
+#' CPR dataset
 #'
 #' These patients were randomly divided into a treatment group where they
 #' received a blood thinner or the control group where they did not receive a

diff --git a/R/data-credits.R b/R/data-credits.R
@@ -1,6 +1,6 @@
 #' College credits.
 #'
-#' A simulated data set of number of credits taken by college students each
+#' A simulated dataset of number of credits taken by college students each
 #' semester.
 #'
 #'

diff --git a/R/data-drone_blades.R b/R/data-drone_blades.R
@@ -1,6 +1,6 @@
 #' Quadcopter Drone Blades
 #'
-#' Quality control data set for quadcopter drone blades, where this data has
+#' Quality control dataset for quadcopter drone blades, where this data has
 #' been made up for an example.
 #'
 #'

diff --git a/R/data-email50.R b/R/data-email50.R
@@ -1,6 +1,6 @@
 #' Sample of 50 emails
 #'
-#' This is a subsample of the \code{\link{email}} data set.
+#' This is a subsample of the \code{\link{email}} dataset.
 #'
 #'
 #' @name email50

diff --git a/R/data-env_regulation.R b/R/data-env_regulation.R
@@ -12,7 +12,7 @@
 #'
 #' The actual sample size was 1012. However, the original data were not from a
 #' simple random sample; after accounting for the design, the equivalent sample
-#' size was about 705, which was what was used for the data set here to keep
+#' size was about 705, which was what was used for the dataset here to keep
 #' things simpler for intro stat analyses.
 #'
 #' @name env_regulation

diff --git a/R/data-esi.R b/R/data-esi.R
@@ -1,6 +1,6 @@
 #' Environmental Sustainability Index 2005
 #'
-#' This data set comes from the 2005 Environmental Sustainability Index:
+#' This dataset comes from the 2005 Environmental Sustainability Index:
 #' Benchmarking National Environmental Stewardship.  Countries are given an
 #' overall sustainability score as well as scores in each of several different
 #' environmental areas.

diff --git a/R/data-family_college.R b/R/data-family_college.R
@@ -1,6 +1,6 @@
 #' Simulated sample of parent / teen college attendance
 #'
-#' A simulated data set based on real population summaries.
+#' A simulated dataset based on real population summaries.
 #'
 #'
 #' @name family_college

diff --git a/R/data-friday.R b/R/data-friday.R
@@ -1,6 +1,6 @@
 #' Friday the 13th
 #'
-#' This data set addresses issues of how superstitions regarding Friday the
+#' This dataset addresses issues of how superstitions regarding Friday the
 #' 13th affect human behavior, and whether Friday the 13th is an unlucky day.
 #' Scanlon, et al. collected data on traffic and shopping patterns and accident
 #' frequency for Fridays the 6th and 13th between October of 1989 and November

diff --git a/R/data-gradestv.R b/R/data-gradestv.R
@@ -1,10 +1,10 @@
 #' Simulated data for analyzing the relationship between watching TV and grades
 #'
-#' This is a simulated data set to be used to estimate the relationship between
+#' This is a simulated dataset to be used to estimate the relationship between
 #' number of hours per week students watch TV and the grade they got in a
 #' statistics class.
 #'
-#' There are a few potential outliers in this data set. When analyzing the data
+#' There are a few potential outliers in this dataset. When analyzing the data
 #' one should consider how (if at all) these outliers may affect the estimates
 #' of correlation coefficient and regression parameters.
 #'

diff --git a/R/data-housing.R b/R/data-housing.R
@@ -1,4 +1,4 @@
-#' Simulated data set on student housing
+#' Simulated dataset on student housing
 #'
 #' Each observation represents a simulated rent price for a student.
 #'

diff --git a/R/data-ipod.R b/R/data-ipod.R
@@ -1,6 +1,6 @@
 #' Length of songs on an iPod
 #'
-#' A simulated data set on lengths of songs on an iPod.
+#' A simulated dataset on lengths of songs on an iPod.
 #'
 #'
 #' @name ipod

diff --git a/R/data-jury.R b/R/data-jury.R
@@ -1,6 +1,6 @@
-#' Simulated juror data set
+#' Simulated juror dataset
 #'
-#' Simulated data set of registered voters proportions and representation on
+#' Simulated dataset of registered voters proportions and representation on
 #' juries.
 #'
 #'

diff --git a/R/data-loans_full_schema.R b/R/data-loans_full_schema.R
@@ -1,13 +1,13 @@
 #' Loan data from Lending Club
 #'
-#' This data set represents thousands of loans made through the Lending Club
+#' This dataset represents thousands of loans made through the Lending Club
 #' platform, which is a platform that allows individuals to lend to other
 #' individuals. Of course, not all loans are created equal. Someone who is a
 #' essentially a sure bet to pay back a loan will have an easier time getting a
 #' loan with a low interest rate than someone who appears to be riskier. And
 #' for people who are very risky? They may not even get a loan offer, or they
 #' may not have accepted the loan offer due to a high interest rate. It is
-#' important to keep that last part in mind, since this data set only
+#' important to keep that last part in mind, since this dataset only
 #' represents loans actually made, i.e. do not mistake this data for loan
 #' applications!
 #'

diff --git a/R/data-london_murders.R b/R/data-london_murders.R
@@ -4,7 +4,7 @@
 #' recorded in the Greater London area by the Metropolitan Police from January
 #' 1, 2006 to September 7, 2011.
 #'
-#' To visualize this data set using a map, see the
+#' To visualize this dataset using a map, see the
 #' \code{\link{london_boroughs}} dataset, which contains the latitude and
 #' longitude of polygons that define the boundaries of the 32 boroughs of
 #' Greater London.

diff --git a/R/data-mammals.R b/R/data-mammals.R
@@ -1,6 +1,6 @@
 #' Sleep in Mammals
 #'
-#' This data set includes data for 39 species of mammals distributed over 13
+#' This dataset includes data for 39 species of mammals distributed over 13
 #' orders. The data were used for analyzing the relationship between
 #' constitutional and ecological factors and sleeping in mammals. Two
 #' qualitatively different sleep variables (dreaming and non dreaming) were

diff --git a/R/data-mariokart.R b/R/data-mariokart.R
@@ -8,10 +8,10 @@
 #' one should do when encountering an outlier: examine the data point and
 #' remove it only if there is a good reason. In these two cases, we can see
 #' from the auction titles that they included other items in their auctions
-#' besides the game, which justifies removing them from the data set.
+#' besides the game, which justifies removing them from the dataset.
 #'
-#' This data set includes all auctions for a full week in October 2009.
-#' Auctions were included in the data set if they satisfied a number of
+#' This dataset includes all auctions for a full week in October 2009.
+#' Auctions were included in the dataset if they satisfied a number of
 #' conditions. (1) They were included in a search for "wii mario kart" on
 #' ebay.com, (2) items were in the Video Games > Games > Nintendo Wii section
 #' of Ebay, (3) the listing was an auction and not exclusively a "Buy it Now"

diff --git a/R/data-military.R b/R/data-military.R
@@ -3,9 +3,9 @@
 #' This dataset contains demographic information on every member of the US
 #' armed forces including gender, race, and rank.
 #'
-#' The branches covered by this data set include the Army, Navy, Air Force, and
-#' Marine Corps.  Demographic information on the Coast Guard is contained in
-#' the original data set but has not been included here.
+#' The branches covered by this dataset include the Army, Navy, Air Force, and
+#' Marine Corps. Demographic information on the Coast Guard is contained in
+#' the original dataset but has not been included here.
 #'
 #' @name military
 #' @docType data

diff --git a/R/data-mlb_teams.R b/R/data-mlb_teams.R
@@ -1,7 +1,7 @@
 #' Major League Baseball Teams Data.
 #'
 #' A subset of data on Major League Baseball teams from
-#' Lahman's Baseball Database. The full data set is available
+#' Lahman's Baseball Database. The full dataset is available
 #' in the [Lahman R package](https://github.com/cdalzell/Lahman).
 #'
 #' @name mlb_teams

diff --git a/R/data-movies.R b/R/data-movies.R
@@ -1,6 +1,6 @@
 #' movies
 #'
-#' A data set with information about movies released in 2003.
+#' A dataset with information about movies released in 2003.
 #'
 #' @name movies
 #' @docType data

diff --git a/R/data-mtl.R b/R/data-mtl.R
@@ -48,7 +48,7 @@
 #' \doi{10.1371/journal.pone.0195549}.
 #'
 #' Thank you to Professor Silas Bergen of Winona State University for pointing
-#' us to this data set!
+#' us to this dataset!
 #' @keywords datasets
 #' @examples
 #'

diff --git a/R/data-nba_finals.R b/R/data-nba_finals.R
@@ -1,6 +1,6 @@
 #' NBA Finals History
 #'
-#' This data set contains information about the teams who played in the NBA Finals from 1950 - 2022.
+#' This dataset contains information about the teams who played in the NBA Finals from 1950 - 2022.
 #'
 #' @name nba_finals
 #' @docType data

diff --git a/R/data-nba_finals_teams.R b/R/data-nba_finals_teams.R
@@ -1,6 +1,6 @@
 #' NBA Finals Team Summary
 #'
-#' A data set with individual team summaries for the NBA Finals series from 1950 to 2022. To win the Finals, a team must win 4 games. The maximum number of games in a series is 7.
+#' A dataset with individual team summaries for the NBA Finals series from 1950 to 2022. To win the Finals, a team must win 4 games. The maximum number of games in a series is 7.
 #'
 #' Notes:
 #' 1. The Chicago Stags folded in 1950, the Washington Capitols in 1951 and the Baltimore Bullets in 1954.

diff --git a/R/data-ncbirths.R b/R/data-ncbirths.R
@@ -1,10 +1,10 @@
 #' North Carolina births, 1000 cases
 #'
-#' In 2004, the state of North Carolina released to the public a large data set
-#' containing information on births recorded in this state. This data set has
+#' In 2004, the state of North Carolina released to the public a large dataset
+#' containing information on births recorded in this state. This dataset has
 #' been of interest to medical researchers who are studying the relation
 #' between habits and practices of expectant mothers and the birth of their
-#' children. This is a random sample of 1,000 cases from this data set.
+#' children. This is a random sample of 1,000 cases from this dataset.
 #'
 #' @name ncbirths
 #' @docType data

diff --git a/R/data-nyc.R b/R/data-nyc.R
@@ -1,6 +1,8 @@
 #' nyc
 #'
-#' Zagat is a public survey where anyone can provide scores to a restaurant. The scores from the general public are then gathered to produce ratings. This data set contains a list of 168 NYC restaurants and their Zagat Ratings.
+#' Zagat is a public survey where anyone can provide scores to a restaurant.
+#' The scores from the general public are then gathered to produce ratings.
+#' This dataset contains a list of 168 NYC restaurants and their Zagat Ratings.
 #'
 #' For each category the scales are as follows:
 #'

diff --git a/R/data-outliers.R b/R/data-outliers.R
@@ -1,4 +1,4 @@
-#' Simulated data sets for different types of outliers
+#' Simulated datasets for different types of outliers
 #'
 #' Data sets for showing different types of outliers
 #'