From 2ef160a7ec434777c763359d717e360354c3e365 Mon Sep 17 00:00:00 2001 From: Gianmarco Alberti Date: Sun, 4 Feb 2018 09:01:15 +0100 Subject: [PATCH] v0.20 --- DESCRIPTION | 2 +- NAMESPACE | 1 + R/cols_cntr.R | 3 +- R/cols_corr.R | 1 + R/groupBycoord.R | 2 +- R/malinvaud.R | 65 ++++++++++++++++++++--------------- R/rescale.R | 36 +++++++++++++++++++ R/rows_cntr.R | 1 + R/rows_corr.R | 1 + README.md | 19 +++++++--- man/CAinterprTools-package.Rd | 2 +- man/cols.cntr.Rd | 3 +- man/cols.corr.Rd | 1 + man/groupBycoord.Rd | 2 +- man/malinvaud.Rd | 6 ++-- man/rescale.Rd | 32 +++++++++++++++++ man/rows.cntr.Rd | 1 + man/rows.corr.Rd | 1 + 18 files changed, 139 insertions(+), 40 deletions(-) create mode 100644 R/rescale.R create mode 100644 man/rescale.Rd diff --git a/DESCRIPTION b/DESCRIPTION index ddc5555..96d9987 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,7 +1,7 @@ Package: CAinterprTools Title: Package for graphical aid in Correspondence Analysis interpretation and significance testing -Version: 0.19 +Version: 0.20 Authors@R: "Gianmarco ALberti [aut, cre]" Description: A number of interesting packages are available to perform Correspondence Analysis in R. At the best of my knowledge, they lack diff --git a/NAMESPACE b/NAMESPACE index a7c1c88..42857d8 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -14,6 +14,7 @@ export(cols.corr.scatter) export(cols.qlt) export(groupBycoord) export(malinvaud) +export(rescale) export(rows.cntr) export(rows.cntr.scatter) export(rows.corr) diff --git a/R/cols_cntr.R b/R/cols_cntr.R index c811bf3..c037c60 100644 --- a/R/cols_cntr.R +++ b/R/cols_cntr.R @@ -4,7 +4,8 @@ #' #' The function displays the contribution of the categories as a dotplot. A reference line indicates the threshold above which a contribution can be considered important for the determination of the selected dimension. #' The parameter sort=TRUE sorts the categories in descending order of contribution to the inertia of the selected dimension. 
-#' At the left-hand side of the plot, the categories' labels are given a symbol (+ or -) according to wheather each category is actually contributing to the definition of the positive or negative side of the dimension, respectively. +#' At the left-hand side of the plot, the categories' labels are given a symbol (+ or -) according to whether each category is actually contributing to the definition of the positive or negative side of the dimension, respectively. +#' The categories are grouped into two groups: 'major' and 'minor' contributors to the inertia of the selected dimension. #' At the right-hand side, a legend (which is enabled/disabled using the 'leg' parameter) reports the correlation (sqrt(COS2)) of the row categories with the selected dimension. A symbol (+ or -) indicates with which side of the selected dimension each row category is correlated. #' @param data: name of the dataset (must be in dataframe format). #' @param x: dimension for which the column categories contribution is returned (1st dimension by default). diff --git a/R/cols_corr.R b/R/cols_corr.R index fcdddcc..9da3d8a 100644 --- a/R/cols_corr.R +++ b/R/cols_corr.R @@ -4,6 +4,7 @@ #' #' The function displays the correlation of the column categories with the selected dimension; the parameter sort=TRUE arrange the categories in decreasing order of correlation. #' At the left-hand side, the categories' labels show a symbol (+ or -) according to which side of the selected dimension they are correlated, either positive or negative. +#' The categories are grouped into two groups: categories correlated with the positive ('pole +') or negative ('pole -') pole of the selected dimension. 
#' At the right-hand side, a legend (which is enabled/disabled using the 'leg' parameter) indicates the row categories' contribution (in permills) to the selected dimension (value enclosed within round brackets), and a symbol (+ or -) indicating whether they are actually contributing to the definition of the positive or negative side of the dimension, respectively. #' Further, an asterisk (*) flags the categories which can be considered major contributors to the definition of the dimension. #' @param data: name of the dataset (must be in dataframe format). diff --git a/R/groupBycoord.R b/R/groupBycoord.R index c868b5f..8caf62f 100644 --- a/R/groupBycoord.R +++ b/R/groupBycoord.R @@ -7,7 +7,7 @@ #' The function also returns a dataframe storing the categories' coordinates on the selected dimension and the group each category belongs to. #' @param data: name of the dataset (must be in dataframe format). #' @param x: dimension whose coordinates are used to build the partitions. -#' @param which: speficy if rows ("rows") or columns ("cols") must be grouped. +#' @param which: specify if rows ("rows"; default) or columns ("cols") must be grouped. #' @param cex.labls: set the size of the labels of the dotchart (0.75 by default). #' @keywords columns contribution #' @export diff --git a/R/malinvaud.R b/R/malinvaud.R index 3eee20e..dd39a8b 100644 --- a/R/malinvaud.R +++ b/R/malinvaud.R @@ -1,7 +1,9 @@ #' Malinvaud's test for significance of the CA dimensions #' -#' This function allows you to perform the Malinvaud's test, which assesses the significance of the CA dimensions. -#' The function returns both a table in the R console and a plot. The former lists some values, among which the significance of each CA dimension. Dimensions whose p-value is below the 0.05 threshold (displayed in RED in the returned plot) are significant. +#' This function allows you to perform the Malinvaud's test, which assesses the significance of the CA dimensions. 
+#' +#' The function returns both a table in the R console and a plot. The former lists relevant information, among which the significance of each CA dimension. +#' The dotchart graphically represents the p-value of each dimension; dimensions are grouped by level of significance; a red reference line indicates the 0.05 threshold. #' @param data: name of the datset (must be in dataframe format). #' @keywords Malinvaud test #' @export @@ -10,35 +12,44 @@ #' malinvaud(greenacre_data) #perform the Malinvaud test using the 'greenacre_data' dataset #' res <- malinvaud(greenacre_data) #perform the Malinvaud test using the 'greenacre_data' dataset and store the output table in a object named 'res' #' -malinvaud <- function(data) { +malinvaud <- function (data) { grandtotal <- sum(data) nrows <- nrow(data) ncols <- ncol(data) - numb.dim.cols<-ncol(data)-1 - numb.dim.rows<-nrow(data)-1 - a <- min(numb.dim.cols, numb.dim.rows) #dimensionality of the table - labs<-c(1:a) #set the number that will be used as x-axis' labels on the scatterplots - res.ca<-CA(data, ncp=a, graph=FALSE) - + numb.dim.cols <- ncol(data) - 1 + numb.dim.rows <- nrow(data) - 1 + a <- min(numb.dim.cols, numb.dim.rows) + labs <- c(1:a) + res.ca <- CA(data, ncp = a, graph = FALSE) malinv.test.rows <- a - malinv.test.cols <- 6 - malinvt.output <-as.data.frame(matrix(ncol= malinv.test.cols, nrow=malinv.test.rows)) - colnames(malinvt.output) <- c("K", "Dimension", "Eigenvalue", "Chi-square", "df", "p-value") - - malinvt.output[,1] <- c(0:(a-1)) - malinvt.output[,2] <- c(1:a) - - for(i in 1:malinv.test.rows){ - k <- -1+i - malinvt.output[i,3] <- res.ca$eig[i,1] - malinvt.output[i,5] <- (nrows-k-1)*(ncols-k-1) + malinv.test.cols <- 7 + malinvt.output <- as.data.frame(matrix(ncol = malinv.test.cols, nrow = malinv.test.rows)) + colnames(malinvt.output) <- c("K", "Dimension", "Eigenvalue", "Chi-square", "df", "p-value", "p-class") + malinvt.output[,1] <- c(0:(a - 1)) + malinvt.output[,2] <- paste0("dim. 
",c(1:a)) + for (i in 1:malinv.test.rows) { + k <- -1 + i + malinvt.output[i,3] <- res.ca$eig[i, 1] + malinvt.output[i,5] <- (nrows - k - 1) * (ncols - k - 1) } - - malinvt.output[,4] <- rev(cumsum(rev(malinvt.output[,3])))*grandtotal - pvalue <- round(pchisq(malinvt.output[,4], malinvt.output[,5], lower.tail=FALSE), digits=6) - malinvt.output[,6] <- ifelse(pvalue < 0.001, "< 0.001", ifelse(pvalue < 0.01, "< 0.01", ifelse(pvalue < 0.05, "< 0.05", round(pvalue, 3)))) - - dotchart2(pvalue, labels=malinvt.output[,2], sort=FALSE,lty=2, xlim=c(0,1), xlab=paste("p-value after Malinvaud's test"), ylab="Dimensions") - abline(v=0.05, lty=2, col="RED") + malinvt.output[,4] <- rev(cumsum(rev(malinvt.output[, 3]))) * grandtotal + pvalue <- pchisq(malinvt.output[,4], malinvt.output[,5], lower.tail = FALSE) + malinvt.output[,6] <- pvalue + malinvt.output[,7] <- ifelse(pvalue < 0.001, "p < 0.001", + ifelse(pvalue < 0.01, "p < 0.01", + ifelse(pvalue < 0.05, "p < 0.05", + "p > 0.05"))) + dotchart2(pvalue, + labels = malinvt.output[,2], + groups=malinvt.output[,7], + sort = FALSE, + lty = 2, + xlim = c(0, 1), + main="Malinvaud's test for the significance of CA dimensions", + xlab = paste("p-value"), + ylab = "Dimensions", + cex.main=0.9, + cex.labels=0.75) + abline(v = 0.05, lty = 2, col = "RED") return(malinvt.output) } \ No newline at end of file diff --git a/R/rescale.R b/R/rescale.R new file mode 100644 index 0000000..eaaa2b4 --- /dev/null +++ b/R/rescale.R @@ -0,0 +1,36 @@ +#' Rescaling row/column categories coordinates between a minimum and maximum value +#' +#' This function allows to rescale the coordinates of a selected dimension to be constrained between a minimum and a maximum user-defined value. 
+#' +#' The rationale of the function is that users may wish to use the coordinates on a given dimension to devise a scale, along the lines of what is accomplished in:\cr +#' Greenacre M 2002, "The Use of Correspondence Analysis in the Exploration of Health Survey Data", Documentos de Trabajo 5, Fundacion BBVA, pp. 7-39\cr +#' The function returns a chart representing the row/column categories against the rescaled coordinates from the selected dimension. A dataframe is also returned containing the original values (i.e., the coordinates) and the corresponding rescaled values. +#' @param data: name of the dataset (must be in dataframe format). +#' @param x: dimension whose coordinates are rescaled (1st dimension by default). +#' @param which: specify if the coordinates of the rows ("rows", default) or of the columns ("cols") must be rescaled. +#' @param min.v: minimum value of the new scale (0 by default). +#' @param max.v: maximum value of the new scale (100 by default). +#' @keywords +#' @export +#' @examples +#' data(greenacre_data) +#' res <- rescale(greenacre_data, which="rows", min.v=0, max.v=10) +#' +rescale <- function (data, x=1, which="rows", min.v=0, max.v=100) { + res <- CA(data, graph=FALSE) + ifelse(which=="rows", + coord.x <- res$row$coord[,x], + coord.x <- res$col$coord[,x]) + resc.v <- ((coord.x-min(coord.x))*(max.v-min.v)/(max(coord.x)-min(coord.x)))+min.v + df <- data.frame(category=rownames(as.data.frame(coord.x)), original.v=coord.x, rescaled.v=resc.v) + plot(sort(df$rescaled.v), + xaxt="n", + xlab="categories", + ylab=paste0(x, " Dim. rescaled coordinates"), + pch=20, + type="b", + main=paste0("Plot of ", ifelse(which=="rows", "row", "column"), " categories against ", x, " Dim. 
coordinates rescaled between ", min.v, " and ", max.v), + cex.main=0.95) + axis(1, at=1:nrow(df), labels=df$category[order(df$rescaled.v)]) + return(subset(df, , -c(category))) +} diff --git a/R/rows_cntr.R b/R/rows_cntr.R index 568a8d8..95b8c50 100644 --- a/R/rows_cntr.R +++ b/R/rows_cntr.R @@ -5,6 +5,7 @@ #' The function displays the contribution of the categories as a dotplot. A reference line indicates the threshold above which a contribution can be considered important for the determination of the selected dimension. #' The parameter sort=TRUE sorts the categories in descending order of contribution to the inertia of the selected dimension. #' At the left-hand side of the plot, the categories' labels are given a symbol (+ or -) according to wheather each category is actually contributing to the definition of the positive or negative side of the dimension, respectively. +#' The categories are grouped into two groups: 'major' and 'minor' contributors to the inertia of the selected dimension. #' At the right-hand side, a legend (which is enabled/disabled using the 'leg' parameter) reports the correlation (sqrt(COS2)) of the column categories with the selected dimension. A symbol (+ or -) indicates with which side of the selected dimension each column category is correlated. #' @param data: name of the dataset (must be in dataframe format). #' @param x: dimension for which the row categories contribution is returned (1st dimension by default). diff --git a/R/rows_corr.R b/R/rows_corr.R index ea6c1f7..a6b549b 100644 --- a/R/rows_corr.R +++ b/R/rows_corr.R @@ -4,6 +4,7 @@ #' #' The function displays the correlation of the row categories with the selected dimension; the parameter sort=TRUE arrange the categories in decreasing order of correlation. #' At the left-hand side, the categories' labels show a symbol (+ or -) according to which side of the selected dimension they are correlated, either positive or negative. 
+#' The categories are grouped into two groups: categories correlated with the positive ('pole +') or negative ('pole -') pole of the selected dimension. #' At the right-hand side, a legend (which is enabled/disabled using the 'leg' parameter) indicates the column categories' contribution (in permills) to the selected dimension (value enclosed within round brackets), and a symbol (+ or -) indicating whether they are actually contributing to the definition of the positive or negative side of the dimension, respectively. #' Further, an asterisk (*) flags the categories which can be considered major contributors to the definition of the dimension. #' @param data: name of the dataset (must be in dataframe format). diff --git a/README.md b/README.md index a35a58e..4ea02c7 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,5 @@ # CAinterprTools -vers 0.19 +vers 0.20 A number of interesting packages are available to perform Correspondence Analysis in R. At the best of my knowledge, however, they lack some tools to help users to eyeball some critical CA aspects (e.g., contribution of rows/cols categories to the principal axes, quality of the display,correlation of rows/cols categories with dimensions, etc). Besides providing those facilities, this package allows calculating the significance of the CA dimensions by means of the 'Average Rule', the Malinvaud test, and by permutation test. Further, it allows to also calculate the permuted significance of the CA total inertia. @@ -30,6 +30,7 @@ The package comes with some datasets drawn from literature: * `cols.qlt()`: chart of columns quality of the display. * `groupBycoord()`: define groups of categories on the basis of a selected partition into k groups employing the Jenks' natural break method on the selected dimension's coordinates. * `malinvaud()`: Malinvaud's test for significance of the CA dimensions. +* `rescale()`: rescale row/column categories coordinates between a minimum and maximum value. 
* `rows.cntr()`: rows contribution chart. * `rows.cntr.scatter()`: scatterplot for row categories contribution to dimensions. * `rows.qlt()`: chart of rows quality of the display. @@ -69,7 +70,7 @@ groupBycoord(greenacre_data) [![Rplot.jpg](https://s10.postimg.org/wct1ty2ix/Rplot.jpg)](https://postimg.org/image/sgfpxyhj9/)

-`malinvaud()`: performs the Malinvaud test and returns the test's result (among which the significance of the CA dimensions); a plot is also provided, wherein a reference line (in RED) indicates the 0.05 threshold: +`malinvaud()`: performs the Malinvaud test and returns both a table in the R console and a plot. The former lists relevant information, including the significance of each CA dimension. The dotchart plots the p-value of each dimension; dimensions are grouped by level of significance, and a red reference line indicates the 0.05 threshold: ```r malinvaud(greenacre_data) ``` @@ -90,7 +91,7 @@ sig.dim.perm.scree(greenacre_data) [![image.jpg](https://s1.postimg.org/1fq7jpjhin/image.jpg)](https://postimg.org/image/9bayh25isr/)

-`rows.cntr()`: calculates the contribution of the row categories to a selected dimension. It displays the contribution of the categories as a dotplot. A reference line indicates the threshold above which a contribution can be considered important for the determination of the selected dimension. The parameter `sort=TRUE` sorts the categories in descending order of contribution to the inertia of the selected dimension. At the left-hand side of the plot, the categories' labels are given a symbol (+ or -) according to wheather each category is actually contributing to the definition of the positive or negative side of the dimension, respectively. At the right-hand side, a legend (which is enabled/disabled using the `leg` parameter) reports the correlation (sqrt(COS2)) of the column categories with the selected dimension. A symbol (+ or -) indicates with which side of the selected dimension each column category is correlated: +`rows.cntr()`: calculates the contribution of the row categories to a selected dimension. It displays the contribution of the categories as a dotplot. A reference line indicates the threshold above which a contribution can be considered important for the determination of the selected dimension. The parameter `sort=TRUE` sorts the categories in descending order of contribution to the inertia of the selected dimension. At the left-hand side of the plot, the categories' labels are given a symbol (+ or -) according to whether each category is actually contributing to the definition of the positive or negative side of the dimension, respectively. The categories are grouped into two groups: 'major' and 'minor' contributors to the inertia of the selected dimension. At the right-hand side, a legend (which is enabled/disabled using the `leg` parameter) reports the correlation (sqrt(COS2)) of the column categories with the selected dimension. 
A symbol (+ or -) indicates with which side of the selected dimension each column category is correlated: ```r rows.cntr(greenacre_data,1,cti=TRUE,sort=TRUE) ``` @@ -111,7 +112,7 @@ rows.qlt(greenacre_data,1,2) [![image.jpg](https://s1.postimg.org/6p0jre1nov/image.jpg)](https://postimg.org/image/9qwfsm2zvv/)

-`rows.corr()`: calculates and graphically displays the correlation (sqrt(COS2)) of the row categories with the selected dimension. The parameter sort=TRUE arranges the categories in decreasing order of correlation. In the returned chart, at the left-hand side, the categories' labels show a symbol (+ or -) according to which side of the selected dimension they are correlated, either positive or negative. At the right-hand side, a legend indicates the column categories' contribution (in permils) to the selected dimension (value enclosed within round brackets), and a symbol (+ or -) indicating whether they are actually contributing to the definition of the positive or negative side of the dimension, respectively. Further, an asterisk (*) flags the categories which can be considered major contributors to the definition of the dimension: +`rows.corr()`: calculates and graphically displays the correlation (sqrt(COS2)) of the row categories with the selected dimension. The parameter sort=TRUE arranges the categories in decreasing order of correlation. In the returned chart, at the left-hand side, the categories' labels show a symbol (+ or -) according to which side of the selected dimension they are correlated, either positive or negative. The categories are grouped into two groups: categories correlated with the positive ('pole +') or negative ('pole -') pole of the selected dimension. At the right-hand side, a legend indicates the column categories' contribution (in permils) to the selected dimension (value enclosed within round brackets), and a symbol (+ or -) indicating whether they are actually contributing to the definition of the positive or negative side of the dimension, respectively. 
Further, an asterisk (*) flags the categories which can be considered major contributors to the definition of the dimension: ```r rows.corr(greenacre_data,1) ``` @@ -124,6 +125,10 @@ rows.corr.scatter(greenacre_data,1,2) ``` [![image.jpg](https://s1.postimg.org/57vt8txpjj/image.jpg)](https://postimg.org/image/5iin1zcxor/) +

+`rescale()`: rescales the coordinates of a selected dimension so that they lie between a user-defined minimum and maximum value. +The rationale of the function is that users may wish to use the coordinates on a given dimension to devise a scale, along the lines of what is accomplished in: Greenacre M 2002, *The Use of Correspondence Analysis in the Exploration of Health Survey Data*, Documentos de Trabajo 5, Fundacion BBVA, pp. 7-39. The function returns a chart representing the row/column categories against the rescaled coordinates from the selected dimension. A dataframe is also returned containing the original values (i.e., the coordinates) and the corresponding rescaled values. +

`table.collapse()`: allows to collapse the rows and columns of the input contingency table on the basis of the results of a hierarchical clustering. The function returns a list containing the input table, the rows-collapsed table, the columns-collapsed table, and a table with both rows and columns collapsed. It optionally returns two dendrograms (one for the row profiles, one for the column profiles) representing the clusters. The hierarchical clustering is obtained using the `FactoMineR`s `HCPC()` function. *Rationale*: Clustering rows and/or columns of a table could interest the users who want to know where a *significant association is concentrated* by *collecting together similar rows (or columns) in discrete groups* (Greenacre M, *Correspondence Analysis in Practice*, Boca Raton-London-New York, Chapman&Hall/CRC 2007, pp. 116, 120). Rows and/or columns are progressively aggregated in a way in which every successive merging produces the smallest change in the table’s inertia. The underlying logic lies in the fact that rows (or columns) whose merging produces a small change in table’s inertia have similar profiles. This procedure can be thought of as maximizing the between-group inertia and minimizing the within-group inertia. A method essentially similar is that provided by the `FactoMineR` package (Husson F, Le S, Pages J, *Exploratory Multivariate Analysis by Example Using R*, Boca Raton-London-New York, CRC Press, pp. 177-185). The cluster solution is based on the following rationale: a division into Q (i.e., a given number of) clusters is suggested when the increase in between-group inertia attained when passing from a Q-1 to a Q partition is greater than that from a Q to a Q+1 clusters partition. In other words, during the process of rows (or columns) merging, if the following agggregation raises highly the within-group inertia, it means that at the further step very different profiles are being aggregated. 
@@ -296,6 +301,10 @@ New in `version 0.19`: improvements and typos fixes to the help documentation; `groupBycoord()` added; the `rows.cntr()` and `cols.cntr()` functions have been modified: in the output chart, categories are now divided in two groups (major and minor contributors to the definition of the selected dimension). In the same function, the parameter `cti` has been removed. In the chart returned by the `rows.corr()` and `cols.corr()` functions, the categories are now grouped in two groups according to whether the correlation is with the positive (pole +) or negative (pole -) side of the selected dimension. In the `rows.cntr()`, `cols.cntr()`, `rows.corr()`, and `cols.corr()` functions the legend to the right-hand side of the chart is now optional. +New in `version 0.20`: + +improvements and typos fixes to the help documentation; improvements to the chart returned by the `malinvaud()` function; `rescale()` function added. + ## Installation To install the package in R, just follow the few steps listed below: @@ -309,7 +318,7 @@ library(devtools) ``` 3) download the 'CAinterprTools' package from GitHub via the 'devtools''s command: ```r -install_github("gianmarcoalberti/CAinterprTools@v0.19") +install_github("gianmarcoalberti/CAinterprTools@v0.20") ``` 4) load the package: ```r diff --git a/man/CAinterprTools-package.Rd b/man/CAinterprTools-package.Rd index 810a881..e584638 100644 --- a/man/CAinterprTools-package.Rd +++ b/man/CAinterprTools-package.Rd @@ -20,7 +20,7 @@ A number of interesting packages are available to perform \tabular{ll}{ Package: \tab CAinterprTools\cr Type: \tab Package\cr -Version: \tab 0.19\cr +Version: \tab 0.20\cr Date: \tab 2018-02\cr License: \tab GPL\cr } diff --git a/man/cols.cntr.Rd b/man/cols.cntr.Rd index 720ad9a..7a39255 100644 --- a/man/cols.cntr.Rd +++ b/man/cols.cntr.Rd @@ -35,7 +35,8 @@ This function allows to calculate the contribution of the column categories to t \details{ The function displays the contribution of the 
categories as a dotplot. A reference line indicates the threshold above which a contribution can be considered important for the determination of the selected dimension. The parameter sort=TRUE sorts the categories in descending order of contribution to the inertia of the selected dimension. -At the left-hand side of the plot, the categories' labels are given a symbol (+ or -) according to wheather each category is actually contributing to the definition of the positive or negative side of the dimension, respectively. +At the left-hand side of the plot, the categories' labels are given a symbol (+ or -) according to whether each category is actually contributing to the definition of the positive or negative side of the dimension, respectively. +The categories are grouped into two groups: 'major' and 'minor' contributors to the inertia of the selected dimension. At the right-hand side, a legend (which is enabled/disabled using the 'leg' parameter) reports the correlation (sqrt(COS2)) of the row categories with the selected dimension. A symbol (+ or -) indicates with which side of the selected dimension each row category is correlated. } \examples{ diff --git a/man/cols.corr.Rd b/man/cols.corr.Rd index 2c4de3b..90c5c4b 100644 --- a/man/cols.corr.Rd +++ b/man/cols.corr.Rd @@ -34,6 +34,7 @@ This function allows to calculate the correlation (sqrt(COS2)) of the column cat The function displays the correlation of the column categories with the selected dimension; the parameter sort=TRUE arrange the categories in decreasing order of correlation. At the left-hand side, the categories' labels show a symbol (+ or -) according to which side of the selected dimension they are correlated, either positive or negative. +The categories are grouped into two groups: categories correlated with the positive ('pole +') or negative ('pole -') pole of the selected dimension. 
At the right-hand side, a legend (which is enabled/disabled using the 'leg' parameter) indicates the row categories' contribution (in permills) to the selected dimension (value enclosed within round brackets), and a symbol (+ or -) indicating whether they are actually contributing to the definition of the positive or negative side of the dimension, respectively. Further, an asterisk (*) flags the categories which can be considered major contributors to the definition of the dimension. } diff --git a/man/groupBycoord.Rd b/man/groupBycoord.Rd index 3e75cf7..52653e1 100644 --- a/man/groupBycoord.Rd +++ b/man/groupBycoord.Rd @@ -11,7 +11,7 @@ groupBycoord(data, x = 1, k = 3, which = "rows", cex.labls = 0.75) \item{x:}{dimension whose coordinates are used to build the partitions.} -\item{which:}{speficy if rows ("rows") or columns ("cols") must be grouped.} +\item{which:}{specify if rows ("rows"; default) or columns ("cols") must be grouped.} \item{cex.labls:}{set the size of the labels of the dotchart (0.75 by default).} } diff --git a/man/malinvaud.Rd b/man/malinvaud.Rd index db15547..55e2209 100644 --- a/man/malinvaud.Rd +++ b/man/malinvaud.Rd @@ -10,8 +10,10 @@ malinvaud(data) \item{data:}{name of the datset (must be in dataframe format).} } \description{ -This function allows you to perform the Malinvaud's test, which assesses the significance of the CA dimensions. -The function returns both a table in the R console and a plot. The former lists some values, among which the significance of each CA dimension. Dimensions whose p-value is below the 0.05 threshold (displayed in RED in the returned plot) are significant. +This function allows you to perform the Malinvaud's test, which assesses the significance of the CA dimensions. + +The function returns both a table in the R console and a plot. The former lists relevant information, among which the significance of each CA dimension. 
+The dotchart graphically represents the p-value of each dimension; dimensions are grouped by level of significance; a red reference line indicates the 0.05 threshold. } \examples{ data(greenacre_data) diff --git a/man/rescale.Rd b/man/rescale.Rd new file mode 100644 index 0000000..8e863c6 --- /dev/null +++ b/man/rescale.Rd @@ -0,0 +1,32 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/rescale.R +\name{rescale} +\alias{rescale} +\title{Rescaling row/column categories coordinates between a minimum and maximum value} +\usage{ +rescale(data, x = 1, which = "rows", min.v = 0, max.v = 100) +} +\arguments{ +\item{data:}{name of the dataset (must be in dataframe format).} + +\item{x:}{dimension whose coordinates are rescaled (1st dimension by default).} + +\item{which:}{specify if the coordinates of the rows ("rows", default) or of the columns ("cols") must be rescaled.} + +\item{min.v:}{minimum value of the new scale (0 by default).} + +\item{max.v:}{maximum value of the new scale (100 by default).} +} +\description{ +This function allows to rescale the coordinates of a selected dimension to be constrained between a minimum and a maximum user-defined value. +} +\details{ +The rationale of the function is that users may wish to use the coordinates on a given dimension to devise a scale, along the lines of what is accomplished in:\cr +Greenacre M 2002, "The Use of Correspondence Analysis in the Exploration of Health Survey Data", Documentos de Trabajo 5, Fundacion BBVA, pp. 7-39\cr +The function returns a chart representing the row/column categories against the rescaled coordinates from the selected dimension. A dataframe is also returned containing the original values (i.e., the coordinates) and the corresponding rescaled values. 
+} +\examples{ +data(greenacre_data) +res <- rescale(greenacre_data, which="rows", min.v=0, max.v=10) + +} diff --git a/man/rows.cntr.Rd b/man/rows.cntr.Rd index 533fa37..4294cf0 100644 --- a/man/rows.cntr.Rd +++ b/man/rows.cntr.Rd @@ -36,6 +36,7 @@ This function allows to calculate the contribution of the row categories to the The function displays the contribution of the categories as a dotplot. A reference line indicates the threshold above which a contribution can be considered important for the determination of the selected dimension. The parameter sort=TRUE sorts the categories in descending order of contribution to the inertia of the selected dimension. At the left-hand side of the plot, the categories' labels are given a symbol (+ or -) according to wheather each category is actually contributing to the definition of the positive or negative side of the dimension, respectively. +The categories are grouped into two groups: 'major' and 'minor' contributors to the inertia of the selected dimension. At the right-hand side, a legend (which is enabled/disabled using the 'leg' parameter) reports the correlation (sqrt(COS2)) of the column categories with the selected dimension. A symbol (+ or -) indicates with which side of the selected dimension each column category is correlated. } \examples{ diff --git a/man/rows.corr.Rd b/man/rows.corr.Rd index 27af699..2fa85ad 100644 --- a/man/rows.corr.Rd +++ b/man/rows.corr.Rd @@ -34,6 +34,7 @@ This function allows to calculate the correlation (sqrt(COS2)) of the row catego The function displays the correlation of the row categories with the selected dimension; the parameter sort=TRUE arrange the categories in decreasing order of correlation. At the left-hand side, the categories' labels show a symbol (+ or -) according to which side of the selected dimension they are correlated, either positive or negative. 
+The categories are grouped into two groups: categories correlated with the positive ('pole +') or negative ('pole -') pole of the selected dimension. At the right-hand side, a legend (which is enabled/disabled using the 'leg' parameter) indicates the column categories' contribution (in permills) to the selected dimension (value enclosed within round brackets), and a symbol (+ or -) indicating whether they are actually contributing to the definition of the positive or negative side of the dimension, respectively. Further, an asterisk (*) flags the categories which can be considered major contributors to the definition of the dimension. }
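
Reviewer note (not part of the patch): the arithmetic performed by `malinvaud()` and `rescale()` in the hunks above can be written out as follows. This is only a reading of the code in this diff, not a statement about the published method. The symbols $N$ (grand total of the table), $I$ and $J$ (number of rows and columns), $\lambda_i$ (eigenvalue of the i-th dimension) and $c_j$ (coordinate of category j on the selected dimension) are notation assumed here for illustration; `min.v` and `max.v` are the function's own parameters.

```latex
% malinvaud(): one test per row of the output table, k = 0, ..., a-1,
% where a = min(I - 1, J - 1) is the number of dimensions.
\[
  \chi^2_k = N \sum_{i=k+1}^{a} \lambda_i ,
  \qquad
  \mathrm{df}_k = (I - k - 1)(J - k - 1),
  \qquad
  p_k = \Pr\!\left(\chi^2_{\mathrm{df}_k} \ge \chi^2_k\right).
\]

% rescale(): min-max rescaling of the coordinates c_j of the selected dimension.
\[
  c'_j = \frac{\bigl(c_j - \min_j c_j\bigr)\,(\mathtt{max.v} - \mathtt{min.v})}
              {\max_j c_j - \min_j c_j} + \mathtt{min.v}
\]
```

As a quick check of the second formula, hypothetical coordinates $-0.5$, $0$ and $1.5$ with `min.v=0` and `max.v=10` rescale to $0$, $2.5$ and $10$: the smallest coordinate maps to `min.v`, the largest to `max.v`, and relative spacing is preserved.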