v0.20

gianmarcoalberti · Feb 4, 2018 · 2ef160a
1 parent cd5ce86
commit 2ef160a
Showing 18 changed files with 139 additions and 40 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,7 +1,7 @@
 Package: CAinterprTools
 Title: Package for graphical aid in Correspondence Analysis interpretation and
     significance testing
-Version: 0.19
+Version: 0.20
 Authors@R: "Gianmarco ALberti <gianmarcoalberti@tin.it> <gianmarcoalberti@gmail.com>[aut, cre]"
 Description: A number of interesting packages are available to perform
     Correspondence Analysis in R. At the best of my knowledge, they lack

diff --git a/NAMESPACE b/NAMESPACE
@@ -14,6 +14,7 @@ export(cols.corr.scatter)
 export(cols.qlt)
 export(groupBycoord)
 export(malinvaud)
+export(rescale)
 export(rows.cntr)
 export(rows.cntr.scatter)
 export(rows.corr)

diff --git a/R/cols_cntr.R b/R/cols_cntr.R
@@ -4,7 +4,8 @@
 #' 
 #' The function displays the contribution of the categories as a dotplot. A reference line indicates the threshold above which a contribution can be considered important for the determination of the selected dimension. 
 #' The parameter sort=TRUE sorts the categories in descending order of contribution to the inertia of the selected dimension. 
-#' At the left-hand side of the plot, the categories' labels are given a symbol (+ or -) according to wheather each category is actually contributing to the definition of the positive or negative side of the dimension, respectively. 
+#' At the left-hand side of the plot, the categories' labels are given a symbol (+ or -) according to whether each category is actually contributing to the definition of the positive or negative side of the dimension, respectively.
+#' The categories are grouped into two groups: 'major' and 'minor' contributors to the inertia of the selected dimension. 
 #' At the right-hand side, a legend (which is enabled/disabled using the 'leg' parameter) reports the correlation (sqrt(COS2)) of the row categories with the selected dimension. A symbol (+ or -) indicates with which side of the selected dimension each row category is correlated.
 #' @param data: name of the dataset (must be in dataframe format).
 #' @param x: dimension for which the column categories contribution is returned (1st dimension by default).

diff --git a/R/cols_corr.R b/R/cols_corr.R
@@ -4,6 +4,7 @@
 #'  
 #' The function displays the correlation of the column categories with the selected dimension; the parameter sort=TRUE arrange the categories in decreasing order of correlation. 
 #' At the left-hand side, the categories' labels show a symbol (+ or -) according to which side of the selected dimension they are correlated, either positive or negative. 
+#' The categories are grouped into two groups: categories correlated with the positive ('pole +') or negative ('pole -') pole of the selected dimension.
 #' At the right-hand side, a legend (which is enabled/disabled using the 'leg' parameter) indicates the row categories' contribution (in permills) to the selected dimension (value enclosed within round brackets), and a symbol (+ or -) indicating whether they are actually contributing to the definition of the positive or negative side of the dimension, respectively. 
 #' Further, an asterisk (*) flags the categories which can be considered major contributors to the definition of the dimension.
 #' @param data: name of the dataset (must be in dataframe format).

diff --git a/R/groupBycoord.R b/R/groupBycoord.R
@@ -7,7 +7,7 @@
 #' The function also returns a dataframe storing the categories' coordinates on the selected dimension and the group each category belongs to.
 #' @param data: name of the dataset (must be in dataframe format).
 #' @param x: dimension whose coordinates are used to build the partitions.
-#' @param which: speficy if rows ("rows") or columns ("cols") must be grouped.
+#' @param which: specify whether rows ("rows"; default) or columns ("cols") must be grouped.
 #' @param cex.labls: set the size of the labels of the dotchart (0.75 by default).
 #' @keywords columns contribution
 #' @export

diff --git a/R/malinvaud.R b/R/malinvaud.R
@@ -1,7 +1,9 @@
 #' Malinvaud's test for significance of the CA dimensions
 #'
-#' This function allows you to perform the Malinvaud's test, which assesses the significance of the CA dimensions. 
-#' The function returns both a table in the R console and a plot. The former lists some values, among which the significance of each CA dimension. Dimensions whose p-value is below the 0.05 threshold (displayed in RED in the returned plot) are significant. 
+#' This function performs Malinvaud's test, which assesses the significance of the CA dimensions.
+#'  
+#' The function returns both a table in the R console and a plot. The table lists relevant information, including the significance of each CA dimension. 
+#' The dotchart graphically represents the p-value of each dimension; dimensions are grouped by level of significance, and a red reference line indicates the 0.05 threshold.
 #' @param data: name of the datset (must be in dataframe format).
 #' @keywords Malinvaud test
 #' @export
@@ -10,35 +12,44 @@
 #' malinvaud(greenacre_data) #perform the Malinvaud test using the 'greenacre_data' dataset
 #' res <- malinvaud(greenacre_data) #perform the Malinvaud test using the 'greenacre_data' dataset and store the output table in a object named 'res'
 #' 
-malinvaud <- function(data) {
+malinvaud <- function (data) {
   grandtotal <- sum(data)
   nrows <- nrow(data)
   ncols <- ncol(data)
-  numb.dim.cols<-ncol(data)-1
-  numb.dim.rows<-nrow(data)-1
-  a <- min(numb.dim.cols, numb.dim.rows) #dimensionality of the table
-  labs<-c(1:a) #set the number that will be used as x-axis' labels on the scatterplots
-  res.ca<-CA(data, ncp=a, graph=FALSE)
-
+  numb.dim.cols <- ncol(data) - 1
+  numb.dim.rows <- nrow(data) - 1
+  a <- min(numb.dim.cols, numb.dim.rows)
+  labs <- c(1:a)
+  res.ca <- CA(data, ncp = a, graph = FALSE)
   malinv.test.rows <- a
-  malinv.test.cols <- 6
-  malinvt.output <-as.data.frame(matrix(ncol= malinv.test.cols, nrow=malinv.test.rows))
-  colnames(malinvt.output) <- c("K", "Dimension", "Eigenvalue", "Chi-square", "df", "p-value")
-
-  malinvt.output[,1] <- c(0:(a-1))
-  malinvt.output[,2] <- c(1:a)
-
-  for(i in 1:malinv.test.rows){
-    k <- -1+i
-    malinvt.output[i,3] <- res.ca$eig[i,1]
-    malinvt.output[i,5] <- (nrows-k-1)*(ncols-k-1)
+  malinv.test.cols <- 7
+  malinvt.output <- as.data.frame(matrix(ncol = malinv.test.cols, nrow = malinv.test.rows))
+  colnames(malinvt.output) <- c("K", "Dimension", "Eigenvalue", "Chi-square", "df", "p-value", "p-class")
+  malinvt.output[,1] <- c(0:(a - 1))
+  malinvt.output[,2] <- paste0("dim. ",c(1:a))
+  for (i in 1:malinv.test.rows) {
+    k <- -1 + i
+    malinvt.output[i,3] <- res.ca$eig[i, 1]
+    malinvt.output[i,5] <- (nrows - k - 1) * (ncols - k - 1)
   }
-
-  malinvt.output[,4] <- rev(cumsum(rev(malinvt.output[,3])))*grandtotal
-  pvalue <- round(pchisq(malinvt.output[,4], malinvt.output[,5], lower.tail=FALSE), digits=6)
-  malinvt.output[,6] <- ifelse(pvalue < 0.001, "< 0.001", ifelse(pvalue < 0.01, "< 0.01", ifelse(pvalue < 0.05, "< 0.05", round(pvalue, 3))))
-
-  dotchart2(pvalue, labels=malinvt.output[,2], sort=FALSE,lty=2, xlim=c(0,1), xlab=paste("p-value after Malinvaud's test"), ylab="Dimensions")
-  abline(v=0.05, lty=2, col="RED")
+  malinvt.output[,4] <- rev(cumsum(rev(malinvt.output[, 3]))) * grandtotal
+  pvalue <- pchisq(malinvt.output[,4], malinvt.output[,5], lower.tail = FALSE)
+  malinvt.output[,6] <- pvalue
+  malinvt.output[,7] <- ifelse(pvalue < 0.001, "p < 0.001", 
+                               ifelse(pvalue < 0.01, "p < 0.01", 
+                                      ifelse(pvalue < 0.05, "p < 0.05", 
+                                             "p > 0.05")))
+  dotchart2(pvalue,
+            labels = malinvt.output[,2],
+            groups = malinvt.output[,7],
+            sort = FALSE,
+            lty = 2,
+            xlim = c(0, 1),
+            main = "Malinvaud's test for the significance of CA dimensions",
+            xlab = "p-value",
+            ylab = "Dimensions",
+            cex.main = 0.9,
+            cex.labels = 0.75)
+  abline(v = 0.05, lty = 2, col = "RED")
   return(malinvt.output)
 }
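The computation in the rewritten `malinvaud()` can be illustrated without FactoMineR. The following is a minimal sketch, not the package's code: the helper name `malinvaud_sketch` and the `toy` table are hypothetical, and the eigenvalues are obtained from a base-R SVD of the standardised residuals rather than from `CA()`. It follows the same steps as the diff: for K = 0, 1, ..., the statistic is the grand total times the sum of the eigenvalues of the dimensions after the first K, tested against a chi-square distribution with (rows - K - 1) * (cols - K - 1) degrees of freedom.

```r
# Hypothetical helper: Malinvaud's test using base R only.
malinvaud_sketch <- function(tab) {
  tab <- as.matrix(tab)
  n <- sum(tab)                       # grand total
  P <- tab / n                        # correspondence matrix
  r <- rowSums(P)                     # row masses
  cm <- colSums(P)                    # column masses
  # Standardised residuals; their squared singular values are the
  # principal inertias (eigenvalues) of the CA solution.
  S <- (P - outer(r, cm)) / sqrt(outer(r, cm))
  a <- min(nrow(tab), ncol(tab)) - 1  # dimensionality of the table
  eig <- svd(S)$d[seq_len(a)]^2
  k <- 0:(a - 1)
  # Inertia left after removing the first K dimensions, times the grand total
  chisq <- rev(cumsum(rev(eig))) * n
  df <- (nrow(tab) - k - 1) * (ncol(tab) - k - 1)
  p <- pchisq(chisq, df, lower.tail = FALSE)
  data.frame(K = k,
             Dimension = paste0("dim. ", seq_len(a)),
             Eigenvalue = eig,
             Chi.square = chisq,
             df = df,
             p.value = p)
}

# Made-up 3 x 3 contingency table, for illustration only
toy <- matrix(c(20, 10, 5,
                 8, 15, 12,
                 4,  9, 25), nrow = 3)
res <- malinvaud_sketch(toy)
print(res)
```

For K = 0 the statistic coincides with the ordinary Pearson chi-square of the table, which gives a quick sanity check against `chisq.test(toy)`.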
diff --git a/R/rescale.R b/R/rescale.R
@@ -0,0 +1,36 @@
+#' Rescaling row/column categories coordinates between a minimum and maximum value
+#'
+#' This function rescales the coordinates of a selected dimension so that they lie between a user-defined minimum and maximum value.
+#' 
+#' The rationale of the function is that users may wish to use the coordinates on a given dimension to devise a scale, along the lines of what is accomplished in:\cr
+#' Greenacre M 2002, "The Use of Correspondence Analysis in the Exploration of Health Survey Data", Documentos de Trabajo 5, Fundacion BBVA, pp. 7-39\cr
+#' The function returns a chart representing the row/column categories against the rescaled coordinates from the selected dimension. A dataframe is also returned containing the original values (i.e., the coordinates) and the corresponding rescaled values.
+#' @param data: name of the dataset (must be in dataframe format).
+#' @param x: dimension whose coordinates are rescaled (1st dimension by default).
+#' @param which: specify whether the row ("rows", default) or column ("cols") coordinates must be rescaled.
+#' @param min.v: minimum value of the new scale (0 by default).
+#' @param max.v: maximum value of the new scale (100 by default).
+#' @keywords
+#' @export
+#' @examples
+#' data(greenacre_data)
+#' res <- rescale(greenacre_data, which="rows", min.v=0, max.v=10)
+#' 
+rescale <- function (data, x=1, which="rows", min.v=0, max.v=100) {
+  res <- CA(data, graph=FALSE)
+  # select the row or column coordinates with a plain if/else (the test is a scalar)
+  coords <- if (which == "rows") res$row$coord else res$col$coord
+  coord.x <- coords[, x]
+  resc.v <- ((coord.x-min(coord.x))*(max.v-min.v)/(max(coord.x)-min(coord.x)))+min.v
+  df <- data.frame(category=rownames(as.data.frame(coord.x)), orignal.v=coord.x, rescaled.v=resc.v)
+  plot(sort(df$rescaled.v), 
+       xaxt="n", 
+       xlab="categories", 
+       ylab=paste0(x, " Dim. rescaled coordinates"), 
+       pch=20, 
+       type="b",
+       main=paste0("Plot of ", ifelse(which=="rows", "row", "column"), " categories against ", x, " Dim. coordinates rescaled between ", min.v, " and ", max.v),
+       cex.main=0.95)
+  axis(1, at=1:nrow(df), labels=df$category[order(df$rescaled.v)])
+  return(subset(df, , -c(category)))
+}
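The rescaling step in `rescale()` is a plain min-max transformation. The sketch below isolates it from the CA call; the helper name `rescale_minmax` and the coordinate values are hypothetical, chosen only to show the arithmetic. Using a named vector means that `sort()` keeps each category name attached to its rescaled value, so labels and values stay paired when plotted in ascending order.

```r
# Hypothetical helper: min-max rescaling of a numeric vector
# onto the interval [min.v, max.v].
rescale_minmax <- function(v, min.v = 0, max.v = 100) {
  (v - min(v)) * (max.v - min.v) / (max(v) - min(v)) + min.v
}

# Made-up coordinates on one dimension, named by category
coords <- c(catA = 0.3, catB = -0.8, catC = 1.2, catD = -0.1)

rescaled <- rescale_minmax(coords, min.v = 0, max.v = 10)

# sort() on a named vector carries the names along
sorted <- sort(rescaled)
print(sorted)
```

The smallest coordinate maps to `min.v` and the largest to `max.v`; the transformation is undefined when all coordinates are equal, since the denominator is then zero.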
diff --git a/R/rows_cntr.R b/R/rows_cntr.R
@@ -5,6 +5,7 @@
 #' The function displays the contribution of the categories as a dotplot. A reference line indicates the threshold above which a contribution can be considered important for the determination of the selected dimension.
 #' The parameter sort=TRUE sorts the categories in descending order of contribution to the inertia of the selected dimension. 
 #' At the left-hand side of the plot, the categories' labels are given a symbol (+ or -) according to wheather each category is actually contributing to the definition of the positive or negative side of the dimension, respectively. 
+#' The categories are grouped into two groups: 'major' and 'minor' contributors to the inertia of the selected dimension.
 #' At the right-hand side, a legend (which is enabled/disabled using the 'leg' parameter) reports the correlation (sqrt(COS2)) of the column categories with the selected dimension. A symbol (+ or -) indicates with which side of the selected dimension each column category is correlated.
 #' @param data: name of the dataset (must be in dataframe format).
 #' @param x: dimension for which the row categories contribution is returned (1st dimension by default).

diff --git a/R/rows_corr.R b/R/rows_corr.R
@@ -4,6 +4,7 @@
 #'  
 #' The function displays the correlation of the row categories with the selected dimension; the parameter sort=TRUE arrange the categories in decreasing order of correlation. 
 #' At the left-hand side, the categories' labels show a symbol (+ or -) according to which side of the selected dimension they are correlated, either positive or negative. 
+#' The categories are grouped into two groups: categories correlated with the positive ('pole +') or negative ('pole -') pole of the selected dimension.
 #' At the right-hand side, a legend (which is enabled/disabled using the 'leg' parameter) indicates the column categories' contribution (in permills) to the selected dimension (value enclosed within round brackets), and a symbol (+ or -) indicating whether they are actually contributing to the definition of the positive or negative side of the dimension, respectively. 
 #' Further, an asterisk (*) flags the categories which can be considered major contributors to the definition of the dimension.
 #' @param data: name of the dataset (must be in dataframe format).