Commit

v0.20
gianmarcoalberti committed Feb 4, 2018
1 parent cd5ce86 commit 2ef160a
Showing 18 changed files with 139 additions and 40 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,7 +1,7 @@
Package: CAinterprTools
Title: Package for graphical aid in Correspondence Analysis interpretation and
significance testing
Version: 0.19
Version: 0.20
Authors@R: "Gianmarco ALberti <gianmarcoalberti@tin.it> <gianmarcoalberti@gmail.com>[aut, cre]"
Description: A number of interesting packages are available to perform
Correspondence Analysis in R. To the best of my knowledge, they lack
1 change: 1 addition & 0 deletions NAMESPACE
@@ -14,6 +14,7 @@ export(cols.corr.scatter)
export(cols.qlt)
export(groupBycoord)
export(malinvaud)
export(rescale)
export(rows.cntr)
export(rows.cntr.scatter)
export(rows.corr)
3 changes: 2 additions & 1 deletion R/cols_cntr.R
@@ -4,7 +4,8 @@
#'
#' The function displays the contribution of the categories as a dotplot. A reference line indicates the threshold above which a contribution can be considered important for the determination of the selected dimension.
#' The parameter sort=TRUE sorts the categories in descending order of contribution to the inertia of the selected dimension.
#' At the left-hand side of the plot, the categories' labels are given a symbol (+ or -) according to whether each category is actually contributing to the definition of the positive or negative side of the dimension, respectively.
#' At the left-hand side of the plot, the categories' labels are given a symbol (+ or -) according to whether each category is actually contributing to the definition of the positive or negative side of the dimension, respectively.
#' The categories are grouped into two groups: 'major' and 'minor' contributors to the inertia of the selected dimension.
#' At the right-hand side, a legend (which is enabled/disabled using the 'leg' parameter) reports the correlation (sqrt(COS2)) of the row categories with the selected dimension. A symbol (+ or -) indicates with which side of the selected dimension each row category is correlated.
#' @param data: name of the dataset (must be in dataframe format).
#' @param x: dimension for which the column categories contribution is returned (1st dimension by default).
1 change: 1 addition & 0 deletions R/cols_corr.R
@@ -4,6 +4,7 @@
#'
#' The function displays the correlation of the column categories with the selected dimension; the parameter sort=TRUE arranges the categories in decreasing order of correlation.
#' At the left-hand side, the categories' labels show a symbol (+ or -) according to which side of the selected dimension they are correlated, either positive or negative.
#' The categories are grouped into two groups: categories correlated with the positive ('pole +') or negative ('pole -') pole of the selected dimension.
#' At the right-hand side, a legend (which is enabled/disabled using the 'leg' parameter) indicates the row categories' contribution (in permills) to the selected dimension (value enclosed within round brackets), and a symbol (+ or -) indicating whether they are actually contributing to the definition of the positive or negative side of the dimension, respectively.
#' Further, an asterisk (*) flags the categories which can be considered major contributors to the definition of the dimension.
#' @param data: name of the dataset (must be in dataframe format).
2 changes: 1 addition & 1 deletion R/groupBycoord.R
@@ -7,7 +7,7 @@
#' The function also returns a dataframe storing the categories' coordinates on the selected dimension and the group each category belongs to.
#' @param data: name of the dataset (must be in dataframe format).
#' @param x: dimension whose coordinates are used to build the partitions.
#' @param which: specify if rows ("rows") or columns ("cols") must be grouped.
#' @param which: specify if rows ("rows"; default) or columns ("cols") must be grouped.
#' @param cex.labls: set the size of the labels of the dotchart (0.75 by default).
#' @keywords columns contribution
#' @export
65 changes: 38 additions & 27 deletions R/malinvaud.R
@@ -1,7 +1,9 @@
#' Malinvaud's test for significance of the CA dimensions
#'
#' This function allows you to perform Malinvaud's test, which assesses the significance of the CA dimensions.
#' The function returns both a table in the R console and a plot. The former lists some values, among which the significance of each CA dimension. Dimensions whose p-value is below the 0.05 threshold (displayed in RED in the returned plot) are significant.
#' This function allows you to perform Malinvaud's test, which assesses the significance of the CA dimensions.
#'
#' The function returns both a table in the R console and a plot. The former lists relevant information, including the significance of each CA dimension.
#' The dotchart graphically represents the p-value of each dimension; dimensions are grouped by level of significance; a red reference line indicates the 0.05 threshold.
#' @param data: name of the dataset (must be in dataframe format).
#' @keywords Malinvaud test
#' @export
@@ -10,35 +12,44 @@
#' malinvaud(greenacre_data) #perform the Malinvaud test using the 'greenacre_data' dataset
#' res <- malinvaud(greenacre_data) #perform the Malinvaud test using the 'greenacre_data' dataset and store the output table in an object named 'res'
#'
malinvaud <- function(data) {
malinvaud <- function (data) {
grandtotal <- sum(data)
nrows <- nrow(data)
ncols <- ncol(data)
numb.dim.cols<-ncol(data)-1
numb.dim.rows<-nrow(data)-1
a <- min(numb.dim.cols, numb.dim.rows) #dimensionality of the table
labs<-c(1:a) #set the number that will be used as x-axis' labels on the scatterplots
res.ca<-CA(data, ncp=a, graph=FALSE)

numb.dim.cols <- ncol(data) - 1
numb.dim.rows <- nrow(data) - 1
a <- min(numb.dim.cols, numb.dim.rows)
labs <- c(1:a)
res.ca <- CA(data, ncp = a, graph = FALSE)
malinv.test.rows <- a
malinv.test.cols <- 6
malinvt.output <-as.data.frame(matrix(ncol= malinv.test.cols, nrow=malinv.test.rows))
colnames(malinvt.output) <- c("K", "Dimension", "Eigenvalue", "Chi-square", "df", "p-value")

malinvt.output[,1] <- c(0:(a-1))
malinvt.output[,2] <- c(1:a)

for(i in 1:malinv.test.rows){
k <- -1+i
malinvt.output[i,3] <- res.ca$eig[i,1]
malinvt.output[i,5] <- (nrows-k-1)*(ncols-k-1)
malinv.test.cols <- 7
malinvt.output <- as.data.frame(matrix(ncol = malinv.test.cols, nrow = malinv.test.rows))
colnames(malinvt.output) <- c("K", "Dimension", "Eigenvalue", "Chi-square", "df", "p-value", "p-class")
malinvt.output[,1] <- c(0:(a - 1))
malinvt.output[,2] <- paste0("dim. ",c(1:a))
for (i in 1:malinv.test.rows) {
k <- -1 + i
malinvt.output[i,3] <- res.ca$eig[i, 1]
malinvt.output[i,5] <- (nrows - k - 1) * (ncols - k - 1)
}

malinvt.output[,4] <- rev(cumsum(rev(malinvt.output[,3])))*grandtotal
pvalue <- round(pchisq(malinvt.output[,4], malinvt.output[,5], lower.tail=FALSE), digits=6)
malinvt.output[,6] <- ifelse(pvalue < 0.001, "< 0.001", ifelse(pvalue < 0.01, "< 0.01", ifelse(pvalue < 0.05, "< 0.05", round(pvalue, 3))))

dotchart2(pvalue, labels=malinvt.output[,2], sort=FALSE,lty=2, xlim=c(0,1), xlab=paste("p-value after Malinvaud's test"), ylab="Dimensions")
abline(v=0.05, lty=2, col="RED")
malinvt.output[,4] <- rev(cumsum(rev(malinvt.output[, 3]))) * grandtotal
pvalue <- pchisq(malinvt.output[,4], malinvt.output[,5], lower.tail = FALSE)
malinvt.output[,6] <- pvalue
malinvt.output[,7] <- ifelse(pvalue < 0.001, "p < 0.001",
ifelse(pvalue < 0.01, "p < 0.01",
ifelse(pvalue < 0.05, "p < 0.05",
"p > 0.05")))
dotchart2(pvalue,
labels = malinvt.output[,2],
groups=malinvt.output[,7],
sort = FALSE,
lty = 2,
xlim = c(0, 1),
main="Malinvaud's test for the significance of CA dimensions",
xlab = paste("p-value"),
ylab = "Dimensions",
cex.main=0.9,
cex.labels=0.75)
abline(v = 0.05, lty = 2, col = "RED")
return(malinvt.output)
}
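
For readers who want to check the arithmetic behind the table returned by malinvaud(), the following is a minimal base-R sketch of the same statistic. It is not the package's code: the helper name `malinvaud_sketch` and the `toy` table are hypothetical, and the eigenvalues are obtained from the singular value decomposition of the standardized residuals rather than from `CA()`. For K retained dimensions, the statistic is the grand total multiplied by the sum of the remaining eigenvalues, with (nrows - K - 1) * (ncols - K - 1) degrees of freedom.

```r
# Base-R sketch of Malinvaud's test statistic (hypothetical helper, not the package function).
malinvaud_sketch <- function(tab) {
  tab <- as.matrix(tab)
  n <- sum(tab)                      # grand total of the contingency table
  P <- tab / n                       # correspondence matrix
  r <- rowSums(P)                    # row masses
  cm <- colSums(P)                   # column masses
  # standardized residuals; their squared singular values are the CA eigenvalues
  S <- (P - outer(r, cm)) / sqrt(outer(r, cm))
  a <- min(nrow(tab), ncol(tab)) - 1 # dimensionality of the table
  eig <- svd(S)$d[seq_len(a)]^2
  k <- 0:(a - 1)                     # number of dimensions already retained
  # for each K: grand total times the sum of the eigenvalues of the remaining dimensions
  chisq <- rev(cumsum(rev(eig))) * n
  df <- (nrow(tab) - k - 1) * (ncol(tab) - k - 1)
  data.frame(K = k,
             Dimension = paste0("dim. ", seq_len(a)),
             Eigenvalue = eig,
             Chi.square = chisq,
             df = df,
             p.value = pchisq(chisq, df, lower.tail = FALSE))
}

# made-up 3x3 table, used only for illustration
toy <- matrix(c(20, 5, 3,
                4, 18, 6,
                2, 7, 25), nrow = 3, byrow = TRUE)
out <- malinvaud_sketch(toy)
print(out)
```

With K = 0 no dimension has been retained, so the first row of the statistic coincides with the ordinary Pearson chi-square of the table, which gives a quick sanity check.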
36 changes: 36 additions & 0 deletions R/rescale.R
@@ -0,0 +1,36 @@
#' Rescaling row/column categories coordinates between a minimum and maximum value
#'
#' This function rescales the coordinates of a selected dimension so that they are constrained between a user-defined minimum and maximum value.
#'
#' The rationale of the function is that users may wish to use the coordinates on a given dimension to devise a scale, along the lines of what is accomplished in:\cr
#' Greenacre M 2002, "The Use of Correspondence Analysis in the Exploration of Health Survey Data", Documentos de Trabajo 5, Fundacion BBVA, pp. 7-39\cr
#' The function returns a chart representing the row/column categories against the rescaled coordinates from the selected dimension. A dataframe is also returned containing the original values (i.e., the coordinates) and the corresponding rescaled values.
#' @param data: name of the dataset (must be in dataframe format).
#' @param x: dimension whose coordinates are rescaled (1st dimension by default).
#' @param which: specify if the coordinates of the rows ("rows", default) or of the columns ("cols") must be rescaled.
#' @param min.v: minimum value of the new scale (0 by default).
#' @param max.v: maximum value of the new scale (100 by default).
#' @keywords
#' @export
#' @examples
#' data(greenacre_data)
#' res <- rescale(greenacre_data, which="rows", min.v=0, max.v=10)
#'
rescale <- function (data, x=1, which="rows", min.v=0, max.v=100) {
res <- CA(data, graph=FALSE)
# select the row or column coordinates on the chosen dimension
coord.x <- if (which=="rows") res$row$coord[,x] else res$col$coord[,x]
# min-max rescaling of the coordinates between min.v and max.v
resc.v <- ((coord.x-min(coord.x))*(max.v-min.v)/(max(coord.x)-min(coord.x)))+min.v
df <- data.frame(category=rownames(as.data.frame(coord.x)), orignal.v=coord.x, rescaled.v=resc.v)
# sort the whole dataframe so that the axis labels match the plotted (sorted) values
df.sorted <- df[order(df$rescaled.v), ]
plot(df.sorted$rescaled.v,
xaxt="n",
xlab="categories",
ylab=paste0(x, " Dim. rescaled coordinates"),
pch=20,
type="b",
main=paste0("Plot of ", ifelse(which=="rows", "row", "column"), " categories against ", x, " Dim. coordinates rescaled between ", min.v, " and ", max.v),
cex.main=0.95)
axis(1, at=1:nrow(df.sorted), labels=df.sorted$category)
return(subset(df, , -c(category)))
}
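
The rescaling step in rescale() is a plain min-max transformation. As a small illustration, detached from CA(), the sketch below applies the same formula to a vector of made-up coordinates; the helper name `rescale_minmax` and the values in `coords` are hypothetical and not part of the package.

```r
# Min-max rescaling of a numeric vector between min.v and max.v
# (hypothetical helper mirroring the formula used in rescale()).
rescale_minmax <- function(v, min.v = 0, max.v = 100) {
  (v - min(v)) * (max.v - min.v) / (max(v) - min(v)) + min.v
}

# made-up coordinates on one dimension, used only for illustration
coords <- c(a = -0.42, b = 0.10, c = 0.35, d = -0.05)
rescaled <- rescale_minmax(coords, min.v = 0, max.v = 10)
print(round(rescaled, 2))
```

The smallest coordinate maps to min.v, the largest to max.v, and the relative spacing between categories is preserved, which is what makes the result usable as a scale.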
1 change: 1 addition & 0 deletions R/rows_cntr.R
@@ -5,6 +5,7 @@
#' The function displays the contribution of the categories as a dotplot. A reference line indicates the threshold above which a contribution can be considered important for the determination of the selected dimension.
#' The parameter sort=TRUE sorts the categories in descending order of contribution to the inertia of the selected dimension.
#' At the left-hand side of the plot, the categories' labels are given a symbol (+ or -) according to whether each category is actually contributing to the definition of the positive or negative side of the dimension, respectively.
#' The categories are grouped into two groups: 'major' and 'minor' contributors to the inertia of the selected dimension.
#' At the right-hand side, a legend (which is enabled/disabled using the 'leg' parameter) reports the correlation (sqrt(COS2)) of the column categories with the selected dimension. A symbol (+ or -) indicates with which side of the selected dimension each column category is correlated.
#' @param data: name of the dataset (must be in dataframe format).
#' @param x: dimension for which the row categories contribution is returned (1st dimension by default).
1 change: 1 addition & 0 deletions R/rows_corr.R
@@ -4,6 +4,7 @@
#'
#' The function displays the correlation of the row categories with the selected dimension; the parameter sort=TRUE arranges the categories in decreasing order of correlation.
#' At the left-hand side, the categories' labels show a symbol (+ or -) according to which side of the selected dimension they are correlated, either positive or negative.
#' The categories are grouped into two groups: categories correlated with the positive ('pole +') or negative ('pole -') pole of the selected dimension.
#' At the right-hand side, a legend (which is enabled/disabled using the 'leg' parameter) indicates the column categories' contribution (in permills) to the selected dimension (value enclosed within round brackets), and a symbol (+ or -) indicating whether they are actually contributing to the definition of the positive or negative side of the dimension, respectively.
#' Further, an asterisk (*) flags the categories which can be considered major contributors to the definition of the dimension.
#' @param data: name of the dataset (must be in dataframe format).