Commit

v0.16
gianmarcoalberti committed Nov 5, 2017
1 parent be52051 commit ca4286b
Showing 11 changed files with 48 additions and 25 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,7 +1,7 @@
Package: CAinterprTools
Title: Package for graphical aid in Correspondence Analysis interpretation and
significance testing
-Version: 0.15
+Version: 0.16
Authors@R: "Gianmarco ALberti <gianmarcoalberti@tin.it> <gianmarcoalberti@gmail.com>[aut, cre]"
Description: A number of interesting packages are available to perform
Correspondence Analysis in R. At the best of my knowledge, they lack
14 changes: 9 additions & 5 deletions R/cols_cntr_scatter.R
@@ -1,6 +1,6 @@
#' Scatterplot for column categories contribution to dimensions
#'
-#' This function allows to plot a scatterplot of the contribution of column categories to two selected dimensions. Two references lines (in RED) indicate the threshold above which the contribution can be considered important for the determination of the dimensions. A diagonal line (in BLACK) is a visual aid to eyeball whether a category is actually contributing more (in relative terms) to either of the two dimensions.
+#' This function allows to plot a scatterplot of the contribution of column categories to two selected dimensions. Two references lines (in RED) indicate the threshold above which the contribution can be considered important for the determination of the dimensions. A diagonal line is a visual aid to eyeball whether a category is actually contributing more (in relative terms) to either of the two dimensions.
#' The column categories' labels are coupled with + or - symbols within round brackets indicating which to side of the two selected dimensions the contribution values that can be read off from the chart are actually referring.
#' The first symbol (i.e., the one to the left), either + or -, refers to the first of the selected dimensions (i.e., the one reported on the x-axis). The second symbol (i.e., the one to the right) refers to the second of the selected dimensions (i.e., the one reported on the y-axis).
#' @param data: name of the dataset (must be in dataframe format).
@@ -31,9 +31,13 @@ cols.cntr.scatter <- function (data, x = 1, y = 2, filter=FALSE, cex.labls=3) {
limit <- max(xmax, ymax)
ifelse(filter==FALSE, dfr <- dfr, dfr <- subset(dfr, cntr1>(100/ncols)*10 | cntr2>(100/ncols)*10))
p <- ggplot(dfr, aes(x = cntr1, y = cntr2)) + geom_point(alpha = 0.8) +
-geom_hline(yintercept = round((100/ncols) * 10, digits = 0),
-colour = "red", linetype = "dashed") + geom_vline(xintercept = round((100/ncols) * 10, digits = 0), colour = "red", linetype = "dashed") +
-scale_y_continuous(limit = c(0, limit)) + scale_x_continuous(limit = c(0,limit)) + geom_abline(intercept = 0, slope = 1) + theme(panel.background = element_rect(fill="white", colour="black")) +
-geom_text_repel(data = dfr, aes(label = labels.final), size = cex.labls) + labs(x = paste("Column categories' contribution (permills) to Dim.",x), y = paste("Column categories' contribution (permills) to Dim.", y))
+geom_hline(yintercept = round((100/ncols) * 10, digits = 0), colour = "red", linetype = "dashed") +
+geom_vline(xintercept = round((100/ncols) * 10, digits = 0), colour = "red", linetype = "dashed") +
+scale_y_continuous(limit = c(0, limit)) + scale_x_continuous(limit = c(0,limit)) +
+geom_abline(intercept = 0, slope = 1, colour="#00000088") +
+theme(panel.background = element_rect(fill="white", colour="black")) +
+geom_text_repel(data = dfr, aes(label = labels.final), size = cex.labls) +
+labs(x = paste("Column categories' contribution (permills) to Dim.",x), y = paste("Column categories' contribution (permills) to Dim.", y)) +
+coord_fixed(ratio = 1, xlim = NULL, ylim = NULL, expand = TRUE)
return(p)
}
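
Note on the red reference lines in the hunk above: they sit at `round((100/ncols) * 10, digits = 0)`, which is the average contribution expressed in permills (1000 divided by the number of categories), assuming `ncols` holds the number of column categories (its definition is outside the lines shown). A minimal base-R sketch of that arithmetic follows; the helper name `cntr_threshold` is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of CAinterprTools): reference threshold,
# in permills, as computed for the red dashed lines in the diff above.
cntr_threshold <- function(n_categories) {
  stopifnot(is.numeric(n_categories), n_categories > 0)
  # 100 / n is the average contribution in percent; * 10 converts it to permills
  round((100 / n_categories) * 10, digits = 0)
}

cntr_threshold(8)  # 125
cntr_threshold(3)  # 333
```

With `filter = TRUE`, the function in the diff keeps only the categories whose contribution exceeds the unrounded value `(100/ncols)*10` on at least one of the two dimensions.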
12 changes: 9 additions & 3 deletions R/cols_corr_scatter.R
@@ -1,6 +1,6 @@
#' Scatterplot for column categories correlation with dimensions
#'
-#' This function allows to plot a scatterplot of the correlation (sqrt(COS2)) of column categories with two selected dimensions. A diagonal line (in BLACK) is a visual aid to eyeball whether a category is actually more correlated (in relative terms) to either of the two dimensions.
+#' This function allows to plot a scatterplot of the correlation (sqrt(COS2)) of column categories with two selected dimensions. A diagonal line is a visual aid to eyeball whether a category is actually more correlated (in relative terms) to either of the two dimensions.
#' The column categories' labels are coupled with two + or - symbols within round brackets indicating to which side of the two selected dimensions the correlation values that can be read off from the chart are actually referring.
#' The first symbol (i.e., the one to the left), either + or -, refers to the first of the selected dimensions (i.e., the one reported on the x-axis). The second symbol (i.e., the one to the right) refers to the second of the selected dimensions (i.e., the one reported on the y-axis).
#' @param data: Name of the dataset (must be in dataframe format).
@@ -25,7 +25,13 @@ cols.corr.scatter <- function (data, x = 1, y = 2, cex.labls=3) {
dfr$labels1 <- ifelse(dfr$coord1 < 0, "-", "+")
dfr$labels2 <- ifelse(dfr$coord2 < 0, "-", "+")
dfr$labels.final <- paste0(dfr$lab, " (",dfr$labels1,",",dfr$labels2, ")")
-p <- ggplot(dfr, aes(x = corr1, y = corr2)) + geom_point(alpha = 0.8) + scale_y_continuous(limit = c(0, 1)) + scale_x_continuous(limit = c(0,1)) + geom_abline(intercept = 0, slope = 1) + theme(panel.background = element_rect(fill="white", colour="black")) +
-geom_text_repel(data = dfr, aes(label = labels.final), size = cex.labls) + labs(x = paste("Column categories' correlation with Dim.", x), y = paste("Column categories' correlation with Dim.",y))
+p <- ggplot(dfr, aes(x = corr1, y = corr2)) +
+geom_point(alpha = 0.8) + scale_y_continuous(limit = c(0, 1)) +
+scale_x_continuous(limit = c(0,1)) +
+geom_abline(intercept = 0, slope = 1, colour="#00000088") +
+theme(panel.background = element_rect(fill="white", colour="black")) +
+geom_text_repel(data = dfr, aes(label = labels.final), size = cex.labls) +
+labs(x = paste("Column categories' correlation with Dim.", x), y = paste("Column categories' correlation with Dim.",y)) +
+coord_fixed(ratio = 1, xlim = NULL, ylim = NULL, expand = TRUE)
return(p)
}
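
Note on the sign labels built in the context lines of the hunk above: each category label is suffixed with the signs of its coordinates on the two selected dimensions, because the plotted correlation values are non-negative and do not show which side of an axis a category lies on. A small stand-alone sketch with made-up coordinates (the data frame below is illustrative, not package data):

```r
# Toy coordinates on the two selected dimensions (made-up values)
dfr <- data.frame(lab    = c("A", "B", "C"),
                  coord1 = c(-0.42, 0.10, 0.35),
                  coord2 = c(0.27, -0.05, 0.61))

# Same construction as in the diff: "-" for negative coordinates, "+" otherwise
dfr$labels1 <- ifelse(dfr$coord1 < 0, "-", "+")
dfr$labels2 <- ifelse(dfr$coord2 < 0, "-", "+")
dfr$labels.final <- paste0(dfr$lab, " (", dfr$labels1, ",", dfr$labels2, ")")

print(dfr$labels.final)
# "A (-,+)" "B (+,-)" "C (+,+)"
```

The first sign refers to the dimension on the x-axis and the second to the dimension on the y-axis, as the roxygen comments in the diff describe.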
15 changes: 10 additions & 5 deletions R/rows_cntr_scatter.R
@@ -1,6 +1,6 @@
#' Scatterplot for row categories contribution to dimensions
#'
-#' This function allows to plot a scatterplot of the contribution of row categories to two selected dimensions. Two references lines (in RED) indicate the threshold above which the contribution can be considered important for the determination of the dimensions. A diagonal line (in BLACK) is a visual aid to eyeball whether a category is actually contributing more (in relative terms) to either of the two dimensions.
+#' This function allows to plot a scatterplot of the contribution of row categories to two selected dimensions. Two references lines (in RED) indicate the threshold above which the contribution can be considered important for the determination of the dimensions. A diagonal line is a visual aid to eyeball whether a category is actually contributing more (in relative terms) to either of the two dimensions.
#' The row categories' labels are coupled with + or - symbols within round brackets indicating to which side of the two selected dimensions the contribution values that can be read off from the chart are actually referring.
#' The first symbol (i.e., the one to the left), either + or -, refers to the first of the selected dimensions (i.e., the one reported on the x-axis). The second symbol (i.e., the one to the right) refers to the second of the selected dimensions (i.e., the one reported on the y-axis).
#' @param data: name of the dataset (must be in dataframe format).
@@ -31,9 +31,14 @@ rows.cntr.scatter <- function (data, x = 1, y = 2, filter=FALSE, cex.labls=3) {
limit <- max(xmax, ymax)
ifelse(filter==FALSE, dfr <- dfr, dfr <- subset(dfr, cntr1>(100/nrows)*10 | cntr2>(100/nrows)*10))
p <- ggplot(dfr, aes(x = cntr1, y = cntr2)) + geom_point(alpha = 0.8) +
-geom_hline(yintercept = round((100/nrows) * 10, digits = 0),
-colour = "red", linetype = "dashed") + geom_vline(xintercept = round((100/nrows) *10, digits = 0), colour = "red", linetype = "dashed") +
-scale_y_continuous(limit = c(0, limit)) + scale_x_continuous(limit = c(0,limit)) + geom_abline(intercept = 0, slope = 1) + theme(panel.background = element_rect(fill="white", colour="black")) +
-geom_text_repel(data = dfr, aes(label = labels.final), size = cex.labls) + labs(x = paste("Row categories' contribution (permills) to Dim.",x), y = paste("Row categories' contribution (permills) to Dim.", y))
+geom_hline(yintercept = round((100/nrows) * 10, digits = 0), colour = "red", linetype = "dashed") +
+geom_vline(xintercept = round((100/nrows) *10, digits = 0), colour = "red", linetype = "dashed") +
+scale_y_continuous(limit = c(0, limit)) +
+scale_x_continuous(limit = c(0,limit)) +
+geom_abline(intercept = 0, slope = 1, colour="#00000088") +
+theme(panel.background = element_rect(fill="white", colour="black")) +
+geom_text_repel(data = dfr, aes(label = labels.final), size = cex.labls) +
+labs(x = paste("Row categories' contribution (permills) to Dim.",x), y = paste("Row categories' contribution (permills) to Dim.", y)) +
+coord_fixed(ratio = 1, xlim = NULL, ylim = NULL, expand = TRUE)
return(p)
}
10 changes: 7 additions & 3 deletions R/rows_corr_scatter.R
@@ -1,6 +1,6 @@
#' Scatterplot for row categories correlation with dimensions
#'
-#' This function allows to plot a scatterplot of the correlation (sqrt(COS2)) of row categories with two selected dimensions. A diagonal line (in BLACK) is a visual aid to eyeball whether a category is actually more correlated (in relative terms) to either of the two dimensions.
+#' This function allows to plot a scatterplot of the correlation (sqrt(COS2)) of row categories with two selected dimensions. A diagonal line is a visual aid to eyeball whether a category is actually more correlated (in relative terms) to either of the two dimensions.
#' The row categories' labels are coupled with two + or - symbols within round brackets indicating to which side of the two selected dimensions the correlation values that can be read off from the chart are actually referring.
#' The first symbol (i.e., the one to the left), either + or -, refers to the first of the selected dimensions (i.e., the one reported on the x-axis). The second symbol (i.e., the one to the right) refers to the second of the selected dimensions (i.e., the one reported on the y-axis).
#' @param data: name of the dataset (must be in dataframe format).
@@ -26,7 +26,11 @@ rows.corr.scatter <- function (data, x = 1, y = 2, cex.labls=3) {
dfr$labels2 <- ifelse(dfr$coord2 < 0, "-", "+")
dfr$labels.final <- paste0(dfr$lab, " (",dfr$labels1,",",dfr$labels2, ")")
p <- ggplot(dfr, aes(x = corr1, y = corr2)) + geom_point(alpha = 0.8) +
-scale_y_continuous(limit = c(0, 1)) + scale_x_continuous(limit = c(0,1)) + geom_abline(intercept = 0, slope = 1) + theme(panel.background = element_rect(fill="white", colour="black")) +
-geom_text_repel(data = dfr, aes(label = labels.final), size = cex.labls) + labs(x = paste("Row categories' correlation with Dim.",x), y = paste("Row categories' correlation with Dim.",y))
+scale_y_continuous(limit = c(0, 1)) + scale_x_continuous(limit = c(0,1)) +
+geom_abline(intercept = 0, slope = 1, colour="#00000088") +
+theme(panel.background = element_rect(fill="white", colour="black")) +
+geom_text_repel(data = dfr, aes(label = labels.final), size = cex.labls) +
+labs(x = paste("Row categories' correlation with Dim.",x), y = paste("Row categories' correlation with Dim.",y)) +
+coord_fixed(ratio = 1, xlim = NULL, ylim = NULL, expand = TRUE)
return(p)
}
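
Note on the diagonal's new colour: `"#00000088"` is an 8-digit hex code in which the first six digits give black and the last two (`88`) give the alpha channel. A quick base-R check of how opaque that is, using `grDevices::col2rgb()`; nothing below is package code.

```r
# Decode the 8-digit hex colour used for the diagonal line in the diff
rgba <- grDevices::col2rgb("#00000088", alpha = TRUE)

# Rows of the result are "red", "green", "blue", "alpha", each on a 0-255 scale
alpha_fraction <- rgba["alpha", 1] / 255

round(alpha_fraction, 2)  # 0.53, i.e. roughly half-transparent black
```

This appears to be what the changelog means by a "transparent black colour": the line stays visible but no longer hides points or labels lying on the diagonal.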
8 changes: 6 additions & 2 deletions README.md
@@ -1,5 +1,5 @@
# CAinterprTools
-vers 0.15
+vers 0.16

A number of interesting packages are available to perform Correspondence Analysis in R. At the best of my knowledge, however, they lack some tools to help users to eyeball some critical CA aspects (e.g., contribution of rows/cols categories to the principal axes, quality of the display,correlation of rows/cols categories with dimensions, etc). Besides providing those facilities, this package allows calculating the significance of the CA dimensions by means of the 'Average Rule', the Malinvaud test, and by permutation test. Further, it allows to also calculate the permuted significance of the CA total inertia.

@@ -251,6 +251,10 @@ New in `version 0.15`:

minor adjustments to the plots' axes labels formatting; the `malinvaud()` function now returns the output table; the `fire_loss` and `breakfast` dataset have been added.

+New in `version 0.16`:
+
+the charts returned by the `cols.cntr.scatter()`, `cols.corr.scatter()`, `rows.cntr.scatter()`, and `rows.corr.scatter()` have been modified in order to be set with a ratio of 1 (i.e., 1 unit on the x-axis is equal to 1 unit on the y-axis). In these same plots, the diagonal line has been given a transparent black colour.
+

## Installation
To install the package in R, just follow the few steps listed below:
@@ -265,7 +269,7 @@ library(devtools)
```
3) download the 'CAinterprTools' package from GitHub via the 'devtools''s command:
```r
-install_github("gianmarcoalberti/CAinterprTools@v0.15")
+install_github("gianmarcoalberti/CAinterprTools@v0.16")
```
4) load the package:
```r
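
Note on the two plot changes described in the `version 0.16` changelog entry: the sketch below reproduces them on made-up data, assuming only that `ggplot2` is installed. The data frame `toy` and its values are illustrative and do not come from the package; the layers mirror those added in this commit but omit the labels and theme.

```r
library(ggplot2)

# Made-up correlation values for three categories (illustration only)
toy <- data.frame(corr1 = c(0.2, 0.6, 0.9),
                  corr2 = c(0.8, 0.5, 0.1))

p <- ggplot(toy, aes(x = corr1, y = corr2)) +
  geom_point(alpha = 0.8) +
  scale_x_continuous(limits = c(0, 1)) +
  scale_y_continuous(limits = c(0, 1)) +
  # black with a hex alpha channel of 88, so the diagonal is semi-transparent
  geom_abline(intercept = 0, slope = 1, colour = "#00000088") +
  # ratio = 1: one unit on the x-axis is drawn as long as one unit on the y-axis
  coord_fixed(ratio = 1)

# print(p) draws the chart on the active graphics device
```

With a fixed 1:1 ratio the diagonal sits at 45 degrees whatever the shape of the plotting window, so a point above it has the larger value on the y-axis dimension and a point below it the larger value on the x-axis dimension.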
4 changes: 2 additions & 2 deletions man/CAinterprTools-package.Rd
@@ -20,8 +20,8 @@ A number of interesting packages are available to perform
\tabular{ll}{
Package: \tab CAinterpretation\cr
Type: \tab Package\cr
-Version: \tab 0.7\cr
-Date: \tab 2016-12\cr
+Version: \tab 0.16\cr
+Date: \tab 2017-11\cr
License: \tab GPL\cr
}
The package allows to plot a number of Correspondence Analysis information such as the contribution of rows and columns categories to the principal axes, the quality of points display on selected dimensions, the correlation of row and column categories to selected dimensions, etc. It also allows to assess which dimension(s) is important for the data structure interpretation by means of the so called 'Average Rule'. Moreover, it implements the Malinvaud test, which test the significance of the table dimensions. The package also offers the facility to plot the permuted distribution of the table total inertia as well as of the inertia accounted for by pairs of selected dimensions. The two latter facilities allows to test the significance of the total inertia and of the dimensions the user is interest in.
2 changes: 1 addition & 1 deletion man/cols.cntr.scatter.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/cols.corr.scatter.Rd


2 changes: 1 addition & 1 deletion man/rows.cntr.scatter.Rd


2 changes: 1 addition & 1 deletion man/rows.corr.scatter.Rd

