Updated README to include more detailed, universal example
bwringe committed May 10, 2016
1 parent e6251a3 commit 64d0a79
Showing 2 changed files with 119 additions and 58 deletions.
63 changes: 51 additions & 12 deletions README.Rmd
@@ -11,16 +11,16 @@
```
devtools::install_github("bwringe/parallelnewhybrid")
```
<span style = "color:red"> <strong>NOTE:</strong></span> : **parallelnewhybrid** relies on functions from the R packages *parallel*, *plyr*, *stringr*, and *tidyr*. The user should ensure these are installed from CRAN prior to installing **parallelnh**.
<span style = "color:red"> <strong>NOTE:</strong></span> : **parallelnewhybrid** relies on functions from the R packages *parallel*, *plyr*, *stringr*, and *tidyr*. The user should ensure these are installed from CRAN prior to installing **parallelnewhybrid**.

***

###Function descriptions
<h4 class="text-primary">parallelnh_xx.R</h4>
Allows *NewHybrids* (Anderson and Thompson 2002) to be run in parallel. A job (*NewHybrids* analysis) is assigned to each of the *c* cores available in the computer. As each task finishes, a new analysis is asigned to the idled core.
All *NewHybrids* format files in the folder the user specifies as *folder.data* will be analyzed.
The user can also specify the length of the MCMC burnin using the *burnin* and *sweeps* parameters.
<span style = "color:red"> <strong>NOTE:</strong></span> : There are **three operating system-specific versions** of the **parallelnh_xx** function because of the different ways in which the operating systems handle forking of processes.
Allows *NewHybrids* (Anderson and Thompson 2002) to be run in parallel. A job (*NewHybrids* analysis) is assigned to each of the *c* cores available in the computer. As each task finishes, a new analysis is assigned to the idled core.
**parallelnewhybrid** will attempt to analyze all *NewHybrids* format files in the folder specified by the user through the *folder.data* argument. Therefore, it is essential this folder contain only the files the user wishes to analyze, and optionally their associated individual file(s).
The user must also specify the length of the Markov chain Monte Carlo (MCMC) burn-in and subsequent run length using the *burnin* and *sweeps* parameters.
<span style = "color:red"> <strong>NOTE:</strong></span> There are **three operating system-specific versions** of the **parallelnh_xx** function because of the different ways in which the operating systems handle forking of processes.

**parallelnh version**|**Operating system**
------------|----------
@@ -36,8 +36,8 @@ Example datasets have been provided as R images (.rda files). These can be loade

**Example dataset** | **Contents**
------------|---------------------------------------------------------------
*SimPops\_S1R1_NH* | A *NewHybrids* format file. To analyze this file using the function **parallelnh_xx**, save it with the extension ".txt" to an empty folder on your hard drive, then provide **parallelnh_xx** with the file path to the folder. To run in parallel, after saving the file, copy it and give the copies unique names. **parallelnh_xx** will attempt to analyze all files which do not contain "individual.txt" within the file name, so it is essential that only NewHybrids formatted files, and their associated individual files be present in the folder provided to **parallelnh_xx**.
*SimPops\_S1R1_NH_individuals* | The individual file associated with *SimPops\_S1R1_NH*. A single copy of this file should be saved to the same folder in which *SimPops\_S1R1_NH* is saved. The filename of this file must end in "individuals.txt".
*SimPops\_S1R1_NH* | A *NewHybrids* format file. To analyze this file using the function **parallelnh_xx**, save it with the extension ".txt" to an empty folder on your hard drive, then provide **parallelnh_xx** with the file path to the folder. To run in parallel, after saving the file, copy it and give the copies unique names. **parallelnh_xx** will attempt to analyze all files which do not contain "individuals.txt" within the file name, so it is essential that only *NewHybrids* formatted files and their associated individual files be present in the folder provided to **parallelnh_xx**.
*SimPops\_S1R1_NH_individuals* | The individual file associated with *SimPops\_S1R1_NH*. A single copy of this file should be saved to the same folder in which *SimPops\_S1R1_NH* is saved. The filename must end in "individuals.txt".
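
Because every file in the folder whose name does not mark it as an individual file is treated as *NewHybrids* input, it can help to inspect the folder before starting a run. The following is a minimal sketch in base R and is not part of **parallelnewhybrid**: the helper name `nh_input_files` is hypothetical, and the filter on "individuals.txt" approximates the file naming rule described in the table rather than reproducing the package's internal logic.

```r
## Hypothetical helper (not part of parallelnewhybrid): list the files in a
## folder that would be treated as NewHybrids input under the naming rule above.
nh_input_files <- function(folder.data) {
  all.files <- list.files(path = folder.data)
  ## Drop the individual file(s); everything else is assumed to be NewHybrids input
  all.files[!grepl(pattern = "individuals.txt", x = all.files, fixed = TRUE)]
}

## Demonstration in a fresh temporary folder
demo.dir <- tempfile("nh_demo")
dir.create(demo.dir)
file.create(file.path(demo.dir, c("SimPops_S1R1_NH.txt",
                                  "SimPops_S1R1_NH_individuals.txt")))

input.files <- nh_input_files(demo.dir)
print(input.files)
```

Any file name returned here that is not a *NewHybrids* format file should be moved out of the folder before calling **parallelnh_xx**.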

***
<h4 class="text-primary">parallelnh_xx</h4>
@@ -46,17 +46,56 @@ Example datasets have been provided as R images (.rda files). These can be loade
------------|---------------------------------------------------------------
*folder.data*| A file path to the folder in which the *NewHybrids* formatted files to be analyzed, and their associated individual file reside.
*where.NH* | A file path to the *NewHybrids* installation folder. NOTE: This folder must be named "newhybrids". If it is named anything else, the function will fail.
*burnin* | An integer specifying how many burnin steps *NewHybrids* is to run
*burnin* | An integer specifying how many burn-in steps *NewHybrids* is to run
*sweeps* | An integer specifying how many sweep steps *NewHybrids* is to run
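
The constraints in the table above can be checked before a long run is launched. The sketch below is an illustration only: `check_nh_args` is a hypothetical helper, not a function exported by **parallelnewhybrid**, and the requirement that *burnin* and *sweeps* be single whole numbers is an assumption based on the argument descriptions.

```r
## Hypothetical pre-flight check (not part of parallelnewhybrid).
check_nh_args <- function(where.NH, burnin, sweeps) {
  ## The NewHybrids installation folder must be named "newhybrids";
  ## basename() ignores a trailing path separator
  if (basename(where.NH) != "newhybrids") {
    stop("where.NH must point to a folder named \"newhybrids\"")
  }
  ## Assumption: burnin and sweeps are single whole numbers
  for (value in list(burnin, sweeps)) {
    if (!is.numeric(value) || length(value) != 1 || value %% 1 != 0) {
      stop("burnin and sweeps must each be a single whole number")
    }
  }
  invisible(TRUE)
}

## Passes silently; a folder with any other name raises an error
check_nh_args(where.NH = "~/tools/newhybrids/", burnin = 5000, sweeps = 10000)
```

The folder path `~/tools/newhybrids/` is a placeholder; only its final component matters for the check.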

```r
### Run analyses on all NewHybrids format files in the folder "your_data_to_analyze"
parallelnh_OSX(folder.data = "~/ ... /your_data_to_analyze/", where.NH = "~/ ... /newhybrids/", burnin = 5000, sweeps = 10000)
## the usage would be the same for the WIN and LINUX versions.

### ANALYSIS OF EXAMPLE DATA

## To download the example file
## Get the file path to the working directory; it is used to keep the example universal
path.hold <- getwd()

## Get the individual file included along with the parallelnewhybrid package and make it an object
sim_inds <- parallelnewhybrid::SimPops_S1R1_NH_individuals

## Get the genotype data file included along with the parallelnewhybrid package and make it an object
sim_data <- parallelnewhybrid::SimPops_S1R1_NH

## Save the individual data to the working directory as a file called "SimPops_S1R1_NH_individuals.txt"
write.table(x = sim_inds, file = paste0(path.hold, "/SimPops_S1R1_NH_individuals.txt"), row.names = FALSE, col.names = FALSE, quote = FALSE)

## Save the genotype data to the working directory as a file called "SimPops_S1R1_NH.txt"
write.table(x = sim_data, file = paste0(path.hold, "/SimPops_S1R1_NH.txt"), row.names = FALSE, col.names = FALSE, quote = FALSE)

## Create an empty folder within the working directory. Recall that parallelnewhybrid will analyze all files within the folder it is given; if the folder contains files that are neither NewHybrids format files nor individual files, the analysis will fail.
dir.create(paste0(path.hold, "/parallelnewhybrids example"))

## Copy the individual file to the new folder
file.copy(from = paste0(path.hold, "/SimPops_S1R1_NH_individuals.txt"), to = paste0(path.hold, "/parallelnewhybrids example"))

## Copy the genotype data file to the new folder
file.copy(from = paste0(path.hold, "/SimPops_S1R1_NH.txt"), to = paste0(path.hold, "/parallelnewhybrids example"))

## Create two copies of the genotype data file to act as technical replicates of the NewHybrids simulation-based analysis. This will also serve to demonstrate the parallel capabilities of parallelnewhybrid.
file.copy(from = paste0(path.hold, "/parallelnewhybrids example/SimPops_S1R1_NH.txt"), to = paste0(path.hold, "/parallelnewhybrids example/SimPops_S1R2_NH.txt"))

file.copy(from = paste0(path.hold, "/parallelnewhybrids example/SimPops_S1R1_NH.txt"), to = paste0(path.hold, "/parallelnewhybrids example/SimPops_S2R3_NH.txt"))

## Clean up the working directory by deleting the two files
file.remove(paste0(path.hold, "/SimPops_S1R1_NH_individuals.txt"))

file.remove(paste0(path.hold, "/SimPops_S1R1_NH.txt"))

## Create an object that is the file path to the folder in which NewHybrids is installed. Note: this folder must be named "newhybrids"
your.NH <- "YOUR PATH/newhybrids/"

## Execute parallelnh. NOTE: "xx" must be replaced by the correct designation for your operating system. burnin and sweep values have been chosen for demonstration only.
parallelnh_xx(folder.data = paste0(path.hold, "/parallelnewhybrids example/"), where.NH = your.NH, burnin = 100, sweeps = 100)


## Clean up everything by deleting the example folder. Note: a comment character has been added to prevent this command being run accidentally.
# unlink(paste0(path.hold, "/parallelnewhybrids example/"), recursive = TRUE)


```
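
Since "xx" must be replaced by hand, the choice of version can also be derived from the operating system R reports. This is a minimal sketch under stated assumptions: `nh_suffix` is a hypothetical helper, and the suffixes `OSX`, `WIN`, and `LINUX` are taken from the usage comments in the example above rather than from the package source.

```r
## Hypothetical helper (not part of parallelnewhybrid): map the operating
## system name reported by R to the suffix used in parallelnh_xx.
nh_suffix <- function(sysname = Sys.info()[["sysname"]]) {
  switch(sysname,
         Darwin  = "OSX",
         Windows = "WIN",
         Linux   = "LINUX",
         stop("No parallelnh version known for: ", sysname))
}

## Build the function name for the current machine
fun.name <- paste0("parallelnh_", nh_suffix())
print(fun.name)
```

With the package loaded, the named function could then be looked up with `match.fun(fun.name)` and called with the same arguments as in the example.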
114 changes: 68 additions & 46 deletions README.html

Large diffs are not rendered by default.
