Gallery
Brian Quistorff edited this page Sep 3, 2021 · 9 revisions
Here are some examples of ways of parallelizing code. Also see the examples in the help file (html version).
Many users need a more tailored solution than the built-in commands offer. A good way to achieve this is to use the global macros pll_instance (the number of the current instance) and PLL_CLUSTERS (the number of clusters), which are defined within each instance. Here is a trivial but perhaps useful example:
clear all
set more off
set trace off
parallel setclusters 4
cap drop
// Generating a variable called code that takes the values 1 to 4
sysuse auto
set seed 112321
gen code = floor(runiform()*4) + 1
tab code
/*
code | Freq. Percent Cum.
------------+-----------------------------------
1 | 20 27.03 27.03
2 | 11 14.86 41.89
3 | 21 28.38 70.27
4 | 22 29.73 100.00
------------+-----------------------------------
Total | 74 100.00
*/
// Storing the data on disk so that each instance can load its own subset
save mytempdata, replace
clear
// Program that collapses the subset belonging to the current instance
// and stores the result in its own dataset
program myprogram
    use if code == $pll_instance using mytempdata.dta, clear
    collapse (mean) price rep78 (max) code
    save dataset_$pll_instance.dta, replace
end
// Processing the data and taking a look at the datasets
parallel, prog(myprogram) nodata: myprogram price
ls dataset_*.dta
/*
-rw-rw-r-- 1 george george 907 Sep 21 09:28 dataset_1.dta
-rw-rw-r-- 1 george george 907 Sep 21 09:28 dataset_2.dta
-rw-rw-r-- 1 george george 907 Sep 21 09:28 dataset_3.dta
-rw-rw-r-- 1 george george 907 Sep 21 09:28 dataset_4.dta
*/
// Now appending (using parallel append)
parallel append, do(di) e("dataset_%g.dta, 1/4")
list
/*
+------------------------------------------+
| price rep78 code dta_source |
|------------------------------------------|
1. | 6,292.5 3.3 1 dataset_1.dta |
2. | 4,489 3.5 2 dataset_2.dta |
3. | 6,532.1 3.35 3 dataset_3.dta |
4. | 6,537.5 3.52632 4 dataset_4.dta |
+------------------------------------------+
*/
// Removing files using shell
!rm dataset_*.dta mytempdata.dta
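The two macros can also be combined to split an arbitrary list of tasks across instances. The following is a minimal sketch, not part of the package: the program name myyears and the range of years are made up for illustration, and each instance simply keeps the tasks whose index matches its own number modulo the number of clusters.

```stata
// Hypothetical program: round-robin assignment of years to instances
program define myyears, rclass
    local mine
    forvalues y = 2008/2012 {
        // Task index runs 0, 1, 2, ...; instance k takes the indices
        // whose remainder modulo PLL_CLUSTERS equals k - 1
        if mod(`y' - 2008, $PLL_CLUSTERS) == $pll_instance - 1 {
            local mine `mine' `y'
        }
    }
    display "instance $pll_instance of $PLL_CLUSTERS handles: `mine'"
    return local years `mine'
end

parallel setclusters 4
parallel, prog(myyears) nodata: myyears
```

In real use, the body of the if block would do the per-task work (for example, load and process the files for that year) instead of only collecting the year.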
Another example can be found in the help file of parallel (Example 6).
If your data files are named following a regular pattern, then parallel append can help. A more general solution is sketched out further below. First, we review a simple application of parallel append.
//files 2008_01/income.dta, ..., 2012_12/income.dta
program define myprogram
    gen female = (gender == "female")
    collapse (mean) income, by(female) fast
end
parallel append, do(myprogram) prog(myprogram) e("%g_%02.0f/income.dta, 2008/2012, 1/12")
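Before running the command it can be useful to check which files the expression refers to. The sketch below does not use parallel at all. It assumes, following the comment above, that the first range fills %g and the second fills %02.0f, and it lists any of the resulting 60 paths that do not exist.

```stata
// Enumerate the paths implied by "%g_%02.0f/income.dta, 2008/2012, 1/12"
local found = 0
local missing = 0
forvalues y = 2008/2012 {
    forvalues m = 1/12 {
        // Zero-padded month, e.g. 1 -> 01
        local mm = string(`m', "%02.0f")
        local fn "`y'_`mm'/income.dta"
        capture confirm file "`fn'"
        if _rc {
            display as error "missing: `fn'"
            local ++missing
        }
        else local ++found
    }
}
display "found `found' files, `missing' missing"
```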
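Here is a more general solution. It requires the user to load the names of the files into a variable of the current dataset, one file name per observation. Each instance then processes the files listed in its share of the observations.
//Load the file names into the variable filename (this will depend on the use-case).
program define myprogram2
    // _N is the number of observations (file names) assigned to this instance
    local N = _N
    tempfile accumulated
    forval i = 1/`N' {
        preserve
        local fn = filename[`i']
        use "`fn'", clear
        //do the real work
        gen female = (gender == "female")
        collapse (mean) income, by(female) fast
        //the accumulated file does not exist yet on the first pass
        if `i' > 1 append using "`accumulated'"
        save "`accumulated'", replace
        restore
    }
    use "`accumulated'", clear
end
parallel, prog(myprogram2): myprogram2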
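How the file names get into the filename variable depends on the use-case. As one possible sketch, assuming all files sit in a single folder (the folder name mydata is hypothetical), the list can be built with the dir extended macro function before calling parallel:

```stata
clear
local dir "mydata"                          // hypothetical folder holding the .dta files
local files : dir "`dir'" files "*.dta"     // names of all .dta files in that folder
local n : word count `files'

if `n' > 0 {
    // One observation per file
    set obs `n'
    gen str244 filename = ""
    local i = 0
    foreach f of local files {
        local ++i
        replace filename = "`dir'/`f'" in `i'
    }
}
else {
    display as error "no .dta files found in `dir'"
}
```

After this, the dataset in memory has one observation per file, which is the layout myprogram2 reads from.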