# Introduction
Although many people see R as a slow programming language, it has powerful features that can dramatically accelerate your code.[^exampledatatable] R was not necessarily built for speed, but there are tools and techniques we can use to accelerate it. This chapter introduces what we will understand as high-performance computing (HPC) in R.
[^exampledatatable]: Of course, the same could be said of almost any programming language. Still, there are notable examples, like the R package [`data.table`](https://CRAN.R-project.org/package=data.table){target="_blank"} [@datatable], which has been shown to [outperform most data wrangling tools](https://h2oai.github.io/db-benchmark/){target="_blank"}.
# High-Performance Computing: An overview
From R's perspective, we can think of HPC in terms of three things:[^crantask] big data, parallel computing, and compiled code.
[^crantask]: Make sure to check out [CRAN Task View on HPC](https://CRAN.R-project.org/view=HighPerformanceComputing){target="_blank"}.
## Big Data
When we talk about big data, we refer to cases where your computer struggles to handle a dataset. A typical example is when your data frame has [too many observations (rows) to fit a linear regression model](https://stackoverflow.com/q/10326853/2097171){target="_blank"}. Instead of buying a bigger computer, you can turn to several good solutions for memory-related problems:
* **Out-of-memory storage**: The idea is simple: instead of loading the data into RAM, keep it elsewhere (e.g., on disk) and access it from there. Two noteworthy alternatives are the [bigmemory](https://CRAN.R-project.org/package=bigmemory){target="_blank"} and [implyr](https://cran.r-project.org/package=implyr){target="_blank"} R packages. The `bigmemory` package provides methods for using "*file-backed*" matrices, while `implyr` implements a wrapper to access Apache Impala, an [SQL query engine for a cluster running Apache Hadoop](https://en.wikipedia.org/w/index.php?title=Apache_Impala&oldid=1116544272){target="_blank"}.
* **Efficient algorithms for big data**: To avoid running out of memory in your regression analysis, the R packages [biglm](https://cran.r-project.org/package=biglm){target="_blank"} and [biglasso](https://cran.r-project.org/package=biglasso){target="_blank"} deliver highly efficient alternatives to `glm` and `glmnet`, respectively. And if your data fits in RAM but you still struggle with data wrangling, the [data.table](https://CRAN.R-project.org/package=data.table){target="_blank"} package is a great solution.
* **Store it more efficiently**: Finally, when it comes to linear algebra, the [Matrix](https://CRAN.R-project.org/package=Matrix){target="_blank"} R package shines with its formal classes and methods for managing sparse matrices, *i.e.*, big matrices whose entries are mostly zeros, such as `dgCMatrix` objects. Furthermore, `Matrix` ships with R, which makes it even more appealing.
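To make the last point concrete, the chunk below is a small sketch (our own illustration; the matrix size, density, and seed are arbitrary) comparing the memory footprint of the same mostly-zero matrix stored densely and as a `dgCMatrix`. It assumes only base R and the `Matrix` package that ships with it.

```{r}
library(Matrix)

set.seed(1231)
n <- 1000L

# A 1000 x 1000 matrix where only about 1% of the entries are non-zero
dense <- matrix(0, nrow = n, ncol = n)
idx <- sample.int(n * n, size = n * 10L)
dense[idx] <- rnorm(length(idx))

# Convert to a compressed, column-oriented sparse matrix (a dgCMatrix)
sparse <- as(dense, "CsparseMatrix")
class(sparse)

# Dense storage keeps every zero; sparse storage keeps only the non-zeros
# plus their row indices and column pointers
print(object.size(dense), units = "Kb")
print(object.size(sparse), units = "Kb")

# Both representations give the same matrix-vector product
x <- rnorm(n)
all.equal(as.vector(dense %*% x), as.vector(sparse %*% x))
```

Because a `dgCMatrix` stores only the non-zero values and their positions, its size grows with the number of non-zeros rather than with `nrow * ncol`.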
## Parallel computing
```{r, echo=FALSE, fig.cap="Flynn's Classical Taxonomy ([Blaise Barney, **Introduction to Parallel Computing**, Lawrence Livermore National Laboratory](https://computing.llnl.gov/tutorials/parallel_comp/))", fig.align='center'}
knitr::include_graphics("fig/flynnsTaxonomy.png")
```
We will focus on the **S**ingle **I**nstruction stream, **M**ultiple **D**ata stream (SIMD) class.
In general terms, a parallel computing program is one in which we use two or more *computational threads* simultaneously. Although a computational thread usually means a core, a computer program can be parallelized at multiple levels. To understand this, we first need to see what composes a modern computer:
![Source: Original figure from LUMI consortium documentation [@lumi2023]](fig/socket-core-threads.svg){style="text-align: center"}
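At the lowest level, a single core can also apply one instruction to several data elements at once (SIMD) through vector instruction sets such as Streaming SIMD Extensions \[[SSE](https://en.wikipedia.org/w/index.php?title=Streaming_SIMD_Extensions&oldid=1149173008){target="_blank"}\] and Advanced Vector Extensions \[[AVX](https://en.wikipedia.org/w/index.php?title=Advanced_Vector_Extensions&oldid=1148504462){target="_blank"}\].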
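To see how many of these computational units R can use on a given machine, the `parallel` package (shipped with R) provides `detectCores()`. The sketch below contrasts logical CPUs (hardware threads) with physical cores; note that, depending on the platform, `logical = FALSE` may not be honored, and either call may return `NA` if R cannot determine the count.

```{r}
library(parallel)

# Logical CPUs: cores times hardware threads per core (e.g., hyper-threading)
n_logical <- detectCores(logical = TRUE)

# Physical cores: only honored on some platforms; may equal the logical
# count or be NA elsewhere
n_physical <- detectCores(logical = FALSE)

cat("Logical CPUs:  ", n_logical, "\n")
cat("Physical cores:", n_physical, "\n")
```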
### Serial vs. Parallel
::: {.columns layout-align="center"}
::: {.column width="45%"}
![](fig/serialProblem.png){width="100%"}
:::
::: {.column width="45%"}
![](fig/parallelProblem.png){width="100%"}
:::
Source: [Blaise Barney, **Introduction to Parallel Computing**, Lawrence Livermore National Laboratory](https://computing.llnl.gov/tutorials/parallel_comp/){target="_blank"}
:::
![Source: [Blaise Barney, **Introduction to Parallel Computing**, Lawrence Livermore National Laboratory](https://computing.llnl.gov/tutorials/parallel_comp/)](fig/nodesNetwork.png){fig-align="center" width="60%"}
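The serial-versus-parallel picture above can be sketched in R with the `parallel` package. In this minimal example, `slow_square()` is a hypothetical stand-in for an expensive, independent task; the same eight tasks are run one after another with `lapply()` and then split across two worker processes with `parLapply()`.

```{r}
library(parallel)

# Hypothetical stand-in for an expensive task; each call is independent
slow_square <- function(i) {
  Sys.sleep(0.1)  # simulate 0.1 seconds of work
  i^2
}

# Serial: a single R process handles the 8 tasks one after another
t_serial <- system.time(
  serial_res <- lapply(1:8, slow_square)
)

# Parallel: a PSOCK cluster (works on Windows, macOS, and Linux) with
# 2 worker processes; parLapply() splits the 8 tasks between them
cl <- makePSOCKcluster(2)
t_parallel <- system.time(
  parallel_res <- parLapply(cl, 1:8, slow_square)
)
stopCluster(cl)

# Same answers, different wall-clock time
identical(serial_res, parallel_res)
t_serial["elapsed"]
t_parallel["elapsed"]
```

On most machines the parallel run should take roughly half the wall-clock time of the serial one (cluster start-up is excluded from the timing), but the exact numbers will vary.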
## High-performance computing in R
### Some vocabulary for HPC
In rough terms:
* Supercomputer: A **single** big machine with thousands of cores/GPGPUs.
* High-Performance Computing (HPC): **Multiple** machines within
a **single** network.
* High Throughput Computing (HTC): **Multiple** machines across **multiple**
networks.
You may not have access to a supercomputer, but HPC/HTC clusters are certainly
more accessible these days; *e.g.*, AWS provides a service to create HPC clusters
at a low cost (allegedly, since nobody understands how the pricing works).
## GPU vs. CPU
```{r gpu-cpu, echo=FALSE, fig.cap="[NVIDIA Blog](http://www.nvidia.com/object/what-is-gpu-computing.html)", fig.align='center'}
knitr::include_graphics("fig/cpuvsgpu.jpg")
nnodes <- 4L
```
* Why use OpenMP if GPUs are _suited to compute-intensive operations_? Mostly because
OpenMP is **VERY** easy to implement (easier than CUDA, arguably the easiest way to program a GPU).[^kokkos]
[^kokkos]: [Sandia National Laboratories](https://www.sandia.gov/ccr/software/kokkos/){target="_blank"} started the [Kokkos project](https://kokkos.org/){target="_blank"}, which provides a one-size-fits-all C++ library for parallel programming. More information is available on Kokkos's [wiki site](https://kokkos.github.io/kokkos-core-wiki/){target="_blank"}.
## When is it a good idea?
```{r good-idea, echo=FALSE, fig.cap="Ask yourself these questions before jumping into HPC!", fig.align='center', out.width="85%"}
knitr::include_graphics("fig/when_to_parallel.svg")
```
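One back-of-the-envelope check that complements these questions (our addition, not part of the figure) is Amdahl's law: if only a fraction $p$ of the run time can be parallelized, the best possible speedup with $n$ workers is $1 / ((1 - p) + p / n)$, no matter how many cores you add. A minimal sketch, where `amdahl_speedup()` is a hypothetical helper:

```{r}
# Amdahl's law: theoretical speedup when a fraction `p` of the run time
# can be parallelized across `n` workers (the rest stays serial)
amdahl_speedup <- function(p, n) {
  stopifnot(p >= 0, p <= 1, n >= 1)
  1 / ((1 - p) + p / n)
}

# Even with 90% of the work parallelized, the speedup plateaus near 10x
round(amdahl_speedup(p = 0.9, n = c(2, 4, 8, 16, 1000)), 2)
# [1] 1.82 3.08 4.71 6.40 9.91
```

Since this bound ignores communication and start-up overhead, real speedups are usually lower.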
## Parallel computing in R
While there are several alternatives (just take a look at the
[High-Performance Computing Task View](https://cran.r-project.org/web/views/HighPerformanceComputing.html)),
we'll focus on the following R packages for **explicit parallelism**:
* [**parallel**](https://cran.r-project.org/package=parallel): R package that provides '[s]upport for parallel computation,
including random-number generation'.
* [**future**](https://cran.r-project.org/package=future): '[A] lightweight and
unified Future API for sequential and parallel processing of R
expression via futures.'
* [**Rcpp**](https://cran.r-project.org/package=Rcpp) + [OpenMP](https://www.openmp.org):
[Rcpp](https://cran.r-project.org/package=Rcpp) is an R package for integrating
R with C++, and OpenMP is an API for high-level, shared-memory parallelism in C/C++ and
Fortran.
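As a small taste of **parallel**, and of the "random-number generation" part of its description, the following sketch gives each worker of a PSOCK cluster its own L'Ecuyer-CMRG stream with `clusterSetRNGStream()`, so parallel simulations are reproducible. The seed, the number of workers, and the toy simulation (means of `rnorm()` draws) are arbitrary choices for illustration.

```{r}
library(parallel)

# Start two worker R processes (PSOCK works on Windows, macOS, and Linux)
cl <- makePSOCKcluster(2)

# Give each worker its own L'Ecuyer-CMRG random-number stream (streams are
# spaced far apart), seeded from a single master seed
clusterSetRNGStream(cl, iseed = 123)
draws1 <- parSapply(cl, 1:4, function(i) mean(rnorm(1e4)))

# Resetting the streams with the same seed reproduces the same draws
clusterSetRNGStream(cl, iseed = 123)
draws2 <- parSapply(cl, 1:4, function(i) mean(rnorm(1e4)))

stopCluster(cl)

identical(draws1, draws2)
```

Without `clusterSetRNGStream()`, each fresh worker process would seed itself independently, and rerunning the code would generally produce different draws.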
Other packages, not used here:
* [**foreach**](https://cran.r-project.org/package=foreach) for iterating through lists in parallel.
* [**Rmpi**](https://cran.r-project.org/package=Rmpi) for creating MPI clusters.
And tools for implicit parallelism (out-of-the-box tools that allow the
programmer not to worry about parallelization):
* [**gpuR**](https://cran.r-project.org/package=gpuR) for matrix manipulation using
the GPU.
* [**tensorflow**](https://cran.r-project.org/package=tensorflow) an R interface to
[TensorFlow](https://www.tensorflow.org/).
There are also many other types of resources, notably tools for working with batch schedulers such as [Slurm](http://slurm.schedmd.com) and [HTCondor](https://research.cs.wisc.edu/htcondor/).