MiniTorch is a library for machine learning engineers who wish to learn about the internal concepts underlying deep learning systems. It is a pure Python re-implementation of the Torch API designed to be simple, easy-to-read, tested, and incremental. The final library can run Torch code.
MiniTorch requires Python 3.8 or higher. To check your version of Python, run:
>>> python --version
The next step is to install packages. There are several packages used throughout these assignments, and you can install them in your virtual environment by running:
>>> python -m pip install -r requirements.txt
>>> python -m pip install -r requirements.extra.txt
>>> python -m pip install -Ue .
For anaconda users, you need to run an extra command to install llvmlite:
>>> conda install llvmlite
llvmlite is a dependency of numba, the JIT compiler used to speed up the numerical code in this library. If you are having trouble installing llvmlite, you can remove numba from the requirements.extra.txt file.
Make sure that everything is installed by running python and then checking:
>>> import minitorch
These functions are simple, but they form the basis of MiniTorch. Make sure that you understand each of them, as some of the terminology might be new; a short sketch of a few of them follows the list below.
minitorch.operators.mul(x: float, y: float) -> float
minitorch.operators.id(x: float) -> float
minitorch.operators.eq(x: float, y: float) -> float
minitorch.operators.neg(x: float) -> float
minitorch.operators.add(x: float, y: float) -> float
minitorch.operators.max(x: float, y: float) -> float
minitorch.operators.lt(x: float, y: float) -> float
minitorch.operators.sigmoid(x: float) -> float
minitorch.operators.relu(x: float) -> float
minitorch.operators.inv(x: float) -> float
minitorch.operators.inv_back(x: float, d: float) -> float
minitorch.operators.relu_back(x: float, d: float) -> float
minitorch.operators.log_back(x: float, d: float) -> float
minitorch.operators.is_close(x: float, y: float) -> float
tests/test_operators.py
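As a quick reference for the underlying math, here is a sketch of a few of these operators (illustrative only, not necessarily how the library implements them):

from math import exp

def sigmoid(x: float) -> float:
    # Numerically stable sigmoid: 1/(1 + e^-x) for x >= 0, e^x/(1 + e^x) otherwise.
    if x >= 0:
        return 1.0 / (1.0 + exp(-x))
    return exp(x) / (1.0 + exp(x))

def relu(x: float) -> float:
    # relu(x) = x if x > 0 else 0
    return x if x > 0 else 0.0

def relu_back(x: float, d: float) -> float:
    # Derivative of relu at x, times d.
    return d if x > 0 else 0.0

def inv_back(x: float, d: float) -> float:
    # Derivative of 1/x, i.e. -1/x^2, times d.
    return -d / (x * x)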
Using higher-order functions in Python, implement three basic functional concepts (a sketch is given after the list below).
minitorch.operators.map(fn: Callable[[float], float]) -> Callable[[Iterable[float]], Iterable[float]]
minitorch.operators.negList(ls: Iterable[float]) -> Iterable[float]
minitorch.operators.zipWith(fn: Callable[[float, float], float]) -> Callable[[Iterable[float], Iterable[float]], Iterable[float]]
minitorch.operators.addLists(ls1: Iterable[float], ls2: Iterable[float]) -> Iterable[float]
minitorch.operators.reduce(fn: Callable[[float, float], float], start: float) -> Callable[[Iterable[float]], float]
minitorch.operators.sum(ls: Iterable[float]) -> float
minitorch.operators.prod(ls: Iterable[float]) -> float
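A minimal sketch of the three concepts, written with plain Python lists (illustrative only; the assignment may expect a different structure):

from typing import Callable, Iterable

def map(fn: Callable[[float], float]) -> Callable[[Iterable[float]], Iterable[float]]:
    # Return a new function that applies fn to every element of a list.
    def apply(ls: Iterable[float]) -> Iterable[float]:
        return [fn(x) for x in ls]
    return apply

def zipWith(fn: Callable[[float, float], float]) -> Callable[[Iterable[float], Iterable[float]], Iterable[float]]:
    # Return a new function that combines two lists element by element with fn.
    def apply(ls1: Iterable[float], ls2: Iterable[float]) -> Iterable[float]:
        return [fn(x, y) for x, y in zip(ls1, ls2)]
    return apply

def reduce(fn: Callable[[float, float], float], start: float) -> Callable[[Iterable[float]], float]:
    # Return a new function that folds a list down to a single value, starting from start.
    def apply(ls: Iterable[float]) -> float:
        acc = start
        for x in ls:
            acc = fn(acc, x)
        return acc
    return apply

negList, addLists, sum, and prod can then be written directly in terms of map, zipWith, and reduce, e.g. sum = reduce(add, 0.0).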
>>> streamlit run app.py -- 0
We use the computation graph as a way to automatically compute derivatives of arbitrary Python functions. The trick behind this automatic differentiation is to implement the derivative of each individual function, and then utilize the chain rule to compute a derivative for any scalar value.
Let us say that we have a complex function :math:`h(x) = f(g(x))`. We want to compute :math:`h'(x)`. For simplicity, we use :math:`z = g(x)`, and draw :math:`h` as two boxes left to right.
The chain rule tells us how to compute this term. Specifically, it gives the following formula.
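Writing :math:`z = g(x)` for the intermediate value, the standard chain rule is

.. math::

    h'(x) = f'(z) \cdot g'(x)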
The above derivative function tells us to compute the derivative of the right-most function (:math:`f`), and then multiply it by the derivative of the left function (:math:`g`).
Here is where the perspective of thinking of functions as boxes pays off. We simply reverse the order.
The :math:`d` multiplier passed to ``backward`` of the first box (left) should be the value returned by ``backward`` of the second box. The 1 at the end is to start off the chain rule process with a value for :math:`d_{out}`.
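As a concrete sketch with two boxes (the functions and names here are hypothetical, each box exposing a backward that returns its local derivative times the incoming d):

# h(x) = f(g(x)) with g(x) = x + 2 and f(z) = z * z, purely for illustration.
def g(x):
    return x + 2.0

def g_back(x, d):
    # g'(x) * d
    return 1.0 * d

def f(z):
    return z * z

def f_back(z, d):
    # f'(z) * d
    return 2.0 * z * d

x = 3.0
z = g(x)               # forward pass, left to right
d_out = 1.0            # the 1 that starts off the chain rule
d = f_back(z, d_out)   # backward of the second (right) box first
dh_dx = g_back(x, d)   # its result is the d passed to the first (left) box
assert dh_dx == 2.0 * (x + 2.0)   # 10.0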
Tensor is a fancy name for a simple concept. A tensor is a multi-dimensional array of arbitrary dimension. It is a convenient and efficient way to hold data, which becomes much more powerful when paired with fast operators and auto differentiation.
- 1-dimensional tensor (vector)
- 2-dimensional tensor (matrix)
- 3-dimensional tensor, e.g. with shape (2, 3, 3) and size 18
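For instance, using plain Python lists just to illustrate shape and size:

# A 3-dimensional tensor of shape (2, 3, 3): 2 matrices, each with 3 rows and 3 columns.
tensor_3d = [[[0.0 for _ in range(3)] for _ in range(3)] for _ in range(2)]
shape = (2, 3, 3)
size = 2 * 3 * 3   # 18 values in total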
Users of a Tensor library only have to be aware of the ``shape`` and ``size`` of a tensor. However, there are important implementation details that we need to keep track of. To make our code a bit cleaner, we need to separate out the internal tensor data from the user-facing tensor. In addition to the ``shape``, ``minitorch.TensorData`` manages tensor ``storage`` and ``strides``:
- Storage is where the core data of the tensor is kept. It is always a 1-D array of numbers of length ``size``, no matter the dimensionality or ``shape`` of the tensor. Keeping a 1-D storage allows us to have tensors with different shapes point to the same type of underlying data.
- Strides is a tuple that provides the mapping from user indexing to the position in the 1-D ``storage`` (a small sketch of this mapping is given below).
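A minimal sketch of that mapping (the exact helper in the library may differ): multiply each index entry by its stride and sum to get the flat storage position.

def index_to_position(index, strides):
    # Dot product of the multidimensional index with the strides tuple.
    position = 0
    for i, s in zip(index, strides):
        position += i * s
    return position

# A contiguous (2, 3) tensor has strides (3, 1):
# element [1, 2] lives at 1*3 + 2*1 = 5 in storage.
assert index_to_position((1, 2), (3, 1)) == 5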
Broadcasting makes tensors convenient and efficient to use, which comes in handy particularly for ``zip`` operations. So far, all of our ``zip`` operations assume two input tensors of exactly the same size and shape. However, there are many interesting cases where we want to ``zip`` two tensors of different sizes.
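A sketch of the broadcasting rule for shapes (the function name here is illustrative, not the library's API): dimensions are aligned from the right, and a dimension of size 1 can be stretched to match the other tensor.

def broadcast_shapes(a, b):
    # Left-pad the shorter shape with 1s, then match dimensions pairwise.
    n = max(len(a), len(b))
    a_pad = (1,) * (n - len(a)) + tuple(a)
    b_pad = (1,) * (n - len(b)) + tuple(b)
    result = []
    for x, y in zip(a_pad, b_pad):
        if x == y or x == 1 or y == 1:
            result.append(max(x, y))
        else:
            raise ValueError("shapes cannot be broadcast")
    return tuple(result)

# zip-ing a (2, 3) tensor with a (3,) tensor produces a (2, 3) result.
assert broadcast_shapes((2, 3), (3,)) == (2, 3)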
This module is focused on taking advantage of tensors to write fast code, first on standard CPUs and then using GPUs.
The major technique we will use to speed up computation is parallelization. General parallel code can be difficult to write and get correct. However, we can again take advantage of the fact that we have structured our infrastructure around a set of core, general operations that are used throughout the codebase.
Numba is a numerical JIT compiler for Python. When a decorated function is first called, Numba converts the raw Python code to faster compiled numerical operations under the hood. We have seen an example of this library earlier when developing our mathematical operators.
For example:
from numba import njit
@njit
def neg(x):
    return -x
from numba import njit, prange

def map(fn):
    # Compile the scalar function so it can be called inside the compiled loop.
    fn = njit()(fn)

    def _map(out, input):
        # Run the loop in parallel (prange).
        for i in prange(len(out)):
            out[i] = fn(input[i])

    return njit(parallel=True)(_map)
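A usage sketch (assuming numpy arrays as the underlying storage, and the neg and map defined above):

import numpy as np

out = np.zeros(4)
inp = np.array([1.0, 2.0, 3.0, 4.0])
neg_map = map(neg)   # JIT-compiled, parallel map of neg
neg_map(out, inp)    # out is now [-1., -2., -3., -4.]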
CPU parallelization and operator fusion are important, but when you really need efficiency and scale, specialized hardware is critical. It is really hard to exaggerate how important GPU computation is to deep learning: it makes it possible to run many models that would have been intractable even just several years ago.
Writing code for GPUs requires a bit more work than the CPU parallelization examples. GPUs have a slightly different programming model than CPUs, which can take some time to fully understand. Luckily, though, there is a nice Numba library extension that allows us to code for GPUs directly in Python.
The most commonly used programming model for GPUs in deep learning is known as CUDA. CUDA is a proprietary extension to C++ for Nvidia devices. Once you fully understand the terminology, CUDA is a relatively straightforward extension to the mathematical code that we have been writing.
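For a flavor of what this looks like, here is a minimal map-style kernel written with Numba's CUDA extension (a sketch that assumes a CUDA-capable GPU; it is not the kernel structure used in the library):

from numba import cuda
import numpy as np

@cuda.jit
def neg_kernel(out, inp):
    # Each CUDA thread handles one position of the output.
    i = cuda.grid(1)
    if i < out.shape[0]:
        out[i] = -inp[i]

inp = np.arange(1.0, 5.0)        # [1., 2., 3., 4.]
out = np.zeros_like(inp)
threads_per_block = 32
blocks = (inp.size + threads_per_block - 1) // threads_per_block
neg_kernel[blocks, threads_per_block](out, inp)   # launch the kernel
# out is now [-1., -2., -3., -4.]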