This repository contains three practice notebooks:
- A general introduction to PySpark syntax for ETL data processing
- An example of an ML-related task (a regression problem) solved with PySpark's built-in modules
- An example of parallelizing the calculation of Pi in PySpark
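
To give a flavour of the third notebook, here is a minimal sketch of one common way to parallelize a Pi estimate in PySpark: Monte Carlo sampling, where each task counts random points that fall inside the unit quarter circle. This is an assumption about the approach, not necessarily the method the notebook uses. The names `count_hits`, `estimate_pi`, and `main`, and the parameter defaults, are hypothetical and chosen for illustration only.

```python
import random


def count_hits(seed, n_samples):
    """Count random points in the unit square that land inside the quarter circle."""
    rng = random.Random(seed)  # a per-task seed keeps each task's stream reproducible
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(sc, n_tasks=8, samples_per_task=100_000):
    """Estimate Pi on a SparkContext `sc` by summing hits across parallel tasks."""
    # One seed per task; numSlices puts each seed in its own partition.
    seeds = sc.parallelize(range(n_tasks), numSlices=n_tasks)
    total_hits = seeds.map(lambda s: count_hits(s, samples_per_task)).sum()
    # Area ratio of quarter circle to unit square is Pi / 4.
    return 4.0 * total_hits / (n_tasks * samples_per_task)


def main():
    # Imported here so count_hits can be tried without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").appName("pi-sketch").getOrCreate()
    try:
        print(estimate_pi(spark.sparkContext))
    finally:
        spark.stop()
```

Calling `main()` runs the estimate on a local Spark session. The sampling logic lives in a plain Python function so that it can be checked on its own, and Spark is only responsible for distributing the seeds and summing the per-task counts.
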

Homework and projects for the High Performance Python Lab (HPPL '21) course at Skoltech.