Project Themis

Project Themis is an open benchmarking framework for big data solutions.

Architecture

Themis consists of 5 layers:

Layer 1: Spec

The Themis spec defines the components needed for benchmarking, such as generated datasets, benchmark workflows, and reporting. Implementations of the spec exist in different languages, such as Python and Java.

Layer 2: Artifacts

Artifacts are materialized implementations of the Themis spec that are required for benchmark execution, such as a JAR for a Spark data generation application or a text file containing a set of SQL queries to benchmark against.

Layer 3: CLI

A Python CLI is used to orchestrate each step in a benchmark execution.

Layer 4: Airflow

Apache Airflow is used to define a complete benchmark workflow as a DAG and to automate orchestration: each task in the DAG invokes the CLI directly to execute its command.
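The relationship between the DAG and the CLI can be illustrated with a minimal standard-library sketch. It does not use Airflow itself; it only shows how a dependency graph of steps resolves into an ordered list of CLI invocations. The step names and the `themis datagen`, `themis benchmark`, and `themis report` commands are hypothetical placeholders, since this README does not document the actual CLI commands.

```python
import shlex
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical steps: the real Themis CLI commands are not shown in this README.
STEPS = {
    "datagen": ["themis", "datagen"],
    "benchmark": ["themis", "benchmark"],
    "report": ["themis", "report"],
}

# Each step maps to the set of steps that must finish before it starts.
DEPENDS_ON = {
    "datagen": set(),
    "benchmark": {"datagen"},
    "report": {"benchmark"},
}


def execution_plan(steps, depends_on):
    """Return the CLI command lines in an order that respects dependencies."""
    order = TopologicalSorter(depends_on).static_order()
    return [shlex.join(steps[name]) for name in order]


if __name__ == "__main__":
    for command in execution_plan(STEPS, DEPENDS_ON):
        print(command)
```

In an actual Airflow deployment, each entry would presumably become a task that shells out to the CLI, with the scheduler handling ordering, retries, and logging.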

Layer 5: CDK

AWS CDK integration is provided for bootstrapping a Themis benchmark environment on AWS.

Getting Started

Build the shadow JAR:

./gradlew :themis-spark-iceberg:shadowJar

Upload it to S3 (replace the bucket name with your own):

aws s3 cp spark-iceberg/build/libs/themis-spark-iceberg-0.1.0-all.jar s3://yzhaoqin-iceberg-test/themis/themis-spark-iceberg-0.1.0-all.jar

Create an EMR Spark cluster and make sure the AWS Glue Data Catalog is enabled for both Spark and Hive table metadata. Then run an EMR step using command-runner.jar with the following arguments:

spark-submit 
--class io.themis.spark.iceberg.IcebergDatagen
s3://yzhaoqin-iceberg-test/themis/themis-spark-iceberg-0.1.0-all.jar
--gen random 
--database-name themis_simple 
--database-location s3://yzhaoqin-iceberg-test/themis/themis_simple
--drop-database-if-exists 
--parallelism 100 
--row-count 10000
--replace-tables

After the step completes, the themis_simple database should contain 3 tables, corresponding to the ones defined in io.themis.datagen.SimpleTables.
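For scripted setups, the step above can be described as a plain Python dictionary. The following is a minimal sketch, not part of Themis: the helper names `build_datagen_args` and `build_emr_step`, the step name, and the `ActionOnFailure` value are assumptions chosen for illustration. The flags mirror the spark-submit command shown above.

```python
def build_datagen_args(jar_uri, database_name, database_location,
                       row_count=10000, parallelism=100):
    """Assemble the spark-submit argument list for the Iceberg datagen job.

    Hypothetical helper: the flags are copied from the command in this README.
    """
    return [
        "spark-submit",
        "--class", "io.themis.spark.iceberg.IcebergDatagen",
        jar_uri,
        "--gen", "random",
        "--database-name", database_name,
        "--database-location", database_location,
        "--drop-database-if-exists",
        "--parallelism", str(parallelism),
        "--row-count", str(row_count),
        "--replace-tables",
    ]


def build_emr_step(args, name="themis-datagen"):
    """Wrap the arguments in an EMR step definition that uses command-runner.jar."""
    return {
        "Name": name,                   # assumed step name
        "ActionOnFailure": "CONTINUE",  # assumed; pick the policy that suits you
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": args,
        },
    }


if __name__ == "__main__":
    # Placeholder bucket: substitute your own S3 locations.
    step = build_emr_step(build_datagen_args(
        "s3://<your-bucket>/themis/themis-spark-iceberg-0.1.0-all.jar",
        "themis_simple",
        "s3://<your-bucket>/themis/themis_simple",
    ))
    print(step)
```

A dictionary of this shape can be submitted with boto3's EMR client, for example `add_job_flow_steps(JobFlowId=..., Steps=[step])`, where the cluster ID is whatever your EMR cluster reports.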
