TPC-DS on BigQuery

Credits: Most scripts are taken from the Fivetran DW Benchmark and have been adapted to suit our particular use case.

Steps:

Upload the dsdgen binary to the GCS bucket location specified in the bootstrap script
Create a high-CPU VM, e.g. 16 vCPUs
Clone this repository

git clone $REPO_URL

chmod +x *.sh

Run bootstrap.sh
1. Pulls the dsdgen binary
2. Installs Cloud Storage FUSE (gcsfuse), which is used to mount a GCS bucket as a local folder
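As a rough illustration of the first bootstrap step, the sketch below prints (or, with `DRY_RUN=0`, executes) the commands that would copy the generator out of a bucket. The bucket name, object path, and the `run` and `fetch_dsdgen` helpers are hypothetical placeholders, not the values or functions used in bootstrap.sh.

```shell
#!/usr/bin/env bash
# Sketch only: bucket and object path are hypothetical placeholders,
# not the values hard-coded in bootstrap.sh.

# Print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Copy the dsdgen binary from GCS into the current directory
# and mark it executable.
fetch_dsdgen() {
  local bucket="$1" object_path="$2"
  run gsutil cp "gs://${bucket}/${object_path}" ./dsdgen
  run chmod +x ./dsdgen
}
```

Calling `fetch_dsdgen my-bucket tools/dsdgen` in dry-run mode only lists the two commands, which is a cheap way to check the path before touching the VM.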
Run data_gen.sh
Usage:
```
./data_gen.sh $CPU $SCALE
```
1. Generates the data
2. $CPU sets the degree of parallelism and must be > 1
3. $SCALE sets the scale of the data to generate
4. Creates and mounts a GCS bucket and writes the data to it
5. NOTE: Keep $CPU close to the number of CPUs in the VM for efficient parallel generation
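The parallel generation described above can be sketched as follows. This is not the content of data_gen.sh; it assumes the script fans out one dsdgen child per CPU using dsdgen's `-PARALLEL` and `-CHILD` options, and the output directory (standing in for the mounted bucket) and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: an assumed shape for parallel generation, not data_gen.sh itself.

# Emit one dsdgen command line per child process.
# Arguments: CPU (parallelism), SCALE, output directory.
dsdgen_commands() {
  local cpu="$1" scale="$2" out_dir="$3" child
  if [ "$cpu" -le 1 ]; then
    echo "CPU must be greater than 1" >&2
    return 1
  fi
  for child in $(seq 1 "$cpu"); do
    printf './dsdgen -SCALE %s -DIR %s -PARALLEL %s -CHILD %s\n' \
      "$scale" "$out_dir" "$cpu" "$child"
  done
}

# Start every child in the background, then wait for all of them.
# The exit statuses of the children are not checked in this sketch.
generate_parallel() {
  local cmds cmd
  cmds="$(dsdgen_commands "$@")" || return 1
  while IFS= read -r cmd; do
    bash -c "$cmd" &
  done <<< "$cmds"
  wait
}
```

Running `dsdgen_commands 16 100 /mnt/tpcds` first shows the exact command lines, so the fan-out can be reviewed before `generate_parallel 16 100 /mnt/tpcds` launches it.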
Run load_data.sh
Usage:
```
./load_data.sh $SCALE
```
1. Loads the data from the GCS bucket created by data_gen.sh into BigQuery
2. $SCALE denotes the scale of the data to load into BigQuery
3. NOTE: Before running this step, ensure that the data has been generated and is present in the appropriate GCS bucket
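A load step of this kind might look like the sketch below, which prints one `bq load` command per table. It rests on several assumptions that are not taken from load_data.sh: the destination tables already exist with their schemas, the files are pipe-delimited, and each table's files sit under their own folder in the bucket. The dataset name, bucket layout, and function name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: dataset name and bucket layout are hypothetical,
# and the destination tables are assumed to exist already.

# Emit one bq load command per table.
# Arguments: dataset, base GCS URI, then any number of table names.
bq_load_commands() {
  local dataset="$1" base_uri="$2" table
  shift 2
  for table in "$@"; do
    printf "bq load --source_format=CSV --field_delimiter='|' %s.%s '%s/%s/*.dat'\n" \
      "$dataset" "$table" "$base_uri" "$table"
  done
}
```

For example, `bq_load_commands tpcds_100 gs://my-bucket/100 store item` lists the commands for two tables; piping that output to `bash` would execute them.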
Run benchmark.sh
Usage:
```
./benchmark.sh $SCALE
```
1. Runs the TPC-DS queries and measures query execution time
2. Generates a CSV file in the results folder containing each query's start_time and end_time
3. Saves query statistics in the same directory
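The timing loop can be sketched roughly as below. It is an assumed shape, not the contents of benchmark.sh: the `.sql` file layout, the `query` column, the epoch-second timestamps, and both function names are hypothetical. Only the start_time and end_time columns come from the description above.

```shell
#!/usr/bin/env bash
# Sketch only: file layout and CSV format are assumptions.

# Run one command and append "name,start_time,end_time" to a CSV.
# Timestamps are epoch seconds; the command's stdout is discarded.
time_query() {
  local csv="$1" name="$2" start end
  shift 2
  start="$(date +%s)"
  "$@" > /dev/null
  end="$(date +%s)"
  printf '%s,%s,%s\n' "$name" "$start" "$end" >> "$csv"
}

# Time every .sql file in a directory with bq query (standard SQL).
run_benchmark() {
  local query_dir="$1" csv="$2" file
  mkdir -p "$(dirname "$csv")"
  echo "query,start_time,end_time" > "$csv"
  for file in "$query_dir"/*.sql; do
    time_query "$csv" "$(basename "$file" .sql)" \
      bq query --use_legacy_sql=false "$(cat "$file")"
  done
}
```

Because `time_query` accepts any command, it can be tried locally with something harmless, such as `time_query /tmp/out.csv smoke sleep 1`, before pointing `run_benchmark` at real queries.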
