The application uses the `__TABLES__` metadata table to gather storage statistics about every table in a dataset, looping through all datasets in the project one by one.
PLEASE NOTE: The application stores all of these stats in a dataset named `utils` and a table named `daily_storage_stats`. If they don't exist in the given project, the application creates them for you.
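The per-dataset collection described above can be sketched as follows. This is a minimal illustration, not the application's actual code: `storage_stats_query` is a hypothetical helper, and the BigQuery client calls that would execute the query and append results to `utils.daily_storage_stats` are omitted.

```python
def storage_stats_query(project_id, dataset_id):
    """Build the query that reads per-table storage stats from one
    dataset's __TABLES__ metadata table (hypothetical helper)."""
    return (
        "SELECT project_id, dataset_id, table_id, "
        "size_bytes, row_count, "
        "TIMESTAMP_MILLIS(creation_time) AS created "
        f"FROM `{project_id}.{dataset_id}.__TABLES__`"
    )

# The application would list every dataset in the project, run this
# query for each one with a BigQuery client, and append the rows to
# utils.daily_storage_stats (client calls omitted in this sketch).
print(storage_stats_query("my-project", "sales"))
```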
- Create a service account with the following permissions:
  - BigQuery Data Editor
  - BigQuery Data Reader
  - BigQuery Job User
- Create and download a service account key file in JSON format.
- Download the source code to your GCE instance or local machine and go to the root of the directory.
- Run the following commands to create and activate a virtual environment:
virtualenv venv
source venv/bin/activate
- Now install the dependencies:
pip install -r requirements.txt
- Once the requirements are installed, use the following command to run the application:
python main.py --project_id=google.com:testdhaval --service_account_file=bqwriter.json
--project_id = the project for which storage size is captured for each table
--service_account_file = location of the key file for your service account
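Inside the application, the two flags above could be parsed with `argparse`. A sketch follows; the flag names match the command shown, but the surrounding function is illustrative, not the application's actual code.

```python
import argparse

def parse_args(argv=None):
    # Parse the two required command-line flags (illustrative sketch).
    parser = argparse.ArgumentParser(
        description="Capture daily BigQuery storage stats per table.")
    parser.add_argument("--project_id", required=True,
                        help="Project whose table storage sizes are captured.")
    parser.add_argument("--service_account_file", required=True,
                        help="Path to the service account JSON key file.")
    return parser.parse_args(argv)

args = parse_args(["--project_id=my-project",
                   "--service_account_file=bqwriter.json"])
print(args.project_id)            # → my-project
print(args.service_account_file)  # → bqwriter.json
```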
Once the script finishes, you can run the following query:
select * from `{PROJECT_ID}.utils.daily_storage_stats`
`{PROJECT_ID}` - the project_id you provided as an argument when you ran the application.
**Please note:** You can schedule this application to run nightly as a cron job to get a daily snapshot of your storage usage for every table, and build a dashboard to analyze this information.
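For example, a crontab entry like the following would run the collection every night at 2 AM. The path and schedule are placeholders to adjust for your environment:

```
0 2 * * * cd /path/to/app && venv/bin/python main.py --project_id=MY_PROJECT --service_account_file=bqwriter.json
```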