Bring BigQuery samples up to standard. #358

Merged · 1 commit · Apr 25, 2017
79 changes: 41 additions & 38 deletions bigquery/README.md
@@ -42,21 +42,21 @@
__Usage:__ `node datasets --help`

```
Commands:
create <datasetId> Creates a new dataset.
delete <datasetId> Deletes a dataset.
list [projectId] Lists all datasets in the specified project or the current project.
size <datasetId> [projectId] Calculates the size of a dataset.
create <datasetId> Creates a new dataset.
delete <datasetId> Deletes a dataset.
list Lists datasets.

Options:
--help Show help [boolean]
--projectId, -p The Project ID to use. Defaults to the value of the GCLOUD_PROJECT or GOOGLE_CLOUD_PROJECT
environment variables. [string]
--help Show help [boolean]

Examples:
node datasets create my_dataset Creates a new dataset named "my_dataset".
node datasets delete my_dataset Deletes a dataset named "my_dataset".
node datasets list Lists all datasets in the current project.
node datasets list bigquery-public-data Lists all datasets in the "bigquery-public-data" project.
node datasets size my_dataset Calculates the size of "my_dataset" in the current project.
node datasets size hacker_news bigquery-public-data Calculates the size of "bigquery-public-data:hacker_news".
node datasets.js create my_dataset Creates a new dataset named "my_dataset".
node datasets.js delete my_dataset Deletes a dataset named "my_dataset".
node datasets.js list Lists all datasets in the project specified by the
GCLOUD_PROJECT or GOOGLE_CLOUD_PROJECT environments variables.
node datasets.js list --projectId=bigquery-public-data Lists all datasets in the "bigquery-public-data" project.

For more information, see https://cloud.google.com/bigquery/docs
```
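The new `--projectId` option documented above defaults to the `GCLOUD_PROJECT` or `GOOGLE_CLOUD_PROJECT` environment variables. A minimal sketch of that resolution order (the `resolveProjectId` helper is illustrative, not part of the samples):

```javascript
// Illustrative sketch of the --projectId fallback described in the help text.
// An explicit flag value wins; otherwise the environment variables are consulted.
function resolveProjectId (flagValue) {
  return flagValue ||
    process.env.GCLOUD_PROJECT ||
    process.env.GOOGLE_CLOUD_PROJECT ||
    undefined;
}

process.env.GCLOUD_PROJECT = 'env-project';
console.log(resolveProjectId('bigquery-public-data')); // "bigquery-public-data"
console.log(resolveProjectId(undefined)); // "env-project"
```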
@@ -77,14 +77,16 @@
Commands:
shakespeare Queries a public Shakespeare dataset.

Options:
--help Show help [boolean]
--projectId, -p The Project ID to use. Defaults to the value of the GCLOUD_PROJECT or GOOGLE_CLOUD_PROJECT
environment variables. [string]
--help Show help [boolean]

Examples:
node queries sync "SELECT * FROM publicdata.samples.natality Synchronously queries the natality dataset.
LIMIT 5;"
node queries async "SELECT * FROM Queries the natality dataset as a job.
node queries.js sync "SELECT * FROM Synchronously queries the natality dataset.
publicdata.samples.natality LIMIT 5;"
node queries shakespeare Queries a public Shakespeare dataset.
node queries.js async "SELECT * FROM Queries the natality dataset as a job.
publicdata.samples.natality LIMIT 5;"
node queries.js shakespeare Queries a public Shakespeare dataset.

For more information, see https://cloud.google.com/bigquery/docs
```
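The `sync` and `async` commands above differ in whether rows come back directly or via a job handle that is then queried for results. A hedged sketch of that distinction (all names below are stand-ins, not the BigQuery client API):

```javascript
// Stand-in for a synchronous query: rows resolve directly.
function syncQuery (sqlQuery) {
  return Promise.resolve([[{ word: 'hamlet' }]]);
}

// Stand-in for an asynchronous query: a job handle resolves first,
// and its results are fetched in a second step.
function asyncQuery (sqlQuery) {
  const job = {
    id: 'job_123',
    getQueryResults: () => Promise.resolve([[{ word: 'hamlet' }]])
  };
  return Promise.resolve([job]);
}

asyncQuery('SELECT ...')
  .then((results) => results[0])
  .then((job) => {
    console.log(`Job ${job.id} started.`);
    return job.getQueryResults();
  })
  .then((results) => console.log(`Rows: ${results[0].length}`));
```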
@@ -100,41 +102,42 @@
__Usage:__ `node tables --help`

```
Commands:
create <datasetId> <tableId> <schema> [projectId] Creates a new table.
list <datasetId> [projectId] Lists all tables in a dataset.
delete <datasetId> <tableId> [projectId] Deletes a table.
create <datasetId> <tableId> <schema> Creates a new table.
list <datasetId> Lists all tables in a dataset.
delete <datasetId> <tableId> Deletes a table.
copy <srcDatasetId> <srcTableId> <destDatasetId> Makes a copy of a table.
<destTableId> [projectId]
browse <datasetId> <tableId> [projectId] Lists rows in a table.
import <datasetId> <tableId> <fileName> [projectId] Imports data from a local file into a table.
<destTableId>
browse <datasetId> <tableId> Lists rows in a table.
import <datasetId> <tableId> <fileName> Imports data from a local file into a table.
import-gcs <datasetId> <tableId> <bucketName> <fileName> Imports data from a Google Cloud Storage file into a
[projectId] table.
table.
export <datasetId> <tableId> <bucketName> <fileName> Export a table from BigQuery to Google Cloud Storage.
[projectId]
insert <datasetId> <tableId> <json_or_file> [projectId] Insert a JSON array (as a string or newline-delimited
insert <datasetId> <tableId> <json_or_file> Insert a JSON array (as a string or newline-delimited
file) into a BigQuery table.

Options:
--help Show help [boolean]
--projectId, -p The Project ID to use. Defaults to the value of the GCLOUD_PROJECT or GOOGLE_CLOUD_PROJECT
environment variables. [string]
--help Show help [boolean]

Examples:
node tables create my_dataset my_table "Name:string, Createss a new table named "my_table" in "my_dataset".
node tables.js create my_dataset my_table "Name:string, Creates a new table named "my_table" in "my_dataset".
Age:integer, Weight:float, IsMagic:boolean"
node tables list my_dataset Lists tables in "my_dataset".
node tables browse my_dataset my_table Displays rows from "my_table" in "my_dataset".
node tables delete my_dataset my_table Deletes "my_table" from "my_dataset".
node tables import my_dataset my_table ./data.csv Imports a local file into a table.
node tables import-gcs my_dataset my_table my-bucket Imports a GCS file into a table.
node tables.js list my_dataset Lists tables in "my_dataset".
node tables.js browse my_dataset my_table Displays rows from "my_table" in "my_dataset".
node tables.js delete my_dataset my_table Deletes "my_table" from "my_dataset".
node tables.js import my_dataset my_table ./data.csv Imports a local file into a table.
node tables.js import-gcs my_dataset my_table my-bucket Imports a GCS file into a table.
data.csv
node tables export my_dataset my_table my-bucket my-file Exports my_dataset:my_table to gcs://my-bucket/my-file
node tables.js export my_dataset my_table my-bucket my-file Exports my_dataset:my_table to gcs://my-bucket/my-file
as raw CSV.
node tables export my_dataset my_table my-bucket my-file -f Exports my_dataset:my_table to gcs://my-bucket/my-file
JSON --gzip as gzipped JSON.
node tables insert my_dataset my_table json_string Inserts the JSON array represented by json_string into
node tables.js export my_dataset my_table my-bucket my-file Exports my_dataset:my_table to gcs://my-bucket/my-file
-f JSON --gzip as gzipped JSON.
node tables.js insert my_dataset my_table json_string Inserts the JSON array represented by json_string into
my_dataset:my_table.
node tables insert my_dataset my_table json_file Inserts the JSON objects contained in json_file (one per
node tables.js insert my_dataset my_table json_file Inserts the JSON objects contained in json_file (one per
line) into my_dataset:my_table.
node tables copy src_dataset src_table dest_dataset Copies src_dataset:src_table to dest_dataset:dest_table.
node tables.js copy src_dataset src_table dest_dataset Copies src_dataset:src_table to dest_dataset:dest_table.
dest_table

For more information, see https://cloud.google.com/bigquery/docs
```
162 changes: 84 additions & 78 deletions bigquery/datasets.js
@@ -15,123 +15,129 @@

'use strict';

const BigQuery = require('@google-cloud/bigquery');
function createDataset (datasetId, projectId) {
// [START bigquery_create_dataset]
// Imports the Google Cloud client library
const BigQuery = require('@google-cloud/bigquery');

// The project ID to use, e.g. "your-project-id"
// const projectId = "your-project-id";
**Comment (Contributor):** Standards question: should we consistently specify project IDs in samples, or not?

(I don't think we enforce this particular standard yet, but I could be wrong...)

**Comment (Contributor):** If we do specify them, we should also test to make sure they are (properly) configurable - i.e. test the sample without GCLOUD_PROJECT being set (or being set to the empty string).

**Reply (@jmdobry, Member/Author, Apr 25, 2017):** Configuring the project ID via the CLI is generally something we don't do for other products' samples. With BigQuery, however, it is more common to be explicit about the project ID because of public datasets. Many of the BigQuery samples already allowed a configurable project ID via the CLI; I just made all the BigQuery samples consistent.

**Reply (Contributor):** 👍

// [START bigquery_create_dataset]
function createDataset (datasetId) {
// Instantiates a client
const bigquery = BigQuery();
const bigquery = BigQuery({
projectId: projectId
});

// The ID for the new dataset, e.g. "my_new_dataset"
// const datasetId = "my_new_dataset";

// Creates a new dataset, e.g. "my_new_dataset"
return bigquery.createDataset(datasetId)
// Creates a new dataset
bigquery.createDataset(datasetId)
.then((results) => {
const dataset = results[0];
console.log(`Dataset ${dataset.id} created.`);
return dataset;
})
.catch((err) => {
console.error('ERROR:', err);
});
// [END bigquery_create_dataset]
}
// [END bigquery_create_dataset]

// [START bigquery_delete_dataset]
function deleteDataset (datasetId) {
function deleteDataset (datasetId, projectId) {
// [START bigquery_delete_dataset]
// Imports the Google Cloud client library
const BigQuery = require('@google-cloud/bigquery');

// The project ID to use, e.g. "your-project-id"
// const projectId = "your-project-id";

// Instantiates a client
const bigquery = BigQuery();
const bigquery = BigQuery({
projectId: projectId
});

// The ID of the dataset to delete, e.g. "my_new_dataset"
// const datasetId = "my_new_dataset";

// References an existing dataset, e.g. "my_dataset"
// Creates a reference to the existing dataset
const dataset = bigquery.dataset(datasetId);

// Deletes the dataset
return dataset.delete()
dataset.delete()
.then(() => {
console.log(`Dataset ${dataset.id} deleted.`);
})
.catch((err) => {
console.error('ERROR:', err);
});
// [END bigquery_delete_dataset]
}
// [END bigquery_delete_dataset]

// [START bigquery_list_datasets]
function listDatasets (projectId) {
// [START bigquery_list_datasets]
// Imports the Google Cloud client library
const BigQuery = require('@google-cloud/bigquery');

// The project ID to use, e.g. "your-project-id"
// const projectId = "your-project-id";

// Instantiates a client
const bigquery = BigQuery({
projectId: projectId
});

// Lists all datasets in the specified project
return bigquery.getDatasets()
bigquery.getDatasets()
.then((results) => {
const datasets = results[0];
console.log('Datasets:');
datasets.forEach((dataset) => console.log(dataset.id));
return datasets;
})
.catch((err) => {
console.error('ERROR:', err);
});
// [END bigquery_list_datasets]
}
// [END bigquery_list_datasets]

// [START bigquery_get_dataset_size]
function getDatasetSize (datasetId, projectId) {
// Instantiate a client
const bigquery = BigQuery({
projectId: projectId
});

// References an existing dataset, e.g. "my_dataset"
const dataset = bigquery.dataset(datasetId);

// Lists all tables in the dataset
return dataset.getTables()
.then((results) => results[0])
// Retrieve the metadata for each table
.then((tables) => Promise.all(tables.map((table) => table.get())))
.then((results) => results.map((result) => result[0]))
// Select the size of each table
.then((tables) => tables.map((table) => (parseInt(table.metadata.numBytes, 10) / 1000) / 1000))
// Sum up the sizes
.then((sizes) => sizes.reduce((cur, prev) => cur + prev, 0))
// Print and return the size
.then((sum) => {
console.log(`Size of ${dataset.id}: ${sum} MB`);
return sum;
});
}
// [END bigquery_get_dataset_size]

// The command-line program
const cli = require(`yargs`);

const program = module.exports = {
createDataset: createDataset,
deleteDataset: deleteDataset,
listDatasets: listDatasets,
getDatasetSize: getDatasetSize,
main: (args) => {
// Run the command-line program
cli.help().strict().parse(args).argv; // eslint-disable-line
}
};

cli
const cli = require(`yargs`)
.demand(1)
.command(`create <datasetId>`, `Creates a new dataset.`, {}, (opts) => {
program.createDataset(opts.datasetId);
})
.command(`delete <datasetId>`, `Deletes a dataset.`, {}, (opts) => {
program.deleteDataset(opts.datasetId);
})
.command(`list [projectId]`, `Lists all datasets in the specified project or the current project.`, {}, (opts) => {
program.listDatasets(opts.projectId || process.env.GCLOUD_PROJECT);
})
.command(`size <datasetId> [projectId]`, `Calculates the size of a dataset.`, {}, (opts) => {
program.getDatasetSize(opts.datasetId, opts.projectId || process.env.GCLOUD_PROJECT);
.options({
projectId: {
alias: 'p',
default: process.env.GCLOUD_PROJECT || process.env.GOOGLE_CLOUD_PROJECT,
description: 'The Project ID to use. Defaults to the value of the GCLOUD_PROJECT or GOOGLE_CLOUD_PROJECT environment variables.',
requiresArg: true,
type: 'string'
}
})
.command(
`create <datasetId>`,
`Creates a new dataset.`,
{},
(opts) => createDataset(opts.datasetId, opts.projectId)
)
.command(
`delete <datasetId>`,
`Deletes a dataset.`,
{},
(opts) => deleteDataset(opts.datasetId, opts.projectId)
)
.command(
`list`,
`Lists datasets.`,
{},
(opts) => listDatasets(opts.projectId)
)
.example(`node $0 create my_dataset`, `Creates a new dataset named "my_dataset".`)
.example(`node $0 delete my_dataset`, `Deletes a dataset named "my_dataset".`)
.example(`node $0 list`, `Lists all datasets in the current project.`)
.example(`node $0 list bigquery-public-data`, `Lists all datasets in the "bigquery-public-data" project.`)
.example(`node $0 size my_dataset`, `Calculates the size of "my_dataset" in the current project.`)
.example(`node $0 size hacker_news bigquery-public-data`, `Calculates the size of "bigquery-public-data:hacker_news".`)
.example(`node $0 list`, `Lists all datasets in the project specified by the GCLOUD_PROJECT or GOOGLE_CLOUD_PROJECT environments variables.`)
.example(`node $0 list --projectId=bigquery-public-data`, `Lists all datasets in the "bigquery-public-data" project.`)
.wrap(120)
.recommendCommands()
.epilogue(`For more information, see https://cloud.google.com/bigquery/docs`);
.epilogue(`For more information, see https://cloud.google.com/bigquery/docs`)
.help()
.strict();

if (module === require.main) {
program.main(process.argv.slice(2));
cli.parse(process.argv.slice(2));
}
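A pattern worth noting in the diff above: the client library resolves promises with an array whose first element is the payload, hence the repeated `results[0]` destructuring. A self-contained stand-in (`fakeCreateDataset` is illustrative, not the real API):

```javascript
// Mimics the [payload, ...] promise resolution shape used in the samples above.
function fakeCreateDataset (datasetId) {
  return Promise.resolve([{ id: datasetId }]);
}

fakeCreateDataset('my_new_dataset')
  .then((results) => {
    const dataset = results[0];
    console.log(`Dataset ${dataset.id} created.`);
  });
```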
2 changes: 1 addition & 1 deletion bigquery/package.json
@@ -27,7 +27,7 @@
"yargs": "7.1.0"
},
"devDependencies": {
"@google-cloud/nodejs-repo-tools": "1.3.1",
"@google-cloud/nodejs-repo-tools": "1.3.2",
"ava": "0.19.1",
"proxyquire": "1.7.11",
"sinon": "2.1.0",