Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ShowVariable Statement #3455

Merged
merged 5 commits into from
Sep 14, 2022
Merged

Conversation

waitingkuo
Copy link
Contributor

Which issue does this PR close?

Closes #3364

Rationale for this change

To Support ShowVariable

What changes are included in this PR?

Support SHOW ALL;
This will show all the options in ConfigOptions

DataFusion CLI v12.0.0
❯ show all;
+-------------------------------------------------+---------+
| name                                            | setting |
+-------------------------------------------------+---------+
| datafusion.execution.batch_size                 | 8192    |
| datafusion.explain.logical_plan_only            | false   |
| datafusion.explain.physical_plan_only           | false   |
| datafusion.optimizer.filter_null_join_keys      | false   |
| datafusion.execution.coalesce_batches           | true    |
| datafusion.execution.coalesce_target_batch_size | 4096    |
| datafusion.optimizer.skip_failed_rules          | true    |
+-------------------------------------------------+---------+
7 rows in set. Query took 0.053 seconds.

Support SHOW [VARIABLE];

❯ show datafusion.execution.batch_size;
+---------------------------------+---------+
| name                            | setting |
+---------------------------------+---------+
| datafusion.execution.batch_size | 8192    |
+---------------------------------+---------+
1 row in set. Query took 0.012 seconds.

Are there any user-facing changes?

Show [Variable]; works now

@github-actions github-actions bot added core Core DataFusion crate sql SQL Planner labels Sep 12, 2022
Comment on lines 157 to 162
builder.add_table(
&catalog_name,
INFORMATION_SCHEMA,
SETTINGS,
TableType::View,
);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add SETTINGS into information_schema

so that

❯ show tables;
+---------------+--------------------+------------+------------+
| table_catalog | table_schema       | table_name | table_type |
+---------------+--------------------+------------+------------+
| datafusion    | information_schema | tables     | VIEW       |
| datafusion    | information_schema | views      | VIEW       |
| datafusion    | information_schema | columns    | VIEW       |
| datafusion    | information_schema | settings   | VIEW       |
+---------------+--------------------+------------+------------+
4 rows in set. Query took 0.004 seconds.

Comment on lines 617 to 657
struct InformationSchemaSettingsBuilder {
names: StringBuilder,
settings: StringBuilder,
}

impl InformationSchemaSettingsBuilder {
fn new() -> Self {
Self {
names: StringBuilder::new(),
settings: StringBuilder::new(),
}
}

fn add_setting(&mut self, name: impl AsRef<str>, setting: impl AsRef<str>) {
self.names.append_value(name.as_ref());
self.settings.append_value(setting.as_ref());
}
}

impl From<InformationSchemaSettingsBuilder> for MemTable {
fn from(value: InformationSchemaSettingsBuilder) -> MemTable {
let schema = Schema::new(vec![
Field::new("name", DataType::Utf8, false),
Field::new("setting", DataType::Utf8, false),
]);

let InformationSchemaSettingsBuilder {
mut names,
mut settings,
} = value;

let schema = Arc::new(schema);
let batch = RecordBatch::try_new(
schema.clone(),
vec![Arc::new(names.finish()), Arc::new(settings.finish())],
)
.unwrap();

MemTable::try_new(schema, vec![vec![batch]]).unwrap()
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implement InformationSchemaSettingsBuilder
only two columns, name and setting supported now

this is what postgresql has

show all;

                  name                  |                                 setting                                  |                                                               description                                                               
----------------------------------------+--------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------
 allow_system_table_mods                | off                                                                      | Allows modifications of the structure of system tables.
 application_name                       | psql                                                                     | Sets the application name to be reported in statistics and logs.
 archive_cleanup_command                |                                                                          | Sets the shell command that will be executed at every restart point.
 archive_command                        | (disabled)                                                               | Sets the shell command that will be called to archive a WAL file.
 archive_mode                           | off                                                                      | Allows archiving of WAL files using archive_command.
 archive_timeout                        | 0                                                                        | Forces a switch 

Comment on lines 1133 to +1134
/// Configuration options
pub config_options: ConfigOptions,
pub config_options: Arc<RwLock<ConfigOptions>>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change SessionConfig.config_options from ConfigOptions to Arc<RwLock<ConfigOptions>
since

  1. we need to share this with CatalogWithInformationSchema
  2. allow to set config

Comment on lines +1176 to 1178
pub fn set(self, key: &str, value: ScalarValue) -> Self {
self.config_options.write().set(key, value);
self
Copy link
Contributor Author

@waitingkuo waitingkuo Sep 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config_options.write() to unlock the write lock and set the value

Comment on lines 1254 to 1260
pub fn batch_size(&self) -> usize {
self.config_options
.read()
.get_u64(OPT_BATCH_SIZE)
.try_into()
.unwrap()
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config_options.read() to unlock the read lock to get the value

Comment on lines 447 to 472
//#[tokio::test]
//async fn show_unsupported() {
// let ctx = SessionContext::with_config(SessionConfig::new());
//
// let err = plan_and_collect(&ctx, "SHOW SOMETHING_UNKNOWN")
// .await
// .unwrap_err();
//
// assert_eq!(err.to_string(), "This feature is not implemented: SHOW SOMETHING_UNKNOWN not implemented. Supported syntax: SHOW <TABLES>");
//}

// FIXME
// currently we cannot know whether a variable exists or not while the collect, this will output 0 row instead
// one way to fix this is to generate a ConfigOptions and get options' key to compare
// however config.rs is currently in core lib, could not be used by datafusion_sql due to the dependency cycle
#[tokio::test]
async fn show_unsupported() {
let ctx = SessionContext::with_config(SessionConfig::new());
async fn show_non_existing_variable() {
let ctx =
SessionContext::with_config(SessionConfig::new().with_information_schema(true));

let err = plan_and_collect(&ctx, "SHOW SOMETHING_UNKNOWN")
let result = plan_and_collect(&ctx, "SHOW SOMETHING_UNKNOWN")
.await
.unwrap_err();
.unwrap();

assert_eq!(err.to_string(), "This feature is not implemented: SHOW SOMETHING_UNKNOWN not implemented. Supported syntax: SHOW <TABLES>");
assert_eq!(result.len(), 0);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently we cannot know whether a variable exists or not, it will always show 0 rows instead while trying to show non-existing variable

❯ show a;
0 rows in set. Query took 0.003 seconds.

one way to solve this is to generate a ConfigOptions here, however config.rs is currently in core lib, could not be used by datafusion_sql due to the dependency cycle

Comment on lines 2379 to 2402
fn show_variable_to_plan(&self, variable: &[Ident]) -> Result<LogicalPlan> {
let variable = ObjectName(variable.to_vec()).to_string();
Err(DataFusionError::NotImplemented(format!(
"SHOW {} not implemented. Supported syntax: SHOW <TABLES>",
variable
)))

if !self.has_table("information_schema", "settings") {
return Err(DataFusionError::Plan(
"SHOW [VARIABLE] is not supported unless information_schema is enabled"
.to_string(),
));
}

let query = if variable.to_lowercase() == "all" {
String::from("SELECT name, setting FROM information_schema.settings")
} else {
format!(
"SELECT name, setting FROM information_schema.settings WHERE name = '{}'",
variable
)
};

let mut rewrite = DFParser::parse_sql(&query)?;
assert_eq!(rewrite.len(), 1);

self.statement_to_plan(rewrite.pop_front().unwrap())
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

show [Variable] will be converted to the select statement

@codecov-commenter
Copy link

codecov-commenter commented Sep 12, 2022

Codecov Report

Merging #3455 (0aa1055) into master (f48a997) will increase coverage by 0.03%.
The diff coverage is 94.28%.

@@            Coverage Diff             @@
##           master    #3455      +/-   ##
==========================================
+ Coverage   85.71%   85.74%   +0.03%     
==========================================
  Files         298      299       +1     
  Lines       54882    55224     +342     
==========================================
+ Hits        47040    47354     +314     
- Misses       7842     7870      +28     
Impacted Files Coverage Δ
datafusion/core/src/physical_plan/planner.rs 77.29% <ø> (+0.04%) ⬆️
datafusion/core/src/execution/context.rs 79.31% <80.00%> (+0.25%) ⬆️
datafusion/sql/src/planner.rs 80.95% <81.81%> (+0.06%) ⬆️
datafusion/core/src/catalog/information_schema.rs 94.96% <100.00%> (+0.58%) ⬆️
datafusion/core/tests/sql/information_schema.rs 98.70% <100.00%> (+0.09%) ⬆️
datafusion/expr/src/expr_fn.rs 90.13% <0.00%> (-1.13%) ⬇️
...atafusion/physical-expr/src/expressions/in_list.rs 85.00% <0.00%> (-0.50%) ⬇️
datafusion/expr/src/logical_plan/plan.rs 77.35% <0.00%> (-0.17%) ⬇️
datafusion/core/tests/parquet_pruning.rs 99.43% <0.00%> (ø)
... and 8 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THanks @waitingkuo -- I think this PR could be merged as is but I have a question about tests and a suggestion on naming that I think are worth considering

@@ -436,15 +444,40 @@ async fn information_schema_show_table_table_names() {
);
}

//#[tokio::test]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean to leave this commented out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i thought we might have a way to check whether a variable exists or not in the future

const INFORMATION_SCHEMA: &str = "information_schema";
const TABLES: &str = "tables";
const VIEWS: &str = "views";
const COLUMNS: &str = "columns";
const SETTINGS: &str = "settings";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think settings is an "offical" information_schema (if there is such a thing) table. Perhaps we could call this table something like df_settings to make it clear that this table is a datafusion extension?

https://www.postgresql.org/docs/current/information-schema.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no it's not. postgresl has this

willy=#  select name, setting from pg_settings limit 1;
          name           | setting 
-------------------------+---------
 allow_system_table_mods | off
(1 row)

df_settings sounds good. should we leave it in information_schema, or move it out?

Copy link
Contributor

@alamb alamb Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

df_settings sounds good. should we leave it in information_schema, or move it out?

I don't have any preference -- whatever is easier at first (information_schema seems reasonable to me)

Copy link
Contributor Author

@waitingkuo waitingkuo Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb updated

❯ show tables;
+---------------+--------------------+-------------+------------+
| table_catalog | table_schema       | table_name  | table_type |
+---------------+--------------------+-------------+------------+
| datafusion    | information_schema | tables      | VIEW       |
| datafusion    | information_schema | views       | VIEW       |
| datafusion    | information_schema | columns     | VIEW       |
| datafusion    | information_schema | df_settings | VIEW       |
+---------------+--------------------+-------------+------------+
4 rows in set. Query took 0.003 seconds.


❯ select * from information_schema.df_settings;
+-------------------------------------------------+---------+
| name                                            | setting |
+-------------------------------------------------+---------+
| datafusion.execution.coalesce_batches           | true    |
| datafusion.explain.physical_plan_only           | false   |
| datafusion.execution.coalesce_target_batch_size | 4096    |
| datafusion.execution.batch_size                 | 8192    |
| datafusion.optimizer.skip_failed_rules          | true    |
| datafusion.optimizer.filter_null_join_keys      | false   |
| datafusion.explain.logical_plan_only            | false   |
+-------------------------------------------------+---------+
8 rows in set. Query took 0.002 seconds.

const INFORMATION_SCHEMA: &str = "information_schema";
const TABLES: &str = "tables";
const VIEWS: &str = "views";
const COLUMNS: &str = "columns";
const DF_SETTINGS: &str = "df_settings";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename settings -> df_settings

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me - thanks @waitingkuo

@alamb alamb merged commit 0388682 into apache:master Sep 14, 2022
@ursabot
Copy link

ursabot commented Sep 14, 2022

Benchmark runs are scheduled for baseline = 9fbee1a and contender = 0388682. 0388682 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

support Statement::ShowVariable to show session configs
4 participants