Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding limited support for case-sensitive table names #8674

Closed
wants to merge 4 commits into from

Conversation

Drizzt321
Copy link

Based on needs in issue #2863, where in RDBMS that supports case-sensitive table names and have tables with case-sensitive names, then those tables are unable to be used with PrestoDB. This change adds optional (configuration controlled) mapping for table names when executing queries that end up calling getTableHandle(). It maps a lowercased name to the name of the table from the database, and supports reloading the mapping in the event that a table is referenced but doesn't currently have a mapping so new tables can be used without restarting Presto.

A test was attempted to be written similar to the TestJdbcClient with a new TestingDatabase, however it was discovered that H2 will return TRUE for metadata.storesUpperCaseIdentifiers() even if DATABASE_TO_UPPER=FALSE is set. The only way to have that return FALSE is using MODE=MYSQL, however in that case H2 lowercases all table names. So it is not possible to create a test with H2 without modifying the section of code with the call to metadata.storesUpperCaseIdentifiers().

As a first time PR for this project from me, I did complete the CLA as per the repository contribution guidelines.

Based on needs in issue prestodb#2863,
where in RDBMS that supports case-sensitive table names and have
tables with case-sensitive names, then those tables are unable to be
used with PrestoDB. This change adds optional (configuration controlled)
mapping for table names when executing queries that end up calling
getTableHandle(). It maps a lowercased name to the name of the table
from the database, and supports reloading the mapping in the event that
a table is referenced but doesn't currently have a mapping so new tables
can be used without restarting Presto.

A test was attempted to be written similar to the TestJdbcClient with a
new TestingDatabase, however it was discovered that H2 will return TRUE
for metadata.storesUpperCaseIdentifiers() even if
DATABASE_TO_UPPER=FALSE is set. The only way to have that return FALSE
is using MODE=MYSQL, however in that case H2 lowercases all table names.
So it is not possible to create a test with H2 without modifying the
section of code with the call to metadata.storesUpperCaseIdentifiers().
@@ -101,6 +104,10 @@
protected final String connectionUrl;
protected final Properties connectionProperties;
protected final String identifierQuote;
private final boolean mapTableNames;

private Map<String, Map<String, String>> schemaTableMapping = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not like Map of Map things, can you please extract some class instead? That class could have different implementation for mapTableNames equal to true and false. You could have a lock there as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is caching such information is correct? What if someone do some rename in underlying database in the meantime?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can extract that to a separate Class. Is it preferred to be an inner class for this type of usage? Or a properly separate class file?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, in response about the lack of tests, if you read the 2nd part of my initial commit comment:

A test was attempted to be written similar to the TestJdbcClient with a new TestingDatabase, however it was discovered that H2 will return TRUE for metadata.storesUpperCaseIdentifiers() even if DATABASE_TO_UPPER=FALSE is set. The only way to have that return FALSE is using MODE=MYSQL, however in that case H2 lowercases all table names. So it is not possible to create a test with H2 without modifying the section of code with the call to metadata.storesUpperCaseIdentifiers().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can extract that to a separate Class. Is it preferred to be an inner class for this type of usage? Or a properly separate class file?

Up to you. Choose the thing what fits bets.

Regarding the tests, sorry for not reading the commit message. What about using PSQL instead of H2?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was using what was there for the existing tests. Also was other wanting to end up with external applications needing to be run. You're referring to Postgres? In a quick search I actually see some Java libs for embedding Postgres and MySQL for testing purposes, I'll look into both if them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MySQL and PostgreSQL connectors already run tests against the embedded versions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use a Guava LoadingCache which will allow expiration and background refresh.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know, looking at the MySQL specific connector would have been smart to look at for the tests. Thanks for pointing that out, and I'll look at LoadingCache.

@kokosing
Copy link
Contributor

kokosing commented Aug 7, 2017

Also I have not noticed tests, can you please point me what ensures coverage for your change?

{
// Only have 1 thread at a time load the mapping in. This may result in having some queries not return anything or fail because they table
ReentrantLock lock = schemaTableMappingLock.get(jdbcSchemaName);
if (lock == null) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use ConcurrentMap::computeIfAbsent here

}

@Config("connection-map-tables")
public BaseJdbcConfig setMapLowercaseTableNames(boolean mapLowercaseTableNames)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be configurable as the feature is needed in order for the connectors to work properly given Presto's limitations on lowercase table names.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought for having a flag is to make this an optional feature for people, so as to preserve current behavior for those who don't want to use this. I'm very happy to make it default on, or even just always on without a flag.

@Drizzt321
Copy link
Author

So I'm running into a situation where I'm having to place the test code in the presto-mysql because I need to use the MySqlClient, unless I essentially copy the MySqlClient (even if as an anonymous class) in order to deal with the quirks of MySQL compared to the BaseJdbcClient. So the tests for this, which really IMO belong in presto-base-jdbc, would be challenging to put there. Thoughts?

@electrum
Copy link
Contributor

electrum commented Aug 11, 2017 via email

Based on the pull request review, making some major changes. List of
changes:
* Handling mapping schema names now as well as table names
* Using CacheLoader to handle loading/adding/reloading the name mapping
  for both schema and tables
* Needed to change so that the concrete client can return the raw
  schema/table names from protected methods, which the base class then
  lowercases as needed
* Needed to have a way to prevent initial cache loading for the Plugin
  tests for each of the clients, otherwise it would fail as no server
  was created to load schemas/tables from
* Updated the Plugin tests for each JDBC concrete client to set the flag
  to not auto-load the schema/table mappings
* Test checking the mapping and case sensitivity is in the presto-mysql
  plugin, as the issues with H2 for a case-sensitive mean I need MySQL
  instance to test against, which means the MySqlClient, which I can't
  pull into presto-base-jdbc as it would cause a circular dependancy
@Drizzt321
Copy link
Author

Sorry for the delay, lots of work on this and of course other things also came up at work. Hopefully I've managed to take into account all of the concerns, see the latest commit message and code for full details.

private Map<String, Map<String, String>> schemaTableMapping = new HashMap<>();
private Map<String, ReentrantLock> schemaTableMappingLock = new ConcurrentHashMap<>();
private final LoadingCache<String, Optional<String>> schemaMappingCache;
private final Map<String, LoadingCache<String, Optional<String>>> schemaTableMapping = new ConcurrentHashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would hide these fields and related methods as a separate class. Then it seems that with some abstraction over JDBC (which is used to load the cache) you could write some additional low level unit tests without using mysql. What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why you're asking to hide these fields as a separate class. Something like a SchemaTableMapping class? Pass in a BaseJdbcClient object to the constructor, and use that to perform all of the loading? Problem with this is how would I get access to the raw RDBMS schema/table names? The public API (currently) only provides for the lowercased names).

Hmm...I suppose I could instead create an implementation of the BaseJdbcClient in the test context which doesn't actually connect to JDBC and returns pre-defined lists of tables/schemas.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Sounds a bit like an over engineering.

* @param schema the schema to list the tables for, or NULL for all
* @return the schema + table names
*/
protected List<String[]> getOriginalTablesWithSchema(Connection connection, String schema)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning List of arrays does not look good. Why not to use SchemaTableName?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SchemaTableName in it's constructor specifically lowercases both the schema and the table strings, thus can't be used for this.

I chose an array as an easy, simple, basic structure that wouldn't take much effort. Also can't be a Map since a schema can have a multiple tables, and we might be pulling from multiple schemas at once rather than just 1 schema.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead you could create another new class like SchemaTableName, which remain the names casing.

@@ -29,6 +29,6 @@ public void testCreateConnector()
{
Plugin plugin = new PostgreSqlPlugin();
ConnectorFactory factory = getOnlyElement(plugin.getConnectorFactories());
factory.create("test", ImmutableMap.of("connection-url", "test"), new TestingConnectorContext());
factory.create("test", ImmutableMap.of("connection-url", "test", "connection-load-table-mappings", "false"), new TestingConnectorContext());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could tests for this and sqlserver connector as well? What do you think? Maybe you could define a set of tests in presto-base-jdbc and them call them from each connector which support mapping? And from jdbc connectors which do not support that we could have negative tests.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm not sure what you mean for this, I do have this tests updated/fixed in sqlserver module, as this particular one just seems to be to test that the plugin factory can be fetched and loaded and to create an instance of the plugin successfully.

I do see where defining the tests in presto-base-jdbc for handling mixed-case schema/table names that each JDBC implementation can pull to test individually in their own module if the DB handles mixed-case is useful.

@Drizzt321
Copy link
Author

So I'm leaving for vacation for 2 weeks, so I won't be around for any replies or what not.

@Drizzt321
Copy link
Author

Ok, back to work and getting back into the swing of things, so looking to get back into this and move this forward. @kokosing, any thoughts on my replies to your comments?

@kokosing
Copy link
Contributor

I am sorry, but I won't be able to continue work on this. Please ask somebody else for a review.

* CaseSensitiveMappedSchemaTableName used to return the raw table names
  instead of a String[]
* Added tests by creating some mocked implementations of a Driver and
  metadata and supporting methods necessary for use with the
  BaseJdbcClient to perform some basic case-sensitive name tests
  See TestJdbcClientNameMapping and TestingNameMappingDriver
@Drizzt321
Copy link
Author

So, reviewing the 2 checks that failed, both seem to involve trying to connect to the servers but fail to. Is there an issue with the current server startup for these 2 checks? Or could it be that my pre-load code is happening too early as it's occurring in the constructor and so the rest of the surrounding code hasn't properly completed something needed to correctly connect?

@Drizzt321
Copy link
Author

@electrum, @lvijay, please take a look.


public CaseSensitiveMappedSchemaTableName(String schemaName, String tableName)
{
if (schemaName == null) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Presto way to do this appears to be requireNonNull(schemaName, "schemaName is null");

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, thank you.

}

// if someone is listing all of the schema names, throw them all into the cache as a refresh since we already spent the time pulling them from the DB
schemaMappingCache.putAll(mappedNames);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method appears to be doing too much. (1) it's computing all the schema names and (2) caching the values.

It's unclear where (2) is used and why (1) doesn't use it.

Furthermore, it forces that reloadCache must call getSchemaNames before invalidating the values in schemaTableMapping. This won't be clear to a future maintainer of the code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't use the cache because this method, in my view, must return the current list of schema names directly from the DB. I also see this change as something which needs to NOT change the output of any method, and will only change the input, and even then as little as possible in order to get a query to execute.

The caching being done on the schema names is so that other requests for data can successfully execute those queries, even if it's just for listing the table names if the schema name has mixed case name.

for (String key : schemaTableMapping.keySet()) {
schemaTableMapping.get(key).invalidateAll();
schemaTableMapping.remove(key);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason you can't just iterate through the values?

schemaTableMapping.values().foreach(v -> v.invalidateAll());
schemaTableMapping.clear();

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't typically used the Java8 Lambdas because of our code base, which sadly has an incredibly out of date Hibernate which barfs if we try and use Lambdas in the wrong place. sigh

But yes, that forEach is more efficient, especially the clear() after rather than doing a bunch of removes.


schema = finalizeSchemaName(metadata, schema);
Map<String, Map<String, String>> schemaMappedNames = new HashMap<>();
ImmutableList.Builder<SchemaTableName> list = ImmutableList.builder();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call this tableNames, perhaps?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the list? It's a local only variable, but sure.

Some small tweaks from the feedback, doing things a
bit more prestodb way.
@tooptoop4
Copy link

any update? hive metastore tables in mysql like DBS, PARTITIONS can't be queried.

presto> select * from mysql.metastore.DBS;
Query 20180913_015448_00010_774as failed: line 1:15: Table mysql.metastore.dbs does not exist

@babayega
Copy link

Is this issue resolved ??
Because I am using presto 0.214 and still facing it.

@hamlet-lee
Copy link

look forward for this! I am eager to know if there are any updates!

@Praveen2112
Copy link
Contributor

@hamlet-lee, @babayega I am currently working on adding support for case-sensitive identifiers. You can track the work here(trinodb/trino#354)

@findepi
Copy link
Contributor

findepi commented Mar 13, 2019

@hamlet-lee, @babayega,
until @Praveen2112 adds proper case-sensitive identifiers, you can use Starburst Presto release
which provides an case-insensitive-name-matching = true | false option for all JDBC-based connectors.
See https://docs.starburstdata.com/latest/release/release-302-e.html

@hamlet-lee
Copy link

@Praveen2112 thanks for the information, I'll track that PR.
@findepi thanks for the information. Currently, we'd like to stick to community version :)

@findepi
Copy link
Contributor

findepi commented Mar 13, 2019

@hamlet-lee sure. Just note that Starburst Presto is freely available.
We planned to contribute the change i mentioned, just haven't had time.
However, given that @Praveen2112 's work is already in progress, I don't think
there would be an interest in our change, as it will become obsolete in the long-term.

@rschlussel
Copy link
Contributor

Closing. This will be supported by https://github.com/prestodb/presto/pulls/kewang1024

@rschlussel rschlussel closed this Aug 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.