Add SQL dialect for DB2 for IBM i #658

chrjorgensen · 2023-11-08T15:12:24Z

Hello.

We use your great SQL formatter utility in our Db2 for IBM i tools extension, but we find that the DB2 dialect in the formatter is for DB2 for Z (mainframe), whereas our extension is for IBM i on IBM Power server. There is quite some differences between the two platforms, which causes some problems in the SQL formatter.

So we offer this PR to add a different DB2 dialect targeted for IBM i - so we and other users of your extension can select this platform instead of IBM Z.

I'm a newbie to your code, but I have tried to copy everything Db2 related into new files for Db2i. Please verify carefully and let me know, if I missed something.

Thanks.

chrjorgensen · 2023-11-08T15:20:26Z

It seems like the tests fails when run for the new Db2i dialect - what have I missed?

etc...

nene

Thanks for the PR. Pointed out a few issues that popped up in my first quick glance.

You should keep in mind that the original DB2 implementation is likely seriously problematic. I have no idea how many people actually use it, if any. It was introduced during some very early days of SQL Formatter development and has remained largely unchanged. I Personally have never used any version of DB2.

src/languages/db2i/db2i.formatter.ts

nene · 2023-11-08T16:29:55Z

I'm unsure whether having two DB2 dialects is worth it. The fact that the existing dialect targets the mainframe version is most likely just the result of googling DB2 syntax and picking the first result which happened to be the mainframe version.

I would guess that there are less people interested in using this tool with the mainframe version... but really I have no idea.

chrjorgensen · 2023-11-09T08:12:38Z

Thanks for showing me where I made mistakes, I will fix them asap.

I didn't know about the state of the existing DB2 dialect, assumed it was okay and copied it. I'm also not familiar how the formatter works and the rules it applies, but I learn along the way.

FYI, there are three DB2 versions available: DB2 for Z (mainframe), DB2 for IBM i and DB2 for LUW (Linux, Unix, Windows). They are not the same (even though they share some codebase) and each have special functions and keywords...

nene · 2023-11-09T13:22:41Z

Looking through the few past issues that relate to DB2 I see the following:

white space inserted for language db2 #123 references the Z-version
Language DB2 format with $ failed #203 seems to be about a problem with using a templating language, not really related to DB2. At least I guess so. It mentions DBeaver tool, which supports all variants of DB2.
[VSCODE] SQLFlavourOverride: db2 .... Gets error on "#" character, even though db2 allows for it #550 references the i-version

The current implementation of DB2 is really a mess. I have taken some stuff (like list of keywords) from the Z-version and some other stuff (like identifier syntax) from the LUW-version.

I'm currently inclining towards throwing the existing db2 implementation away and just keeping the i-version. At least I know that somebody is actively using it, and the differences between these DB2 versions for now are really minimal.

Just FYI. This doesn't really effect this PR.

nene · 2023-11-09T13:38:54Z

test/features/join.ts

@@ -42,43 +42,43 @@ export default function supportsJoin(
        const result = format(`
          SELECT * FROM customers
          ${join} orders ON customers.customer_id = orders.customer_id
-          ${join} items ON items.id = orders.id;
+          ${join} items ON items.col1 = orders.col1;


Is it really the case that unquoted id can't be used as a column name in DB2 i?

When I look at the reserved word lists of LUW-version and Z-version the ID indeed doesn't appear in reserved words list.

However I still find it surprising that a) one would need to quote such a commonly used column name b) there's such a striking difference between the i-version of DB2 from other DB2 dialects.

The reserved words list for DB2 for IBM i shows ID as a reserved word - but I haven't tested if ID is allowed in the context of a column name.

I stumbled into the ID problem when I activated JOIN ... USING - suddenly the tests failed for DB2i because ID was converted to uppercase by the formatter. So I changed the id in the test to a more generic column name.

Just tested id ad a column name on DB2 for i - and it is indeed allowed... Is it better to remove ID from the reserved words list?

Could you please test this?

I remember seeing ID in the keywords list of some other SQL dialects, but it has always turned out that it's not really a reserved keyword.

Yep. Best to just remove it from reserved words list. Given that this list has currently like 500 words, my suspicion is that many more of these keywords are actually not reserved.

I encountered a similar problem when trying to find out which keywords does SingleStoreDB actually reserve. Their documentation only listed all the keywords and didn't distinguish which ones are reserved and which ones were not. I ended up executing hundreds of SQL queries to see which would actually result in a failure.

chrjorgensen · 2023-11-09T13:52:13Z

Thanks for looking into the DB2 "mess"! 😃

I've made some updates to the DB2i formatter - and it raised some questions in my head:

I had to remove ANY from the functions, since it collided with the ANY set operator. Any (no pun intended) way to distinguish between the two?
Where are the rules for identifier name specified? Is it the identspecifications in <dialect>.formatter.ts?
What is the priority used by the formatter? Order of specifications?

I'm curious and want to learn how the formatter works...

nene · 2023-11-09T14:27:35Z

I had to remove ANY from the functions, since it collided with the ANY set operator. Any (no pun intended) way to distinguish between the two?

Not really. I guess in both cases ANY will be followed by (, which makes distinguishing the two extra difficult. Also related to #444

Where are the rules for identifier name specified? Is it the identspecifications in <dialect>.formatter.ts?

Yes, that's correct. The identChars option defines additional characters for identifier names. The regex for matching identifiers is constructed here: https://github.com/sql-formatter-org/sql-formatter/blob/master/src/lexer/regexFactory.ts#L162C22-L162C22

What is the priority used by the formatter? Order of specifications?

The Tokenizer class defines the order of matching different types of tokens. Pretty much reading the rules listed in this class from top-to-bottom... with higher priority items at the top. From there one can see that reservedClauses is matched before reservedSelect.

But really the main thing is not to repeat the same pattern in multiple places. One notable special case is reservedPhrases which sole purpose is to have higher priority than reservedClauses, which allows overriding patterns that would otherwise be detected and formatted as separate clauses, like overriding ON UPDATE so it's not confused with the UPDATE statement.

nene · 2023-11-09T18:29:40Z

I made small tweaks and merged this in.

nene · 2023-11-10T09:58:16Z

This is now released in 13.1.0.

chrjorgensen · 2023-11-10T13:34:09Z

Thank you! And thank you for explaining about the formatter and how it works. 😃

worksofliam · 2023-11-10T17:37:00Z

Thanks @nene and @chrjorgensen for your work here. I am grateful!

Add SQL dialect for DB2 for IBM i

cb83ef6

nene requested changes Nov 8, 2023

View reviewed changes

src/languages/db2i/db2i.formatter.ts Outdated Show resolved Hide resolved

src/languages/db2i/db2i.formatter.ts Outdated Show resolved Hide resolved

src/languages/db2i/db2i.formatter.ts Show resolved Hide resolved

chrjorgensen added 5 commits November 9, 2023 12:15

Fix DB2 for IBM i formatter errors

c5feee5

Combine common db2i oneline clauses

7540fbf

Add replace and exists options to db2i tests

d0c9daf

db2i supports 'using' on joins

6802136

Fix join tests - 'id' is a reserved word in db2i

dd91db1

nene reviewed Nov 9, 2023

View reviewed changes

chrjorgensen requested a review from nene November 9, 2023 13:52

nene added a commit that referenced this pull request Nov 9, 2023

Merge PR #658: Add SQL dialect for DB2 for IBM i

d192a2c

nene merged commit dd91db1 into sql-formatter-org:master Nov 9, 2023
2 checks passed

chrjorgensen deleted the feature/add-db2-for-i-dialect branch November 10, 2023 13:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add SQL dialect for DB2 for IBM i #658

Add SQL dialect for DB2 for IBM i #658

chrjorgensen commented Nov 8, 2023

chrjorgensen commented Nov 8, 2023

nene left a comment

nene commented Nov 8, 2023

chrjorgensen commented Nov 9, 2023

nene commented Nov 9, 2023

nene Nov 9, 2023

chrjorgensen Nov 9, 2023

chrjorgensen Nov 9, 2023

nene Nov 9, 2023

nene Nov 9, 2023

nene Nov 9, 2023

chrjorgensen commented Nov 9, 2023

nene commented Nov 9, 2023 •

edited

Loading

nene commented Nov 9, 2023

nene commented Nov 10, 2023

chrjorgensen commented Nov 10, 2023

worksofliam commented Nov 10, 2023

Add SQL dialect for DB2 for IBM i #658

Add SQL dialect for DB2 for IBM i #658

Conversation

chrjorgensen commented Nov 8, 2023

chrjorgensen commented Nov 8, 2023

nene left a comment

Choose a reason for hiding this comment

nene commented Nov 8, 2023

chrjorgensen commented Nov 9, 2023

nene commented Nov 9, 2023

nene Nov 9, 2023

Choose a reason for hiding this comment

chrjorgensen Nov 9, 2023

Choose a reason for hiding this comment

chrjorgensen Nov 9, 2023

Choose a reason for hiding this comment

nene Nov 9, 2023

Choose a reason for hiding this comment

nene Nov 9, 2023

Choose a reason for hiding this comment

nene Nov 9, 2023

Choose a reason for hiding this comment

chrjorgensen commented Nov 9, 2023

nene commented Nov 9, 2023 • edited Loading

nene commented Nov 9, 2023

nene commented Nov 10, 2023

chrjorgensen commented Nov 10, 2023

worksofliam commented Nov 10, 2023

nene commented Nov 9, 2023 •

edited

Loading