Publishing prql-java to maven central #1643

Closed
zhicwu opened this issue Jan 30, 2023 · 30 comments

@zhicwu

zhicwu commented Jan 30, 2023

Hi there! Any plan to publish the official Java lib with cross-compiled native binaries to maven central? It would be fun to add it into a JDBC driver, so that users can issue PRQL in addition to SQL in popular database clients like DBeaver/DataGrip :)

I noticed https://repo1.maven.org/maven2/io/github/doki23/prql-java/0.0.3/, but unfortunately it does not have native binary for mac. Apart from that, the Java package should be changed to org.prqllang, right?

cc: @doki23

@doki23
Contributor

doki23 commented Jan 30, 2023

The java lib is incomplete now. If you want to use it immediately, you can build it locally.
BTW, doki23/prql-java isn't maintained, so you'd better not use it.

@zhicwu
Author

zhicwu commented Jan 30, 2023

Thanks @doki23. Understood that prql-java and dialect support are still at an early stage. What I'm trying to do is a fast prototype, hoping to increase usage and get more feedback for future development. I'll try to create a new artifact later next month, if there's no plan to release the official Java lib.

@snth
Member

snth commented Jan 30, 2023

I would also like to use PRQL from DBeaver and commented a bit on that in a thread on Discord: https://discord.com/channels/936728116712316989/1066475896581660773/1067781287009591398

A JDBC driver to enable this would be huge! @zhicwu If you want to take a stab at it, I'm sure the team will offer you as much support as we can!

@zhicwu
Author

zhicwu commented Jan 30, 2023

Thanks @snth, good to know it might be of use :) A DBeaver plugin would be great, but it's just one database client, right? Perhaps it's better to create prql-jdbc as a wrapper of all other JDBC drivers using a slightly different connection string(e.g. jdbc:prql:mysql://localhost instead of jdbc:mysql://localhost).
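
A minimal sketch of the connection-string idea above, assuming a wrapper that recognises the `jdbc:prql:` prefix and hands the remainder to the underlying driver. The class and method names are hypothetical and are not part of any existing library:

```java
// Hypothetical helper: maps a wrapper URL such as jdbc:prql:mysql://localhost
// to the URL understood by the underlying JDBC driver.
final class WrappedJdbcUrl {
    static final String PREFIX = "jdbc:prql:";

    /** True when the URL is addressed to the (hypothetical) PRQL wrapper. */
    static boolean accepts(String url) {
        return url != null && url.startsWith(PREFIX) && url.length() > PREFIX.length();
    }

    /** Returns the URL to pass on to the real driver. */
    static String unwrap(String url) {
        if (!accepts(url)) {
            throw new IllegalArgumentException("Not a jdbc:prql: URL: " + url);
        }
        return "jdbc:" + url.substring(PREFIX.length());
    }

    /** Returns the sub-protocol of the real driver, which could select the SQL dialect. */
    static String subprotocol(String url) {
        String rest = unwrap(url).substring("jdbc:".length());
        int colon = rest.indexOf(':');
        return colon < 0 ? rest : rest.substring(0, colon);
    }
}
```

With this, `WrappedJdbcUrl.unwrap("jdbc:prql:mysql://localhost")` yields the plain MySQL URL, and the sub-protocol could be used to pick the matching dialect.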

Anyway, the rough idea I have in mind at this point is:

  1. wait patiently for my new laptop :p
  2. get a working prql-java - perhaps repack it under com.clickhouse:org.prqllang first
  3. enhance ClickHouse Java client and JDBC driver to use prql-java(just compiling PRQL to SQL using clickhouse dialect)
  4. test the driver in DBeaver and release
  5. submit an annoying issue in DBeaver repo asking for PRQL syntax highlighting and autocompletion ;)
  6. wait for complaints...

Jokes aside, I think it's critical to publish prql-java to maven central first to unblock development in all Java applications. It should be fairly easy to implement prql-jdbc(as a submodule of prql-java) to bring PRQL to all JDBC-based database clients including DBeaver and DataGrip. I'd love to create a PR for this.

@max-sixty
Member

All of those steps sound great! :)

In particular a PR to bring prql-java along to publish to Maven would be very welcome.

FWIW we have lots of syntax highlighting libraries already, so hopefully integrating into something like DBeaver will be fairly simple. V happy to help with that sort of thing if it can get tools to adopt PRQL.

@snth
Member

snth commented Jan 30, 2023

@zhicwu your proposal for a prql-jdbc driver sounds great and the proposed connection string format jdbc:prql:mysql://localhost too.

Looking forward to seeing what you come up with!

@zhicwu
Author

zhicwu commented Jan 31, 2023

Great! So I forked the repo, did some search, and thought this over. I'm going to create 2 or 3 PRs one by one for each of the changes shown below. Please review and let me know if there's any concern or better idea.

  1. Rewrite prql-java
    a. use org.prqllang as group id and package name for consistency
    b. make prql-java a multi-module project, with below directory structure

    prql-java                       # parent project of 3 modules: prql-api, prql-jdbc, and prql-parser
     |--- prql-api                  # provides API for PRQL compiling and maybe transformation later
            |--- src
                   |--- main
                          |--- java
                          |--- rust # just trying to follow standard maven directory structure
     |--- prql-jdbc                 # JDBC wrapper to enable PRQL support
     |--- prql-parser               # less important for now
     |--- examples                  # and remove prql-java-demo?

    c. change APIs to something like below:

    String prql = "...";
    String sql = org.prqllang.PrqlApi.getInstance().toSql(prql);
    String json = org.prqllang.PrqlApi.getInstance().toJson(prql);
    
    // enforce to use JNI      
    sql = org.prqllang.PrqlApi.getJniInstance().toSql(prql);
    // enforce to use CLI(including docker/podman)
    json = org.prqllang.PrqlApi.getCliInstance().toJson(prql);

    d. load the native library in the order below, considering that prql-api might be repacked in a shaded jar

    • java.library.path(same as LIBPATH)
    • /META-INF/natives/<os>/<lib> inside prql-jar.jar
    • ~/.prql/<lib>

    e. fall back to prql-compiler CLI and docker/podman when native library is not available(or failed to load)
    f. workflow for publishing the artifact to maven central - start with separate workflow and then update existing ones?
    g. examples to show how to use it

  2. Add prql-jdbc
    a. a simple JDBC wrapper(focusing on JDBC 4.2) with mappings for all supported dialects
    b. some integration tests to cover some databases(e.g. h2, sqlite, clickhouse, mysql, and postgresql etc.)
    c. examples

  3. Add prql-parser
    a. a Congo(aka. JavaCC 21) grammar file similar to prql-compiler/src/parser/prql.pest - it can be used to generate parser for C# and Python as well
    b. some unit tests - would be great to reuse the same cases for the Rust implementation
    c. examples
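
The lookup order and fallback in 1.d and 1.e could be sketched roughly as below. This is only an illustration: `PrqlCompiler` and `firstAvailable` are hypothetical names, and each candidate stands in for "load from java.library.path", "extract from the jar", "load from ~/.prql", or "use the CLI":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

final class CompilerLookup {
    /** Stand-in for whatever interface prql-api ends up exposing. */
    interface PrqlCompiler {
        String compile(String prql);
    }

    /**
     * Tries each candidate in iteration order and returns the first one that
     * initialises without error. Pass a LinkedHashMap to keep the order stable.
     */
    static PrqlCompiler firstAvailable(Map<String, Supplier<PrqlCompiler>> candidates) {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, Supplier<PrqlCompiler>> candidate : candidates.entrySet()) {
            try {
                return candidate.getValue().get();
            } catch (RuntimeException | LinkageError e) {
                // UnsatisfiedLinkError (a LinkageError) is what System.loadLibrary
                // throws when the native library cannot be found or loaded.
                failures.add(candidate.getKey() + ": " + e);
            }
        }
        // Report every attempt so the error message is not confusing.
        throw new IllegalStateException("No PRQL compiler available: " + failures);
    }
}
```

Collecting the failure of every attempt in the final exception is one way to keep error messages understandable when all candidates fail.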

It's not complex but I do need some time to complete all the work, and I didn't mention documentation at all ;) Besides, I know nothing about rust, so I'll simply reuse what we have here to build the JNI lib and get them packed in prql-api.jar.

Lastly, please help with below items in order to publish the artifacts to maven central:

  1. Create GPG key(for signing artifacts) and send key to keyserver.ubuntu.com
  2. Sign up two accounts(one for yourself and the other for CI) on https://issues.sonatype.org
  3. Create a ticket like OSSRH-73453 applying new group id org.prqllang and access for specific accounts(e.g. yourself and the CI account) - you'd better add TXT as in the example to prove that you're owner of prql-lang.org
  4. Define repository secrets: GPG_PRIVATE_KEY, GPG_PASSPHRASE, SONATYPE_USER, SONATYPE_PASSWD

Will start with prql-java rewrite if we're all good with above. We can discuss details on Discord if needed.

@max-sixty
Member

This sounds awesome @zhicwu !

I have very little Java experience, so these comments won't be that insightful unfortunately. But others should feel free to comment as ever.

A couple of small questions:

String sql = org.prqllang.PrqlApi.getInstance().toSql(prql);

We have changed our API slightly to use compile here; so would be great if we could use the same names in prql-java. Here's the python impl as a comparison:

assert prql.compile(prql_query)

  1. e. fall back to prql-compiler CLI and docker/podman when native library is not available(or failed to load)

Is this standard in Java? Generally I would have imagined this might give confusing error messages, and having no fallback would be OK. But again, if this is standard in java then great.

2. Add prql-jdbc

Would this mean that someone could use prql-jdbc as a JDBC library, and get back data? So the JDBC library does the PRQL-to-SQL compilation? That would be very cool. My understanding is that JDBC is more used for OLTP than OLAP (which is more of PRQL's focus), but this would still be useful / v nice to abstract that interface away.

3. a. a Congo(aka. JavaCC 21) grammar file similar to prql-compiler/src/parser/prql.pest - it can be used to generate parser for C# and Python as well

I couldn't find anything online about Congo grammars. What would the goal of this be? For syntax highlighting?

Besides, I know nothing about rust, so I'll simply reuse what we have here to build the JNI lib and get them packed in prql-api.jar.

We're very happy to help on the rust side!

Lastly, please help with below items in order to publish the artifacts to maven central:

For sure, let me know when you're ready and I'll make some keys...

  1. use org.prqllang as group id

Great. Not sure how it standardizes around punctuation. The website is prql-lang.org with a -.

@zhicwu
Author

zhicwu commented Jan 31, 2023

Thanks for the feedback!

We have changed our API slightly to use compile here; so would be great if we could use the same names in prql-java.

Sure, will do the same in Java for consistency.

Is this standard in Java? Generally I would have imagined this might give confusing error messages, and having no fallback would be OK.

It's not a standard. The idea is to try harder to make it work even when JNI library is missing or not working(incompatible with the installed lib for instance). PrqlApi.getJniInstance().compile(...) won't have the fallback and will throw an exception instead. Let's take it as low priority - I'll add an option or so to disable the fallback by default.

Would this mean that someone could use prql-jdbc as a JDBC library, and get back data? So the JDBC library does the PRQL-to-SQL compilation?

Yes, a JDBC driver for PRQL so that all JDBC-based database clients understand PRQL. The JDBC library is a wrapper of existing JDBC drivers. It takes PRQL as input, calls the above method for PRQL-to-SQL compilation, and then passes the SQL to the actual JDBC driver for execution.
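
A rough sketch of that wrapping step, assuming the compiler is supplied as a plain function from PRQL text to SQL text. It uses a dynamic proxy so that only the `execute*` methods taking a query string are intercepted; `PrqlStatementProxy` is a hypothetical name and this is not how any released driver is known to be implemented:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Statement;
import java.util.function.UnaryOperator;

final class PrqlStatementProxy {
    /** Wraps a real Statement so that query text is compiled before execution. */
    static Statement wrap(Statement delegate, UnaryOperator<String> compiler) {
        InvocationHandler handler = (proxy, method, args) -> {
            boolean takesQuery = method.getName().startsWith("execute")
                    && args != null && args.length > 0 && args[0] instanceof String;
            if (takesQuery) {
                args = args.clone();
                args[0] = compiler.apply((String) args[0]); // PRQL in, SQL out
            }
            try {
                return method.invoke(delegate, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // surface the driver's own SQLException
            }
        };
        return (Statement) Proxy.newProxyInstance(
                PrqlStatementProxy.class.getClassLoader(),
                new Class<?>[] {Statement.class},
                handler);
    }
}
```

A full wrapper would apply the same idea to `Connection` (for `prepareStatement`) and `Driver`; only the `Statement` layer is shown here.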

I couldn't find anything online about Congo grammars. What would the goal of this be? For syntax highlighting?

Sorry I was talking about https://github.com/congo-cc/congo-parser-generator (migrating from https://github.com/javacc21/javacc21). It will be a parser generating AST, so it would be overkill for syntax highlighting etc. I take that as an approach of learning PRQL grammar, but I hope to do more with that for client-side optimization in the future, for example enhanced grammar for queries across multiple datasources etc. Probably worthy of a separate issue for discussion.

For sure, let me know when you're ready and I'll make some keys...

Thanks. I think you may start to make the keys, apply for access and the maven group id on sonatype, and set up repository secrets whenever you have time :)

Not sure how it standardizes around punctuation. The website is prql-lang.org with a -.

Unfortunately Java does not support - in package names, and neither does the maven group id. Unicode like \u0335 can be used but it's really confusing. Personally, I think org.prql is better but it's not aligned with the domain name.
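
To illustrate the package-name restriction, here is a small check built only on `Character.isJavaIdentifierStart` and `Character.isJavaIdentifierPart`. It is a simplification: it ignores reserved words, and it says nothing about what Maven itself accepts as a group id. `PackageNames` is a hypothetical helper name:

```java
final class PackageNames {
    /** True when the segment consists only of Java identifier characters. */
    static boolean isValidSegment(String segment) {
        if (segment.isEmpty() || !Character.isJavaIdentifierStart(segment.charAt(0))) {
            return false;
        }
        for (int i = 1; i < segment.length(); i++) {
            if (!Character.isJavaIdentifierPart(segment.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** True when every dot-separated segment is a valid identifier (keywords not checked). */
    static boolean isValidPackageName(String name) {
        for (String segment : name.split("\\.", -1)) {
            if (!isValidSegment(segment)) {
                return false;
            }
        }
        return true;
    }
}
```

Under this check `org.prql-lang` is rejected because of the hyphen, while `org.prqllang` and `org.prql_lang` both pass.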

@doki23
Contributor

doki23 commented Feb 1, 2023

Sounds great! But maintaining an individual java parser may be painful. If that's the case, substrait is recommended(#738).

@max-sixty
Member

max-sixty commented Feb 1, 2023

That all sounds great.

From the perspective of the project here's a quick take on each of the items — but also do what you want to do:

  • it would be most helpful to have the Java library available and published
  • I can't immediately visualize what the project would gain from a Congo grammar — we already have lots of grammars because tools use different ones for syntax highlighting https://prql-lang.org/book/internals/syntax-highlighting.html (but of course you're welcome to do it as a side-project / maybe there are areas where it would be helpful)
    • It would be a huge amount of work to rebuild the whole compiler in Java, I would not recommend that as a side project...
  • The JDBC driver is a great idea

Unfortunately Java does not support - in package names, and neither does the maven group id.

OK cool. I registered prqllang.org and redirected it (doesn't redirect correctly for a few hours though)

@zhicwu
Author

zhicwu commented Feb 1, 2023

Thanks @doki23 and @max-sixty! Let's drop the crazy idea of having prql-parser in Java for now :)

To recap what we discussed:

  • I'll start with prql-java rewrite and publish prql-api to maven central
  • and then I'll add prql-jdbc so that we can issue PRQL in popular JDBC-based database clients

I have some work on hand to complete by this week. Will try to get back to this next week.

@zhicwu changed the title from "Any plan to publish prsql-java to maven central?" to "Any plan to publish prql-java to maven central?" Feb 1, 2023
@max-sixty changed the title from "Any plan to publish prql-java to maven central?" to "Publishing prql-java to maven central" Feb 1, 2023
@snth
Member

snth commented Feb 1, 2023

Yes, a JDBC driver for PRQL so that all JDBC-based database clients understand PRQL. The JDBC library is a wrapper of existing JDBC drivers. It takes PRQL as input, calls the above method for PRQL-to-SQL compilation, and then passes the SQL to the actual JDBC driver for execution.

This would be so huge! Can't wait to see this come to life!

@zhicwu
Author

zhicwu commented Feb 10, 2023

OK, I lied. I implemented prql-jdbc first and I'm not going to submit two PRs but one :) Again, I'll get back to this next week.

image

@vanillajonathan
Collaborator

If hyphens aren't allowed in the domain name on Maven, maybe you could replace it with an underscore? Like org.prql_lang.

@max-sixty
Member

As discussed with @zhicwu :

  • @zhicwu will open the Jira tickets at Publishing prql-java to maven central #1643 (comment)
  • When we're ready, I will validate the domain
  • @zhicwu can open an issue about changing the prql-lib API, so that in C# & Java, we don't have this 1024 byte limit on the return value. I will keep an eye on it (I don't know C well, so I'm not the best person to do it, but will help, and if really needed, I'll take up C again... :) )
  • @zhicwu will submit a PR for the java bindings, and move the JDBC code to another repo for the moment

This will let us merge his excellent code, so we can base further changes on that code. Hopefully @linux-china can offer feedback as the original author, and a consumer of prql-java.

Thank you @zhicwu !

@zhicwu
Author

zhicwu commented Mar 6, 2023

Thanks @max-sixty. Please add the DNS TXT record for validation as requested here.

@zhicwu
Author

zhicwu commented Mar 6, 2023

If hyphens aren't allowed in the domain name on Maven, maybe you could replace it with an underscore? Like org.prql_lang.

Thanks @vanillajonathan, let's use underscore then.

@max-sixty
Member

max-sixty commented Mar 6, 2023

Thanks @max-sixty. Please add the DNS TXT record for validation as requested here.

Done!

image

@eitsupi
Member

eitsupi commented Mar 18, 2023

Hi, what is the current status of this?
I am really looking forward to DBeaver integration (#1643 (comment)).
Thanks.

@max-sixty
Member

@zhicwu let me know if you had any time to work on this — the code was already very close indeed!

@zhicwu
Author

zhicwu commented Mar 18, 2023

Apologies for my delayed response. I've spent most of my spare time trying to close a release by tomorrow. I should be able to get back to this later next week.

@zhicwu
Author

zhicwu commented Jul 6, 2023

You may now download the JDBC driver (e.g. jdbcx-driver-0.1.0.jar) from either Github Release or Maven Central, and follow the instructions here to use PRQL in DBeaver.

@zhicwu zhicwu closed this as completed Jul 6, 2023
@aljazerzen
Member

Great news!

I've tested it using DataGrip, and managed to connect to the database, but it was not compiling PRQL to SQL.

@zhicwu
Author

zhicwu commented Jul 6, 2023

Thanks for trying it out! Just tried DataGrip; it works, but not as smoothly as on DBeaver. I use jdbcx:prql:ch://explorer@play.clickhouse.com:443?ssl=true for testing by the way. Will try more databases and enhance the driver later.

image

image

image

image

@aljazerzen
Member

Ah I was using jdbcx:postgres:... instead of jdbcx:prql:postgres:....

Now it's working great. The introspection is not working (obviously), but otherwise, this is a great interface for testing out PRQL!

@snth
Member

snth commented Jul 7, 2023

@zhicwu This is very exciting! Thank you very much for this! 🙏

I'm trying to use this with DBeaver on Windows but am having some issues:

I downloaded the jdbcx-driver-0.1.1.jar file and placed it in the ~\Downloads\JDBCX directory referenced below but I get the following error:

image

I don't actually have a working prqlc binary that's reachable from Windows since I usually do all that from WSL but I assume that I would get a different error for that?

@zhicwu
Author

zhicwu commented Jul 8, 2023

Thanks for the feedback! Indeed there was an issue on Windows, but it's been fixed in 0.1.2.

Judging from the error message, you need to either put the PostgreSQL JDBC driver under the C:\Users\Tobias\Downloads\JDBCX directory, or add it to the classpath in DBeaver. JDBCX is just a wrapper; it relies on an existing JDBC driver to query the database.

As to prqlc, if you have rust installed in WSL, you can simply get that by issuing command cargo install prqlc. With that, you can set jdbcx.prql.cli.path to something like wsl -- /home/zhicwu/.cargo/bin/prqlc in DBeaver, so that JDBCX can use that to compile PRQL into SQL.
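
For illustration, this is one way a wrapper could turn a configured path such as `wsl -- /home/zhicwu/.cargo/bin/prqlc` into a process invocation. It is a sketch under assumptions: the whitespace splitting, the `PrqlcCli` class, and piping the query to `prqlc compile` on stdin are guesses about a reasonable design, not a description of how JDBCX actually does it:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PrqlcCli {
    /** Splits the configured path on whitespace and appends extra arguments. */
    static List<String> command(String cliPath, String... extraArgs) {
        List<String> cmd = new ArrayList<>(Arrays.asList(cliPath.trim().split("\\s+")));
        cmd.addAll(Arrays.asList(extraArgs));
        return cmd;
    }

    /** Pipes the PRQL text to "<cliPath> compile" and returns what the process prints. */
    static String compile(String cliPath, String prql) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(cliPath, "compile"))
                .redirectErrorStream(true) // merge stderr so failures are visible
                .start();
        try (OutputStream stdin = process.getOutputStream()) {
            stdin.write(prql.getBytes(StandardCharsets.UTF_8));
        }
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        if (process.waitFor() != 0) {
            throw new IOException("prqlc failed: " + output);
        }
        return output;
    }
}
```

Note that plain whitespace splitting breaks on paths containing spaces, so a real implementation would need quoting rules.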

@swt30

swt30 commented Jul 26, 2023

Hi all, I have found an easier way to set up this PRQL wrapper in DBeaver, by getting DBeaver to handle the driver download :) I will use the example of setting up PRQL with the Postgres JDBC driver.

  1. Make sure that the prqlc command is installed on your system and note down its path.

  2. In DBeaver, Database -> Driver Manager -> New

  3. Name the driver whatever you like; I went for postgres-prql.

    • Class Name is io.github.jdbcx.WrappedDriver
    • URL template is jdbcx:prql:postgresql://{host}[:{port}][/{database}]. Replace postgresql with your desired database type if you are not setting up the driver for Postgres.

    You may also want to Allow Empty Password.
    image

  4. Under Libraries tab, choose Add Artifact and paste in the JDBCX driver Maven dependency specification:

    <!-- https://mvnrepository.com/artifact/io.github.jdbcx/jdbcx-driver -->
    <dependency>
        <groupId>io.github.jdbcx</groupId>
        <artifactId>jdbcx-driver</artifactId>
        <version>RELEASE</version>
    </dependency>
    

    Click OK.
    image

  5. Repeat to add the Postgres driver.

    <!-- https://mvnrepository.com/artifact/org.postgresql/postgresql -->
    <dependency>
        <groupId>org.postgresql</groupId>
        <artifactId>postgresql</artifactId>
        <version>RELEASE</version>
    </dependency>
    

    If you want a non-Postgres JDBC driver, you'll need to provide your desired driver's Maven specification instead. You can also freeze the versions if you like by replacing RELEASE with a version number (current versions are jdbcx-driver 0.1.4 and postgresql 42.6.0)

    This puts the postgres driver on your classpath so you do not have to add it manually later.

  6. Click Download/Update. DBeaver offers to download the drivers for you. Click Download.

  7. Under Driver Properties tab, set:

    • jdbcx.prql.cli.path to your path to prqlc (depending on your system, it may be able to find it with the default prqlc, but this was not the case for me)
    • jdbcx.prql.compile.target to postgres (or the appropriate prql target name for non-postgres)
    • You may want to also experiment with jdbcx.prql.compile.error. The default is ignore which will ignore PRQL compile errors and pass the query through untouched. This is useful because it will not break DBeaver's GUI for things like viewing data, creating and dropping tables and so on. The downside is that you will not get helpful error messages if you write malformed PRQL. Setting it to throw will instead cause queries to fail if they are not valid PRQL, but will prevent the use of the data viewer, table creation DDL, etc.
      You may have to right-click and "Add New Property" if the default driver properties are not listed; while testing this, I found that they sometimes didn't show up (not sure why)

    image

  8. Create a new connection and choose your newly created postgres-prql driver.

At this point you should be able to enter host, port, database, username and password and connect as usual. Any queries you make through this connection will use PRQL!
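
Outside DBeaver, the same settings could presumably be passed from plain Java code. The sketch below assumes the driver reads the properties named in step 7 from the `Properties` object given to `DriverManager.getConnection`, and that both the JDBCX and PostgreSQL jars are on the classpath; `JdbcxPrqlConnection` is a hypothetical helper name:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

final class JdbcxPrqlConnection {
    /** Builds a URL following the template from step 3. */
    static String url(String host, int port, String database) {
        return "jdbcx:prql:postgresql://" + host + ":" + port + "/" + database;
    }

    /** Collects credentials plus the driver properties from step 7. */
    static Properties properties(String prqlcPath, String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("jdbcx.prql.cli.path", prqlcPath);
        props.setProperty("jdbcx.prql.compile.target", "postgres");
        props.setProperty("jdbcx.prql.compile.error", "throw");
        return props;
    }

    /** Opens the connection; requires the drivers on the classpath and a reachable database. */
    static Connection open(String host, int port, String database, Properties props)
            throws SQLException {
        return DriverManager.getConnection(url(host, port, database), props);
    }
}
```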

Many things will still break: for example the interactive filtering, grouping pane, etc. But you can enter and run PRQL queries from the scripting window.
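
The trade-off between the `ignore` and `throw` settings described in step 7 can be pictured with a small sketch. The enum and method below are hypothetical and only model the behaviour described above, where a query that fails to compile is either passed through untouched or rejected:

```java
import java.util.function.UnaryOperator;

final class CompileErrorPolicy {
    enum OnError { IGNORE, THROW }

    /**
     * Compiles the query; on failure either passes the original text through
     * (so plain SQL issued by the GUI still works) or rethrows the error.
     */
    static String toSql(String query, UnaryOperator<String> compiler, OnError onError) {
        try {
            return compiler.apply(query);
        } catch (RuntimeException e) {
            if (onError == OnError.THROW) {
                throw e;
            }
            return query; // IGNORE: send the text to the database unchanged
        }
    }
}
```

With `IGNORE`, DDL sent by the client reaches the database as-is, at the cost of hiding PRQL compile errors; with `THROW`, the compile error surfaces but non-PRQL statements fail.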

@aljazerzen
Member

That's cool! We have to get a summary of this thread into the book: https://prql-lang.org/book/project/integrations/index.html
