feat(datasource): add support for cassandra/scylla #2332

Open
tempbottle opened this issue Dec 29, 2023 · 7 comments
Assignees
Labels
feat New feature or request

Comments

@tempbottle

Description

As title.

@tempbottle tempbottle added the feat New feature or request label Dec 29, 2023
@tychoish
Contributor

This looks cool! We'd love to hear more about what you're trying to build and how you expect to use Cassandra/Scylla data in GlareDB, particularly with regard to schema translation. It would also help to know more about the environment you're working in so we can make sure we test it appropriately.

Thanks for your interest and attention!

@tempbottle
Author

tempbottle commented Dec 29, 2023

We have many large tables in Cassandra and may migrate to ScyllaDB.
We want to analyze data across both Cassandra tables and MySQL tables.
There is a lot of join logic spanning the two data sources, and those joins are currently implemented as hand-written code in a Java application. That Java code looks like it would be much easier to express in SQL.
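
To illustrate the point about hand-written joins, here is a minimal sketch in Python (not the Java code from the application, which is not shown in this thread). It compares a hand-rolled hash join with the equivalent SQL. The table names, columns, and rows are hypothetical, and the standard-library `sqlite3` module is used purely as a stand-in SQL engine; it is not GlareDB and does not connect to Cassandra or MySQL.

```python
import sqlite3

# Hypothetical rows standing in for a Cassandra table and a MySQL table.
cassandra_users = [
    {"user_id": 1, "user_city": "Berlin"},
    {"user_id": 2, "user_city": "Lyon"},
]
mysql_orders = [
    {"order_id": 10, "user_id": 1, "amount": 25.0},
    {"order_id": 11, "user_id": 1, "amount": 5.5},
    {"order_id": 12, "user_id": 3, "amount": 9.0},
]


def join_by_hand(users, orders):
    """Hash join: index one side by key, then probe it with the other side."""
    users_by_id = {u["user_id"]: u for u in users}
    joined = []
    for o in orders:
        u = users_by_id.get(o["user_id"])
        if u is not None:  # inner join: drop orders without a matching user
            joined.append((u["user_id"], u["user_city"], o["order_id"], o["amount"]))
    return joined


def join_with_sql(users, orders):
    """The same inner join expressed as one SQL statement (sqlite3 as a stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER, user_city TEXT)")
    conn.execute("CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO users VALUES (:user_id, :user_city)", users)
    conn.executemany("INSERT INTO orders VALUES (:order_id, :user_id, :amount)", orders)
    rows = conn.execute(
        "SELECT u.user_id, u.user_city, o.order_id, o.amount "
        "FROM users u JOIN orders o ON o.user_id = u.user_id "
        "ORDER BY o.order_id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(join_by_hand(cassandra_users, mysql_orders))
    print(join_with_sql(cassandra_users, mysql_orders))
```

Both functions return the same matched rows; the SQL version leaves the join strategy to the engine instead of encoding it in application code.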

@tychoish
Contributor

Awesome, this makes sense!

Which version of Cassandra are you using?

Would you be interested in (and able to make time for) talking or corresponding more with us next week about the queries you'd need to run on the Cassandra side?

There are a few parts of any integration between GlareDB and external data sources:

  • translating Cassandra data types and structures into GlareDB and connecting to Cassandra servers.
  • translating SQL queries to Cassandra (CQL) queries so that GlareDB can push down queries to Cassandra (typically we do this just for projections and simple predicates at first)
  • writing data to the external data source.

I trust we can drop the write path (for now, certainly), and the first part could be relatively straightforward, but we have some opportunity to control the scope for the second, which could help us get something to you sooner.
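
As a rough illustration of the second part (pushing down projections and simple predicates), here is a minimal Python sketch. It is not GlareDB's implementation: the `Predicate` type, the `build_cql` helper, the set of pushable operators, and the `ks.users` table are all hypothetical names chosen for this example. The idea is to split predicates into those that can be rendered into the CQL `WHERE` clause and a residual list that the query engine would still evaluate itself.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union

# Operators this sketch treats as simple enough to push down (an assumption).
PUSHABLE_OPS = {"=", "<", "<=", ">", ">="}


@dataclass(frozen=True)
class Predicate:
    """Hypothetical representation of `column op literal`."""
    column: str
    op: str
    value: Union[str, int, float, bool]


def cql_literal(value: Union[str, int, float, bool]) -> str:
    """Render a Python value as a CQL literal; strings use single quotes."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def build_cql(
    table: str, columns: Sequence[str], predicates: Sequence[Predicate]
) -> Tuple[str, List[Predicate]]:
    """Return a CQL query plus the predicates that were NOT pushed down."""
    pushed = [p for p in predicates if p.op in PUSHABLE_OPS]
    residual = [p for p in predicates if p.op not in PUSHABLE_OPS]

    # Residual predicates run locally, so their columns must still be fetched.
    fetch = list(columns)
    if fetch:
        for p in residual:
            if p.column not in fetch:
                fetch.append(p.column)

    cql = f"SELECT {', '.join(fetch) if fetch else '*'} FROM {table}"
    if pushed:
        conditions = [f"{p.column} {p.op} {cql_literal(p.value)}" for p in pushed]
        cql += " WHERE " + " AND ".join(conditions)
    return cql, residual


if __name__ == "__main__":
    query, leftover = build_cql(
        "ks.users",
        ["user_id", "user_city"],
        [Predicate("user_city", "=", "Paris"), Predicate("name", "LIKE", "A%")],
    )
    print(query)
    print(leftover)
```

Cassandra restricts which columns may appear in a `WHERE` clause (filters on non-key, non-indexed columns generally require `ALLOW FILTERING`), so a real integration would also need to check each predicate against the table's schema before pushing it down. That check is left out of this sketch.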

@universalmind303 universalmind303 self-assigned this Jan 2, 2024
@universalmind303 universalmind303 changed the title from "Data source from cassandra or scylladb" to "feat(datasource): add support for cassandra/scylla" Jan 5, 2024
@universalmind303
Contributor

universalmind303 commented Jan 5, 2024

initial functionality is implemented in #2344

follow up issues

@tempbottle
Author

tempbottle commented Jan 5, 2024

Sorry for the slow reply.
Thanks for your attention.
We run two main Cassandra versions (3.9 and 4.1) in the data center, and we plan to migrate everything to 4.1 in the future.
We don't have very complex queries in Cassandra right now. They are mostly simple queries like this:

SELECT * FROM table1 WHERE user_city = 'xxxxxx' AND regist_date <= '20231222' AND regist_date >= '20231001' AND last_logon_date > '20231210'

Hope this info is helpful.

universalmind303 added a commit that referenced this issue Jan 5, 2024
partially implements #2332 

This PR is limited to only the table function. 

Inserts & `CREATE EXTERNAL` will be done in separate PRs
@tychoish
Contributor

@tempbottle, just wanted to follow up with the current state of this:

Cassandra/Scylla data sources (tables and entire databases) exist in the codebase today and in the latest release. However, the initial implementation missed authentication. We have a PR open to add that, which should be merged by the end of the week, and we can cut a point release that includes it.

I wanted to give you an update, ask whether the auth-free example works for you locally, and let you know that we'll have a more complete version out really soon.

(Writes and more pushdowns are coming soon, particularly if, at a high level, what we have so far works for you.)

Cheers,
sam

@greyscaled
Contributor

greyscaled commented Jan 23, 2024

Adding to this, we have a few examples of what's possible in our release notes: https://github.com/GlareDB/glaredb/releases/tag/v0.8.0

Cheers,
Grey

@scsmithr scsmithr removed the epic 🏁 label Dec 7, 2024

5 participants