idea: Introduce memory catalog #412
Comments
User Story A: I'm a user of Iceberg downstream. I'm attempting to integrate Iceberg into my project and need to conduct unit tests to ensure the accuracy of my Iceberg-related code. However, I've discovered that I must first connect to a catalog. Although setting up a REST catalog is quick, it doesn't suit my needs well. |
User Story B: I'm an external consumer of iceberg tables. My clients will store TiB of Iceberg data in S3 using their own catalogs. Please note, I don't have access to their catalog systems. The only thing available to me is paths to different tables. Which catalog should I set up to read/fetch data from these iceberg tables?

I know we have StaticTable, but it will need:

    use iceberg::io::FileIO;
    use iceberg::table::StaticTable;
    use iceberg::TableIdent;

    async fn example() {
        let metadata_file_location = "s3://bucket_name/path/to/metadata.json";
        let file_io = FileIO::from_path(&metadata_file_location).unwrap().build().unwrap();
        let static_identifier = TableIdent::from_strs(["static_ns", "static_table"]).unwrap();
        let static_table = StaticTable::from_metadata_file(&metadata_file_location, static_identifier, file_io).await.unwrap();
        println!("{:?}", static_table.metadata());
    }

I want:

    let table2 = catalog
        .load_table(&TableIdent::from_strs(["default", "t2"]).unwrap())
        .await
        .unwrap();
    println!("{:?}", table2.metadata());
|
Great Idea, I think this could be really useful. We should be able to have this kind of behavior with the SQL catalog and an in-memory sqlite database. |
+1 for this idea. |
Useful not only in unit tests, but also in our examples for demonstration. |
Great idea @liurenjie1024 I'm all for it!
I'm not sure if this is the best example. Ideally when you have a fully functioning catalog, you should be able to expose the catalog with the right privileges (can be behind VPNs etc). It is a bad practice to register a table in multiple catalogs, since it won't track when a table is being updated across the catalogs.
StaticTable serves a different purpose, and is just meant to access read only tables. In PyIceberg we had a MemoryCatalog in tests for a long while, and at some point there was a discussion to move this outside of the test directory. In the end we did not do this, and we used the SQLCatalog with a SQLite backend. This can work both fully in-memory, and also persisted locally (for example in |
Seems a great idea! The situation differs slightly from the Rust side as we might not want to depend on |
As long as both of them are getting maintained :) |
Since I didn't note any objections, I've started working on an in-memory implementation of |
Thanks! |
Hi, I came up with this idea while trying to create quick demos showcasing the capabilities and cool features of iceberg-rust. However, I found that setting up the catalog initially consumes most of the time. This isn't ideal for attracting new users or contributors.
I propose introducing a short-lived, in-memory catalog as an ideal starting point for either testing iceberg-rust or using it statelessly.
The design details are currently unclear, and I would like to seek comments and feedback on this idea. What do you think? Do you find the catalog useful?
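To make the proposal a bit more concrete, here is a rough sketch of what a short-lived, in-memory catalog could look like. Everything in it is an assumption for illustration: `MemoryCatalog`, `CatalogError`, and this simplified `TableIdent` are hypothetical stand-ins built on the standard library only, not the types or the `Catalog` trait from iceberg-rust, and the methods are synchronous where a real catalog would likely be async. The catalog maps a table identifier to the location of its metadata file, and all state disappears when the value is dropped.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a table identifier (namespace parts plus name).
/// This is NOT the `TableIdent` type from the iceberg crate.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct TableIdent {
    pub namespace: Vec<String>,
    pub name: String,
}

impl TableIdent {
    pub fn new(namespace: &[&str], name: &str) -> Self {
        Self {
            namespace: namespace.iter().map(|s| s.to_string()).collect(),
            name: name.to_string(),
        }
    }
}

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum CatalogError {
    TableAlreadyExists(TableIdent),
    TableNotFound(TableIdent),
}

/// Assumed design: the catalog only remembers where each table's
/// metadata file lives. Nothing is persisted anywhere.
#[derive(Debug, Default)]
pub struct MemoryCatalog {
    tables: HashMap<TableIdent, String>,
}

impl MemoryCatalog {
    pub fn new() -> Self {
        Self::default()
    }

    /// Register an existing table by the location of its metadata file.
    pub fn register_table(
        &mut self,
        ident: TableIdent,
        metadata_location: &str,
    ) -> Result<(), CatalogError> {
        if self.tables.contains_key(&ident) {
            return Err(CatalogError::TableAlreadyExists(ident));
        }
        self.tables.insert(ident, metadata_location.to_string());
        Ok(())
    }

    /// Look up the metadata location for a table.
    /// A real catalog would read the file and return a table object here.
    pub fn load_table(&self, ident: &TableIdent) -> Result<&str, CatalogError> {
        self.tables
            .get(ident)
            .map(String::as_str)
            .ok_or_else(|| CatalogError::TableNotFound(ident.clone()))
    }

    /// Forget a table. The underlying data files are left untouched.
    pub fn drop_table(&mut self, ident: &TableIdent) -> Result<(), CatalogError> {
        self.tables
            .remove(ident)
            .map(|_| ())
            .ok_or_else(|| CatalogError::TableNotFound(ident.clone()))
    }
}

/// Usage: register a table by path, then load it by identifier.
pub fn demo() -> Result<String, CatalogError> {
    let mut catalog = MemoryCatalog::new();
    let ident = TableIdent::new(&["default"], "t2");
    catalog.register_table(ident.clone(), "s3://bucket_name/path/to/metadata.json")?;
    Ok(catalog.load_table(&ident)?.to_string())
}
```

A plain `HashMap` keeps the sketch dependency-free, which is the main appeal for unit tests and quick demos. Namespaces, commits, and concurrent access are deliberately left out and would need to be part of the actual design discussion.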