Databend Roadmap for 2024 (Discussion) #14167

Open
BohuTANG opened this issue Dec 27, 2023 · 14 comments

@BohuTANG
Member

BohuTANG commented Dec 27, 2023

Databend Roadmap for 2024 (Discussion)

Explore our ongoing journey and future plans for Databend. Join the discussion and contribute your ideas!

2024: Compute Where Data Lives: Swift, Smart, Seamless.

Review of 2023

In 2023, Databend scaled significantly.

The largest single table in Databend held hundreds of thousands of segments, tens of millions of blocks, and tens of trillions of records, encompassing 7PB of raw data and over 300TB of index data.

Main Tasks for 2024

| Task | Status | Comments |
| --- | --- | --- |
| Concurrency and Scheduler | In Progress | Aiming for faster, more efficient task handling and improved system responsiveness. |
| GEOMETRY Data Type | In Progress | |
| TPC-DS Performance | In Progress | Continuously optimizing for better performance benchmarks. |
| Full-Text Indexes | Done | |
| Multi-Statement Transactions | Done | |
| Stored Procedures (Python) | In Progress | Adding Python support for versatile data analysis alongside SQL. |
| Storage + Compute + Inference | Not Specified | Creating a cohesive data platform for AI and cloud computing, provisioning CPU & GPU resources. |

Previous Roadmaps for Reference:

BohuTANG pinned this issue Dec 27, 2023
@GaoYusong

Congratulations on what Databend has achieved in such a short time. Looking forward to 2024!

@Xuanwo
Member

Xuanwo commented Dec 29, 2023

What exactly is the Support Python Worksheet? Does it enable running Python in Databend?

@ZhiHanZ
Collaborator

ZhiHanZ commented Dec 29, 2023

Any plans for SQL transactions and stored procedures?

@ZhiHanZ
Collaborator

ZhiHanZ commented Dec 29, 2023

Besides, I think we could also support query queueing, automatic warehouse scaling based on the pending queue, and a separate coordinator component for dispatching physical plans to warehouse compute nodes.

@BohuTANG
Member Author

Besides, I think we could also support query queueing, automatic warehouse scaling based on the pending queue, and a separate coordinator component for dispatching physical plans to warehouse compute nodes.

This is part of the enhancements to Concurrency and Scheduler.

@BohuTANG
Member Author

What exactly is the Support Python Worksheet? Does it enable running Python in Databend?

The goal is to make Hugging Face models + Python + GPU (or CPU) + data in Databend possible.

@djouallah

All I want is to be able to read a Delta table from a local path :)

@JasonLi-cn
Contributor

How should we understand Inference? Which capabilities does it refer to?

@BohuTANG
Member Author

How should we understand Inference? Which capabilities does it refer to?

Move the models (Hugging Face models) into the database so that the database can load and run them.

@keltia

keltia commented Jan 27, 2024

Thanks for making this available to everyone. I'm currently interested in anything that can deal with geospatial data, so adding Geometry support is very nice.

@BohuTANG
Member Author

Thanks for making this available to everyone. I'm currently interested in anything that can deal with geospatial data, so adding Geometry support is very nice.

Working on it: #14470

@inviscid

Thanks for making this available to everyone. I'm currently interested in anything that can deal with geospatial data, so adding Geometry support is very nice.

@keltia Databend uses H3 for geospatial operations. Is that what you are referring to? https://docs.databend.com/sql/sql-functions/geo-functions/

@inviscid

inviscid commented Jul 9, 2024

Would it be possible to restructure the code a little to help with creating a fully OSS compliant version of Databend?

I know the license info specifically calls out the ee directories, but we have found some other files that are also covered by the Elastic license and that seem to be part of the core functionality.

  • /src/meta/binaries/meta/ee-main.rs
  • /src/binaries/query/ee-main.rs

Many companies will not adopt a technology that doesn't use an approved OSS license, which Apache 2.0 is, but Elastic is not (open-source vs. source-available).

In an ideal world there would be a databend-oss repo and a databend-ee repo with the latter adding all of the Enterprise features and licensing on top of the OSS version.

I know the goal is to get enterprise customers to buy a license, but I think you may be losing out on community support and growth, because devs restricted by corporate governance policies will never be able to even give Databend a test drive. Plus, like many grassroots database projects within a company, things start small and cheap, then turn into mission-critical systems requiring licensing. If we skip the small and cheap phase, the licensing never comes in those cases.

@johnpyp

johnpyp commented Aug 11, 2024

Agreed, the mixing and matching of licenses, both conceptually in the docs and literally in the code, makes it really hard to use this database.

9 participants