OpenSearch on Spark (without an OpenSearch cluster) - has this been contemplated? #8566

Open
schenksj opened this issue Jul 10, 2023 · 4 comments
Assignees
Labels
enhancement Enhancement or improvement to existing feature or request Storage Issues and PRs relating to data and metadata storage

Comments

@schenksj

schenksj commented Jul 10, 2023

Is your feature request related to a problem? Please describe.

I am contemplating partnering with some folks to deliver rich support for Lucene on Spark (EMR, Databricks, etc.) as a cost-effective alternative to running a separate OpenSearch/Elastic cluster for fast search against large quantities of log data. Ideally this would include indexing (on dataframe.write), retrieval (with partition-filter-based pre-scan searching), and eventually ACID transactions and optimization (compact shards?) supported by the delta-io log protocol. Extending the applicability of a solution like OpenSearch UltraWarm to this use case could be an exciting alternative to starting from scratch with something like plain Lucene.

I think a solution like this would have significant applicability in the security domain as well as in application observability and support.

I'm curious whether anything like this has been contemplated by the community to date, and whether any prior art or POC work exists that could help catalyze the effort.

Describe the solution you'd like

I would appreciate community feedback as to whether there has been existing research/work that could be leveraged in this effort, or if this is truly novel.

Describe alternatives you've considered

Plain Lucene file providers...


@schenksj schenksj added enhancement Enhancement or improvement to existing feature or request untriaged labels Jul 10, 2023
@schenksj schenksj changed the title from "OpenSearch on Spark - has this been contemplated?" to "OpenSearch on Spark (without an OpenSearch cluster) - has this been contemplated?" Jul 10, 2023
@penghuo
Contributor

penghuo commented Jul 11, 2023

@MaxKsyunz

@penghuo I can't access the opensearch-spark repo. Is it private?

@schenksj
Author

> @penghuo I can't access the opensearch-spark repo. Is it private?

@penghuo I have the same issue! Is there someone we can reach out to about this?

@penghuo
Contributor

penghuo commented Jul 17, 2023

Fixed the link: opensearch-project/sql#1875.
opensearch-spark is a private repo for now; we transferred the issue from the SQL repo to the opensearch-spark repo by accident.

@Xtansia Xtansia added the Storage Issues and PRs relating to data and metadata storage label Aug 13, 2023
@rramachand21 rramachand21 self-assigned this Nov 6, 2023
@github-project-automation github-project-automation bot moved this to Planned work items in OpenSearch Roadmap May 31, 2024
Projects
Status: New
Development

No branches or pull requests

5 participants