OpenSearch on Spark (without an OpenSearch cluster) - has this been contemplated? #8566

schenksj · 2023-07-10T12:51:07Z

Is your feature request related to a problem? Please describe.

I am contemplating partnering with some folks to deliver rich support for Lucene on Spark (EMR, Databricks, etc.) as a cost-effective alternative to running a separate OpenSearch/Elasticsearch cluster for fast search over large quantities of log data. Ideally this would include indexing (on dataframe.write), retrieval (with partition-filter-based pre-scan searching), and eventually ACID transactions and optimization (compacting shards?) backed by the delta-io log protocol. Extending a solution like OpenSearch UltraWarm to this use case could be an exciting alternative to starting from scratch with something like plain Lucene.
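
To make the indexing and retrieval idea concrete, here is a toy, pure-Python sketch of index-on-write plus partition-filter pre-scan. It uses neither Spark nor Lucene: `Shard`, `write`, and `search` are hypothetical names for illustration only, the in-memory inverted index merely stands in for a real Lucene index, and one shard per partition value is an assumed layout.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Shard:
    """One toy index shard per partition value (assumed layout, e.g. one per day)."""
    partition: str
    docs: list = field(default_factory=list)
    postings: dict = field(default_factory=lambda: defaultdict(set))

    def add(self, text: str) -> None:
        # Assign the next doc id and record it under every whitespace-split term.
        doc_id = len(self.docs)
        self.docs.append(text)
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term: str) -> list:
        # Exact single-term lookup; returns matching docs in insertion order.
        ids = self.postings.get(term.lower(), set())
        return [self.docs[i] for i in sorted(ids)]


def write(rows):
    """Index on write: route each (partition, text) row to its shard."""
    shards = {}
    for partition, text in rows:
        shards.setdefault(partition, Shard(partition)).add(text)
    return shards


def search(shards, term, partitions):
    """Pre-scan: skip shards whose partition fails the filter, search the rest."""
    hits = []
    for partition in sorted(shards):
        if partition in partitions:
            hits.extend((partition, doc) for doc in shards[partition].search(term))
    return hits
```

The point of the pre-scan is that shards outside the partition filter are never opened, so the cost of a query scales with the partitions it selects rather than with the total data volume.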

I think a solution like this would have significant applicability in the security domain as well as in application observability and support.

I'm curious whether anything like this has been contemplated by the community to date, and whether any prior art or POC work exists that could help catalyze the effort.

Describe the solution you'd like

I would appreciate community feedback as to whether there has been existing research/work that could be leveraged in this effort, or if this is truly novel.

Describe alternatives you've considered

Plain Lucene file providers...



penghuo · 2023-07-11T18:57:39Z

We had a similar idea and did a PoC. Here is the RFC on the OpenSearch storage format: [RFC] OpenSearch Data Format #8639.
There is another ongoing project that enables OpenSearch and Spark integration. More reading in [RFC] OpenSearch and Apache Spark Integration sql#1875.

MaxKsyunz · 2023-07-13T22:53:59Z

@penghuo I can't access the opensearch-spark repo. Is it private?

schenksj · 2023-07-17T17:52:06Z

> @penghuo I can't access the opensearch-spark repo. Is it private?

@penghuo I have the same issue! Is there someone we can reach out to on this?

penghuo · 2023-07-17T18:18:29Z

Fixed the link: opensearch-project/sql#1875.
opensearch-spark is a private repo for now; we transferred the issue from the SQL repo to the opensearch-spark repo by accident.

schenksj added the enhancement and untriaged labels Jul 10, 2023

schenksj changed the title ~~OpenSearch on Spark - has this been contemplated?~~ OpenSearch on Spark (without an OpenSearch cluster) - has this been contemplated? Jul 10, 2023

Xtansia added the Storage label Aug 13, 2023

rramachand21 removed the untriaged label Nov 6, 2023

rramachand21 self-assigned this Nov 6, 2023

Bukhtawar added this to Storage Project Board Feb 15, 2024

github-project-automation bot moved this to 🆕 New in Storage Project Board Feb 15, 2024

getsaurabh02 added this to OpenSearch Roadmap May 31, 2024

github-project-automation bot moved this to Planned work items in OpenSearch Roadmap May 31, 2024
