OpenSearch on Spark (without an OpenSearch cluster) - has this been contemplated? #8566
Labels
enhancement
Storage
Is your feature request related to a problem? Please describe.
I am contemplating partnering with some folks to deliver rich support for Lucene on Spark (EMR, Databricks, etc.) as a cost-effective alternative to running a separate OpenSearch/Elastic cluster for fast search against large quantities of log data. Ideally this would include indexing (on `dataframe.write`), retrieval (with partition-filter-based pre-scan searching), and eventually ACID transactions and optimization (compacting shards?) supported by the delta-io log protocol. Extending the applicability of a solution like OpenSearch UltraWarm to this use case could be an exciting alternative to starting from scratch with something like plain Lucene. I think a solution like this would have significant applicability in the security domain as well as in application observability and support.
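To make the indexing and retrieval idea above more concrete, here is a rough sketch of what "index on write, prune by partition filter, then search" could look like. It is pure Python with no Spark or Lucene involved: `ShardIndex`, `write_partitioned`, and `search` are hypothetical names, the inverted index is only a stand-in for a per-partition Lucene shard, and the sample log rows are made up for illustration.

```python
from collections import defaultdict


class ShardIndex:
    """Toy inverted index standing in for one Lucene shard (assumption, not Lucene's API)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids
        self.docs = {}                    # doc id -> original text

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        return [self.docs[d] for d in sorted(self.postings.get(term.lower(), ()))]


def write_partitioned(rows, partition_key):
    """Build one shard per partition value, mimicking indexing at write time."""
    shards = defaultdict(ShardIndex)
    for doc_id, row in enumerate(rows):
        shards[row[partition_key]].add(doc_id, row["message"])
    return dict(shards)


def search(shards, term, partition_filter):
    """Apply the partition filter first, then search only the surviving shards."""
    selected = [p for p in sorted(shards) if partition_filter(p)]
    hits = []
    for p in selected:
        hits.extend((p, msg) for msg in shards[p].search(term))
    return selected, hits


# Made-up sample log rows, partitioned by date.
rows = [
    {"date": "2023-07-01", "message": "login failed for admin"},
    {"date": "2023-07-01", "message": "disk usage normal"},
    {"date": "2023-07-02", "message": "login succeeded for admin"},
    {"date": "2023-07-03", "message": "login failed for guest"},
]
shards = write_partitioned(rows, "date")
scanned, hits = search(shards, "failed", lambda d: d >= "2023-07-02")
print(scanned)
print(hits)
```

The point of the sketch is only that shards outside the partition filter are never opened. In a real implementation the shards would presumably be Lucene index files written alongside the table data, and the pruning would come from Spark's partition handling rather than a Python lambda.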
I'm curious whether anything like this has been contemplated by the community to date, and whether any prior art or POC work exists that could help catalyze the effort.
Describe the solution you'd like
I would appreciate community feedback as to whether there has been existing research/work that could be leveraged in this effort, or if this is truly novel.
Describe alternatives you've considered
Plain Lucene file providers...