EFK with transform for tabular data #616

Merged: 47 commits merged into SeldonIO:master on Jun 19, 2019

Conversation

ryandawsonuk
Contributor

@ryandawsonuk ryandawsonuk commented Jun 4, 2019

Alternative to #610

In this version the engine can do a transform to put tabular data into key-value pairs for searching. We can then do e.g. value-range searches:

[image: screenshot of a value-range search]

It checks whether there's a 'data.names' array before transforming, but more checks are needed to make sure this won't break other types of requests (e.g. what can we assume about the data value type and array shape?). Performance also needs to be considered.
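To make the idea concrete, here is a minimal Python sketch of the kind of transform described above. It is not the engine's implementation: the payload layout (`data.names` plus a 2-D `data.ndarray`), the function names and the `age` field used in the query are assumptions for illustration. Only the `range` query shape comes from the standard Elasticsearch query DSL.

```python
from typing import Any, Dict, List


def tabular_to_records(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a names + 2-D ndarray payload into one {name: value} dict per row.

    Returns an empty list when the payload does not look tabular, so that
    other request types pass through untouched.
    """
    data = payload.get("data")
    if not isinstance(data, dict):
        return []
    names = data.get("names")
    rows = data.get("ndarray")
    if not isinstance(names, list) or not names or not isinstance(rows, list):
        return []
    records = []
    for row in rows:
        # Refuse to transform if any row's shape does not match the names.
        if not isinstance(row, list) or len(row) != len(names):
            return []
        records.append(dict(zip(names, row)))
    return records


def range_query(field: str, gte: float, lte: float) -> Dict[str, Any]:
    """Build an Elasticsearch range query body for one key-value field."""
    return {"query": {"range": {field: {"gte": gte, "lte": lte}}}}


# Hypothetical request payload and search.
payload = {
    "data": {
        "names": ["age", "income"],
        "ndarray": [[39, 52000.0], [51, 18000.0]],
    }
}
records = tabular_to_records(payload)
query = range_query("age", 30, 45)
```

The shape check is deliberately strict: anything that is not a list of rows matching `names` is left untransformed rather than guessed at.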

@ryandawsonuk
Contributor Author

ryandawsonuk commented Jun 5, 2019

For images it may be better to store the image on disk or in a bucket and keep a searchable reference to it. I'd like to investigate computing metadata about the image before storing it so that you could search by metadata (e.g. presence of certain colours) - see https://stackoverflow.com/questions/30440224/possible-to-store-images-in-elasticsearch

We should discuss whether to do this in the engine, in a wrapper library, or even start with just an example of how it can be done in end-user code.
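As a rough illustration of the "searchable reference plus metadata" idea, the sketch below computes a very crude colour summary from raw RGB pixels and pairs it with a reference to where the image is stored. Everything here is hypothetical: the field names (`image_uri`, `colours`), the bucket URI, the colour heuristic and the 10% threshold are placeholders, and decoding the image into pixels is assumed to happen elsewhere.

```python
from typing import Any, Dict, Sequence, Tuple

Pixel = Tuple[int, int, int]


def colour_label(pixel: Pixel) -> str:
    """Crude colour bucket for one RGB pixel (illustrative heuristic only)."""
    r, g, b = pixel
    if max(pixel) - min(pixel) < 30:
        return "grey"
    if r >= g and r >= b:
        return "red"
    if g >= b:
        return "green"
    return "blue"


def image_metadata(
    pixels: Sequence[Pixel], image_uri: str, min_fraction: float = 0.1
) -> Dict[str, Any]:
    """Build a small searchable document: a reference to the stored image
    plus the colours that cover at least min_fraction of its pixels."""
    counts: Dict[str, int] = {}
    for pixel in pixels:
        label = colour_label(pixel)
        counts[label] = counts.get(label, 0) + 1
    total = len(pixels)
    colours = sorted(
        name for name, count in counts.items()
        if total and count / total >= min_fraction
    )
    return {"image_uri": image_uri, "pixel_count": total, "colours": colours}


# Hypothetical image: mostly red with some blue.
pixels = [(255, 0, 0)] * 9 + [(0, 0, 255)]
doc = image_metadata(pixels, "gs://example-bucket/requests/1.png")
```

Only `doc` would be indexed; the image itself stays on disk or in the bucket.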

@ryandawsonuk
Contributor Author

ryandawsonuk commented Jun 5, 2019

Next steps:

  • use a thread to make it async
  • consider more examples, e.g. a config option for column-primary, data.names as a comma-separated env var, a size cap on transform/enrichment, percentage sampling
  • revisit whether to split out batches or provide an option to do so
  • look at binary, tensor and str cases
  • look at how it is parameterised: which env vars, which defaults and how to apply them (think example helm charts)
  • if there is a single engine for the graph, is it ok that we won't know which component? Will we know anyway? Do we need to? What about MABs?
  • run on Cloud
  • look at custom metadata
  • document how it works (mention fluent/fluent-bit#737, "[filter_kubernetes] enhancement: provide mechanism to exclude containers from fluent bit via annotations")
  • consider KF-serving, where the engine is not present: could it use the same logic in Python as part of its base image/s? Or should we publish a Python library?
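A few of the items above (async via a thread, names from an env var, a size cap, percentage sampling) could fit together roughly as in this Python sketch. The env var names `LOG_DATA_NAMES`, `LOG_MAX_ROWS` and `LOG_SAMPLE_PERCENT` and their defaults are invented for illustration; nothing in this PR defines them.

```python
import os
import random
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Mapping, Optional


def load_config(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Read hypothetical settings; every variable name here is an assumption."""
    names = env.get("LOG_DATA_NAMES", "")
    return {
        "names": [n.strip() for n in names.split(",") if n.strip()],
        "max_rows": int(env.get("LOG_MAX_ROWS", "100")),
        "sample_percent": float(env.get("LOG_SAMPLE_PERCENT", "100")),
    }


def should_log(sample_percent: float,
               rng: Callable[[], float] = random.random) -> bool:
    """Percentage sampling: True for roughly sample_percent % of calls."""
    return rng() * 100.0 < sample_percent


def enrich(rows: List[List[Any]], config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Apply the size cap, then pair each remaining row with the names."""
    capped = rows[: config["max_rows"]]
    return [dict(zip(config["names"], row)) for row in capped]


# A single worker thread keeps enrichment off the request path.
_executor = ThreadPoolExecutor(max_workers=1)


def enrich_async(rows: List[List[Any]],
                 config: Dict[str, Any]) -> Optional[Future]:
    """Submit enrichment to the worker thread, or skip it when sampled out."""
    if not should_log(config["sample_percent"]):
        return None
    return _executor.submit(enrich, rows, config)


config = load_config({"LOG_DATA_NAMES": "a, b", "LOG_MAX_ROWS": "1"})
future = enrich_async([[1, 2], [3, 4]], config)
```

With the defaults, sampling is 100%, so `future` is always returned here; `future.result()` blocks until the worker has finished.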

@ryandawsonuk
Contributor Author

ryandawsonuk commented Jun 6, 2019

New idea: perform the enrichment in a separate batch job that updates the entries in Elasticsearch, or in an Elasticsearch transform script. Preferably we'd do something Python-based to simplify working with the arrays. In that case the transformation/enrichment part in the engine would be taken out of this PR.
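A Python-based batch job along these lines might look like the sketch below. It only builds the newline-delimited body for the Elasticsearch `_bulk` API (an `update` action line followed by a partial `doc`) from search hits; fetching the hits and sending the body are left out. The index name, the `_source` layout and the `elements` field are assumptions, not what this PR stores.

```python
import json
from typing import Any, Dict, Iterable, List


def bulk_update_body(index: str, hits: Iterable[Dict[str, Any]]) -> str:
    """Build an NDJSON _bulk body that adds key-value pairs to tabular entries.

    Each hit is expected to look like a search hit: {"_id": ..., "_source": ...}.
    Entries without data.names and data.ndarray are skipped.
    """
    lines: List[str] = []
    for hit in hits:
        data = hit.get("_source", {}).get("data", {})
        names = data.get("names")
        rows = data.get("ndarray")
        if not names or not rows:
            continue
        # "elements" is a hypothetical field name for the enriched rows.
        elements = [dict(zip(names, row)) for row in rows]
        lines.append(json.dumps({"update": {"_index": index, "_id": hit["_id"]}}))
        lines.append(json.dumps({"doc": {"elements": elements}}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical hits: one tabular entry and one string entry.
hits = [
    {"_id": "1", "_source": {"data": {"names": ["a"], "ndarray": [[1], [2]]}}},
    {"_id": "2", "_source": {"strData": "not tabular"}},
]
body = bulk_update_body("seldon-logs", hits)
```

Doing this outside the engine keeps the request path free of the transform cost, at the price of entries being searchable by value only after the job has run.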

@ryandawsonuk ryandawsonuk changed the title WIP: 545 EFK with in-engine transform for tabular data WIP: 545 EFK with transform for tabular data Jun 8, 2019
@seldondev seldondev added size/XL and removed size/L labels Jun 8, 2019
@ryandawsonuk
Contributor Author

ryandawsonuk commented Jun 8, 2019

Related to SeldonIO/seldon-operator#13 but not a dependency

@seldondev seldondev added size/L and removed size/XL labels Jun 10, 2019
@seldondev seldondev added size/XL and removed size/L labels Jun 10, 2019
@ryandawsonuk ryandawsonuk changed the title WIP: 545 EFK with transform for tabular data EFK with transform for tabular data Jun 18, 2019
@ryandawsonuk
Contributor Author

Depends on SeldonIO/seldon-operator#17

@ryandawsonuk ryandawsonuk merged commit aacb697 into SeldonIO:master Jun 19, 2019
agrski pushed a commit that referenced this pull request Dec 2, 2022
* add UnloadEnvoyRequested state

* transition to UnloadEnvoyRequested

* add UnloadEnvoyRequested in model stats

* add UnloadEnvoyRequested in UnloadingOrUnloaded

* make removeRouteForServerInEnvoy not sync

* update func name to reflect how envoy is called

* modelUpdate changes to proceed with batched rm

* add TODO for pipeline envoy updates

* remove unused field in ModelVersion

* move field to atomic.bool

* fix condition check

* add tests

* fix unload test

* lint

* fix lint issues

* add extra test

* docs update

* add new replicastate in protos

* update generated protos

* add extra test in agent-server sync
2 participants