[Infrastructure Monitoring] Better data generation #119491
Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)
Next steps (following a sync with @miltonhultgren):
Thinking that the Metrics UI could be a good next target, given the efforts around alerting on high-cardinality group-bys. cc @Zacqary
One more thought from syncing with @matschaffer: a good POC would be to rewrite one of the Stack Monitoring E2E/integration tests using synthtrace-generated data.
Just a ping on more problems that could benefit from data generation tooling: #119658
Dropping a link to Mat's notes here: https://docs.google.com/document/d/1lImDQTih61ufW3gDuY1FAYLUpZTJjj58sVbC993PSGA/edit#heading=h.m95jascdig79
I got into this topic after struggling to write tests for our API because of "missing" test data. As I wrote to Jason:
Additionally, es_archiver currently doesn't support data streams, although this will likely get fixed in time (#69061). While there have been many attempts to solve parts of this problem, we haven't really landed on something we can build a roadmap around. I would like a tool that makes it easy to generate data for different mappings, and being able to use that in tests instead of relying on es_archiver would be nice.
What the current tools seem to have in common is the idea of defining a time range in which "events" should happen, with some frequency and the possibility of spikes (by overlapping time ranges with different event frequencies), plus a layer that turns these "events" into Elasticsearch documents. What most tools miss, though, is a connection to the underlying mappings of those documents. Synthtrace, for example, loads the index and mappings with es_archiver before inserting its documents.
My hope is that by defining our problems more clearly we can set some goals to reach. The Synthtrace route seems promising so far, especially since we'll likely want to use APM data in our own tests in the future as well. What could the next steps be? What have we missed so far? @elastic/infra-monitoring-ui
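A minimal sketch of the "time ranges plus a document layer" idea described in the comment above, assuming plain TypeScript and no particular library. All names (`EventRange`, `generateEvents`, `toDocument`) and the document fields are hypothetical; this is not synthtrace's API, only an illustration of how overlapping ranges with different frequencies produce a spike, and how a separate layer maps events to documents.

```typescript
// Hypothetical sketch: these names are illustrative, not synthtrace's API.

interface EventRange {
  from: number; // range start, epoch ms (inclusive)
  to: number; // range end, epoch ms (exclusive)
  intervalMs: number; // emit one event every intervalMs inside the range
}

// Expand each range into event timestamps. Overlapping ranges add their
// events together, which is how a spike on top of a baseline is modelled.
function generateEvents(ranges: EventRange[]): number[] {
  const timestamps: number[] = [];
  for (const { from, to, intervalMs } of ranges) {
    if (intervalMs <= 0) {
      throw new Error("intervalMs must be positive");
    }
    for (let t = from; t < to; t += intervalMs) {
      timestamps.push(t);
    }
  }
  return timestamps.sort((a, b) => a - b);
}

// Separate layer that turns an abstract event into an Elasticsearch document.
// The field names are assumptions chosen for illustration.
interface MetricDoc {
  "@timestamp": string;
  "host.name": string;
  "event.dataset": string;
}

function toDocument(timestamp: number, hostName: string): MetricDoc {
  return {
    "@timestamp": new Date(timestamp).toISOString(),
    "host.name": hostName,
    "event.dataset": "system.cpu",
  };
}

// Usage: a 10-minute baseline at 1 event/minute, plus a 2-minute spike
// at 1 event every 10 seconds that overlaps the baseline.
const start = Date.UTC(2021, 10, 1); // 2021-11-01T00:00:00Z (months are 0-based)
const minute = 60000;
const baseline: EventRange = { from: start, to: start + 10 * minute, intervalMs: minute };
const spike: EventRange = { from: start + 4 * minute, to: start + 6 * minute, intervalMs: 10000 };

const docs = generateEvents([baseline, spike]).map((t) => toDocument(t, "host-1"));
console.log(docs.length); // 22: 10 baseline events + 12 spike events
```

Note that nothing here knows about the index mapping: the document shape is hard-coded in `toDocument`, which is exactly the gap between event generation and mappings pointed out above.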
Can we do an hour-long sync to present an overall set of findings here and try to jumpstart this effort in a good direction? I want us to invest in this, but like you all are saying (I think), we need clear goals and to choose the ones that will have the highest ROI for us.
I wouldn't mind showing off the Stack Monitoring simulation work so far. It's basic, but it looks promising.
@jasonrhodes Let's book something! My vote is to focus on "data generation from mappings", since much of the work in Stack Monitoring would benefit from being able to take one of the three different mappings we have (which are converging on a single mapping), generate data from it, run tests, and check that things work.
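A minimal sketch of "data generation from mappings", assuming a tiny subset of the Elasticsearch mapping format (nested `properties` objects and a handful of field `type`s). The functions `generateFromMapping`, `generateObject` and `valueFor` are hypothetical, and the example mapping is illustrative only, not one of the actual Stack Monitoring mappings. A real tool would need many more field types and realistic value distributions.

```typescript
// Hypothetical sketch: supports only a tiny subset of Elasticsearch mappings.

type LeafType = "keyword" | "long" | "double" | "date" | "boolean";

type FieldMapping =
  | { type: LeafType }
  | { properties: Record<string, FieldMapping> }; // object field

interface Mapping {
  properties: Record<string, FieldMapping>;
}

type DocValue = string | number | boolean | { [key: string]: DocValue };

// Produce a deterministic value for one field; i is the document index.
function valueFor(field: FieldMapping, name: string, i: number): DocValue {
  if ("properties" in field) {
    return generateObject(field.properties, i); // recurse into object fields
  }
  switch (field.type) {
    case "keyword":
      return `${name}-${i}`;
    case "long":
      return i;
    case "double":
      return i / 10;
    case "date":
      return new Date(Date.UTC(2021, 0, 1) + i * 1000).toISOString();
    case "boolean":
      return i % 2 === 0;
    default:
      throw new Error(`unsupported field type for ${name}`);
  }
}

function generateObject(props: Record<string, FieldMapping>, i: number): { [key: string]: DocValue } {
  const doc: { [key: string]: DocValue } = {};
  for (const [name, field] of Object.entries(props)) {
    doc[name] = valueFor(field, name, i);
  }
  return doc;
}

// Generate `count` documents that conform to the given mapping.
function generateFromMapping(mapping: Mapping, count: number): Array<{ [key: string]: DocValue }> {
  const docs: Array<{ [key: string]: DocValue }> = [];
  for (let i = 0; i < count; i++) {
    docs.push(generateObject(mapping.properties, i));
  }
  return docs;
}

// Usage with an illustrative mapping (not an actual Stack Monitoring mapping).
const mapping: Mapping = {
  properties: {
    "@timestamp": { type: "date" },
    cluster_uuid: { type: "keyword" },
    node_stats: {
      properties: {
        heap_used_in_bytes: { type: "long" },
      },
    },
  },
};

const generated = generateFromMapping(mapping, 3);
console.log(JSON.stringify(generated[1]));
```

Because the generator walks the mapping itself, swapping in a different mapping changes the generated documents without touching the generator, which is the property that would let the same tests run against each of the mappings.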
Please book an hour for after the new year. It can be during my "meeting block" / "focus time" if that works, but it will likely be before that anyway to fit other calendars. @matschaffer, maybe you can weigh in on some good times to target and work with @miltonhultgren to get an hour on the calendar? Thanks, all!
Closing this for now. If we put effort into improving data generation while creating or updating tests, perhaps we can evolve toward the right solution.
Epic for organizing work on how to generate data for development and testing. We will flesh this out over time.