Flush profiles to disk per row groups #486
Force-pushed from 349a4bf to 74a34de.
With the latest changes I am also writing the SeriesFingerprint out to the parquet file. I need this so I can later renumber the SeriesIndexes. I think this could also help with recovering from an unclean shutdown. I think this is ready for a first look @cyriltovena
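A minimal sketch of why persisting the fingerprint enables renumbering, assuming each flushed row carries both its SeriesFingerprint and a SeriesIndex that was only valid at write time. The `row` type and the `renumberSeries` helper are hypothetical and not taken from the PR; it assigns contiguous indexes in fingerprint order so every row of the same series ends up with the same index.

```go
package main

import (
	"fmt"
	"sort"
)

// row is a hypothetical, trimmed-down profile row as it might be stored in
// the temporary parquet file: the fingerprint is persisted next to the index.
type row struct {
	SeriesFingerprint uint64
	SeriesIndex       uint32
}

// renumberSeries assigns new, contiguous SeriesIndex values ordered by
// fingerprint, rewriting rows in place, and returns fingerprint -> index.
func renumberSeries(rows []row) map[uint64]uint32 {
	seen := make(map[uint64]struct{})
	fps := make([]uint64, 0)
	for _, r := range rows {
		if _, ok := seen[r.SeriesFingerprint]; !ok {
			seen[r.SeriesFingerprint] = struct{}{}
			fps = append(fps, r.SeriesFingerprint)
		}
	}
	// A deterministic order (by fingerprint) makes the numbering reproducible.
	sort.Slice(fps, func(i, j int) bool { return fps[i] < fps[j] })

	mapping := make(map[uint64]uint32, len(fps))
	for i, fp := range fps {
		mapping[fp] = uint32(i)
	}
	for i := range rows {
		rows[i].SeriesIndex = mapping[rows[i].SeriesFingerprint]
	}
	return mapping
}

func main() {
	rows := []row{
		{SeriesFingerprint: 42, SeriesIndex: 7},
		{SeriesFingerprint: 7, SeriesIndex: 3},
		{SeriesFingerprint: 42, SeriesIndex: 9},
	}
	renumberSeries(rows)
	fmt.Println(rows)
}
```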
Force-pushed from 5788815 to 37a6335.
I think for the read path we should narrow the columns down to what each type of request needs instead of exposing the whole profile object; this will help us leverage parquet's strengths by only pulling the requested columns. The request types are basically those of the querier interface:

```go
type Querier interface {
	// ...
	SelectMatchingProfiles(ctx context.Context, params *ingestv1.SelectProfilesRequest) (iter.Iterator[Profile], error)
	MergeByStacktraces(ctx context.Context, rows iter.Iterator[Profile]) (*ingestv1.MergeProfilesStacktracesResult, error)
	MergeByLabels(ctx context.Context, rows iter.Iterator[Profile], by ...string) ([]*typesv1.Series, error)
	MergePprof(ctx context.Context, rows iter.Iterator[Profile]) (*profile.Profile, error)
	// ...
}
```
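One way to express the per-request projection is a lookup from request kind to the columns that kind actually reads, so a reader opens only those parquet columns. This is a sketch under assumptions: the `requestKind` enum, `columnsFor`, and every column name below are hypothetical placeholders rather than the real schema.

```go
package main

import "fmt"

// requestKind mirrors the querier methods; the enum itself is hypothetical.
type requestKind int

const (
	selectMatchingProfiles requestKind = iota
	mergeByStacktraces
	mergeByLabels
	mergePprof
)

// columnsFor returns the parquet columns a request kind needs to read.
// Column names are placeholders, not the actual profile schema.
func columnsFor(k requestKind) []string {
	switch k {
	case selectMatchingProfiles:
		// Selecting rows only needs series identity and timestamps.
		return []string{"SeriesIndex", "TimeNanos"}
	case mergeByStacktraces:
		return []string{"Samples.StacktraceID", "Samples.Value"}
	case mergeByLabels:
		return []string{"SeriesIndex", "TimeNanos", "Samples.Value"}
	case mergePprof:
		return []string{"TimeNanos", "Samples.StacktraceID", "Samples.Value"}
	default:
		// Unknown request kinds read nothing rather than everything.
		return nil
	}
}

func main() {
	kinds := []requestKind{selectMatchingProfiles, mergeByStacktraces, mergeByLabels, mergePprof}
	for _, k := range kinds {
		fmt.Println(k, columnsFor(k))
	}
}
```

The design choice is that the projection is decided by the request type up front, instead of materializing a full `Profile` and discarding fields later.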
@cyriltovena I would like to get another round of reviews. This afternoon I felt brave enough to briefly deploy to dev, but that resulted in all queries timing out; I will look into this a bit more tomorrow.
Looking good.
* Implement getters to support loading from disk via the profile index
* Import the store implementation and other relevant parts from the long-running branch
This improves on the previous implementation by addressing the raised concerns: rows are read in row groups, and sorting is done just before rows are cut.
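A minimal sketch of "sort just before the cut, then work in row groups", assuming rows are ordered by series index and then timestamp. The `profileRow`, `headBuffer`, and `rowGroups` names are hypothetical; the point is that ingestion stays a cheap append, sorting happens once per cut, and the sorted rows are then handled one row-group-sized chunk at a time.

```go
package main

import (
	"fmt"
	"sort"
)

// profileRow is a hypothetical row ordered by series, then time.
type profileRow struct {
	SeriesIndex uint32
	TimeNanos   int64
}

// headBuffer appends rows unsorted on ingest and only sorts when a batch is cut.
type headBuffer struct {
	rows []profileRow
}

func (b *headBuffer) ingest(r profileRow) { b.rows = append(b.rows, r) }

// cut sorts the buffered rows once, hands them out, and resets the buffer.
func (b *headBuffer) cut() []profileRow {
	rows := b.rows
	b.rows = nil
	sort.Slice(rows, func(i, j int) bool {
		if rows[i].SeriesIndex != rows[j].SeriesIndex {
			return rows[i].SeriesIndex < rows[j].SeriesIndex
		}
		return rows[i].TimeNanos < rows[j].TimeNanos
	})
	return rows
}

// rowGroups splits rows into chunks of at most size rows, mimicking reading
// one row group at a time instead of the whole file.
func rowGroups(rows []profileRow, size int) [][]profileRow {
	if size <= 0 {
		size = len(rows) // treat a non-positive size as "one group"
	}
	var groups [][]profileRow
	for len(rows) > 0 {
		n := size
		if n > len(rows) {
			n = len(rows)
		}
		groups = append(groups, rows[:n])
		rows = rows[n:]
	}
	return groups
}

func main() {
	b := &headBuffer{}
	b.ingest(profileRow{SeriesIndex: 2, TimeNanos: 5})
	b.ingest(profileRow{SeriesIndex: 1, TimeNanos: 9})
	b.ingest(profileRow{SeriesIndex: 1, TimeNanos: 3})
	for _, g := range rowGroups(b.cut(), 2) {
		fmt.Println(g)
	}
}
```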
Was used in the interim
This helped to discover a couple of bugs that would have gone unnoticed otherwise.
One for the data fully in memory and one for each row group on disk.
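One plausible reading of the line above is that a query opens one sorted source for the data fully in memory and one per on-disk row group, then combines them. Assuming each source is already sorted, a heap-based k-way merge is a standard way to do that; the sketch below uses plain `int64` values as stand-ins for rows, and `mergeSorted` is a hypothetical name, not the PR's iterator.

```go
package main

import (
	"container/heap"
	"fmt"
)

// cursor tracks the next unread position in one sorted source.
type cursor struct {
	src []int64
	pos int
}

// cursorHeap is a min-heap of cursors keyed by their current element.
type cursorHeap []*cursor

func (h cursorHeap) Len() int           { return len(h) }
func (h cursorHeap) Less(i, j int) bool { return h[i].src[h[i].pos] < h[j].src[h[j].pos] }
func (h cursorHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *cursorHeap) Push(x any)        { *h = append(*h, x.(*cursor)) }
func (h *cursorHeap) Pop() any {
	old := *h
	c := old[len(old)-1]
	*h = old[:len(old)-1]
	return c
}

// mergeSorted merges already-sorted sources (e.g. one in-memory source and
// one per row group) into a single sorted slice.
func mergeSorted(sources ...[]int64) []int64 {
	h := &cursorHeap{}
	total := 0
	for _, s := range sources {
		total += len(s)
		if len(s) > 0 { // empty sources never enter the heap
			*h = append(*h, &cursor{src: s})
		}
	}
	heap.Init(h)
	out := make([]int64, 0, total)
	for h.Len() > 0 {
		c := (*h)[0] // smallest current element across all sources
		out = append(out, c.src[c.pos])
		c.pos++
		if c.pos == len(c.src) {
			heap.Pop(h)
		} else {
			heap.Fix(h, 0)
		}
	}
	return out
}

func main() {
	inMemory := []int64{1, 4, 9}
	rowGroup1 := []int64{2, 3}
	rowGroup2 := []int64{5}
	fmt.Println(mergeSorted(inMemory, rowGroup1, rowGroup2))
}
```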
It was actually the string builder not being reset.
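For illustration, this is the kind of bug a reused `strings.Builder` causes when it is not reset: every string built in a loop carries all previous output as a prefix. The `seriesKeys` helper below is hypothetical; the PR's actual use of the builder is not shown here. It uses `strings.Builder.Reset`, which empties the builder while leaving previously returned strings valid.

```go
package main

import (
	"fmt"
	"strings"
)

// seriesKeys builds one comma-separated key per label set while reusing a
// single builder. Removing the Reset call would make the second key
// "a,bc" instead of "c".
func seriesKeys(labelSets [][]string) []string {
	var sb strings.Builder
	keys := make([]string, 0, len(labelSets))
	for _, labels := range labelSets {
		sb.Reset() // start every key from an empty buffer
		for i, l := range labels {
			if i > 0 {
				sb.WriteByte(',')
			}
			sb.WriteString(l)
		}
		keys = append(keys, sb.String())
	}
	return keys
}

func main() {
	fmt.Println(seriesKeys([][]string{{"a", "b"}, {"c"}}))
}
```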
Also adds a test with concurrent ingestion/querying, to cover the most basic cases.
Force-pushed from 2c1a6e1 to 555c2f0.
This ensures the readLock only needs to be held for as short a time as necessary.
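A common way to keep a read lock short is to copy what the query needs while holding `RLock` and do the expensive work after `RUnlock`, so writers are blocked only for the copy. The `head` type and its methods below are hypothetical and only illustrate the pattern; they are not the PR's code.

```go
package main

import (
	"fmt"
	"sync"
)

// head is a hypothetical store guarded by a RWMutex.
type head struct {
	mu   sync.RWMutex
	rows []int64
}

func (h *head) add(v int64) {
	h.mu.Lock()
	h.rows = append(h.rows, v)
	h.mu.Unlock()
}

// sum snapshots the rows under the read lock and releases it before the
// (potentially slow) per-row work.
func (h *head) sum() int64 {
	h.mu.RLock()
	snapshot := make([]int64, len(h.rows))
	copy(snapshot, h.rows)
	h.mu.RUnlock()

	var total int64
	for _, v := range snapshot {
		total += v // stands in for expensive query work done without the lock
	}
	return total
}

func main() {
	h := &head{}
	h.add(2)
	h.add(3)
	fmt.Println(h.sum())
}
```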
* Ingest the same profile series concurrently
* Test Flush as well
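A minimal sketch of what such a concurrency test exercises, assuming a store keyed by series fingerprint. `memStore`, `Ingest`, `Flush`, `Total`, and `ingestConcurrently` are hypothetical stand-ins: several goroutines write to the *same* series while a Flush runs in parallel, and afterwards no sample may be lost. Running it with the race detector enabled is what makes such a test useful.

```go
package main

import (
	"fmt"
	"sync"
)

// memStore is a hypothetical stand-in for the head: samples per series
// fingerprint plus a count of samples already flushed.
type memStore struct {
	mu      sync.Mutex
	series  map[uint64][]int64
	flushed int
}

func newMemStore() *memStore { return &memStore{series: make(map[uint64][]int64)} }

func (s *memStore) Ingest(fp uint64, ts int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.series[fp] = append(s.series[fp], ts)
}

// Flush drains the in-memory series and records how many samples were written.
func (s *memStore) Flush() {
	s.mu.Lock()
	defer s.mu.Unlock()
	for fp, samples := range s.series {
		s.flushed += len(samples)
		delete(s.series, fp) // deleting during range is allowed in Go
	}
}

// Total counts flushed plus still-in-memory samples.
func (s *memStore) Total() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	n := s.flushed
	for _, samples := range s.series {
		n += len(samples)
	}
	return n
}

// ingestConcurrently has workers goroutines each write perWorker samples to
// the same series while one Flush runs concurrently.
func ingestConcurrently(s *memStore, workers, perWorker int) {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				s.Ingest(1, int64(w*perWorker+i)) // same fingerprint for every worker
			}
		}(w)
	}
	wg.Add(1)
	go func() {
		defer wg.Done()
		s.Flush()
	}()
	wg.Wait()
}

func main() {
	s := newMemStore()
	ingestConcurrently(s, 4, 1000)
	fmt.Println(s.Total())
}
```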
LGTM
I think we resolve symbols multiple times for the same block, but that can be improved later.
I don't think it matters *much* as long as all of them are in memory.
…profiles_row_groups_to_disk Flush profiles to disk per row groups
This PR flushes profiles out to a temporary on-disk profile parquet file once a row group has been filled.
Furthermore, it avoids using parquet.Buffer for sorting and instead uses a binary insert into the in-memory slice.
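A minimal sketch of the two ideas in this description together: a binary insert that keeps the in-memory slice sorted on every ingest, and handing off a full row group once it reaches a size threshold. The `sample`, `rowBuffer`, and `flush` names and the (SeriesIndex, TimeNanos) ordering are assumptions for illustration; in the PR the flush target is the temporary parquet file, which is abstracted here as a callback.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is a hypothetical row ordered by (SeriesIndex, TimeNanos).
type sample struct {
	SeriesIndex uint32
	TimeNanos   int64
}

func less(a, b sample) bool {
	if a.SeriesIndex != b.SeriesIndex {
		return a.SeriesIndex < b.SeriesIndex
	}
	return a.TimeNanos < b.TimeNanos
}

// rowBuffer keeps rows sorted on every insert and hands a full row group to
// flush once rowGroupSize rows have accumulated.
type rowBuffer struct {
	rows         []sample
	rowGroupSize int
	flush        func([]sample) // e.g. write one row group to a temporary file
}

func (b *rowBuffer) Insert(s sample) {
	// Binary search for the first row that sorts after s (keeps equal rows stable).
	i := sort.Search(len(b.rows), func(j int) bool { return less(s, b.rows[j]) })
	b.rows = append(b.rows, sample{}) // grow by one
	copy(b.rows[i+1:], b.rows[i:])    // shift the tail right
	b.rows[i] = s
	if len(b.rows) >= b.rowGroupSize {
		b.flush(b.rows) // the callback takes ownership of the slice
		b.rows = nil
	}
}

func main() {
	b := &rowBuffer{rowGroupSize: 3, flush: func(g []sample) { fmt.Println("flush", g) }}
	for _, s := range []sample{{2, 1}, {1, 5}, {1, 2}, {3, 0}} {
		b.Insert(s)
	}
	fmt.Println("in memory", b.rows)
}
```

Compared with sorting a whole buffer at cut time, each insert costs a search plus a slice shift, but the data is always ready to be written in order.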