Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TRestAnalysisTree fixing observable estimators #409

Merged
merged 3 commits into from
May 8, 2023
Merged

Conversation

jgalan
Copy link
Member

@jgalan jgalan commented May 4, 2023

jgalan Medium: 115

TRestAnalysisTree::GetObservableAverage. Changing the way average is calculated

Other similar methods GetObservableMax/Min/RMS have been updated too

@jgalan jgalan marked this pull request as ready for review May 4, 2023 09:18
@jgalan jgalan requested review from lobis and juanangp May 4, 2023 09:18
@jgalan
Copy link
Member Author

jgalan commented May 4, 2023

This fixes the pipeline, still I don't understand what is wrong with the previous implementation.

I don't like to iterate manually on the entries inside the analysis tree. Specially because TRestAnalysisTree might be pointing to a given entry that is the same as the entry in TRestRun event tree.

This might be an issue if we call GetObservableAverage in between requesting an entry using TRestRun::GetEntry and the access to the information.

@juanangp
Copy link
Member

juanangp commented May 4, 2023

This fixes the pipeline, still I don't understand what is wrong with the previous implementation.

I don't like to iterate manually on the entries inside the analysis tree. Specially because TRestAnalysisTree might be pointing to a given entry that is the same as the entry in TRestRun event tree.

This might be an issue if we call GetObservableAverage in between requesting an entry using TRestRun::GetEntry and the access to the information.

You can try with RDataFrame

RDataFrame df (*myAnalysisTree);
double mean = *df.Mean(obsName);

@jgalan
Copy link
Member Author

jgalan commented May 4, 2023

This fixes the pipeline, still I don't understand what is wrong with the previous implementation.
I don't like to iterate manually on the entries inside the analysis tree. Specially because TRestAnalysisTree might be pointing to a given entry that is the same as the entry in TRestRun event tree.
This might be an issue if we call GetObservableAverage in between requesting an entry using TRestRun::GetEntry and the access to the information.

You can try with RDataFrame

RDataFrame df (*myAnalysisTree);
double mean = *df.Mean(obsName);

Using RDataFrame is much slower.

Processing Validate.C...
Entries: h1: 1984, h2: 7432, h3: 9430
====== Theta average ====
Elapsed time in nanoseconds: 1026357322 ns
Elapsed time in microseconds: 1026357 µs
Elapsed time in milliseconds: 1026 ms
====== Phi average ====
Elapsed time in nanoseconds: 144700115 ns
Elapsed time in microseconds: 144700 µs
Elapsed time in milliseconds: 144 ms

Using the direct tree access:

Processing Validate.C...
Entries: h1: 1984, h2: 7432, h3: 9430
====== Theta average ====
Elapsed time in nanoseconds: 618828 ns
Elapsed time in microseconds: 618 µs
Elapsed time in milliseconds: 0 ms
====== Phi average ====
Elapsed time in nanoseconds: 613547 ns
Elapsed time in microseconds: 613 µs
Elapsed time in milliseconds: 0 ms

I thought that would happen only the first time I called to the df.Mean method, since the first time it would probably require to initialise the data frame. But the chronometer for the second round just gives around 150ms for both!

So the DataFrame method is bullshit?!?!

@juanangp
Copy link
Member

juanangp commented May 4, 2023

I thought that would happen only the first time I called to the df.Mean method, since the first time it would probably require to initialise the data frame. But the chronometer for the second round just gives around 150ms for both!

So the DataFrame method is bullshit?!?!

It should happen everytime you call the method because you generate a dataset.
I am not surprised of the time delay, it takes a long time to generate a dataset, I would not say it is bullshit, it is just not designed for this user case.

@jgalan
Copy link
Member Author

jgalan commented May 5, 2023

I thought that would happen only the first time I called to the df.Mean method, since the first time it would probably require to initialise the data frame. But the chronometer for the second round just gives around 150ms for both!
So the DataFrame method is bullshit?!?!

It should happen everytime you call the method because you generate a dataset. I am not surprised of the time delay, it takes a long time to generate a dataset, I would not say it is bullshit, it is just not designed for this user case.

Then, what it is the proper fast way to calculate the mean using the RDataFrame?

@jgalan
Copy link
Member Author

jgalan commented May 5, 2023

Could you approve this PR so that we fix pipelines?

@juanangp
Copy link
Member

juanangp commented May 5, 2023

Then, what it is the proper fast way to calculate the mean using the RDataFrame?

There is not fast way, it is what it is, I guess is not using multithreading but anyhow I don't think it will improve since it is a lazy method

@jgalan jgalan requested a review from nkx111 May 5, 2023 13:15
@jgalan
Copy link
Member Author

jgalan commented May 5, 2023

Then, what it is the proper fast way to calculate the mean using the RDataFrame?

There is not fast way, it is what it is, I guess is not using multithreading but anyhow I don't think it will improve since it is a lazy method

Ok, so that means the Mean method from RDataFrame should not be used. Even in multithreading we would need 200 threads to equal the direct tree access.

@jgalan jgalan merged commit d81f90e into master May 8, 2023
@jgalan jgalan deleted the jgalan_fix_restG4 branch May 8, 2023 09:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants