Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue #772, Incorrect zeros included in emsd average when particles are dropped #773

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

vivarose
Copy link
Contributor

I've tested this code with imsd() and emsd() on test data and it works, and on actual movie data and it is more accurate but it doesn't make as big a difference as I feared it would. So Issue #772 isn't so bad unless there are a lot of gaps in the data.

Pros of accepting my pull-request: Slightly more accurate emsd data! If there are a lot of gaps, then it's MUCH MUCH more accurate.

Cons: It takes slightly more time to run. With timit, I see 30.6 ms ± 1.55 ms per loop before my edits versus 32.1 ms ± 1.43 ms per loop after my edits.

Issue #772 occurs when there are gaps in the data, so the problem shows up in _msd_gaps().

Here are the edits:
I updated _msd_gaps() with "skipna=False" on the sum() function. This way values based on a lack of data will show up as NaN (instead of showing up as zero).

I also added some code to reset the estimated number of datapoints N to be 0 if there is no data.
result['N'] = np.where(result['msd'].isna(), 0, result['N'])

I was hoping this would be enough but it created another problem that I had to take care of: When the emsd is calculated, there's a problem because we are averaging in a NaN (albeit weighted as N=0), and it turns out NaN * 0 = NaN. I wanted it to come out to zero. So I had to reset the NaN to zero again. Awkward, but it works. Then it's 0*0 = 0, which is the weight I wanted, and it calculates the emsd correctly.

# remove np.nan because it would make the rest of the calculation break
msds['msd'] = np.where(msds['msd'].isna(), 0, msds['msd'])

So this update does provide a more accurate emsd() calculation, and it removes the weird DROPS that show up in the imsd data (see Walkthrough output 31).

I plan to use this version going forward because I can't stand inaccuracies!

@vivarose
Copy link
Contributor Author

It looks like the NaN makes the tests unhappy.

I prefer the NaN because an MSD of 0 means that there was a particle that didn't move, but we are really dealing with a situation of a particle that wasn't observed. But if it makes the tests unhappy, then, what can I do?

@vivarose
Copy link
Contributor Author

I see that my attempt to appease the checks (#774) didn't work. That makes me think that my pull-request isn't the problem. Maybe there is something else going on.

I'm re-opening this pull-request. If you are trying to decide whether to incorporate one of these pull-requests, this is the one that leaves NaN in the imsd whereas #774 is the one that uses 0 instead. Both versions fix Issue #772.

@vivarose vivarose reopened this Aug 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant