PyGrinder block and sequence missing algorithms are not reaching the correct percentage of missing values #542

Open
giacomoguiduzzi opened this issue Nov 9, 2024 · 3 comments
Labels
question Further information is requested stale

Comments

@giacomoguiduzzi

Issue description

Greetings,

I'm working on a project on forecasting time series with Deep Learning methods. I have a quick question about sequence missing and block missing in PyGrinder: when I set replace_pct to 0.5, I get about 39% missing values rather than roughly 50%. If I raise the value to 0.75, I get around 50%. Is this normal? Am I missing something?
Let me know if there is any additional information I can give you regarding this behaviour.
Thanks in advance, I'm looking forward to your kind response.

Best Regards,
Giacomo Guiduzzi

@giacomoguiduzzi giacomoguiduzzi added the question Further information is requested label Nov 9, 2024

This issue had no activity for 14 days. It will be closed in 1 week unless there is some new activity. Is this issue already resolved?

@github-actions github-actions bot added the stale label Nov 24, 2024
@LinglongQian
Contributor

Dear Giacomo Guiduzzi,

Thank you for reaching out and sharing your observations about the sequence-missing and block-missing behaviour in PyGrinder. The behaviour you describe could be due to an interaction between the missing data already present in your dataset and the additional missingness that PyGrinder introduces.

If your dataset already contains missing values, the newly added missing values can overlap with the original ones. A position that is already missing cannot become missing a second time, so the observed missing rate can end up lower than the specified value. The effect is most noticeable when the data contain few completely observed sequences or blocks to begin with.
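The effect described above can be reproduced in a small NumPy simulation. This is only a sketch under stated assumptions: `mask_random_sequences` is a hypothetical helper, not PyGrinder's implementation, and it assumes that runs of missing values are placed independently at random, so they may overlap both each other and any values that are already missing.

```python
import numpy as np


def mask_random_sequences(X, target_rate, seq_len, rng):
    """Hypothetical helper (not PyGrinder's code): set randomly placed runs
    of `seq_len` consecutive steps to NaN. Enough runs are drawn to cover
    `target_rate` of the cells if none of them overlapped."""
    X = X.copy()
    n_rows, n_steps = X.shape
    n_runs = int(round(target_rate * X.size / seq_len))
    rows = rng.integers(0, n_rows, size=n_runs)
    starts = rng.integers(0, n_steps - seq_len + 1, size=n_runs)
    for row, start in zip(rows, starts):
        # A run may land on cells that are already NaN; those do not add
        # to the missing count.
        X[row, start:start + seq_len] = np.nan
    return X


rng = np.random.default_rng(0)

# Case 1: fully observed data, so runs can only overlap each other.
X_full = rng.normal(size=(200, 500))
observed_rate = float(np.isnan(mask_random_sequences(X_full, 0.5, 10, rng)).mean())

# Case 2: 20% of the cells are already missing before any run is added.
X_partial = X_full.copy()
X_partial[rng.random(X_partial.shape) < 0.2] = np.nan
was_missing = np.isnan(X_partial)
now_missing = np.isnan(mask_random_sequences(X_partial, 0.5, 10, rng))
newly_missing_rate = float((now_missing & ~was_missing).mean())

print(f"requested 0.50, observed on complete data: {observed_rate:.3f}")
print(f"newly missing with 20% pre-existing NaN: {newly_missing_rate:.3f}")
```

Under this independent-placement assumption, the expected fraction of missing cells for a requested rate p is roughly 1 - exp(-p): about 0.39 for p = 0.5 and about 0.53 for p = 0.75, which is close to the figures reported in this issue. Whether PyGrinder places its sequences and blocks this way is not confirmed in this thread, so treat the match as a hypothesis to check against the library's source.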

Please let me know if this explanation aligns with your situation, or feel free to provide more details about your dataset or experimental setup, and I’d be happy to assist further.

Best regards,
linglong

@github-actions github-actions bot removed the stale label Nov 29, 2024

This issue had no activity for 14 days. It will be closed in 1 week unless there is some new activity. Is this issue already resolved?

@github-actions github-actions bot added the stale label Dec 13, 2024