Average with linq on big collection #27451

Closed
Petermarcu opened this issue Sep 22, 2018 · 2 comments

@Petermarcu
Member

@bdhamelicodra commented on Thu Sep 20 2018

On a big array, the method "System.Linq.Enumerable.Average(this IEnumerable source)" (in .NET Core 2.1) throws the exception "OverflowException: Arithmetic operation resulted in an overflow.".
Why not use an algorithm that can prevent this, like:

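```csharp
// Extension methods must be static and declared inside a static class.
public static double Average(this IEnumerable<long> source) {
    double avg = 0;
    int t = 1; // number of elements seen so far, including the current one
    foreach (long x in source) {
        // Move the running mean toward x by 1/t of the difference.
        avg += (((double)x) - avg) / ((double)t);
        t += 1;
    }
    return avg;
}
```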
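
For reference, the update in the loop above is the usual incremental form of the mean. Writing $a_t$ for the mean of the first $t$ elements $x_1, \dots, x_t$ (these symbols are introduced here for illustration and are not in the original post):

$$
a_t = \frac{1}{t}\sum_{i=1}^{t} x_i = \frac{(t-1)\,a_{t-1} + x_t}{t} = a_{t-1} + \frac{x_t - a_{t-1}}{t}, \qquad a_0 = 0.
$$

In exact arithmetic each $a_t$ is a weighted average of $a_{t-1}$ and $x_t$, so it stays between the smallest and largest element seen so far and no large running sum is ever formed. With `double`, rounding at each step still applies.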

---

@MarcoRossignoli commented on [Thu Sep 20 2018](https://github.com/dotnet/core/issues/1951#issuecomment-423157601)

Better to move this to https://github.com/dotnet/corefx/issues, where https://github.com/dotnet/corefx/blob/master/src/System.Linq/src/System/Linq/Average.cs#L77 lives.

@Clockwork-Muse
Contributor

Clockwork-Muse commented Sep 22, 2018

Hmmm....
This post suggests there are cumulative errors accrued when using this method. However, I don't know what they are, or whether the error would be acceptable.

What might be of larger concern is that, after a certain point, the precision of floating point itself is going to interfere. For instance:

double mantissa_limit = 9007199254740992L;
// Prints out 9007199254740992
Console.WriteLine($"{mantissa_limit + 1.0:R}");

...So weird stuff starts to happen.
Imagining a sequence of numbers, where the first is the mantissa limit, and the rest being 1, up to whatever limit you wish:

long mantissa_limit = 9007199254740992L;
double moving_average = mantissa_limit;
long sum = mantissa_limit;
long limit = 500000;
for (long count = 1; count < limit; count++)
{
    moving_average += (((double)1.0) - moving_average) / (double)count;
    sum += 1;

}
// Prints "1"
Console.WriteLine($"{moving_average:R}");
// Prints "18014398510.481983"
Console.WriteLine($"{sum / (double)limit:R}");

...which is a big difference!
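
One thing that may be worth checking in the loop above: `moving_average` already holds the first element, but `count` starts at 1, so the first iteration divides by 1 and replaces the stored value with 1.0 outright. Much of the gap may therefore come from the divisor rather than from floating-point precision. The following is a minimal sketch for comparing the two approaches on the same sequence, assuming the divisor starts at 1 for the first element as in the original proposal. `RunningMeanCheck`, `RunningMean`, and `Sample` are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RunningMeanCheck
{
    // Hypothetical helper: the incremental mean from the original post.
    // The divisor t is 1 for the first element, 2 for the second, and so on.
    public static double RunningMean(IEnumerable<long> source)
    {
        double avg = 0;
        long t = 1;
        foreach (long x in source)
        {
            avg += (x - avg) / t;
            t++;
        }
        return avg;
    }

    // Hypothetical test input: 2^53 followed by (count - 1) ones.
    public static IEnumerable<long> Sample(int count)
    {
        yield return 9007199254740992L;
        for (int i = 1; i < count; i++)
        {
            yield return 1L;
        }
    }
}
```

Calling `RunningMeanCheck.RunningMean(RunningMeanCheck.Sample(500000))` and comparing it with `RunningMeanCheck.Sample(500000).Sum() / 500000.0` should show the two values agreeing closely for this input. Per-step rounding still accumulates in general, so this does not settle whether the error is acceptable for every sequence.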

@stephentoub
Member

The question appears to have been answered. Closing. Thanks.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 15, 2020