Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warn against quadratic list summation #5073

Closed
hauntsaninja opened this issue Jun 14, 2023 · 7 comments · Fixed by #6489
Closed

Warn against quadratic list summation #5073

hauntsaninja opened this issue Jun 14, 2023 · 7 comments · Fixed by #6489
Assignees
Labels
accepted Ready for implementation rule Implementing or modifying a lint rule

Comments

@hauntsaninja
Copy link
Contributor

hauntsaninja commented Jun 14, 2023

I can't find a linter that warns me about sum(list_of_lists, []). Basically any time the default arg of sum is a list literal, I think something slow could happen. Maybe ruff could warn me about this? :-)

@konstin
Copy link
Member

konstin commented Jun 14, 2023

Is the correct replacement for this list(itertools.chain(*list_of_lists)) or is there a better solution? I've been following https://stackoverflow.com/a/953097/3549270 so far

@charliermarsh charliermarsh added the rule Implementing or modifying a lint rule label Jun 14, 2023
@hauntsaninja
Copy link
Contributor Author

Yup, that'll work!

I don't have strong opinions on the replacement, they all seem to be a constant factor off.

  • list(itertools.chain.from_iterable(list_of_lists)) looks to be a shade faster than list(itertools.chain(*list_of_lists)) on my machine
  • Looking at your StackOverflow link I see mention of functools.reduce(operator.iadd, list_of_lists, []), which is fastest for me
  • The most straightforward is probably just [x for lst in list_of_lists for x in lst]
λ python3.11 -m timeit -n 10000 -r 20 -s 'l=[[1,2,3],[4,5,6], [7], [8,9]]*99;import itertools,functools,operator' '[x for y in l for x in y]'
10000 loops, best of 20: 25.5 usec per loop
λ python3.11 -m timeit -n 10000 -r 20 -s 'l=[[1,2,3],[4,5,6], [7], [8,9]]*99;import itertools,functools,operator' 'list(itertools.chain(*l))'
10000 loops, best of 20: 18.5 usec per loop
λ python3.11 -m timeit -n 10000 -r 20 -s 'l=[[1,2,3],[4,5,6], [7], [8,9]]*99;import itertools,functools,operator' 'list(itertools.chain.from_iterable(l))'
10000 loops, best of 20: 17.5 usec per loop
λ python3.11 -m timeit -n 10000 -r 20 -s 'l=[[1,2,3],[4,5,6], [7], [8,9]]*99;import itertools,functools,operator' 'functools.reduce(operator.iadd, l, [])'
10000 loops, best of 20: 11.7 usec per loop

Repro using the nice perfplot code from your StackOverflow link:

out

@charliermarsh charliermarsh added the needs-decision Awaiting a decision from a maintainer label Jul 10, 2023
@hauntsaninja
Copy link
Contributor Author

Saw a "needs decision" label got added. I'd vote the best replacement is functools.reduce(operator.iadd, l, []). If we don't want to add imports, then [x for y in l for x in y].

@charliermarsh
Copy link
Member

I'm cool with adding this as long as the ecosystem checks look okay (that is: no major false positives).

@charliermarsh charliermarsh added accepted Ready for implementation and removed needs-decision Awaiting a decision from a maintainer labels Jul 25, 2023
@evanrittenhouse
Copy link
Contributor

You can assign this to me. I should have a fix out soon.

@arogozhnikov
Copy link

arogozhnikov commented Feb 25, 2024

funny enough, if I use sum(l, []), it is slow

But when I reimplement the same code in a very naive way:

def sum_lists(x):
    res = []
    for i in x:
        res += i
    return res

it is on par with proposed solutions (or even faster)

@hauntsaninja
Copy link
Contributor Author

Yes, that's the magic of iadd instead of add.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted Ready for implementation rule Implementing or modifying a lint rule
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants