Use contains method from Range #280

EnverOsmanov · 2020-08-08T10:44:48Z

Related issues: #227, #278

The problem:

The contains method in Range has no override modifier, so it overloads the generic contains inherited from Seq instead of overriding it. After a Range is upcast to Seq[Int], each call to contains therefore costs O(n) rather than O(1), or O(n^2) in total, since we call contains once for each row. That means the more rows a file has, the longer spark-excel takes to read it.

I found out this was fixed in Scala 2.13.3.
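
To illustrate the dispatch issue described above, here is a minimal sketch. It is not necessarily the change made in this PR; `RangeContainsSketch` and `containsRow` are hypothetical names introduced only for this example. The idea is to recover the Range static type with a pattern match so that the Int-specific contains is selected, and to fall back to the generic lookup for any other Seq[Int].

```scala
object RangeContainsSketch {

  // Hypothetical helper (not part of spark-excel): checks whether a row index
  // is selected. Matching on Range restores the static type, so the call
  // resolves to Range's own contains(Int), which is pure arithmetic.
  def containsRow(rows: Seq[Int], rowIndex: Int): Boolean = rows match {
    case r: Range => r.contains(rowIndex)     // constant-time bounds/step check
    case other    => other.contains(rowIndex) // generic Seq lookup, linear scan
  }

  def main(args: Array[String]): Unit = {
    val asRange: Range  = 0 until 1000000
    val asSeq: Seq[Int] = asRange // upcast: the static type is now Seq[Int]

    // Both calls return the same answer. Before the fix mentioned above, the
    // second one goes through the generic contains and walks the elements.
    println(asRange.contains(999999))
    println(asSeq.contains(999999))

    // The helper gives the fast path back even though it receives a Seq[Int].
    println(containsRow(asSeq, 999999))
  }
}
```

A helper like this keeps Seq[Int] in the public signature, so arbitrary row selections would remain possible, while the common Range case avoids the per-row linear scan.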

nightscape · 2020-08-08T14:43:35Z

Oh wow, great find!
I deliberately put the Seq[Int] there so that we could later allow selecting arbitrary rows, but that doesn't justify making it unusable for bigger files.
One last thing we could try is to introduce a subclass of Range that applies the same method override as in the Scala 2.13.3 PR.
Would you mind giving that a shot?

nightscape · 2020-08-08T14:52:20Z

Never mind, Range is sealed...

Use contains method from Range

add9f46

nightscape merged commit ee5af5d into nightscape:master Aug 8, 2020

EnverOsmanov deleted the fix/big-slow-files branch November 2, 2021 23:58
