Use contains method from Range #280

Merged
nightscape merged 1 commit into nightscape:master on Aug 8, 2020

Conversation

EnverOsmanov
Collaborator

Related issues: #227, #278

The problem:

The contains method in Range has no override modifier, so it overloads the generic Seq contains instead of overriding it. After upcasting a Range to Seq[Int], each call to contains therefore costs O(n), or O(n) * n in total, since we call contains once per row. That means the more rows a file has, the longer spark-excel takes to read it.

I found out this was fixed in Scala 2.13.3.
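
To illustrate the difference, here is a minimal sketch. The object and method names (RowFilterSketch, keepRowTyped, keepRowUpcast, keepRow) are hypothetical and not taken from the spark-excel code base. It only shows how the static type of the receiver decides which contains is called, and one way to get the fast path back regardless of the Scala version.

```scala
// Hypothetical helpers for illustration only; not the actual spark-excel code.
object RowFilterSketch {

  // Receiver is statically a Range: the call resolves to Range's own
  // contains(x: Int), which is a constant-time bounds and step check.
  def keepRowTyped(rows: Range, rowIndex: Int): Boolean =
    rows.contains(rowIndex)

  // Receiver is statically a Seq[Int]: the call resolves to the generic
  // Seq contains. Per the description above, before Scala 2.13.3 Range did
  // not override it, so this scans the elements one by one.
  def keepRowUpcast(rows: Seq[Int], rowIndex: Int): Boolean =
    rows.contains(rowIndex)

  // Keeps the Seq[Int] signature but recovers the fast path by matching on
  // the runtime type; any other Seq falls back to the generic lookup.
  def keepRow(rows: Seq[Int], rowIndex: Int): Boolean = rows match {
    case r: Range => r.contains(rowIndex)
    case other    => other.contains(rowIndex)
  }
}
```

All three helpers return the same result for the same input; only the cost per call differs. Calling keepRow once per row keeps the total work linear in the number of rows when the selection is a Range.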

@nightscape
Owner

Oh wow, great find!
I deliberately put the Seq[Int] there so that we could later allow selecting arbitrary rows, but that doesn't justify making it unusable for bigger files.
One last thing we could try is to introduce a subclass of Range that applies the same method override as in the Scala 2.13.3 PR.
Would you mind giving that a shot?

@nightscape
Owner

Never mind, Range is sealed...

@nightscape nightscape merged commit ee5af5d into nightscape:master Aug 8, 2020
@EnverOsmanov EnverOsmanov deleted the fix/big-slow-files branch November 2, 2021 23:58