Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache resolved Fix file path to improve include() performance. #327

Merged
merged 2 commits into from
Oct 30, 2023

Conversation

blackwinter
Copy link
Member

@blackwinter blackwinter commented Oct 20, 2023

See #287.

Improves the performance of include() instructions by avoiding costly operations for every record (attempting to construct a URL, hitting the file system). Instead, caches the resolved path to the included Fix file upon first encounter.

Benchmark results based on simplistic "integration tests"¹ [runtime in seconds: lower is better; N=4]:

Benchmark 4ddfcac 4fb8426 Boost
Base1 3.2375 ± 0.0193 3.3080 ± 0.0558 +2.18 %
Base1000 3.5835 ± 0.1132 3.4018 ± 0.0212 -5.07 %
Base100000 5.4308 ± 0.1477 5.4032 ± 0.0494 -0.51 %
Some1 3.6988 ± 0.0185 3.7932 ± 0.1135 +2.55 %
Some1000 3.9415 ± 0.2000 3.8172 ± 0.0428 -3.15 %
Some100000 7.8708 ± 0.1882 6.1055 ± 0.1947 -22.43 %
Many1 5.1312 ± 0.0764 5.2958 ± 0.2262 +3.21 %
Many1000 5.5572 ± 0.2899 5.3335 ± 0.3680 -4.03 %
Many100000 17.6438 ± 0.0740 7.7365 ± 0.1504 -56.15 %

Still has a noticeable overhead, though, so there's room for further improvement [runtime in seconds: lower is better; N=4]:

Commit Base100000 Many100000 Boost
4ddfcac 5.4308 ± 0.1477 17.6438 ± 0.0740 +224.88%
4fb8426 5.4032 ± 0.0494 7.7365 ± 0.1504 +43.18%

¹ NOTE: I intend to remove the "integration tests" after the pull request has been merged; in case we should ever need them again, we can easily restore them.

However, it doesn't seem to affect real-world workloads as much (Limetrans transformation: approx. 2m records; the second workflow utilizes roughly twice as many include()s as the first one) [runtime in hours/minutes/seconds: lower is better; N=1]:

Commit 0.6.1 4fb8426 Boost
hbz/limetrans@dcf1753 4h13m37s 4h15m47s +0.85 %
hbz/limetrans@c734b3a 3h39m42s 3h14m22s -11.53 %

@TobiasNx / @dr0i: You might want to evaluate the impact on lobid-resources as well.

- includeBenchmarkBase1:       3.2375 +/- 0.0193
- includeBenchmarkBase1000:    3.5835 +/- 0.1132
- includeBenchmarkBase100000:  5.4308 +/- 0.1477
- includeBenchmarkSome1:       3.6988 +/- 0.0185
- includeBenchmarkSome1000:    3.9415 +/- 0.2000
- includeBenchmarkSome100000:  7.8708 +/- 0.1882
- includeBenchmarkMany1:       5.1312 +/- 0.0764
- includeBenchmarkMany1000:    5.5572 +/- 0.2899
- includeBenchmarkMany100000: 17.6438 +/- 0.0740
- includeBenchmarkBase1:      3.3080 +/- 0.0558
- includeBenchmarkBase1000:   3.4018 +/- 0.0212
- includeBenchmarkBase100000: 5.4032 +/- 0.0494
- includeBenchmarkSome1:      3.7932 +/- 0.1135
- includeBenchmarkSome1000:   3.8172 +/- 0.0428
- includeBenchmarkSome100000: 6.1055 +/- 0.1947
- includeBenchmarkMany1:      5.2958 +/- 0.2262
- includeBenchmarkMany1000:   5.3335 +/- 0.3680
- includeBenchmarkMany100000: 7.7365 +/- 0.1504
@dr0i
Copy link
Member

dr0i commented Oct 26, 2023

Coming Alma metadata update ETL in lobid-resources will use this branch. Will check tomorow.

@dr0i dr0i merged commit 6cf83ae into master Oct 30, 2023
1 check passed
@dr0i
Copy link
Member

dr0i commented Oct 30, 2023

Tested on basedump: before this PR average (since a month) was 14h 12m. Now it's 13h 45m. So a bit faster 👍 .

@dr0i dr0i deleted the 287-includePerformance branch October 30, 2023 11:06
dr0i added a commit to hbz/lobid-resources that referenced this pull request Oct 30, 2023
dr0i added a commit to hbz/lobid-resources that referenced this pull request Oct 30, 2023
dr0i added a commit to hbz/lobid-resources that referenced this pull request Oct 30, 2023
dr0i added a commit to hbz/lobid-resources that referenced this pull request Oct 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants