Perf test v1.0.0 #62

Merged
merged 6 commits into from
Jan 1, 2020
10 changes: 10 additions & 0 deletions README.md
@@ -16,6 +16,16 @@ Pkg.add("SASLib")

I ran the benchmarks mostly on my MacBook Pro laptop. In general, the Julia implementation is roughly 10-100x faster than Python/Pandas. Test results are documented in the `test/perf_results_<version>` folders.
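
The result files report the Python side as a single line, `Minimum: … seconds`, produced by a command of the form `python perf_test1.py <file> <iterations>`. The script itself is not shown in this diff, so the following is only a minimal sketch of how such a number could be produced. It assumes the script times `pandas.read_sas` once per iteration and keeps the fastest run; `min_load_time` and `main` are hypothetical names.

```python
import sys
import timeit


def min_load_time(load, iterations):
    """Return the fastest of `iterations` single calls to `load`, in seconds."""
    # repeat=iterations, number=1: each timing covers exactly one call.
    return min(timeit.repeat(load, repeat=iterations, number=1))


def main(argv):
    # Assumption: the file is read with pandas.read_sas. pandas is imported
    # here so that the timing helper above stays usable without it.
    import pandas as pd

    filename, iterations = argv[1], int(argv[2])
    best = min_load_time(lambda: pd.read_sas(filename), iterations)
    print(f"Minimum: {best:.4f} seconds")


# Usage (hypothetical): python this_script.py data.sas7bdat 50
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv)
```

Taking the minimum rather than the mean keeps the Python figure comparable with the `minimum time` line that BenchmarkTools reports on the Julia side.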

The latest performance [test results for v1.0.0](test/perf_results_1.0.0) are as follows:

Test|Result|
----|------|
py\_jl\_homimp\_50.md |Julia is ~27.9x faster than Python/Pandas|
py\_jl\_numeric\_1000000\_2\_100.md |Julia is ~10.2x faster than Python/Pandas|
py\_jl\_productsales\_100.md |Julia is ~46.9x faster than Python/Pandas|
py\_jl\_test1\_100.md |Julia is ~118.8x faster than Python/Pandas|
py\_jl\_topical\_30.md |Julia is ~27.3x faster than Python/Pandas|
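
The result files do not state how each "~Nx faster" figure is derived. The numbers are consistent with dividing the Python minimum time by the fastest Julia minimum time (across the ObjectPool and regular string array variants, where both exist). The sketch below makes that assumption explicit; the `RESULTS` table and the `speedup` helper are illustrative names, and the timings are copied from the v1.0.0 result files.

```python
# Minimum wall-clock times in seconds, copied from the v1.0.0 result files.
# Each entry is (Python minimum, [Julia minimums]); files with string columns
# list both Julia variants: ObjectPool first, then regular string array.
RESULTS = {
    "homimp": (0.5793, [39.500e-3, 20.776e-3]),
    "numeric_1000000_2": (1.8784, [183.319e-3]),
    "productsales": (0.0505, [1.745e-3, 1.078e-3]),
    "test1": (0.1036, [871.807e-6, 1.119e-3]),
    "topical": (46.9720, [1.720, 1.994]),
}


def speedup(python_min, julia_mins):
    """Assumed formula: Python minimum divided by the fastest Julia minimum."""
    return python_min / min(julia_mins)


for name, (py_min, jl_mins) in RESULTS.items():
    ratio = speedup(py_min, jl_mins)
    print(f"{name}: Julia is ~{ratio:.1f}x faster than Python/Pandas")
```

Because the inputs are the rounded values printed in the result files, the last digit can differ from the table: `productsales` comes out at about 46.8 here rather than 46.9, while the other four match.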

## User Guide

```
73 changes: 73 additions & 0 deletions test/perf_results_1.0.0/py_jl_homimp_50.md
@@ -0,0 +1,73 @@
# Julia/Python Performance Test Result

## Summary

Julia is ~27.9x faster than Python/Pandas

## Test File

Iterations: 50

Filename|Size|Rows|Columns|Numeric Columns|String Columns
--------|----|----|-------|---------------|--------------
homimp.sas7bdat|1.2 MB|46641|6|1|5

## Python
```
$ python -V
Python 3.7.1
$ python perf_test1.py data_AHS2013/homimp.sas7bdat 50
Minimum: 0.5793 seconds
```

## Julia (ObjectPool)
```
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.0.0)
CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
JULIA_NUM_THREADS = 4

BenchmarkTools.Trial:
memory estimate: 20.20 MiB
allocs estimate: 494963
--------------
minimum time: 39.500 ms (0.00% GC)
median time: 44.556 ms (0.00% GC)
mean time: 44.054 ms (4.70% GC)
maximum time: 63.587 ms (7.46% GC)
--------------
samples: 50
evals/sample: 1
```

## Julia (Regular String Array)
```
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.0.0)
CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
JULIA_NUM_THREADS = 4

BenchmarkTools.Trial:
memory estimate: 18.02 MiB
allocs estimate: 428420
--------------
minimum time: 20.776 ms (0.00% GC)
median time: 25.170 ms (0.00% GC)
mean time: 29.005 ms (18.45% GC)
maximum time: 109.289 ms (73.77% GC)
--------------
samples: 50
evals/sample: 1
```
47 changes: 47 additions & 0 deletions test/perf_results_1.0.0/py_jl_numeric_1000000_2_100.md
@@ -0,0 +1,47 @@
# Julia/Python Performance Test Result

## Summary

Julia is ~10.2x faster than Python/Pandas

## Test File

Iterations: 100

Filename|Size|Rows|Columns|Numeric Columns|String Columns
--------|----|----|-------|---------------|--------------
numeric_1000000_2.sas7bdat|16.3 MB|1000000|2|2|0

## Python
```
$ python -V
Python 3.7.1
$ python perf_test1.py data_misc/numeric_1000000_2.sas7bdat 100
Minimum: 1.8784 seconds
```

## Julia
```
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.0.0)
CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
JULIA_NUM_THREADS = 4

BenchmarkTools.Trial:
memory estimate: 168.83 MiB
allocs estimate: 1004863
--------------
minimum time: 183.319 ms (6.02% GC)
median time: 208.804 ms (14.80% GC)
mean time: 235.003 ms (25.50% GC)
maximum time: 383.528 ms (54.19% GC)
--------------
samples: 22
evals/sample: 1
```
73 changes: 73 additions & 0 deletions test/perf_results_1.0.0/py_jl_productsales_100.md
@@ -0,0 +1,73 @@
# Julia/Python Performance Test Result

## Summary

Julia is ~46.9x faster than Python/Pandas

## Test File

Iterations: 100

Filename|Size|Rows|Columns|Numeric Columns|String Columns
--------|----|----|-------|---------------|--------------
productsales.sas7bdat|148.5 kB|1440|10|4|6

## Python
```
$ python -V
Python 3.7.1
$ python perf_test1.py data_pandas/productsales.sas7bdat 100
Minimum: 0.0505 seconds
```

## Julia (ObjectPool)
```
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.0.0)
CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
JULIA_NUM_THREADS = 4

BenchmarkTools.Trial:
memory estimate: 1.17 MiB
allocs estimate: 14693
--------------
minimum time: 1.745 ms (0.00% GC)
median time: 2.431 ms (0.00% GC)
mean time: 2.679 ms (2.39% GC)
maximum time: 5.482 ms (60.67% GC)
--------------
samples: 100
evals/sample: 1
```

## Julia (Regular String Array)
```
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.0.0)
CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
JULIA_NUM_THREADS = 4

BenchmarkTools.Trial:
memory estimate: 1.15 MiB
allocs estimate: 14638
--------------
minimum time: 1.078 ms (0.00% GC)
median time: 3.277 ms (0.00% GC)
mean time: 6.618 ms (3.48% GC)
maximum time: 83.970 ms (0.00% GC)
--------------
samples: 100
evals/sample: 1
```
73 changes: 73 additions & 0 deletions test/perf_results_1.0.0/py_jl_test1_100.md
@@ -0,0 +1,73 @@
# Julia/Python Performance Test Result

## Summary

Julia is ~118.8x faster than Python/Pandas

## Test File

Iterations: 100

Filename|Size|Rows|Columns|Numeric Columns|String Columns
--------|----|----|-------|---------------|--------------
test1.sas7bdat|131.1 kB|10|100|73|27

## Python
```
$ python -V
Python 3.7.1
$ python perf_test1.py data_pandas/test1.sas7bdat 100
Minimum: 0.1036 seconds
```

## Julia (ObjectPool)
```
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.0.0)
CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
JULIA_NUM_THREADS = 4

BenchmarkTools.Trial:
memory estimate: 1.00 MiB
allocs estimate: 7132
--------------
minimum time: 871.807 μs (0.00% GC)
median time: 1.254 ms (0.00% GC)
mean time: 1.470 ms (6.75% GC)
maximum time: 6.470 ms (78.01% GC)
--------------
samples: 100
evals/sample: 1
```

## Julia (Regular String Array)
```
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.0.0)
CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
JULIA_NUM_THREADS = 4

BenchmarkTools.Trial:
memory estimate: 990.86 KiB
allocs estimate: 6819
--------------
minimum time: 1.119 ms (0.00% GC)
median time: 2.666 ms (0.00% GC)
mean time: 9.009 ms (6.71% GC)
maximum time: 161.985 ms (0.00% GC)
--------------
samples: 100
evals/sample: 1
```
73 changes: 73 additions & 0 deletions test/perf_results_1.0.0/py_jl_topical_30.md
@@ -0,0 +1,73 @@
# Julia/Python Performance Test Result

## Summary

Julia is ~27.3x faster than Python/Pandas

## Test File

Iterations: 30

Filename|Size|Rows|Columns|Numeric Columns|String Columns
--------|----|----|-------|---------------|--------------
topical.sas7bdat|13.6 MB|84355|114|8|106

## Python
```
$ python -V
Python 3.7.1
$ python perf_test1.py data_AHS2013/topical.sas7bdat 30
Minimum: 46.9720 seconds
```

## Julia (ObjectPool)
```
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.0.0)
CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
JULIA_NUM_THREADS = 4

BenchmarkTools.Trial:
memory estimate: 685.66 MiB
allocs estimate: 19193161
--------------
minimum time: 1.720 s (6.37% GC)
median time: 1.806 s (11.83% GC)
mean time: 1.796 s (10.69% GC)
maximum time: 1.863 s (13.57% GC)
--------------
samples: 3
evals/sample: 1
```

## Julia (Regular String Array)
```
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.0.0)
CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
JULIA_NUM_THREADS = 4

BenchmarkTools.Trial:
memory estimate: 648.04 MiB
allocs estimate: 19048983
--------------
minimum time: 1.994 s (46.01% GC)
median time: 2.559 s (51.16% GC)
mean time: 2.559 s (51.16% GC)
maximum time: 3.123 s (54.45% GC)
--------------
samples: 2
evals/sample: 1
```