Perf test v1.0.0 (#62)
* Include v1.0.0 perf test vs Pandas & ReadStat
* Updated README with perf test summary
tk3369 committed Jan 2, 2020
1 parent 8a6a253 commit 7e9ea75
Showing 9 changed files with 404 additions and 11 deletions.
10 changes: 10 additions & 0 deletions README.md
@@ -16,6 +16,16 @@ Pkg.add("SASLib")

I did the benchmarking mostly on my MacBook Pro laptop. In general, the Julia implementation is roughly 10x to 100x faster than Python's Pandas. Test results are documented in the `test/perf_results_<version>` folders.

The latest performance [test results for v1.0.0](test/perf_results_1.0.0) are as follows:

Test|Result|
----|------|
py\_jl\_homimp\_50.md |Julia is ~27.9x faster than Python/Pandas|
py\_jl\_numeric\_1000000\_2\_100.md |Julia is ~10.2x faster than Python/Pandas|
py\_jl\_productsales\_100.md |Julia is ~46.9x faster than Python/Pandas|
py\_jl\_test1\_100.md |Julia is ~118.8x faster than Python/Pandas|
py\_jl\_topical\_30.md |Julia is ~27.3x faster than Python/Pandas|
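
The ratios in the table appear consistent with dividing the Python minimum time by the fastest Julia minimum time (across the ObjectPool and regular string array variants where both were run). The following is a minimal sketch of that arithmetic, not the script that generated the reports; the dictionary layout and the `speedup` helper are illustrative assumptions, and the timings are copied from the per-file reports in this commit. Because the printed timings are rounded, a recomputed ratio can differ from the reported one in the last digit (for example, productsales recomputes to about 46.8 rather than 46.9).

```python
# Minimum timings in seconds, copied from the per-file reports in this commit.
# Where a report has two Julia variants, both minima are listed.
timings = {
    "homimp": {"python": 0.5793, "julia": [39.500e-3, 20.776e-3]},
    "numeric_1000000_2": {"python": 1.8784, "julia": [183.319e-3]},
    "productsales": {"python": 0.0505, "julia": [1.745e-3, 1.078e-3]},
    "test1": {"python": 0.1036, "julia": [871.807e-6, 1.119e-3]},
    "topical": {"python": 46.9720, "julia": [1.720, 1.994]},
}


def speedup(python_min, julia_mins):
    """Ratio of the Python minimum to the fastest Julia minimum (assumed rule)."""
    return python_min / min(julia_mins)


speedups = {
    name: round(speedup(t["python"], t["julia"]), 1)
    for name, t in timings.items()
}

for name, ratio in speedups.items():
    print(f"{name}: ~{ratio}x")
```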

## User Guide

73 changes: 73 additions & 0 deletions test/perf_results_1.0.0/py_jl_homimp_50.md
@@ -0,0 +1,73 @@
# Julia/Python Performance Test Result

## Summary

Julia is ~27.9x faster than Python/Pandas

## Test File

Iterations: 50

Filename|Size|Rows|Columns|Numeric Columns|String Columns
--------|----|----|-------|---------------|--------------
homimp.sas7bdat|1.2 MB|46641|6|1|5

## Python
```
$ python -V
Python 3.7.1
$ python perf_test1.py data_AHS2013/homimp.sas7bdat 50
Minimum: 0.5793 seconds
```
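
The source of `perf_test1.py` is not part of this diff. As a rough sketch of what a script with this command line and output could look like, the example below times repeated reads with the standard `timeit` module and reports the minimum. The function names `min_time` and `report` are hypothetical, and the use of `pandas.read_sas` is an assumption about how the file is loaded.

```python
import timeit


def min_time(func, iterations):
    """Call func() `iterations` times, timing each call separately,
    and return the fastest single-call time in seconds."""
    return min(timeit.repeat(func, repeat=iterations, number=1))


def report(path, iterations):
    """Hypothetical driver: time pandas.read_sas on `path` and print the minimum."""
    import pandas as pd  # imported lazily so min_time works without pandas

    best = min_time(lambda: pd.read_sas(path), iterations)
    print(f"Minimum: {best:.4f} seconds")
```

Under these assumptions, `report("data_AHS2013/homimp.sas7bdat", 50)` would print a line in the same format as the output above.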

## Julia (ObjectPool)
```
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.0.0)
CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
JULIA_NUM_THREADS = 4
BenchmarkTools.Trial:
memory estimate: 20.20 MiB
allocs estimate: 494963
--------------
minimum time: 39.500 ms (0.00% GC)
median time: 44.556 ms (0.00% GC)
mean time: 44.054 ms (4.70% GC)
maximum time: 63.587 ms (7.46% GC)
--------------
samples: 50
evals/sample: 1
```

## Julia (Regular String Array)
```
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.0.0)
CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
JULIA_NUM_THREADS = 4
BenchmarkTools.Trial:
memory estimate: 18.02 MiB
allocs estimate: 428420
--------------
minimum time: 20.776 ms (0.00% GC)
median time: 25.170 ms (0.00% GC)
mean time: 29.005 ms (18.45% GC)
maximum time: 109.289 ms (73.77% GC)
--------------
samples: 50
evals/sample: 1
```
47 changes: 47 additions & 0 deletions test/perf_results_1.0.0/py_jl_numeric_1000000_2_100.md
@@ -0,0 +1,47 @@
# Julia/Python Performance Test Result

## Summary

Julia is ~10.2x faster than Python/Pandas

## Test File

Iterations: 100

Filename|Size|Rows|Columns|Numeric Columns|String Columns
--------|----|----|-------|---------------|--------------
numeric_1000000_2.sas7bdat|16.3 MB|1000000|2|2|0

## Python
```
$ python -V
Python 3.7.1
$ python perf_test1.py data_misc/numeric_1000000_2.sas7bdat 100
Minimum: 1.8784 seconds
```

## Julia
```
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.0.0)
CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
JULIA_NUM_THREADS = 4
BenchmarkTools.Trial:
memory estimate: 168.83 MiB
allocs estimate: 1004863
--------------
minimum time: 183.319 ms (6.02% GC)
median time: 208.804 ms (14.80% GC)
mean time: 235.003 ms (25.50% GC)
maximum time: 383.528 ms (54.19% GC)
--------------
samples: 22
evals/sample: 1
```
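
The trial above reports 22 samples although 100 iterations were requested. One plausible explanation, stated here as an assumption because the Julia test script is not shown in this diff, is that BenchmarkTools stopped sampling once its time budget (the `seconds` parameter, 5 seconds by default) was exceeded. The sketch below estimates the sample count under that assumption; `expected_samples` is a hypothetical helper, and it ignores any overhead between samples.

```python
import math


def expected_samples(mean_seconds, requested, budget_seconds=5.0):
    """Estimate how many samples are collected if sampling stops once the
    requested count is reached or the cumulative time exceeds the budget."""
    return min(requested, math.ceil(budget_seconds / mean_seconds))


# Mean times taken from the reports in this commit.
print(expected_samples(0.235003, 100))  # numeric_1000000_2
print(expected_samples(1.796, 30))      # topical, ObjectPool
print(expected_samples(2.559, 30))      # topical, regular string array
```

With these inputs the estimate agrees with the reported sample counts (22, 3, and 2), which suggests the minimum for the slower files is taken over fewer runs than the iteration count implies.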
73 changes: 73 additions & 0 deletions test/perf_results_1.0.0/py_jl_productsales_100.md
@@ -0,0 +1,73 @@
# Julia/Python Performance Test Result

## Summary

Julia is ~46.9x faster than Python/Pandas

## Test File

Iterations: 100

Filename|Size|Rows|Columns|Numeric Columns|String Columns
--------|----|----|-------|---------------|--------------
productsales.sas7bdat|148.5 kB|1440|10|4|6

## Python
```
$ python -V
Python 3.7.1
$ python perf_test1.py data_pandas/productsales.sas7bdat 100
Minimum: 0.0505 seconds
```

## Julia (ObjectPool)
```
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.0.0)
CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
JULIA_NUM_THREADS = 4
BenchmarkTools.Trial:
memory estimate: 1.17 MiB
allocs estimate: 14693
--------------
minimum time: 1.745 ms (0.00% GC)
median time: 2.431 ms (0.00% GC)
mean time: 2.679 ms (2.39% GC)
maximum time: 5.482 ms (60.67% GC)
--------------
samples: 100
evals/sample: 1
```

## Julia (Regular String Array)
```
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.0.0)
CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
JULIA_NUM_THREADS = 4
BenchmarkTools.Trial:
memory estimate: 1.15 MiB
allocs estimate: 14638
--------------
minimum time: 1.078 ms (0.00% GC)
median time: 3.277 ms (0.00% GC)
mean time: 6.618 ms (3.48% GC)
maximum time: 83.970 ms (0.00% GC)
--------------
samples: 100
evals/sample: 1
```
73 changes: 73 additions & 0 deletions test/perf_results_1.0.0/py_jl_test1_100.md
@@ -0,0 +1,73 @@
# Julia/Python Performance Test Result

## Summary

Julia is ~118.8x faster than Python/Pandas

## Test File

Iterations: 100

Filename|Size|Rows|Columns|Numeric Columns|String Columns
--------|----|----|-------|---------------|--------------
test1.sas7bdat|131.1 kB|10|100|73|27

## Python
```
$ python -V
Python 3.7.1
$ python perf_test1.py data_pandas/test1.sas7bdat 100
Minimum: 0.1036 seconds
```

## Julia (ObjectPool)
```
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.0.0)
CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
JULIA_NUM_THREADS = 4
BenchmarkTools.Trial:
memory estimate: 1.00 MiB
allocs estimate: 7132
--------------
minimum time: 871.807 μs (0.00% GC)
median time: 1.254 ms (0.00% GC)
mean time: 1.470 ms (6.75% GC)
maximum time: 6.470 ms (78.01% GC)
--------------
samples: 100
evals/sample: 1
```

## Julia (Regular String Array)
```
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.0.0)
CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
JULIA_NUM_THREADS = 4
BenchmarkTools.Trial:
memory estimate: 990.86 KiB
allocs estimate: 6819
--------------
minimum time: 1.119 ms (0.00% GC)
median time: 2.666 ms (0.00% GC)
mean time: 9.009 ms (6.71% GC)
maximum time: 161.985 ms (0.00% GC)
--------------
samples: 100
evals/sample: 1
```
73 changes: 73 additions & 0 deletions test/perf_results_1.0.0/py_jl_topical_30.md
@@ -0,0 +1,73 @@
# Julia/Python Performance Test Result

## Summary

Julia is ~27.3x faster than Python/Pandas

## Test File

Iterations: 30

Filename|Size|Rows|Columns|Numeric Columns|String Columns
--------|----|----|-------|---------------|--------------
topical.sas7bdat|13.6 MB|84355|114|8|106

## Python
```
$ python -V
Python 3.7.1
$ python perf_test1.py data_AHS2013/topical.sas7bdat 30
Minimum: 46.9720 seconds
```

## Julia (ObjectPool)
```
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.0.0)
CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
JULIA_NUM_THREADS = 4
BenchmarkTools.Trial:
memory estimate: 685.66 MiB
allocs estimate: 19193161
--------------
minimum time: 1.720 s (6.37% GC)
median time: 1.806 s (11.83% GC)
mean time: 1.796 s (10.69% GC)
maximum time: 1.863 s (13.57% GC)
--------------
samples: 3
evals/sample: 1
```

## Julia (Regular String Array)
```
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.0.0)
CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
JULIA_NUM_THREADS = 4
BenchmarkTools.Trial:
memory estimate: 648.04 MiB
allocs estimate: 19048983
--------------
minimum time: 1.994 s (46.01% GC)
median time: 2.559 s (51.16% GC)
mean time: 2.559 s (51.16% GC)
maximum time: 3.123 s (54.45% GC)
--------------
samples: 2
evals/sample: 1
```