-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: support decimal option in PythonParser #12933 #13189
Conversation
return lines | ||
|
||
if self.thousands is None: | ||
nonnum = re.compile('[^-^0-9^%s]+' % self.decimal) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these should be created in init
Current coverage is 84.16%@@ master #13189 diff @@
==========================================
Files 138 138
Lines 50496 50404 -92
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
- Hits 42501 42418 -83
+ Misses 7995 7986 -9
Partials 0 0
|
can you post a quick benchmark on how this differents from current (with python engine; you will have to actually use the decimal as the sep otherwise won't have a good comparison), but just for sake of completness. |
@@ -38,6 +38,8 @@ Other enhancements | |||
idx = pd.Index(["a1a2", "b1", "c1"]) | |||
idx.str.extractall("[ab](?P<digit>\d)") | |||
|
|||
- Support decimal option in PythonParser (:issue:`12933`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`
so something like
pd.read_csv()
with engine='python'
gained support for the decimal
option.
minor comments. pls squash; ping when green. @gfyoung any comments? |
@camilocot :
|
@jreback regarding the benchmark, I'll read some asv documentation before posting it or with a %timeit is enought ? |
Read the |
Also, I looked at the test again that I asked you to change (with the comment), and I am unfortunately no longer convinced that a comment change is sufficient. Can you change the |
yeah you can add to the asv benchmarks. I don't expect them to change significantly (as this is a special passed option), but nice to add a benchmark and (show the results here). |
@jreback:
@gfyoung test assert was fixed, could you please take a look ? |
@camilocot: Test assertion fix LGTM. Thanks! |
|
||
def setup(self): | ||
self.data_decimal = '0,1213700904466425978256438611;0,0525708283766902484401839501;0,4174092731488769913994474336\n 0,4096341697147408700274695547;0,1587830198973579909349496119;0,1292545832485494372576795285\n 0,8323255650024565799327547210;0,9694902427379478160318626578;0,6295047811546814475747169126\n 0,4679375305798131323697930383;0,2963942381834381301075609371;0,5268936082160610157032465394\n 0,6685382761849776311890991564;0,6721207066140679753374342908;0,6519975277021627935170045020\n ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do these benchmarks exist for the c-engine as well (ideally we would use the same exact data so we can compare)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback, added one benchmark with with same date for c-engine. Here are the results:
▶ asv continuous master 12933 -b parser_vb.read_csv_default
· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt.
·· Installing into conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt.
· Running 8 total benchmarks (2 commits * 1 environments * 4 benchmarks)
[ 0.00%] · For pandas commit hash 465272e1:
[ 0.00%] ·· Building for conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...............................................
[ 0.00%] ·· Benchmarking conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 12.50%] ··· Running parser_vb.read_csv_default_converter.time_read_csv_default_converter 1.99ms
[ 25.00%] ··· Running parser_vb.read_csv_default_converter_python_engine.time_read_csv_default_converter 2.87ms
[ 37.50%] ··· Running parser_vb.read_csv_default_converter_with_decimal.time_read_csv_default_converter_with_decimal 2.00ms
[ 50.00%] ··· Running parser_vb.read_csv_default_converter_with_decimal_python_engine.time_read_csv_default_converter_with_decimal 9.24ms
[ 50.00%] · For pandas commit hash 86f68e6a:
[ 50.00%] ·· Building for conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 50.00%] ·· Benchmarking conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 62.50%] ··· Running parser_vb.read_csv_default_converter.time_read_csv_default_converter 1.97ms
[ 75.00%] ··· Running parser_vb.read_csv_default_converter_python_engine.time_read_csv_default_converter 2.87ms
[ 87.50%] ··· Running parser_vb.read_csv_default_converter_with_decimal.time_read_csv_default_converter_with_decimal 1.99ms
[100.00%] ··· Running parser_vb.read_csv_default_converter_with_decimal_python_engine.time_read_csv_default_converter_with_decimal failed
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
thanks @camilocot |
closes pandas-dev#12933 Author: Camilo Cota <ccota@riplife.es> Closes pandas-dev#13189 from camilocot/12933 and squashes the following commits: 465272e [Camilo Cota] Benchmark decimal option in read_csv for c engine 9f42d0c [Camilo Cota] double backticks around decimal and engine='python' dc8ca62 [Camilo Cota] fix test_empty_decimal_marker comment 49613fe [Camilo Cota] Assert read_csv error message in test_empty_decimal_marker d821052 [Camilo Cota] fix test_empty_decimal_marker comment f71509d [Camilo Cota] Include descritive what's new line 803356e [Camilo Cota] set nonnum regex in init method 1472d80 [Camilo Cota] Include the issue number in what's new b560fda [Camilo Cota] Fix what's new dc7acd1 [Camilo Cota] ENH: support decimal option in PythonParser pandas-dev#12933
ENH: support decimal option in PythonParser pandas-dev#12933 closes pandas-dev#12933 Author: Camilo Cota <ccota@riplife.es> Closes pandas-dev#13189 from camilocot/12933 and squashes the following commits: 465272e [Camilo Cota] Benchmark decimal option in read_csv for c engine 9f42d0c [Camilo Cota] double backticks around decimal and engine='python' dc8ca62 [Camilo Cota] fix test_empty_decimal_marker comment 49613fe [Camilo Cota] Assert read_csv error message in test_empty_decimal_marker d821052 [Camilo Cota] fix test_empty_decimal_marker comment f71509d [Camilo Cota] Include descritive what's new line 803356e [Camilo Cota] set nonnum regex in init method 1472d80 [Camilo Cota] Include the issue number in what's new b560fda [Camilo Cota] Fix what's new dc7acd1 [Camilo Cota] ENH: support decimal option in PythonParser pandas-dev#12933 ENH: Allow to_sql to recognize single sql type pandas-dev#11886 PEP pandas-dev#3
git diff upstream/master | flake8 --diff