Fread options not extended to excel files #2632

Open
samukweku opened this issue Sep 16, 2020 · 5 comments
Labels
bug Any bugs / errors in datatable; however for severe bugs use [segfault] label

Comments

@samukweku
Contributor

  • Did you find a bug in datatable, or maybe the bug found you?
    I can't select columns or limit the number of rows read when working with an Excel file.

  • How to reproduce the bug?

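import urllib.request

from datatable import fread

url = 'https://github.com/samukweku/Extracting-Data-from-Excel-with-Python/blob/master/names.xlsx?raw=true'

filename = 'test.xlsx'

# download the sample workbook
urllib.request.urlretrieve(url, filename)

# read the "naija" sheet of the workbook
fread("test.xlsx/naija")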
   Name      Age  Height
0  Tolu       24     2
1  Chukwuka   50     1.8
2  Ogor       15     1.5

# column selection fails
fread('test.xlsx/naija', columns={"Name"})

# nrows fails as well
fread("test.xlsx/naija", max_nrows=2)
  • What was the expected behavior?
# column selection:
fread('test.xlsx/naija', columns={"Name"})
   Name
0  Tolu
1  Chukwuka
2  Ogor

# row selection:
fread("test.xlsx/naija", max_nrows=2)
   Name      Age  Height
0  Tolu       24     2
1  Chukwuka   50     1.8
  • Your environment?
dt.__version__
'0.11.0a0+build.1600188275.sam'

sys.version
'3.8.5 | packaged by conda-forge | (default, Aug 29 2020, 01:22:49) \n[GCC 7.5.0]'

operating system
'Linux-5.4.0-7642-generic-x86_64-with-glibc2.10'
@pradkrish
Collaborator

pradkrish commented Nov 3, 2020

@samukweku I can confirm that I am able to reproduce your observation for xlsx files. I only hope it is not by design. :)

@pradkrish added the bug label Nov 3, 2020
@pradkrish changed the title from "[BUG]Fread options not extended to excel files" to "Fread options not extended to excel files" Nov 3, 2020
@pradkrish
Collaborator

I just briefly looked into the code. I don't think it's even implemented for xls files. Perhaps @st-pasha could comment on this?

@st-pasha
Contributor

st-pasha commented Nov 3, 2020

That's correct. The logic for reading Excel files is completely separate from the code that reads CSV files, and consequently every option has to be implemented for Excel parsing separately.
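To illustrate what "implemented separately" could mean in practice, here is a minimal sketch of applying `columns` and `max_nrows` to rows that an Excel parser has already produced. The helper `apply_read_options` and its signature are hypothetical and are not part of datatable; the sample data is the sheet from the report above.

```python
from typing import Any, Iterable, List, Optional, Sequence, Set, Tuple


def apply_read_options(
    header: Sequence[str],
    rows: Iterable[Sequence[Any]],
    columns: Optional[Set[str]] = None,
    max_nrows: Optional[int] = None,
) -> Tuple[List[str], List[List[Any]]]:
    """Hypothetical helper: filter already-parsed sheet data.

    `columns` is a set of column names to keep (None keeps all);
    `max_nrows` caps the number of data rows (None keeps all).
    """
    if columns is None:
        keep = list(range(len(header)))
    else:
        missing = set(columns) - set(header)
        if missing:
            raise KeyError(f"Unknown column(s): {sorted(missing)}")
        # preserve the sheet's column order
        keep = [i for i, name in enumerate(header) if name in columns]

    out_rows: List[List[Any]] = []
    for n, row in enumerate(rows):
        if max_nrows is not None and n >= max_nrows:
            break  # stop early instead of reading the whole sheet
        out_rows.append([row[i] for i in keep])

    return [header[i] for i in keep], out_rows


# Sample data mirroring the "naija" sheet from the report
header = ["Name", "Age", "Height"]
rows = [["Tolu", 24, 2], ["Chukwuka", 50, 1.8], ["Ogor", 15, 1.5]]

names_only = apply_read_options(header, rows, columns={"Name"})
first_two = apply_read_options(header, rows, max_nrows=2)
```

Every other fread option would need a similar hook in the Excel code path, since none of the CSV reader's logic is reused there.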

@pradkrish
Collaborator

That's what I gathered from looking at the code. Is there an appetite to do so for Excel files? My guess is that CSV files are much more common than Excel files in data manipulation tasks, which makes this a low priority.

@Jiezuel

Jiezuel commented Feb 25, 2021

I had the same problem when I tried to read a .xlsx file.
I think the reason is probably that fread uses xlrd to read .xlsx and .xls files. See line 325 of fread.py, and datatable/xls.py.
Unfortunately, xlrd released a new version on 11 Dec 2020 that removed support for anything other than .xls files.
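
A quick way to check whether an installed xlrd falls on the wrong side of that change is to look at its major version. The sketch below assumes the 11 Dec 2020 release is xlrd 2.0.0 and that earlier 1.x releases still read .xlsx; the function name `xlrd_can_read_xlsx` is made up for illustration.

```python
def xlrd_can_read_xlsx(version: str) -> bool:
    """Return True if this xlrd version string predates the 2.0 release.

    Assumption: xlrd 2.0.0 is the release that dropped everything
    except .xls, so only the major version number matters here.
    """
    major = int(version.split(".")[0])
    return major < 2


old_ok = xlrd_can_read_xlsx("1.2.0")
new_ok = xlrd_can_read_xlsx("2.0.1")
```

The version string of the installed package can be obtained with `importlib.metadata.version("xlrd")` on Python 3.8 or later.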
