-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
String columns are read as None #43
Comments
I had a quick look at it and had the same issue so I don't think you're doing anything wrong. For now, have you tried re-saving the file with Excel? The library should be able to handle it anyway, but just to check if it's a specific glitch with that file. |
Okay, i have finally got access to a Windows install with working Excel and it turns out that after re-saving the file with Excel the file size goes from 45kB to 65kB. Now pyxlsb is able to properly read the re-saved file. |
Something similar happened in #23. But since I have your "broken" file, this will help figure out what's weird with it and hopefully make pyxlsb more lenient. |
I have been looking at it and i'm on to something regarding the cause of this. I added some more debug logging to ...
recdata = self._fp.read(reclen)
with RecordReader(recdata) as reader:
# Addition start
if recid not in self.handlers:
print("Unknown handler " + str(recid) + " with len " + str(reclen) + " (line below)")
# Addition end
ret = (self.handlers.get(recid) or Handler()).read(reader, recid, reclen)
if self._debug:
print('{:08X} {:04X} {:<6} {} {}'.format(pos, recid, reclen, ' '.join('{:02X}'.format(b) for b in recdata), ret))
... In my testing this reports a missing handler for @willtrnr Do you have a binary specification you use commonly for this? I think the CellHandler could be added for recid Debug logging for reference:
On the last line, 0x06 is the length. And the string starts with 0x37 |
v1.0.x...jeffreykog:fix-special-strings That is what was required to get it working. The behavior of this cell type seems to be a mix between I have not opened this as a PR as i'm sure the quality of my 'fix' is not good enough to merge into upstream. But i'm sure this can be used as a reference for other people or as inspiration for a more permanent fix. |
That's great, thanks a lot for looking into it. The reference for this is MS-XLSB. I see that there's quite a few new major releases since I last checked up on the record sets, it's quite possible that this is a new-ish feature or that some Excel version decided to start using it. Skimming through the docs, the record 62 seems to be related to string cells in shared workbooks. Looks like you were pretty close with your "special string" idea. If you're up for it, you can rework your patch to better match the technical spec in MS-XLSB and I'd happy to pull it in. Since this turns out to be a fixable "new feature", I'll re-open this. |
Hi, i am attempting to read a file (also as attachment in this ticket). The integer column (titled Site) is read correctly, but the fields containing string values are all read as String.
I was not able to find a cause for this in the source code. This file is delivered to me on a regular basis, and we have no control over the format the file is provided in. So requesting to export it as xlsx or csv is not really an option.
As github does not support uploading xlsb files, i compressed in a zip. So the zip is not the direct content of the file.
file.zip
Reproducing:
The first row of the file contains strings. The second row also contains some integer values, which are read correctly.
What am i doing wrong?
The text was updated successfully, but these errors were encountered: