Output excel table objects with to_xlsx() #24862
Comments
I think this would be a reasonable enhancement, probably via some kind of parameter. I'd start around here for the existing Excel logic: pandas/pandas/io/formats/excel.py, line 631 in 6d918f0. You can ping on here with questions, or feel free to put up a WIP PR.
There are probably some annoying corner cases with |
I am trying to bolt on this functionality as per @chris-b1's suggestion with an extra keyword to the already very heavy
For an idea of the options we could support, here is a snippet for OpenPyXL (full example here:

    # add column headings. NB. these must be strings
    ws.append(["Fruit", "2011", "2012", "2013", "2014"])
    for row in data:
        ws.append(row)
    tab = Table(displayName="Table1", ref="A1:E5")
    # Add a default style with striped rows and banded columns
    style = TableStyleInfo(name="TableStyleMedium9", showFirstColumn=False,
                           showLastColumn=False, showRowStripes=True,
                           showColumnStripes=True)

And from XlsxWriter:

    worksheet.add_table('B3:F7', {'data': data,
                                  'style': 'Table Style Light 11'})

XlsxWriter supports the following keywords:
|
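Both snippets above hard-code the table range (`A1:E5` and `B3:F7`). When the table is generated from a DataFrame, that range has to be derived from the frame's shape and the position it is written to. A minimal sketch of that calculation in plain Python follows. `column_letter` and `table_ref` are hypothetical helper names, not part of pandas, OpenPyXL or XlsxWriter, and the 1-based row/column convention is an assumption made for illustration.

```python
def column_letter(index):
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index:
        # Excel columns are a bijective base-26 numbering, hence the "- 1".
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def table_ref(n_data_rows, n_cols, first_row=1, first_col=1, header=True):
    """Return the A1-style range covering a table.

    first_row and first_col are 1-based and point at the top-left cell,
    which is the first header cell when header=True.
    """
    n_rows = n_data_rows + (1 if header else 0)
    if n_rows < 1 or n_cols < 1:
        raise ValueError("a table needs at least one row and one column")
    last_row = first_row + n_rows - 1
    last_col = first_col + n_cols - 1
    return (f"{column_letter(first_col)}{first_row}:"
            f"{column_letter(last_col)}{last_row}")


# Four data rows and five columns, written at the top-left of the sheet:
print(table_ref(4, 5))
# The same table written with its header cell at B3:
print(table_ref(4, 5, first_row=3, first_col=2))
```

With four data rows plus a header, the two calls reproduce the ranges used in the OpenPyXL and XlsxWriter snippets respectively.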
Without having looked too deeply, it feels like toggling this via a keyword argument would still be the better way to go, as a separate function would, I think, end up duplicating a lot of functionality. With that said, if you see a better way of going about it, I'm always open to ideas. Probably just best to push a PR and have it reviewed in that case |
Going to be -1 on a to_excel_table; this certainly should be done via a keyword |
Clear, I'll give it a go and see where it leads |
@chris-b1, I put up the WIP PR. For discussion, which part of the current
Proposed:
And then there are many other options that could be supported. I think it would be better to leave those out though:
|
@WillAyd, @jreback, @chris-b1 To effectively test the proposed

    from openpyxl import load_workbook

    from pandas.core.dtypes.common import is_list_like
    from pandas.core.frame import DataFrame


    def read_excel_tables(io, table_name=None, index_col='auto'):
        """Read an Excel table into a pandas dataframe.

        Only supports xlsx files.

        Parameters
        ----------
        io : str, file descriptor or pathlib.Path
        table_name : str or None, default None
            Strings are used for table names. Specify None to get all
            tables in a dict of dataframes.
        index_col : int, list of int, None or 'auto', default 'auto'
            Column (0-indexed) to use as the row labels of the DataFrame.
            Pass None if there is no such column. If a list is passed,
            those columns will be combined into a ``MultiIndex``.
            'auto' will determine if there is an index column from the table
            `First Column` option in Excel.
        """
        def get_tables(io):
            # unfortunately tables are only parsed in the slower
            # (non read-only) mode
            wbk = load_workbook(io, data_only=True, read_only=False)
            tables = {}
            for wks in wbk:
                # NB: _tables is a private OpenPyXL attribute
                for t in wks._tables:
                    tables[t.name] = dict(table=t, wks=wks)
            return tables

        def read_table(table, wks, index_col):
            columns = [col.name for col in table.tableColumns]
            # drop the header rows and, if present, the totals rows;
            # a totals count of 0 must map to None, since [x:-0] is empty
            data_rows = wks[table.ref][
                (table.headerRowCount or 0):
                -table.totalsRowCount if table.totalsRowCount else None]
            data = [[cell.value for cell in row] for row in data_rows]
            frame = DataFrame(data, columns=columns, index=None)
            if index_col == 'auto':
                style = table.tableStyleInfo
                if style is not None and style.showFirstColumn:
                    # set_index returns a new frame, so assign the result
                    frame = frame.set_index(columns[0])
            elif is_list_like(index_col):
                frame = frame.set_index([columns[i] for i in index_col])
            elif index_col is not None:
                # explicit None check so that index_col=0 is honoured
                frame = frame.set_index(columns[index_col])
            return frame

        tables = get_tables(io)
        if table_name is not None:
            return read_table(**tables[table_name], index_col=index_col)
        else:
            return {k: read_table(**v, index_col=index_col)
                    for k, v in tables.items()}

I think it would make perfect sense to also include this in pandas; however, I'm not really sure where to place that code. There is currently only an
So then I have the following questions:
|
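The row slicing inside `read_table` is the step that strips the header row and an optional totals row from the table range. A minimal sketch of that step on plain lists, independent of OpenPyXL, is given here for illustration. `data_rows` is a hypothetical helper, and the sample rows are made up. It treats a totals count of `None` and `0` the same way, because a slice ending at `-0` would be empty.

```python
def data_rows(rows, header_row_count=1, totals_row_count=None):
    """Strip header rows from the top and totals rows from the bottom."""
    start = header_row_count or 0
    # A stop of None means "up to the end"; -0 would give an empty slice.
    stop = -totals_row_count if totals_row_count else None
    return rows[start:stop]


rows = [
    ["Fruit", "2011", "2012"],  # header row
    ["Apples", 10, 12],
    ["Pears", 7, 9],
    ["Total", 17, 21],          # totals row
]

# Header and totals row present: only the two fruit rows remain.
print(data_rows(rows, header_row_count=1, totals_row_count=1))
# No totals row (count of 0): the same two fruit rows remain.
print(data_rows(rows[:3], header_row_count=1, totals_row_count=0))
```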
Makes sense to me!
Yep
I would say yes; I don't think it makes sense to implement an engine for a small subset of functionality as it just makes our API more confusing. Note that there has been a decent amount of refactoring going on to better support community engagement on this. There's also an open issue #11499 for this. I would think if anything this is the most logical starting point to just get the reader up and working. From there you could add table support (I think better as a keyword argument in
Yea I think a keyword argument makes the most sense as it keeps the API simplest and you could leverage the existing functionality of other applicable parameters |
Didn't know that supporting more readers was an ongoing development. Are you still working on that? Also, just out of curiosity, what is the rationale for supporting multiple excel libraries? |
I haven't actually put any code into it, I've just been reorganizing things in hopes of better community engagement. As far as supporting multiple libraries goes, the libraries support different file types. Openpyxl I believe is only .xlsx format, while xlrd can handle .xls in addition to .xlsx. Off the top of my head I don't think either supports .xlsb, and I'm not sure about .xlsm, so having multiple engines gives flexibility to seamlessly deal with different file types, amongst potential other optimizations available to the end user |
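To make the multi-engine rationale concrete, here is a minimal sketch of dispatching on file extension. The mapping only encodes what is stated in the comment above (OpenPyXL for .xlsx, xlrd for .xls). `ENGINE_BY_EXTENSION` and `pick_engine` are hypothetical names, not pandas API, and how pandas actually selects an engine may differ.

```python
from pathlib import Path

# Assumed mapping for illustration, based only on the discussion above.
ENGINE_BY_EXTENSION = {
    ".xlsx": "openpyxl",
    ".xls": "xlrd",
}


def pick_engine(path):
    """Return the name of an engine able to read the given file."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(f"no engine registered for {suffix!r} files") from None


print(pick_engine("report.xlsx"))
print(pick_engine("legacy.XLS"))
```

Formats without a registered engine, such as .xlsb in this sketch, raise a `ValueError` rather than falling back silently.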
I'll give it a go then to make an OpenPyXL reader. I see there are pretty extensive tests for |
# Conflicts: # pandas/core/generic.py # pandas/io/excel.py # pandas/io/formats/excel.py # pandas/tests/io/test_excel.py
I'm wondering if just having a cc @Themanwithoutaplan for thoughts |
|
I just merged master to get a feel again for where I left off. It seems to work, but there are still a lot of loose ends to be discussed. @WillAyd, perhaps you could reopen #24899? Or maybe keep the discussion here for the moment. In the meantime I'll try to see if I can get a table keyword in the read_excel function. Should reading and writing tables be in the same PR? |
I bit the bullet and created a dedicated package to read and write properly formatted excel tables with pandas: https://github.com/VanOord/pandas-xlsx-tables |
Appears there hasn't been much activity or community support for this feature in a while so closing. Happy to reopen if there's renewed support |
Currently pandas can quickly write a dataframe to an Excel sheet. However, the output is a plain workbook with a bunch of values, not the much more powerful Excel table object. Excel table objects are very useful because they allow referencing columns/cells by header name instead of $A$23, better filtering, sorting, formatting, pivoting, plotting etc. Of course this can be achieved by selecting the cells output by pandas and using the Format as Table functionality, but why not support this out of the box? With XlsxWriter this functionality is fully supported, so it should not be too hard to implement. I would be willing to make a PR if there is any interest, though I might need some guidance as I am not familiar with the pandas code base.