Skip to content

ENH: #40231 does not allow writing multiple dataframes to a single sheet #42221

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
feefladder opened this issue Jun 25, 2021 · 8 comments · Fixed by #42222
Closed

ENH: #40231 does not allow writing multiple dataframes to a single sheet #42221

feefladder opened this issue Jun 25, 2021 · 8 comments · Fixed by #42222
Labels
Enhancement IO Excel read_excel, to_excel Regression Functionality that used to work in a prior pandas version
Milestone

Comments

@feefladder
Copy link
Contributor

feefladder commented Jun 25, 2021

The problem

The following code works well, both in 1.4.0 and 1.2.5:

df = pd.DataFrame(["=1+7"])
with pd.ExcelWriter("test.xlsx",engine="openpyxl") as writer:
    df.to_excel(writer,sheet_name="Sheet2")
    pd.DataFrame(["=1+1"]).to_excel(writer,sheet_name="Sheet2",header=None,index=False,startcol=4,startrow=3)

however, the following code raises an error in 1.4.0, but works in 1.2.5 (it creates a new sheet and writes both dataframes to it):

pd.DataFrame(["=1+1"]).to_excel("test.xlsx")
df = pd.DataFrame(["=1+7"])
with pd.ExcelWriter("test.xlsx",engine="openpyxl",mode="a") as writer:
    df.to_excel(writer,sheet_name="Sheet2")
    pd.DataFrame(["=1+1"]).to_excel(writer,sheet_name="Sheet2",header=None,index=False,startcol=4,startrow=3)

Also other options cannot replicate the old behaviour.

Describe the solution you'd like

An added option to if_sheet_exists="write_to" that justs selects the sheet and writes into it, without first deleting its contents. This also allows to append a dataframe within a sheet.

EDIT: in #40230 this was also suggested behaviour, but it was not implemented:

Originally I considered adding a flag like overwrite=True to pd.ExcelWriter but I don't think this would be sufficient to cover all the potential behaviours, so would suggest an enum along the lines of the if_exists option in df.to_sql, with the following options:

* new_sheet: Create a new sheet with a different name.

* overwrite_sheet: Delete the contents of the sheet, then write to it.

* overwrite_cells: Write directly to the named sheet without deleting the previous contents.
@feefladder feefladder added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 25, 2021
@feefladder feefladder changed the title ENH: #40231 changed behaviour for ExcelWriter ENH: #40231 does not allow to write multiple dataframes to a single sheet Jun 25, 2021
@feefladder feefladder changed the title ENH: #40231 does not allow to write multiple dataframes to a single sheet ENH: #40231 does not allow writing multiple dataframes to a single sheet Jun 25, 2021
@rhshadrach rhshadrach added the Regression Functionality that used to work in a prior pandas version label Jun 25, 2021
@rhshadrach rhshadrach added this to the 1.3 milestone Jun 25, 2021
@rhshadrach
Copy link
Member

Tagging for 1.3 pending further discussion.

@rhshadrach
Copy link
Member

rhshadrach commented Jun 25, 2021

Summary: #40230 did not include the overwrite option to modify an already existing excel sheet, I've confirmed this was the default behavior of mode="a" on 1.2.x when using ExcelWriter and making multiple .to_excel calls within a single context manager. As such, the default behavior of 1.2.x is no longer possible to achieve.

cc @WillAyd @jreback

@lithomas1 lithomas1 removed the Needs Triage Issue that has not been reviewed by a pandas team member label Jun 25, 2021
@feefladder
Copy link
Contributor Author

feefladder commented Jun 26, 2021

I've made a #42222 that solves this issue.

Also, naming is inconsistent between to_sql(if_exists={"fail","replace","append"}) and to_excel(if_sheet_exists={"error","replace","new",<"write_to">}) in #40230
should this be solved?

@simonjayhawkins
Copy link
Member

As such, the default behavior of 1.2.x is no longer possible to achieve.

An open PR, #42222 is an enhancement so not milestoned 1.3x.

@rhshadrach
Copy link
Member

@simonjayhawkins - #42222 is adding back in the feature that was (unknowingly, I think) removed. If that's regarded as an enhancement, how can this regression get fixed for e.g. 1.3.2?

That said, there was opposition to implementing the option in question back in the PR causing this regression. If there is still sufficient opposition to this, then it seems to me the regression should be closed as a won't-fix.

@simonjayhawkins simonjayhawkins modified the milestones: 1.3.2, 1.3.3 Aug 15, 2021
@rhshadrach rhshadrach added the IO Excel read_excel, to_excel label Aug 16, 2021
@jreback jreback modified the milestones: 1.3.3, 1.4 Aug 21, 2021
@tkrausjr
Copy link

I have hit this regression as well in multiple workflows that I have and it is causing a lot of problems for me and my company.

@feefladder
Copy link
Contributor Author

This will be solved by #42222 which will be merged soon and out in 1.4.0 :)

@feefladder
Copy link
Contributor Author

feefladder commented Sep 17, 2021

Behaviour should the be:

with ExcelWriter("existing_file.xlsx", engine="openpyxl", mode="a",  if_sheet_exists="overlay") as writer:
    df1.to_excel(writer, sheet="sheet")
    df2.to_excel(writer, sheet="sheet")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement IO Excel read_excel, to_excel Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants