BUG: read_excel with Workbook and engine="openpyxl" raises ValueError #39528
Comments
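I will have time later today to look into this. Can you open/wrap the workbook with `pd.ExcelFile`?

```python
import pandas as pd

# Let pandas open the file once through its own wrapper, then pass the wrapper to read_excel
with pd.ExcelFile("test.xlsx", engine="openpyxl") as excel:
    dataframe = pd.read_excel(excel)
```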
Can reproduce on master.
@twoertwein
Yes, it could probably be fixed by something like:
Thanks for the report @ajkaijanaho! @twoertwein - seems to me like the patch should be improving inspect_excel_format itself instead of avoiding calling it.
Not really useful for us, because in our code the Workbook object is created somewhere else and passed around the call chain quite a bit before it reaches pandas. Rewriting all those calls to pass around an ExcelFile is not really worth the effort. Our workaround is to stay on 1.1.5 for now, and that works well enough.
@twoertwein - I was wrong above, your suggested patch is much better. We should really avoid calling
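A minimal, stdlib-only sketch of the kind of patch being discussed: format sniffing runs only when no engine is given, so a non-path, non-buffer object (such as an already-opened Workbook) passed with an explicit engine never reaches the sniffer. All names here (`select_engine`, `inspect_format`, `is_path_or_buffer`, `ENGINE_FOR_FORMAT`, `FakeWorkbook`) are hypothetical and do not mirror pandas internals; the extension and magic-byte checks are deliberately simplified.

```python
import io
import os
from typing import Any, Optional

# Hypothetical mapping from sniffed format to a default engine (illustrative only)
ENGINE_FOR_FORMAT = {"xlsx": "openpyxl", "xls": "xlrd"}

XLSX_MAGIC = b"PK\x03\x04"       # .xlsx files are ZIP containers
XLS_MAGIC = b"\xd0\xcf\x11\xe0"  # legacy .xls files are OLE2 compound documents


def is_path_or_buffer(obj: Any) -> bool:
    """Rough stand-in: a filesystem path, or a file-like object with read()."""
    return isinstance(obj, (str, os.PathLike)) or hasattr(obj, "read")


def inspect_format(source: Any) -> str:
    """Hypothetical sniffer; only meaningful for paths and buffers."""
    if isinstance(source, (str, os.PathLike)):
        suffix = os.path.splitext(os.fspath(source))[1].lower()
        fmt = {".xlsx": "xlsx", ".xls": "xls"}.get(suffix)
    else:
        head = source.read(4)
        source.seek(0)  # rewind so the engine can read from the start
        fmt = {XLSX_MAGIC: "xlsx", XLS_MAGIC: "xls"}.get(head)
    if fmt is None:
        raise ValueError("could not determine the Excel format")
    return fmt


def select_engine(source: Any, engine: Optional[str] = None) -> str:
    if engine is not None:
        # Explicit engine wins: e.g. a Workbook plus engine="openpyxl"
        # is never handed to inspect_format, which cannot sniff it.
        return engine
    if not is_path_or_buffer(source):
        # Mirrors the documented rule: engine must be set for other objects
        raise ValueError("engine must be set when io is not a buffer or path")
    return ENGINE_FOR_FORMAT[inspect_format(source)]


class FakeWorkbook:
    """Stand-in for an already-opened workbook: neither a path nor a buffer."""


print(select_engine("test.xlsx"))                               # openpyxl
print(select_engine(io.BytesIO(XLS_MAGIC + b"\x00\x00\x00")))   # xlrd
print(select_engine(FakeWorkbook(), engine="openpyxl"))         # openpyxl
```

The design choice is purely about guard order: checking for an explicit engine before any sniffing is what keeps workbook objects away from the format inspector, while the documented requirement still applies when no engine is passed.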
Depending on which behavior is expected, this simple elif-patch is probably not enough. If a user provides a workbook compatible with one of the engines but does not specify an engine explicitly, do we need to auto-detect the engine from the workbook type? If we need that, it should probably go into
@twoertwein: Comparing to 1.1.x:
It appears to me that passing a Workbook with
Supporting engine=None with Workbooks would be an additional feature, which might be welcome, and should go into 1.3 (or later). My only hesitation here is that the implementation would seem to have to special-case every engine and its individual workbook (and worksheet?) types, which I think is a bit undesirable. If there is an implementation free of this, I'd certainly be +1.
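To make the "special-case all engines" concern concrete, here is a rough sketch of what engine=None auto-detection from a workbook object could look like. It is hypothetical throughout (`PACKAGE_TO_ENGINE` and `engine_from_workbook` are not pandas APIs) and keys on the top-level package of the object's type instead of using isinstance, so no optional engine package has to be importable. The table is exactly the per-engine special-casing discussed in the comment: every supported engine needs an entry.

```python
from typing import Any, Optional

# Hypothetical table: top-level package of the workbook's type -> pandas engine name.
# Each entry is one per-engine special case; a new engine means a new entry.
PACKAGE_TO_ENGINE = {
    "openpyxl": "openpyxl",
    "xlrd": "xlrd",
    "odf": "odf",
    "pyxlsb": "pyxlsb",
}


def engine_from_workbook(obj: Any) -> Optional[str]:
    """Guess an engine from an already-opened workbook object, or return None.

    Looks at type(obj).__module__ rather than calling isinstance, so the check
    runs even when the engine packages are not installed.
    """
    top_level = type(obj).__module__.split(".")[0]
    return PACKAGE_TO_ENGINE.get(top_level)


# Simulate an openpyxl Workbook without importing openpyxl
FakeOpenpyxlWorkbook = type("Workbook", (), {"__module__": "openpyxl.workbook.workbook"})

print(engine_from_workbook(FakeOpenpyxlWorkbook()))  # openpyxl
print(engine_from_workbook("test.xlsx"))             # None (a path, not a workbook)
```

Module-name matching is only a heuristic: a Workbook subclass defined in user code reports the user's module and would not be recognized, which is one more reason an explicit engine may be the simpler contract.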
The read_excel documentation is quite explicit about this point: "engine : str, default None. If io is not a buffer or path, this must be set to identify io."
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample
Problem description
In pandas 1.1.5, the above code completes with no problems.
In pandas 1.2.1, it causes the following exception:
The documentation does not specify Workbook as an acceptable value type for io, but supporting it seems reasonable and accords with the 1.1.5 behavior.
In my use case, we mainly parse an Excel file with openpyxl but use pandas for a specific sub-problem. We would like to reuse the same Workbook instead of having pandas re-read the file.
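For this use case, one version-independent option is to bypass read_excel and build the DataFrame directly from the worksheet that is already in memory. This is a sketch, assuming openpyxl and pandas are installed and that the first row holds the column headers; `worksheet_to_frame` is a hypothetical helper, not a pandas API. It relies on openpyxl's `Worksheet.values` generator, so none of read_excel's options (dtype, skiprows, usecols, and so on) are applied.

```python
import openpyxl
import pandas as pd


def worksheet_to_frame(ws, header: bool = True) -> pd.DataFrame:
    """Build a DataFrame from an openpyxl worksheet without re-reading the file."""
    rows = ws.values  # generator yielding one tuple of cell values per row
    if header:
        columns = next(rows)  # first row becomes the column labels
        return pd.DataFrame(list(rows), columns=list(columns))
    return pd.DataFrame(list(rows))


# Usage: an in-memory workbook stands in for the one the application already holds
wb = openpyxl.Workbook()
ws = wb.active
ws.append(["name", "qty"])
ws.append(["apple", 2])
ws.append(["pear", 3])

df = worksheet_to_frame(ws)
print(df)
```

Because the Workbook is reused as-is, the file is parsed only once, at the cost of doing any type conversion or row skipping by hand.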