BUG: exponential moving window covariance fails for multiIndexed DataFrame #34440
Comments
Confirming that this bug exists in the master version of pandas.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 62c7dd3
pandas : 1.1.0.dev0+1681.g62c7dd3e7.dirty
For the purposes of debugging I tweaked @PablocFonseca's snippet to make the DataFrames smaller and therefore easier to inspect. I believe the snippet below throws the same error as the original report.
import pandas as pd
import numpy as np
columns = pd.MultiIndex.from_product([['a', 'b'],['x','y'], list(range(2))])
index = range(3)
df = pd.DataFrame(
    np.random.normal(size=(len(index), len(columns))),
    index=index,
    columns=columns,
)
df.ewm(alpha=0.1).cov()  # Throws AssertionError: Length of order must be same as number of levels (4), got 3
I tracked down the origin of the bug to the line shown below. I haven't figured out the full details, but I think that by the time of line 182:
result = result.reorder_levels([2, 0, 1]).sort_index()  # line 182
the DataFrame index has four levels:
levels=[Index(['a', 'b'], dtype='object'), Index(['x', 'y'], dtype='object'), Int64Index([0, 1], dtype='int64'), Int64Index([0, 1, 2], dtype='int64')]
and then obviously the three-entry order [2, 0, 1] no longer matches the number of levels, which is consistent with the AssertionError in the snippet above. I haven't understood the rest yet.
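To illustrate the mismatch described above in isolation, here is a minimal sketch. The `frame` object is a hypothetical stand-in built by hand, not the actual intermediate `result` from the pandas internals; it only assumes a row index with the same four levels as listed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the intermediate result: a frame whose row
# index has four levels, matching the levels listed in the comment above.
index = pd.MultiIndex.from_product([["a", "b"], ["x", "y"], [0, 1], [0, 1, 2]])
frame = pd.DataFrame(np.zeros((len(index), 1)), index=index)

try:
    # Three entries in the order, but the index has four levels.
    frame.reorder_levels([2, 0, 1])
    error_message = None
except AssertionError as exc:
    error_message = str(exc)

print(error_message)
```

If this reading is right, the hard-coded three-entry order only fits frames whose columns are a flat Index.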
@PablocFonseca thanks for the report, and @arw2019 thanks for the confirmation and simple reproducer! It's also failing on 0.25
take
Would love your feedback on #34943 - I think I fixed this problem there
I want to work on this issue.
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
Problem description
When calculating the ewm covariance, pandas fails if the DataFrame has MultiIndex columns. However, it works when the columns are a simple Index.
It works for:
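The original snippet for the working case did not survive here. As a rough sketch of what such a case might look like, assuming a small frame with flat column labels (the names `flat_df`, `flat_cov` and the labels `a` to `d` are made up for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical frame with a simple (flat) column Index.
flat_df = pd.DataFrame(
    rng.normal(size=(3, 4)),
    index=range(3),
    columns=["a", "b", "c", "d"],
)

# Pairwise exponentially weighted covariance. The result is indexed by
# (row label, column label) pairs, one covariance matrix per row label.
flat_cov = flat_df.ewm(alpha=0.1).cov()
print(flat_cov.index.nlevels)
print(flat_cov.shape)
```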
Expected Output
The covariance, actually only the last matrix (last level of index)
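One possible way to get only the last matrix, sketched here on a flat-column frame where the computation is reported to work. The frame and the names `df_flat`, `cov`, `last_matrix` are assumptions for illustration, not part of the original report.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical flat-column frame; the MultiIndex case is the one that fails.
df_flat = pd.DataFrame(rng.normal(size=(3, 4)), columns=["a", "b", "c", "d"])

cov = df_flat.ewm(alpha=0.1).cov()

# The first level of the result's row index is the original row label,
# so selecting the last label yields the final covariance matrix.
last_matrix = cov.loc[df_flat.index[-1]]
print(last_matrix)
```

Selecting by the last row label keeps the column labels on both axes of `last_matrix`, which makes the matrix easy to inspect.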
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 1.0.3
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.3.post20200330
Cython : 0.29.15
pytest : 5.4.1
hypothesis : 5.8.3
sphinx : 2.4.4
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.9.0
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : 5.4.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.16
tables : 3.6.1
tabulate : 0.8.3
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.8
numba : 0.49.0