
BUG: exponential moving window covariance fails for multiIndexed DataFrame #34440


Closed
2 of 3 tasks
PablocFonseca opened this issue May 28, 2020 · 7 comments · Fixed by #34943
Labels: Bug, MultiIndex, Window (rolling, ewma, expanding)

Comments

@PablocFonseca

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

import pandas as pd
import numpy as np 

columns = pd.MultiIndex.from_product([['a','b','c'],['x','y','w','z'], list(range(9))])
index = range(1000)
df = pd.DataFrame(
    np.random.normal(size=(len(index), len(columns))),
    index=index,
    columns=columns
)

df.ewm(alpha=0.1).cov()  # Raises AssertionError: Length of order must be same as number of levels (4), got 3

Problem description

Calculating the ewm covariance fails when the DataFrame has MultiIndex columns. It works when the columns are a plain Index, for example:

pd.DataFrame(df.values).ewm(alpha=0.1).cov()

Expected Output

The pairwise covariance matrices. I actually only need the last matrix (the one for the last index value).
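Until the bug is fixed, one possible workaround builds on the observation that the computation succeeds once the column labels are dropped. The sketch below is only an illustration, not an official recommendation: the variable names `flat`, `cov` and `last_cov` are made up here, and the frame is smaller than the one in the report. It computes the covariance on positional columns, picks the matrix for the last index value, and puts the original MultiIndex labels back on both axes.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([["a", "b"], ["x", "y"], [0, 1]])
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, len(columns))), columns=columns)

# Drop the column labels: the flat frame gets integer column positions 0..n-1.
flat = pd.DataFrame(df.to_numpy(), index=df.index)

# Rows of the result are indexed by (original row label, column position).
cov = flat.ewm(alpha=0.1).cov()

# Select the matrix for the last row label and force positional order on both axes.
last_cov = cov.loc[df.index[-1]].reindex(index=flat.columns, columns=flat.columns)

# Re-attach the original MultiIndex labels.
last_cov.index = df.columns
last_cov.columns = df.columns
```

This relies on the positional columns of `flat` lining up one-to-one with `df.columns`, which holds because `flat` is built directly from `df.to_numpy()`.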

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.0.3
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.3.post20200330
Cython : 0.29.15
pytest : 5.4.1
hypothesis : 5.8.3
sphinx : 2.4.4
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.9.0
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : 5.4.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.16
tables : 3.6.1
tabulate : 0.8.3
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.8
numba : 0.49.0

PablocFonseca added the Bug and Needs Triage labels May 28, 2020
@arw2019
Member

arw2019 commented May 29, 2020

Confirming that this bug exists in the master version of pandas.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 62c7dd3
python : 3.8.2.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-101-generic
Version : #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0.dev0+1681.g62c7dd3e7.dirty
numpy : 1.17.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 46.4.0.post20200518
Cython : 0.29.19
pytest : 5.4.2
hypothesis : 5.15.1
sphinx : 3.0.4
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.14.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fastparquet : 0.4.0
gcsfs : None
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.4.1
sqlalchemy : 1.3.17
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.49.1

@arw2019
Member

arw2019 commented May 29, 2020

For debugging purposes I tweaked @PablocFonseca's snippet to make the DataFrame smaller and therefore easier to inspect. I believe the snippet below throws the same error as the original report.

import pandas as pd
import numpy as np 

columns = pd.MultiIndex.from_product([['a', 'b'],['x','y'], list(range(2))])
index = range(3)
df = pd.DataFrame(
    np.random.normal(size=(len(index), len(columns))),
    index=index,
    columns=columns
)

df.ewm(alpha=0.1).cov()  # Raises AssertionError: Length of order must be same as number of levels (4), got 3
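A deterministic variant of the reduced snippet can make it easier to check whether a given pandas installation is affected. The sketch below is an illustration rather than part of the report: it swaps the random data for `np.arange` values and records the failure instead of letting it propagate. The names `result` and `affected` are made up for this example.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([["a", "b"], ["x", "y"], [0, 1]])
df = pd.DataFrame(
    np.arange(24, dtype=float).reshape(3, 8),
    index=range(3),
    columns=columns,
)

try:
    result = df.ewm(alpha=0.1).cov()
    affected = False
except AssertionError:
    # Affected versions raise here, as described in the report.
    result = None
    affected = True

if affected:
    print("This pandas version reproduces the bug.")
else:
    # One 8x8 matrix per row label, stacked along the row axis.
    print("No error; result shape:", result.shape)
```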

@arw2019
Member

arw2019 commented May 29, 2020

I tracked down the origin of the bug to the _flex_binary_moment method in pandas/core/window/common.py.

I haven't figured out the full details, but I think that by the time line 182 runs:

result = result.reorder_levels([2, 0, 1]).sort_index()   # line 182

the DataFrame result has been incorrectly indexed. For my example the levels of result's MultiIndex are:

levels=[Index(['a', 'b'], dtype='object'), Index(['x', 'y'], dtype='object'), Int64Index([0, 1], dtype='int64'), Int64Index([0, 1, 2], dtype='int64')]

reorder_levels then fails at line 182 because the call assumes that result's MultiIndex has only three levels, whereas here it has four.
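The failure can be reproduced in isolation, without going through `ewm` at all. In the sketch below, `result` is a hypothetical stand-in built by hand with the four levels listed above. It is not the frame that `_flex_binary_moment` actually produces. The second call, which moves the last level to the front, only shows that an order covering all four levels is accepted; it is not meant to describe the eventual fix.

```python
import pandas as pd

# Hypothetical stand-in for `result`: the row index has the four levels shown above.
index = pd.MultiIndex.from_product([["a", "b"], ["x", "y"], [0, 1], [0, 1, 2]])
result = pd.DataFrame({"value": range(len(index))}, index=index)

error = None
try:
    # Three positions for a four-level index: this is the failing call.
    result.reorder_levels([2, 0, 1])
except (AssertionError, ValueError) as exc:
    # The report shows AssertionError; ValueError is caught too in case
    # another pandas version raises a different type.
    error = exc

print(type(error).__name__, error)

# An order that names every level is accepted.
reordered = result.reorder_levels([3, 0, 1, 2]).sort_index()
print(reordered.index.nlevels)
```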

I don't yet understand _flex_binary_moment well enough to know what the correct behavior is or what adjustment is needed, but I'll keep digging!

@jorisvandenbossche
Member

@PablocFonseca thanks for the report, and @arw2019 thanks for the confirmation and simple reproducer!

It also fails on 0.25.

mroeschke added the MultiIndex and Window (rolling, ewma, expanding) labels and removed the Needs Triage label May 29, 2020
@arw2019
Member

arw2019 commented Jun 20, 2020

take

@arw2019
Member

arw2019 commented Jun 24, 2020

Would love your feedback on #34943 - I think I fixed this problem there

@Dinkarkumar

I want to work on this issue.
