Rolling standard deviation fails when used with win_type #26597
Comments
That seems to be the case.
In [23]: df.rolling(3, win_type='blackman').agg('std')
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-23-0df3ddf74018> in <module>
----> 1 df.rolling(3, win_type='blackman').agg('std')
~/sandbox/pandas/pandas/core/window.py in aggregate(self, arg, *args, **kwargs)
748 @Appender(_shared_docs['aggregate'])
749 def aggregate(self, arg, *args, **kwargs):
--> 750 result, how = self._aggregate(arg, *args, **kwargs)
751 if result is None:
752
~/sandbox/pandas/pandas/core/base.py in _aggregate(self, arg, *args, **kwargs)
328 if isinstance(arg, str):
329 return self._try_aggregate_string_function(arg, *args,
--> 330 **kwargs), None
331
332 if isinstance(arg, dict):
~/sandbox/pandas/pandas/core/base.py in _try_aggregate_string_function(self, arg, *args, **kwargs)
295 f = getattr(np, arg, None)
296 if f is not None:
--> 297 return f(self, *args, **kwargs)
298
299 raise ValueError("{arg} is an unknown string function".format(arg=arg))
<__array_function__ internals> in std(*args, **kwargs)
~/sandbox/numpy/numpy/core/fromnumeric.py in std(a, axis, dtype, out, ddof, keepdims)
3354
3355 return _methods._std(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
-> 3356 **kwargs)
3357
3358
~/sandbox/numpy/numpy/core/_methods.py in _std(a, axis, dtype, out, ddof, keepdims)
214 def _std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False):
215 ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
--> 216 keepdims=keepdims)
217
218 if isinstance(ret, mu.ndarray):
~/sandbox/numpy/numpy/core/_methods.py in _var(a, axis, dtype, out, ddof, keepdims)
185 arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
186 else:
--> 187 arrmean = arrmean.dtype.type(arrmean / rcount)
188
189 # Compute sum of squared deviations from mean
~/sandbox/pandas/pandas/core/window.py in __getattr__(self, attr)
146
147 raise AttributeError("%r object has no attribute %r" %
--> 148 (type(self).__name__, attr))
149
150 def _dir_additions(self):
AttributeError: 'Window' object has no attribute 'dtype'
It'd be good to verify whether that's intentional or an implementation detail. If it's intentional, then we should be able to verify which aggfuncs are compatible with which window types, and raise an informative error message before attempting the aggregation. @Connossor are you interested in doing that investigation?
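To illustrate the "raise an informative error before attempting the aggregation" idea, here is a minimal sketch in plain Python. It is not pandas code: `ToyWeightedWindow`, `supported_aggs`, and `aggregate` are hypothetical names, and the list of supported aggregations is an assumption made only for the example.

```python
class ToyWeightedWindow:
    """Toy stand-in for a weighted rolling window (hypothetical, not the pandas class)."""

    # Assumption for illustration: only these aggregations are defined.
    supported_aggs = ("sum", "mean")

    def __init__(self, values, weights):
        self.values = list(values)
        self.weights = list(weights)

    def sum(self):
        # Weighted sum of the values in the window.
        return sum(w * x for x, w in zip(self.values, self.weights))

    def mean(self):
        # Weighted mean: weighted sum divided by the total weight.
        return self.sum() / sum(self.weights)


def aggregate(window, func_name):
    """Check the aggregation name up front, then dispatch to the method."""
    if func_name not in window.supported_aggs:
        raise NotImplementedError(
            f"'{func_name}' is not supported for {type(window).__name__}; "
            f"supported aggregations: {', '.join(window.supported_aggs)}"
        )
    return getattr(window, func_name)()
```

With a check like this, asking for `"std"` would fail immediately with a message naming the unsupported aggregation, rather than with an error about a missing `dtype` attribute from deep inside another library.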
Hi @TomAugspurger, thanks for the response, and yep, I agree with your plan. I did my best to look through the codebase, and it's not clear to me which aggfuncs are supposed to be compatible with which window types. It seems from here that when a string aggregation such as "std" is passed and a function of that name exists in numpy, that function gets used: Lines 299 to 323 in cb00deb
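The fallback described above can be sketched without pandas or NumPy. Everything here is a toy: `ToyWindow`, `fallback_namespace`, and `toy_std` are hypothetical stand-ins, and the lookup order (method on the object first, then a module-level function of the same name) is my reading of the pattern visible in the traceback, not a copy of the pandas source.

```python
import types


def toy_std(a):
    # Like the NumPy internals in the traceback, this assumes an
    # array-like argument and touches its `.dtype` attribute.
    return a.dtype


# Stand-in for the `np` namespace that the string name is looked up in.
fallback_namespace = types.SimpleNamespace(std=toy_std)


class ToyWindow:
    """Toy object that defines `mean` but not `std`."""

    def mean(self):
        return "mean called"

    def __getattr__(self, attr):
        # Only reached when normal attribute lookup fails.
        raise AttributeError(
            "%r object has no attribute %r" % (type(self).__name__, attr)
        )


def try_aggregate_string_function(obj, arg):
    # 1. Prefer a method of that name on the object itself.
    f = getattr(obj, arg, None)
    if f is not None:
        return f()
    # 2. Otherwise fall back to a namespace-level function and pass it
    #    the object itself, which may not be array-like at all.
    f = getattr(fallback_namespace, arg, None)
    if f is not None:
        return f(obj)
    raise ValueError("{arg} is an unknown string function".format(arg=arg))
```

Calling `try_aggregate_string_function(ToyWindow(), "std")` takes the second branch and fails inside `toy_std` with an `AttributeError` about `dtype`, which mirrors why the real error message says nothing about `std` being unsupported.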
From some research I think std() ought to be usable with any window type. A formula for a weighted standard deviation is here:
This issue may be related to issue #26462, where the mean() aggregation seems to have different behaviour for a rolling window versus a groupby-rolling window.
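As a rough illustration of what a weighted standard deviation over a rolling window could look like, here is a pure-Python sketch. It uses one common definition (the biased form, dividing by the sum of the weights); this is an assumption on my part and not necessarily the formula linked above or what pandas would implement. The function names are hypothetical.

```python
import math


def weighted_std(values, weights):
    """Biased weighted standard deviation: sqrt(sum(w*(x-mu)^2) / sum(w))."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    mean = sum(w * x for x, w in zip(values, weights)) / total
    variance = sum(w * (x - mean) ** 2 for x, w in zip(values, weights)) / total
    return math.sqrt(variance)


def rolling_weighted_std(values, weights):
    """Apply weighted_std over each full window of len(weights) values."""
    n = len(weights)
    # Positions without a full window are filled with NaN.
    out = [math.nan] * min(n - 1, len(values))
    for end in range(n, len(values) + 1):
        out.append(weighted_std(values[end - n:end], weights))
    return out
```

With equal weights this reduces to the ordinary population standard deviation of each window, which is a quick sanity check for any weighted variant.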
@TomAugspurger I've investigated as you suggested; here is what I think is happening:
This function seems to govern what class is actually used: we get a
Lines 2626 to 2633 in addc5fc
Then, when the
The Window class has no
On the other hand, the
On a related note: the
Overall, it looks like we need two fixes:
What do you think?
Your number 1. sounds reasonable. Not sure about 2. I'm not that familiar with this code.
sure this could be changed
Code Sample, a copy-pastable example if possible
Problem description
When calculating rolling aggregations with a window function, I sometimes get an error that is quite hard to understand. I think it might be that the std() aggregation is not compatible with certain window types, or something like that, but it's not clear from the error messages or documentation what is going wrong.
Here is the stack trace that I see when I run the code sample above:
Expected Output
When I run the code above without the win_type parameter, everything works fine:
Result:
Thanks in advance for any tips!
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.24.2
pytest: 4.3.1
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.29.6
numpy: 1.16.2
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.4.0
sphinx: 1.8.5
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: 2.6.1
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.5
lxml.etree: 4.3.2
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None