
DOC: update docs on direct plotting with matplotlib (GH8614) #8655


Conversation

jorisvandenbossche
Member

Closes #8614

  • this adds a warning that you now have to use to_pydatetime, as direct plotting with a DatetimeIndex does not work anymore (see #8614, "Plotting of DatetimeIndex directly with matplotlib no longer gives datetime formatted axis (0.15)")
  • I also removed, for now, the note on speed and the explanation of the registered formatters. @agijsberts could you shed some light on this?
    • "The speed up for large data sets only applies to pandas 0.14.0 and later." Why only for pandas 0.14 or later? And where does this speed-up come from?
    • "thereby extending date and time support to practically all plot types available in matplotlib" -> but if you plot directly with matplotlib, I think you don't use the pandas registered formatters? So that sentence does not seem fully correct, is that possible? And isn't that the reason for the possible speed-up (matplotlib's default formatter being faster than pandas' formatter)?

@agijsberts
Contributor

@jorisvandenbossche I wrote that documentation to reflect the changes in PR #6650 that I prepared for issue #6636. In short, pandas does register its own formatters (see https://github.com/pydata/pandas/blob/master/pandas/tseries/converter.py#L27), so this behavior is entirely part of pandas. The PR obtained a speed-up by replacing a call to matplotlib's date2num (explicit for-loop) with epoch2num (vectorized), hence the statement that it's much faster since pandas 0.14.0.

It seems that the formatters registered by pandas are still correctly called:

In [1]: from pandas import DataFrame, date_range

In [2]: import matplotlib.units as units

In [3]: df = DataFrame(range(100), index = date_range('20130101', periods=100))

In [4]: units.registry.get_converter(df.index)
Out[4]: <pandas.tseries.converter.DatetimeConverter instance at 0x3f33f38>

So far I do not see any obvious cause for the new problems, but I'll dive deeper into this later on.

@jorisvandenbossche
Member Author

Ah, yes, I was confusing the 'converter' (which is registered in the matplotlib units registry) with the formatting of the labels (which is also nicer in pandas, but you only get that when plotting with pandas' plot and not when plotting directly with matplotlib's plot, while the unit converter works for both).

But, two points:

  • your statement about the improved performance also holds true when plotting with pandas, so it is not specific to plotting directly with matplotlib, and it seems a bit out of place in this section?

  • The units.registry.get_converter(df.index) does indeed still work. But the problem is that when plotting with plt.plot(df.index, df['col']), the df.index is first converted to an array of datetime64, and this is not recognized by matplotlib anymore:

    In [35]: np.asarray(df.index)
    Out[35]:
    array(['2013-01-01T01:00:00.000000000+0100',
           ...
           '2013-04-10T02:00:00.000000000+0200'], dtype='datetime64[ns]')
    
    In [36]: units.registry.get_converter(np.asarray(df.index))
    
    -> None
    

    To make it more complicated, the plt.fill_between in the example does not do this (it does still convert the index to datetimes), and because of that the example now crashes.
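
    The difference between the two forms of the index can be inspected directly. This is only a minimal sketch, assuming a reasonably recent pandas and NumPy; the column name "col" and the variable names are illustrative, and no plotting is done:

```python
import datetime

import numpy as np
import pandas as pd

# Same shape of data as in the thread: 100 daily timestamps.
df = pd.DataFrame({"col": range(100)},
                  index=pd.date_range("20130101", periods=100))

# A plain array conversion yields a NumPy datetime64 array.
as_array = np.asarray(df.index)

# to_pydatetime() yields an object array of datetime.datetime instances.
as_objects = df.index.to_pydatetime()

print(as_array.dtype.kind)     # kind "M" marks a datetime64 dtype
print(type(as_objects[0]))
```

    Passing as_objects instead of df.index to a matplotlib function is the to_pydatetime route this PR set out to document.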

@@ -1566,15 +1566,11 @@ customization is not (yet) supported by pandas. Series and DataFrame objects
behave like arrays and can therefore be passed directly to matplotlib functions
without explicit casts.

pandas also automatically registers formatters and locators that recognize date
indices, thereby extending date and time support to practically all plot types
available in matplotlib. Although this formatting does not provide the same
Member Author


"extending date and time support to practically all plot types available in matplotlib" -> @agijsberts but matplotlib also has a datetime converter by default?
https://github.com/matplotlib/matplotlib/blob/v1.4.2/lib/matplotlib/dates.py#L1380 Do we just overwrite it with ours?

@agijsberts
Contributor

@jorisvandenbossche Re. your points:

  • df.plot() did not benefit from the PR as it uses (at least back then) a different converter and formatter. See Speed up DatetimeConverter for plotting #6636 (comment) for details.
  • You're right, plt.plot ends up looking for a datetime64 converter, which does not exist. The easiest workaround to make things work as they were is to register pandas' DatetimeConverter for datetime64:
    from numpy import datetime64
    import pandas as pd
    import matplotlib.units as units
    import matplotlib.pyplot as plt

    df = pd.DataFrame(range(100), index=pd.date_range('20130101', periods=100))
    # Register pandas' converter for datetime64 in matplotlib's units registry
    units.registry[datetime64] = pd.tseries.converter.DatetimeConverter()
    # The index is now picked up as dates again when plotting directly
    plt.plot(df.index, df)
    plt.show()
  • The DateConverter in matplotlib is limited to datetime.datetime and datetime.date. Of course you could use it if you first convert the index with to_pydatetime(), but it is wasteful to convert datetime64 to datetime and then again to float. An example with a DatetimeIndex of length 100000:
In [22]: %timeit DatetimeConverter.convert(df.index, None, None)
10000 loops, best of 3: 74.4 us per loop

In [23]: %timeit DateConverter.convert(df.index.to_pydatetime(), None, None)
100 loops, best of 3: 9.35 ms per loop
  • When not importing pandas, I believe there are two options to plot datetime64 on a time axis:
    1. convert to datetime.datetime and use matplotlib's DateConverter (slow, see above)
    2. manually convert datetime64 to matplotlib's time representation with epoch2num(dt.asi8 / 1.0E9). Of course then you are still responsible for installing the date/time formatters. This conversion is however very fast and exactly what the PR implemented.
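
The two routes can be compared with a small sketch written with NumPy only. The helper names are hypothetical, the target representation (day ordinals, with 1970-01-01 as the offset) is an assumption about what the converters produced at the time, and nothing here calls matplotlib's own date2num or epoch2num:

```python
from datetime import datetime

import numpy as np

SECONDS_PER_DAY = 86400.0
# Ordinal of 1970-01-01: the offset between Unix time and day ordinals.
EPOCH_OFFSET = float(datetime(1970, 1, 1).toordinal())


def loop_to_daynum(dates):
    """Convert datetime objects one at a time (the slow route)."""
    out = []
    for d in dates:
        seconds = d.hour * 3600 + d.minute * 60 + d.second + d.microsecond / 1e6
        out.append(d.toordinal() + seconds / SECONDS_PER_DAY)
    return np.array(out)


def vectorized_to_daynum(values):
    """Convert a datetime64 array in one array expression (the fast route)."""
    # Integer nanoseconds since the Unix epoch, analogous to the asi8 values.
    ns = values.astype("datetime64[ns]").astype("int64")
    return EPOCH_OFFSET + ns / 1e9 / SECONDS_PER_DAY


# Hourly timestamps over four days, as a datetime64 array.
stamps = np.arange("2013-01-01", "2013-01-05", dtype="datetime64[h]")
# The same timestamps as datetime.datetime objects.
as_objects = stamps.astype("datetime64[us]").astype(object)

slow = loop_to_daynum(as_objects)
fast = vectorized_to_daynum(stamps)
print(np.allclose(slow, fast))
```

Both routes give the same floats; the second avoids creating one Python object per timestamp, which is where the reported difference in timing would come from.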

@jorisvandenbossche
Member Author

So our DatetimeConverter already works for datetime64? So we should just register it, and the initial problem is solved! (#8614)

Simply doing units.registry[np.datetime64] = DatetimeConverter() solves the issue
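
Why a single registration is enough can be illustrated with a toy model of a type-keyed registry. Everything in this sketch is hypothetical (ToyRegistry, FakeDatetime64 and the string "converters" are stand-ins), and the lookup is a simplification, not matplotlib's actual implementation:

```python
from datetime import date, datetime


class ToyRegistry(dict):
    """Hypothetical, simplified stand-in for a type-keyed converter registry."""

    def get_converter(self, x):
        # Look at the first element of a sequence, otherwise at x itself.
        sample = x[0] if isinstance(x, (list, tuple)) and x else x
        # Walk the class hierarchy so subclasses match a registered base class.
        for cls in type(sample).__mro__:
            if cls in self:
                return self[cls]
        return None


class FakeDatetime64:
    """Hypothetical placeholder for a type that has no registered converter."""


registry = ToyRegistry({date: "date converter"})

# datetime is a subclass of date, so it finds the registered entry.
print(registry.get_converter([datetime(2013, 1, 1)]))
# No entry exists for the placeholder type yet, so the lookup yields None.
print(registry.get_converter([FakeDatetime64()]))

# Registering one entry for the missing type makes the lookup succeed.
registry[FakeDatetime64] = "datetime64 converter"
print(registry.get_converter([FakeDatetime64()]))
```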

@agijsberts Thanks a lot for shedding your light on this!

@agijsberts
Contributor

@jorisvandenbossche I'm glad to help. And yes, DatetimeConverter should work for datetime64; it actually exploits the fact that DatetimeIndex is stored as datetime64[ns].

@jorisvandenbossche
Member Author

OK, I will then close this PR, as it is totally the wrong approach :-)
and open a new one to register the converter for datetime64. Or would there be people who rely on the fact that datetime64 arrays are regarded as ints in matplotlib? Let's discuss further in #8655.

@jreback modified the milestones: 0.15.2, 0.15.1 on Oct 30, 2014

Successfully merging this pull request may close these issues.

Plotting of DatetimeIndex directly with matplotlib no longer gives datetime formatted axis (0.15)