NumPy defines an `__array__` protocol that allows developers to implement classes that can be converted to an array by calling `np.asarray()`. That makes it easy to have a common interface between libraries, and it is heavily used by pandas and scikit-learn.
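For reference, a minimal class that opts in to the `__array__` protocol looks roughly like this (the class and attribute names are purely illustrative):

```python
import numpy as np

class MyColumn:
    """Toy container whose data can be extracted as a NumPy array."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None):
        # np.asarray() / np.array() look for this method and use its
        # return value as the array representation of the object.
        return np.asarray(self._values, dtype=dtype)

np.asarray(MyColumn([1, 2, 3]))  # -> array([1, 2, 3])
```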
It would be great to have a similar protocol for converting something to a pandas `DataFrame`. The goal would be to allow users to pass other data structures to libraries that expect a dataframe (say, seaborn), as long as those data structures allow conversion to `pd.DataFrame`.
A workaround is for the developer of the new data structure to provide an `.asframe` method, but that creates friction and requires users to know what data type a particular library or function expects. If instead the developer of the data structure can declare that conversion to a dataframe is possible, the library author (say, seaborn) can request the conversion in a unified manner, as in the sketch below.
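To make the idea concrete, here is a hedged sketch of what such a protocol could look like from both the producer and the consumer side. The hook name `__dataframe__` is used only as an illustration; this issue does not pin down a name or signature.

```python
import pandas as pd

class MyTable:
    """Hypothetical data structure that declares it can become a DataFrame."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __dataframe__(self):            # illustrative protocol name
        return pd.DataFrame(self._columns)


def plot(data, x, y):
    """Sketch of a consuming library (think seaborn): it asks for a DataFrame
    in a unified way instead of requiring a library-specific .asframe call."""
    if hasattr(data, "__dataframe__"):
        data = data.__dataframe__()
    df = pd.DataFrame(data)             # plain dicts/ndarrays/DataFrames still work
    return df[[x, y]]

plot(MyTable({"a": [1, 2], "b": [3, 4]}), x="a", y="b")
```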
The implementation of this is probably pretty simple, as it requires "only" a special case in `pd.DataFrame.__init__`. The main work is probably in adding it to the developer documentation and publicizing it correctly. cc @jorisvandenbossche
I'd be happy to have a `hasattr` check on the `data` argument in `DataFrame.__init__` (and probably `Series.__init__`) that simply forwards through all the relevant arguments to give control back to the library.
Our constructors are already quite complex, so I might be missing something, but I'm hopeful that this is doable.
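A rough sketch of the constructor-side hook described above; this is not actual pandas source, and the `__dataframe__` name and helper are again just illustrative:

```python
def _maybe_convert_via_protocol(data):
    """Hypothetical helper for the hasattr check on the `data` argument of
    DataFrame.__init__ (and Series.__init__); names are illustrative only."""
    if hasattr(data, "__dataframe__"):   # the object declares it is convertible
        return data.__dataframe__()      # control goes back to the library
    return data

# Inside the constructor the idea would be roughly:
#     data = _maybe_convert_via_protocol(data)
# with the relevant arguments (index, columns, dtype, ...) either applied
# afterwards by the existing construction logic or forwarded to the hook itself.
```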
Sorry, I didn't see it. However, I think it is useful to have the discussion in a wider setting than pandas, though I am afraid that this discussion is going off track toward complexity (out-of-core computation, ...) rather than staying focused on the 80/20 trade-offs.
This is inspired by the discussion in scikit-learn/enhancement_proposals#25.