Feature request: Protocol for converting something to a pandas DataFrame #30218

Open
amueller opened this issue Dec 11, 2019 · 4 comments
Labels
API Design, Constructors (Series/DataFrame/Index/pd.array Constructors), Enhancement

Comments

@amueller

This is inspired by the discussion in scikit-learn/enhancement_proposals#25.

NumPy defines an __array__ protocol that lets developers implement classes that can be converted to an array by calling np.asarray(). That makes it easy to have a common interface between libraries, and it's heavily used by pandas and sklearn.
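
For readers who have not used the protocol, here is a minimal sketch of what it looks like in practice. The Temperatures class is a made-up container for illustration only; the one real piece is the __array__ hook that np.asarray() looks for.

```python
import numpy as np


class Temperatures:
    """Toy container (hypothetical) that is not an ndarray but can become one."""

    def __init__(self, readings):
        self._readings = list(readings)

    def __array__(self, dtype=None, copy=None):
        # np.asarray() calls this hook when it receives a Temperatures object.
        # The copy parameter is accepted for newer NumPy versions; this sketch
        # always builds a fresh array from the stored list.
        return np.array(self._readings, dtype=dtype)


temps = Temperatures([20.5, 21.0, 19.8])
arr = np.asarray(temps)  # conversion goes through Temperatures.__array__
print(type(arr).__name__, arr.shape)
```

Any function that starts with np.asarray(x) accepts Temperatures without knowing the class exists, which is the property this issue asks for on the DataFrame side.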

It would be great to have a similar protocol for converting something to a pandas DataFrame. The goal would be to allow users to pass other data structures to libraries that expect a dataframe, say seaborn, as long as the data structures allow conversion to pd.DataFrame.

A workaround is for the developer of the new data structure to provide an .asframe method, but that creates friction and requires users to know what data type a particular library or function expects. If instead the developer of the data structure can declare that conversion to a dataframe is possible, the library author (say seaborn) can request conversion to a dataframe in a unified manner.
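
A rough sketch of what "unified" could mean from the library author's side. Everything named here is an assumption made for illustration: the hook name __pandas_dataframe__, the as_frame helper, and the Measurements container do not exist in pandas or in this proposal's text.

```python
import pandas as pd


class Measurements:
    """Hypothetical third-party container that declares it can become a DataFrame."""

    def __init__(self, names, values):
        self.names = names
        self.values = values

    def __pandas_dataframe__(self):
        # Hypothetical hook name, invented for this sketch.
        return pd.DataFrame({"name": self.names, "value": self.values})


def as_frame(obj):
    """Hypothetical helper: convert obj to a DataFrame in one uniform way."""
    if isinstance(obj, pd.DataFrame):
        return obj
    hook = getattr(obj, "__pandas_dataframe__", None)
    if hook is not None:
        return hook()
    # Fall back to the regular constructor for dicts, arrays, and so on.
    return pd.DataFrame(obj)


def column_means(data):
    """Stand-in for a library function (plotting, modelling) that expects a DataFrame."""
    frame = as_frame(data)
    return frame.mean(numeric_only=True)


means = column_means(Measurements(["a", "b"], [1.0, 3.0]))
```

The caller never has to know that column_means wants a DataFrame, and the library never has to know about Measurements.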

The implementation of this is probably pretty simple as it requires "only" a special case in pd.DataFrame.__init__. The main work is probably in adding it to developer documentation and publicizing it correctly.

cc @jorisvandenbossche

@TomAugspurger
Contributor

Sorry this was ignored @amueller.

I'd be happy to have a hasattr check on the data argument in DataFrame.__init__ (and probably Series.__init__) that simply forwards all the relevant arguments to give control back to the library.

Our constructors are already quite complex so I might be missing something, but I'm hopeful that this is doable.
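
To make the idea concrete, here is a sketch of such a check written as a standalone function rather than a patch to the real constructor. The hook name __pandas_dataframe__, the make_frame function, and the LabeledMatrix class are all hypothetical; only the pd.DataFrame arguments being forwarded (index, columns, dtype, copy) are real.

```python
import pandas as pd


class LabeledMatrix:
    """Hypothetical container holding rows plus its own column names."""

    def __init__(self, rows, column_names):
        self.rows = rows
        self.column_names = column_names

    def __pandas_dataframe__(self, index=None, columns=None, dtype=None, copy=None):
        # The container decides how to honour the forwarded arguments.
        if columns is None:
            columns = self.column_names
        return pd.DataFrame(
            self.rows, index=index, columns=columns, dtype=dtype, copy=copy
        )


def make_frame(data=None, index=None, columns=None, dtype=None, copy=None):
    """Stand-in for DataFrame.__init__ with the proposed special case."""
    hook = getattr(data, "__pandas_dataframe__", None)  # the hasattr-style check
    if hook is not None:
        # Forward every relevant argument and let the library take over.
        return hook(index=index, columns=columns, dtype=dtype, copy=copy)
    return pd.DataFrame(data, index=index, columns=columns, dtype=dtype, copy=copy)


matrix = LabeledMatrix([[1, 2], [3, 4]], ["x", "y"])
frame = make_frame(matrix, index=["r1", "r2"], dtype="float64")
```

Forwarding the arguments instead of applying them afterwards means the container can, for example, avoid materialising data it would only have to recast.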

@TomAugspurger added the API Design and Constructors (Series/DataFrame/Index/pd.array Constructors) labels on Mar 10, 2020
@amueller
Author

No worries. Not sure @GaelVaroquaux saw it either before opening the discussion.

There are probably subtleties in the implementation, but I think the main question is whether this is something the ecosystem wants.

@GaelVaroquaux
Contributor

Sorry, I didn't see it. However, I think it's useful to have the discussion in a wider setting than pandas. That said, I am afraid the discussion is drifting toward complexity (out-of-core computation...) rather than focusing on the 80/20 tradeoffs.

@thomasjpfan
Contributor

Linking this issue to PR: #46141 which introduces the dataframe exchange protocol to pandas.
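
For context, a small sketch of how that exchange protocol can be used from the consumer side. This assumes a pandas release that ships pandas.api.interchange.from_dataframe and DataFrame.__dataframe__ (neither name appears in this thread); the FrameLike wrapper is hypothetical and simply delegates to pandas so the example stays self-contained.

```python
import pandas as pd
from pandas.api.interchange import from_dataframe


class FrameLike:
    """Hypothetical container exposing the interchange hook by delegating to pandas."""

    def __init__(self, columns):
        self._frame = pd.DataFrame(columns)

    def __dataframe__(self, nan_as_null=False, allow_copy=True):
        # A real library would return its own interchange object here;
        # this sketch reuses the one pandas already provides.
        return self._frame.__dataframe__(
            nan_as_null=nan_as_null, allow_copy=allow_copy
        )


wrapped = FrameLike({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})
result = from_dataframe(wrapped)  # consumer side: one call, any producer
print(type(result).__name__, list(result.columns))
```

Note that this goes through a separate function rather than the special case in DataFrame.__init__ discussed above.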


5 participants