
ENH: add an option to read_csv to select a specific rows / sample of the csv file #3132

Closed
jreback opened this issue Mar 21, 2013 · 4 comments
Labels: Enhancement; Ideas (Long-Term Enhancement Discussions); IO CSV (read_csv, to_csv); IO Data (IO issues that don't fit into a more specific label)

Comments

@jreback
Contributor

jreback commented Mar 21, 2013

Propose an option, sample or keeprows (#14285),

maybe taking a callable, or really just something like sample=10

to return every 10th row.

If it takes a callable, we could use something like: lambda x: x % 10

This gives an easy way to get a sample of the CSV file.

It is also useful to avoid specifying skiprows with a big list,

and to solve an issue like this:

http://stackoverflow.com/questions/15555005/get-inferred-dataframe-types-iteratively-using-chunksize
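The proposed sample/keeprows option does not exist under that name. A minimal sketch of a random sample, assuming the callable form of skiprows that the last comment in this thread points to (#15059): read_csv_random_sample is a hypothetical helper, not part of pandas, and the small in-memory CSV stands in for a large file on disk.

```python
import io
import random

import pandas as pd

# Hypothetical demo data: a header line plus 1000 data rows, held in memory.
csv_text = "a,b\n" + "".join(f"{i},{i * 2}\n" for i in range(1000))


def read_csv_random_sample(source, frac, seed=None):
    """Return roughly `frac` of the data rows, chosen at random.

    Hypothetical helper, not a pandas API. It relies on read_csv's callable
    skiprows: the callable receives each line index (0 is the header line)
    and returns True when that line should be skipped.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return pd.read_csv(
        source,
        # Never skip the header (index 0); skip each data line with probability 1 - frac.
        skiprows=lambda i: i > 0 and rng.random() > frac,
    )


sample = read_csv_random_sample(io.StringIO(csv_text), frac=0.1, seed=0)
print(len(sample), sample.dtypes.to_dict())
```

For the dtype-inference question linked above, the sample's dtypes (sample.dtypes.to_dict()) could be passed as dtype= to a later chunked read_csv call, with the caveat that a sample may miss rows (for example, missing values in an integer column) that would change the inferred type.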

@jreback jreback changed the title "ENH: add an option to read_csv, sample, to select a sample of the csv file" to "ENH: add an option to read_csv to select a specific rows / sample of the csv file" Sep 23, 2016
@seven7e

seven7e commented Sep 24, 2016

Oh, this issue has been open for more than 3 years and is still open? Why?

@jorisvandenbossche
Member

> Oh, this issue has been open for more than 3 years and is still open? Why?

Because nobody has implemented it. Pull requests with code are certainly welcome.

@seven7e

seven7e commented Sep 24, 2016

I would like to give it a try, though it's labeled "Difficulty Intermediate" and I am new to pandas development.

@linebp
Contributor

linebp commented Jun 14, 2017

This seems to be already solved by #15059: skiprows now accepts a callable, which is called with each line number (0-indexed, the header included) and returns True if that row should be skipped and False if it should be kept.
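A short sketch of the skiprows callable described above, using hypothetical in-memory data: it reproduces the "every 10th row" case from the original proposal, once with an explicit list of line numbers (the big-list approach) and once with a callable.

```python
import io

import pandas as pd

# Hypothetical demo data: header "x" on line 0, then values 0..99 on lines 1..100.
csv_text = "x\n" + "".join(f"{i}\n" for i in range(100))

# Big-list approach: enumerate every line number to skip.
# Lines 10, 20, ..., 100 are kept; the header (line 0) is not in the list.
skip_list = [i for i in range(1, 101) if i % 10 != 0]
by_list = pd.read_csv(io.StringIO(csv_text), skiprows=skip_list)

# Callable approach: called with each line number, returns True to SKIP it.
# Line 0 (the header) and every 10th line after it return False and are kept.
by_callable = pd.read_csv(io.StringIO(csv_text), skiprows=lambda i: i % 10 != 0)

print(by_callable["x"].tolist())  # → [9, 19, 29, 39, 49, 59, 69, 79, 89, 99]
```

Because the callable returns True for rows to skip, a "keep every 10th row" rule has to be written as its negation, and the header counts as line 0.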

@jreback jreback closed this as completed Jun 14, 2017
4 participants