Support reading random rows in read_csv #14285

Closed
seven7e opened this issue Sep 23, 2016 · 1 comment
Labels
Duplicate Report (duplicate issue or pull request), IO CSV (read_csv, to_csv)

Comments

@seven7e

seven7e commented Sep 23, 2016

It is very common to need random rows from a large CSV file, typically to test with a small dataset or to fit within memory limits. The nrows parameter reads the first n lines, but I couldn't find any feature for reading random lines. Such a parameter might be named keeprows (the opposite of skiprows) and would support:

  • int, e.g. keeprows=100 means keep 100 random lines (sampled uniformly)
  • float in (0, 1), e.g. keeprows=0.05 means keep 5% of the total lines
  • list of int (or any iterable), e.g. keeprows=[1, 3, 8] means keep lines 1, 3, and 8
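
For illustration, the float and list cases can be approximated today without a new parameter, because recent pandas versions accept a callable for skiprows. The sketch below is only a workaround under stated assumptions, not the proposed keeprows feature: the helper names read_csv_sample and read_csv_keep and the demo data are hypothetical, the file is assumed to have a single header line, and "line 1" is taken to mean the first data row. The int case (an exact count) is not covered, since it needs the total number of lines up front.

```python
import io
import random

import pandas as pd

# Made-up demo data: one header line followed by four data rows.
DEMO_CSV = "a,b\n1,2\n3,4\n5,6\n7,8\n"


def read_csv_sample(source, fraction, seed=None, **kwargs):
    """Read roughly `fraction` of the data rows, chosen at random (hypothetical helper)."""
    rng = random.Random(seed)
    # Row index 0 is the header line and is always kept; every other row is
    # skipped with probability 1 - fraction, so the result size is approximate.
    return pd.read_csv(
        source,
        skiprows=lambda i: i > 0 and rng.random() >= fraction,
        **kwargs,
    )


def read_csv_keep(source, keep, **kwargs):
    """Read only the data rows whose 1-based positions are in `keep` (hypothetical helper)."""
    keep = set(keep)
    # Skip every non-header row that was not explicitly requested.
    return pd.read_csv(
        source,
        skiprows=lambda i: i > 0 and i not in keep,
        **kwargs,
    )


# Keep the first and third data rows of the demo data.
picked = read_csv_keep(io.StringIO(DEMO_CSV), [1, 3])

# Keep about half of the data rows; the exact rows depend on the seed.
sampled = read_csv_sample(io.StringIO(DEMO_CSV), 0.5, seed=0)
```

Because each row is decided independently, read_csv_sample never holds the skipped rows in memory, but it returns only approximately fraction of the file rather than an exact number of rows.
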
@jreback
Contributor

jreback commented Sep 23, 2016

duplicate of #3132

yes, this is a good idea.

jreback added the Duplicate Report and IO CSV labels Sep 23, 2016
jreback added this to the No action milestone Sep 23, 2016
jreback closed this as completed Sep 23, 2016