Support reading random rows in `read_csv` #14285

seven7e · 2016-09-23T06:45:37Z

It is very common to need random rows from a large CSV file, typically to test on a small dataset or to stay within memory limits. The `nrows` parameter reads the first n lines, but I couldn't find any feature for reading random lines. Such a parameter might be named `keeprows` (the opposite of `skiprows`) and would support:

- int, e.g. `keeprows=100` means keep 100 lines chosen uniformly at random
- float in (0, 1), e.g. `keeprows=0.05` means keep 5% of all lines
- list of int (or any iterable), e.g. `keeprows=[1, 3, 8]` means keep lines 1, 3, and 8
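pandas has no `keeprows` parameter, but the proposed semantics can be roughly emulated with a predicate passed as `skiprows`, assuming a pandas version in which `read_csv` accepts a callable for `skiprows` (one that receives 0-indexed file line numbers and returns True to skip). The sketch below is a minimal illustration; `make_row_skipper` is a hypothetical helper, not part of pandas. It assumes list positions are 0-based data rows (the header line is always kept), and that the float case may keep each row independently, so only about `p * n` rows survive rather than exactly that many.

```python
import random
from typing import Callable, Iterable, Optional, Union


def make_row_skipper(
    keeprows: Union[int, float, Iterable[int]],
    n_rows: Optional[int] = None,
    header: bool = True,
    seed: Optional[int] = None,
) -> Callable[[int], bool]:
    """Build a hypothetical ``skiprows`` predicate mimicking the proposed ``keeprows``.

    * int k    -> keep k data rows chosen uniformly without replacement
                  (requires n_rows, the number of data rows in the file)
    * float p  -> keep each data row independently with probability p
    * iterable -> keep exactly these 0-based data-row positions
    """
    rng = random.Random(seed)
    offset = 1 if header else 0  # file lines before the first data row

    if isinstance(keeprows, bool):
        raise TypeError("keeprows must be an int, a float, or an iterable of ints")

    if isinstance(keeprows, int):
        if n_rows is None:
            raise ValueError("an int keeprows needs n_rows to sample uniformly")
        chosen = set(rng.sample(range(n_rows), keeprows))

        def is_kept(data_index: int) -> bool:
            return data_index in chosen

    elif isinstance(keeprows, float):
        if not 0.0 < keeprows < 1.0:
            raise ValueError("a float keeprows must lie in (0, 1)")

        def is_kept(data_index: int) -> bool:
            # Independent coin flip per row; assumes the predicate is called once per line.
            return rng.random() < keeprows

    else:
        chosen = set(keeprows)

        def is_kept(data_index: int) -> bool:
            return data_index in chosen

    def skip(line_index: int) -> bool:
        # skiprows callables receive file line numbers; never skip the header.
        if line_index < offset:
            return False
        return not is_kept(line_index - offset)

    return skip
```

Usage would look like `pd.read_csv("big.csv", skiprows=make_row_skipper(0.05, seed=42))`, where `big.csv` is a placeholder path. The int case needs the data-row count up front, for example by counting lines in a first pass; that count is only an approximation when quoted fields contain embedded newlines, which is one reason a built-in option would be cleaner than this workaround.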


jreback · 2016-09-23T12:02:00Z

duplicate of #3132

yes, this is a good idea.

jreback added the Duplicate Report and IO CSV labels Sep 23, 2016

jreback added this to the No action milestone Sep 23, 2016

jreback closed this as completed Sep 23, 2016

jreback mentioned this issue Sep 23, 2016 in ENH: add an option to read_csv to select a specific rows / sample of the csv file #3132 (Closed)
