Skip to content

ENH: simple patch for read_json compression #16750

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

colinhiggins
Copy link

@colinhiggins colinhiggins commented Jun 21, 2017

Addresses GH15644

The latest comment on #15644 from March suggest just adding compression to read_json. This does that.

Not sure what kind of tests should be added since its using pre-existing code. It works when I tested it manually, and test_fast.sh returned with:

9880 passed, 1280 skipped, 6 xfailed in 145.52 seconds

This is my first contribution, and I'm a bit confused about the whatsnew entry. I'm not sure which version the changes should be recorded in.

In that vein,git diff upstream/master --name-only -- '*.py' | flake8 --diff fails with:

fatal: bad revision 'upstream/master'

@TomAugspurger
Copy link
Contributor

I'm not sure which version the changes should be recorded in.

New feature, so it can go in whatsnew/v0.21.0.txt

In that vein,git diff upstream/master --name-only -- '*.py' | flake8 --diff fails with:

What's the output of git remote -v? That should include a couple lines like

upstream	https://github.com/pandas-dev/pandas (fetch)
upstream	https://github.com/pandas-dev/pandas (push)

but it may be named differently.

@colinhiggins
Copy link
Author

@TomAugspurger:

What's the output of git remote -v? That should include a couple lines like

$ git remote -v
origin	[email protected]:colinhiggins/pandas.git (fetch)
origin	[email protected]:colinhiggins/pandas.git (push)

I tried a few different forms, all failed with the same error, including:
https://github.com/pandas-dev/pandas
[email protected]:pandas-dev/pandas.git
git://github.com/pandas-dev/pandas

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jun 21, 2017 via email

@colinhiggins
Copy link
Author

colinhiggins commented Jun 21, 2017

Ah, right, duh. It ran with no output. Is that a pass?

@codecov
Copy link

codecov bot commented Jun 21, 2017

Codecov Report

Merging #16750 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #16750   +/-   ##
=======================================
  Coverage   90.93%   90.93%           
=======================================
  Files         161      161           
  Lines       49282    49282           
=======================================
  Hits        44816    44816           
  Misses       4466     4466
Flag Coverage Δ
#multiple 88.7% <ø> (ø) ⬆️
#single 40.17% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/io/json/json.py 100% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8a98f5e...12b6012. Read the comment docs.

@colinhiggins
Copy link
Author

Reverted back to the path_or_buf API to maintain backward compatibility, added the whatsnew blurb, and fixed an indentation PEP8 violation.

@TomAugspurger
Copy link
Contributor

Changes look good, but we'll need tests for the new behavior.

You can compress https://github.com/pandas-dev/pandas/blob/master/pandas/tests/io/json/data/tsframe_iso_v012.json and add it to the repo, and then ensure we have basic coverage on this. We should also have a jsonlines version too probably.

@TomAugspurger TomAugspurger added this to the 0.21.0 milestone Jun 22, 2017
@TomAugspurger TomAugspurger added IO Data IO issues that don't fit into a more specific label IO JSON read_json, to_json, json_normalize labels Jun 22, 2017
@jreback
Copy link
Contributor

jreback commented Jun 22, 2017

see #13317 for how to do tests on this.

This needs similar types of testing.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to add to .to_json as well. then tests become much easier.

@@ -24,6 +24,7 @@ New features
<https://www.python.org/dev/peps/pep-0519/>`_ on most readers and writers (:issue:`13823`)
- Added ``__fspath__`` method to :class:`~pandas.HDFStore`, :class:`~pandas.ExcelFile`,
and :class:`~pandas.ExcelWriter` to work properly with the file system path protocol (:issue:`13823`)
- The ``read_json`` method now supports a ``compression`` keyword, which allows you to read compressed json directly. The behavior of this is identical to the ``read_csv`` keyword and defaults to ``infer``. (:issue:`15644`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a new doc section in io.json (again see how the io.pickle compression section is done).

@@ -258,6 +258,13 @@ def read_json(path_or_buf=None, orient=None, typ='frame', dtype=True,

.. versionadded:: 0.19.0

compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default 'infer'
For on-the-fly decompression of on-disk data. If 'infer', then use gzip,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs version added. also needs to be indendted as this won't render I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see pd.read_pickle doc-string as well.

@colinhiggins
Copy link
Author

Got it. I can take a crack at the to_json and tests, but it will have to wait for the weekend.

compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default 'infer'
For on-the-fly decompression of on-disk data. If 'infer', then use gzip,
bz2, zip or xz if filepath_or_buffer is a string ending in '.gz', '.bz2',
'.zip', or 'xz', respectively, and no decompression otherwise. If using
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing a . Before xz

@jreback
Copy link
Contributor

jreback commented Jul 19, 2017

closing this. it needs comprehensive tests as indicated above. pls comment if you'd like to reopen.

@jreback jreback closed this Jul 19, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Data IO issues that don't fit into a more specific label IO JSON read_json, to_json, json_normalize
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: add gzip/bz2 compression to relevant read_* methods
4 participants