-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Fixes timezone drop when encoding it in msgpack #25195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @cgangwar11! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on February 09, 2019 at 18:27 Hours UTC |
Codecov Report
@@ Coverage Diff @@
## master #25195 +/- ##
===========================================
- Coverage 92.35% 42.84% -49.51%
===========================================
Files 166 166
Lines 52314 52315 +1
===========================================
- Hits 48312 22414 -25898
- Misses 4002 29901 +25899
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #25195 +/- ##
=========================================
Coverage ? 92.16%
=========================================
Files ? 166
Lines ? 52314
Branches ? 0
=========================================
Hits ? 48214
Misses ? 4100
Partials ? 0
Continue to review full report at Codecov.
|
doc/source/whatsnew/v0.24.2.rst
Outdated
@@ -52,7 +52,7 @@ Bug Fixes | |||
**I/O** | |||
|
|||
- Bug in reading a HDF5 table-format ``DataFrame`` created in Python 2, in Python 3 (:issue:`24925`) | |||
- | |||
- Bug in retaining timezone in DateTimeIndex when saving and loading the DataFrame from msgpack. (:issue:`25148`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double back ticks around msgpack. Single backticks around DatetimeIndex and DataFrame with :class:
before each
pandas/tests/io/test_packers.py
Outdated
@@ -393,6 +393,12 @@ def categorical_index(self): | |||
result = self.encode_decode(df) | |||
tm.assert_frame_equal(result, df) | |||
|
|||
def test_timezone_encode_decode(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you parameterize using the tz_aware_fixture
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I suspect the fix is too narrowly focused on FixedOffets.
pandas/io/packers.py
Outdated
@@ -390,7 +390,8 @@ def encode(obj): | |||
|
|||
# store tz info and data as UTC | |||
if tz is not None: | |||
tz = u(tz.zone) | |||
tz = u(tz.zone) if not isinstance(tz, _FixedOffset) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the timezone is not a FixedOffset? e.g. pytz.UTC
or dateutil.tz.tzutc
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke
Encoding process of TimeStamp is using zone attribute and normalize the datetime to utc.
Now there is no zone attribute in FixedOffset hence we will not be able to encode it and loosing the offset value in the process.
So i extracted offset value and encode that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>>> Timestamp('2018-12-02 14:50:00',tz='UTC').tzinfo.zone
'UTC'
>>> Timestamp('2018-12-02 14:50:00',tz=pytz.FixedOffset(-420)).tzinfo.zone is None
True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks for the explanation.
I'd change this to a try
/except
block instead of an if
/else
.
Another option would be to str
the timezone otherwise?
In [3]: str(tz)
Out[3]: 'pytz.FixedOffset(60)'
In [4]: import dateutil
In [5]: str(dateutil.tz.tzutc())
Out[5]: 'tzutc()'
In [7]: dateutil.tz.tzutc().zone
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-7-d8163524e913> in <module>()
----> 1 dateutil.tz.tzutc().zone
AttributeError: 'tzutc' object has no attribute 'zone'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke
Converting the timezone object into string to encode is a great idea.
But I will have to use something like this in decoding side.
tz = eval('pytz.FixedOffset(60)' )
Using eval doesn't seem like a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I'd prefer your minutes extraction method then.
I imagine there will need to be separate handling for dateutil timezones all together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there must be something going else wrong here. The data is stored as utc and the tz is stored as well. what exactly is the problem?
doc/source/whatsnew/v0.24.2.rst
Outdated
@@ -52,7 +52,7 @@ Bug Fixes | |||
**I/O** | |||
|
|||
- Bug in reading a HDF5 table-format ``DataFrame`` created in Python 2, in Python 3 (:issue:`24925`) | |||
- | |||
- Bug in retaining timezone in :class:`DateTimeIndex` when saving and loading the :class"`DataFrame` from ``msgpack``. (:issue:`25148`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DatetimeIndex, can you use :meth:read_msgpack
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback
Yes we were encoding tz using two value (zone as a str) and (Time value in utc format).
As per pandas DateTimeIndex documentation tz could be
tz : pytz.timezone or dateutil.tz.tzfile
The current encoding version is capable of encoding pytz.timezone but not the dateutil timezones.
Infact code will throw this exception when we try that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>>> df1 = pd.DataFrame({'A':pd.date_range(start="2018-08-01",end="2018-08-30",tz=dateutil.tz.tzutc())})
>>> df1.to_msgpack("~/xyz.msg")
>>> a = pd.read_msgpack("~/xyz.msg")
Traceback (most recent call last):
File "/home/chandan/Softwares/pandas/pandas/core/dtypes/common.py", line 2037, in pandas_dtype
npdtype = np.dtype(dtype)
TypeError: Invalid datetime unit in metadata string "[ns, tzutc()]"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/chandan/Softwares/pandas/pandas/io/packers.py", line 207, in read_msgpack
return read(fh)
File "/home/chandan/Softwares/pandas/pandas/io/packers.py", line 187, in read
unpacked_obj = list(unpack(fh, encoding=encoding, **kwargs))
File "pandas/io/msgpack/_unpacker.pyx", line 483, in pandas.io.msgpack._unpacker.Unpacker.__next__
return self._unpack(unpack_construct, None, 1)
File "pandas/io/msgpack/_unpacker.pyx", line 410, in pandas.io.msgpack._unpacker.Unpacker._unpack
ret = execute(&self.ctx, self.buf, self.buf_tail, &self.buf_head)
File "/home/chandan/Softwares/pandas/pandas/io/packers.py", line 667, in decode
blocks = [create_block(b) for b in obj[u'blocks']]
File "/home/chandan/Softwares/pandas/pandas/io/packers.py", line 667, in <listcomp>
blocks = [create_block(b) for b in obj[u'blocks']]
File "/home/chandan/Softwares/pandas/pandas/io/packers.py", line 648, in create_block
b[u'compress']), b[u'shape'])
File "/home/chandan/Softwares/pandas/pandas/io/packers.py", line 322, in unconvert
dtype = pandas_dtype(dtype).base
File "/home/chandan/Softwares/pandas/pandas/core/dtypes/common.py", line 2043, in pandas_dtype
dtype))
TypeError: data type 'datetime64[ns, tzutc()]' not understood
an you merge master and update |
good idea, but closing as stale |
git diff upstream/master -u -- "*.py" | flake8 --diff
After