Feature request: KinesisDataStreamEnvelope().parse() method seems to be missing data decompression step #6625
Comments
Thanks for opening your first issue here! We'll come back to you as soon as we can.
Hi @Artur-T-Malas, thanks for opening this issue and reporting this! While I recognize that ingesting CloudWatch logs is one possible use case for Kinesis, it is not the only one. If we assume that we always need to base64 decode and gzip decompress before returning the value, we break every use case where customers send plain text records through Kinesis rather than CloudWatch logs. That said, I agree that we should solve this problem and improve the experience, but we need a solution that handles both cases: gzipped and plain text. If you have a solution, I'm more than happy to accept a PR; if not, I'll work on this next week. Thanks
Hi @leandrodamascena. I've created a PR #6656. To be honest, it's my first PR to a project this big, so I'd greatly appreciate tips on what could be better. I hope that my solution with wrapping it in a
Expected Behaviour
When provided with an event and a Pydantic model, the envelope's parse() method should correctly parse the data, including decompressing it (as far as I am aware, it is compressed by Kinesis itself).
Current Behaviour
My Lambda function receives CloudWatch Logs via a Kinesis Data Stream. When using the event parser with the KinesisDataStreamEnvelope, either as an annotation above lambda_handler or by explicitly calling the envelope's parse method, I keep getting a UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte.
After debugging, it appears that a decompression step is missing in the envelope's parse method between casting to bytes and decoding using 'utf-8'.
Code snippet
Possible Solution
When manually parsing Kinesis Data Stream input in a Lambda function and locally (using a saved event), the following code works:
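A minimal sketch of such manual parsing (not the snippet from the original report). It assumes the usual Kinesis event shape, where each record carries base64-encoded data under Records[].kinesis.data, and a gzip-compressed JSON payload such as the one CloudWatch Logs subscriptions produce. The name extract_payloads is hypothetical.

```python
import base64
import gzip
import json


def extract_payloads(event: dict) -> list[dict]:
    """Decode every Kinesis record in a Lambda event into a dict.

    Assumption: every record's data is base64-encoded, gzip-compressed JSON.
    """
    payloads = []
    for record in event["Records"]:
        # Kinesis delivers the record data base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        # This is the step the envelope appears to skip: gzip decompression.
        decompressed = gzip.decompress(raw)
        payloads.append(json.loads(decompressed.decode("utf-8")))
    return payloads
```

Each returned dict can then be validated with a Pydantic model separately, outside the envelope.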
Fixing the envelope
To fix the envelope itself, importing gzip and adding the line data = gzip.decompress(data) in the utilities/parser/envelopes/kinesis.py file, between casting to bytes and decoding using 'utf-8', fixes the issue.
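A minimal sketch of that step as a standalone helper, not the actual envelope code. It assumes the data has already been cast to bytes, and it decompresses only when the bytes start with the gzip magic number 0x1f 0x8b (consistent with the 0x8b at position 1 in the error message), so plain-text records pass through unchanged. The magic-number check is a swapped-in technique for illustration, and decode_record_data is a hypothetical name.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_record_data(data: bytes) -> str:
    """Return the record payload as text, decompressing it first if gzipped."""
    if data[:2] == GZIP_MAGIC:
        # Compressed payload (for example CloudWatch Logs): decompress first.
        data = gzip.decompress(data)
    # Plain-text payloads reach this line untouched.
    return data.decode("utf-8")
```

Checking the header avoids relying on an exception for control flow, although a binary payload that happens to begin with the same two bytes would still fail in gzip.decompress.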
Since this could be an edge case, it should also be fine to wrap the models.append(self._parse(data=data.decode('utf-8')... line in a try/except clause that catches the UnicodeDecodeError exception and then performs the decompression before trying again to decode using 'utf-8'.
Steps to Reproduce
parse method
Thank you
Powertools for AWS Lambda (Python) version
3.9.0
AWS Lambda function runtime
3.12
Packaging format used
PyPi
Debugging logs