
str: utf8 encoder allows encoding invalid code points #6943


Closed
thestinger opened this issue Jun 5, 2013 · 3 comments
Labels
A-Unicode Area: Unicode

Comments

@thestinger
Contributor

The highest valid code point is 1114111 (0x10FFFF), and the modern UTF-8
standard guarantees that no code point needs more than 4 bytes to encode
(the legacy standard allowed up to 6).
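To make the 4-byte versus 6-byte limit concrete, here is a minimal Rust sketch of the encoded length per code-point range under each standard. The names `utf8_len_rfc3629` and `utf8_len_rfc2279` are hypothetical helpers for illustration only. They are not part of Rust's standard library or of the fix for this issue. The RFC 3629 version also rejects the surrogate block U+D800..U+DFFF, which RFC 3629 forbids encoding as well.

```rust
/// Bytes needed under RFC 3629 (current UTF-8): at most 4.
/// Returns None for values that are not Unicode scalar values:
/// anything above U+10FFFF, or a UTF-16 surrogate (U+D800..=U+DFFF).
fn utf8_len_rfc3629(cp: u32) -> Option<usize> {
    match cp {
        0x0000..=0x007F => Some(1),
        0x0080..=0x07FF => Some(2),
        0x0800..=0xD7FF => Some(3),
        0xD800..=0xDFFF => None, // surrogates are not encodable
        0xE000..=0xFFFF => Some(3),
        0x1_0000..=0x10_FFFF => Some(4),
        _ => None, // above U+10FFFF: outside the RFC 3629 range
    }
}

/// Bytes needed under the legacy RFC 2279 scheme, which allowed
/// 31-bit values up to 0x7FFF_FFFF and sequences of up to 6 bytes.
fn utf8_len_rfc2279(cp: u32) -> Option<usize> {
    match cp {
        0x0000..=0x007F => Some(1),
        0x0080..=0x07FF => Some(2),
        0x0800..=0xFFFF => Some(3),
        0x1_0000..=0x1F_FFFF => Some(4),
        0x20_0000..=0x3FF_FFFF => Some(5),
        0x400_0000..=0x7FFF_FFFF => Some(6),
        _ => None,
    }
}

fn main() {
    for cp in [0x41, 0xE9, 0x20AC, 0x10_FFFF, 0x11_0000, 0x7FFF_FFFF] {
        println!(
            "U+{:X}: RFC 3629 {:?}, RFC 2279 {:?}",
            cp,
            utf8_len_rfc3629(cp),
            utf8_len_rfc2279(cp)
        );
    }
}
```

Under these assumptions, U+110000 still gets a 4-byte length from the legacy table but `None` from the RFC 3629 one, which is exactly the kind of value an encoder following the modern standard should refuse.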

From https://tools.ietf.org/html/rfc3629:

Changes from RFC 2279

o Restricted the range of characters to 0000-10FFFF (the UTF-16
accessible range).
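A checked encoder that applies the RFC 3629 range restriction could look like the sketch below. `encode_utf8_checked` is a hypothetical name chosen for this example; it is not the code that fixed this issue. It rejects values above U+10FFFF and, per RFC 3629, surrogates, then emits the standard 1- to 4-byte forms.

```rust
/// Encode one code point as UTF-8 following RFC 3629.
/// Returns None for anything that is not a Unicode scalar value,
/// i.e. above U+10FFFF or inside the surrogate block U+D800..=U+DFFF.
fn encode_utf8_checked(cp: u32) -> Option<Vec<u8>> {
    if cp > 0x10_FFFF || (0xD800..=0xDFFF).contains(&cp) {
        return None; // the range restriction introduced by RFC 3629
    }
    let bytes = if cp < 0x80 {
        // 0xxxxxxx
        vec![cp as u8]
    } else if cp < 0x800 {
        // 110xxxxx 10xxxxxx
        vec![0xC0 | (cp >> 6) as u8, 0x80 | (cp & 0x3F) as u8]
    } else if cp < 0x1_0000 {
        // 1110xxxx 10xxxxxx 10xxxxxx
        vec![
            0xE0 | (cp >> 12) as u8,
            0x80 | ((cp >> 6) & 0x3F) as u8,
            0x80 | (cp & 0x3F) as u8,
        ]
    } else {
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (never more than 4 bytes)
        vec![
            0xF0 | (cp >> 18) as u8,
            0x80 | ((cp >> 12) & 0x3F) as u8,
            0x80 | ((cp >> 6) & 0x3F) as u8,
            0x80 | (cp & 0x3F) as u8,
        ]
    };
    Some(bytes)
}

fn main() {
    // U+10FFFF is the last valid code point and still fits in 4 bytes.
    println!("{:02X?}", encode_utf8_checked(0x10_FFFF));
    // One past the limit is rejected instead of being encoded.
    println!("{:?}", encode_utf8_checked(0x11_0000));
}
```

In current Rust the same validation is done by `char::from_u32`, which returns `None` for non-scalar values, and `char::encode_utf8` writes the bytes into a caller-provided buffer; the hand-written version above only spells out the bit layout.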

@thestinger
Contributor Author

Nominating this for the backwards-compatible milestone. A simple attempted fix breaks a test in the json module, so at least some existing code depends on this incorrect behaviour.

#5151 (the failed pull request) has more details.

@bluss
Member

bluss commented Jun 24, 2013

See also issue #3787

@thestinger
Contributor Author

Fixed.

flip1995 pushed a commit to flip1995/rust that referenced this issue Mar 25, 2021
Ignore str::len() in or_fun_call lint.

changelog: Changed `or_fun_call` to ignore `str::len`, in the same way it ignores `slice::len` and `array::len`

Closes rust-lang#6943
2 participants