str: utf8 encoder allows encoding invalid code points #6943

thestinger · 2013-06-05T02:38:31Z

The highest valid code point is 1114111 (0x10FFFF), and the modern UTF-8
standard guarantees that encoding a code point needs at most 4 bytes
(instead of 6, as in the legacy standard).
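To make the constraint concrete, here is a minimal sketch of an encoder that follows RFC 3629. The function name `encode_utf8_checked` is hypothetical and does not come from the issue or from Rust's standard library. It rejects anything above U+10FFFF, and it also rejects the UTF-16 surrogate range U+D800..U+DFFF, which RFC 3629 forbids as well. As a result, it never emits more than 4 bytes.

```rust
/// Hypothetical helper (not a std API): encode a raw code point as UTF-8
/// following RFC 3629. Returns None for values above U+10FFFF and for
/// UTF-16 surrogates (U+D800..=U+DFFF), so the output is at most 4 bytes.
fn encode_utf8_checked(cp: u32) -> Option<Vec<u8>> {
    match cp {
        // Surrogates are not Unicode scalar values; this arm must come
        // before the 3-byte range that would otherwise contain them.
        0xD800..=0xDFFF => None,
        // 1 byte: 0xxxxxxx
        0x0000..=0x007F => Some(vec![cp as u8]),
        // 2 bytes: 110xxxxx 10xxxxxx
        0x0080..=0x07FF => Some(vec![
            0xC0 | (cp >> 6) as u8,
            0x80 | (cp & 0x3F) as u8,
        ]),
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        0x0800..=0xFFFF => Some(vec![
            0xE0 | (cp >> 12) as u8,
            0x80 | ((cp >> 6) & 0x3F) as u8,
            0x80 | (cp & 0x3F) as u8,
        ]),
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        0x1_0000..=0x10_FFFF => Some(vec![
            0xF0 | (cp >> 18) as u8,
            0x80 | ((cp >> 12) & 0x3F) as u8,
            0x80 | ((cp >> 6) & 0x3F) as u8,
            0x80 | (cp & 0x3F) as u8,
        ]),
        // Anything larger would need the 5- and 6-byte forms from
        // RFC 2279, which RFC 3629 removed.
        _ => None,
    }
}
```

For example, `encode_utf8_checked(0x10FFFF)` yields the four bytes `F4 8F BF BF`, while `encode_utf8_checked(0x110000)` returns `None` instead of producing a legacy 4-byte sequence for an out-of-range value.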

From https://tools.ietf.org/html/rfc3629:

Changes from RFC 2279

o Restricted the range of characters to 0000-10FFFF (the UTF-16
accessible range).
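For comparison, present-day Rust (not the 2013 codebase discussed here) enforces the RFC 3629 range in the `char` type itself. `char::from_u32` returns `None` for surrogates and for values above `char::MAX` (U+10FFFF). The sketch below uses only those std APIs plus `char::len_utf8`. The helper names `utf8_len` and `max_utf8_len` are hypothetical.

```rust
/// Hypothetical helper: number of UTF-8 bytes needed for `cp`, or None if
/// `cp` is not a Unicode scalar value (surrogate or above U+10FFFF).
fn utf8_len(cp: u32) -> Option<usize> {
    char::from_u32(cp).map(char::len_utf8)
}

/// Hypothetical helper: the longest UTF-8 encoding over every valid code
/// point, found by brute force over 0..=U+10FFFF.
fn max_utf8_len() -> usize {
    (0..=char::MAX as u32)
        .filter_map(char::from_u32)
        .map(char::len_utf8)
        .max()
        .unwrap_or(0)
}
```

Because invalid values can never become a `char`, any encoder that starts from a `char` cannot be asked to encode them, and the brute-force maximum confirms the 4-byte bound.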

thestinger · 2013-06-08T06:03:30Z

Nominating this for the backwards-compatible milestone. A simple attempted fix breaks a test in the json module, so at least some existing code depends on this incorrect behaviour.

#5151 (the failed pull request) has more details.

bluss · 2013-06-24T17:18:39Z

See also issue #3787

thestinger · 2013-07-09T20:11:06Z

Fixed.


thestinger closed this as completed Jul 9, 2013