Fix GH-9535: The behavior of mb_strcut in mbstring has been changed in PHP8.1 #9562

Closed
1 change: 1 addition & 0 deletions ext/mbstring/libmbfl/mbfl/mbfilter.c
@@ -1276,6 +1276,7 @@ mbfl_strcut(
bk = _bk;
}

decoder->illegal_mode = MBFL_OUTPUTFILTER_ILLEGAL_MODE_NONE;
(*encoder->filter_flush)(encoder);

if (bk.decoder.filter_dtor)
210 changes: 210 additions & 0 deletions ext/mbstring/tests/gh9535.phpt
@@ -0,0 +1,210 @@
--TEST--
GH-9535 (mb_strcut(): The behavior of mb_strcut in mbstring has been changed in PHP8.1)
--EXTENSIONS--
mbstring
--FILE--
<?php
$encodings = [
    'BASE64',
    'HTML-ENTITIES',
    'Quoted-Printable',
    'UTF-16',
    'UTF-16BE',
    'UTF-16LE',
    'UTF-7',
    'UTF7-IMAP',
    'JIS',
    'ISO-2022-JP',
    'ISO-2022-JP-MS',
    'GB18030',
    'HZ',
    'ISO-2022-KR',
    'ISO-2022-JP-2004',
    'ISO-2022-JP-MOBILE#KDDI',
    'CP50220',
    'CP50221',
    'CP50222',
];

$input = '宛如繁星般宛如皎月般';
$bytes_length = 15;
foreach($encodings as $encoding) {
    $converted_str = mb_convert_encoding($input, $encoding, mb_internal_encoding());
    $cut_str = mb_strcut($converted_str, 0, $bytes_length, $encoding);
    $reconverted_str = mb_convert_encoding($cut_str, mb_internal_encoding(), $encoding);
    echo $encoding.': '.$reconverted_str.PHP_EOL;
}

echo PHP_EOL;

$input = '星のように月のように';
$bytes_length = 20;
foreach($encodings as $encoding) {
    $converted_str = mb_convert_encoding($input, $encoding, mb_internal_encoding());
    $cut_str = mb_strcut($converted_str, 0, $bytes_length, $encoding);
    $reconverted_str = mb_convert_encoding($cut_str, mb_internal_encoding(), $encoding);
    echo $encoding.': '.$reconverted_str.PHP_EOL;
}

echo PHP_EOL;

$input = 'あaいb';
$bytes_length = 10;
foreach($encodings as $encoding) {
    $converted_str = mb_convert_encoding($input, $encoding, mb_internal_encoding());
    $cut_str = mb_strcut($converted_str, 0, $bytes_length, $encoding);
    $reconverted_str = mb_convert_encoding($cut_str, mb_internal_encoding(), $encoding);
    echo $encoding.': '.$reconverted_str.PHP_EOL;
}

echo PHP_EOL;

$input = 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA';
$bytes_length = 10;
foreach($encodings as $encoding) {
    $converted_str = mb_convert_encoding($input, $encoding, mb_internal_encoding());
    $cut_str = mb_strcut($converted_str, 0, $bytes_length, $encoding);
    $reconverted_str = mb_convert_encoding($cut_str, mb_internal_encoding(), $encoding);
    echo $encoding.': '.$reconverted_str.PHP_EOL;
}

echo PHP_EOL;

$input = '???';
$bytes_length = 2;
foreach($encodings as $encoding) {
    $converted_str = mb_convert_encoding($input, $encoding, mb_internal_encoding());
    $cut_str = mb_strcut($converted_str, 0, $bytes_length, $encoding);
    $reconverted_str = mb_convert_encoding($cut_str, mb_internal_encoding(), $encoding);
    echo $encoding.': '.$reconverted_str.PHP_EOL;
}

echo PHP_EOL;

foreach($encodings as $encoding) {
    var_dump(mb_strcut($input, 0, $bytes_length, $encoding));
}

?>
--XFAIL--
Discussion: https://github.com/php/php-src/pull/9562
--EXPECTF--
BASE64: 宛如繁
HTML-ENTITIES: 宛&#22914
Quoted-Printable: %s
UTF-16: 宛如繁星般宛如
UTF-16BE: 宛如繁星般宛如
UTF-16LE: 宛如繁星般宛如
UTF-7: 宛如繁星
UTF7-IMAP: 宛如繁星
JIS: 宛如繁星般
ISO-2022-JP: 宛如繁星般
ISO-2022-JP-MS: 宛如繁星
GB18030: 宛如繁星般宛如
HZ: 宛如繁星般
ISO-2022-KR: 宛如繁星
ISO-2022-JP-2004: 宛如繁星
ISO-2022-JP-MOBILE#KDDI: 宛如繁星
CP50220: 宛如繁星
CP50221: 宛如繁星
CP50222: 宛如繁星

BASE64: 星のように
HTML-ENTITIES: 星の&#12
Quoted-Printable: 星の
UTF-16: 星のように月のように
UTF-16BE: 星のように月のように
UTF-16LE: 星のように月のように
UTF-7: 星のように月
UTF7-IMAP: 星のように月
JIS: 星のように月の
ISO-2022-JP: 星のように月の
ISO-2022-JP-MS: 星のように月の
GB18030: 星のように月のように
HZ: 星のように月のよ
ISO-2022-KR: 星のように月の
ISO-2022-JP-2004: 星のように月の
ISO-2022-JP-MOBILE#KDDI: 星のように月の
CP50220: 星のように月の
CP50221: 星のように月の
CP50222: 星のように月の

BASE64: %s
HTML-ENTITIES: あa&
Quoted-Printable: あa
UTF-16: あaいb
UTF-16BE: あaいb
UTF-16LE: あaいb
UTF-7: あa
UTF7-IMAP: あa
JIS: あa
ISO-2022-JP: あa
ISO-2022-JP-MS: あa
GB18030: あaいb
HZ: あa
ISO-2022-KR: あa
ISO-2022-JP-2004: あa
ISO-2022-JP-MOBILE#KDDI: あa
CP50220: あa
CP50221: あa
CP50222: あa

BASE64: AAAAAA
HTML-ENTITIES: AAAAAAAAAA
Quoted-Printable: AAAAAAAAAA
UTF-16: AAAAA
UTF-16BE: AAAAA
UTF-16LE: AAAAA
UTF-7: AAAAAAAAAA
UTF7-IMAP: AAAAAAAAAA
JIS: AAAAAAAAAA
ISO-2022-JP: AAAAAAAAAA
ISO-2022-JP-MS: AAAAAAAAAA
GB18030: AAAAAAAAAA
HZ: AAAAAAAAAA
ISO-2022-KR: AAAAAAAAAA
ISO-2022-JP-2004: AAAAAAAAAA
ISO-2022-JP-MOBILE#KDDI: AAAAAAAAAA
CP50220: AAAAAAAAAA
CP50221: AAAAAAAAAA
CP50222: AAAAAAAAAA

BASE64:%s
HTML-ENTITIES: ??
Quoted-Printable: ??
UTF-16: ?
UTF-16BE: ?
UTF-16LE: ?
UTF-7: ??
UTF7-IMAP: ??
JIS: ??
ISO-2022-JP: ??
ISO-2022-JP-MS: ??
GB18030: ??
HZ: ??
ISO-2022-KR: ??
ISO-2022-JP-2004: ??
ISO-2022-JP-MOBILE#KDDI: ??
CP50220: ??
CP50221: ??
CP50222: ??

string(0) ""
string(2) "??"
string(2) "??"
string(2) "??"
string(2) "??"
string(2) "??"
string(2) "??"
string(2) "??"
string(2) "??"
string(2) "??"
string(2) "??"
string(2) "??"
string(2) "??"
string(2) "??"
string(2) "??"
string(2) "??"
string(2) "??"
string(2) "??"
string(2) "??"
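
Note on the UTF-16 rows in the expected output above: they are consistent with cutting on a whole-character boundary at or below the requested byte length (15 bytes gives 7 two-byte characters, 10 bytes gives 5, 2 bytes gives 1). The sketch below illustrates that rule only. `utf16be_cut_len` is a hypothetical helper written for this note, not a libmbfl function and not how `mbfl_strcut` is implemented, and it assumes well-formed UTF-16BE input.

```c
#include <stddef.h>

/* Hypothetical helper (not part of libmbfl): returns the largest byte count
 * <= max_bytes at which a UTF-16BE buffer can be cut without splitting a
 * 2-byte code unit or a 4-byte surrogate pair. Assumes well-formed input. */
static size_t utf16be_cut_len(const unsigned char *s, size_t len, size_t max_bytes)
{
    size_t limit = max_bytes < len ? max_bytes : len;
    size_t pos = 0;

    while (pos + 2 <= limit) {
        /* High surrogates are 0xD800..0xDBFF, so their lead byte is 0xD8..0xDB. */
        size_t width = (s[pos] >= 0xD8 && s[pos] <= 0xDB) ? 4 : 2;
        if (pos + width > limit) {
            break; /* the next character would not fit entirely */
        }
        pos += width;
    }
    return pos;
}
```

For a 20-byte buffer of ten BMP characters and `max_bytes` of 15, the helper returns 14, which corresponds to the seven characters shown in the first UTF-16 row.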