Skip to content

[RFC] Add RFC 3986 and WHATWG compliant URL parsing support #14461

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 22 commits into
base: master
Choose a base branch
from

Conversation

kocsismate
Copy link
Member

@kocsismate kocsismate commented Jun 3, 2024

2nd take after the failed experiment with #11315

RFC: https://wiki.php.net/rfc/url_parsing_api

@@ -3715,7 +3715,6 @@ function uniqid(string $prefix = "", bool $more_entropy = false): string {}

/**
* @return int|string|array<string, int|string>|null|false
* @compile-time-eval
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed for benchmarking purposes


static void cleanup_parser(void)
{
if (++URL_G(urls) % 500 == 0) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach is copy-pasted from lexbor/lexbor#206

@kocsismate kocsismate force-pushed the ext-url2 branch 5 times, most recently from 823e4c6 to e65deb7 Compare June 6, 2024 08:23
@kocsismate kocsismate changed the title Add ext/url based on Lexbor [RFC] Add ext/url based on Lexbor Jun 12, 2024
kocsismate added a commit to kocsismate/php-src that referenced this pull request May 11, 2025
kocsismate added a commit to kocsismate/php-src that referenced this pull request May 11, 2025
kocsismate added a commit to kocsismate/php-src that referenced this pull request May 13, 2025
kocsismate added a commit to kocsismate/php-src that referenced this pull request May 13, 2025
kocsismate added a commit to kocsismate/php-src that referenced this pull request May 13, 2025

zval property_errors_default_value;
ZVAL_UNDEF(&property_errors_default_value);
zend_string *property_errors_name = zend_string_init("errors", sizeof("errors") - 1, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add the new strings from zend_string.h to PropertyInfo::PHP_85_KNOWN in gen_stub.php so they can be used here

lexbor_str_init_append(&str, lexbor_parser->mraw, (const lxb_char_t *) Z_STRVAL_P(value), Z_STRLEN_P(value)); \
} else if (Z_TYPE_P(value) == IS_LONG && Z_LVAL_P(value) != 0) { \
ZVAL_STR(value, zend_long_to_str(Z_LVAL_P(value))); \
lexbor_str_init_append(&str, lexbor_parser->mraw, (const lxb_char_t *) Z_STRVAL_P(value), Z_STRLEN_P(value)); \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few lines here indented with spaces rather than tabs

case URI_COMPONENT_READ_NORMALIZED_UNICODE: /* Intentional fallthrough */ \
case URI_COMPONENT_READ_NORMALIZED_ASCII: { \
ZVAL_STRINGL(retval, (const char *) start, len); \
break; \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
break; \
break; \


if (lexbor_uri->path.opaque) {
LEXBOR_READ_ASCII_URI_COMPONENT(lexbor_uri->path.str.data, lexbor_uri->path.str.length, read_mode, retval);
} else if (lexbor_uri->path.str.length) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these two blocks look the same - can they be combined?

return smart_str_extract(&uri_str);
}

static void lexbor_free_uri(void *uri)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this function isn't needed, suggest removing the handler and making it optional in the lexbor_uri_handler specification above

if (uri_handler_register(&parse_url_uri_handler) == FAILURE) {
return FAILURE;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about just return uri_handler_register(&parse_url_uri_handler); ?

ZVAL_STRINGL(retval, (const char *) start, len); \
break; \
} \
EMPTY_SWITCH_DEFAULT_CASE() \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given the EMPTY_SWITCH_DEFAULT_CASE() and the fact that all of the cases fall through, what is the point of even having a switch?

zend_string *that_str = that_internal_uri->handler->uri_to_string(
that_internal_uri->uri, URI_RECOMPOSITION_NORMALIZED_ASCII, exclude_fragment);
if (that_str == NULL) {
zend_throw_exception_ex(NULL, 0, "Cannot recompose %s to string", ZSTR_VAL(that_object->ce->name));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this_str be released here?


ZEND_BEGIN_ARG_WITH_RETURN_TYPE_INFO_EX(arginfo_class_Uri_Rfc3986_Uri_parse, 0, 1, IS_STATIC, 1)
ZEND_ARG_TYPE_INFO(0, uri, IS_STRING, 0)
ZEND_ARG_OBJ_INFO_WITH_DEFAULT_VALUE(0, baseUrl, Uri\\Rfc3986\\\125ri, 1, "null")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this has the parameter type as Uri\\Rfc3986\\\125ri - any idea why it isn't Uri\\Rfc3986\\Uri? Does it matter? Similar below

};

#define URIPARSER_PARSE_STR(uri_str, internal_uri, errors) do { \
zend_string *str = smart_str_extract(&uri_str); \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line got indented with spaces rather than a tab

nielsdos pushed a commit to kocsismate/php-src that referenced this pull request May 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants