[fix] TrackerPatternsDB.clean_url: don't delete query argument from new_url (#5339)

The query argument for URLs like:

- 'http://example.org?q='       --> query_str is 'q='
- 'http://example.org?/foo/bar' --> query_str is '/foo/bar'

is a *simple string* and not a key/value dict.  This string may only be removed
from the URL if one of the patterns matches.
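
To illustrate the point above, here is a minimal sketch using only the standard library. The helper name `split_query` is hypothetical and not part of SearXNG; it assumes the default `parse_qsl` behaviour (`keep_blank_values=False`), which may differ from what `clean_url` does internally.

```python
from urllib.parse import parse_qsl, urlparse


def split_query(url: str) -> tuple[str, list[tuple[str, str]]]:
    """Return the raw query string and the key/value pairs parsed from it.

    Hypothetical helper for illustration only.
    """
    query_str = urlparse(url).query
    # With default arguments, parse_qsl() silently drops fields that have no
    # '=' sign and fields whose value is empty.
    return query_str, parse_qsl(query_str)


for url in (
    "http://example.org?q=",
    "http://example.org?/foo/bar",
    "http://example.org?a=1&b=2",
):
    print(url, "-->", split_query(url))
```

For the first two URLs the list of pairs is empty although the query string is not, so code that rebuilds the URL from the parsed pairs alone would lose the query.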

In addition, get_pretty_url() now keeps such a *simple string* in the path element.

Closes: https://github.com/searxng/searxng/issues/5299

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Markus Heiser
2025-10-20 11:20:33 +02:00
committed by GitHub
parent d84ae96cf9
commit 33e798b01b
2 changed files with 36 additions and 9 deletions

@@ -356,6 +356,12 @@ def get_pretty_url(parsed_url: urllib.parse.ParseResult):
    path = parsed_url.path
    path = path[:-1] if len(path) > 0 and path[-1] == '/' else path
    path = unquote(path.replace("/", " "))
    # Keep the query argument for URLs like:
    # - 'http://example.org?/foo/bar' --> parsed_url.query is '/foo/bar'
    query_args: list[tuple[str, str]] = list(urllib.parse.parse_qsl(parsed_url.query))
    if not query_args and parsed_url.query:
        path += (" .." if len(parsed_url.query) > 24 else " ") + parsed_url.query[-24:]
    return [parsed_url.scheme + "://" + parsed_url.netloc, path]
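
The effect of the hunk above can be tried in isolation with the following sketch. It copies only the lines shown in the hunk into a standalone function named `pretty_url_sketch` (a hypothetical name); the real `get_pretty_url` may contain further lines that are not visible here.

```python
import urllib.parse
from urllib.parse import unquote


def pretty_url_sketch(parsed_url: urllib.parse.ParseResult) -> list[str]:
    """Standalone copy of the lines visible in the hunk, for illustration."""
    path = parsed_url.path
    # Drop a single trailing slash, then show path separators as spaces.
    path = path[:-1] if len(path) > 0 and path[-1] == "/" else path
    path = unquote(path.replace("/", " "))
    # A query that yields no key/value pairs is a plain string: append its
    # last 24 characters, prefixed with " .." when it had to be truncated.
    query_args = list(urllib.parse.parse_qsl(parsed_url.query))
    if not query_args and parsed_url.query:
        path += (" .." if len(parsed_url.query) > 24 else " ") + parsed_url.query[-24:]
    return [parsed_url.scheme + "://" + parsed_url.netloc, path]


for url in (
    "http://example.org?/foo/bar",
    "http://example.org/a/b/?x=1",
    "http://example.org?" + "a" * 30,
):
    print(pretty_url_sketch(urllib.parse.urlparse(url)))
```

A regular key/value query such as `?x=1` is still left out of the pretty path; only a query that parses to no pairs is appended.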