# searxng/searx/plugins/tracker_url_remover.py

'''
searx is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

searx is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with searx. If not, see <http://www.gnu.org/licenses/>.

(C) 2015 by Adam Tauber, <asciimoo@gmail.com>
'''

from flask_babel import gettext
import re
from searx.url_utils import urlunparse, parse_qsl, urlencode

# Patterns matched against each query parameter *name*
# (e.g. utm_source, utm_medium, wkey, wemail).
regexes = {re.compile(r'utm_[^&]+'),
           re.compile(r'(wkey|wemail)[^&]*')}
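# Illustration (not part of the plugin logic): the patterns are applied with
# re.match(), which anchors only at the start of the parameter name, so
#   utm_source, utm_campaign  -> removed ("utm_" plus at least one character)
#   wkey, wemail_hash         -> removed
#   id, source_utm            -> kept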

name = gettext('Tracker URL remover')
description = gettext('Remove tracker arguments from the returned URL')
default_on = True
preference_section = 'privacy'


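def on_result(request, search, result):
    """Strip tracking parameters from result['url'] in place.

    Always returns True so the result is kept in the result list.
    """
    # Results from offline engines (e.g. the command engine) have no URL.
    if 'parsed_url' not in result:
        return True

    query = result['parsed_url'].query
    if query == "":
        return True

    # keep_blank_values=True so parameters such as "a=" survive the rewrite.
    parsed_query = parse_qsl(query, keep_blank_values=True)

    # Keep only the parameters whose name matches none of the tracker patterns.
    cleaned_query = [(param_name, value)
                     for param_name, value in parsed_query
                     if not any(reg.match(param_name) for reg in regexes)]

    # Rebuild the URL only when at least one parameter was removed.
    if len(cleaned_query) != len(parsed_query):
        result['parsed_url'] = result['parsed_url']._replace(query=urlencode(cleaned_query))
        result['url'] = urlunparse(result['parsed_url'])

    return True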
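
# Worked example (illustrative URL, not taken from searx): a result whose URL is
#   https://example.com/page?q=searx&utm_source=feed&utm_medium=rss
# leaves on_result() with result['url'] set to
#   https://example.com/page?q=searx
# When a tracker is removed, the surviving parameters are re-serialised with
# urlencode(), so their encoding may change (for example, %20 becomes +).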