This patch adds an additional *isinstance* check within the ast parser to check
for float along with int, fixing the underlying issue.
Co-Authored: Markus Heiser <markus.heiser@darmarit.de>
The user_agent attribute of the Flask request object is an instance of
the werkzeug.user_agent.UserAgent class.
This will fix the following error of the self_info plugin:
> ERROR:searx.plugins.self_info: Exception while calling post_search
> Traceback (most recent call last):
> File "searx/plugins/__init__.py", line 203, in call
> ret = getattr(plugin, plugin_type)(*args, **kwargs)
> File "searx/plugins/self_info.py", line 31, in post_search
> search.result_container.answers['user-agent'] = {'answer': gettext('Your user-agent is: ') + ua}
> TypeError: can only concatenate str (not "UserAgent") to str
The intention of this PR is to modernize the settings_loader implementations.
The concept is old (remember, this is partly from 2014), back then we only had
one config file, meanwhile we have had a folder with config files for a very
long time. Callers can now load a YAML configuration from this folder as
follows ::
settings_loader.get_yaml_cfg('my-config.yml')
- BTW this is a fix of #3557.
- Further the `existing_filename_or_none` construct dates back to times when
there was not yet a `pathlib.Path` in all Python versions we supported in the
past.
- Typehints have been added wherever appropriate
At the same time, this patch should also be downward compatible and not
introduce a new environment variable. The localization of the folder with the
configurations is further based on:
SEARXNG_SETTINGS_PATH (wich defaults to /etc/searxng/settings.yml)
Which means, the default config folder is `/etc/searxng/`.
ATTENTION: intended functional changes!
If SEARXNG_SETTINGS_PATH was set and pointed to a not existing file, the
previous implementation silently loaded the default configuration. This
behavior has been changed: if the file or folder does not exist, an
EnvironmentError exception will be thrown in future.
Closes: https://github.com/searxng/searxng/issues/3557
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This change does the following things:
- the `ip` keyword is now case-insensitive
- if the query includes `my ip` it will now also match
In order to avoid too many false matches, the `ip` keyword alone matches only if
it's the _only_ word, but the inclusion of `my` loosens that to be inclusive of
users type a phrase (eg, "what is my ip", "tell me my ip", "my IP address",
etc).
Better answer context
Previously this plugin simply dumped your IP or user-agent string as an answer.
This tiny change just adds some text to contextualize those answers (eg, "Your
IP is: 1.2.3.4" instead of just "1.2.3.4").
- l10n support: parse and format decimal numbers by babel
- ability to add additional units
- improved unit detection (symbols are not unique)
- support for alias units (0,010C to F --> 32,018 °F)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In the past, some files were tested with the standard profile, others with a
profile in which most of the messages were switched off ... some files were not
checked at all.
- ``PYLINT_SEARXNG_DISABLE_OPTION`` has been abolished
- the distinction ``# lint: pylint`` is no longer necessary
- the pylint tasks have been reduced from three to two
1. ./searx/engines -> lint engines with additional builtins
2. ./searx ./searxng_extra ./tests -> lint all other python files
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This patch was inspired by the discussion around PR-2882 [2]. The goals of this
patch are:
1. Convert plugin searx.plugin.limiter to normal code [1]
2. isolation of botdetection from the limiter [2]
3. searx/{tools => botdetection}/config.py and drop searx.tools
4. in URL /config, 'limiter.enabled' is true only if the limiter is really
enabled (Redis is available).
This patch moves all the code that belongs to botdetection into namespace
searx.botdetection and code that belongs to limiter is placed in namespace
searx.limiter.
Tthe limiter used to be a plugin at some point botdetection was added, it was
not a plugin. The modularization of these two components was long overdue.
With the clear modularization, the documentation could then also be organized
according to the architecture.
[1] https://github.com/searxng/searxng/pull/2882
[2] https://github.com/searxng/searxng/pull/2882#issuecomment-1741716891
To test:
- check the app works without the limiter, check `/config`
- check the app works with the limiter and with the token, check `/config`
- make docs.live .. and read
- http://0.0.0.0:8000/admin/searx.limiter.html
- http://0.0.0.0:8000/src/searx.botdetection.html#botdetection
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
For correct determination of the IP to the request the function
botdetection.get_real_ip() is implemented. This fonction is used in the
ip_limit and link_token method of the botdetection and it is used in the
self_info plugin.
A documentation about the X-Forwarded-For header has been added.
[1] https://github.com/searxng/searxng/pull/2357#issuecomment-1566211059
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In order to be able to meet the outstanding requirements, the implementation is
modularized and supplemented with documentation.
This patch does not contain functional change, except it fixes issue #2455
----
Aktivate limiter in the settings.yml and simulate a bot request by::
curl -H 'Accept-Language: de-DE,en-US;q=0.7,en;q=0.3' \
-H 'Accept: text/html'
-H 'User-Agent: xyz' \
-H 'Accept-Encoding: gzip' \
'http://127.0.0.1:8888/search?q=foo'
In the LOG:
DEBUG searx.botdetection.link_token : missing ping for this request: .....
Since ``BURST_MAX_SUSPICIOUS = 2`` you can repeat the query above two time
before you get a "Too Many Requests" response.
Closes: https://github.com/searxng/searxng/issues/2455
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Block requests from PetalBlock. Normally robots.txt is enough to stop
PetalBlock from making requests [1]. However, if SearXNG is offered below a
path (example.org/search), then the robots.txt is not available in the root
paths of the domain / subdomain.
[1] https://webmaster.petalsearch.com/site/petalbot
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- requests without HTTP header 'Connection' or missing 'User-Agent' will be
blocked by the limiter
- re_bot is related to 'User-Agent' and has been renamed to block_user_agent
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In debug mode more detailed logging is needed to evaluate if an access should
have been blocked by the limiter.
BTW: remove duplicate code checking bot signature ``re_bot.match(user_agent)``
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
When the user choose "Auto-detected", the choice remains on the following queries.
The detected language is displayed.
For example "Auto-detected (en)":
* the next query language is going to be auto detected
* for the current query, the detected language is English.
This replace the autodetect_search_language plugin.
- Add documentation to the plugin
- Harmonize FastText language model with SearXNG's language model
Reosurces::
import fasttext # --> +10 MB
fasttext.load_model(str(data_dir / 'lid.176.ftz')) # --> +4MB
Suggested-by: @dalf
- To speed up and simplify the deployment use fasttext-wheel instead of fasttext
- Building numpy on the Alpine Linux of docker-images takes ages --> install
py3-numpy from Alpines package manager (apk)
- Alpine Linux on docker-images (musl libc) do not support fasttext-wheel (gnu
libc) --> patch Dockerfile and build from fastetxt:
sed -i s/fasttext-wheel/fasttext/ requirements.txt
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Currentty, when oa_doi_rewrite find a DOI in the result URL, it replace the URL.
In this commit, the plugin adds the key "doi" to the result,
so the paper.html can show it.