Commit Graph

152 Commits

Author SHA1 Message Date
Markus Heiser 2039060b64 [mod] revision of the settings_loader
The intention of this PR is to modernize the settings_loader implementations.
The concept is old (remember, this is partly from 2014).  Back then we only had
one config file; meanwhile we have had a folder with config files for a very
long time.  Callers can now load a YAML configuration from this folder as
follows ::

    settings_loader.get_yaml_cfg('my-config.yml')

- BTW this is a fix of #3557.

- Further the `existing_filename_or_none` construct dates back to times when
  there was not yet a `pathlib.Path` in all Python versions we supported in the
  past.

- Typehints have been added wherever appropriate

At the same time, this patch should also be backward compatible and not
introduce a new environment variable. The location of the folder with the
configurations is still based on:

    SEARXNG_SETTINGS_PATH (which defaults to /etc/searxng/settings.yml)

Which means, the default config folder is `/etc/searxng/`.

ATTENTION: intended functional changes!

 If SEARXNG_SETTINGS_PATH was set and pointed to a non-existent file, the
 previous implementation silently loaded the default configuration.  This
 behavior has been changed: if the file or folder does not exist, an
 EnvironmentError exception will be raised from now on.
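
The changed behavior can be sketched as follows.  This is a minimal
illustration, not the settings_loader code: `resolve_settings_path` and
`DEFAULT_SETTINGS` are hypothetical names, only SEARXNG_SETTINGS_PATH and its
default come from the text above.

```python
import os
from pathlib import Path

# Default taken from the commit message; all names in this sketch are hypothetical.
DEFAULT_SETTINGS = Path("/etc/searxng/settings.yml")


def resolve_settings_path(environ=os.environ) -> Path:
    """Return the path named by SEARXNG_SETTINGS_PATH, or the default.

    If the variable is set but points to a missing file or folder, raise
    instead of silently falling back to the default configuration.
    """
    value = environ.get("SEARXNG_SETTINGS_PATH")
    if not value:
        return DEFAULT_SETTINGS
    path = Path(value)
    if not path.exists():
        raise EnvironmentError(f"SEARXNG_SETTINGS_PATH does not exist: {path}")
    return path
```

With a settings file at `/etc/searxng/settings.yml`, the config folder in this
sketch would simply be the `parent` of the returned path.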

Closes: https://github.com/searxng/searxng/issues/3557
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-07-14 18:10:06 +02:00
Markus Heiser acf3f109b2 [doc] hostname plugin: improve online documentation
The data types (list & map) should be made clearer, as these sometimes lead to
misunderstandings.

[1] https://github.com/searxng/searxng/issues/3558#issuecomment-2175058128

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-06-18 12:06:06 +02:00
Jeff Alyanak efd69c4ca9 [feat] plugin Self Information: improve keyword matching
This change does the following things:

- the `ip` keyword is now case-insensitive
- if the query includes `my ip` it will now also match

In order to avoid too many false matches, the `ip` keyword alone matches only if
it's the _only_ word, but the inclusion of `my` loosens that to also cover
phrases users type (e.g., "what is my ip", "tell me my ip", "my IP address",
etc).
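
A rough sketch of the matching rule described above, assuming two regular
expressions (the pattern names and the helper `asks_for_ip` are hypothetical
and not taken from the plugin):

```python
import re

# Hypothetical patterns illustrating the rule from the commit message.
ONLY_IP = re.compile(r"^ip$", re.IGNORECASE)        # "ip" as the only word
MY_IP = re.compile(r"\bmy\s+ip\b", re.IGNORECASE)   # "my ip" anywhere in the query


def asks_for_ip(query: str) -> bool:
    """True if the query is just "ip" or contains the phrase "my ip"."""
    query = query.strip()
    return bool(ONLY_IP.match(query) or MY_IP.search(query))
```

The word boundaries keep queries such as "my zip code" or "ip address lookup"
from matching.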

Better answer context

Previously this plugin simply dumped your IP or user-agent string as an answer.
This tiny change just adds some text to contextualize those answers (eg, "Your
IP is: 1.2.3.4" instead of just "1.2.3.4").
2024-06-17 14:12:37 +02:00
Bnyro f5eb56b63f [refactor] hostnames plugin: add fallback for old hostname_replace plugin
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
2024-06-07 14:42:52 +02:00
Markus Heiser 845a0b678d [doc] add 'hostnames' plugin to the online documentation
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-06-07 14:42:52 +02:00
Bnyro aa59bfbf60 [feat] hostname replace plugin: support for external list file 2024-06-07 14:42:52 +02:00
Bnyro 3bec04079c [feat] hostname replace plugin: possibility to prioritize certain websites
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
2024-06-07 14:42:52 +02:00
Markus Heiser 056968cc39 [fix] unit converter operating backwards (from_si <-> to_si)
The factors for from_si and to_si were reversed.
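
To illustrate what "reversed" means here, a minimal sketch with a hypothetical
factor table (the table and the `convert` helper are assumptions, only the
names from_si and to_si come from the commit):

```python
# Hypothetical factor table: how many SI base units (metres) one unit equals.
TO_SI_FACTOR = {"km": 1000.0, "m": 1.0, "cm": 0.01}


def to_si(value: float, unit: str) -> float:
    """Convert a value given in `unit` to the SI base unit (metres)."""
    return value * TO_SI_FACTOR[unit]


def from_si(value: float, unit: str) -> float:
    """Convert a value given in the SI base unit (metres) to `unit`."""
    return value / TO_SI_FACTOR[unit]


def convert(value: float, source: str, target: str) -> float:
    """Convert between two units by going through the SI base unit."""
    return from_si(to_si(value, source), target)
```

If the two directions are swapped, converting kilometres to metres divides by
1000 instead of multiplying, which is the kind of backwards result the fix
addresses.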

Closes: https://github.com/searxng/searxng/issues/3497
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-05-28 11:58:13 +02:00
Bnyro 383d873597 [fix] unit converter plugin: can't be disabled in settings 2024-05-09 17:40:37 +02:00
Bnyro 72be98e12f [feat] plugins: new calculator plugin 2024-05-09 17:23:38 +02:00
Markus Heiser 742303d030 [mod] improve unit converter plugin
- l10n support: parse and format decimal numbers by babel
- ability to add additional units
- improved unit detection (symbols are not unique)
- support for alias units (0,010C to F --> 32,018 °F)

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-05-09 17:16:31 +02:00
Bnyro 46efb2f36d [feat] plugins: new unit converter plugin 2024-04-27 18:11:33 +02:00
Markus Heiser 542f7d0d7b [mod] pylint all files with one profile / drop PYLINT_SEARXNG_DISABLE_OPTION
In the past, some files were tested with the standard profile, others with a
profile in which most of the messages were switched off ... some files were not
checked at all.

- ``PYLINT_SEARXNG_DISABLE_OPTION`` has been abolished
- the distinction ``# lint: pylint`` is no longer necessary
- the pylint tasks have been reduced from three to two

  1. ./searx/engines -> lint engines with additional builtins
  2. ./searx ./searxng_extra ./tests -> lint all other python files

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-03-11 14:55:38 +01:00
Markus Heiser 50d5a9ff60 [fix] issues reported by pylint 3.1.0
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-03-09 09:28:13 +01:00
Markus Heiser a7b51f023e [black] upgrade black 22.12.0 --> 24.2.0
The issue discussed in [1] has been solved since [2] has been merged into black
/ now we can upgrade without touching 69 files as it was needed with black
23.1.0 [3].

[1] https://github.com/searxng/searxng/pull/2159#issuecomment-1425723977
[2] https://github.com/psf/black/pull/4060
[3] https://github.com/searxng/searxng/pull/2159/files

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-03-09 08:15:50 +01:00
Markus Heiser fd814aac86 [mod] isolation of botdetection from the limiter
This patch was inspired by the discussion around PR-2882 [2].  The goals of this
patch are:

1. Convert plugin searx.plugin.limiter to normal code [1]
2. isolation of botdetection from the limiter [2]
3. searx/{tools => botdetection}/config.py and drop searx.tools
4. in URL /config, 'limiter.enabled' is true only if the limiter is really
   enabled (Redis is available).

This patch moves all the code that belongs to botdetection into namespace
searx.botdetection and code that belongs to limiter is placed in namespace
searx.limiter.

The limiter used to be a plugin; botdetection, which was added at some point,
was not a plugin.  The modularization of these two components was long overdue.
With the clear modularization, the documentation could then also be organized
according to the architecture.

[1] https://github.com/searxng/searxng/pull/2882
[2] https://github.com/searxng/searxng/pull/2882#issuecomment-1741716891

To test:

- check the app works without the limiter, check `/config`
- check the app works with the limiter and with the token, check `/config`
- make docs.live .. and read
  - http://0.0.0.0:8000/admin/searx.limiter.html
  - http://0.0.0.0:8000/src/searx.botdetection.html#botdetection

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-11-01 06:44:56 +01:00
Emilien Devos 47721a3485 add new parameter called server.public_instance
for enabling advanced limiter functions by default;
in the future this allows us to add features just for the public instances
2023-09-25 22:31:14 +02:00
Markus Heiser 317db5b04f [mod] preferences: implement drop-down menu for hotkeys (default, vim)
Replace the on/off checkbox of the vim-hotkeys in the preferences by a drop-down
menu.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-09-19 08:06:06 +02:00
Bnyro a55e0ac553 [feat] search on category select without JS
Co-authored-by: Alexandre Flament <alex@al-f.net>
2023-09-18 21:29:11 +02:00
jazzzooo 223b3487c3 [fix] spelling 2023-09-18 16:20:27 +02:00
Markus Heiser 281e36f4b7 [fix] limiter: replace real_ip by IPv4/v6 network
Closes: https://github.com/searxng/searxng/issues/2477
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-01 15:51:14 +02:00
Markus Heiser 38431d2e14 [fix] correct determination of the IP for the request
To determine the IP of a request correctly, the function
botdetection.get_real_ip() has been implemented.  This function is used in the
ip_limit and link_token methods of the botdetection, and it is used in the
self_info plugin.

Documentation about the X-Forwarded-For header has been added.
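
For readers unfamiliar with the header, a minimal sketch of how a client IP
can be picked from X-Forwarded-For.  This is not the get_real_ip()
implementation: `client_ip`, the plain `dict` of headers and the
`trusted_proxies` parameter are assumptions made for illustration.

```python
def client_ip(headers: dict, remote_addr: str, trusted_proxies: int = 1) -> str:
    """Pick the client IP from X-Forwarded-For (hypothetical helper).

    Each proxy appends the address it received the request from, so only the
    last `trusted_proxies` entries were written by proxies we control; entries
    further left can be forged by the client.
    """
    forwarded = headers.get("X-Forwarded-For", "")
    hops = [hop.strip() for hop in forwarded.split(",") if hop.strip()]
    if not hops or trusted_proxies < 1:
        # no proxy information: fall back to the address of the TCP peer
        return remote_addr
    index = max(len(hops) - trusted_proxies, 0)
    return hops[index]
```

Taking the leftmost entry instead would let a client choose its own "IP" by
sending a forged header, which matters when the address is used for rate
limiting.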

[1] https://github.com/searxng/searxng/pull/2357#issuecomment-1566211059

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-01 14:38:53 +02:00
Markus Heiser b8c7c2c9aa [mod] botdetection - improve ip_limit and link_token methods
- counting requests in LONG_WINDOW and BURST_WINDOW is not needed when the
  request is validated by the link_token method [1]

- renew a ping-key on validation [2], this is needed for infinite scrolling,
  where no new token (CSS) is loaded. / this does not fix the BURST_MAX issue in
  the vanilla limiter

- normalize the counter names of the ip_limit method to 'ip_limit.*'

- integrate the ip_limit method straightforwardly in the limiter plugin, with
  no intermediate code --> ip_limit now returns None or a werkzeug.Response
  object that the plugin can pass to the flask application, replacing the
  intermediate code that returned a tuple

[1] https://github.com/searxng/searxng/pull/2357#issuecomment-1566113277
[2] https://github.com/searxng/searxng/pull/2357#discussion_r1208542206
[3] https://github.com/searxng/searxng/pull/2357#issuecomment-1566125979

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-01 14:38:53 +02:00
Markus Heiser 66fdec0eb9 [mod] limiter: add config file /etc/searxng/limiter.toml
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-01 14:38:53 +02:00
Markus Heiser 1ec325adcc [mod] limiter -> botdetection: modularization and documentation
In order to be able to meet the outstanding requirements, the implementation is
modularized and supplemented with documentation.

This patch contains no functional change, except that it fixes issue #2455

----

Activate the limiter in the settings.yml and simulate a bot request by::

    curl -H 'Accept-Language: de-DE,en-US;q=0.7,en;q=0.3' \
         -H 'Accept: text/html' \
         -H 'User-Agent: xyz' \
         -H 'Accept-Encoding: gzip' \
         'http://127.0.0.1:8888/search?q=foo'

In the LOG:

    DEBUG   searx.botdetection.link_token : missing ping for this request: .....

Since ``BURST_MAX_SUSPICIOUS = 2`` you can repeat the query above two times
before you get a "Too Many Requests" response.
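
The counting behind such a limit can be sketched with an in-memory sliding
window.  This is only an illustration: the class below is a hypothetical
stand-in for the Redis-backed counter, and the window length and the `>`
comparison are assumptions; only ``BURST_MAX_SUSPICIOUS = 2`` is taken from
the text above.

```python
from collections import deque

BURST_MAX_SUSPICIOUS = 2  # value quoted in the commit message
BURST_WINDOW = 20         # seconds; assumed for this sketch


class SlidingWindowCounter:
    """In-memory stand-in for a Redis-backed sliding window counter."""

    def __init__(self, window: float):
        self.window = window
        self.hits = deque()

    def incr(self, now: float) -> int:
        """Record a hit at time `now` and return the hits inside the window."""
        self.hits.append(now)
        # drop hits that have fallen out of the window
        while self.hits and self.hits[0] <= now - self.window:
            self.hits.popleft()
        return len(self.hits)


def too_many_requests(counter: SlidingWindowCounter, now: float) -> bool:
    """True once the number of hits in the window exceeds the limit."""
    return counter.incr(now) > BURST_MAX_SUSPICIOUS
```

Under these assumptions two suspicious requests pass and the third one inside
the window is rejected; once the window has elapsed the counter starts over.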

Closes: https://github.com/searxng/searxng/issues/2455
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-05-29 14:54:56 +02:00
Markus Heiser 5226044c13 [mod] limiter: add random token to the limiter URL
By adding a random component to the limiter URL, a bot can no longer send a ping
by requesting a static URL.
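
The idea can be sketched with the standard `secrets` module.  The helper
names and the URL layout (`/client<token>.css`) are assumptions for this
illustration, and storing the expected token per client is left out.

```python
import secrets


def new_ping_url(prefix: str = "/client"):
    """Create a random token and the URL that carries it (hypothetical layout)."""
    token = secrets.token_urlsafe(16)
    return token, f"{prefix}{token}.css"


def is_valid_ping(path: str, expected_token: str, prefix: str = "/client") -> bool:
    """Accept the ping only if the requested path carries the expected token."""
    expected_path = f"{prefix}{expected_token}.css"
    # constant-time comparison on bytes, so any request path can be checked
    return secrets.compare_digest(path.encode(), expected_path.encode())
```

A bot that replays a URL seen earlier, or guesses a static one, fails the
check because each token is generated freshly.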

Related: https://github.com/searxng/searxng/pull/2357#issuecomment-1518525094
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-05-29 14:54:56 +02:00
Markus Heiser dba569462d [mod] limiter: reduce request rates for requests without a ping
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-05-29 14:54:56 +02:00
Markus Heiser 823c490c84 [mod] limiter: block requests from PetalBot
Block requests from PetalBot.  Normally robots.txt is enough to stop
PetalBot from making requests [1].  However, if SearXNG is served under a
path (example.org/search), then the robots.txt is not available at the root
path of the domain / subdomain.

[1] https://webmaster.petalsearch.com/site/petalbot

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-30 09:49:26 +02:00
Markus Heiser 8c83547683 [mod] limiter: block unmaintained Farside instances
Since [bb3a01f8] has been merged to the Farside project, Farside instances no
longer need to send requests to SearXNG instances [1].

There are some old unmaintained Farside instances on the web that continue to
query SearXNG instances --> we can safely block their requests.

[1] https://github.com/benbusby/farside/issues/95
[bb3a01f8] https://github.com/benbusby/farside/commit/bb3a01f8

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-13 16:53:37 +02:00
Markus Heiser 03f94962b6 [fix] limiter: never block a /healthz request
Related: https://github.com/searxng/searxng/issues/2310#issuecomment-1494417531
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-03 19:36:28 +02:00
Markus Heiser 66810ce711 [mod] limiter: minor improvements
- requests without HTTP header 'Connection' or missing 'User-Agent' will be
  blocked by the limiter

- re_bot is related to 'User-Agent' and has been renamed to block_user_agent
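
A minimal sketch of these two checks.  The pattern is a shortened,
hypothetical example (the real list of blocked agents is not shown in this
log), and `is_blocked` with a plain `dict` of headers is an assumption.

```python
import re

# Hypothetical, shortened pattern; only the name block_user_agent is from the commit.
block_user_agent = re.compile(r"(curl|wget|petalbot)", re.IGNORECASE)


def is_blocked(headers: dict) -> bool:
    """Block requests lacking 'Connection' or 'User-Agent', or with a listed agent."""
    user_agent = headers.get("User-Agent", "")
    if not user_agent or "Connection" not in headers:
        return True
    return bool(block_user_agent.search(user_agent))
```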

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-01 19:42:49 +02:00
Markus Heiser afd8fcce36 [mod] plugin limiter: improve the log messages
In debug mode more detailed logging is needed to evaluate if an access should
have been blocked by the limiter.

BTW: remove duplicate code checking bot signature ``re_bot.match(user_agent)``

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-01 09:20:58 +02:00
Alexandre Flament 6748e8e2d5 Add "Auto-detected" as a language.
When the user chooses "Auto-detected", the choice persists for the following queries.
The detected language is displayed.

For example "Auto-detected (en)":
* the next query language is going to be auto detected
* for the current query, the detected language is English.

This replaces the autodetect_search_language plugin.
2023-02-17 15:17:36 +00:00
Markus Heiser bb83036f48 [fix] typo in searx/plugins/tor_check.py
Related: https://github.com/searxng/searxng/pull/2189

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-02-17 13:09:14 +01:00
Allan Nordhøy 2be373a18f [fix] spelling: Tor, SearXNG 2023-02-14 17:13:53 +01:00
ArtikusHG 735e388cec
Merge branch 'master' into fasttext 2022-12-16 19:43:10 +00:00
ArtikusHG 1f8f8c1e91 Replace langdetect with fasttext 2022-12-16 21:07:39 +02:00
Alexandre Flament 9e9f57e48b
Merge pull request #1954 from dalf/fix.redis.init.2
[fix] follow up of PR-1856
2022-12-14 07:08:19 +01:00
Markus Heiser ed901ab18e [mod] improve 'Autodetect search language' plugin
- Add documentation to the plugin
- Harmonize FastText language model with SearXNG's language model

Resources::

    import fasttext                                    # --> +10 MB
    fasttext.load_model(str(data_dir / 'lid.176.ftz')) # --> +4MB

Suggested-by: @dalf

- To speed up and simplify the deployment use fasttext-wheel instead of fasttext
- Building numpy on the Alpine Linux of docker-images takes ages --> install
  py3-numpy from Alpine's package manager (apk)
- Alpine Linux on docker-images (musl libc) does not support fasttext-wheel (gnu
  libc) --> patch Dockerfile and build from fasttext:

     sed -i s/fasttext-wheel/fasttext/ requirements.txt

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-12-11 11:26:07 +01:00
ArtikusHG 9925a20950 [mod] new plugin: Autodetect search language 2022-12-10 13:11:47 +01:00
Alexandre Flament b971167ced move searx.shared.redisdb to searx.redisdb 2022-12-10 09:26:38 +01:00
Alexandre Flament fe419e355b The checker requires Redis
Remove the abstraction in searx.shared.SharedDict.
Implement a basic and dedicated scheduler for the checker using a Redis script.
2022-11-05 12:04:50 +01:00
Markus Heiser ba8959ad7c [fix] typos / reported by @kianmeng in searx PR-3366
[PR-3366] https://github.com/searx/searx/pull/3366

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-27 18:32:14 +02:00
Alexandre FLAMENT 593026ad9c oa_doi_rewrite: add the doi to the result when it is found.
Currently, when oa_doi_rewrite finds a DOI in the result URL, it replaces the URL.
In this commit, the plugin also adds the key "doi" to the result,
so that paper.html can show it.
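
A rough sketch of that change, assuming results are plain dicts.  The regular
expression, the resolver URL and the function name are hypothetical
placeholders, not the plugin's actual values.

```python
import re

# Hypothetical DOI pattern and resolver, for illustration only.
DOI_RE = re.compile(r"(10\.\d{4,9}/[^\s?#]+)")
RESOLVER = "https://doi.org/"


def rewrite_result(result: dict) -> dict:
    """Rewrite the URL to the resolver and keep the DOI in the result."""
    match = DOI_RE.search(result.get("url", ""))
    if match:
        doi = match.group(1)
        result["doi"] = doi              # new: expose the DOI to the template
        result["url"] = RESOLVER + doi   # as before: replace the URL
    return result
```
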
2022-09-23 20:45:58 +02:00
Léon Tiekötter 221740f76e
[mod] limiter plugin: Accept-Encoding handling
Only raise "suspicious Accept-Encoding" when both "gzip" and "deflate" are missing from Accept-Encoding.
This prevents browsers that implement only one compression method from being blocked by the limiter plugin.
Example of a browser that is currently blocked: Lynx (https://lynx.invisible-island.net)
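
The rule can be sketched as follows (the function name is hypothetical, and
the header parsing is a simplified assumption):

```python
def suspicious_accept_encoding(header_value: str) -> bool:
    """True only if neither "gzip" nor "deflate" is listed in Accept-Encoding."""
    # drop quality values such as ";q=0.5" and normalize case and whitespace
    encodings = {
        part.split(";")[0].strip().lower() for part in header_value.split(",")
    }
    return "gzip" not in encodings and "deflate" not in encodings
```

A client sending only `gzip` or only `deflate` passes this check; a client
sending neither is still flagged.
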
2022-08-25 23:21:30 +02:00
Solirs 6d646129c3 [mod] add tor_check plugin - convenient tor checking through searxng 2022-07-19 07:34:54 +02:00
mrpaulblack 38385e48cf fix: return body on limiter block so there is not just a blank page 2022-07-05 22:57:26 +02:00
Alexandre Flament ea0cddba0b
Merge pull request #1047 from return42/redis-lib
Add a redis library to generalize DB functions we need in SearXNG.
2022-06-06 10:59:11 +02:00
Markus Heiser 2de007138c [fix] prepare for pylint 2.14.0
Remove issues reported by Pylint 2.14.0:

- no-self-use: has been moved to optional extension [1]
- The refactoring checker now also raises 'consider-using-generator' messages
  for max(), min() and sum(). [2]
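
As an illustration of the second message, with made-up values: passing a list
comprehension to max() builds a temporary list first, a generator expression
does not.

```python
values = [3, 1, 4, 1, 5]

# Reported as 'consider-using-generator': a temporary list is built first.
largest_from_list = max([v * v for v in values])

# Preferred form: the generator expression is consumed item by item.
largest_from_generator = max(v * v for v in values)
```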

.pylintrc:
  - <option name>-hint was removed long ago; Pylint 2.14.0 raises an
    error on invalid options
  - bad-continuation and bad-whitespace have been removed [3]

[1] https://pylint.pycqa.org/en/latest/whatsnew/2/2.14/summary.html#removed-checkers
[2] https://pylint.pycqa.org/en/latest/whatsnew/2/2.14/full.html#what-s-new-in-pylint-2-14-0
[3] https://pylint.pycqa.org/en/latest/whatsnew/2/2.6/summary.html#summary-release-highlights

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-06-03 15:41:52 +02:00
Markus Heiser 4b185f0e11 [mod] plugins/limiter.py - use searx.redislib.incr_sliding_window
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-05-30 11:10:30 +02:00