128 Commits

Author SHA1 Message Date
Markus Heiser 1ec325adcc [mod] limiter -> botdetection: modularization and documentation
In order to be able to meet the outstanding requirements, the implementation is
modularized and supplemented with documentation.

This patch does not contain any functional change, except that it fixes issue #2455

----

Activate the limiter in the settings.yml and simulate a bot request with::

    curl -H 'Accept-Language: de-DE,en-US;q=0.7,en;q=0.3' \
         -H 'Accept: text/html' \
         -H 'User-Agent: xyz' \
         -H 'Accept-Encoding: gzip' \
         'http://127.0.0.1:8888/search?q=foo'

In the log::

    DEBUG   searx.botdetection.link_token : missing ping for this request: .....

Since ``BURST_MAX_SUSPICIOUS = 2`` you can repeat the query above two times
before you get a "Too Many Requests" response.
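
A minimal sketch of how such a burst limit could behave, assuming a simple
in-memory sliding window. Only ``BURST_MAX_SUSPICIOUS = 2`` comes from the
message above; ``BURST_WINDOW``, ``too_many_requests`` and the in-memory store
are assumptions for illustration (the real limiter uses Redis and may count
differently)::

```python
import time
from collections import defaultdict, deque

BURST_MAX_SUSPICIOUS = 2   # value quoted in the commit message
BURST_WINDOW = 20          # seconds; assumed for illustration only

# client key -> timestamps of recent suspicious requests (in-memory stand-in)
_hits = defaultdict(deque)


def too_many_requests(client_key, now=None):
    """Return True when the client exceeds the burst limit inside the window."""
    now = time.time() if now is None else now
    window = _hits[client_key]
    # drop timestamps that have left the sliding window
    while window and now - window[0] >= BURST_WINDOW:
        window.popleft()
    window.append(now)
    return len(window) > BURST_MAX_SUSPICIOUS
```

In this sketch the third suspicious request inside the window is the first one
that would be answered with "Too Many Requests".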

Closes: https://github.com/searxng/searxng/issues/2455
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-05-29 14:54:56 +02:00
Markus Heiser 5226044c13 [mod] limiter: add random token to the limiter URL
By adding a random component to the limiter URL, a bot can no longer send a ping
by requesting a static URL.
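
The idea can be sketched as follows, assuming the token is embedded in the
path of the ping resource. The function names and the ``/client<token>.css``
path are hypothetical and not taken from the patch::

```python
import hmac
import secrets


def new_ping_token():
    """Create an unpredictable, URL-safe token."""
    return secrets.token_urlsafe(16)


def ping_url(token):
    # hypothetical URL layout: the random token is part of the path
    return f"/client{token}.css"


def is_valid_ping(path, expected_token):
    """Accept the ping only if the requested path carries the expected token."""
    return hmac.compare_digest(path, ping_url(expected_token))
```

A client that only knows a static URL cannot produce a valid ping, because the
path changes with every token.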

Related: https://github.com/searxng/searxng/pull/2357#issuecomment-1518525094
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-05-29 14:54:56 +02:00
Markus Heiser dba569462d [mod] limiter: reduce request rates for requests without a ping
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-05-29 14:54:56 +02:00
Markus Heiser 823c490c84 [mod] limiter: block requests from PetalBot
Block requests from PetalBot.  Normally robots.txt is enough to stop
PetalBot from making requests [1].  However, if SearXNG is served under a
path (example.org/search), then the robots.txt is not available at the root
path of the domain / subdomain.

[1] https://webmaster.petalsearch.com/site/petalbot

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-30 09:49:26 +02:00
Markus Heiser 8c83547683 [mod] limiter: block unmaintained Farside instances
Since [bb3a01f8] has been merged into the Farside project, Farside instances no
longer need to send requests to SearXNG instances [1].

There are some old unmaintained Farside instances on the web that continue to
query SearXNG instances --> we can safely block their requests.

[1] https://github.com/benbusby/farside/issues/95
[bb3a01f8] https://github.com/benbusby/farside/commit/bb3a01f8

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-13 16:53:37 +02:00
Markus Heiser 03f94962b6 [fix] limiter: never block a /healthz request
Related: https://github.com/searxng/searxng/issues/2310#issuecomment-1494417531
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-03 19:36:28 +02:00
Markus Heiser 66810ce711 [mod] limiter: minor improvements
- requests without HTTP header 'Connection' or missing 'User-Agent' will be
  blocked by the limiter

- re_bot is related to 'User-Agent' and has been renamed to block_user_agent
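
A rough sketch of the two checks listed above, assuming the request headers
are available as a plain dict with canonical header names. The regular
expression is a made-up example, not the pattern used by the limiter::

```python
import re

# hypothetical pattern; the real block_user_agent expression differs
block_user_agent = re.compile(r"(curl|wget|python-requests|scrapy)", re.IGNORECASE)


def is_suspicious(headers):
    """Return True if the request should be blocked based on its headers."""
    user_agent = headers.get("User-Agent")
    if not user_agent:
        return True  # missing 'User-Agent'
    if "Connection" not in headers:
        return True  # missing 'Connection'
    # the User-Agent matches a known bot signature
    return bool(block_user_agent.search(user_agent))
```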

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-01 19:42:49 +02:00
Markus Heiser afd8fcce36 [mod] plugin limiter: improve the log messages
In debug mode more detailed logging is needed to evaluate if an access should
have been blocked by the limiter.

BTW: remove duplicate code checking bot signature ``re_bot.match(user_agent)``

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-01 09:20:58 +02:00
Alexandre Flament 6748e8e2d5 Add "Auto-detected" as a language.
When the user chooses "Auto-detected", the choice persists for subsequent queries.
The detected language is displayed.

For example "Auto-detected (en)":
* the next query language is going to be auto detected
* for the current query, the detected language is English.

This replaces the autodetect_search_language plugin.
2023-02-17 15:17:36 +00:00
Markus Heiser bb83036f48 [fix] typo in searx/plugins/tor_check.py
Related: https://github.com/searxng/searxng/pull/2189

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-02-17 13:09:14 +01:00
Allan Nordhøy 2be373a18f [fix] spelling: Tor, SearXNG 2023-02-14 17:13:53 +01:00
ArtikusHG 735e388cec
Merge branch 'master' into fasttext 2022-12-16 19:43:10 +00:00
ArtikusHG 1f8f8c1e91 Replace langdetect with fasttext 2022-12-16 21:07:39 +02:00
Alexandre Flament 9e9f57e48b
Merge pull request #1954 from dalf/fix.redis.init.2
[fix] follow up of PR-1856
2022-12-14 07:08:19 +01:00
Markus Heiser ed901ab18e [mod] improve 'Autodetect search language' plugin
- Add documentation to the plugin
- Harmonize FastText language model with SearXNG's language model

Resources::

    import fasttext                                    # --> +10 MB
    fasttext.load_model(str(data_dir / 'lid.176.ftz')) # --> +4MB

Suggested-by: @dalf

- To speed up and simplify the deployment, use fasttext-wheel instead of fasttext
- Building numpy on the Alpine Linux of docker-images takes ages --> install
  py3-numpy from Alpine's package manager (apk)
- Alpine Linux on docker-images (musl libc) does not support fasttext-wheel (gnu
  libc) --> patch Dockerfile and build from fasttext:

     sed -i s/fasttext-wheel/fasttext/ requirements.txt

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-12-11 11:26:07 +01:00
ArtikusHG 9925a20950 [mod] new plugin: Autodetect search language 2022-12-10 13:11:47 +01:00
Alexandre Flament b971167ced move searx.shared.redisdb to searx.redisdb 2022-12-10 09:26:38 +01:00
Alexandre Flament fe419e355b The checker requires Redis
Remove the abstraction in searx.shared.SharedDict.
Implement a basic and dedicated scheduler for the checker using a Redis script.
2022-11-05 12:04:50 +01:00
Markus Heiser ba8959ad7c [fix] typos / reported by @kianmeng in searx PR-3366
[PR-3366] https://github.com/searx/searx/pull/3366

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-27 18:32:14 +02:00
Alexandre FLAMENT 593026ad9c oa_doi_rewrite: add the doi to the result when it is found.
Currently, when oa_doi_rewrite finds a DOI in the result URL, it replaces the URL.
In this commit, the plugin also adds the key "doi" to the result,
so that paper.html can show it.
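
A simplified sketch of adding the key, assuming a result is a plain dict with
a "url" entry. The DOI pattern and the function name are assumptions for
illustration, and the URL rewriting done by the plugin is left out::

```python
import re

# simplified DOI pattern, assumed for this example
doi_pattern = re.compile(r"10\.\d{4,9}/[^\s?#]+")


def add_doi(result):
    """Store the DOI found in the result URL under the key 'doi'."""
    match = doi_pattern.search(result.get("url", ""))
    if match:
        result["doi"] = match.group(0)
    return result
```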
2022-09-23 20:45:58 +02:00
Léon Tiekötter 221740f76e
[mod] limiter plugin: Accept-Encoding handling
Only raise "suspicious Accept-Encoding" when both "gzip" and "deflate" are missing from Accept-Encoding.
Prevent Browsers which only implement one compression solution from being blocked by the limiter plugin.
Example Browser which is currently blocked: Lynx Browser (https://lynx.invisible-island.net)
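
The rule can be sketched like this, assuming the raw header value is passed in
as a string. The function name is hypothetical::

```python
def suspicious_accept_encoding(header_value):
    """Return True only if neither 'gzip' nor 'deflate' is offered."""
    encodings = {
        # strip optional quality values such as ';q=0.5'
        part.split(";")[0].strip().lower()
        for part in header_value.split(",")
    }
    return "gzip" not in encodings and "deflate" not in encodings
```

A client that sends only ``Accept-Encoding: gzip`` is not flagged by this check.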
2022-08-25 23:21:30 +02:00
Solirs 6d646129c3 [mod] add tor_check plugin - convenient tor checking through searxng 2022-07-19 07:34:54 +02:00
mrpaulblack 38385e48cf fix: return body on limiter block so there is not just a blank page 2022-07-05 22:57:26 +02:00
Alexandre Flament ea0cddba0b
Merge pull request #1047 from return42/redis-lib
Add a redis library to generalize DB functions we need in SearXNG.
2022-06-06 10:59:11 +02:00
Markus Heiser 2de007138c [fix] prepare for pylint 2.14.0
Remove issues reported by Pylint 2.14.0:

- no-self-use: has been moved to optional extension [1]
- The refactoring checker now also raises 'consider-using-generator' messages
  for max(), min() and sum(). [2]

.pylintrc:
  - <option name>-hint was removed long ago; Pylint 2.14.0 raises an
    error on invalid options
  - bad-continuation and bad-whitespace have been removed [3]

[1] https://pylint.pycqa.org/en/latest/whatsnew/2/2.14/summary.html#removed-checkers
[2] https://pylint.pycqa.org/en/latest/whatsnew/2/2.14/full.html#what-s-new-in-pylint-2-14-0
[3] https://pylint.pycqa.org/en/latest/whatsnew/2/2.6/summary.html#summary-release-highlights

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-06-03 15:41:52 +02:00
Markus Heiser 4b185f0e11 [mod] plugins/limiter.py - use searx.redislib.incr_sliding_window
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-05-30 11:10:30 +02:00
Émilien Devos 66b77c46c7
Fix typo 2022-05-10 20:45:59 +00:00
Alexandre Flament 9b3efa6d8a theme: remove __common__ 2022-05-07 19:40:48 +02:00
Markus Heiser 37493b0a1e [doc] add some documentation about the limiter plugin (and redis)
Requested-by: https://github.com/searxng/searxng/discussions/993#discussioncomment-2396914
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-03-25 11:26:10 +01:00
Alexandre Flament 56e34947a6 [mod] infinite_scroll as preference
* oscar theme: code from searx/plugins/infinite_scroll.py
* simple theme: new implementation

Co-authored-by: Markus Heiser <markus.heiser@darmarIT.de>
2022-02-20 22:58:51 +01:00
Alexandre Flament 29182eb1c9
Merge pull request #899 from dalf/limiter_update
[limiter] update
2022-02-18 22:17:26 +01:00
Markus Heiser 7352c6bc79 [mod] templates: rename field for <iframe> URL to iframe_src
Rename result field data_src to iframe_src

Suggested-by: @dalf https://github.com/searxng/searxng/pull/882#issuecomment-1037997402
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-18 19:00:49 +01:00
Markus Heiser 795e8af61d [fix] hostname_replace.py: don't stop replace URL in fields
This is a rewrite of the hostname_replace.py that:

- no longer stops replacing URLs in fields ('data_src', 'audio_src') if there
  isn't a 'parsed_url',
- adds a comment about keeping or removing a result from the result list
- adds a loop over ['data_src', 'audio_src'] instead of duplicating code lines

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-18 18:59:58 +01:00
Alexandre Flament d1b7debac6 [limiter] update 2022-02-17 20:27:02 +01:00
Markus Heiser 98cab4cf75 [mod] result_templates/default.html replace embedded HTML by data_src audio_src
Embedded HTML breaks SearXNG architecture.  To modularize, HTML is generated in
the templates (oscar & simple) and result parameter 'embedded' is replaced by
'data_src' (and 'audio_src'), a URL for embedded content (<iframe>).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-13 14:20:47 +01:00
Markus Heiser b9a2e8b387 [mod] hostname_replace: replace hostnames in result's data_src param
To test, you need to redirect embedded videos (e.g.) from YouTube to an Invidious
instance.  Search for videos using engine `!youtube lebowski`.  The result URLs
and the embedded videos should link to the Invidious instance.

Here is an example of such a `hostname_replace` configuration::

    hostname_replace:

      # youtube --> Invidious

      '(.*\.)?youtube-nocookie\.com': 'invidio.xamh.de'
      '(.*\.)?youtube\.com$': 'invidio.xamh.de'
      '(.*\.)?invidious\.snopyta\.org$': 'invidio.xamh.de'
      '(.*\.)?vid\.puffyan\.us': 'invidio.xamh.de'
      '(.*\.)?invidious\.kavin\.rocks$': 'invidio.xamh.de'
      '(.*\.)?inv\.riverside\.rocks$': 'invidio.xamh.de'
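
How such a mapping could be applied to a URL is sketched here with two of the
patterns from the configuration. The function name and the use of
``urllib.parse`` are assumptions for illustration, not the plugin's code::

```python
import re
from urllib.parse import urlparse

# two of the patterns from the configuration above
replacements = {
    re.compile(r"(.*\.)?youtube-nocookie\.com"): "invidio.xamh.de",
    re.compile(r"(.*\.)?youtube\.com$"): "invidio.xamh.de",
}


def replace_hostname(url):
    """Return the URL with its hostname replaced by the first matching rule."""
    parsed = urlparse(url)
    for pattern, replacement in replacements.items():
        if pattern.search(parsed.netloc):
            new_netloc = pattern.sub(replacement, parsed.netloc)
            return parsed._replace(netloc=new_netloc).geturl()
    return url  # no rule matched, keep the URL unchanged
```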

Closes: https://github.com/searxng/searxng/issues/873
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-13 14:20:47 +01:00
Alexandre Flament b99ccd7c02 plugin limiter: check only /image_proxy and /search
also adjust the number of req/time
2022-02-12 15:57:07 +01:00
Alexandre Flament f79b0fce06 [enh] limiter plugin
can replace filtron:
* rate limit the number of requests per IP and per (IP, User-Agent)
* block some bots

use Redis
data stored in Redis never contains the IP addresses, only HMAC using the secret_key
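
A minimal sketch of deriving such a key, assuming SHA-256 as the digest. The
function name and the digest choice are assumptions; the commit message only
states that an HMAC based on the secret_key is stored::

```python
import hashlib
import hmac


def anonymize_ip(ip, secret_key):
    """Return a keyed hash of the IP that can serve as a Redis key."""
    return hmac.new(
        secret_key.encode("utf-8"),  # key: the instance's secret_key
        ip.encode("utf-8"),          # message: the client IP address
        hashlib.sha256,
    ).hexdigest()
```

The same IP always maps to the same key, so counters still work, but the IP
itself cannot be read back from the stored value without the secret.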

Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-02 09:15:59 +01:00
Martin Fischer 6d43cf7952 [typing] add optional attrs to Plugin 2022-01-17 11:42:48 +01:00
Martin Fischer bb06758a7b [refactor] add type hints & remove Setting._post_init
Previously the Setting classes used a horrible _post_init
hack that prevented proper type checking.
2022-01-06 14:21:14 +01:00
Markus Heiser 3d96a9839a [format.python] initial formatting of the python code
This patch was generated by black [1]::

    make format.python

[1] https://github.com/psf/black

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-27 09:26:22 +01:00
Markus Heiser fcdc2c2cd2 [format.python] disable py code formatting for some hunks of code
Disable the python code formatting from python-black, where the readability of
code suffers by formatting.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-27 09:16:03 +01:00
Markus Heiser 5731b6b700 [mod] searx.plugins.prepare_package_resources() - use generators
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-07 18:41:56 +02:00
Markus Heiser aa5a5147b2 [fix] searx.plugins.initialize() - don't miss module & module-name
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-07 18:41:41 +02:00
Alexandre Flament 2b4fef7118 plugins: refactor initialization
add a new function "init" that is called when the app starts.
The function can:
* return False to disable the plugin.
* modify the Flask app.
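
A hypothetical plugin hook along these lines, assuming ``init`` receives the
Flask app and the settings as a dict. The signature, the settings keys and the
config key are assumptions for illustration::

```python
def init(app, settings):
    """Hypothetical plugin hook, called once when the app starts."""
    # disable the plugin when a required setting is missing
    if not settings.get("redis", {}).get("url"):
        return False
    # modify the app, here by storing a flag in its config mapping
    app.config["EXAMPLE_PLUGIN_ENABLED"] = True
    return True
```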
2021-10-06 19:18:19 +02:00
Alexandre Flament 0f43b39eac [enh] add hostname_replace plugin
* backport of https://github.com/searx/searx/pull/2724
* allow removing a result if the replacement is the boolean value false
2021-09-11 13:23:06 +02:00
Alexandre Flament b941763e20 [mod] ahmia_filter: use on_result instead of post_search
see commit 6c9ae7911e9639bc46cd53af215734b4bdb61ba9
2021-09-09 11:31:46 +02:00
Alexandre Flament fc20c561bf [mod] oa_doi_rewrite plugin: get_doi_resolver: remove args parameter
doi_resolvers.get_value('preferences') already contains the value from
request.args.get('doi_resolver')
2021-09-07 19:14:36 +02:00
Alexandre Flament 3f3b5d6181 [mod] plugins: minor change
required attributes: display a different message
when the attribute has the wrong type
2021-08-14 18:03:31 +02:00
Alexandre Flament 881659ca9d [mod] oscar theme: /preferences : HTML detail order match visual tabs
First details about the general tab, then details about the UI tab, etc.
No functional change
2021-06-17 15:29:07 +02:00