We have built up detailed documentation of the *settings* and the *engines* over
the past few years. However, this documentation was still spread over various
chapters and was difficult to navigate in its entirety.
This patch rearranges the Settings & Engines documentation for better
readability.
To review new ordered docs::
make docs.clean docs.live
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The renderuing of the WEB page is very strange; except the firts position all
other positions of Anna's result page are enclosed in SGML comments. These
cooments are *uncommented* by some JS code, see query of the class
'.js-scroll-hidden' in Anna's HTML template [1].
[1] https://annas-software.org/AnnaArchivist/annas-archive/-/blob/main/allthethings/templates/macros/md5_list.html
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Replace lists with one item by the item, not before last currency has been
added. In this traceback 'MXN' is added to 'pesos' while pesos is no longer a
list as the optimization was carried out too early.
$ ./local/py3/bin/python searxng_extra/update/update_currencies.py
Traceback (most recent call last):
File "searxng_extra/update/update_currencies.py", line 164, in <module>
main()
File "searxng_extra/update/update_currencies.py", line 157, in main
add_currency_name(db, "pesos", 'MXN')
File "searxng_extra/update/update_currencies.py", line 89, in add_currency_name
iso4217_set.insert(0, iso4217)
AttributeError: 'str' object has no attribute 'insert'
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- torznab engine using types and clearer code
- torznab option to hide torrent and magnet links.
- document the torznab engine
- add myself to authors
Closes: https://github.com/searxng/searxng/issues/1124
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
It seems that Google is rolling out a modified WEB API [1][2].
In the past there was only the UI language in the `hl` argument but nowadays it
seems a combination of the UI language and the "search region" is mixed in this
argument and the `gl` argument has been removed. I'm very surprised that google
is starting to mix the parameters of the UI with the parameters of the search
index.
This patch modifies the get_google_info(..) function. Beside Google-WEB this
function is also used by other Google services, here are some examples to test
region & language of ..
- Google-WEB: `!go dragon boat :en-CA`
- Google-News: `!gon dragon boat :en-CA`
- Google-Videos: `!gov bmw :en-CA`
- Goolge-Images `!goi bmw :en-CA`
- [1] https://github.com/searxng/searxng/issues/2515#issuecomment-1606294635
- [2] https://github.com/searxng/searxng/issues/2515#issuecomment-1607150817
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This patch fixes some quirks and issues related to the engines and the network.
Each engine has its own network and this network was broken for the following
engines[1]:
- archlinux
- bing
- dailymotion
- duckduckgo
- google
- peertube
- startpage
- wikipedia
Since the files have been touched anyway, the type annotaions of the engine
modules has also been completed so that error messages from the type checker are
no longer reported.
Related and (partial) fixed issue:
- [1] https://github.com/searxng/searxng/issues/762#issuecomment-1605323861
- [2] https://github.com/searxng/searxng/issues/2513
- [3] https://github.com/searxng/searxng/issues/2515
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This patch implements a simple JSONEncoder just to fix#2502 / on the long term
SearXNG needs a data schema for the result items and a json generator for the
result list.
Closes: https://github.com/searxng/searxng/issues/2505
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Over the years the webapp module became more and more a mess. To improve the
modulaization a little this patch moves some implementations from the webapp
module to webutils module.
HINT: this patch brings non functional change
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
A blocklist and a passlist can be configured in /etc/searxng/limiter.toml::
[botdetection.ip_lists]
pass_ip = [
'51.15.252.168', # IPv4 of check.searx.space
]
block_ip = [
'93.184.216.34', # IPv4 of example.org
]
Closes: https://github.com/searxng/searxng/issues/2127
Closes: https://github.com/searxng/searxng/pull/2129
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
while PR #2357 [1] was being implemented the question came up:
would be better to change the PING resource from CSS to an image so that
some terminal based browser may still able to pass the test [1]
This patch implements a POC in where a <img src=token> tag is loaded instaed a
CSS.
To test this patch activate limiter and link_token method [3] and start a
developer instance::
make run
In your terminal browser open http://127.0.0.1:8888/search?q=foo
If the browser is suitable for the link_token method, it loads the image and the
following messages appear::
DEBUG searx.botdetection.limiter : OK 127.0.0.1/32: /clientft61aak7fzyu6o6v.svg ...
DEBUG searx.botdetection.link_token : token is valid --> True
DEBUG searx.botdetection.link_token : store ping_key for (client) network 127.0.0.1/32 (IP 127.0.0.1) -> SearXNG_limiter.ping[...]
Browsers that do not load images will be blocked: If you try by example::
lynx http://127.0.0.1:8888/search?q=foo
you will see a WARNING message like::
WARNING searx.botdetection.link_token : missing ping (IP: 127.0.0.1/32) / request: SearXNG_limiter.ping[...]
----
[1] 80aaef6c95
[2] https://github.com/searxng/searxng/pull/2357#issuecomment-1574898834
[3] activate limiter and link_token method
```diff
diff --git a/searx/botdetection/limiter.toml b/searx/botdetection/limiter.toml
index 71a231e8f..7e1dba755 100644
--- a/searx/botdetection/limiter.toml
+++ b/searx/botdetection/limiter.toml
@@ -17,6 +17,6 @@ ipv6_prefix = 48
filter_link_local = false
# acrivate link_token method in the ip_limit method
-link_token = false
+link_token = true
diff --git a/searx/settings.yml b/searx/settings.yml
index a82a3432d..e7b983afc 100644
--- a/searx/settings.yml
+++ b/searx/settings.yml
@@ -73,7 +73,7 @@ server:
# public URL of the instance, to ensure correct inbound links. Is overwritten
# by ${SEARXNG_URL}.
base_url: false # "http://example.com/location"
- limiter: false # rate limit the number of request on the instance, block some bots
+ limiter: true # rate limit the number of request on the instance, block some bots
# If your instance owns a /etc/searxng/settings.yml file, then set the following
# values there.
```
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The monolithic implementation of the limiter was divided into methods and
implemented in the Python package searx.botdetection. Detailed documentation on
the methods has been added.
The methods are divided into two groups:
1. Probe HTTP headers
- Method http_accept
- Method http_accept_encoding
- Method http_accept_language
- Method http_connection
- Method http_user_agent
2. Rate limit:
- Method ip_limit
- Method link_token (new)
The (reduced) implementation of the limiter is now in the module
searx.botdetection.limiter. The first group was transferred unchanged to this
module. The ip_limit contains the sliding windows implemented by the limiter so
far.
This merge also fixes some long outstandig issue:
- limiter does not evaluate the Accept-Language correct [1]
- limiter needs a IPv6 prefix to block networks instead of IPs [2]
Without additional configuration the limiter works as before (apart from the
bugfixes). For the commissioning of additional methods (link_toke), a
configuration must be made in an additional configuration file. Without this
configuration, the limiter runs as before (zero configuration).
The ip_limit Method implements the sliding windows of the vanilla limiter,
additionally the link_token method can be used in this method. The link_token
method can be used to investigate whether a request is suspicious. To activate
the link_token method in the ip_limit method add the following to your
/etc/searxng/limiter.toml::
[botdetection.ip_limit]
link_token = true
[1] https://github.com/searxng/searxng/issues/2455
[2] https://github.com/searxng/searxng/issues/2477
HINT: this patch has no functional change / it is the preparation for following changes and bugfixes
Over the years, the preferences template became an unmanageable beast. To make the source code more readable the monolith is splitted into elements. The splitting into elements also has the advantage that a new template can make use of them.
The reversed checkbox is a quirk that is only used in the prefereces and must be eliminated in the long term. For this the macro 'checkbox_onoff_reversed' was added to the preferences.html template. The 'checkbox' macro is also a quirk of the preferences.html we don't want to use in other templates (it is an input-checkbox in a HTML form that was misused for status display).
HINT: this patch has no functional change / it is the preparation for following
changes and bugfixes
Over the years, the preferences template became an unmanageable beast. To make
the source code more readable the monolith is splitted into elements. The
splitting into elements also has the advantage that a new template can make use
of them.
The reversed checkbox is a quirk that is only used in the prefereces and must be
eliminated in the long term. For this the macro 'checkbox_onoff_reversed' was
added to the preferences.html template. The 'checkbox' macro is also a quirk of
the preferences.html we don't want to use in other templates (it is an
input-checkbox in a HTML form that was misused for status display).
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In my tests I see bots rotating IPs (with endless IP lists). If such a bot has
100 IPs and has three attempts (SUSPICIOUS_IP_MAX = 3) then it can successfully
send up to 300 requests in one day while rotating the IP. To block the bots for
a longer period of time the SUSPICIOUS_IP_WINDOW, as the time period in which an
IP is observed, must be increased.
For normal WEB-browsers this is no problem, because the SUSPICIOUS_IP_WINDOW is
deleted as soon as the CSS with the token is loaded.
SUSPICIOUS_IP_WINDOW = 3600 * 24 * 30
Time (sec) before sliding window for one suspicious IP expires.
SUSPICIOUS_IP_MAX = 3
Maximum requests from one suspicious IP in the :py:obj:`SUSPICIOUS_IP_WINDOW`."""
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
For correct determination of the IP to the request the function
botdetection.get_real_ip() is implemented. This fonction is used in the
ip_limit and link_token method of the botdetection and it is used in the
self_info plugin.
A documentation about the X-Forwarded-For header has been added.
[1] https://github.com/searxng/searxng/pull/2357#issuecomment-1566211059
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>