Follow up of #2269
The script to update the descriptions of the engines does no longer work since
PR #2269 has been merged.
searx/engines/wikipedia.py
==========================
1. There was a misusage of zh-classical.wikipedia.org:
- `zh-classical` is dedicate to classical Chinese [1] which is not
traditional Chinese [2].
- zh.wikipedia.org has LanguageConverter enabled [3] and is going to
dynamically show simplified or traditional Chinese according to the
HTTP Accept-Language header.
2. The update_engine_descriptions.py needs a list of all wikipedias. The
implementation from #2269 included only a reduced list:
- https://meta.wikimedia.org/wiki/Wikipedia_article_depth
- https://meta.wikimedia.org/wiki/List_of_Wikipedias
searxng_extra/update/update_engine_descriptions.py
==================================================
Before PR #2269 there was a match_language() function that did an approximation
using various methods. With PR #2269 there are only the types in the data model
of the languages, which can be recognized by babel. The approximation methods,
which are needed (only here) in the determination of the descriptions, must be
replaced by other methods.
[1] https://en.wikipedia.org/wiki/Classical_Chinese
[2] https://en.wikipedia.org/wiki/Traditional_Chinese_characters
[3] https://www.mediawiki.org/wiki/Writing_systems#LanguageConverter
Closes: https://github.com/searxng/searxng/issues/2330
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
BTW this fix an issue in wikipedia: SearXNG's locales zh-TW and zh-HK are now
using language `zh-classical` from wikipedia (and not `zh`).
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Implements a fetch_traits function for the Wikipedia engines.
.. note::
Does not include migration of the request methode from 'supported_languages'
to 'traits' (EngineTraits) object!
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Most engines that support languages (and regions) use the Accept-Language from
the WEB browser to build a response that fits to the language (and region).
- add new engine option: send_accept_language_header
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
settings.yml:
* outgoing.networks:
* can contains network definition
* propertiers: enable_http, verify, http2, max_connections, max_keepalive_connections,
keepalive_expiry, local_addresses, support_ipv4, support_ipv6, proxies, max_redirects, retries
* retries: 0 by default, number of times searx retries to send the HTTP request (using different IP & proxy each time)
* local_addresses can be "192.168.0.1/24" (it supports IPv6)
* support_ipv4 & support_ipv6: both True by default
see https://github.com/searx/searx/pull/1034
* each engine can define a "network" section:
* either a full network description
* either reference an existing network
* all HTTP requests of engine use the same HTTP configuration (it was not the case before, see proxy configuration in master)
fr.wikipedia.org (and it seems not other wikipedia websites),
adds HTML to api_result['displayTitle'].
(Search for '!wp :fr Braid' for example)
The commit uses api_result['title']
on some queries (like an IT error message), wikipedia returns an HTTP error 400.
this commit returns an empty result instead of showing an error to the user.
check HTTP response:
* detect some comme CAPTCHA challenge (no solving). In this case the engine is suspended for long a time.
* otherwise raise HTTPError as before
the check is done in poolrequests.py (was before in search.py).
update qwant, wikipedia, wikidata to use raise_for_httperror instead of raise_for_status
Add a new parameter "raise_for_status", set by default to True.
When True, any HTTP status code >= 300 raise an exception ( #2332 )
When False, the engine can manage the HTTP status code by itself.
Add match_language function in utils to match any user given
language code with a list of engine's supported languages.
Also add language_aliases dict on each engine to translate
standard language codes into the custom codes used by the engine.