The request function should not request a language (aka locale) that is not
supported by qwant. Selecting a locale like zh-TW results in a qwant API error::

  ERROR searx.engines.qwant news: exception : \
  API error::locale must be one of the following values: \
  en_gb, en_ie, en_us, en_ca, en_my, en_au, en_nz, de_de, de_ch, de_at, fr_fr, \
  fr_be, fr_ch, fr_ca, fr_ad, fc_ca, co_fr, es_es, es_ar, es_cl, es_co, es_mx, \
  es_pe, es_ad, ca_es, ca_ad, ca_fr, eu_es, eu_fr, it_it, it_ch, pt_pt, pt_ad, \
  nl_be, nl_nl

The existing searx.utils.match_language function is unsuitable for this purpose;
it is replaced by the function searx.locales.get_engine_locale, which is based on
methods from the babel package.
The qwant engine's _fetch_supported_languages function has been revised to filter
out languages (aka locales) not supported by qwant.
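The idea of only sending locales that qwant accepts can be illustrated with a
small, stdlib-only sketch. It is not searx.locales.get_engine_locale (which
uses babel); the helper name pick_qwant_locale and its fallback order (exact
match, then ``<lang>_<lang>``, then the first regional variant) are
assumptions made for illustration. The locale set is copied from the API error
above.

```python
# Locales accepted by the qwant API, copied verbatim from the error message.
QWANT_LOCALES = {
    "en_gb", "en_ie", "en_us", "en_ca", "en_my", "en_au", "en_nz",
    "de_de", "de_ch", "de_at", "fr_fr", "fr_be", "fr_ch", "fr_ca", "fr_ad",
    "fc_ca", "co_fr", "es_es", "es_ar", "es_cl", "es_co", "es_mx", "es_pe",
    "es_ad", "ca_es", "ca_ad", "ca_fr", "eu_es", "eu_fr", "it_it", "it_ch",
    "pt_pt", "pt_ad", "nl_be", "nl_nl",
}


def pick_qwant_locale(requested, supported=QWANT_LOCALES, default=None):
    """Hypothetical helper: map a SearXNG locale such as 'de-CH' or 'zh-TW'
    to a locale qwant accepts, or return `default` when nothing fits."""
    tag = requested.replace("-", "_").lower()
    # 1. exact match, e.g. 'de-CH' -> 'de_ch'
    if tag in supported:
        return tag
    lang = tag.split("_")[0]
    # 2. the "main" region of the language, e.g. 'de' -> 'de_de'
    preferred = f"{lang}_{lang}"
    if preferred in supported:
        return preferred
    # 3. any regional variant of the language, picked deterministically
    candidates = sorted(loc for loc in supported if loc.split("_")[0] == lang)
    # 4. no match at all (e.g. 'zh-TW'): never send an unsupported locale
    return candidates[0] if candidates else default
```

With this sketch ``pick_qwant_locale('zh-TW')`` returns ``None``, so a request
function could simply omit the locale instead of provoking the API error.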
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In PR #1071 the language catalog of dailymotion was cleaned up; before that, it
contained over 7000 "languages".
As a side effect of this clean-up, the language & region catalog in SearXNG has
been reduced [1].
This patch reduces ``min_engines_per_lang`` from 13 to 12 to bring the missing
languages back into the language & region catalog of SearXNG.
[1] 3bb62823ec (diff-f3f00db0f87f95b882624a192e0aac21525638af0b18c9514e765fcf1991678d)
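How a threshold like ``min_engines_per_lang`` prunes a catalog can be sketched
as follows, assuming a language is kept only when at least that many engines
report it. The function name and the input shape (engine name mapped to a list
of language codes) are hypothetical simplifications, not SearXNG's update
script.

```python
from collections import Counter


def languages_in_catalog(engine_languages, min_engines_per_lang):
    """Keep a language only if at least `min_engines_per_lang` engines list it.

    `engine_languages` maps an engine name to its language codes
    (a hypothetical, simplified stand-in for the real data)."""
    counts = Counter()
    for codes in engine_languages.values():
        counts.update(set(codes))  # count each engine at most once per language
    return sorted(lang for lang, n in counts.items() if n >= min_engines_per_lang)


engines = {"a": ["en", "de"], "b": ["en", "de"], "c": ["en"]}
print(languages_in_catalog(engines, 3))  # only 'en' reaches 3 engines
print(languages_in_catalog(engines, 2))  # lowering the threshold brings 'de' back
```

Lowering the threshold by one is enough to re-admit every language that lost
exactly one supporting engine, which mirrors the effect described above.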
Requested-by: @tiekoetter in a Matrix chat
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- fix the issue of fetching more than 7000 *languages*
- improve the request function and filter by language & country
- implement time_range_support & safesearch
- add more fields to the response from dailymotion (allow_embed, length)
- better clean up of HTML tags in the 'content' field.
This is more or less a complete rework based on the '/videos' API from [1].
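Cleaning HTML tags out of a 'content' field can be done with the standard
library alone. The following is a minimal sketch using
``html.parser.HTMLParser``; it is not the dailymotion engine's actual code, and
the helper name html_to_text is chosen here for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True turns '&amp;' etc. into plain characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    """Drop all tags from `fragment` and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any trailing text still buffered by the parser
    return " ".join("".join(parser.parts).split())


print(html_to_text("<p>Hello <b>world</b></p>\n<p>second &amp; line</p>"))
```

Using a real parser instead of a regular expression avoids leaving fragments of
attributes or unescaped entities in the result.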
This patch cleans up the language list in SearXNG, which had been polluted by
the ISO-639-3 2- and 3-letter codes of dailymotion languages that have never
been used.
[1] https://developers.dailymotion.com/tools/
Closes: https://github.com/searxng/searxng/issues/1065
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Languages are supported by mapping the language to a domain. If the domain is
not found in :py:obj:`lang2domain`, the URL ``<lang>.search.yahoo.com`` is used.
BTW: this also fixes the issue reported at https://github.com/searx/searx/issues/3020
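The lookup with its fallback fits in a single dictionary call. In this sketch
the helper name yahoo_domain and the table entries are hypothetical; the real
``lang2domain`` in the yahoo engine may contain different entries.

```python
def yahoo_domain(lang, lang2domain):
    """Return the Yahoo domain for `lang`, else '<lang>.search.yahoo.com'."""
    return lang2domain.get(lang, f"{lang}.search.yahoo.com")


# Illustrative table only, not the engine's actual mapping.
example_lang2domain = {"en": "search.yahoo.com"}

print(yahoo_domain("en", example_lang2domain))  # found in the table
print(yahoo_domain("de", example_lang2domain))  # falls back to de.search.yahoo.com
```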
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The implementation uses the Qwant API (https://api.qwant.com/v3). The API is
undocumented but can be reverse-engineered by reading the network log of
https://www.qwant.com/ queries.
This implementation is used by different qwant engines in the settings.yml::

  - name: qwant
    categories: general
    ...
  - name: qwant news
    categories: news
    ...
  - name: qwant images
    categories: images
    ...
  - name: qwant videos
    categories: videos
    ...

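One way a single implementation can serve all four engines is to derive the
API path from the engine's category. The sketch below assumes a mapping to
path segments (web, news, images, videos) under https://api.qwant.com/v3/
and query parameters named q, count, offset and locale; since the API is
undocumented, these names are assumptions, and build_qwant_url is a
hypothetical helper.

```python
from urllib.parse import urlencode

API_URL = "https://api.qwant.com/v3/"

# Assumed mapping from a SearXNG category to a qwant API path segment.
CATEGORY_TO_KEYWORD = {
    "general": "web",
    "news": "news",
    "images": "images",
    "videos": "videos",
}


def build_qwant_url(category, query, locale, count=10, offset=0):
    """Build a search URL for the engine configured with `category`."""
    keyword = CATEGORY_TO_KEYWORD[category]
    # parameter names are assumptions based on the reverse-engineered API
    params = {"q": query, "count": count, "offset": offset, "locale": locale}
    return f"{API_URL}search/{keyword}?{urlencode(params)}"


print(build_qwant_url("news", "free software", "en_us"))
```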
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The old XPath configuration for Google Scholar did not work and has been
replaced by a Python implementation.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
To get meaningful diffs, the JSON file has to be sorted. Before applying any
further content patch, the JSON file needs an initial sort (without changing any
content).
Sorted by::

    import json

    # Re-write the file with sorted keys; the data itself stays unchanged.
    with open('engines_languages.json') as f:
        j = json.load(f)
    with open('engines_languages.json', 'w') as f:
        json.dump(j, f, indent=2, sort_keys=True)

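That the initial sort changes only the layout can be checked in memory: the
re-serialized text must decode to the same data. This is a small sketch with a
hypothetical helper, resort_json_text. Note that ``sort_keys`` sorts object
keys only; lists keep their original order.

```python
import json


def resort_json_text(text):
    """Re-serialize JSON `text` with sorted keys and 2-space indentation."""
    return json.dumps(json.loads(text), indent=2, sort_keys=True)


original = '{"b": [3, 1], "a": {"y": 1, "x": 2}}'
resorted = resort_json_text(original)

# Only the layout changed: both texts decode to equal Python objects.
assert json.loads(resorted) == json.loads(original)
print(resorted)
```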
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Instead of a single line with 500000 characters, use nicely formatted JSON.
Sort the lists in engine_languages.py so that differences are easier to see
when updating (search engines do change the order in which they list their
languages).
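Because ``json.dump(..., sort_keys=True)`` does not touch list order, the lists
have to be sorted separately. A minimal sketch, assuming only lists made
entirely of strings should be sorted (lists of other values are left as-is to
avoid comparing mixed types); sort_lists is a hypothetical helper and the
engine data is illustrative.

```python
import json


def sort_lists(obj):
    """Recursively sort every list that contains only strings."""
    if isinstance(obj, dict):
        return {key: sort_lists(value) for key, value in obj.items()}
    if isinstance(obj, list):
        items = [sort_lists(item) for item in obj]
        if all(isinstance(item, str) for item in items):
            return sorted(items)
        return items  # mixed or non-string items keep their order
    return obj


# Illustrative data: an engine reports its languages in arbitrary order.
data = {"qwant": ["fr-FR", "de-DE", "en-US"], "bing": ["en-US", "de-DE"]}
print(json.dumps(sort_lists(data), indent=2, sort_keys=True))
```

With both keys and lists sorted, a re-ordered upstream list no longer shows up
as a diff.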
Add a match_language function in utils to match any user-given language code
against an engine's list of supported languages.
Also add a language_aliases dict to each engine to translate standard language
codes into the custom codes used by the engine.
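The matching idea can be sketched in a few lines. This is a hypothetical
simplification named match_language_sketch, not the actual
searx.utils.match_language; the match order (exact code, bare language,
any regional variant, then a fallback of 'en-US') is an assumption. The
aliases dict plays the role of an engine's language_aliases.

```python
def match_language_sketch(locale_code, lang_list, custom_aliases=None, fallback="en-US"):
    """Return the engine code that best matches `locale_code`, else `fallback`."""
    aliases = custom_aliases or {}
    lang = locale_code.split("-")[0]
    if locale_code in lang_list:
        match = locale_code  # exact match, e.g. 'de-AT'
    elif lang in lang_list:
        match = lang  # language-only entry, e.g. 'de'
    else:
        # any regional variant of the same language, e.g. 'de-DE'
        match = next((c for c in lang_list if c.split("-")[0] == lang), fallback)
    # translate the standard code into the engine's custom code, if one exists
    return aliases.get(match, match)


print(match_language_sketch("de-AT", ["en-US", "de-DE"]))
print(match_language_sketch("zh-TW", ["zh-TW", "en-US"], {"zh-TW": "zh_cht"}))
```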
Since languages.py can change, users may query a language that is no longer on
the list, even if a few engines still recognize it.
Also treat no and nb as the same language because they seem to return the same
results, though most engines support only one or the other.
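Treating the two Norwegian codes as one can be expressed as a normalization
step before comparing languages. A minimal sketch; the EQUIVALENT_LANGUAGES
table and the helper same_language are hypothetical names.

```python
# Hypothetical table of language codes that should compare as equal.
EQUIVALENT_LANGUAGES = {"nb": "no"}


def same_language(code_a, code_b):
    """Compare the language part of two codes, treating 'nb' and 'no' as equal."""

    def normalize(code):
        lang = code.split("-")[0].lower()
        return EQUIVALENT_LANGUAGES.get(lang, lang)

    return normalize(code_a) == normalize(code_b)


print(same_language("nb-NO", "no"))  # True
print(same_language("nb", "de"))     # False
```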