Implement a scraper for DuckDuckGo-Lite [1]. The existing DuckDuckGo [2]
engine does not support paging. DuckDuckGo-Lite is much faster, less verbose
and does offer a paging option (reverse engineered from the input form of [1]).
[1] https://lite.duckduckgo.com/lite
[2] https://duckduckgo.com/
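Below is a minimal sketch of what such an engine module can look like.  The
form fields (``q``, ``s``), the HTML selectors and the page size of 30 are
assumptions derived from reading the input form of [1], not a documented API::

    # sketch of a searx/SearXNG style engine for DuckDuckGo-Lite (illustrative)
    from lxml import html

    paging = True
    url = 'https://lite.duckduckgo.com/lite'

    def request(query, params):
        params['url'] = url
        params['method'] = 'POST'
        params['data'] = {'q': query}
        if params['pageno'] > 1:
            # offset taken from the hidden inputs of the paging form (assumption)
            params['data']['s'] = (params['pageno'] - 1) * 30
        return params

    def response(resp):
        results = []
        dom = html.fromstring(resp.text)
        # the 'result-link' class is an assumption about the markup
        for link in dom.xpath('//a[contains(@class, "result-link")]'):
            results.append({
                'url': link.get('href'),
                'title': link.text_content().strip(),
                'content': '',
            })
        return results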
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
If there is no write access, there is no need for ``global``. Remove the
``global`` statement if there is no assignment.
global-variable-not-assigned:
  Using global for names but no assignment is done. Used when a variable is
  defined through the "global" statement but no assignment to this variable is
  done.
In Pylint 2.11 the global-variable-not-assigned checker now catches global
variables that are never reassigned in a local scope and catches (reassigned)
functions [1][2]
[1] https://pylint.pycqa.org/en/latest/whatsnew/2.11.html
[2] https://github.com/PyCQA/pylint/issues/1375
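A minimal, illustrative example of the pattern this check flags and the fix::

    import logging

    logger = logging.getLogger(__name__)

    def log_status():
        global logger                 # W0602: global-variable-not-assigned
        logger.info("the name is only read here, never rebound")

    def log_status_fixed():
        # reading a module-level name needs no ``global`` statement
        logger.info("the name is only read here, never rebound")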
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The openstreetmap engine imports code from the wikidata engine.
Before this commit, specific code made sure to copy the logger variable to the wikidata engine.
With this commit, searx.engines.load_engine makes sure the .logger is initialized.
The implementation scans sys.modules for module names starting with searx.engines.
Closes #298
This is a workaround: inside engine code, any call to a function in another engine can crash
since the logger won't be initialized unless it is done explicitly.
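A sketch of the idea behind the sys.modules scan (names and the exact logger
construction are illustrative, not the literal searx code)::

    import logging
    import sys

    def initialize_engine_loggers():
        # give every already imported engine module a .logger, so that code
        # imported from another engine (e.g. wikidata used by openstreetmap)
        # never runs with an uninitialized logger
        for module_name, module in list(sys.modules.items()):
            if not module_name.startswith('searx.engines.'):
                continue
            if module is None or hasattr(module, 'logger'):
                continue
            engine_name = module_name.split('.')[-1]
            module.logger = logging.getLogger('searx.engines.' + engine_name)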
Instead of raising an exception and thereby hiding all results of the engine,
it makes sense to remove that requirement in order to allow the implementation of
search engines that do not always have a description. In fact, some search
engines that have a description in 99% of cases, like Brave Search or Mojeek,
crash completely if they for some reason include a result with no description.
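A hedged sketch of the relaxed handling (the function name is illustrative;
the actual change lives in the result container)::

    def normalize_content(result):
        # before: a result without a description raised an exception and all
        # results of the engine were dropped
        # after: fall back to an empty description instead
        result['content'] = result.get('content') or ''
        return result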
To test this patch, try Mojeek:
!mjk xyz
before and after the patch.
Suggested-by: 0xhtml in https://github.com/searx/searx/discussions/2933
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Pylint 2.10 added new default checks [1]:
use-list-literal
  Emitted when list() is called with no arguments instead of using []
use-dict-literal
  Emitted when dict() is called with no arguments instead of using {}
[1] https://pylint.pycqa.org/en/latest/whatsnew/2.10.html
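For illustration::

    # flagged by use-list-literal / use-dict-literal
    results = list()
    headers = dict()

    # preferred: use the literals
    results = []
    headers = {}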
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The implementation uses the Qwant API (https://api.qwant.com/v3). The API is
undocumented but can be reverse engineered by reading the network log of
queries made on https://www.qwant.com/.
This implementation is used by different qwant engines in the settings.yml::
  - name: qwant
    categories: general
    ...
  - name: qwant news
    categories: news
    ...
  - name: qwant images
    categories: images
    ...
  - name: qwant videos
    categories: videos
    ...
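A hedged sketch of how one module can serve all of these entries, switching on
the configured category (the endpoint path, parameters and page size are
reverse engineered guesses, not a documented API)::

    from urllib.parse import urlencode

    # mapping from the engine's category to the (assumed) API keyword
    category_to_keyword = {
        'general': 'web',
        'news': 'news',
        'images': 'images',
        'videos': 'videos',
    }

    categories = ['general']
    paging = True

    def request(query, params):
        keyword = category_to_keyword.get(categories[0], 'web')
        args = urlencode({
            'q': query,
            'count': 10,                              # assumed page size
            'offset': (params['pageno'] - 1) * 10,
        })
        params['url'] = 'https://api.qwant.com/v3/search/%s?%s' % (keyword, args)
        return params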
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In the EU there exists a "General Data Protection Regulation" [1] aka GDPR (BTW:
very user friendly!) which requires consent to tracking. To get the consent
from the user, google-news requests are redirected to confirm and get a CONSENT
Cookie from https://consent.google.de/s?continue=...
This patch adds a CONSENT Cookie to the google-news request to avoid
redirection.
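A sketch of the added bit in the google-news request (the cookie value follows
the pattern observed in the samples below; it is an assumption that the exact
trailing numbers do not matter)::

    from datetime import datetime

    def request(query, params):
        # ... build the google-news URL as before ...
        # attach a CONSENT cookie so the request is not redirected to
        # https://consent.google.de/s?continue=...
        params['cookies']['CONSENT'] = "YES+cb.%s-14-p0.de+FX+816" % (
            datetime.now().strftime("%Y%m%d")
        )
        return params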
The behavior of the CONSENT cookies over all google engines seems similar, but
the pattern is not yet fully clear to me; here are some random samples from my
analysis.
Using common google search from different domains::
google.com: CONSENT=YES+cb.{{date}}-14-p0.de+FX+816
google.de: CONSENT=YES+cb.{{date}}-14-p0.de+FX+333
google.fr: CONSENT=YES+srp.gws-{{date}}-0-RC2.fr+FX+826
When searching about videos (google-videos)::
google.es: CONSENT=YES+srp.gws-{{date}}-0-RC2.es+FX+076
google.de: CONSENT=YES+srp.gws-{{date}}-0-RC2.de+FX+171
Google news has only one domain for all languages::
news.google.com: CONSENT=YES+cb.{{date}}-14-p0.de+FX+816
Using google-scholar search from different domains::
scholar.google.de: CONSENT=YES+cb.{{date}}-14-p0.de+FX+333
scholar.google.fr: does not use such a cookie / did not ask the user
scholar.google.es: does not use such a cookie / did not ask the user
Interim summary:
Pattern is unclear and I won't apply the CONSENT cookie to all google engines.
More experience is needed before we generalize the CONSENT cookies over all
google engines.
Related:
- e9a6ab401 [fix] youtube - send CONSENT Cookie to not be redirected
- https://github.com/benbusby/whoogle-search/issues/311
- https://github.com/benbusby/whoogle-search/issues/243
[1] https://en.wikipedia.org/wiki/General_Data_Protection_Regulation
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Since we added
- 1c67b6aec [enh] google engine: supports "default language"
there is a KeyError: 'hl' in request; error pattern::
    ERROR:searx.searx.search.processor.online:engine google news : exception : 'hl'
    Traceback (most recent call last):
      File "searx/search/processors/online.py", line 144, in search
        search_results = self._search_basic(query, params)
      File "searx/search/processors/online.py", line 118, in _search_basic
        self.engine.request(query, params)
      File "searx/engines/google_news.py", line 97, in request
        if lang_info['hl'] == 'en':
    KeyError: 'hl'
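The shape of the fix is to tolerate a missing 'hl' (illustrative only; the
real change is in google_news.py and get_lang_info())::

    def needs_english_handling(lang_info):
        # before: lang_info['hl'] raised KeyError, because the "default
        # language" choice "(all)" no longer sets 'hl'
        return lang_info.get('hl') == 'en'

    # no KeyError with the "(all)" default language
    assert needs_english_handling({}) is False
    assert needs_english_handling({'hl': 'en'}) is True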
Closes: https://github.com/searxng/searxng/issues/154
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Same behaviour as Whoogle [1]. Only the google engine with the
"Default language" choice "(all)" is changed by this patch.
When searching for a local place, the results are in the expected language,
without missing results [2]:
> When a language is not specified, the language interpretation is left up to
> Google to decide how the search results should be delivered.
The query parameters are copied from Whoogle. With the ``all`` language:
- add parameter ``source=lnt``
- don't use parameter ``lr``
- don't add an ``Accept-Language`` HTTP header.
The new signature of function ``get_lang_info()`` is::

    lang_info = get_lang_info(params, lang_list, custom_aliases, supported_any_language)
Argument ``supported_any_language`` is True for google.py and False for the other
google engines. With this patch the function now returns:
- query parameters: ``lang_info['params']``
- HTTP headers: ``lang_info['headers']``
- and as before this patch:
- ``lang_info['subdomain']``
- ``lang_info['country']``
- ``lang_info['language']``
[1] https://github.com/benbusby/whoogle-search
[2] https://github.com/benbusby/whoogle-search/releases/tag/v0.5.4
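A hedged sketch of how an engine's request() consumes the returned dict (the
field names follow the list above; URL construction is simplified and
``lang_info`` is passed in for illustration, whereas in the engine it is
computed via ``get_lang_info()``)::

    from urllib.parse import urlencode

    def request(query, params, lang_info):
        # lang_info['params'] carries the language related query parameters
        # (e.g. source=lnt for "(all)", hl/lr for a concrete language) and
        # lang_info['headers'] the optional Accept-Language header
        args = dict(q=query)
        args.update(lang_info['params'])
        params['url'] = 'https://%s/search?%s' % (
            lang_info['subdomain'],          # e.g. 'www.google.de'
            urlencode(args),
        )
        params['headers'].update(lang_info['headers'])
        return params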