Commit Graph

1068 Commits

Author SHA1 Message Date
Markus Heiser d2faea423a [fix] rewrite Yahoo-News engine
Many things have been changed since last review of this engine.  This patch fix
xpath selectors, implements suggestion and is a complete review / rewrite of the
engine.

Signed-off-by: Markus Heiser <markus@darmarit.de>
2021-03-08 11:43:34 +01:00
Adam Tauber 44f4a9d49a [enh] add ability to send engine data to subsequent requests 2021-03-06 12:12:35 +01:00
Markus Heiser 4845183128 [mod] don't dump traceback of SearxEngineResponseException on init
When initing engines a "SearxEngineResponseException" is logged very verbose,
including full traceback information:

    ERROR:searx.engines:yggtorrent engine: Fail to initialize
    Traceback (most recent call last):
      File "share/searx/searx/engines/__init__.py", line 293, in engine_init
        init_fn(get_engine_from_settings(engine_name))
      File "share/searx/searx/engines/yggtorrent.py", line 42, in init
        resp = http_get(url, allow_redirects=False)
      File "share/searx/searx/poolrequests.py", line 197, in get
        return request('get', url, **kwargs)
      File "share/searx/searx/poolrequests.py", line 190, in request
        raise_for_httperror(response)
      File "share/searx/searx/raise_for_httperror.py", line 60, in raise_for_httperror
        raise_for_captcha(resp)
      File "share/searx/searx/raise_for_httperror.py", line 43, in raise_for_captcha
        raise_for_cloudflare_captcha(resp)
      File "share/searx/searx/raise_for_httperror.py", line 30, in raise_for_cloudflare_captcha
        raise SearxEngineCaptchaException(message='Cloudflare CAPTCHA', suspended_time=3600 * 24 * 15)
    searx.exceptions.SearxEngineCaptchaException: Cloudflare CAPTCHA, suspended_time=1296000

For SearxEngineResponseException this is not needed.  Those types of exceptions
can be a normal use case.  E.g. for CAPTCHA errors like shown in the example
above. It should be enough to log a warning for such issues:

    WARNING:searx.engines:yggtorrent engine: Fail to initialize // Cloudflare CAPTCHA, suspended_time=1296000

closes: #2612

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-03-05 17:26:22 +01:00
Markus Heiser d48e2e7b0b [enh] google scholar - python implementation of the engine
The old xpath configuration for google scholar did not work and is replaced by a
python implementation.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-03-01 15:16:37 +01:00
Alexandre Flament f77983e174
Merge pull request #2602 from MarcAbonce/fix-bing-fetch-languages
Fix fetch_languages for Bing
2021-03-01 09:06:37 +01:00
GazoilKerozen 5f6ac3afa2
Add Freesound engine (#2596)
Add freesound engine with player.

Co-authored-by: Gazoil <maildeguzel@gmail.com>
2021-03-01 08:52:36 +01:00
Marc Abonce Seguin d6681fd33b remove articles number from engines_languages.json 2021-02-25 23:54:21 -07:00
Marc Abonce Seguin 9b6ffed061 fix fetch_languages for bing
Bing has a list of regions that it supports and some of these regions
may have more than one possible language.

In some cases, like Switzerland, these languages are always shown as
options, so there is no issue. But in other cases, like Andorra, Bing
will only show one language at the time, either the region's default or
the request's language if the latter is supported by that region.

For example, if the HTTP request is in French, Andorra will appear as
fr-AD but if the same page is requested in any other language Andorra
will appear as ca-AD.

This is specially a problem when Bing assumes that the request is in
English because it overrides enough language codes to make several major
languages like Arabic dissappear from the languages.py file.

To avoid that issue, I set the Accept-Language header to a language
that's only supported in one region to hopefully avoid these overrides.
2021-02-25 23:51:49 -07:00
Noémi Ványi 1be6ab2a91 Fix paging of Bing Images 2021-02-22 21:19:34 +01:00
datagram1 1d0a32a2c5 Added rumble.com video search engine. TODO video embedding.
Update rumble.py

some lines too long.

Disable Rumble engine

disabled : True

PEP8 fix

change line spacing
2021-02-20 12:48:56 +00:00
Alexandre Flament 44a6593c13
Merge pull request #2573 from unixfox/yggtorrent
update yggtorrent url + add it back
2021-02-16 08:22:07 +01:00
Emilien Devos 4b37e10dd9 fix yggtorrent url + add it back 2021-02-15 13:38:34 +01:00
Thorben Günther fbbd4cc21f
Improve peertube searching
At the moment videos without a description are not shown - setting
default content to "" fixes this.
Another current bug is that thumbnails are not displayed. This is caused
by a double slash in the url. For this every trailing slash is now
stripped (for backwards compatibility) and the API response is correctly
parsed.
2021-02-13 19:47:33 +01:00
Alexandre Flament 45027765e3
Merge pull request #2566 from dalf/remove-yandex
[remove] yandex engine
2021-02-12 17:12:07 +01:00
Alexandre Flament c22d4c764c [fix] duckduckgo engine: "!ddg !g" do not redirect to google
* searx understand "!ddg !g time" as : send "!g time" to DDG
* !g a DDG bang for Google: DDG return a HTTP redirect to Google

This commit adds a the allows_redirect param not to follow HTTP redirect.

The DDG engine returns a empty result as before without HTTP redirect.
2021-02-12 11:10:08 +01:00
Alexandre Flament d76660463b
Merge pull request #2562 from dalf/mod-json-engine
[mod] json_engine: add content_html_to_text and title_html_to_text
2021-02-12 10:58:28 +01:00
Alexandre Flament 7dcf67a47a
Merge pull request #2565 from dalf/upd-wikipedia
[upd] wikipedia engine: return an empty result on query with illegal characters
2021-02-12 10:57:05 +01:00
Alexandre Flament 2b60d0d243
Merge pull request #2564 from dalf/fix-seznam
[fix] fix seznam engine
2021-02-12 10:56:53 +01:00
Alexandre Flament 7e83818879
Merge pull request #2560 from dalf/fix-duckduckgo
Fix duckduckgo
2021-02-12 10:56:40 +01:00
Alexandre Flament 74c8b5606f
Merge pull request #2541 from return42/mediathekviewweb
[enh] add engine MediathekViewWeb (API)
2021-02-11 15:11:26 +01:00
Alexandre Flament 5d9db6c2f7 [remove] yandex engine 2021-02-11 14:28:06 +01:00
Alexandre Flament 35dd069402 [fix] fix seznam engine
no paging support
2021-02-11 12:53:19 +01:00
Alexandre Flament 7d6e69e2f9 [upd] wikipedia engine: return an empty result on query with illegal characters
on some queries (like an IT error message), wikipedia returns an HTTP error 400.
this commit returns an empty result instead of showing an error to the user.
2021-02-11 12:29:21 +01:00
Alexandre Flament ff84a1af35 [mod] json_engine: add content_html_to_text and title_html_to_text
Some JSON API returns HTML in either in the HTML or the content.
This commit adds two new parameters to the json_engine:
content_html_to_text and title_html_to_text, False by default.

If True, then the searx.utils.html_to_text removes the HTML tags.

Update crossref, openairedatasets and openairepublications engines
2021-02-10 16:42:11 +01:00
Alexandre Flament 436d366448
Merge pull request #2544 from mrwormo/congresslibrary
[Engine] Add Library of Congress engine
2021-02-10 10:13:46 +01:00
Alexandre Flament d2dac11392 [mod] duckduckgo engine: better support of the language preference
After the main request, send a second to https://duckduckgo.com/t/sl_h

See https://github.com/searx/searx/issues/2259
2021-02-09 14:36:43 +01:00
Markus Heiser bc1be3f0e9 [enh] add engine MediathekViewWeb (API)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-02-09 13:08:01 +01:00
mrwormo 051da88328 Add Library of Congress engine 2021-02-09 12:45:39 +01:00
Alexandre Flament 5e055b069b [fix) fix apk_mirror engine 2021-02-09 11:02:12 +01:00
Marc Abonce Seguin 64e81794fe add support for Chinese variants in Wikipedia 2021-02-08 21:56:45 -07:00
Hermógenes Oliveira 514faa9162 [feat] recoll: paged json support 2021-02-07 10:05:35 -03:00
mrwormo c4c1636b18 Add Creative Commons search engine 2021-02-04 11:31:35 +01:00
Alexandre Flament ca93a01844 [mod] dynamically set language_support variable
The language_support variable is set to True by default,
and set to False in only 5 engines.

Except the documentation and the /config URL, this variable is not used.

This commit remove the variable definition in the engines, and
set value according to supported_languages length: False when the length is 0,
True otherwise.

Close #2485
2021-02-01 17:10:37 +01:00
Markus Heiser 7f505bdc6f [fix] google: avoid unnecessary SearxEngineXPathException errors
Avoid SearxEngineXPathException errors when parsing non valid results::

    .//div[@class="yuRUbf"]//a/@href index 0 not found
    Traceback (most recent call last):
      File "./searx/engines/google.py", line 274, in response
        url = eval_xpath_getindex(result, href_xpath, 0)
      File "./searx/searx/utils.py", line 608, in eval_xpath_getindex
        raise SearxEngineXPathException(xpath_spec, 'index ' + str(index) + ' not found')
    searx.exceptions.SearxEngineXPathException: .//div[@class="yuRUbf"]//a/@href index 0 not found

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:50 +01:00
Markus Heiser b1fefec40d [fix] normalize the language & region aspects of all google engines
BTW: make the engines ready for search.checker:

- replace eval_xpath by eval_xpath_getindex and eval_xpath_list
- google_images: remove outer try/except block

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:46 +01:00
Markus Heiser 8cdad5d85d [fix] google-videos: parse values for 'length' & 'author'
The 'video.html' template from the 'oscar' design supports replacement
for *author* and *length*.  Google-videos does not have an author, alternatively
the publisher info from is used for the *author*.

Hint: these replacements are not supported by the 'simple' design.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-24 09:51:24 +01:00
Markus Heiser 89b3050b5c [fix] revise of the google-Video engine
This revise is based on the methods developed in the revise of the google engine
(see commit 410c2f9).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-24 09:39:30 +01:00
Alexandre Flament 8c46b767d0 [fix] google_news: avoid one HTTP redirect except for the English results
also add
params['soft_max_redirects'] = 1
to avoid false error reporting in /stats/errors
2021-01-24 08:53:35 +01:00
Markus Heiser 5f92dfcdbe [fix] google-news: query uses locale without country tag
Wthout country-region tag google will redirect to correct the contry tag [1]:

    SEARX_DEBUG=1 searx-checker -v "google news"
    ...
    https://news.google.com:443 "GET /search?q=computer&hl=en...      HTTP/1.1" 302 0
    https://news.google.com:443 "GET /search?q=computer&hl=en-US&.... HTTP/1.1" 200 None
    ...

[1] https://github.com/searx/searx/pull/2483#issuecomment-765600849

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-23 11:37:14 +01:00
Markus Heiser baec54c492 [fix] revise of the google-news engine
This revise is based on the methods developed in the revise of the google engine
(see commit 410c2f9).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-22 18:49:45 +01:00
Alexandre Flament b405646749
Merge pull request #2451 from mrwormo/invidious-engine
[Fix] Invidious Engine
2021-01-16 19:25:45 +01:00
Alexandre Flament a4dcfa025c [enh] engines: add about variable
move meta information from comment to the about variable
so the preferences, the documentation can show these information
2021-01-14 20:57:17 +01:00
mrwormo 2dff3887f0 [fix] Invidious engine by enabling requests by randomly picking amongst working instances 2021-01-14 12:12:56 +01:00
Alexandre Flament 3f8ebf70b1 [fix] pylint: use "raise ... from ..." 2020-12-20 09:46:53 +01:00
Alexandre Flament eb33ae6893 [fix] Python 3.9: use html.unescape instead of HTMLParser.unescape 2020-12-20 09:46:53 +01:00
Alexandre Flament 02fc4147ce [mod] dictzone, translated, currency_convert: use engine_type online_curency and online_dictionnary 2020-12-17 11:39:36 +01:00
Alexandre Flament 7ec8bc3ea7 [mod] split searx.search into different processors
see searx.search.processors.abstract.EngineProcessor

First the method searx call the get_params method.

If the return value is not None, then the searx call the method search.
2020-12-17 11:39:36 +01:00
lucky13820 fea8958e99
Fix the StartPage result title is showing the url
Fix the issue 2395 where StartPage result title is showing the url. https://github.com/searx/searx/issues/2395
2020-12-16 13:54:14 -08:00
Alexandre Flament 292b73a3fc
Merge pull request #2385 from joshu9h/patch-1
[Fix] Startpage
2020-12-14 17:56:48 +01:00
Alexandre Flament 36600118fb
Merge pull request #2372 from dalf/remove-broken-engines
[remove] remove searchcode_doc and twitter
2020-12-13 17:11:05 +01:00