Commit Graph

1108 Commits

Author SHA1 Message Date
Markus Heiser e9a6ab4015 [fix] youtube - send CONSENT Cookie to not be redirected
In the EU there exists a "General Data Protection Regulation" [1] aka GDPR (BTW:
very user friendly!) which requires consent to tracking.  To get the consent
from the user, youtube requests are redirected to confirm and get a CONSENT
Cookie from https://consent.youtube.com

This patch adds a CONSENT Cookie to the youtube request to avoid redirection.

[1] https://en.wikipedia.org/wiki/General_Data_Protection_Regulation

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Reported-by: https://github.com/searx/searx/issues/2774
2021-04-22 12:09:09 +02:00
Alexandre Flament c6d5605d27
Merge pull request #7 from searxng/metrics
Metrics
2021-04-22 08:34:17 +02:00
Alexandre Flament b7848e3422 [fix] searxng fix: sjp engine 2021-04-21 16:31:29 +02:00
Alexandre Flament 7acd7ffc02 [enh] rewrite and enhance metrics 2021-04-21 16:24:46 +02:00
Alexandre Flament aae7830d14 [mod] refactoring: processors
Report to the user suspended engines.

searx.search.processor.abstract:
* manages suspend time (per network).
* reports suspended time to the ResultContainer (method extend_container_if_suspended)
* adds the results to the ResultContainer (method extend_container)
* handles exceptions (method handle_exception)
2021-04-21 16:24:46 +02:00
Alexandre Flament 48720e20a8 Merge remote-tracking branch 'searx/master' 2021-04-19 09:35:12 +02:00
Noémi Ványi 8362257b9a
Merge pull request #2736 from plague-doctor/sjp
Add new engine: SJP - Słownik języka polskiego
2021-04-16 17:30:14 +02:00
Noémi Ványi e56323d3c8
Merge pull request #2759 from ypid/fix/typo
Fix grammar mistake in debug log output
2021-04-16 17:26:45 +02:00
Plague Doctor d275d7a35e Code refactoring. 2021-04-16 12:23:27 +10:00
Markus Heiser 062d589f86 [fix] xpath expressions to grap all items from bandcamp's response
I also found some items missing a thumbnail and I used text_extract for content
and title, to remove unneeded whitespaces.

BTW: added bandcamp's favicon

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-04-15 08:52:11 +02:00
Kyle Anthony Williams 4d3c399ee9 [feat] add bandcamp engine 2021-04-15 08:52:11 +02:00
Alexandre Flament d14994dc73 [httpx] replace searx.poolrequests by searx.network
settings.yml:

* outgoing.networks:
   * can contains network definition
   * propertiers: enable_http, verify, http2, max_connections, max_keepalive_connections,
     keepalive_expiry, local_addresses, support_ipv4, support_ipv6, proxies, max_redirects, retries
   * retries: 0 by default, number of times searx retries to send the HTTP request (using different IP & proxy each time)
   * local_addresses can be "192.168.0.1/24" (it supports IPv6)
   * support_ipv4 & support_ipv6: both True by default
     see https://github.com/searx/searx/pull/1034
* each engine can define a "network" section:
   * either a full network description
   * either reference an existing network

* all HTTP requests of engine use the same HTTP configuration (it was not the case before, see proxy configuration in master)
2021-04-12 17:25:56 +02:00
Robin Schneider dfc66ff0f0
Fix grammar mistake in debug log output 2021-04-11 22:12:53 +02:00
Alexandre Flament eaa694fb7d [enh] replace requests by httpx 2021-04-10 15:38:33 +02:00
Plague Doctor 599ff39ddf Fix conflicts 2021-04-09 06:54:03 +10:00
Plague Doctor 6631f11305 Add new engine: SJP 2021-04-08 10:21:54 +10:00
Plague Doctor 7035bed4ee Add new engine: Wordnik.com 2021-04-08 09:58:00 +10:00
Noémi Ványi 07f5edce3d Add Meilisearch engine
Website: https://www.meilisearch.com/
2021-04-06 21:57:05 +02:00
Alexandre Flament 725a69616b
Merge pull request #2681 from dalf/fix-wikipedia-title
[fix] wikipedia: remove HTML from the title
2021-03-27 17:43:36 +01:00
Noémi Ványi 9bb312c505 Remove duplicated key from dict in Semantic Scholar 2021-03-27 16:58:32 +01:00
Noémi Ványi f596f5767b fix Semantic Scholar engine 2021-03-27 16:54:01 +01:00
Adam Tauber 28286cf3f2 [fix] update seznam engine to be compatible with the new website 2021-03-27 15:29:04 +01:00
Alexandre Flament fcfcf662ff [fix] wikipedia: remove HTML from the title
fr.wikipedia.org (and it seems not other wikipedia websites),
adds HTML to api_result['displayTitle'].
(Search for '!wp :fr Braid' for example)

The commit uses api_result['title']
2021-03-25 08:31:39 +01:00
Adam Tauber 0ba71c3644 [fix] make ina engine compatible with the new response json 2021-03-25 01:20:41 +01:00
Adam Tauber 5f450fda74 [enh] add year filter to duckduckgo 2021-03-25 00:25:36 +01:00
Adam Tauber fd737dc9d8 [fix] remove debug code 2021-03-24 23:54:39 +01:00
Alexandre Flament 38c210d746
[mod] soundcloud: faster initialization
The get_cliend_id() function:
* fetches https://soundcloud.com
* then fetches each referenced javascript URL to get the client id.

This commit fetches the javascript URLs in the reverse order: the client id is in the last javascript URL.
2021-03-21 09:29:53 +01:00
Adam Tauber 4c631ac6d0 [fix] remove debug code 2021-03-15 21:47:27 +01:00
Noémi Ványi 8158d8654a fix Microsoft Academic engine 2021-03-15 20:21:28 +01:00
Adam Tauber f97b4ff7b6 [fix] update youtube_noapi paging 2021-03-15 17:22:31 +01:00
Adam Tauber dd34ac396c
Merge pull request #2652 from kvch/solr-engine
Add Apache Solr engine
2021-03-15 15:39:39 +01:00
Alexandre Flament 1664258061
Merge pull request #2655 from return42/fix-imports
[fix] remove unused import from yahoo-news engine
2021-03-15 08:38:34 +01:00
Markus Heiser 6e1f1085ef [fix] remove unused import from yahoo-news engine
Signed-off-by: Markus Heiser <markus@darmarit.de>
2021-03-14 15:13:57 +01:00
Markus Heiser 3703ebb22a [drop] Acgsou engine - www.acgsou.com no longer exists
- https://www.acgsou.com/ acgsou.com is redirected to 36dm.club
- @rinpatch do not plan on maintaining the engine [1]

[1] https://github.com/searx/searx/pull/1283#issuecomment-798783585

Signed-off-by: Markus Heiser <markus@darmarit.de>
2021-03-14 11:49:18 +01:00
Noémi Ványi ff527e2681 Add Solr engine 2021-03-13 21:18:09 +01:00
Alexandre Flament 92dd5e245e
Merge pull request #2626 from mikeri/solidtorrents
Add Solid Torrents engine
2021-03-12 19:45:22 +01:00
Alexandre Flament a1a492baed
Merge pull request #2641 from dalf/disable_http_by_default
[mod] by default allow only HTTPS, not HTTP
2021-03-12 19:21:46 +01:00
Markus Heiser 96422e5c9f [fix] APKMirror engine - update xpath selectors and fix img_src
BTW: make the code slightly more readable

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-03-09 08:34:57 +01:00
Markus Heiser d2faea423a [fix] rewrite Yahoo-News engine
Many things have been changed since last review of this engine.  This patch fix
xpath selectors, implements suggestion and is a complete review / rewrite of the
engine.

Signed-off-by: Markus Heiser <markus@darmarit.de>
2021-03-08 11:43:34 +01:00
Alexandre Flament 99e0651cea [mod] by default allow only HTTPS, not HTTP
Related to https://github.com/searx/searx/pull/2373
2021-03-08 11:35:08 +01:00
Michael Ilsaas 5549d58de3 Add Solid Torrents engine 2021-03-07 18:14:30 +01:00
Adam Tauber 44f4a9d49a [enh] add ability to send engine data to subsequent requests 2021-03-06 12:12:35 +01:00
Markus Heiser 4845183128 [mod] don't dump traceback of SearxEngineResponseException on init
When initing engines a "SearxEngineResponseException" is logged very verbose,
including full traceback information:

    ERROR:searx.engines:yggtorrent engine: Fail to initialize
    Traceback (most recent call last):
      File "share/searx/searx/engines/__init__.py", line 293, in engine_init
        init_fn(get_engine_from_settings(engine_name))
      File "share/searx/searx/engines/yggtorrent.py", line 42, in init
        resp = http_get(url, allow_redirects=False)
      File "share/searx/searx/poolrequests.py", line 197, in get
        return request('get', url, **kwargs)
      File "share/searx/searx/poolrequests.py", line 190, in request
        raise_for_httperror(response)
      File "share/searx/searx/raise_for_httperror.py", line 60, in raise_for_httperror
        raise_for_captcha(resp)
      File "share/searx/searx/raise_for_httperror.py", line 43, in raise_for_captcha
        raise_for_cloudflare_captcha(resp)
      File "share/searx/searx/raise_for_httperror.py", line 30, in raise_for_cloudflare_captcha
        raise SearxEngineCaptchaException(message='Cloudflare CAPTCHA', suspended_time=3600 * 24 * 15)
    searx.exceptions.SearxEngineCaptchaException: Cloudflare CAPTCHA, suspended_time=1296000

For SearxEngineResponseException this is not needed.  Those types of exceptions
can be a normal use case.  E.g. for CAPTCHA errors like shown in the example
above. It should be enough to log a warning for such issues:

    WARNING:searx.engines:yggtorrent engine: Fail to initialize // Cloudflare CAPTCHA, suspended_time=1296000

closes: #2612

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-03-05 17:26:22 +01:00
Markus Heiser d48e2e7b0b [enh] google scholar - python implementation of the engine
The old xpath configuration for google scholar did not work and is replaced by a
python implementation.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-03-01 15:16:37 +01:00
Alexandre Flament f77983e174
Merge pull request #2602 from MarcAbonce/fix-bing-fetch-languages
Fix fetch_languages for Bing
2021-03-01 09:06:37 +01:00
GazoilKerozen 5f6ac3afa2
Add Freesound engine (#2596)
Add freesound engine with player.

Co-authored-by: Gazoil <maildeguzel@gmail.com>
2021-03-01 08:52:36 +01:00
Marc Abonce Seguin d6681fd33b remove articles number from engines_languages.json 2021-02-25 23:54:21 -07:00
Marc Abonce Seguin 9b6ffed061 fix fetch_languages for bing
Bing has a list of regions that it supports and some of these regions
may have more than one possible language.

In some cases, like Switzerland, these languages are always shown as
options, so there is no issue. But in other cases, like Andorra, Bing
will only show one language at the time, either the region's default or
the request's language if the latter is supported by that region.

For example, if the HTTP request is in French, Andorra will appear as
fr-AD but if the same page is requested in any other language Andorra
will appear as ca-AD.

This is specially a problem when Bing assumes that the request is in
English because it overrides enough language codes to make several major
languages like Arabic dissappear from the languages.py file.

To avoid that issue, I set the Accept-Language header to a language
that's only supported in one region to hopefully avoid these overrides.
2021-02-25 23:51:49 -07:00
Noémi Ványi 1be6ab2a91 Fix paging of Bing Images 2021-02-22 21:19:34 +01:00
datagram1 1d0a32a2c5 Added rumble.com video search engine. TODO video embedding.
Update rumble.py

some lines too long.

Disable Rumble engine

disabled : True

PEP8 fix

change line spacing
2021-02-20 12:48:56 +00:00