Commit Graph

1115 Commits

Author SHA1 Message Date
Markus Heiser dc29f1d826 [pylint] tag PYLINT_FILES by comment `# lint: pylint`
These py files are linted by `test.pylint`, all other files are linted by
`test.pep8`.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-04-26 20:18:20 +02:00
Markus Heiser 28b25185c5 [brand] searxng -- fix links to issue tracker & WEB-GUI
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-04-25 14:25:08 +02:00
Markus Heiser 8efabd3ab7 [mod] core.ac.uk engine
- add to list of pylint scripts
- add debug log messages
- move API key int `settings.yml`
- improved readability
- add some metadata to results

Signed-off-by: Markus Heiser <markus@darmarit.de>
2021-04-24 09:00:53 +02:00
spongebob33 7528e38c8a add core.ac.uk engine 2021-04-24 08:55:45 +02:00
Alexandre Flament d01741c9a2
Merge pull request #15 from return42/add-springer
Add a search engine for Springer Nature
2021-04-22 13:23:31 +02:00
Pierre Chevalier a80bf1ba97 [enh] Add Springer Nature engine
Springer Nature is a global publisher dedicated to providing service to research
community [1] with official API [2].

To test this PR, first get your API key following this page:

   https://dev.springernature.com/signup

In searx/engines/springer.py at line 24, add this API key.  I left my own key,
commented out in the line aboce.  Feel free to use it, if needed.

[1] https://www.springernature.com/
[2] https://dev.springernature.com/
2021-04-22 12:35:25 +02:00
habsinn 41a2e3785e [enh] add engine using API from "The Art Institute of Chicago" 2021-04-22 12:25:43 +02:00
Markus Heiser e9a6ab4015 [fix] youtube - send CONSENT Cookie to not be redirected
In the EU there exists a "General Data Protection Regulation" [1] aka GDPR (BTW:
very user friendly!) which requires consent to tracking.  To get the consent
from the user, youtube requests are redirected to confirm and get a CONSENT
Cookie from https://consent.youtube.com

This patch adds a CONSENT Cookie to the youtube request to avoid redirection.

[1] https://en.wikipedia.org/wiki/General_Data_Protection_Regulation

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Reported-by: https://github.com/searx/searx/issues/2774
2021-04-22 12:09:09 +02:00
Alexandre Flament c6d5605d27
Merge pull request #7 from searxng/metrics
Metrics
2021-04-22 08:34:17 +02:00
Alexandre Flament b7848e3422 [fix] searxng fix: sjp engine 2021-04-21 16:31:29 +02:00
Alexandre Flament 7acd7ffc02 [enh] rewrite and enhance metrics 2021-04-21 16:24:46 +02:00
Alexandre Flament aae7830d14 [mod] refactoring: processors
Report to the user suspended engines.

searx.search.processor.abstract:
* manages suspend time (per network).
* reports suspended time to the ResultContainer (method extend_container_if_suspended)
* adds the results to the ResultContainer (method extend_container)
* handles exceptions (method handle_exception)
2021-04-21 16:24:46 +02:00
Alexandre Flament 48720e20a8 Merge remote-tracking branch 'searx/master' 2021-04-19 09:35:12 +02:00
Noémi Ványi 8362257b9a
Merge pull request #2736 from plague-doctor/sjp
Add new engine: SJP - Słownik języka polskiego
2021-04-16 17:30:14 +02:00
Noémi Ványi e56323d3c8
Merge pull request #2759 from ypid/fix/typo
Fix grammar mistake in debug log output
2021-04-16 17:26:45 +02:00
Plague Doctor d275d7a35e Code refactoring. 2021-04-16 12:23:27 +10:00
Markus Heiser 062d589f86 [fix] xpath expressions to grap all items from bandcamp's response
I also found some items missing a thumbnail and I used text_extract for content
and title, to remove unneeded whitespaces.

BTW: added bandcamp's favicon

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-04-15 08:52:11 +02:00
Kyle Anthony Williams 4d3c399ee9 [feat] add bandcamp engine 2021-04-15 08:52:11 +02:00
Alexandre Flament d14994dc73 [httpx] replace searx.poolrequests by searx.network
settings.yml:

* outgoing.networks:
   * can contains network definition
   * propertiers: enable_http, verify, http2, max_connections, max_keepalive_connections,
     keepalive_expiry, local_addresses, support_ipv4, support_ipv6, proxies, max_redirects, retries
   * retries: 0 by default, number of times searx retries to send the HTTP request (using different IP & proxy each time)
   * local_addresses can be "192.168.0.1/24" (it supports IPv6)
   * support_ipv4 & support_ipv6: both True by default
     see https://github.com/searx/searx/pull/1034
* each engine can define a "network" section:
   * either a full network description
   * either reference an existing network

* all HTTP requests of engine use the same HTTP configuration (it was not the case before, see proxy configuration in master)
2021-04-12 17:25:56 +02:00
Robin Schneider dfc66ff0f0
Fix grammar mistake in debug log output 2021-04-11 22:12:53 +02:00
Alexandre Flament eaa694fb7d [enh] replace requests by httpx 2021-04-10 15:38:33 +02:00
Plague Doctor 599ff39ddf Fix conflicts 2021-04-09 06:54:03 +10:00
Plague Doctor 6631f11305 Add new engine: SJP 2021-04-08 10:21:54 +10:00
Plague Doctor 7035bed4ee Add new engine: Wordnik.com 2021-04-08 09:58:00 +10:00
Noémi Ványi 07f5edce3d Add Meilisearch engine
Website: https://www.meilisearch.com/
2021-04-06 21:57:05 +02:00
Alexandre Flament 725a69616b
Merge pull request #2681 from dalf/fix-wikipedia-title
[fix] wikipedia: remove HTML from the title
2021-03-27 17:43:36 +01:00
Noémi Ványi 9bb312c505 Remove duplicated key from dict in Semantic Scholar 2021-03-27 16:58:32 +01:00
Noémi Ványi f596f5767b fix Semantic Scholar engine 2021-03-27 16:54:01 +01:00
Adam Tauber 28286cf3f2 [fix] update seznam engine to be compatible with the new website 2021-03-27 15:29:04 +01:00
Alexandre Flament fcfcf662ff [fix] wikipedia: remove HTML from the title
fr.wikipedia.org (and it seems not other wikipedia websites),
adds HTML to api_result['displayTitle'].
(Search for '!wp :fr Braid' for example)

The commit uses api_result['title']
2021-03-25 08:31:39 +01:00
Adam Tauber 0ba71c3644 [fix] make ina engine compatible with the new response json 2021-03-25 01:20:41 +01:00
Adam Tauber 5f450fda74 [enh] add year filter to duckduckgo 2021-03-25 00:25:36 +01:00
Adam Tauber fd737dc9d8 [fix] remove debug code 2021-03-24 23:54:39 +01:00
Alexandre Flament 38c210d746
[mod] soundcloud: faster initialization
The get_cliend_id() function:
* fetches https://soundcloud.com
* then fetches each referenced javascript URL to get the client id.

This commit fetches the javascript URLs in the reverse order: the client id is in the last javascript URL.
2021-03-21 09:29:53 +01:00
Adam Tauber 4c631ac6d0 [fix] remove debug code 2021-03-15 21:47:27 +01:00
Noémi Ványi 8158d8654a fix Microsoft Academic engine 2021-03-15 20:21:28 +01:00
Adam Tauber f97b4ff7b6 [fix] update youtube_noapi paging 2021-03-15 17:22:31 +01:00
Adam Tauber dd34ac396c
Merge pull request #2652 from kvch/solr-engine
Add Apache Solr engine
2021-03-15 15:39:39 +01:00
Alexandre Flament 1664258061
Merge pull request #2655 from return42/fix-imports
[fix] remove unused import from yahoo-news engine
2021-03-15 08:38:34 +01:00
Markus Heiser 6e1f1085ef [fix] remove unused import from yahoo-news engine
Signed-off-by: Markus Heiser <markus@darmarit.de>
2021-03-14 15:13:57 +01:00
Markus Heiser 3703ebb22a [drop] Acgsou engine - www.acgsou.com no longer exists
- https://www.acgsou.com/ acgsou.com is redirected to 36dm.club
- @rinpatch do not plan on maintaining the engine [1]

[1] https://github.com/searx/searx/pull/1283#issuecomment-798783585

Signed-off-by: Markus Heiser <markus@darmarit.de>
2021-03-14 11:49:18 +01:00
Noémi Ványi ff527e2681 Add Solr engine 2021-03-13 21:18:09 +01:00
Alexandre Flament 92dd5e245e
Merge pull request #2626 from mikeri/solidtorrents
Add Solid Torrents engine
2021-03-12 19:45:22 +01:00
Alexandre Flament a1a492baed
Merge pull request #2641 from dalf/disable_http_by_default
[mod] by default allow only HTTPS, not HTTP
2021-03-12 19:21:46 +01:00
Markus Heiser 96422e5c9f [fix] APKMirror engine - update xpath selectors and fix img_src
BTW: make the code slightly more readable

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-03-09 08:34:57 +01:00
Markus Heiser d2faea423a [fix] rewrite Yahoo-News engine
Many things have been changed since last review of this engine.  This patch fix
xpath selectors, implements suggestion and is a complete review / rewrite of the
engine.

Signed-off-by: Markus Heiser <markus@darmarit.de>
2021-03-08 11:43:34 +01:00
Alexandre Flament 99e0651cea [mod] by default allow only HTTPS, not HTTP
Related to https://github.com/searx/searx/pull/2373
2021-03-08 11:35:08 +01:00
Michael Ilsaas 5549d58de3 Add Solid Torrents engine 2021-03-07 18:14:30 +01:00
Adam Tauber 44f4a9d49a [enh] add ability to send engine data to subsequent requests 2021-03-06 12:12:35 +01:00
Markus Heiser 4845183128 [mod] don't dump traceback of SearxEngineResponseException on init
When initing engines a "SearxEngineResponseException" is logged very verbose,
including full traceback information:

    ERROR:searx.engines:yggtorrent engine: Fail to initialize
    Traceback (most recent call last):
      File "share/searx/searx/engines/__init__.py", line 293, in engine_init
        init_fn(get_engine_from_settings(engine_name))
      File "share/searx/searx/engines/yggtorrent.py", line 42, in init
        resp = http_get(url, allow_redirects=False)
      File "share/searx/searx/poolrequests.py", line 197, in get
        return request('get', url, **kwargs)
      File "share/searx/searx/poolrequests.py", line 190, in request
        raise_for_httperror(response)
      File "share/searx/searx/raise_for_httperror.py", line 60, in raise_for_httperror
        raise_for_captcha(resp)
      File "share/searx/searx/raise_for_httperror.py", line 43, in raise_for_captcha
        raise_for_cloudflare_captcha(resp)
      File "share/searx/searx/raise_for_httperror.py", line 30, in raise_for_cloudflare_captcha
        raise SearxEngineCaptchaException(message='Cloudflare CAPTCHA', suspended_time=3600 * 24 * 15)
    searx.exceptions.SearxEngineCaptchaException: Cloudflare CAPTCHA, suspended_time=1296000

For SearxEngineResponseException this is not needed.  Those types of exceptions
can be a normal use case.  E.g. for CAPTCHA errors like shown in the example
above. It should be enough to log a warning for such issues:

    WARNING:searx.engines:yggtorrent engine: Fail to initialize // Cloudflare CAPTCHA, suspended_time=1296000

closes: #2612

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-03-05 17:26:22 +01:00