Commit Graph

1066 Commits

Author SHA1 Message Date
Alexandre Flament d703119d3a [enh] add raise_for_httperror
check HTTP response:
* detect some comme CAPTCHA challenge (no solving). In this case the engine is suspended for long a time.
* otherwise raise HTTPError as before

the check is done in poolrequests.py (was before in search.py).

update qwant, wikipedia, wikidata to use raise_for_httperror instead of raise_for_status
2020-12-11 14:37:08 +01:00
Alexandre Flament 033f39bff7
Merge pull request #2376 from dalf/fix-mojeek
Fix mojeek
2020-12-11 13:14:54 +01:00
Alexandre Flament 6bc6d5e9fd
Merge pull request #2371 from dalf/mod-genius
[mod) genious: return valid results even if contents are empty
2020-12-11 13:14:03 +01:00
Alexandre Flament d41cafd5f3 [fix] xpath, mojeek: fix commit 58d72f2692
before commit 58d72f2, category was not set in xpath.py,
so searx/engines/__init__py was setting the category to ['general']

the commit 58d72f2 set the category to [] which is not replaced by searx/engines/__init__.py
consequence: the mojeek engine is hidden in the preferences.

this commit revert the xpath.py change.

close #2368
2020-12-10 10:52:06 +01:00
Noémi Ványi 3a63dfbdd7 display if an engine does not support https
Closes #302
2020-12-09 20:49:54 +01:00
Alexandre Flament 1c9e7cef50 [remove] remove searchcode_doc and twitter
* twitter: the API has changed. the engine needs to rewritten.
* searchcode_doc: the API about documentation doesn't exist anymore.
2020-12-09 13:14:31 +01:00
Alexandre Flament fa73f10f11 [mod) genious: return valid results even if contents are empty 2020-12-09 13:01:34 +01:00
Alexandre Flament a77d8c8227
Merge pull request #2359 from dalf/update-duden
[mod] duden engine
2020-12-08 20:33:38 +01:00
Alexandre Flament bd4869ecd0
Merge pull request #2366 from dalf/remove-seedpeer
[remove] seedpeer engine
2020-12-08 20:33:23 +01:00
Alexandre Flament 56c64d6b64 [remove] seedpeer engine
the website is offline.
2020-12-07 21:02:29 +01:00
Alexandre Flament c1a9732268
Merge pull request #2364 from dalf/fix-youtube-noapi
[fix] youtube_noapi engine
2020-12-07 20:26:00 +01:00
Alexandre Flament 13d3004703
Merge pull request #2365 from dalf/fix-soundcloud
[fix] soundclound: accept result without content
2020-12-07 20:25:17 +01:00
Alexandre Flament 62073c0e1d
Merge pull request #2361 from dalf/fix-1x
[fix] 1x engine
2020-12-07 20:24:47 +01:00
Alexandre Flament 923bc02c17
Merge pull request #2363 from dalf/fix-wikipedia-minor
[fix] wikipedia: minor fix: return no result instead of crash in some very few cases.
2020-12-07 18:33:37 +01:00
Alexandre Flament deb1bde20d [fix] soundclound: accept result without content 2020-12-07 17:45:36 +01:00
Alexandre Flament 34df0f7910 [fix] youtube_noapi engine 2020-12-07 17:44:31 +01:00
Alexandre Flament 58d51e082d [fix] wikipedia: minor fix: return no result instead of crash in some very few cases.
In few cases, the JSON results doesn't contains the key 'type'.
2020-12-07 17:42:05 +01:00
Alexandre Flament 4ec810749b [fix] 1x engine 2020-12-07 15:46:00 +01:00
Alexandre Flament 1e781863fa [fix] command engine: SearchQuery.query is str not bytes
see c225db45c8
2020-12-07 10:43:42 +01:00
Alexandre Flament 9bf594cbcf [mod] duden engine
* add params['soft_max_redirects'] = 1  (when there is spelling suggestion)
* avoid try..except
* use eval_xpath_* functions
2020-12-07 10:31:11 +01:00
Alexandre Flament a458451d20
Merge pull request #2356 from dalf/fix-ddd
[fix] duckduckgo_definitions: fix relative image URL
2020-12-07 10:16:53 +01:00
Alexandre Flament 925bb561a2
Merge pull request #2352 from dalf/no_http
Remove HTTP connections as much as possible
2020-12-06 10:18:49 +01:00
Alexandre Flament 28cc644f0a [fix] duckduckgo_definitions: fix relative image URL
ddg returns relative URL to https://duckduckgo.com/
2020-12-06 10:14:09 +01:00
Alexandre Flament cdceec1cbb
Merge pull request #2354 from dalf/fix-wikipedia
[fix] wikipedia engine: don't raise an error when the query is not found
2020-12-04 20:42:45 +01:00
Alexandre Flament f0054d67f1 [fix] wikipedia engine: don't raise an error when the query is not found
Add a new parameter "raise_for_status", set by default to True.
When True, any HTTP status code >= 300 raise an exception ( #2332 )
When False, the engine can manage the HTTP status code by itself.
2020-12-04 20:04:39 +01:00
Alexandre Flament bef2f2efa8 [fix] wikidata: fix crash when the item has no description at all and at least one URL. 2020-12-04 17:17:20 +01:00
Alexandre Flament 244e812f37 [fix] remove searx/engines/filecrop.py (dead code) 2020-12-04 16:48:15 +01:00
Alexandre Flament fa909c7c02 [mod] stackoverflow & yandex: detect CAPTCHA response 2020-12-03 13:23:19 +01:00
Alexandre Flament 64cccae99e [mod] various engines: use eval_xpath* functions and searx.exceptions.*
Engine list: ahmia, duckduckgo_images, elasticsearch, google, google_images, google_videos, youtube_api
2020-12-03 10:22:48 +01:00
Alexandre Flament ad72803ed9 [mod] xpath, 1337x, acgsou, apkmirror, archlinux, arxiv: use eval_xpath_* functions 2020-12-03 10:22:48 +01:00
Alexandre Flament de887c6347 [mod] bing_news: use eval_xpath_getindex
remove unused function searx.utils.list_get
2020-12-03 10:22:48 +01:00
Alexandre Flament 1d0c368746 [enh] record details exception per engine
add an new API /stats/errors
2020-12-03 10:22:48 +01:00
Markus Heiser bef185723a [refactor] digg - improve results and clean up source code
- strip html tags and superfluous quotation marks from content
- remove not needed cookie from request
- remove superfluous imports

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-12-02 21:54:27 +01:00
Markus Heiser 6b0a896f01 [mod] digg - pylint searx/engines/digg.py
Eliminate redundant file names which are tested by test.pylint and ignored by
test.pep8

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-12-02 20:59:30 +01:00
Markus Heiser 173b744ef0 [fix] digg - the ISO time stamp of published date has been changed
Error pattern::

    Engines cannot retrieve results:
    digg (unexpected crash time data '2020-10-16T14:09:55Z' does not match format '%Y-%m-%d %H:%M:%S')

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-12-02 20:40:12 +01:00
Alexandre Flament b00d108673 [mod] pylint: numerous minor code fixes 2020-12-01 15:21:19 +01:00
Alexandre Flament 9ed3ee2beb [mod] wikidata: WDGeoAttribute class: doesn't change the method signature of get_str 2020-12-01 15:21:17 +01:00
Alexandre Flament 3cfef61123 [fix] /stats: report error percentage instead of error count
This bug exists since the PR https://github.com/searx/searx/pull/751
2020-12-01 15:07:09 +01:00
Noémi Ványi 4a36a3044d
Add recoll engine (#2325)
recoll is a local search engine based on Xapian:
http://www.lesbonscomptes.com/recoll/

By itself recoll does not offer web or API access,
this can be achieved using recoll-webui:
https://framagit.org/medoc92/recollwebui.git

This engine uses a custom 'files' result template

set `base_url` to the location where recoll-webui can be reached
set `dl_prefix` to a location where the file hierarchy as indexed by recoll can be reached
set `search_dir` to the part of the indexed file hierarchy to be searched, use an empty string to search the entire search domain
2020-11-30 08:35:15 +01:00
M. Efe Çetin d1f527c3af
Photon API Link Update
Via https://photon.komoot.io/
2020-11-27 10:22:28 +03:00
Alexandre Flament 3786920df9 [enh] Add multiple outgoing proxies
credits go to @bauruine see https://github.com/searx/searx/pull/1958
2020-11-20 15:29:21 +01:00
Markus Heiser c71d214b0c [refactor] deviantart - improve results and clean up source code
Devian's request and response forms has been changed.

- fixed title
- fixed time_range_dict to 'popular-*-***'
- use image from <noscript> if exists
- drop obsolete "http to https, remove domain sharding"
- use query URL https://www.deviantart.com/search/deviations?page=5&q=foo
- add searx/engines/deviantart.py to pylint check (test.pylint)

Error pattern::

    There DEBUG:searx:result: invalid title: {'url': 'https://www.deviantart.com/  ...

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-11-14 17:09:56 +01:00
Alexandre Flament 3038052c79 [mod] remove unused import
use
from searx.engines.duckduckgo import _fetch_supported_languages, supported_languages_url  # NOQA
so it is possible to easily remove all unused import using autoflake:
autoflake --in-place --recursive --remove-all-unused-imports searx tests
2020-11-14 14:11:02 +01:00
Alexandre Flament c3d9b17c2a
Merge pull request #2292 from kvch/elasticsearch-engine
New engine: Elasticsearch
2020-11-14 13:25:08 +01:00
Alexandre Flament 102c08838b
Merge pull request #2289 from dalf/pylint
[mod] pylint: add extension-pkg-whitelist=lxml.etree
2020-11-14 13:24:31 +01:00
Noémi Ványi 43e697681e New engine: Elasticsearch 2020-11-10 19:53:38 +01:00
Alexandre Flament 58d72f2692 [mod] pylint: minor code change to allow pylint globally
This commit is only a step, it doesn't fix all the issues reported by pylint
2020-11-03 11:35:53 +01:00
Alexandre Flament eed43783f9 [fix] comamnd engine: fix import 2020-11-03 10:55:08 +01:00
Alexandre Flament a08df82574 [fix] scanr_structure engine: fix import 2020-11-03 10:54:02 +01:00
Alexandre Flament 95bd6033fa [mod] wikidata engine: use one SPARQL request instead of 2 HTTP requests. 2020-10-28 08:09:25 +01:00
Alexandre Flament ca593728af [mod] duckduckgo_definitions: display only user friendly attributes / URL
various bug fixes
2020-10-28 08:09:25 +01:00
a01200356 c3daa08537 [enh] Add onions category with Ahmia, Not Evil and Torch
Xpath engine and results template changed to account for the fact that
archive.org doesn't cache .onions, though some onion engines migth have
their own cache.

Disabled by default. Can be enabled by setting the SOCKS proxies to
wherever Tor is listening and setting using_tor_proxy as True.

Requires Tor and updating packages.

To avoid manually adding the timeout on each engine, you can set
extra_proxy_timeout to account for Tor's (or whatever proxy used) extra
time.
2020-10-25 17:59:05 -07:00
Nicholas Kegler 8e15d3e4c1 Open Semantic Search Engine 2020-10-25 17:50:00 +01:00
Noémi Ványi e158eeee4b Propagate error messages from YouTube API 2020-10-09 17:34:26 +02:00
Adam Tauber 835d16cbb1
Merge pull request #2255 from kvch/yacy-improvements
Add yacy improvements: HTTP digest auth, category checking
2020-10-09 16:34:42 +02:00
Alexandre Flament cfd21bc475 [fix] fix duckduckgo engine
- remove paging support: a "vqd" parameter is required between each request. This parameter is uniq for each request
- update the URL (no redirect), use the POST method
- language support: works if there is no more than request per minute, otherwise it is ignored !
2020-10-09 16:00:42 +02:00
Noémi Ványi 72c7fd25fe Add yacy improvements: HTTP digest auth, category checking 2020-10-09 15:06:05 +02:00
Noémi Ványi f0278d41fc add ebay enginte to shopping category 2020-10-08 13:20:55 +02:00
Alexandre Flament a9dc54bebc [mod] Add searx.data module
Instead of loading the data/*.json in different location,
load these files in the new searx.data module.
2020-10-07 10:29:34 +02:00
Alexandre Flament 8659212f5a [fix] drop Python 2: use collections.abc.Iterable instead of collections.Iterable 2020-10-06 09:43:24 +02:00
Alexandre Flament b728cb610b
Merge pull request #2241 from dalf/move-extract-text-and-url
Move the extract_text  and extract_url functions to searx.utils
2020-10-04 09:06:20 +02:00
Finn 53c8d945b4
[enh] Add SepiaSearch engine (#2227)
supported_languages values: see https://framagit.org/framasoft/peertube/search-index/-/blob/master/client/src/views/Search.vue#L618-641
2020-10-03 13:00:10 +02:00
Alexandre Flament 2006eb4680 [mod] move extract_text, extract_url to searx.utils 2020-10-02 18:13:56 +02:00
Markus Heiser 8162d7aff4 [fix] google engine - div classes has been renamed in HTML reult
Since 1. October 2020 google has changed the 'class' attribute of the HTML
result page.

Fix the xpath expressions and ignore <div class="g" ../> sections which do not
match to title's xpath expression.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-10-01 09:44:29 +02:00
Alexandre Flament f204e4903d [fix] migration from github.com/asciimoo/searx to github.com/searx/searx : fix URLs 2020-09-28 16:44:14 +02:00
Marc Abonce Seguin ecf5899153 fetch google's search langs rather than ui langs 2020-09-22 11:37:44 +02:00
Marc Abonce Seguin 41800835f9 fetch supported languages for startpage engine 2020-09-22 11:37:44 +02:00
Marc Abonce Seguin ea9d979cc3 add language names in qwant's fetch languages function 2020-09-22 11:37:44 +02:00
Dalf c225db45c8 Drop Python 2 (4/n): SearchQuery.query is a str instead of bytes 2020-09-10 10:49:42 +02:00
Dalf 1022228d95 Drop Python 2 (1/n): remove unicode string and url_utils 2020-09-10 10:39:04 +02:00
Marc Abonce Seguin ab20ca182c use Wikipedia's REST v1 API 2020-09-10 09:54:30 +02:00
Noémi Ványi f0ca1c3483
[enh] Add command line engines: git grep, find, etc. (#2128)
A new "base" engine called command is introduced. It is the foundation for all command line engines for now.
You can use this engine to create your own command line engine.

Add some engines (commented out to make sure no one enables anything accidentally):
* git grep: This engine lets you grep in the searx repo.
* locate: If locate is installed and initialized, you can search on the FS.
* find: You can find files with a specific name from where you started searx.
* pattern search in files: This engine utilizes the command fgrep.
* regex search in files: This engine runs `grep` to find a file based on its contents.
2020-09-08 09:51:53 +02:00
Alexandre Flament 3397382754
[enh] stop searx when an engine raise an SyntaxError exception (#2177)
and some other exceptions:
* KeyboardInterrupt
* SystemExit
* RuntimeError
* SystemError
* ImportError: an engine with an unmet dependency will stop everything.
2020-09-07 15:39:26 +02:00
Alexandre Flament b329058c1a Revert "[enh] test: load each engine to check for syntax errors"
This reverts commit 4fb3ed2c63.
2020-08-31 19:00:06 +02:00
Adam Tauber 6f9aa0e258
Merge pull request #2160 from dalf/test_load_engine
[enh] test: load each engine to check for syntax errors
2020-08-31 14:29:52 +02:00
Adam Tauber 6ded6e7a9a [fix] skip uncomplete image results - closes #1496 2020-08-31 14:07:45 +02:00
Dalf 4fb3ed2c63 [enh] test: load each engine to check for syntax errors 2020-08-28 12:12:32 +02:00
Marc Abonce Seguin 0d8970c8f2
only return one url per "type" in Wikidata (#2151)
i.e. only one official website, one Twitter, etc.
2020-08-27 21:44:48 +02:00
Émilien Devos 27d74826f1
[enh] add yggtorrent engine (#2135) 2020-08-18 18:02:41 +02:00
Emilien Devos c15a91a534 [fix] piratebay engine date and pep8 indentation 2020-08-10 23:44:53 +02:00
Emilien Devos 52d78d8418 [fix] piratebay engine 2020-08-10 20:26:59 +02:00
Adam Tauber 77103c7874
Merge pull request #2116 from mikeri/invidiousres
Include author and video length in Invidious results
2020-08-10 12:49:17 +02:00
Vlad f678388dbc
Fix google images 'get image' button bug from issue #2103 (#2115)
Closes #2103
2020-08-08 19:35:22 +02:00
Michael Ilsaas a1ce141c99
add peertube engine (#2109) 2020-08-08 19:22:53 +02:00
Michael Ilsaas 2ed8ad7691 include length in invidious results 2020-08-02 13:31:04 +02:00
Michael Ilsaas 0305fe0dd5 include author in invidious results 2020-08-02 13:30:38 +02:00
Marc Abonce Seguin 77b9faa8df fix Wikipedia's paragraph extraction 2020-07-26 23:53:40 -07:00
Michael Ilsaas 98cb6b6701 Update torrentz2 URL from .eu to .is 2020-07-26 15:56:54 +02:00
xywei 1d4657b714
Fix relative urls that do not start with '/' 2020-07-23 11:12:19 -05:00
Gaspard d'Hautefeuille 4e346e741a
fix python 3 support 2020-07-12 23:43:24 +01:00
Adam Tauber 52eba0c721 [fix] pep8 2020-07-08 00:46:03 +02:00
Markus Heiser 16f8ec894a [fix] revise google images engine
this commit is picked from #1985
2020-07-07 21:59:15 +02:00
Markus Heiser 410c2f903d [fix] revise google engine
this commit is picked from #1985
2020-07-07 21:50:59 +02:00
Markus Heiser 8d318ee142
Merge branch 'master' into gigablast 2020-06-29 16:09:59 +00:00
Sophie Tauchert 71db7b1238
Fix YaCy text results returned as images 2020-06-29 14:48:56 +02:00
Noémi Ványi 93cbd85b8a
Merge branch 'master' into duckduckgo_correction 2020-06-28 20:28:12 +02:00
Markus Heiser 5fac6cffa2
Merge branch 'master' into gigablast 2020-06-26 08:09:33 +00:00
Markus Heiser 5293e58032 [fix] yahoo engine - changed content_xpath
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-06-25 21:45:42 +02:00
Markus Heiser 223430ff30
Merge branch 'master' into gigablast 2020-06-16 07:36:44 +00:00
Adam Tauber 32f7877235 [fix] resolve flickr_noapi encoding issues 2020-06-15 19:15:24 +02:00