searxng

Commit Graph

Author	SHA1	Message	Date
Alexandre Flament	8c1a65d32f	[mod] multithreading only in searx.search.* packages. It prepares the new architecture change: everything about multithreading is moved into the searx.search.* packages. Previously, the call to the "init" function of the engines was done in searx.engines: * the network was not set (request not sent using the defined proxy) * it required monkey patching the code to avoid HTTP requests during the tests	2021-05-05 13:12:42 +02:00
Alexandre Flament	7acd7ffc02	[enh] rewrite and enhance metrics	2021-04-21 16:24:46 +02:00
Alexandre Flament	aae7830d14	[mod] refactoring: processors Report to the user suspended engines. searx.search.processor.abstract: * manages suspend time (per network). * reports suspended time to the ResultContainer (method extend_container_if_suspended) * adds the results to the ResultContainer (method extend_container) * handles exceptions (method handle_exception)	2021-04-21 16:24:46 +02:00
Alexandre Flament	d14994dc73	[httpx] replace searx.poolrequests by searx.network settings.yml: * outgoing.networks: * can contain network definitions * properties: enable_http, verify, http2, max_connections, max_keepalive_connections, keepalive_expiry, local_addresses, support_ipv4, support_ipv6, proxies, max_redirects, retries * retries: 0 by default, number of times searx retries to send the HTTP request (using a different IP & proxy each time) * local_addresses can be "192.168.0.1/24" (it supports IPv6) * support_ipv4 & support_ipv6: both True by default, see https://github.com/searx/searx/pull/1034 * each engine can define a "network" section: * either a full network description * or a reference to an existing network * all HTTP requests of an engine use the same HTTP configuration (it was not the case before, see proxy configuration in master)	2021-04-12 17:25:56 +02:00
Alexandre Flament	99e0651cea	[mod] by default allow only HTTPS, not HTTP Related to https://github.com/searx/searx/pull/2373	2021-03-08 11:35:08 +01:00
Adam Tauber	44f4a9d49a	[enh] add ability to send engine data to subsequent requests	2021-03-06 12:12:35 +01:00
Markus Heiser	4845183128	[mod] don't dump traceback of SearxEngineResponseException on init When initing engines a "SearxEngineResponseException" is logged very verbose, including full traceback information: ERROR:searx.engines:yggtorrent engine: Fail to initialize Traceback (most recent call last): File "share/searx/searx/engines/__init__.py", line 293, in engine_init init_fn(get_engine_from_settings(engine_name)) File "share/searx/searx/engines/yggtorrent.py", line 42, in init resp = http_get(url, allow_redirects=False) File "share/searx/searx/poolrequests.py", line 197, in get return request('get', url, **kwargs) File "share/searx/searx/poolrequests.py", line 190, in request raise_for_httperror(response) File "share/searx/searx/raise_for_httperror.py", line 60, in raise_for_httperror raise_for_captcha(resp) File "share/searx/searx/raise_for_httperror.py", line 43, in raise_for_captcha raise_for_cloudflare_captcha(resp) File "share/searx/searx/raise_for_httperror.py", line 30, in raise_for_cloudflare_captcha raise SearxEngineCaptchaException(message='Cloudflare CAPTCHA', suspended_time=3600 * 24 * 15) searx.exceptions.SearxEngineCaptchaException: Cloudflare CAPTCHA, suspended_time=1296000 For SearxEngineResponseException this is not needed. Those types of exceptions can be a normal use case. E.g. for CAPTCHA errors like shown in the example above. It should be enough to log a warning for such issues: WARNING:searx.engines:yggtorrent engine: Fail to initialize // Cloudflare CAPTCHA, suspended_time=1296000 closes: #2612 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>	2021-03-05 17:26:22 +01:00
Marc Abonce Seguin	9b6ffed061	fix fetch_languages for bing Bing has a list of regions that it supports and some of these regions may have more than one possible language. In some cases, like Switzerland, these languages are always shown as options, so there is no issue. But in other cases, like Andorra, Bing will only show one language at a time, either the region's default or the request's language if the latter is supported by that region. For example, if the HTTP request is in French, Andorra will appear as fr-AD but if the same page is requested in any other language Andorra will appear as ca-AD. This is especially a problem when Bing assumes that the request is in English because it overrides enough language codes to make several major languages like Arabic disappear from the languages.py file. To avoid that issue, I set the Accept-Language header to a language that's only supported in one region to hopefully avoid these overrides.	2021-02-25 23:51:49 -07:00
Alexandre Flament	ca93a01844	[mod] dynamically set language_support variable The language_support variable is set to True by default, and set to False in only 5 engines. Except for the documentation and the /config URL, this variable is not used. This commit removes the variable definition in the engines, and sets the value according to the supported_languages length: False when the length is 0, True otherwise. Close #2485	2021-02-01 17:10:37 +01:00
Alexandre Flament	7ec8bc3ea7	[mod] split searx.search into different processors see searx.search.processors.abstract.EngineProcessor First, searx calls the get_params method. If the return value is not None, searx then calls the search method.	2020-12-17 11:39:36 +01:00
Alexandre Flament	d703119d3a	[enh] add raise_for_httperror check HTTP response: * detect some common CAPTCHA challenges (no solving). In this case the engine is suspended for a long time. * otherwise raise HTTPError as before. The check is done in poolrequests.py (previously in search.py). update qwant, wikipedia, wikidata to use raise_for_httperror instead of raise_for_status	2020-12-11 14:37:08 +01:00
Noémi Ványi	3a63dfbdd7	display if an engine does not support https Closes #302	2020-12-09 20:49:54 +01:00
Alexandre Flament	1d0c368746	[enh] record exception details per engine add a new API /stats/errors	2020-12-03 10:22:48 +01:00
Alexandre Flament	3cfef61123	[fix] /stats: report error percentage instead of error count This bug has existed since the PR https://github.com/searx/searx/pull/751	2020-12-01 15:07:09 +01:00
Alexandre Flament	3786920df9	[enh] Add multiple outgoing proxies credits go to @bauruine see https://github.com/searx/searx/pull/1958	2020-11-20 15:29:21 +01:00
a01200356	c3daa08537	[enh] Add onions category with Ahmia, Not Evil and Torch Xpath engine and results template changed to account for the fact that archive.org doesn't cache .onions, though some onion engines might have their own cache. Disabled by default. Can be enabled by setting the SOCKS proxies to wherever Tor is listening and setting using_tor_proxy as True. Requires Tor and updating packages. To avoid manually adding the timeout on each engine, you can set extra_proxy_timeout to account for Tor's (or whatever proxy used) extra time.	2020-10-25 17:59:05 -07:00
Alexandre Flament	a9dc54bebc	[mod] Add searx.data module Instead of loading the data/*.json in different location, load these files in the new searx.data module.	2020-10-07 10:29:34 +02:00
Marc Abonce Seguin	ea9d979cc3	add language names in qwant's fetch languages function	2020-09-22 11:37:44 +02:00
Alexandre Flament	3397382754	[enh] stop searx when an engine raises a SyntaxError exception (#2177) and some other exceptions: * KeyboardInterrupt * SystemExit * RuntimeError * SystemError * ImportError: an engine with an unmet dependency will stop everything.	2020-09-07 15:39:26 +02:00
Alexandre Flament	b329058c1a	Revert "[enh] test: load each engine to check for syntax errors" This reverts commit `4fb3ed2c63`.	2020-08-31 19:00:06 +02:00
Dalf	4fb3ed2c63	[enh] test: load each engine to check for syntax errors	2020-08-28 12:12:32 +02:00
Noémi Ványi	e3282748d0	add display_error_messages option to engine settings A new option is added to engines to hide error messages from users. It is called `display_error_messages` and by default it is set to `True`. If it is set to `False` error messages do not show up on the UI. Keep in mind that engines are still suspended if needed regardless of this setting. Closes #1828	2020-05-31 19:17:48 +02:00
Noémi Ványi	99435381a8	[enh] introduce private engines This PR adds a new setting to engines named `tokens`. It expects a list of tokens which lets searx validate if the request should be accepted or not.	2020-02-08 11:47:39 +01:00
Noémi Ványi	5796dc60c9	fix pep 8 check	2019-10-16 15:52:48 +02:00
Noémi Ványi	a6f20caf32	add initial support for offline engines && command engine	2019-10-16 15:52:48 +02:00
Dalf	23611897ec	[fix] make sure the engine name is lower case Minor fix: "%s engine initialized" displays the right engine name	2019-07-27 08:52:30 +02:00
Marc Abonce Seguin	51111c2594	[fix] always set language_aliases even if it's empty	2019-01-06 20:49:56 -06:00
Marc Abonce Seguin	772c048d01	refactor engine's search language handling Add match_language function in utils to match any user given language code with a list of engine's supported languages. Also add language_aliases dict on each engine to translate standard language codes into the custom codes used by the engine.	2018-03-27 00:08:03 -06:00
Adam Tauber	2f69eaeb2f	[fix] fix engine initialization	2018-02-17 14:30:06 +01:00
Marc Abonce Seguin	829032f306	[fix] read utf-8 files (settings, languages, currency) with python3.5 Related to discussion in #1124 The io.open import is necessary for python2	2018-01-16 23:26:10 -06:00
Joseph Nuthalapati	bdc803e185	Make Python 3 able to read settings files with Unicode characters SearX currently doesn't start up when run with Python 3 as it tries to parse the settings.yml file with ASCII codecs. There are similar problems with engines_languages.json and currencies.json Python 3 requires that files with Unicode characters be read with a 'b' flag. This also works with Python 2 and hence can be integrated into the main source code. Tested with the latest Python 3.6.4rc1 on Debian unstable. Signed-off-by: Joseph Nuthalapati <njoseph@thoughtworks.com>	2017-12-21 17:33:19 +05:30
Adam Tauber	0f6612bb40	[mod] separate engine load and initialization	2017-07-21 14:27:25 +02:00
Adam Tauber	1794f6a4d3	[enh] add "inactive" attribute to engines This modification allows us to deactivate engines in settings.yml without commenting them out	2017-07-20 13:32:20 +02:00
Adam Tauber	343ac7197d	[fix] pep8	2017-06-06 23:37:42 +02:00
Adam Tauber	78365ffb8a	[enh] add init function to engines which loads parallel	2017-06-06 22:20:20 +02:00
Adam Tauber	52e615dede	[enh] py3 compatibility	2017-05-15 12:02:30 +02:00
Alexandre Flament	12d91c1d67	[mod] searx doesn't crash at startup when an engine can't be loaded (see #884)	2017-04-08 17:38:46 +02:00
Adam Tauber	8bff42f049	Merge branch 'master' into languages	2016-12-28 20:00:53 +01:00
Adam Tauber	ea034fafa9	[fix] proper engine init	2016-12-27 17:55:44 +01:00
Adam Tauber	a605377c40	[enh] explicit engine init	2016-12-27 17:31:14 +01:00
marc	af35eee10b	tests for _fetch_supported_languages in engines and refactor method to make it testable without making requests	2016-12-15 00:40:21 -06:00
marc	f62ce21f50	[mod] fetch supported languages for several engines utils/fetch_languages.py gets languages supported by each engine and generates engines_languages.json with each engine's supported language.	2016-12-13 19:58:10 -06:00
marc	149802c569	[enh] add supported_languages on engines and auto-generate languages.py	2016-12-13 19:32:00 -06:00
Alexandre Flament	e48f07a367	Merge branch 'master' into searchpy2	2016-12-09 23:11:45 +01:00
Adam Tauber	55dc538398	[mod] move load_module function to utils	2016-11-19 17:51:19 +01:00
Alexandre Flament	01e2648e93	Simplify search.py, basically updated PR #518 The timeouts in settings.yml are about the total time (not only the HTTP request but also preparing the request and parsing the response). It was more or less the case before, since the threaded_requests function ignores the thread after the timeout even if the HTTP request has ended. New / changed stats: * page_load_time: records the HTTP request time * page_load_count: the number of HTTP requests * engine_time: the total execution time of an engine * engine_time_count: the number of "engine_time" measures The avg response times in the preferences are the engine response time (engine_load_time / engine_load_count) To sum up: * Search.search() filters the engines that can't process the request * Search.search() calls the search_multiple_requests function * search_multiple_requests creates one thread per engine, each thread runs the search_one_request function * search_one_request calls the request function, makes the HTTP request, calls the response function, extends the result_container * search_multiple_requests waits for the threads to finish (or time out)	2016-11-05 13:45:20 +01:00
Adam Tauber	86daef2063	[fix] do not allow underscore in engine names - closes #708	2016-09-28 22:30:05 +02:00
Adam Tauber	7d9c898170	Merge pull request #634 from kvch/advanced-search support time range search	2016-07-26 00:06:16 +02:00
Adam Tauber	54d987636e	[fix] do not load engines which cannot be initialized - closes #585	2016-07-25 23:36:52 +02:00
Noemi Vanyi	93c0c49e9a	add time range search with yahoo	2016-07-25 23:19:46 +02:00

139 Commits