searxng

Commit Graph

Author	SHA1	Message	Date
pankaj	4900c091a6	use logger.warning logger.warn() is depricated. logger.warning is already being used in some files.	2023-05-19 19:35:29 +05:30
Markus Heiser	27369ebec2	[fix] searxng_extra/update/update_engine_descriptions.py (part 1) Follow up of #2269 The script to update the descriptions of the engines does no longer work since PR #2269 has been merged. searx/engines/wikipedia.py ========================== 1. There was a misusage of zh-classical.wikipedia.org: - `zh-classical` is dedicate to classical Chinese [1] which is not traditional Chinese [2]. - zh.wikipedia.org has LanguageConverter enabled [3] and is going to dynamically show simplified or traditional Chinese according to the HTTP Accept-Language header. 2. The update_engine_descriptions.py needs a list of all wikipedias. The implementation from #2269 included only a reduced list: - https://meta.wikimedia.org/wiki/Wikipedia_article_depth - https://meta.wikimedia.org/wiki/List_of_Wikipedias searxng_extra/update/update_engine_descriptions.py ================================================== Before PR #2269 there was a match_language() function that did an approximation using various methods. With PR #2269 there are only the types in the data model of the languages, which can be recognized by babel. The approximation methods, which are needed (only here) in the determination of the descriptions, must be replaced by other methods. [1] https://en.wikipedia.org/wiki/Classical_Chinese [2] https://en.wikipedia.org/wiki/Traditional_Chinese_characters [3] https://www.mediawiki.org/wiki/Writing_systems#LanguageConverter Closes: https://github.com/searxng/searxng/issues/2330 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>	2023-04-15 16:03:59 +02:00
Markus Heiser	4d4aa13e1f	[mod] remove obsolete EngineTraits.supported_languages All engines has been migrated from ``supported_languages`` to the ``fetch_traits`` concept. There is no longer a need for the obsolete code that implements the ``supported_languages`` concept. Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>	2023-03-24 10:37:42 +01:00
Markus Heiser	2499899554	[mod] Google: reversed engineered & upgrade to data_type: traits_v1 Partial reverse engineering of the Google engines including a improved language and region handling based on the engine.traits_v1 data. When ever possible the implementations of the Google engines try to make use of the async REST APIs. The get_lang_info() has been generalized to a get_google_info() function / especially the region handling has been improved by adding the cr parameter. searx/data/engine_traits.json Add data type "traits_v1" generated by the fetch_traits() functions from: - Google (WEB), - Google images, - Google news, - Google scholar and - Google videos and remove data from obsolete data type "supported_languages". A traits.custom type that maps region codes to supported_domains is fetched from https://www.google.com/supported_domains searx/autocomplete.py: Reversed engineered autocomplete from Google WEB. Supports Google's languages and subdomains. The old API suggestqueries.google.com/complete has been replaced by the async REST API: https://{subdomain}/complete/search?{args} searx/engines/google.py Reverse engineering and extensive testing .. - fetch_traits(): Fetch languages & regions from Google properties. - always use the async REST API (formally known as 'use_mobile_ui') - use supported_domains from traits - improved the result list by fetching './/div[@data-content-feature]' and parsing the type of the various content features --> thumbnails are added searx/engines/google_images.py Reverse engineering and extensive testing .. - fetch_traits(): Fetch languages & regions from Google properties. - use supported_domains from traits - if exists, freshness_date is added to the result - issue 1864: result list has been improved a lot (due to the new cr parameter) searx/engines/google_news.py Reverse engineering and extensive testing .. - fetch_traits(): Fetch languages & regions from Google properties. supported_domains is not needed but a ceid list has been added. - different region handling compared to Google WEB - fixed for various languages & regions (due to the new ceid parameter) / avoid CONSENT page - Google News do no longer support time range - result list has been fixed: XPath of pub_date and pub_origin searx/engines/google_videos.py - fetch_traits(): Fetch languages & regions from Google properties. - use supported_domains from traits - add paging support - implement a async request ('asearch': 'arc' & 'async': 'use_ac:true,_fmt:html') - simplified code (thanks to '_fmt:html' request) - issue 1359: fixed xpath of video length data searx/engines/google_scholar.py - fetch_traits(): Fetch languages & regions from Google properties. - use supported_domains from traits - request(): include patents & citations - response(): fixed CAPTCHA detection (Scholar has its own CATCHA manager) - hardening XPath to iterate over results - fixed XPath of pub_type (has been change from gs_ct1 to gs_cgt2 class) - issue 1769 fixed: new request implementation is no longer incompatible Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>	2023-03-24 10:37:42 +01:00
Markus Heiser	6e5f22e558	[mod] replace engines_languages.json by engines_traits.json Implementations of the traits of the engines. Engine's traits are fetched from the origin engine and stored in a JSON file in the data folder. Most often traits are languages and region codes and their mapping from SearXNG's representation to the representation in the origin search engine. To load traits from the persistence:: searx.enginelib.traits.EngineTraitsMap.from_data() For new traits new properties can be added to the class:: searx.enginelib.traits.EngineTraits .. hint:: Implementation is downward compatible to the deprecated supported_languages method from the vintage implementation. The vintage code is tagged as deprecated an can be removed when all engines has been ported to the traits method. Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>	2023-03-24 10:37:42 +01:00
Markus Heiser	4c06837a50	[mod] make python code pylint 2.16.1 compliant Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>	2023-02-10 13:59:21 +01:00
Alexandre Flament	32e8c2cf09	searx.network: add "verify" option to the networks Each network can define a verify option: * false to disable certificate verification * a path to existing certificate. SearXNG uses SSL_CERT_FILE and SSL_CERT_DIR when they are defined see https://www.python-httpx.org/environment_variables/#ssl_cert_file	2022-10-14 13:59:22 +00:00
Markus Heiser	ba8959ad7c	[fix] typos / reported by @kianmeng in searx PR-3366 [PR-3366] https://github.com/searx/searx/pull/3366 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>	2022-09-27 18:32:14 +02:00
Markus Heiser	8df1f0c47e	[mod] add 'Accept-Language' HTTP header to online processores Most engines that support languages (and regions) use the Accept-Language from the WEB browser to build a response that fits to the language (and region). - add new engine option: send_accept_language_header Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>	2022-08-01 17:01:59 +02:00
Markus Heiser	a2badb4fe4	[doc] add description of method EngineProcessor.get_params() Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>	2022-08-01 16:42:33 +02:00
Markus Heiser	c63fab6928	Merge pull request #1443 from return42/fix-online_dictionary [fix] online_dictionary: regular expression	2022-07-07 16:25:10 +02:00
Markus Heiser	480476fdf3	[fix] online_dictionary: regular expression The query term of a engine-type `online_dictionary` can consist of more than one word. Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>	2022-07-07 15:58:29 +02:00
Émilien Devos	63a995b8c1	Better explanation for the use of use_mobile_ui	2022-07-06 00:10:09 +02:00
Emilien Devos	0d4c066119	notify the user that use_mobile_ui parameter exist	2022-06-11 17:20:56 +02:00
Markus Heiser	2de007138c	[fix] prepare for pylint 2.14.0 Remove issue reported by Pylint 2.14.0: - no-self-use: has been moved to optional extension [1] - The refactoring checker now also raises 'consider-using-generator' messages for max(), min() and sum(). [2] .pylintrc: - <option name>-hint has been removed since long, Pylint 2.14.0 raises an error on invalid options - bad-continuation and bad-whitespace have been removed [3] [1] https://pylint.pycqa.org/en/latest/whatsnew/2/2.14/summary.html#removed-checkers [2] https://pylint.pycqa.org/en/latest/whatsnew/2/2.14/full.html#what-s-new-in-pylint-2-14-0 [2] https://pylint.pycqa.org/en/latest/whatsnew/2/2.6/summary.html#summary-release-highlights Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>	2022-06-03 15:41:52 +02:00
Markus Heiser	e92d40c854	[enh] implement a OnlineUrlSearchProcessor Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>	2022-01-30 16:05:08 +01:00
Martin Fischer	def62c3a47	[typing] add type hints for dictionaries	2022-01-17 11:42:48 +01:00
Markus Heiser	3d96a9839a	[format.python] initial formatting of the python code This patch was generated by black [1]:: make format.python [1] https://github.com/psf/black Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>	2021-12-27 09:26:22 +01:00
Markus Heiser	fcdc2c2cd2	[format.python] disable py code formatting for some hunks of code Disable the python code formatting from python-black, where the readability of code suffers by formatting. Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>	2021-12-27 09:16:03 +01:00
Markus Heiser	443bf35e09	[pylint] fix global-variable-not-assigned issues If there is no write access, there is no need for global. Remove global statement if there is no assignment. global-variable-not-assigned: Using global for names but no assignment is done Used when a variable is defined through the "global" statement but no assignment to this variable is done. In Pylint 2.11 the global-variable-not-assigned checker now catches global variables that are never reassigned in a local scope and catches (reassigned) functions [1][2] [1] https://pylint.pycqa.org/en/latest/whatsnew/2.11.html [2] https://github.com/PyCQA/pylint/issues/1375 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>	2021-09-17 10:14:27 +02:00
Alexandre Flament	b513917ef9	[mod] searx.metrics & searx.search: use the engine loggers metrics & processors use the engine logger	2021-09-10 21:49:34 +02:00
Markus Heiser	2a3b9a2e26	[pylint] searx: drop no longer needed 'missing-function-docstring' Suggested-by: @dalf https://github.com/searxng/searxng/issues/102#issuecomment-914168470 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>	2021-09-07 13:34:35 +02:00
Markus Heiser	24f2376c11	[pylint] prepare for pylint v2.9.3 / fix some (new) pylint issues Upgrade from pylint v2.8.3 to 2.9.3 raise some new issues:: searx/search/checker/__main__.py:37:26: R1732: Consider using 'with' for resource-allocating operations (consider-using-with) searx/search/checker/__main__.py:38:26: R1732: Consider using 'with' for resource-allocating operations (consider-using-with) searx/search/processors/__init__.py:20:0: R0402: Use 'from searx import engines' instead (consider-using-from-import) searx/preferences.py:182:19: C0207: Use data.split('-', maxsplit=1)[0] instead (use-maxsplit-arg) searx/preferences.py:506:15: R1733: Unnecessary dictionary index lookup, use 'user_setting' instead (unnecessary-dict-index-lookup) searx/webapp.py:436:0: C0206: Consider iterating with .items() (consider-using-dict-items) searx/webapp.py:950:4: C0206: Consider iterating with .items() (consider-using-dict-items) Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>	2021-07-03 17:54:08 +02:00
Markus Heiser	f122cb0e27	[fix] typo: online_dictionnary --> online_dictionary Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>	2021-06-04 15:05:58 +02:00
Alexandre Flament	8c1a65d32f	[mod] multithreading only in searx.search.* packages it prepares the new architecture change, everything about multithreading in moved in the searx.search.* packages previously the call to the "init" function of the engines was done in searx.engines: * the network was not set (request not sent using the defined proxy) * it requires to monkey patch the code to avoid HTTP requests during the tests	2021-05-05 13:12:42 +02:00
Markus Heiser	924f9afea3	[lint] pylint searx/search/processors files / BTW add some doc-strings Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>	2021-04-27 15:13:39 +02:00
Alexandre Flament	b1557b5443	[mod] processors: show identical error messages on /search and /stats	2021-04-27 14:20:07 +02:00
Alexandre Flament	c27fef1cde	[mod] metrics: add secondary parameter Some error won't stop the engine: * additional HTTP redirects for example * some invalid results secondary=True allows to flag these errors as not important.	2021-04-21 16:24:46 +02:00
Alexandre Flament	7acd7ffc02	[enh] rewrite and enhance metrics	2021-04-21 16:24:46 +02:00
Alexandre Flament	aae7830d14	[mod] refactoring: processors Report to the user suspended engines. searx.search.processor.abstract: * manages suspend time (per network). * reports suspended time to the ResultContainer (method extend_container_if_suspended) * adds the results to the ResultContainer (method extend_container) * handles exceptions (method handle_exception)	2021-04-21 16:24:46 +02:00
Alexandre Flament	d14994dc73	[httpx] replace searx.poolrequests by searx.network settings.yml: * outgoing.networks: * can contains network definition * propertiers: enable_http, verify, http2, max_connections, max_keepalive_connections, keepalive_expiry, local_addresses, support_ipv4, support_ipv6, proxies, max_redirects, retries * retries: 0 by default, number of times searx retries to send the HTTP request (using different IP & proxy each time) * local_addresses can be "192.168.0.1/24" (it supports IPv6) * support_ipv4 & support_ipv6: both True by default see https://github.com/searx/searx/pull/1034 * each engine can define a "network" section: * either a full network description * either reference an existing network * all HTTP requests of engine use the same HTTP configuration (it was not the case before, see proxy configuration in master)	2021-04-12 17:25:56 +02:00
Alexandre Flament	eaa694fb7d	[enh] replace requests by httpx	2021-04-10 15:38:33 +02:00
Alexandre Flament	0b45afd4d7	[fix] checker: various bug fixes * initialize engine_data (youtube engine) * don't crash if an engine don't set result['url']	2021-03-25 09:37:37 +01:00
Alexandre Flament	99e0651cea	[mod] by default allow only HTTPS, not HTTP Related to https://github.com/searx/searx/pull/2373	2021-03-08 11:35:08 +01:00
Alexandre Flament	46ca32c3cc	[mod] update currencies.json and fetch_currencies.py use a sparql request on wikidata to get the list of currencies. currencies.json contains the translation for all supported searx languages. Supersede #993	2021-02-23 16:42:28 +01:00
Alexandre Flament	c22d4c764c	[fix] duckduckgo engine: "!ddg !g" do not redirect to google * searx understand "!ddg !g time" as : send "!g time" to DDG * !g a DDG bang for Google: DDG return a HTTP redirect to Google This commit adds a the allows_redirect param not to follow HTTP redirect. The DDG engine returns a empty result as before without HTTP redirect.	2021-02-12 11:10:08 +01:00
Alexandre Flament	aedf03c0f7	Fix: activate raise_for_error by default Fix commit `d703119d3a` : Some engines need to parse the HTTP error but raise_for_error is always set to False in the "request" function.	2021-02-09 11:27:41 +01:00
Alexandre Flament	3b7b852aa8	[fix] checker: minor fix about language detection	2021-01-19 21:29:31 +01:00
Alexandre Flament	d473407ec9	[fix] checker: fix engine statistics Without this commit, the URL /stats/errors shows percentage above 100% after the checker has run.	2021-01-18 08:19:44 +01:00
Alexandre Flament	f3e1bd308f	[mod] checker: minor adjustements on the default tests the query "time" is convinient because most of the search engine will return some results, but some engines in the general category will return documentation about the HTML tags <time> or <input type="time">	2021-01-12 11:47:17 +01:00
Alexandre Flament	8cbc9f2d58	[enh] add checker	2021-01-12 11:47:17 +01:00
Alexandre Flament	5c6a5407a0	[fix] fix of PR #2225	2020-12-17 16:49:48 +01:00
Alexandre Flament	7ec8bc3ea7	[mod] split searx.search into different processors see searx.search.processors.abstract.EngineProcessor First the method searx call the get_params method. If the return value is not None, then the searx call the method search.	2020-12-17 11:39:36 +01:00

43 Commits