66 Commits

Author SHA1 Message Date
Markus Heiser e8706fb738 [fix] engine & network issues / documentation and type annotations
This patch fixes some quirks and issues related to the engines and the network.
Each engine has its own network and this network was broken for the following
engines[1]:

- archlinux
- bing
- dailymotion
- duckduckgo
- google
- peertube
- startpage
- wikipedia

Since the files have been touched anyway, the type annotations of the engine
modules have also been completed so that the type checker no longer reports
errors.

Related and (partially) fixed issues:

- [1] https://github.com/searxng/searxng/issues/762#issuecomment-1605323861
- [2] https://github.com/searxng/searxng/issues/2513
- [3] https://github.com/searxng/searxng/issues/2515

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-25 13:58:26 +02:00
Markus Heiser e9afc4f8ce [mod] Startpage: reverse engineered & upgrade to data_type: traits_v1
One reason for the frequent CAPTCHAs on Startpage requests is the
incomplete requests SearXNG sends to startpage.com. This patch is a completely new
implementation of the ``request()`` function, reverse engineered from
Startpage's search form.  The new implementation:

- uses traits of data_type: traits_v1 and drops the deprecated data_type: supported_languages
- adds time-range support
- adds safe-search support
- fixes searxng/searxng/issues 1884
- fixes searxng/searxng/issues 1081 --> improvements to avoid CAPTCHA

In preparation for more categories (News, Images, Videos ..) from Startpage, the
variable ``startpage_categ`` was set up.  The default value is ``web`` and other
categories from Startpage are not yet implemented.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser 61383edb27 [mod] Startpage: fetch engine traits (data_type: supported_languages)
Implements a fetch_traits function for the Startpage engine.

.. note::

   Does not include migration of the request method from 'supported_languages'
   to the 'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Alexandre Flament 37addec69e search.suspended_time settings: bug fixes
* fix typo in settings.yml: replace suspend_times with suspended_times
* always use the delay defined in settings.yml:
  * HTTP status 402 and 403: read the value from settings.yml instead of using the hardcoded value of 1 day.
  * startpage engine: a CAPTCHA suspends the engine for one day instead of one week
2023-01-28 10:24:14 +00:00
Alexandre FLAMENT 035bc507ec [fix] startpage engine 2022-10-14 18:27:53 +00:00
Markus Heiser ba8959ad7c [fix] typos / reported by @kianmeng in searx PR-3366
[PR-3366] https://github.com/searx/searx/pull/3366

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-27 18:32:14 +02:00
Alexandre Flament 378b29be2f fix startpage: update XPath in _fetch_supported_languages 2022-03-19 14:16:37 +01:00
Alexandre Flament f9271d595f [fix] startpage: workaround to use the startpage network
workaround for the issue #762
2022-01-15 22:56:34 +01:00
Markus Heiser df238e944c [mod] Startpage engine: add comment about Startpage's FFox add-on
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10 11:22:38 +01:00
Markus Heiser 21e884f369 [fix] startpage engine: fetch CAPTCHA & issues related to PR-695
In case of a CAPTCHA, raise a SearxEngineCaptchaException and suspend the engine
for 7 days. When get_sc_code() fails, raise a SearxEngineResponseException and
suspend the engine for 7 days.
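
The behavior described above can be sketched as follows. This is only an
illustration under stated assumptions, not the engine's code: ``CaptchaError``
is a local stand-in for the exception named in the commit message, and
detecting a CAPTCHA by looking for ``captcha`` in the final URL is an assumed
heuristic. Only the 7-day suspension comes from the commit message.

```python
SUSPEND_SECONDS = 7 * 24 * 3600  # 7 days, as stated in the commit message


class CaptchaError(Exception):
    """Local stand-in for an engine CAPTCHA exception (hypothetical)."""

    def __init__(self, suspended_time: int):
        super().__init__(f"CAPTCHA, suspended for {suspended_time} s")
        self.suspended_time = suspended_time


def check_response(final_url: str) -> None:
    """Raise when the response looks like a CAPTCHA page.

    Checking for 'captcha' in the URL is an assumed heuristic.
    """
    if "captcha" in final_url.lower():
        raise CaptchaError(suspended_time=SUSPEND_SECONDS)
```

The caller is expected to catch the exception and keep the engine out of
rotation for ``suspended_time`` seconds.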

[1] https://github.com/searxng/searxng/pull/695

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10 11:22:38 +01:00
Markus Heiser 2f4e567e90 [fix] Get an actual `sc` argument from startpage's home page.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10 11:22:38 +01:00
Markus Heiser 1cbcddb3f7 [pylint] Startpage engine
Fix remarks from pylint

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10 11:22:38 +01:00
Markus Heiser f1f5e69c42 [fix] startpage engine - avoid captcha
Startpage has introduced new anti-scraping measures that make SearXNG instances
run into captchas:

1. some arguments have been removed and a new `sc` argument has been added.
2. search path changed from `do/search` to `sp/search`
3. POST request is no longer needed
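
The three changes above can be sketched roughly as follows. This is a minimal
illustration, not the engine's actual code: the base URL, the `query` parameter
name, the helper `build_search_url` and the placeholder `sc` value are
assumptions. Only the `sp/search` path, the `sc` argument and the move away
from POST come from the commit message.

```python
from urllib.parse import urlencode

# Assumed base URL; only the 'sp/search' path is taken from the commit message.
BASE_URL = "https://www.startpage.com/"
SEARCH_PATH = "sp/search"  # previously 'do/search'


def build_search_url(query: str, sc_code: str) -> str:
    """Return a GET URL for a search (hypothetical helper).

    'query' is an assumed parameter name; 'sc' is the new argument
    mentioned in the commit message.
    """
    args = {"query": query, "sc": sc_code}
    # All arguments travel in the query string, so no POST body is needed.
    return BASE_URL + SEARCH_PATH + "?" + urlencode(args)


url = build_search_url("searxng metasearch", "PLACEHOLDER")
```

A real `sc` value would have to be obtained from Startpage first; the
placeholder here only shows where it goes.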

Closes: https://github.com/searxng/searxng/issues/692
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10 11:22:12 +01:00
Martin Fischer b02f762687 [enh] add more categories 2022-01-05 11:00:11 +01:00
Markus Heiser 3d96a9839a [format.python] initial formatting of the python code
This patch was generated by black [1]::

    make format.python

[1] https://github.com/psf/black

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-27 09:26:22 +01:00
Alexandre Flament ca93a01844 [mod] dynamically set language_support variable
The language_support variable is set to True by default,
and set to False in only 5 engines.

Except for the documentation and the /config URL, this variable is not used.

This commit removes the variable definition from the engines and
sets the value according to the length of supported_languages: False when the
length is 0, True otherwise.
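
The rule described above can be sketched like this. It is an illustration
only: ``has_language_support`` and ``FakeEngine`` are hypothetical names, and
the real loader code may differ.

```python
def has_language_support(supported_languages) -> bool:
    """True when the engine declares at least one supported language."""
    return len(supported_languages) > 0


class FakeEngine:  # hypothetical stand-in for an engine module
    supported_languages = ["en", "de", "fr"]


engine = FakeEngine()
# The flag is derived at load time instead of being defined in each engine.
engine.language_support = has_language_support(engine.supported_languages)
```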

Close #2485
2021-02-01 17:10:37 +01:00
Alexandre Flament a4dcfa025c [enh] engines: add about variable
move meta information from the comment to the about variable
so that the preferences and the documentation can show this information
2021-01-14 20:57:17 +01:00
lucky13820 fea8958e99
Fix StartPage result title showing the URL
Fixes issue 2395, where the StartPage result title shows the URL. https://github.com/searx/searx/issues/2395
2020-12-16 13:54:14 -08:00
joshu9h 8260435c8b
[Fix] Startpage 2020-12-13 15:43:50 +01:00
Alexandre Flament 3038052c79 [mod] remove unused import
use
from searx.engines.duckduckgo import _fetch_supported_languages, supported_languages_url  # NOQA
so it is possible to easily remove all unused import using autoflake:
autoflake --in-place --recursive --remove-all-unused-imports searx tests
2020-11-14 14:11:02 +01:00
Alexandre Flament 2006eb4680 [mod] move extract_text, extract_url to searx.utils 2020-10-02 18:13:56 +02:00
Marc Abonce Seguin 41800835f9 fetch supported languages for startpage engine 2020-09-22 11:37:44 +02:00
Spühler Stefan 4f90fb6a92 [Fix] Startpage ValueError on Spanish date format
dateutil.parser.parse() does not know the Spanish date format, which
leads to a ValueError. Fixes #1870

Traceback (most recent call last):
  File "/usr/local/searx/searx/search.py", line 160, in search_one_http_request_safe
    search_results = search_one_http_request(engine, query, request_params)
  File "/usr/local/searx/searx/search.py", line 97, in search_one_http_request
    return engine.response(response)
  File "/usr/local/searx/searx/engines/startpage.py", line 102, in response
    published_date = parser.parse(date_string, dayfirst=True)
  File "/usr/local/searx/searx-ve/lib/python3.6/site-packages/dateutil/parser/_parser.py", line 1358, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/usr/local/searx/searx-ve/lib/python3.6/site-packages/dateutil/parser/_parser.py", line 649, in parse
    raise ValueError("Unknown string format:", timestr)
ValueError: ('Unknown string format:', '24 Ene 2013')
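
One defensive pattern for this kind of failure is to catch the ValueError and
continue without a date. The sketch below is an illustration only and is not
necessarily the fix applied in this commit. It uses the standard library's
``datetime.strptime`` as a stand-in for the dateutil parser so that it runs
without third-party packages; ``parse_published_date``, the ``%d %b %Y`` format
and the default (English month names) locale are assumptions.

```python
from datetime import datetime
from typing import Optional


def parse_published_date(date_string: str) -> Optional[datetime]:
    """Return the parsed date, or None when the format is not recognized."""
    try:
        return datetime.strptime(date_string, "%d %b %Y")
    except ValueError:
        # Unknown format (e.g. a localized month name): skip the date
        # instead of failing the whole response.
        return None
```

With this pattern a result with an unparsable date is still returned, just
without a publication date.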
2020-03-09 09:31:20 +01:00
Dalf 85b3723345 [mod] speed optimization
compile XPath only once
avoid redundant call to urlparse
get_locale(webapp.py): avoid useless call to request.accept_languages.best_match
2019-11-15 09:33:15 +01:00
Adam Tauber ed1c1bdb04 [fix] pep8 2019-10-14 15:09:39 +02:00
Adam Tauber 77a70fe541 [fix] update startpage engine - closes #1601 2019-10-14 14:18:41 +02:00
Noémi Ványi b63d645a52 Revert "remove 'all' option from search languages"
This reverts commit 4d1770398a.
2019-01-07 21:19:00 +01:00
Noémi Ványi aeb6dab187
Merge branch 'master' into master 2019-01-04 22:14:40 +01:00
Michael Pfitzner 44ce51f0c5 restore startpage search results 2018-12-14 21:38:48 +01:00
dimqua 0d86ed9c7e update startpage.py 2018-12-11 21:45:47 +03:00
marc 4d1770398a remove 'all' option from search languages 2017-12-06 01:20:15 -06:00
Adam Tauber 52e615dede [enh] py3 compatibility 2017-05-15 12:02:30 +02:00
marc f62ce21f50 [mod] fetch supported languages for several engines
utils/fetch_languages.py gets languages supported by each engine and
generates engines_languages.json with each engine's supported language.
2016-12-13 19:58:10 -06:00
marc a11948c71b Add language support for more engines. 2016-12-13 19:32:43 -06:00
marc 149802c569 [enh] add supported_languages on engines and auto-generate languages.py 2016-12-13 19:32:00 -06:00
Adam Tauber 16bdc0baf4 [mod] do not escape html content in engines 2016-12-09 18:59:19 +01:00
stepshal b3ab221b98 Fix anomalous backslash in string 2016-07-11 23:53:13 +07:00
Adam Tauber bd22e9a336 [fix] pep8 compatibility 2016-01-18 12:47:31 +01:00
Thomas Pointhuber 4508c96667 [enh] fix content fetching, parse published date from description 2015-10-24 16:19:47 +02:00
Thomas Pointhuber 996c96ffff [fix] block ixquick search URLs 2015-08-24 11:31:30 +02:00
Thomas Pointhuber 23b9095cbf [fix] improve result handling of startpage engine 2015-08-24 11:28:55 +02:00
Cqoicebordel f1c10f4fe4 Startpage's unit test 2015-02-06 17:31:10 +01:00
Cqoicebordel b4b666e703 Flake8 2015-01-15 20:27:30 +01:00
Cqoicebordel fa0330f0ff Fix startpage
Fix issue with Unicode characters in startpage: we shouldn't URL-encode them if we are using POST.
Should fix #169. @dimqua can you confirm?
2015-01-15 20:18:40 +01:00
Adam Tauber c8be128e97 [mod] ignore startpage unicode errors 2015-01-09 11:21:46 +01:00
Adam Tauber b1234ee889 [fix] startpage engine compatibility 2014-11-17 10:19:23 +01:00
Thomas Pointhuber 678a80f043 fix startpage engine and add comments
* add language support
* remove not required code
* improve google-ad detection (no false detection anymore, I hope)
* other improvements
2014-09-02 19:57:01 +02:00
Adam Tauber 111a86d355 [fix] html escape 2014-08-06 14:43:44 +02:00
asciimoo 7db4558de7 [mod][fix] startpage engine updates 2014-02-18 16:14:31 +01:00
asciimoo c1d7d30b8e [mod] len() removed from conditions 2014-02-11 13:13:51 +01:00