65 Commits

Author SHA1 Message Date
Markus Heiser e9afc4f8ce [mod] Startpage: reverse engineered & upgrade to data_type: traits_v1
One reason for the frequent CAPTCHAs on Startpage requests is that SearXNG sends
incomplete requests to startpage.com. This patch is a completely new
implementation of the ``request()`` function, reverse engineered from
Startpage's search form.  The new implementation:

- uses traits of data_type: traits_v1 and drops the deprecated data_type: supported_languages
- adds time-range support
- adds safe-search support
- fixes searxng/searxng/issues 1884
- fixes searxng/searxng/issues 1081 --> improvements to avoid CAPTCHA

In preparation for more categories (News, Images, Videos ..) from Startpage, the
variable ``startpage_categ`` was set up.  The default value is ``web`` and other
categories from Startpage are not yet implemented.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser 61383edb27 [mod] Startpage: fetch engine traits (data_type: supported_languages)
Implements a fetch_traits function for the Startpage engine.

.. note::

   Does not include migration of the request method from 'supported_languages'
   to 'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Alexandre Flament 37addec69e search.suspended_time settings: bug fixes
* fix typo in settings.yml: replace suspend_times by suspended_times
* always use the delay defined in settings.yml:
  * HTTP status 402 and 403: read the value from settings.yml instead of using the hardcoded value of 1 day.
  * startpage engine: a CAPTCHA suspends the engine for one day instead of one week
2023-01-28 10:24:14 +00:00
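The lookup described in this commit can be sketched roughly as follows. This is a minimal illustration and not the SearXNG code: the function name ``get_suspended_time``, the dictionary layout, and the exception-name keys are assumptions made for the example; only the ``suspended_times`` key and the one-day value come from the commit message.

```python
ONE_DAY = 24 * 60 * 60  # one day, in seconds

# Hypothetical stand-in for the parsed settings.yml (layout is assumed).
settings = {
    "search": {
        "suspended_times": {
            "SearxEngineAccessDenied": ONE_DAY,  # e.g. HTTP 402 / 403
            "SearxEngineCaptcha": ONE_DAY,
        }
    }
}


def get_suspended_time(settings, error_name, default=ONE_DAY):
    """Return the configured suspension time in seconds for an error type.

    Falls back to `default` when the setting is missing, so no caller
    needs its own hardcoded delay.
    """
    times = settings.get("search", {}).get("suspended_times", {})
    return times.get(error_name, default)
```

Reading the delay in one place means an engine that hits a CAPTCHA and the generic HTTP error handling both honor the same configured value.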
Alexandre FLAMENT 035bc507ec [fix] startpage engine 2022-10-14 18:27:53 +00:00
Markus Heiser ba8959ad7c [fix] typos / reported by @kianmeng in searx PR-3366
[PR-3366] https://github.com/searx/searx/pull/3366

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-27 18:32:14 +02:00
Alexandre Flament 378b29be2f fix startpage: update XPath in _fetch_supported_languages 2022-03-19 14:16:37 +01:00
Alexandre Flament f9271d595f [fix] startpage: workaround to use the startpage network
workaround for the issue #762
2022-01-15 22:56:34 +01:00
Markus Heiser df238e944c [mod] startpage engine: add comment about Startpage's FFox add-on
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10 11:22:38 +01:00
Markus Heiser 21e884f369 [fix] startpage engine: fetch CAPTCHA & issues related to PR-695
In case of CAPTCHA raise a SearxEngineCaptchaException and suspend for 7 days.
When get_sc_code() fails raise a SearxEngineResponseException and suspend for 7
days.

[1] https://github.com/searxng/searxng/pull/695

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10 11:22:38 +01:00
Markus Heiser 2f4e567e90 [fix] Get an actual `sc` argument from startpage's home page.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10 11:22:38 +01:00
Markus Heiser 1cbcddb3f7 [pylint] Startpage engine
Fix remarks from pylint

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10 11:22:38 +01:00
Markus Heiser f1f5e69c42 [fix] startpage engine - avoid captcha
Startpage has introduced new anti-scraping measures that make SearXNG instances
run into captchas:

1. some arguments have been removed and a new `sc` has been added.
2. search path changed from `do/search` to `sp/search`
3. POST request is no longer needed

Closes: https://github.com/searxng/searxng/issues/692
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10 11:22:12 +01:00
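The three changes listed in this commit amount to building a plain GET URL. A minimal sketch, assuming a base URL of ``https://www.startpage.com`` and a ``query`` parameter name (both are assumptions for illustration; only the ``sp/search`` path and the ``sc`` argument are named in the commit message):

```python
from urllib.parse import urlencode

BASE_URL = "https://www.startpage.com"  # assumed host
SEARCH_PATH = "/sp/search"  # formerly /do/search, per the commit message


def build_search_url(query, sc_code):
    """Build a GET search URL carrying the `sc` argument.

    The parameter name `query` is hypothetical; `sc` comes from the
    commit message. No POST body is involved.
    """
    args = {"query": query, "sc": sc_code}
    return BASE_URL + SEARCH_PATH + "?" + urlencode(args)
```

For example, ``build_search_url("searx ng", "abc")`` yields a URL whose query string is ``query=searx+ng&sc=abc``.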
Martin Fischer b02f762687 [enh] add more categories 2022-01-05 11:00:11 +01:00
Markus Heiser 3d96a9839a [format.python] initial formatting of the python code
This patch was generated by black [1]::

    make format.python

[1] https://github.com/psf/black

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-27 09:26:22 +01:00
Alexandre Flament ca93a01844 [mod] dynamically set language_support variable
The language_support variable is set to True by default,
and set to False in only 5 engines.

Apart from the documentation and the /config URL, this variable is not used.

This commit removes the variable definition in the engines and
sets the value according to the length of supported_languages: False when the
length is 0, True otherwise.

Close #2485
2021-02-01 17:10:37 +01:00
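The rule stated in this commit is small enough to write out. A minimal sketch, where the helper name ``derive_language_support`` is hypothetical and not taken from the code base:

```python
def derive_language_support(supported_languages):
    """Return False when no languages are listed, True otherwise."""
    return len(supported_languages) > 0
```

Under this rule an engine with an empty ``supported_languages`` gets ``language_support = False`` without declaring it, and any engine that lists at least one language gets ``True``.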
Alexandre Flament a4dcfa025c [enh] engines: add about variable
move meta information from the comment to the about variable
so that the preferences and the documentation can show this information
2021-01-14 20:57:17 +01:00
lucky13820 fea8958e99 Fix the StartPage result title showing the URL
Fix issue 2395, where the StartPage result title shows the URL. https://github.com/searx/searx/issues/2395
2020-12-16 13:54:14 -08:00
joshu9h 8260435c8b [Fix] Startpage 2020-12-13 15:43:50 +01:00
Alexandre Flament 3038052c79 [mod] remove unused import
use
from searx.engines.duckduckgo import _fetch_supported_languages, supported_languages_url  # NOQA
so it is possible to easily remove all unused imports using autoflake:
autoflake --in-place --recursive --remove-all-unused-imports searx tests
2020-11-14 14:11:02 +01:00
Alexandre Flament 2006eb4680 [mod] move extract_text, extract_url to searx.utils 2020-10-02 18:13:56 +02:00
Marc Abonce Seguin 41800835f9 fetch supported languages for startpage engine 2020-09-22 11:37:44 +02:00
Spühler Stefan 4f90fb6a92 [Fix] Startpage ValueError on Spanish date format
dateutil.parser.parse() does not know the Spanish date format, which
leads to a ValueError. Fixes #1870

Traceback (most recent call last):
  File "/usr/local/searx/searx/search.py", line 160, in search_one_http_request_safe
    search_results = search_one_http_request(engine, query, request_params)
  File "/usr/local/searx/searx/search.py", line 97, in search_one_http_request
    return engine.response(response)
  File "/usr/local/searx/searx/engines/startpage.py", line 102, in response
    published_date = parser.parse(date_string, dayfirst=True)
  File "/usr/local/searx/searx-ve/lib/python3.6/site-packages/dateutil/parser/_parser.py", line 1358, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/usr/local/searx/searx-ve/lib/python3.6/site-packages/dateutil/parser/_parser.py", line 649, in parse
    raise ValueError("Unknown string format:", timestr)
ValueError: ('Unknown string format:', '24 Ene 2013')
2020-03-09 09:31:20 +01:00
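One way to handle a date such as ``'24 Ene 2013'`` is to map Spanish month abbreviations to month numbers before constructing the date. The sketch below is an illustration using only the standard library; it is not necessarily how this commit resolved the error, and the names ``SPANISH_MONTHS`` and ``parse_spanish_date`` are hypothetical.

```python
from datetime import datetime

# Common Spanish month abbreviations mapped to month numbers.
SPANISH_MONTHS = {
    "ene": 1, "feb": 2, "mar": 3, "abr": 4, "may": 5, "jun": 6,
    "jul": 7, "ago": 8, "sep": 9, "oct": 10, "nov": 11, "dic": 12,
}


def parse_spanish_date(date_string):
    """Parse 'DD Mon YYYY' with a Spanish month abbreviation.

    Returns a datetime, or None when the string does not match,
    instead of letting a ValueError propagate to the caller.
    """
    try:
        day, month_abbr, year = date_string.split()
        month = SPANISH_MONTHS[month_abbr.lower().rstrip(".")]
        return datetime(int(year), month, int(day))
    except (ValueError, KeyError):
        return None
```

Returning ``None`` for unrecognized input lets the caller decide whether to skip the published date or fall back to another parser.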
Dalf 85b3723345 [mod] speed optimization
compile XPath only once
avoid redundant call to urlparse
get_locale(webapp.py): avoid useless call to request.accept_languages.best_match
2019-11-15 09:33:15 +01:00
Adam Tauber ed1c1bdb04 [fix] pep8 2019-10-14 15:09:39 +02:00
Adam Tauber 77a70fe541 [fix] update startpage engine - closes #1601 2019-10-14 14:18:41 +02:00
Noémi Ványi b63d645a52 Revert "remove 'all' option from search languages"
This reverts commit 4d1770398a.
2019-01-07 21:19:00 +01:00
Noémi Ványi aeb6dab187 Merge branch 'master' into master 2019-01-04 22:14:40 +01:00
Michael Pfitzner 44ce51f0c5 restore startpage search results 2018-12-14 21:38:48 +01:00
dimqua 0d86ed9c7e update startpage.py 2018-12-11 21:45:47 +03:00
marc 4d1770398a remove 'all' option from search languages 2017-12-06 01:20:15 -06:00
Adam Tauber 52e615dede [enh] py3 compatibility 2017-05-15 12:02:30 +02:00
marc f62ce21f50 [mod] fetch supported languages for several engines
utils/fetch_languages.py gets languages supported by each engine and
generates engines_languages.json with each engine's supported language.
2016-12-13 19:58:10 -06:00
marc a11948c71b Add language support for more engines. 2016-12-13 19:32:43 -06:00
marc 149802c569 [enh] add supported_languages on engines and auto-generate languages.py 2016-12-13 19:32:00 -06:00
Adam Tauber 16bdc0baf4 [mod] do not escape html content in engines 2016-12-09 18:59:19 +01:00
stepshal b3ab221b98 Fix anomalous backslash in string 2016-07-11 23:53:13 +07:00
Adam Tauber bd22e9a336 [fix] pep8 compatibility 2016-01-18 12:47:31 +01:00
Thomas Pointhuber 4508c96667 [enh] fix content fetching, parse published date from description 2015-10-24 16:19:47 +02:00
Thomas Pointhuber 996c96ffff [fix] block ixquick search URLs 2015-08-24 11:31:30 +02:00
Thomas Pointhuber 23b9095cbf [fix] improve result handling of startpage engine 2015-08-24 11:28:55 +02:00
Cqoicebordel f1c10f4fe4 Startpage's unit test 2015-02-06 17:31:10 +01:00
Cqoicebordel b4b666e703 Flake8 2015-01-15 20:27:30 +01:00
Cqoicebordel fa0330f0ff Fix startpage
Fix issue with unicode characters in startpage: we shouldn't urlencode them if we are using POST.
Should fix #169. @dimqua can you confirm?
2015-01-15 20:18:40 +01:00
Adam Tauber c8be128e97 [mod] ignore startpage unicode errors 2015-01-09 11:21:46 +01:00
Adam Tauber b1234ee889 [fix] startpage engine compatibility 2014-11-17 10:19:23 +01:00
Thomas Pointhuber 678a80f043 fix startpage engine and add comments
* add language support
* remove not required code
* improve google-ad detection (no false detection anymore, I hope)
* other improvements
2014-09-02 19:57:01 +02:00
Adam Tauber 111a86d355 [fix] html escape 2014-08-06 14:43:44 +02:00
asciimoo 7db4558de7 [mod][fix] startpage engine updates 2014-02-18 16:14:31 +01:00
asciimoo c1d7d30b8e [mod] len() removed from conditions 2014-02-11 13:13:51 +01:00
asciimoo 68a0832524 [enh] search language support updates 2014-01-31 05:10:49 +01:00