Commit Graph

120 Commits

Author SHA1 Message Date
Markus Heiser 1ac3961336 [mod] google - get_lang_info add documentataion & comments
BTW: remove obsolete log messages from google engine

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-11 16:06:36 +02:00
Alexandre Flament 1c67b6aece [enh] google engine: supports "default language"
Same behaviour behaviour than Whoogle [1].  Only the google engine with the
"Default language" choice "(all)"" is changed by this patch.

When searching for a locate place, the result are in the expect language,
without missing results [2]:

  > When a language is not specified, the language interpretation is left up to
  > Google to decide how the search results should be delivered.

The query parameters are copied from Whoogle.  With the ``all`` language:

- add parameter ``source=lnt``
- don't use parameter ``lr``
- don't add a ``Accept-Language`` HTTP header.

The new signature of function ``get_lang_info()`` is:

    lang_info = get_lang_info(params, lang_list, custom_aliases, supported_any_language)

Argument ``supported_any_language`` is True for google.py and False for the other
google engines.  With this patch the function now returns:

- query parameters: ``lang_info['params']``
- HTTP headers: ``lang_info['headers']``
- and as before this patch:
  - ``lang_info['subdomain']``
  - ``lang_info['country']``
  - ``lang_info['language']``

[1] https://github.com/benbusby/whoogle-search
[2] https://github.com/benbusby/whoogle-search/releases/tag/v0.5.4
2021-06-10 10:22:01 +02:00
Markus Heiser dc29f1d826 [pylint] tag PYLINT_FILES by comment `# lint: pylint`
These py files are linted by `test.pylint`, all other files are linted by
`test.pep8`.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-04-26 20:18:20 +02:00
Alexandre Flament 48720e20a8 Merge remote-tracking branch 'searx/master' 2021-04-19 09:35:12 +02:00
Robin Schneider dfc66ff0f0
Fix grammar mistake in debug log output 2021-04-11 22:12:53 +02:00
Alexandre Flament eaa694fb7d [enh] replace requests by httpx 2021-04-10 15:38:33 +02:00
Alexandre Flament ca93a01844 [mod] dynamically set language_support variable
The language_support variable is set to True by default,
and set to False in only 5 engines.

Except the documentation and the /config URL, this variable is not used.

This commit remove the variable definition in the engines, and
set value according to supported_languages length: False when the length is 0,
True otherwise.

Close #2485
2021-02-01 17:10:37 +01:00
Markus Heiser 7f505bdc6f [fix] google: avoid unnecessary SearxEngineXPathException errors
Avoid SearxEngineXPathException errors when parsing non valid results::

    .//div[@class="yuRUbf"]//a/@href index 0 not found
    Traceback (most recent call last):
      File "./searx/engines/google.py", line 274, in response
        url = eval_xpath_getindex(result, href_xpath, 0)
      File "./searx/searx/utils.py", line 608, in eval_xpath_getindex
        raise SearxEngineXPathException(xpath_spec, 'index ' + str(index) + ' not found')
    searx.exceptions.SearxEngineXPathException: .//div[@class="yuRUbf"]//a/@href index 0 not found

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:50 +01:00
Markus Heiser b1fefec40d [fix] normalize the language & region aspects of all google engines
BTW: make the engines ready for search.checker:

- replace eval_xpath by eval_xpath_getindex and eval_xpath_list
- google_images: remove outer try/except block

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:46 +01:00
Markus Heiser baec54c492 [fix] revise of the google-news engine
This revise is based on the methods developed in the revise of the google engine
(see commit 410c2f9).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-22 18:49:45 +01:00
Alexandre Flament a4dcfa025c [enh] engines: add about variable
move meta information from comment to the about variable
so the preferences, the documentation can show these information
2021-01-14 20:57:17 +01:00
Alexandre Flament 64cccae99e [mod] various engines: use eval_xpath* functions and searx.exceptions.*
Engine list: ahmia, duckduckgo_images, elasticsearch, google, google_images, google_videos, youtube_api
2020-12-03 10:22:48 +01:00
Alexandre Flament 2006eb4680 [mod] move extract_text, extract_url to searx.utils 2020-10-02 18:13:56 +02:00
Markus Heiser 8162d7aff4 [fix] google engine - div classes has been renamed in HTML reult
Since 1. October 2020 google has changed the 'class' attribute of the HTML
result page.

Fix the xpath expressions and ignore <div class="g" ../> sections which do not
match to title's xpath expression.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-10-01 09:44:29 +02:00
Marc Abonce Seguin ecf5899153 fetch google's search langs rather than ui langs 2020-09-22 11:37:44 +02:00
Dalf 1022228d95 Drop Python 2 (1/n): remove unicode string and url_utils 2020-09-10 10:39:04 +02:00
Adam Tauber 52eba0c721 [fix] pep8 2020-07-08 00:46:03 +02:00
Markus Heiser 410c2f903d [fix] revise google engine
this commit is picked from #1985
2020-07-07 21:50:59 +02:00
Marc Abonce Seguin ccaf6ca02c [fix] update xpaths for new google results page 2019-12-07 16:37:24 -07:00
Adam Tauber 731e34299d
Merge pull request #1744 from dalf/optimizations
[mod] speed optimization
2019-12-02 13:39:58 +00:00
Emilien Devos 8f51430f5c [fix] Force Google old UI with a new user agent 2019-11-22 23:01:41 +01:00
Dalf 85b3723345 [mod] speed optimization
compile XPath only once
avoid redundant call to urlparse
get_locale(webapp.py): avoid useless call to request.accept_languages.best_match
2019-11-15 09:33:15 +01:00
Emilien Devos cbd1ebdce8 [fix] Force Google old UI (#1597) 2019-05-29 10:05:57 +09:00
Noémi Ványi b63d645a52 Revert "remove 'all' option from search languages"
This reverts commit 4d1770398a.
2019-01-07 21:19:00 +01:00
Marc Abonce Seguin 0169b63e84 [fix] fetch google's supported languages 2019-01-06 21:31:45 -06:00
Marc Abonce Seguin 5568f24d6c [fix] check language aliases when setting search language 2019-01-06 20:31:57 -06:00
Marc Abonce Seguin f7f9c50393 [fix] force English results in Google when using en-US 2018-04-18 23:29:48 -05:00
Marc Abonce Seguin 772c048d01 refactor engine's search language handling
Add match_language function in utils to match any user given
language code with a list of engine's supported languages.

Also add language_aliases dict on each engine to translate
standard language codes into the custom codes used by the engine.
2018-03-27 00:08:03 -06:00
Marc Abonce Seguin d1eae9359f fix fetch_langauges to be more accurate
Add languages supported by either all default general engines or 10 engines.
2018-03-20 17:58:20 -06:00
Noémi Ványi 2d5eed9b59 send constant cookie with query to Google 2017-12-18 21:38:52 +01:00
marc 4d1770398a remove 'all' option from search languages 2017-12-06 01:20:15 -06:00
Adam Tauber 1613c6319e [fix] handle /sorry redirects 2017-12-05 20:38:34 +01:00
Adam Tauber 6eb9503896 [fix] use english in google engine if no language was set - this prevents guessing the language by the IP of the instance 2017-11-22 22:56:47 +01:00
Adam Tauber 6fdb6640d9 [fix] revert language changes to prevent CAPTCHAs 2017-11-22 22:50:48 +01:00
Adam Tauber 9ab8536479 [fix] fix language support of google 2017-11-21 16:28:53 +01:00
Adam Tauber 52e615dede [enh] py3 compatibility 2017-05-15 12:02:30 +02:00
Adam Tauber 52d1087202 [enh] add result number parsing to google engine 2017-01-27 00:18:46 +01:00
David A Roberts 1d30141c20 [enh] show spelling corrections 2017-01-16 13:31:16 +10:00
Adam Tauber 0d4da30c7f [enh] add instant answers to google engine 2017-01-05 17:20:12 +01:00
marc af35eee10b tests for _fetch_supported_languages in engines
and refactor method to make it testable without making requests
2016-12-15 00:40:21 -06:00
marc f62ce21f50 [mod] fetch supported languages for several engines
utils/fetch_languages.py gets languages supported by each engine and
generates engines_languages.json with each engine's supported language.
2016-12-13 19:58:10 -06:00
marc c677aee58a filter langauges 2016-12-13 19:32:00 -06:00
marc 149802c569 [enh] add supported_languages on engines and auto-generate languages.py 2016-12-13 19:32:00 -06:00
Noémi Ványi c59c76e6ee add year to time range to engines which support "Last year"
Engines:
 * Bing images
 * Flickr (noapi)
 * Google
 * Google Images
 * Google News
2016-12-11 16:58:31 +01:00
Adam Tauber 16bdc0baf4 [mod] do not escape html content in engines 2016-12-09 18:59:19 +01:00
Adam Tauber 350a84520d [fix] time range detection 2016-07-26 00:28:48 +02:00
Noemi Vanyi 2e5839503f add time range search for google 2016-07-25 23:28:14 +02:00
stepshal b3ab221b98 Fix anomalous backslash in string 2016-07-11 23:53:13 +07:00
Adam Tauber 85c0351dca Merge pull request #526 from ukwt/anime
Add a few search engines
2016-04-14 10:59:31 +02:00
Kirill Isakov 90c51cb449 Fix a few typos in Google search engine 2016-04-13 23:04:53 +06:00
Adam Tauber 6d55642ab4 [fix] no more redirect ++ explicitly specify search language to avoid googles ip based heuristics 2016-03-25 18:38:02 +01:00
Adam Tauber 09b7673fbd [fix] temporary disable googles inner links - #491 2016-01-18 13:10:21 +01:00
Adam Tauber 66f48c2bf5 [fix] google markup change - closes #489 2016-01-10 18:49:50 +01:00
Adam Tauber 5cea4f9445 [fix] prevent google engine to redirect
nid/pref cookies are also removed
2015-12-22 20:05:42 +01:00
Adam Tauber d8f8bdc951 [fix] quickfix for sometimes missing PREF cookie 2015-12-15 09:48:38 +01:00
Adam Tauber 5d49c15f79 [fix] google engine - ignore new useless result type 2015-10-29 12:47:12 +01:00
Adam Tauber 0ad272c5cb [fix] content escaping - closes #441
TODO check other engines too
2015-09-30 16:42:03 +02:00
Dalf fc0ae0f907 google engine: code cleanup 2015-06-06 00:18:00 +02:00
Dalf 72c8de35a2 google engine :remove OSM map 2015-06-05 23:56:23 +02:00
Alexandre Flament b8fc531b60 [enh] google engine : parse map links and more 2015-06-05 11:23:24 +02:00
Alexandre Flament 39ff21237c [enh] google engine : avoid some "sorry google" by adding another cookie : NID. This cookie is specific by hostname.
This allow to send request to google.* (according to the search language).
Before this commit, request in other languages than english was sent to www.google.com which was redirected to www.google.*
The PREF is still use on the www.google.com domain.
2015-05-30 17:41:40 +02:00
Alexandre Flament 8a69ade875 Revert of #195 when the search language is not english
Sometimes there is two requests to google (depending of the source IP) : one to google.com, the second to google.fr (for instance).

Going to https://www.google.com/ncr and saving the PREF cookie for future use prevent this (there is no redirection).

But, recently (or not ?), by doing this the search returns English results even if the Accept-Language is specified.

There is still a way to prevent this : going to preference, set the search language. I don't know if this can be done by searx.

For now, a quick fix is to disable the use of the PREF cookie when the search language is not English (google engine will slower but returns excepted results).
2015-05-01 21:20:09 +02:00
dalf 0a83be0ec9 [fix] google engine: depending on the IP of the searx instance, each searx request where making two HTTP requests (see https://support.google.com/websearch/answer/873?hl=en ) 2015-01-22 11:40:28 +01:00
Adam Tauber 0f4cb32bf1 [mod] image results removed from google engine 2014-12-09 00:53:09 +01:00
Adam Tauber 611f4e2a86 [fix] pep8 2014-12-05 20:03:16 +01:00
Dalf 5dc3eb3399 [fix] rewrite the google engine since Google Web Search API is about to expire 2014-09-14 14:40:55 +02:00
Thomas Pointhuber 144f89bf78 add comments to google-engines 2014-09-01 15:10:05 +02:00
asciimoo 2a788c8f29 [enh] search language support init 2014-01-31 04:35:23 +01:00
asciimoo ca271fd861 [enh] bing, google paging support 2014-01-29 21:14:38 +01:00
asciimoo 3207a396bd [enh] google engine added 2014-01-29 19:28:38 +01:00