ngosang
78be4b4c70
Fix Google search engine.
- Fix broken links. Resolves #1794
- Fix missing results. Resolves #1829
2022-11-11 07:34:19 +01:00
Markus Heiser
ba8959ad7c
[fix] typos / reported by @kianmeng in searx PR-3366
[PR-3366] https://github.com/searx/searx/pull/3366
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-27 18:32:14 +02:00
Markus Heiser
eb02cc77c5
[fix] google - simplify XPath selectors to fetch more results
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-10 18:55:31 +02:00
Émilien Devos
b9f16a77db
change output format from protobuf to HTML for google mobile
2022-08-10 09:36:06 +00:00
Brock Vojković
24210fb10b
Revert PR #1633
This reverts the changes made to the Google results XPath in PR #1633.
2022-08-10 03:41:39 +02:00
Léon Tiekötter
94b3656b4a
[fix] google engine: results XPath
It seems Google rolls out changes on the `google.com` domain first and on the
"language" domains later. For example, yesterday [1] `google.com` did not work but
`google.de` and `google.fr` did; today they no longer work either, so this
fix is needed on all domains.
Closes: https://github.com/searxng/searxng/issues/1628
[1] https://github.com/searxng/searxng/issues/1628#issuecomment-1208191816
2022-08-09 06:23:59 +02:00
Markus Heiser
8df1f0c47e
[mod] add 'Accept-Language' HTTP header to online processors
Most engines that support languages (and regions) use the Accept-Language header
from the web browser to build a response that fits the language (and region).
- add new engine option: send_accept_language_header
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-01 17:01:59 +02:00
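A minimal sketch of the idea behind ``send_accept_language_header``: only engines that opt in get an ``Accept-Language`` header derived from the user's locale. The function ``build_headers()``, the locale format (``de-DE``) and the ``q=0.9`` weighting are assumptions for illustration, not the processor code from this commit.

```python
def build_headers(locale_tag: str, send_accept_language_header: bool) -> dict:
    """Return extra HTTP headers for an outgoing engine request (sketch)."""
    headers = {}
    # Only engines that opt in get the header; "all" means no preference.
    if send_accept_language_header and locale_tag and locale_tag != "all":
        language = locale_tag.split("-")[0]
        if language == locale_tag:
            headers["Accept-Language"] = locale_tag
        else:
            # e.g. "de-DE" -> "de-DE,de;q=0.9": region first, bare language as fallback
            headers["Accept-Language"] = f"{locale_tag},{language};q=0.9"
    return headers


print(build_headers("de-DE", True))   # {'Accept-Language': 'de-DE,de;q=0.9'}
print(build_headers("de-DE", False))  # {}
```

Keeping the option per engine means engines that ignore or misinterpret the header are unaffected.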
Markus Heiser
c72d70d45c
Revert "Quick fix for google engine for EU countries"
This reverts commit 747cf1a246.
2022-07-26 06:39:44 +02:00
Léon Tiekötter
950f036c03
[fix] google engine: results XPath
2022-07-26 00:24:15 +02:00
Émilien Devos
747cf1a246
Quick fix for google engine for EU countries
This reverts part of commit 5fb2071cb2.
2022-07-25 20:48:50 +00:00
Emilien Devos
5fb2071cb2
[fix] google & youtube - set EU consent cookie
This changes the previous bypass method for the Google consent, which used
``ucbcb=1`` (6face215b8), to accepting the consent using ``CONSENT=YES+``.
The youtube_noapi and google engines have a similar API, at least for the consent [1].
Get CONSENT cookie from google request::
curl -i "https://www.google.com/search?q=time&tbm=isch" \
-A "Mozilla/5.0 (X11; Linux i686; rv:102.0) Gecko/20100101 Firefox/102.0" \
| grep -i consent
...
location: https://consent.google.com/m?continue=https://www.google.com/search?q%3Dtime%26tbm%3Disch&gl=DE&m=0&pc=irp&uxe=eomtm&hl=en-US&src=1
set-cookie: CONSENT=PENDING+936; expires=Wed, 24-Jul-2024 11:26:20 GMT; path=/; domain=.google.com; Secure
...
PENDING & YES [2]:
Google changed the way it asks for consent to the YouTube cookie agreement in EU
countries. Instead of showing a popup on the website, YouTube redirects the
user to a new webpage on the consent.youtube.com domain ... The fix for this is to
put a cookie CONSENT with the value YES+ in every YouTube request.
[1] https://github.com/iv-org/invidious/pull/2207
[2] https://github.com/TeamNewPipe/NewPipeExtractor/issues/592
Closes: https://github.com/searxng/searxng/issues/1432
2022-07-25 13:27:06 +02:00
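A minimal sketch of the cookie approach described above, assuming an engine ``request()`` hook that fills a ``params`` dict with ``url`` and ``cookies`` keys (modelled loosely on searx engines; the exact shape here is an assumption). It sends ``CONSENT=YES+`` up front so Google has no reason to redirect to ``consent.google.com``.

```python
from urllib.parse import urlencode


def request(query: str, params: dict) -> dict:
    """Fill in URL and cookies for a Google search request (sketch)."""
    params["url"] = "https://www.google.com/search?" + urlencode({"q": query})
    # Accept the EU consent up front; without it Google answers with
    # CONSENT=PENDING+... and redirects to consent.google.com.
    params.setdefault("cookies", {})["CONSENT"] = "YES+"
    return params


params = request("time", {})
print(params["url"])      # https://www.google.com/search?q=time
print(params["cookies"])  # {'CONSENT': 'YES+'}
```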
Emilien Devos
6face215b8
bypass google consent with ucbcb=1
2022-07-09 21:33:24 +00:00
Émilien Devos
06cb15cbf7
Reflect the real world parameter from settings.yml
2022-05-10 20:44:35 +00:00
Émilien Devos
7d3e8118b0
Update the XPath for fetching the Google results
2022-02-09 14:34:14 +01:00
Markus Heiser
1a0760c10a
[fix] google engine - "some results are invalids: invalid content"
Fix the google issues listed in `/stats?engine=google` with the message::
some results are invalids: invalid content
The log is::
DEBUG searx : result: invalid content: {'url': 'https://de.wikipedia.org/wiki/Foo', 'title': 'Foo - Wikipedia', 'content': None, 'engine': 'google'}
WARNING searx.engines.google : ErrorContext('searx/search/processors/abstract.py', 111, 'result_container.extend(self.engine_name, search_results)', None, 'some results are invalids: invalid content', ()) True
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-18 13:23:35 +01:00
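The log shows the cause: a result whose ``content`` is ``None``. A minimal sketch of the fix, assuming the result container rejects any non-string ``content`` (``has_valid_content()`` is a hypothetical stand-in for that check, and ``make_result()`` is illustrative, not the engine's code):

```python
def make_result(url: str, title: str, content) -> dict:
    """Build a result dict whose 'content' is always a string (sketch)."""
    # A result without a snippet yields None; normalise it to "".
    return {"url": url, "title": title, "content": content or "", "engine": "google"}


def has_valid_content(result: dict) -> bool:
    # Hypothetical stand-in for the result container's check: content must be a str.
    return isinstance(result.get("content", ""), str)


raw = {"url": "https://de.wikipedia.org/wiki/Foo", "title": "Foo - Wikipedia", "content": None}
print(has_valid_content(raw))  # False -> "invalid content"
fixed = make_result(raw["url"], raw["title"], raw["content"])
print(has_valid_content(fixed))  # True
```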
Markus Heiser
f0102a95c9
[fix] google engine: remove ads and fix mobile_ui selector
1. Fix issue reported in comment [1]
2. Fix XPath selector for the response of google's mobile UI, reported in
comment [2]
[1] https://github.com/searxng/searxng/pull/777#issuecomment-1015121322
[2] https://github.com/searxng/searxng/pull/777#issuecomment-1015236238
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-18 11:05:45 +01:00
Émilien Devos
6670063e0d
Update XPath for Google engine
2022-01-17 21:49:57 +00:00
Martin Fischer
b02f762687
[enh] add more categories
2022-01-05 11:00:11 +01:00
Markus Heiser
3d96a9839a
[format.python] initial formatting of the python code
This patch was generated by black [1]::
make format.python
[1] https://github.com/psf/black
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-27 09:26:22 +01:00
Markus Heiser
488ace1da9
[fix] google engine - suggestion
BTW: google no longer offers *spelling suggestions*
Closes: https://github.com/searxng/searxng/issues/442
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-11-25 19:42:03 +01:00
Markus Heiser
f0059b80ed
[pylint] engines: drop no longer needed 'missing-function-docstring'
Suggested-by: @dalf https://github.com/searxng/searxng/issues/102#issuecomment-914168470
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-07 13:26:59 +02:00
Markus Heiser
cd033b5416
[fix] drop useless pylint: disable=undefined-variable
Since 7b235a1 (see line 591) it is no longer needed to disable
'undefined-variable' for names defined in::
PYLINT_ADDITIONAL_BUILTINS_FOR_ENGINES
Suggested-by: @dalf https://github.com/searxng/searxng/issues/102#issuecomment-914068609
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-07 10:26:15 +02:00
Markus Heiser
aecfb2300d
[mod] one logger per engine - drop obsolete logger.getChild
Remove the no longer needed `logger = logger.getChild(...)` from engines.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-06 18:05:46 +02:00
Noémi Ványi
3d5e6e0abb
[enh] google: add filter=0 to Google engine for more results
backport from searx (23b3b56a06ef831af0a1b30a12c26ebd50e329bb)
2021-08-21 17:46:16 +02:00
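A minimal sketch of what ``filter=0`` does to the request URL. As far as we know, Google uses this parameter to switch off its duplicate-result filter (the "repeat the search with the omitted results included" link). ``search_url()`` is a hypothetical helper; the engine builds more query arguments than shown here.

```python
from urllib.parse import urlencode


def search_url(query: str) -> str:
    """Build a Google search URL with the duplicate filter switched off (sketch)."""
    args = {
        "q": query,
        # filter=0: do not collapse "very similar" results into the
        # "repeat the search with the omitted results included" link
        "filter": "0",
    }
    return "https://www.google.com/search?" + urlencode(args)


print(search_url("searx engine"))  # https://www.google.com/search?q=searx+engine&filter=0
```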
Émilien Devos
6c9f276571
Add missing parameter for mobile UI search
2021-07-15 13:00:32 +00:00
Markus Heiser
0ef6aa5126
[docs] add documentation from the sources of the google engines
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-21 18:25:52 +02:00
Markus Heiser
05e90f2e57
[fix] google answers: normalize space of the answers.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-21 16:50:25 +02:00
Markus Heiser
f096d68ec6
[mod] google engine: reduce mobile UI parameters to what is needed
Reverse engineering shows that not all of the parameters used by google's mobile
UI (aka "more results" button) are needed [1].
[1] https://github.com/searxng/searxng/pull/160#issuecomment-865013625
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-21 16:50:23 +02:00
Alexandre Flament
7a5c36408a
[mod] google: add "use_mobile_ui" parameter to use mobile endpoint.
Disabled by default; it has to be enabled in settings.yml.
Related to #159.
2021-06-21 14:52:04 +02:00
Markus Heiser
1ac3961336
[mod] google - get_lang_info: add documentation & comments
BTW: remove obsolete log messages from google engine
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-11 16:06:36 +02:00
Alexandre Flament
1c67b6aece
[enh] google engine: supports "default language"
Same behaviour as Whoogle [1]. Only the google engine with the
"Default language" choice "(all)" is changed by this patch.
When searching for a location, the results are in the expected language,
without missing results [2]:
> When a language is not specified, the language interpretation is left up to
> Google to decide how the search results should be delivered.
The query parameters are copied from Whoogle. With the ``all`` language:
- add parameter ``source=lnt``
- don't use parameter ``lr``
- don't add an ``Accept-Language`` HTTP header.
The new signature of function ``get_lang_info()`` is::
lang_info = get_lang_info(params, lang_list, custom_aliases, supported_any_language)
Argument ``supported_any_language`` is True for google.py and False for the other
google engines. With this patch the function now returns:
- query parameters: ``lang_info['params']``
- HTTP headers: ``lang_info['headers']``
- and as before this patch:
- ``lang_info['subdomain']``
- ``lang_info['country']``
- ``lang_info['language']``
[1] https://github.com/benbusby/whoogle-search
[2] https://github.com/benbusby/whoogle-search/releases/tag/v0.5.4
2021-06-10 10:22:01 +02:00
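A minimal sketch of the ``all`` rule listed above, returning ``params`` and ``headers`` the way ``lang_info`` does. ``lang_params()`` is a hypothetical simplification, not ``get_lang_info()``: the ``lang_xx`` format of ``lr``, the ``q=0.8`` weighting and the ``en-US`` fallback for engines without ``supported_any_language`` are assumptions.

```python
def lang_params(language: str, supported_any_language: bool) -> dict:
    """Return query params and HTTP headers for a language (sketch)."""
    info = {"params": {}, "headers": {}}
    if language == "all":
        if supported_any_language:
            # Whoogle behaviour: let Google interpret the language itself,
            # so no lr parameter and no Accept-Language header.
            info["params"]["source"] = "lnt"
            return info
        language = "en-US"  # hypothetical fallback for the other google engines
    lang = language.split("-")[0]
    info["params"]["lr"] = "lang_" + lang
    if lang == language:
        info["headers"]["Accept-Language"] = language
    else:
        info["headers"]["Accept-Language"] = f"{language},{lang};q=0.8"
    return info


print(lang_params("all", True))    # {'params': {'source': 'lnt'}, 'headers': {}}
print(lang_params("de-DE", True))  # {'params': {'lr': 'lang_de'}, 'headers': {'Accept-Language': 'de-DE,de;q=0.8'}}
```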
Markus Heiser
dc29f1d826
[pylint] tag PYLINT_FILES by comment `# lint: pylint`
These .py files are linted by `test.pylint`; all other files are linted by
`test.pep8`.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-04-26 20:18:20 +02:00
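A minimal Python sketch of the tag-based split: a file whose source contains a line that is exactly ``# lint: pylint`` goes to `test.pylint`, everything else to `test.pep8`. ``is_pylint_file()`` and ``pylint_files()`` are hypothetical helpers. The project's own build scripts may select the files differently (for example with grep and a looser pattern).

```python
from pathlib import Path

TAG = "# lint: pylint"


def is_pylint_file(source: str) -> bool:
    """True if a line of the source is exactly the tag comment."""
    return any(line.strip() == TAG for line in source.splitlines())


def pylint_files(root: str) -> list:
    """Split point: tagged files go to test.pylint, the rest to test.pep8."""
    return [
        path
        for path in sorted(Path(root).rglob("*.py"))
        if is_pylint_file(path.read_text(encoding="utf-8"))
    ]
```

For example, ``pylint_files("searx/engines")`` would list the tagged engine modules.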
Alexandre Flament
48720e20a8
Merge remote-tracking branch 'searx/master'
2021-04-19 09:35:12 +02:00
Robin Schneider
dfc66ff0f0
Fix grammar mistake in debug log output
2021-04-11 22:12:53 +02:00
Alexandre Flament
eaa694fb7d
[enh] replace requests by httpx
2021-04-10 15:38:33 +02:00
Alexandre Flament
ca93a01844
[mod] dynamically set language_support variable
The language_support variable is set to True by default,
and set to False in only 5 engines.
Apart from the documentation and the /config URL, this variable is not used.
This commit removes the variable definition from the engines and
sets the value according to the length of supported_languages: False when the
length is 0, True otherwise.
Closes #2485
2021-02-01 17:10:37 +01:00
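A minimal sketch of the rule described above. Engine modules are modelled here with ``types.SimpleNamespace``, and ``set_language_support()`` is a hypothetical name for the step that runs when engines are loaded.

```python
from types import SimpleNamespace


def set_language_support(engine) -> None:
    """Derive language_support from supported_languages (sketch)."""
    # Engines that declare no languages (or no attribute at all) get False.
    engine.language_support = len(getattr(engine, "supported_languages", [])) > 0


google = SimpleNamespace(supported_languages=["de", "en", "fr"])
no_langs = SimpleNamespace(supported_languages=[])
set_language_support(google)
set_language_support(no_langs)
print(google.language_support, no_langs.language_support)  # True False
```

``len()`` works whether ``supported_languages`` is a list or a dict, so either representation yields the same flag.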
Markus Heiser
7f505bdc6f
[fix] google: avoid unnecessary SearxEngineXPathException errors
Avoid SearxEngineXPathException errors when parsing invalid results::
.//div[@class="yuRUbf"]//a/@href index 0 not found
Traceback (most recent call last):
File "./searx/engines/google.py", line 274, in response
url = eval_xpath_getindex(result, href_xpath, 0)
File "./searx/searx/utils.py", line 608, in eval_xpath_getindex
raise SearxEngineXPathException(xpath_spec, 'index ' + str(index) + ' not found')
searx.exceptions.SearxEngineXPathException: .//div[@class="yuRUbf"]//a/@href index 0 not found
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:50 +01:00
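The traceback comes from asking for index 0 of an empty XPath result. A minimal sketch of the remedy: look the value up with a default and skip the block instead of raising. ``getindex()`` is a hypothetical helper that only mirrors the idea. It does not claim the signature of searx's ``eval_xpath_getindex``.

```python
_NOTSET = object()


def getindex(values, index, default=_NOTSET):
    """Return values[index]; return default instead of raising if one is given."""
    try:
        return values[index]
    except IndexError:
        if default is _NOTSET:
            raise
        return default


# Hrefs found per result block; the second block is not a real result.
raw_results = [["https://example.org/"], []]
urls = []
for hrefs in raw_results:
    url = getindex(hrefs, 0, None)
    if url is None:
        continue  # skip silently instead of raising an "index 0 not found" error
    urls.append(url)
print(urls)  # ['https://example.org/']
```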
Markus Heiser
b1fefec40d
[fix] normalize the language & region aspects of all google engines
BTW: make the engines ready for search.checker:
- replace eval_xpath by eval_xpath_getindex and eval_xpath_list
- google_images: remove outer try/except block
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:46 +01:00
Markus Heiser
baec54c492
[fix] revise of the google-news engine
This revision is based on the methods developed in the revision of the google
engine (see commit 410c2f9).
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-22 18:49:45 +01:00
Alexandre Flament
a4dcfa025c
[enh] engines: add about variable
Move meta information from comments to the about variable
so that the preferences and the documentation can show this information.
2021-01-14 20:57:17 +01:00
Alexandre Flament
64cccae99e
[mod] various engines: use eval_xpath* functions and searx.exceptions.*
Engine list: ahmia, duckduckgo_images, elasticsearch, google, google_images, google_videos, youtube_api
2020-12-03 10:22:48 +01:00
Alexandre Flament
2006eb4680
[mod] move extract_text, extract_url to searx.utils
2020-10-02 18:13:56 +02:00
Markus Heiser
8162d7aff4
[fix] google engine - div classes have been renamed in HTML result
Since 1 October 2020, Google has changed the 'class' attribute of the HTML
result page.
Fix the XPath expressions and ignore <div class="g" ../> sections which do not
match the title's XPath expression.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-10-01 09:44:29 +02:00
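A minimal sketch of "ignore ``<div class="g">`` sections that do not match the title expression", using the standard library's limited XPath support on a well-formed toy page. The engine itself parses real (not well-formed) HTML with a more capable parser. The ``h3`` title selector and the sample markup are assumptions, not Google's actual class names.

```python
import xml.etree.ElementTree as ET

HTML = """<html><body>
<div class="g"><a href="https://example.org/"><h3>Example</h3></a></div>
<div class="g"><span>People also ask</span></div>
</body></html>"""


def parse_results(markup: str) -> list:
    """Collect title/url pairs, skipping result divs without a title (sketch)."""
    root = ET.fromstring(markup)
    results = []
    for div in root.iterfind(".//div[@class='g']"):
        title = div.find(".//h3")
        link = div.find(".//a[@href]")
        if title is None or link is None:
            continue  # section does not match the title selector: ignore it
        results.append({"title": "".join(title.itertext()), "url": link.get("href")})
    return results


print(parse_results(HTML))  # [{'title': 'Example', 'url': 'https://example.org/'}]
```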
Marc Abonce Seguin
ecf5899153
fetch google's search langs rather than ui langs
2020-09-22 11:37:44 +02:00
Dalf
1022228d95
Drop Python 2 (1/n): remove unicode string and url_utils
2020-09-10 10:39:04 +02:00
Adam Tauber
52eba0c721
[fix] pep8
2020-07-08 00:46:03 +02:00
Markus Heiser
410c2f903d
[fix] revise google engine
this commit is picked from #1985
2020-07-07 21:50:59 +02:00
Marc Abonce Seguin
ccaf6ca02c
[fix] update xpaths for new google results page
2019-12-07 16:37:24 -07:00
Adam Tauber
731e34299d
Merge pull request #1744 from dalf/optimizations
[mod] speed optimization
2019-12-02 13:39:58 +00:00
Emilien Devos
8f51430f5c
[fix] Force Google old UI with a new user agent
2019-11-22 23:01:41 +01:00