Commit Graph

4772 Commits

Author SHA1 Message Date
Markus Heiser b1fefec40d [fix] normalize the language & region aspects of all google engines
BTW: make the engines ready for search.checker:

- replace eval_xpath by eval_xpath_getindex and eval_xpath_list
- google_images: remove outer try/except block

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:46 +01:00
Alexandre Flament 0f18e885bf
Merge pull request #2479 from Tobi823/master
Document workaround for using 2 languages simultaneously #1508
2021-01-27 21:29:42 +01:00
Alexandre Flament b661c3f5d4
Merge pull request #2509 from return42/fix-morty-key
[doc] improve admin-docs about result proxy (morty) configuration
2021-01-27 15:31:29 +01:00
Markus Heiser a69a8a3ed5 [doc] improve admin-docs about result proxy (morty) configuration
[1] https://github.com/searx/searx/pull/1872#issuecomment-768107138

Suggested-by @dalf [1]
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-27 09:58:06 +01:00
Markus Heiser 923b490022 [mod] add Makefile targets for search.checker.<engine_name>
To check all engines:

    make search.checker

To check a single engine, e.g. 'google news', replace the space with an underscore:

    make search.checker.google_news

To see HTTP requests and more use SEARX_DEBUG:

    make SEARX_DEBUG=1 search.checker.google_news

To filter out HTTP redirects:

    make SEARX_DEBUG=1 search.checker.google_news | grep -A1 "HTTP/1.1\" 3[0-9][0-9]"
    ...
    Engine google news                   Checking
    https://news.google.com:443 "GET /search?q=life&hl=en&lr=lang_en&ie=utf8&oe=utf8&ceid=US%3Aen&gl=US HTTP/1.1" 302 0
    https://news.google.com:443 "GET /search?q=life&hl=en-US&lr=lang_en&ie=utf8&oe=utf8&ceid=US:en&gl=US HTTP/1.1" 200 None
    --
    https://news.google.com:443 "GET /search?q=computer&hl=en&lr=lang_en&ie=utf8&oe=utf8&ceid=US%3Aen&gl=US HTTP/1.1" 302 0
    https://news.google.com:443 "GET /search?q=computer&hl=en-US&lr=lang_en&ie=utf8&oe=utf8&ceid=US:en&gl=US HTTP/1.1" 200 None
    --

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-26 11:46:36 +01:00
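
The grep filter in the commit message above can be approximated in Python. This is a minimal sketch, not part of the searx code base: the function name `redirects_with_followup` and the `sample_log` list are hypothetical, and the input is assumed to be debug lines in the format quoted above.

```python
import re

# Matches debug lines whose HTTP status is 3xx,
# the same pattern the grep command filters on.
REDIRECT_RE = re.compile(r'HTTP/1\.1" 3[0-9][0-9]')


def redirects_with_followup(lines):
    """Return (redirect_line, next_line) pairs, similar to `grep -A1`.

    next_line is None when the redirect is the last line of the log.
    """
    pairs = []
    for index, line in enumerate(lines):
        if REDIRECT_RE.search(line):
            following = lines[index + 1] if index + 1 < len(lines) else None
            pairs.append((line, following))
    return pairs


# Hypothetical sample input, copied from the log excerpt in the commit message.
sample_log = [
    'Engine google news                   Checking',
    'https://news.google.com:443 "GET /search?q=life&hl=en&lr=lang_en&ie=utf8&oe=utf8&ceid=US%3Aen&gl=US HTTP/1.1" 302 0',
    'https://news.google.com:443 "GET /search?q=life&hl=en-US&lr=lang_en&ie=utf8&oe=utf8&ceid=US:en&gl=US HTTP/1.1" 200 None',
]

for redirect, followup in redirects_with_followup(sample_log):
    print(redirect)
    print(followup)
```

Unlike `grep -A1`, this sketch does not merge overlapping matches; each redirect line is reported with its own follow-up line.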
Alexandre Flament 6047087aac [mod] utils/fetch_languages.py: write files at the right location 2021-01-24 14:25:27 +01:00
Alexandre Flament 3330cf4a46 [enh] every monday, call utils/fetch_*.py scripts and create a PR automatically 2021-01-24 13:32:39 +01:00
Markus Heiser ff6804e545 [data] make engines.languages
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-24 09:52:32 +01:00
Markus Heiser 8cdad5d85d [fix] google-videos: parse values for 'length' & 'author'
The 'video.html' template from the 'oscar' design supports replacements
for *author* and *length*.  Google-videos does not provide an author, so
the publisher info is used for the *author* instead.

Hint: these replacements are not supported by the 'simple' design.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-24 09:51:24 +01:00
Markus Heiser 89b3050b5c [fix] revise of the google-Video engine
This revision is based on the methods developed in the revision of the google engine
(see commit 410c2f9).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-24 09:39:30 +01:00
Alexandre Flament f4a17acb7a
Merge pull request #2498 from dalf/minor-fix-google-news
[fix] google_news: avoid one HTTP redirect except for the English results
2021-01-24 09:13:48 +01:00
Alexandre Flament 96c2996857
Merge pull request #2497 from return42/fix-test.sh
[fix] lxc.sh - SC2034: ubu2010_boilerplate appears unused.
2021-01-24 09:06:11 +01:00
Alexandre Flament 8c46b767d0 [fix] google_news: avoid one HTTP redirect except for the English results
also add
params['soft_max_redirects'] = 1
to avoid false error reporting in /stats/errors
2021-01-24 08:53:35 +01:00
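
The `soft_max_redirects` setting mentioned in the commit above could be applied in an engine's request hook roughly as follows. This is a sketch under stated assumptions, not the engine's actual code: it assumes the usual searx engine convention of a `request(query, params)` function that returns `params`, and `BASE_URL` and the query arguments are illustrative values.

```python
from urllib.parse import urlencode

# Hypothetical base URL for illustration; the real engine builds its URL differently.
BASE_URL = 'https://news.google.com/search'


def request(query, params):
    """Fill in the outgoing request parameters for one search query."""
    # Illustrative query arguments, modelled on the log lines quoted in this history.
    args = {'q': query, 'hl': 'en-US', 'gl': 'US'}
    params['url'] = BASE_URL + '?' + urlencode(args)

    # Allow one redirect, so that (per the commit message) it is not
    # reported as a false error in /stats/errors.
    params['soft_max_redirects'] = 1
    return params


print(request('life', {}))
```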
Markus Heiser ea5c992d4f [fix] lxc.sh - SC2034: ubu2010_boilerplate appears unused.
$ make test.sh
  In utils/lxc.sh line 42:
  ubu2010_boilerplate="$ubu1904_boilerplate"
  ^-----------------^ SC2034: ubu2010_boilerplate appears unused. Verify use (or export if used externally).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-24 08:29:13 +01:00
Alexandre Flament 7d24850d49
Merge pull request #2483 from return42/fix-google-news
[fix] revise of the google-News engine
2021-01-23 20:21:09 +01:00
Markus Heiser 5f92dfcdbe [fix] google-news: query uses locale without country tag
Without a country-region tag, Google will redirect to correct the country tag [1]:

    SEARX_DEBUG=1 searx-checker -v "google news"
    ...
    https://news.google.com:443 "GET /search?q=computer&hl=en...      HTTP/1.1" 302 0
    https://news.google.com:443 "GET /search?q=computer&hl=en-US&.... HTTP/1.1" 200 None
    ...

[1] https://github.com/searx/searx/pull/2483#issuecomment-765600849

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-23 11:37:14 +01:00
Markus Heiser baec54c492 [fix] revise of the google-news engine
This revision is based on the methods developed in the revision of the google engine
(see commit 410c2f9).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-22 18:49:45 +01:00
Markus Heiser a8544798ec [fix] remove Fabric file
The fabfile.py has not been updated for 5 years.  I also asked [1] whether anyone
still uses Fabric, without any response.  Let's drop the outdated Fabric file.

[1] https://github.com/searx/searx/discussions/2400

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-22 17:57:55 +01:00
Adam Tauber f310305c54
Merge pull request #2481 from dalf/mod-check
Mod check
2021-01-20 18:48:29 +00:00
Alexandre Flament 73c86f9bf2 [mod] checker: disable by default 2021-01-19 21:44:48 +01:00
Alexandre Flament 3b7b852aa8 [fix] checker: minor fix about language detection 2021-01-19 21:29:31 +01:00
Alexandre Flament aa887eb375 [mod] checker : replace pycld3 by langdetect
pycld3 requires the native library cld3
langdetect is a pure python package
2021-01-19 21:26:04 +01:00
Tobi823 16a0a01553 Document workaround for using 2 languages simultaneously #1508 2021-01-18 17:23:09 +01:00
Alexandre Flament 0495e15df4
Merge pull request #2476 from dalf/fix-error-recording-and-checker
Fix error recording and checker
2021-01-18 08:29:25 +01:00
Alexandre Flament 67a1aab0d5 [fix] /stats/checker : remove the timestamp field when the checker is disabled 2021-01-18 08:19:53 +01:00
Alexandre Flament d473407ec9 [fix] checker: fix engine statistics
Without this commit, the URL /stats/errors shows percentages above 100% after the checker has run.
2021-01-18 08:19:44 +01:00
Alexandre Flament ca76f3119a [fix] error_recorder: record code and lineno about the engine
since PR #2225, code and lineno were sometimes meaningless
see /stats/errors
2021-01-17 16:25:11 +01:00
Alexandre Flament 80d7411f2c
Merge pull request #2452 from kvch/add-wilby-engine
Add wiby.me engine
2021-01-16 22:36:31 +01:00
Alexandre Flament b405646749
Merge pull request #2451 from mrwormo/invidious-engine
[Fix] Invidious Engine
2021-01-16 19:25:45 +01:00
Alexandre Flament 709dd960f1
Merge pull request #2473 from return42/fix-setup.py
[fix] setup.py requires pyyaml installed
2021-01-16 19:05:36 +01:00
Alexandre Flament 1d13ad8452
Merge pull request #2460 from dalf/engine-about
[enh] engines: add about variable
2021-01-16 19:05:17 +01:00
Markus Heiser c4a98862bf [fix] setup.py requires pyyaml installed
pip install -e .
...
Obtaining file:///usr/local/searx/searx-src
    ERROR: Command errored out with exit status 1:
     command: /usr/local/searx/searx-pyenv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/usr/local/searx/searx-src/setup.py'"'"'; __file__='"'"'/usr/local/searx/searx-src/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-vzer91m2
         cwd: /usr/local/searx/searx-src/
    Complete output (9 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/usr/local/searx/searx-src/setup.py", line 10, in <module>
        from searx.version import VERSION_STRING
      File "/usr/local/searx/searx-src/searx/__init__.py", line 19, in <module>
        import searx.settings_loader
      File "/usr/local/searx/searx-src/searx/settings_loader.py", line 8, in <module>
        import yaml
    ModuleNotFoundError: No module named 'yaml'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-16 08:58:13 +01:00
Alexandre Flament a4dcfa025c [enh] engines: add about variable
move meta information from comment to the about variable
so that the preferences and the documentation can show this information
2021-01-14 20:57:17 +01:00
Alexandre Flament 5a511f0d62 [fix] CI: fix docker push 2021-01-14 20:35:10 +01:00
Alexandre Flament 824fe40a28
Merge pull request #2467 from dalf/fix-ci
[fix] github actions: use ubuntu-20.04 instead of ubuntu-latest
2021-01-14 17:14:59 +01:00
Alexandre Flament 38090daa29 [fix] github actions: use ubuntu-20.04 instead of ubuntu-latest 2021-01-14 16:49:17 +01:00
mrwormo 2dff3887f0 [fix] Invidious engine by enabling requests by randomly picking amongst working instances 2021-01-14 12:12:56 +01:00
Alexandre Flament 484dc99580
Merge pull request #2419 from dalf/checker
[enh] add checker
2021-01-13 15:46:48 +01:00
Alexandre Flament 912c7e975c [fix] checker: don't run the checker when uwsgi is not properly configured
Before this commit, even with the scheduler disabled, the checker was running
at least once for each uwsgi worker.
2021-01-13 14:07:39 +01:00
Alexandre Flament 7f0c508598 [fix] checker: fix typo unknown instead of unknow 2021-01-12 11:47:17 +01:00
Alexandre Flament a0c8b413a6 [mod] searx.shared: minor tweaks
searx.shared.shared_abstract.SharedDict inherits from abc.ABC
searx.shared.shared_uwsgi.schedule can schedule multiple functions without issue
2021-01-12 11:47:17 +01:00
Alexandre Flament 87bafbc32b [mod] checker: add status and timestamp to the result
for each engine: replace status by success
2021-01-12 11:47:17 +01:00
Alexandre Flament f3e1bd308f [mod] checker: minor adjustments on the default tests
the query "time" is convenient because most search engines will return some results,
but some engines in the general category will return documentation about the HTML tags <time> or <input type="time">
2021-01-12 11:47:17 +01:00
Alexandre Flament 45bfab77d0 [mod] checker: improve searx-checker command line
* output is unbuffered
* verbose mode describes the errors more precisely
2021-01-12 11:47:17 +01:00
Alexandre Flament 3a9f513521 [enh] checker: background check
See settings.yml for the options
SIGUSR1 signal starts the checker.
The result is available at /stats/checker
2021-01-12 11:47:17 +01:00
Alexandre Flament 6e2872f436 [enh] add searx.shared
shared dictionary between the workers (UWSGI or werkzeug)
scheduler: run a task once every x seconds (UWSGI or werkzeug)
2021-01-12 11:47:17 +01:00
Markus Heiser 9c581466e1 [fix] do not colorize output on dumb terminals
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-12 11:47:17 +01:00
Alexandre Flament ca0889d488 [enh] checker: wikidata & ddd: add specific tests 2021-01-12 11:47:17 +01:00
Alexandre Flament 16a889dd8f [enh] checker: add rosebud test 2021-01-12 11:47:17 +01:00
Alexandre Flament 8cbc9f2d58 [enh] add checker 2021-01-12 11:47:17 +01:00