Commit Graph

45 Commits

Author SHA1 Message Date
Alexandre Flament cdd9dab570
Merge f49d1a9b90 into 0f9694c90b 2024-11-24 02:01:36 +00:00
Leo Liu b173f3a8b9 Fix scheduler.lua 2024-11-10 15:53:58 +01:00
Markus Heiser 542f7d0d7b [mod] pylint all files with one profile / drop PYLINT_SEARXNG_DISABLE_OPTION
In the past, some files were tested with the standard profile, others with a
profile in which most of the messages were switched off ... some files were not
checked at all.

- ``PYLINT_SEARXNG_DISABLE_OPTION`` has been abolished
- the distinction ``# lint: pylint`` is no longer necessary
- the pylint tasks have been reduced from three to two

  1. ./searx/engines -> lint engines with additional builtins
  2. ./searx ./searxng_extra ./tests -> lint all other python files

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-03-11 14:55:38 +01:00
Alexandre FLAMENT f49d1a9b90 Checker: change image check
In the master branch, the checker starts to stream the response
and cut the connection. However it creates a lot of read error,
which are false negative. I don't know how to fix the issue.

This commit change the checker to download the whole image.
The error reporting is also changed to report only one line,
instead of the whole stacktrace.

Also, if a timeout occurs, the checker waits for one second
before retry.

Do note that I have not test the checker running in background.
This feature seems forgotten and lack of interrest despite the
initial move few years ago.
2024-03-09 09:57:19 +00:00
Markus Heiser 50d5a9ff60 [fix] issues reported by pylint 3.1.0
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-03-09 09:28:13 +01:00
Alexandre Flament 1a66bfa66c checker: display results at the end 2024-03-03 11:18:43 +01:00
Alexandre Flament 38fdd2288a
Drop typing-extensions dependency (#3265)
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
2024-03-02 13:10:31 +01:00
Markus Heiser 87f18b98ec [fix] SyntaxWarning: invalid escape sequence '\>'
This patch fixes issue reported by ``make test.unit``::

   searx/search/checker/impl.py:39: SyntaxWarning: invalid escape sequence '\>'
      rep = ['<' + tag + '[^\>]*>' for tag in HTML_TAGS]

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-01-15 18:27:21 +01:00
jazzzooo 223b3487c3 [fix] spelling 2023-09-18 16:20:27 +02:00
ArtikusHG 735e388cec
Merge branch 'master' into fasttext 2022-12-16 19:43:10 +00:00
ArtikusHG 1f8f8c1e91 Replace langdetect with fasttext 2022-12-16 21:07:39 +02:00
Alexandre Flament b971167ced move searx.shared.redisdb to searx.redisdb 2022-12-10 09:26:38 +01:00
Alexandre FLAMENT e92755d358 Initialize Redis in searx/webapp.py
settings.yml:
* The default URL was unix:///usr/local/searxng-redis/run/redis.sock?db=0
* The default URL is now "false"

The default URL makes the log difficult to deal with:
if the admin didn't install a Redis instance, the logs record a false error.

It worked before because SearXNG initialized the Redis connection when the limiter started.

In this commit, SearXNG initializes Redis in searx/webapp.py
so various components can use Redis without taking care of the initialization step.
2022-11-05 17:45:52 +01:00
Alexandre Flament fe419e355b The checker requires Redis
Remove the abstraction in searx.shared.SharedDict.
Implement a basic and dedicated scheduler for the checker using a Redis script.
2022-11-05 12:04:50 +01:00
Markus Heiser ba8959ad7c [fix] typos / reported by @kianmeng in searx PR-3366
[PR-3366] https://github.com/searx/searx/pull/3366

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-27 18:32:14 +02:00
Alexandre Flament 2babf59adc [fix] pyright repported errors
The errors make pyright usage useless since a new error won't be seen [1].

[1] https://github.com/searxng/searxng/pull/1569

```
  searx/compat.py:11:27 - error: Expression of type "Type[cached_property[_T@cached_property]]" cannot be assigned to declared type "Type[cached_property]"
    "Type[cached_property[_T@cached_property]]" is incompatible with "Type[cached_property]"
    Type "Type[cached_property[_T@cached_property]]" cannot be assigned to type "Type[cached_property]" (reportGeneralTypeIssues)
  searx/utils.py:69:36 - error: Expression of type "None" cannot be assigned to parameter of type "str"
    Type "None" cannot be assigned to type "str" (reportGeneralTypeIssues)
  searx/utils.py:573:85 - error: Expression of type "None" cannot be assigned to parameter of type "int"
    Type "None" cannot be assigned to type "int" (reportGeneralTypeIssues)
  searx/webapp.py:1306:22 - error: Argument of type "str" cannot be assigned to parameter "__a" of type "BytesPath" in function "join"
    Type "str" cannot be assigned to type "BytesPath"
      "str" is incompatible with "bytes"
      "str" is incompatible with protocol "PathLike[bytes]"
        "__fspath__" is not present (reportGeneralTypeIssues)
  searx/webapp.py:1306:68 - error: Argument of type "Literal['themes']" cannot be assigned to parameter "paths" of type "BytesPath" in function "join"
    Type "Literal['themes']" cannot be assigned to type "BytesPath"
      "Literal['themes']" is incompatible with "bytes"
      "Literal['themes']" is incompatible with protocol "PathLike[bytes]"
        "__fspath__" is not present (reportGeneralTypeIssues)
  searx/webapp.py:1306:78 - error: Argument of type "str | Any | None" cannot be assigned to parameter "paths" of type "BytesPath" in function "join"
    Type "str | Any | None" cannot be assigned to type "BytesPath"
      Type "str" cannot be assigned to type "BytesPath"
        "str" is incompatible with "bytes"
        "str" is incompatible with protocol "PathLike[bytes]"
          "__fspath__" is not present (reportGeneralTypeIssues)
  searx/webapp.py:1306:85 - error: Argument of type "Literal['img']" cannot be assigned to parameter "paths" of type "BytesPath" in function "join"
    Type "Literal['img']" cannot be assigned to type "BytesPath"
      "Literal['img']" is incompatible with "bytes"
      "Literal['img']" is incompatible with protocol "PathLike[bytes]"
        "__fspath__" is not present (reportGeneralTypeIssues)
  searx/engines/mongodb.py:8:6 - warning: Import "pymongo" could not be resolved (reportMissingImports)
  searx/engines/mysql_server.py:9:8 - warning: Import "mysql.connector" could not be resolved (reportMissingImports)
  searx/engines/postgresql.py:9:8 - warning: Import "psycopg2" could not be resolved from source (reportMissingModuleSource)
  searx/engines/xpath.py:187:28 - warning: "categories" is not defined (reportUndefinedVariable)
  searx/search/__init__.py:184:82 - warning: "flask" is not defined (reportUndefinedVariable)
  searx/search/checker/background.py:19:26 - error: Type of "schedule" is partially unknown
    Type of "schedule" is "(delay: Any, func: Any, *args: Any) -> Literal[True]" (reportUnknownVariableType)
  searx/shared/__init__.py:8:12 - warning: Import "uwsgi" could not be resolved (reportMissingImports)
  searx/shared/shared_uwsgi.py:5:8 - warning: Import "uwsgi" could not be resolved (reportMissingImports)
```
2022-07-30 18:04:44 +02:00
Martin Fischer 640c404844 [pyright:strict] searx.search.checker.background 2022-01-27 22:07:12 +01:00
Alexandre Flament 5439dd5fb1 [fix] checker: fix image fetch
Since https://github.com/searxng/searxng/pull/354
the searx.network.stream(...) returns a tuple

This commits update the checker code according to
this function signature change.
2022-01-22 16:11:42 +01:00
Markus Heiser 3d96a9839a [format.python] initial formatting of the python code
This patch was generated by black [1]::

    make format.python

[1] https://github.com/psf/black

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-27 09:26:22 +01:00
Markus Heiser fcdc2c2cd2 [format.python] disable py code formatting for some hunks of code
Disable the python code formatting from python-black, where the readability of
code suffers by formatting.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-27 09:16:03 +01:00
Alexandre Flament 29893cf816 [fix] searx.network.stream: fix memory leak 2021-09-28 19:28:12 +02:00
Alexandre Flament 2eab89b4ca [fix] checker: fix memory usage
* download images using the "image_proxy" network (HTTP/1 instead of HTTP/2)
* don't cache data: URL (reduce memory usage)
* after each test: purge image URL cache then call garbage collector
* download only the first 64kb of images
2021-09-28 15:26:02 +02:00
Markus Heiser 2a3b9a2e26 [pylint] searx: drop no longer needed 'missing-function-docstring'
Suggested-by: @dalf https://github.com/searxng/searxng/issues/102#issuecomment-914168470
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-07 13:34:35 +02:00
Alexandre Flament 2f363858b8 [fix] searx.search.checker.get_result() always return a dict
So checker_results['status'] == 'ok' is enough to check the checker result.
See searx/webapp.py, /preferences endpoint
2021-08-16 08:29:16 +02:00
Markus Heiser 24f2376c11 [pylint] prepare for pylint v2.9.3 / fix some (new) pylint issues
Upgrade from pylint v2.8.3 to 2.9.3 raise some new issues::

  searx/search/checker/__main__.py:37:26: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
  searx/search/checker/__main__.py:38:26: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
  searx/search/processors/__init__.py:20:0: R0402: Use 'from searx import engines' instead (consider-using-from-import)
  searx/preferences.py:182:19: C0207: Use data.split('-', maxsplit=1)[0] instead (use-maxsplit-arg)
  searx/preferences.py:506:15: R1733: Unnecessary dictionary index lookup, use 'user_setting' instead (unnecessary-dict-index-lookup)
  searx/webapp.py:436:0: C0206: Consider iterating with .items() (consider-using-dict-items)
  searx/webapp.py:950:4: C0206: Consider iterating with .items() (consider-using-dict-items)

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-07-03 17:54:08 +02:00
Markus Heiser fa0d05c313 [pylint] checker/__main__.py & checker/background.py
Lint files that has been touched by [PR #58]

[PR #58] https://github.com/searxng/searxng/pull/58

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-05 16:47:02 +02:00
Alexandre Flament 8c1a65d32f [mod] multithreading only in searx.search.* packages
it prepares the new architecture change,
everything about multithreading in moved in the searx.search.* packages

previously the call to the "init" function of the engines was done in searx.engines:
* the network was not set (request not sent using the defined proxy)
* it requires to monkey patch the code to avoid HTTP requests during the tests
2021-05-05 13:12:42 +02:00
Alexandre Flament 7cfd8d900a [mod] oscar: /preferences , engines tab: report engine times
* display the median time instead of the average.
* add a "Reliability" column (sum up the metrics and the checker results).
* the "selected language", "SafeSearch", "Time range" values are displayed as "broken" when the checker tests fail.
2021-04-21 16:24:46 +02:00
Alexandre Flament 7acd7ffc02 [enh] rewrite and enhance metrics 2021-04-21 16:24:46 +02:00
Alexandre Flament aae7830d14 [mod] refactoring: processors
Report to the user suspended engines.

searx.search.processor.abstract:
* manages suspend time (per network).
* reports suspended time to the ResultContainer (method extend_container_if_suspended)
* adds the results to the ResultContainer (method extend_container)
* handles exceptions (method handle_exception)
2021-04-21 16:24:46 +02:00
Alexandre Flament d14994dc73 [httpx] replace searx.poolrequests by searx.network
settings.yml:

* outgoing.networks:
   * can contains network definition
   * propertiers: enable_http, verify, http2, max_connections, max_keepalive_connections,
     keepalive_expiry, local_addresses, support_ipv4, support_ipv6, proxies, max_redirects, retries
   * retries: 0 by default, number of times searx retries to send the HTTP request (using different IP & proxy each time)
   * local_addresses can be "192.168.0.1/24" (it supports IPv6)
   * support_ipv4 & support_ipv6: both True by default
     see https://github.com/searx/searx/pull/1034
* each engine can define a "network" section:
   * either a full network description
   * either reference an existing network

* all HTTP requests of engine use the same HTTP configuration (it was not the case before, see proxy configuration in master)
2021-04-12 17:25:56 +02:00
Alexandre Flament eaa694fb7d [enh] replace requests by httpx 2021-04-10 15:38:33 +02:00
Alexandre Flament 0b45afd4d7 [fix] checker: various bug fixes
* initialize engine_data (youtube engine)
* don't crash if an engine don't set result['url']
2021-03-25 09:37:37 +01:00
Rolf 80025c3244 Windows does not support SIGUSR1, so don't use it unconditionally. 2021-03-14 19:04:36 -03:00
Alexandre Flament 3b7b852aa8 [fix] checker: minor fix about language detection 2021-01-19 21:29:31 +01:00
Alexandre Flament aa887eb375 [mod] checker : replace pycld3 by langdetect
pycld3 requires the native library cld3
langdetect is a pure python package
2021-01-19 21:26:04 +01:00
Alexandre Flament 67a1aab0d5 [fix] /stats/checker : remove the timestamp field when the checker is disabled 2021-01-18 08:19:53 +01:00
Alexandre Flament d473407ec9 [fix] checker: fix engine statistics
Without this commit, the URL /stats/errors shows percentage above 100% after the checker has run.
2021-01-18 08:19:44 +01:00
Alexandre Flament 912c7e975c [fix] checker: don't run the checker when uwsgi is not properly configured
Before this commit, even with the scheduler disabled, the checker was running
at least once for each uwsgi worker.
2021-01-13 14:07:39 +01:00
Alexandre Flament 7f0c508598 [fix] checker: fix typo unknown instead of unknow 2021-01-12 11:47:17 +01:00
Alexandre Flament 87bafbc32b [mod] checker: add status and timestamp to the result
for each engine: replace status by success
2021-01-12 11:47:17 +01:00
Alexandre Flament 45bfab77d0 |mod] checker: improve searx-checker command line
* output is unbuffered
* verbose mode describe more precisly the errrors
2021-01-12 11:47:17 +01:00
Alexandre Flament 3a9f513521 [enh] checker: background check
See settings.yml for the options
SIGUSR1 signal starts the checker.
The result is available at /stats/checker
2021-01-12 11:47:17 +01:00
Markus Heiser 9c581466e1 [fix] do not colorize output on dumb terminals
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-12 11:47:17 +01:00
Alexandre Flament 8cbc9f2d58 [enh] add checker 2021-01-12 11:47:17 +01:00