Commit Graph

59 Commits

Author SHA1 Message Date
Emilien Devos 827af00d9e Revert "[mod] activate limiter & link_token method (aka CSS ping) by default"
This reverts commit 3af629ec09.
2023-09-23 14:16:29 +02:00
Markus Heiser 3af629ec09 [mod] activate limiter & link_token method (aka CSS ping) by default
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-09-23 14:00:03 +02:00
Alexandre Flament 4573417b6c uwsgi.ini: remove unused cache2
cache2 was used before PR #1856
2023-08-13 07:55:05 +02:00
Markus Heiser 2499899554 [mod] Google: reversed engineered & upgrade to data_type: traits_v1
Partial reverse engineering of the Google engines including a improved language
and region handling based on the engine.traits_v1 data.

When ever possible the implementations of the Google engines try to make use of
the async REST APIs.  The get_lang_info() has been generalized to a
get_google_info() function / especially the region handling has been improved by
adding the cr parameter.

searx/data/engine_traits.json
  Add data type "traits_v1" generated by the fetch_traits() functions from:

  - Google (WEB),
  - Google images,
  - Google news,
  - Google scholar and
  - Google videos

  and remove data from obsolete data type "supported_languages".

  A traits.custom type that maps region codes to *supported_domains* is fetched
  from https://www.google.com/supported_domains

searx/autocomplete.py:
  Reversed engineered autocomplete from Google WEB.  Supports Google's languages and
  subdomains.  The old API suggestqueries.google.com/complete has been replaced
  by the async REST API: https://{subdomain}/complete/search?{args}

searx/engines/google.py
  Reverse engineering and extensive testing ..
  - fetch_traits():  Fetch languages & regions from Google properties.
  - always use the async REST API (formally known as 'use_mobile_ui')
  - use *supported_domains* from traits
  - improved the result list by fetching './/div[@data-content-feature]'
    and parsing the type of the various *content features* --> thumbnails are
    added

searx/engines/google_images.py
  Reverse engineering and extensive testing ..
  - fetch_traits():  Fetch languages & regions from Google properties.
  - use *supported_domains* from traits
  - if exists, freshness_date is added to the result
  - issue 1864: result list has been improved a lot (due to the new cr parameter)

searx/engines/google_news.py
  Reverse engineering and extensive testing ..
  - fetch_traits():  Fetch languages & regions from Google properties.
    *supported_domains* is not needed but a ceid list has been added.
  - different region handling compared to Google WEB
  - fixed for various languages & regions (due to the new ceid parameter) /
    avoid CONSENT page
  - Google News do no longer support time range
  - result list has been fixed: XPath of pub_date and pub_origin

searx/engines/google_videos.py
  - fetch_traits():  Fetch languages & regions from Google properties.
  - use *supported_domains* from traits
  - add paging support
  - implement a async request ('asearch': 'arc' & 'async':
    'use_ac:true,_fmt:html')
  - simplified code (thanks to '_fmt:html' request)
  - issue 1359: fixed xpath of video length data

searx/engines/google_scholar.py
  - fetch_traits():  Fetch languages & regions from Google properties.
  - use *supported_domains* from traits
  - request(): include patents & citations
  - response(): fixed CAPTCHA detection (Scholar has its own CATCHA manager)
  - hardening XPath to iterate over results
  - fixed XPath of pub_type (has been change from gs_ct1 to gs_cgt2 class)
  - issue 1769 fixed: new request implementation is no longer incompatible

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser 5820dc78ce [doc] slight improvements to the doc of the settings (base_url)
Closes: https://github.com/searxng/searxng/issues/2190

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-02-17 12:08:58 +01:00
Markus Heiser ba8959ad7c [fix] typos / reported by @kianmeng in searx PR-3366
[PR-3366] https://github.com/searx/searx/pull/3366

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-27 18:32:14 +02:00
Léon Tiekötter 2b94fef7ec [fix] uWSGI: increase buffer-size
Increase max size of a request, by default it is 4k [1].  4096 as buffer-size is
too small and will result in the preference urls not working.

[1] https://uwsgi-docs.readthedocs.io/en/latest/Options.html#buffer-size
2022-07-31 12:40:06 +02:00
Markus Heiser 692708aa77 [clean up] drop obsolete searx, filtron and morty install scripts
Since ./utils/searxng.sh is implemented, the old installation procedures from
filtron, morty and searx can be removed.

For users who want to upgrade, the procedures for removing old installations
have still been retained.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-07-30 13:39:35 +02:00
Markus Heiser 782f73540e [utils/searxng.sh] implement new script to install SearXNG
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-07-30 13:39:35 +02:00
Markus Heiser 81bba44869 [install scripts] rename SEARX_<name> variables to SEARXNG_<name>
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-07-30 13:39:35 +02:00
Markus Heiser 8d69ee5e7f [mod] Serving static files with uWSGI (searxng.ini)
1. Serving static files with uWSGI by using static file mount points [1].
2. Expires set to one year since there are hashes [2]

[1] https://uwsgi-docs.readthedocs.io/en/latest/StaticFiles.html#mode-3-using-static-file-mount-points
[2] https://github.com/searxng/searxng/pull/932

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-03-08 18:13:13 +01:00
Markus Heiser 5eedd5b72a [fix] socket in SearXNG's uWSGI app (searxng.ini)
Use SEARX_UWSGI_SOCKET in uWSGI systemd service

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-03-08 17:44:27 +01:00
Alexandre Flament 76cbfbbdda reference docs.searxng.org 2022-01-02 21:18:29 +01:00
Markus Heiser 1dae0c0be0 [brand] SearXNG - docs rename links and fix documentation
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-11-18 18:27:26 +01:00
Markus Heiser 2b1252148d [brand] SearXNG - nginx & apache searxng.conf, uwsgi searxng.conf
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-30 16:11:01 +02:00
Markus Heiser a9fc4885f2 [brand] SearXNG - bash env SEARXNG_URL
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-30 16:11:01 +02:00
Markus Heiser 60edf2623d [brand] SearXNG - reference /etc/searxng/settings.yml
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-11 12:28:55 +00:00
Alexandre Flament 253b850376 SearXNG: SEARXNG_SETTINGS_PATH 2021-10-02 17:18:05 +02:00
Markus Heiser b6a55e223c [mod] reduce enviroment variables in shell scripts to what is needed
Not all settings from the 'brand:' section of the YAML files are needed in the
shell scripts.  This patch reduce the variables in ./utils/brand.env to what is
needed.  The following ('brand:' settings) can be removed from this file:

- ISSUE_URL
- DOCS_URL
- PUBLIC_INSTANCES
- WIKI_URL

Tasks running outside of an *installed instance*, need the following settings
from the YAML configuration:

- GIT_URL            <--> brand.git_url
- GIT_BRANCH         <--> brand.git_branch
- SEARX_URL          <--> server.base_url  (aka PUBLIC_URL)
- SEARX_PORT         <--> server.port
- SEARX_BIND_ADDRESS <--> server.bind_address

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-07-24 16:28:15 +02:00
Markus Heiser f61c918dd4 [mod] normalize .config.sh with settings.yml
In commit 94851790 we have centralized all SearXNG setups in the settings.yml
file:

  94851790 [mod] move brand options from Makefile to settings.yml

This step has not yet been completed for the installation procedures!  Since all
SearXNG setups are done in the settings.yml these environment variables needs to
be removed from the ./conf.sh file.  Scripts and other tasks running outside of
an instance got the needed values from the ./utils/brand.env file.

By example: ATM the environment variables of the ./config.sh file are in
conflict with them from settings.yml:

  - PUBLIC_URL          --> {server:base_url}
  - SEARX_INTERNAL_HTTP --> {server:bind_address}.{server:port}
  - GIT_BRANCH          --> {brand:GIT_URL}

These environment variable of a SearXNG instance and additional

  - SEARX_SETTINGS_TEMPLATE

has been remove from the '.config.sh' file.  With this patch, the main focus of
./conf.sh resists on environment variables needed for the installation of morty,
filtron software.

modified  .config.sh:
  - removed no longer supported variables (see above)
  - add comment about: SearXNG setup in settings.yml

modified utils/searx.sh:
  - SEARX_INTERNAL_HTTP no longer take from .config.sh
  - SEARX_SETTINGS_PATH /etc/searx/settings.yml
  - SEARX_SETTINGS_TEMPLATE obsolete

modified utils/lib_install.sh:
  Initialize environment variables SEARX_PYENV, SEARX_SETTINGS_PATH and
  PUBLIC_URL.

modified:  utils/morty.sh
  Add missing hint about SEARX_SETTINGS_PATH and move PUBLIC_URL to
  utils/lib_install.sh

modified:  utils/morty.sh
  Move PUBLIC_URL to utils/lib_install.sh

Renamed utils/templates/etc/searx/use_default_settings.yml -> settings.yml
  - removed option which can't be modified after installation
  - add some comments with examples

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-07-21 14:38:57 +02:00
Markus Heiser 3e50e8de3e [mod] drop usage of the searx.brand namespace (python procs)
Added function searx.get_setting(name, default=_unset):
  Returns the value to which ``name`` point.  If there is no such name in the
  settings and the ``default`` is unset, a KeyError exception is raised.

In all the python processes ..

- make docs
- make buildenv
- make install (setup.py)

the usage of the 'brand.*' name space is replaced by 'searx.get_setting'
function.

- brand.SEARX_URL        --> get_setting('server.base_url')
- brand.GIT_URL          --> get_setting('brand.git_url')
- brand.GIT_BRANCH'      --> get_setting('server.base_url')
- brand.ISSUE_URL        --> get_setting('brand.issue_url')
- brand.DOCS_URL         --> get_setting('brand.docs_url')
- brand.PUBLIC_INSTANCES --> get_setting('brand.public_instances')
- brand.CONTACT_URL      --> get_setting('general.contact_url', '')
- brand.WIKI_URL         --> get_setting('brand.wiki_url')
- brand.TWITTER_URL      --> get_setting('brand.twitter_url', '')

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-07-18 15:55:42 +02:00
Markus Heiser 4a814dabf3 [yamllint] ./utils/templates/etc/searx/*.yml
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-12 20:46:07 +02:00
Markus Heiser 6ed4616da9 [enh] add settings option to enable/disable search formats
Access to formats can be denied by settings configuration::

    search:
        formats: [html, csv, json, rss]

Closes: https://github.com/searxng/searxng/issues/95
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-28 08:32:52 +02:00
Alex Balgavy 8736f5bd70 Use $host in nginx morty.conf template 2021-03-04 11:16:27 +01:00
Alex Balgavy 6b59800dc6 Fix security vulnerabilities in suggested nginx configuration
The suggested configurations for nginx found in the documentation and
templates lead to vulnerabilities allowing host spoofing [1] and path
traversal [2], as reported by Gixy [3]. This commit fixes those issues.

[1] https://github.com/yandex/gixy/blob/master/docs/en/plugins/hostspoofing.md
[2] https://github.com/yandex/gixy/blob/master/docs/en/plugins/aliastraversal.md
[3] https://github.com/yandex/gixy
2021-03-03 12:34:22 +01:00
Alexandre Flament 6e2872f436 [enh] add searx.shared
shared dictionary between the workers (UWSGI or werkzeug)
scheduler: run a task once every x seconds (UWSGI or werkzeug)
2021-01-12 11:47:17 +01:00
Markus Heiser a70b9b9f61 [doc] recommend to use 'use_default_settings=True'
Since #2291 is merged, it is recommend to use::

  use_default_settings=True

1. Add a template file use_default_settings.yml::

    SEARX_SETTINGS_TEMPLATE="${REPO_ROOT}/utils/templates/etc/searx/use_default_settings.yml"

2. In Chapter "Configuration" recommend to make use of
   'use_default_settings=True' and describe it

3. Rewrite of docs/admin/settings.rst
   - move chapter 'settings.yml location' to the top
   - update and split chapter 'Global Settings'

4. Add environment SEARX_SETTINGS_TEMPLATE to .config.sh

5. Use environment $SEARX_SETTINGS_TEMPLATE in the utils/searx.sh script

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-12-08 20:19:10 +01:00
renyhp b386a815da
Fix typo chmod searx:searx > chown searx:searx 2020-10-19 17:31:02 +02:00
Markus Heiser f9faafa896
[fix] external_bang - UnicodeDecodeError: 'ascii' codec can't decode (#2043)
Python's default encoding depends on the platform, set (python) default encoding
UTF-8 in uwsgi ini files:

    LANG=C.UTF-8
    LANGUAGE=C.UTF-8
    LC_ALL=C.UTF-8

Error pattern:

    Traceback (most recent call last):
      File "/usr/local/searx/searx-src/searx/webapp.py", line 74, in <module>
        from searx.search import SearchWithPlugins, get_search_query_from_webapp
      File "/usr/local/searx/searx-src/searx/search.py", line 32, in <module>
        from searx.external_bang import get_bang_url
      File "/usr/local/searx/searx-src/searx/external_bang.py", line 13, in <module>
        for bang in json.load(json_file)['bang']:
      File "/usr/lib/python3.8/json/__init__.py", line 293, in load
        return loads(fp.read(),
      File "/usr/lib/python3.8/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
      UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 31341: ordinal not in range(128)

close: https://github.com/asciimoo/searx/issues/2041

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-07-11 12:17:06 +02:00
Markus Heiser 6ff20cef73 [fix] indentation of filtron's rules (json)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-06-18 18:31:46 +02:00
Markus Heiser 58d5da8b57 nginx: normalize installation (docs and script)s over all distros
This is the revision of the documentation about the varous nginx installation
variants.  It also implements the nginx installation scripts for morty and
filtron.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-04-11 13:19:11 +02:00
Markus Heiser f693149cde Changes from the installation tests on (all) LXC containers.
Tested and fixed HTTP & uWSGI installation on:

  ubu1604 ubu1804 ubu1910 ubu2004 fedora31 archlinux

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-04-08 18:38:36 +02:00
Markus Heiser ee39a098ac apache: normalize installation (docs and script)s over all distros
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-04-07 18:31:51 +02:00
Markus Heiser eb0d4646d8 docs: rework of chapter "Install with apache"
BTW: normalize installation-nginx.rst
2020-04-06 17:59:06 +02:00
Markus Heiser c81849cb5a filtron.sh & morty.sh: improve usage message (if used in containers)
BTW: normalize soma variable names

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-04-05 17:40:37 +02:00
Markus Heiser e530e20ae6 misc: fix variuous marginals
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-04-04 17:53:16 +02:00
Markus Heiser 7b4cf2eb48 tooling box: simplify build enviroments
- no more need for a .config.mk
- docs: use searx.brands environment
- searx.sh, filtron.sh & morty.sh are sourcing utils/brand.env

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-03-29 15:09:34 +02:00
Markus Heiser c15337850e fix: minor typos
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-03-06 22:06:19 +01:00
Markus Heiser 387c6a7769 docs: improve description of uwsgi & ngingx setup
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-03-06 14:47:00 +01:00
Markus Heiser af6acd3417 LXC: install searx-suite installs searx, filtron & morty on all containers
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-02-26 19:07:55 +01:00
Markus Heiser d5917cc029 utils/lib.sh: make uWSGI installation available for all distros
support: ubuntu, debin, fedora, archlinux

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-02-25 20:20:17 +01:00
Markus Heiser 59e4026762 searx.sh: install settings at /etc/searx/settings.yml
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-02-17 18:58:59 +01:00
Markus Heiser de58f02f6b filtron: add missing log action to the filtron rules
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-02-17 15:36:10 +01:00
Markus Heiser 0d6153db12 filtron.sh: updated rules from production
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-02-11 15:57:42 +01:00
Markus Heiser ed4cb4f160 tooling box: varius fix from tests 2020-02-08 13:24:08 +01:00
Markus Heiser 71d7550dbe tooling box ./utils/*: minor fix from production test 2020-02-04 19:47:33 +01:00
Markus Heiser 2f40f61f83 /etc/filtron/rules.json: normalize rules from docs & tooling box
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-02-04 17:59:58 +01:00
Markus Heiser a4437c47ac utils/morty.sh: add script to install morty result proxy
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-02-02 18:14:10 +01:00
Markus Heiser 0bb8847087 utils/filtron.sh: add option to debug filtron requests
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-01-31 17:25:38 +01:00
Markus Heiser 91a55e159e apache: reverse proxy, set `ProxyPreserveHost On`
related discussions:

- https://github.com/asciimoo/searx/issues/1822
- https://github.com/asciimoo/searx/issues/1819#issuecomment-580400259

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-01-31 15:54:07 +01:00