As stated in .. and other posts, the defaults of uWSGI not suitable for a
productive environment. To give just one example, the workers run indefinitely
and the memory leaks aggregate.
- "Configuring uWSGI for Production: The defaults are all wrong" EuroPython 2019 [1]
- "Configuring uWSGI for Production Deployment" [2]
- "When Paul has tested some PR on his instance, we could clearly see a memory
leak over a week: the memory never dropped to the initial value. Same for my
instance using Docker." [3]
[1] https://av.tib.eu/media/44810
[2] https://www.bloomberg.com/company/stories/configuring-uwsgi-production-deployment/
[3] https://github.com/searxng/searxng/pull/3443#issuecomment-2094347004
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* Docker: add UWSGI_WORKERS and UWSGI_THREAD.
UWSGI_WORKERS specifies the number of process.
UWSGI_THREADS specifies the number of threads.
The Docker convention is to specify the whole configuration
through environment variables. While not done in SearXNG, these two
additional variables allows admins to skip uwsgi.ini
In additional, https://github.com/searxng/preview-environments starts Docker
without additional files through searxng-helm-chat.
Each instance consumes 1Go of RAM which is a lot especially when there are a
lot of instances / pull requests.
* [scripts] add environments UWSGI_WORKERS and UWSGI_THREADS
- UWSGI_WORKERS specifies the number of process.
- UWSGI_THREADS specifies the number of threads.
Templates for uwsgi scripts can be tested by::
UWSGI_WORKERS=8 UWSGI_THREADS=9 \
./utils/searxng.sh --cmd\
eval "echo \"$(cat utils/templates/etc/uwsgi/*/searxng.ini*)\""\
| grep "workers\|threads"
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
---------
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
Caching files on the client side for more than a day can confuse the end user
when updating static files[1].
Depending on the way of providing a SearXNG instance via HTTP, there are several
ways to optimize the access to the /static files. However, since we don't know
what optimization an admin has provided for his static files, we should have
moderate settings in the defaults that run robustly in a wide variety of
installations.
In this sense, all caches on the client side should be cleared after one day at
the latest. So far the files were cached for one year on client side; as soon
as changes are made to the static files (with the option `static_use_hash:
true`) the old static files are kept for one year on the CLient side / which can
also be evaluated as unnecessary caching.
[1] https://github.com/searxng/searxng/discussions/2821
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Partial reverse engineering of the Google engines including a improved language
and region handling based on the engine.traits_v1 data.
When ever possible the implementations of the Google engines try to make use of
the async REST APIs. The get_lang_info() has been generalized to a
get_google_info() function / especially the region handling has been improved by
adding the cr parameter.
searx/data/engine_traits.json
Add data type "traits_v1" generated by the fetch_traits() functions from:
- Google (WEB),
- Google images,
- Google news,
- Google scholar and
- Google videos
and remove data from obsolete data type "supported_languages".
A traits.custom type that maps region codes to *supported_domains* is fetched
from https://www.google.com/supported_domains
searx/autocomplete.py:
Reversed engineered autocomplete from Google WEB. Supports Google's languages and
subdomains. The old API suggestqueries.google.com/complete has been replaced
by the async REST API: https://{subdomain}/complete/search?{args}
searx/engines/google.py
Reverse engineering and extensive testing ..
- fetch_traits(): Fetch languages & regions from Google properties.
- always use the async REST API (formally known as 'use_mobile_ui')
- use *supported_domains* from traits
- improved the result list by fetching './/div[@data-content-feature]'
and parsing the type of the various *content features* --> thumbnails are
added
searx/engines/google_images.py
Reverse engineering and extensive testing ..
- fetch_traits(): Fetch languages & regions from Google properties.
- use *supported_domains* from traits
- if exists, freshness_date is added to the result
- issue 1864: result list has been improved a lot (due to the new cr parameter)
searx/engines/google_news.py
Reverse engineering and extensive testing ..
- fetch_traits(): Fetch languages & regions from Google properties.
*supported_domains* is not needed but a ceid list has been added.
- different region handling compared to Google WEB
- fixed for various languages & regions (due to the new ceid parameter) /
avoid CONSENT page
- Google News do no longer support time range
- result list has been fixed: XPath of pub_date and pub_origin
searx/engines/google_videos.py
- fetch_traits(): Fetch languages & regions from Google properties.
- use *supported_domains* from traits
- add paging support
- implement a async request ('asearch': 'arc' & 'async':
'use_ac:true,_fmt:html')
- simplified code (thanks to '_fmt:html' request)
- issue 1359: fixed xpath of video length data
searx/engines/google_scholar.py
- fetch_traits(): Fetch languages & regions from Google properties.
- use *supported_domains* from traits
- request(): include patents & citations
- response(): fixed CAPTCHA detection (Scholar has its own CATCHA manager)
- hardening XPath to iterate over results
- fixed XPath of pub_type (has been change from gs_ct1 to gs_cgt2 class)
- issue 1769 fixed: new request implementation is no longer incompatible
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Since ./utils/searxng.sh is implemented, the old installation procedures from
filtron, morty and searx can be removed.
For users who want to upgrade, the procedures for removing old installations
have still been retained.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Not all settings from the 'brand:' section of the YAML files are needed in the
shell scripts. This patch reduce the variables in ./utils/brand.env to what is
needed. The following ('brand:' settings) can be removed from this file:
- ISSUE_URL
- DOCS_URL
- PUBLIC_INSTANCES
- WIKI_URL
Tasks running outside of an *installed instance*, need the following settings
from the YAML configuration:
- GIT_URL <--> brand.git_url
- GIT_BRANCH <--> brand.git_branch
- SEARX_URL <--> server.base_url (aka PUBLIC_URL)
- SEARX_PORT <--> server.port
- SEARX_BIND_ADDRESS <--> server.bind_address
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In commit 94851790 we have centralized all SearXNG setups in the settings.yml
file:
94851790 [mod] move brand options from Makefile to settings.yml
This step has not yet been completed for the installation procedures! Since all
SearXNG setups are done in the settings.yml these environment variables needs to
be removed from the ./conf.sh file. Scripts and other tasks running outside of
an instance got the needed values from the ./utils/brand.env file.
By example: ATM the environment variables of the ./config.sh file are in
conflict with them from settings.yml:
- PUBLIC_URL --> {server:base_url}
- SEARX_INTERNAL_HTTP --> {server:bind_address}.{server:port}
- GIT_BRANCH --> {brand:GIT_URL}
These environment variable of a SearXNG instance and additional
- SEARX_SETTINGS_TEMPLATE
has been remove from the '.config.sh' file. With this patch, the main focus of
./conf.sh resists on environment variables needed for the installation of morty,
filtron software.
modified .config.sh:
- removed no longer supported variables (see above)
- add comment about: SearXNG setup in settings.yml
modified utils/searx.sh:
- SEARX_INTERNAL_HTTP no longer take from .config.sh
- SEARX_SETTINGS_PATH /etc/searx/settings.yml
- SEARX_SETTINGS_TEMPLATE obsolete
modified utils/lib_install.sh:
Initialize environment variables SEARX_PYENV, SEARX_SETTINGS_PATH and
PUBLIC_URL.
modified: utils/morty.sh
Add missing hint about SEARX_SETTINGS_PATH and move PUBLIC_URL to
utils/lib_install.sh
modified: utils/morty.sh
Move PUBLIC_URL to utils/lib_install.sh
Renamed utils/templates/etc/searx/use_default_settings.yml -> settings.yml
- removed option which can't be modified after installation
- add some comments with examples
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Added function searx.get_setting(name, default=_unset):
Returns the value to which ``name`` point. If there is no such name in the
settings and the ``default`` is unset, a KeyError exception is raised.
In all the python processes ..
- make docs
- make buildenv
- make install (setup.py)
the usage of the 'brand.*' name space is replaced by 'searx.get_setting'
function.
- brand.SEARX_URL --> get_setting('server.base_url')
- brand.GIT_URL --> get_setting('brand.git_url')
- brand.GIT_BRANCH' --> get_setting('server.base_url')
- brand.ISSUE_URL --> get_setting('brand.issue_url')
- brand.DOCS_URL --> get_setting('brand.docs_url')
- brand.PUBLIC_INSTANCES --> get_setting('brand.public_instances')
- brand.CONTACT_URL --> get_setting('general.contact_url', '')
- brand.WIKI_URL --> get_setting('brand.wiki_url')
- brand.TWITTER_URL --> get_setting('brand.twitter_url', '')
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Access to formats can be denied by settings configuration::
search:
formats: [html, csv, json, rss]
Closes: https://github.com/searxng/searxng/issues/95
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Since #2291 is merged, it is recommend to use::
use_default_settings=True
1. Add a template file use_default_settings.yml::
SEARX_SETTINGS_TEMPLATE="${REPO_ROOT}/utils/templates/etc/searx/use_default_settings.yml"
2. In Chapter "Configuration" recommend to make use of
'use_default_settings=True' and describe it
3. Rewrite of docs/admin/settings.rst
- move chapter 'settings.yml location' to the top
- update and split chapter 'Global Settings'
4. Add environment SEARX_SETTINGS_TEMPLATE to .config.sh
5. Use environment $SEARX_SETTINGS_TEMPLATE in the utils/searx.sh script
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Python's default encoding depends on the platform, set (python) default encoding
UTF-8 in uwsgi ini files:
LANG=C.UTF-8
LANGUAGE=C.UTF-8
LC_ALL=C.UTF-8
Error pattern:
Traceback (most recent call last):
File "/usr/local/searx/searx-src/searx/webapp.py", line 74, in <module>
from searx.search import SearchWithPlugins, get_search_query_from_webapp
File "/usr/local/searx/searx-src/searx/search.py", line 32, in <module>
from searx.external_bang import get_bang_url
File "/usr/local/searx/searx-src/searx/external_bang.py", line 13, in <module>
for bang in json.load(json_file)['bang']:
File "/usr/lib/python3.8/json/__init__.py", line 293, in load
return loads(fp.read(),
File "/usr/lib/python3.8/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 31341: ordinal not in range(128)
close: https://github.com/asciimoo/searx/issues/2041
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This is the revision of the documentation about the varous nginx installation
variants. It also implements the nginx installation scripts for morty and
filtron.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- no more need for a .config.mk
- docs: use searx.brands environment
- searx.sh, filtron.sh & morty.sh are sourcing utils/brand.env
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>