Partial reverse engineering of the Google engines including a improved language
and region handling based on the engine.traits_v1 data.
When ever possible the implementations of the Google engines try to make use of
the async REST APIs. The get_lang_info() has been generalized to a
get_google_info() function / especially the region handling has been improved by
adding the cr parameter.
searx/data/engine_traits.json
Add data type "traits_v1" generated by the fetch_traits() functions from:
- Google (WEB),
- Google images,
- Google news,
- Google scholar and
- Google videos
and remove data from obsolete data type "supported_languages".
A traits.custom type that maps region codes to *supported_domains* is fetched
from https://www.google.com/supported_domains
searx/autocomplete.py:
Reversed engineered autocomplete from Google WEB. Supports Google's languages and
subdomains. The old API suggestqueries.google.com/complete has been replaced
by the async REST API: https://{subdomain}/complete/search?{args}
searx/engines/google.py
Reverse engineering and extensive testing ..
- fetch_traits(): Fetch languages & regions from Google properties.
- always use the async REST API (formally known as 'use_mobile_ui')
- use *supported_domains* from traits
- improved the result list by fetching './/div[@data-content-feature]'
and parsing the type of the various *content features* --> thumbnails are
added
searx/engines/google_images.py
Reverse engineering and extensive testing ..
- fetch_traits(): Fetch languages & regions from Google properties.
- use *supported_domains* from traits
- if exists, freshness_date is added to the result
- issue 1864: result list has been improved a lot (due to the new cr parameter)
searx/engines/google_news.py
Reverse engineering and extensive testing ..
- fetch_traits(): Fetch languages & regions from Google properties.
*supported_domains* is not needed but a ceid list has been added.
- different region handling compared to Google WEB
- fixed for various languages & regions (due to the new ceid parameter) /
avoid CONSENT page
- Google News do no longer support time range
- result list has been fixed: XPath of pub_date and pub_origin
searx/engines/google_videos.py
- fetch_traits(): Fetch languages & regions from Google properties.
- use *supported_domains* from traits
- add paging support
- implement a async request ('asearch': 'arc' & 'async':
'use_ac:true,_fmt:html')
- simplified code (thanks to '_fmt:html' request)
- issue 1359: fixed xpath of video length data
searx/engines/google_scholar.py
- fetch_traits(): Fetch languages & regions from Google properties.
- use *supported_domains* from traits
- request(): include patents & citations
- response(): fixed CAPTCHA detection (Scholar has its own CATCHA manager)
- hardening XPath to iterate over results
- fixed XPath of pub_type (has been change from gs_ct1 to gs_cgt2 class)
- issue 1769 fixed: new request implementation is no longer incompatible
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Since ./utils/searxng.sh is implemented, the old installation procedures from
filtron, morty and searx can be removed.
For users who want to upgrade, the procedures for removing old installations
have still been retained.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
A script to build & install a simple & isolated redis service, dedicated to
SearXNG and connected via Unix socket.
$ ./manage redis.help
redis.:
devpkg : install essential packages to compile redis
build : build redis binaries at /800GBPCIex4/share/SearXNG/dist/redis/6.2.6/amd64
install : create user (searxng-redis) and install systemd service (searxng-redis)
remove : delete user (searxng-redis) and remove service (searxng-redis)
shell : start bash interpreter from user searxng-redis
src : clone redis source code to <path> and checkput 6.2.6
useradd : create user (searxng-redis) at /usr/local/searxng-redis
userdel : delete user (searxng-redis)
addgrp : add <user> to group (searxng-redis)
rmgrp : remove <user> from group (searxng-redis)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Not all settings from the 'brand:' section of the YAML files are needed in the
shell scripts. This patch reduce the variables in ./utils/brand.env to what is
needed. The following ('brand:' settings) can be removed from this file:
- ISSUE_URL
- DOCS_URL
- PUBLIC_INSTANCES
- WIKI_URL
Tasks running outside of an *installed instance*, need the following settings
from the YAML configuration:
- GIT_URL <--> brand.git_url
- GIT_BRANCH <--> brand.git_branch
- SEARX_URL <--> server.base_url (aka PUBLIC_URL)
- SEARX_PORT <--> server.port
- SEARX_BIND_ADDRESS <--> server.bind_address
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In commit 94851790 we have centralized all SearXNG setups in the settings.yml
file:
94851790 [mod] move brand options from Makefile to settings.yml
This step has not yet been completed for the installation procedures! Since all
SearXNG setups are done in the settings.yml these environment variables needs to
be removed from the ./conf.sh file. Scripts and other tasks running outside of
an instance got the needed values from the ./utils/brand.env file.
By example: ATM the environment variables of the ./config.sh file are in
conflict with them from settings.yml:
- PUBLIC_URL --> {server:base_url}
- SEARX_INTERNAL_HTTP --> {server:bind_address}.{server:port}
- GIT_BRANCH --> {brand:GIT_URL}
These environment variable of a SearXNG instance and additional
- SEARX_SETTINGS_TEMPLATE
has been remove from the '.config.sh' file. With this patch, the main focus of
./conf.sh resists on environment variables needed for the installation of morty,
filtron software.
modified .config.sh:
- removed no longer supported variables (see above)
- add comment about: SearXNG setup in settings.yml
modified utils/searx.sh:
- SEARX_INTERNAL_HTTP no longer take from .config.sh
- SEARX_SETTINGS_PATH /etc/searx/settings.yml
- SEARX_SETTINGS_TEMPLATE obsolete
modified utils/lib_install.sh:
Initialize environment variables SEARX_PYENV, SEARX_SETTINGS_PATH and
PUBLIC_URL.
modified: utils/morty.sh
Add missing hint about SEARX_SETTINGS_PATH and move PUBLIC_URL to
utils/lib_install.sh
modified: utils/morty.sh
Move PUBLIC_URL to utils/lib_install.sh
Renamed utils/templates/etc/searx/use_default_settings.yml -> settings.yml
- removed option which can't be modified after installation
- add some comments with examples
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Added function searx.get_setting(name, default=_unset):
Returns the value to which ``name`` point. If there is no such name in the
settings and the ``default`` is unset, a KeyError exception is raised.
In all the python processes ..
- make docs
- make buildenv
- make install (setup.py)
the usage of the 'brand.*' name space is replaced by 'searx.get_setting'
function.
- brand.SEARX_URL --> get_setting('server.base_url')
- brand.GIT_URL --> get_setting('brand.git_url')
- brand.GIT_BRANCH' --> get_setting('server.base_url')
- brand.ISSUE_URL --> get_setting('brand.issue_url')
- brand.DOCS_URL --> get_setting('brand.docs_url')
- brand.PUBLIC_INSTANCES --> get_setting('brand.public_instances')
- brand.CONTACT_URL --> get_setting('general.contact_url', '')
- brand.WIKI_URL --> get_setting('brand.wiki_url')
- brand.TWITTER_URL --> get_setting('brand.twitter_url', '')
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Access to formats can be denied by settings configuration::
search:
formats: [html, csv, json, rss]
Closes: https://github.com/searxng/searxng/issues/95
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Since #2291 is merged, it is recommend to use::
use_default_settings=True
1. Add a template file use_default_settings.yml::
SEARX_SETTINGS_TEMPLATE="${REPO_ROOT}/utils/templates/etc/searx/use_default_settings.yml"
2. In Chapter "Configuration" recommend to make use of
'use_default_settings=True' and describe it
3. Rewrite of docs/admin/settings.rst
- move chapter 'settings.yml location' to the top
- update and split chapter 'Global Settings'
4. Add environment SEARX_SETTINGS_TEMPLATE to .config.sh
5. Use environment $SEARX_SETTINGS_TEMPLATE in the utils/searx.sh script
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Python's default encoding depends on the platform, set (python) default encoding
UTF-8 in uwsgi ini files:
LANG=C.UTF-8
LANGUAGE=C.UTF-8
LC_ALL=C.UTF-8
Error pattern:
Traceback (most recent call last):
File "/usr/local/searx/searx-src/searx/webapp.py", line 74, in <module>
from searx.search import SearchWithPlugins, get_search_query_from_webapp
File "/usr/local/searx/searx-src/searx/search.py", line 32, in <module>
from searx.external_bang import get_bang_url
File "/usr/local/searx/searx-src/searx/external_bang.py", line 13, in <module>
for bang in json.load(json_file)['bang']:
File "/usr/lib/python3.8/json/__init__.py", line 293, in load
return loads(fp.read(),
File "/usr/lib/python3.8/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 31341: ordinal not in range(128)
close: https://github.com/asciimoo/searx/issues/2041
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This is the revision of the documentation about the varous nginx installation
variants. It also implements the nginx installation scripts for morty and
filtron.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- no more need for a .config.mk
- docs: use searx.brands environment
- searx.sh, filtron.sh & morty.sh are sourcing utils/brand.env
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>