2. Creating a custom SearXNG plugin in Python

Running a SearXNG instance locally

We will fork the https://github.com/searxng/searxng repository and add our custom plugins and engines. The repository has a Makefile, so let's check its contents and break them down:

Default Goal: .DEFAULT_GOAL=help sets the default target to execute when no specific target is provided. In this case, it's set to help.

Targets:

  • help: Prints out help information about available targets and their descriptions.
  • run: Depends on install. Executes ./manage webapp.run.
  • install, uninstall: Execute ./manage pyenv.$@, where $@ is replaced with either install or uninstall.
  • clean: Cleans up the working tree by calling other clean targets and removing various temporary files.
  • test, ci.test, test.shell: Run various tests for the project.
  • Other targets like docs, docker, and themes are shorthand for specific actions related to documentation, Docker, and themes respectively.

To run the project locally:

make run

There is also a convenient .vscode folder with the configuration required for debugging in VS Code.

Basic structure of a plugin

According to the plugin documentation, a plugin can implement three callbacks that hook into the handling of a search request in the Flask application: pre_search, on_result and post_search.

Pre-search

The pre_search function is a callback that runs before a search request is executed. It takes two parameters:

  1. request (of type flask.request): Represents the Flask request object, which contains information about the HTTP request being made.
  2. search (of type searx.search.SearchWithPlugins): Represents the search that is about to be executed.

The function is expected to return a boolean value:

  • True indicates that the search should continue.
  • False indicates that the search should be stopped.

The function can also modify the result_container attribute of the search object, which is the container holding the search results.

def pre_search(request, search):
    # Example condition: only continue for non-empty queries
    # (assumes the query text is exposed as search.search_query.query)
    if search.search_query.query.strip():
        # search.result_container can be modified here if needed
        # Continue the search
        return True
    # Stop the search
    return False
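To make the return value concrete, here is a minimal sketch of a pre_search hook that stops the search when the query contains a blocked word. The BLOCKED_TERMS set and the make_search helper are hypothetical and exist only for this illustration, and the sketch assumes the query text is reachable as search.search_query.query. The stand-in object lets us try the hook without a running instance.

```python
from types import SimpleNamespace

# Hypothetical blocklist, used only for this illustration.
BLOCKED_TERMS = {"forbidden", "blocked"}


def pre_search(request, search):
    """Return False to stop the search when the query contains a blocked term."""
    # Assumption: the query text is available as search.search_query.query.
    words = search.search_query.query.lower().split()
    return not any(term in words for term in BLOCKED_TERMS)


def make_search(query):
    """Build a minimal stand-in for the search object (not the real SearchWithPlugins)."""
    return SimpleNamespace(search_query=SimpleNamespace(query=query))


print(pre_search(None, make_search("python tutorials")))   # True
print(pre_search(None, make_search("a Forbidden topic")))  # False
```

The request argument is unused here, so passing None is enough for a quick check.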

On-result

on_result is a callback that runs for each individual result returned by a search engine during a search. It lets you modify or filter results before they are presented to the user. It takes three parameters:

  • request (of type flask.request): The Flask request object, providing information about the HTTP request.
  • search (of type searx.search.SearchWithPlugins): The search object for the current request.
  • result (of type Dict): An individual search result, structured as a dictionary with fields such as its URL, title and description. If result["url"] is defined, result["parsed_url"] must be kept in sync with it using urlparse(result['url']).

The function returns a boolean value:

  • True indicates that the result should be kept and included in the final set of results.
  • False indicates that the result should be removed and not included in the final set of results.

from urllib.parse import urlparse

def on_result(request, search, result):
    # Example condition: drop results that have no title
    if not result.get('title'):
        # Remove the result
        return False
    # Keep parsed_url in sync if url is defined
    if 'url' in result:
        result['parsed_url'] = urlparse(result['url'])
    # Keep the result
    return True
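As an illustration of the parsed_url rule, the following sketch rewrites the URL of each result and then refreshes parsed_url. The rule itself (dropping query parameters that start with "utm_") and the TRACKING_PREFIX name are assumptions chosen for the example, not something the plugin API prescribes.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

# Hypothetical rule for this illustration: drop "utm_*" tracking parameters.
TRACKING_PREFIX = "utm_"


def on_result(request, search, result):
    """Strip tracking parameters from the result URL and keep the result."""
    if "url" not in result:
        return True  # nothing to rewrite; keep the result as it is

    parts = urlparse(result["url"])
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(TRACKING_PREFIX)
    ]
    cleaned = parts._replace(query=urlencode(kept))

    result["url"] = urlunparse(cleaned)
    # Keep parsed_url in sync with the rewritten url.
    result["parsed_url"] = urlparse(result["url"])
    return True


sample = {"url": "https://example.com/page?id=7&utm_source=feed", "title": "Example"}
on_result(None, None, sample)
print(sample["url"])  # https://example.com/page?id=7
```

Because the hook only touches the dictionary it is given, it can be tried with a plain dict and None for the other two arguments.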

Post-search

post_search is a callback that runs after a search request has been executed. It is meant for follow-up actions or cleanup once the search is complete, and it returns nothing.

import logging

logger = logging.getLogger(__name__)

def post_search(request, search):
    # Perform post-search operations here
    # This could include logging, cleanup, or any other tasks

    # Example: Logging the end of the search operation
    logger.info('Search request completed successfully.')

    # No return statement needed since the function returns None
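To see how the three callbacks fit together, here is a simplified sketch of the order in which they run. It is not SearXNG's real dispatcher: run_hooks and DemoPlugin are hypothetical names, the plugin is modelled as a class only to keep the example self-contained, and skipping post_search after a False from pre_search is an arbitrary choice of this sketch, since the text above does not cover that case.

```python
def run_hooks(plugin, request, search, raw_results):
    """Simplified illustration of the hook order; not SearXNG's real dispatcher."""
    # 1. pre_search may stop the whole search.
    if not plugin.pre_search(request, search):
        return []
    # 2. on_result is asked about every result; False drops it.
    kept = [r for r in raw_results if plugin.on_result(request, search, r)]
    # 3. post_search runs once at the end and returns nothing.
    plugin.post_search(request, search)
    return kept


class DemoPlugin:
    """Hypothetical plugin that records which hooks were called."""

    def __init__(self):
        self.calls = []

    def pre_search(self, request, search):
        self.calls.append("pre_search")
        return True

    def on_result(self, request, search, result):
        self.calls.append("on_result")
        return "url" in result

    def post_search(self, request, search):
        self.calls.append("post_search")


plugin = DemoPlugin()
results = run_hooks(
    plugin, None, None, [{"url": "https://example.com"}, {"title": "no url"}]
)
print(results)       # [{'url': 'https://example.com'}]
print(plugin.calls)  # ['pre_search', 'on_result', 'on_result', 'post_search']
```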

Plugin for filtering Wikipedia results

from flask_babel import gettext

name = gettext("No Wikipedia")
'''Translated name of the plugin'''

plugin_id = 'no_wikipedia'

description = gettext(
    "This plugin removes all results from wikipedia"
)
'''Translated description of the plugin.'''

preference_section = 'query'
'''The preference section where the plugin is shown.'''

query_keywords = ['no-wikipedia']
'''Query keywords shown in the preferences.'''

def on_result(request, search, result):
    # Keep the result unless it has a URL that contains "wikipedia.org"
    return "url" not in result or "wikipedia.org" not in result["url"]

To enable the plugin, add it to the settings.yml file:

plugins:
  - no_wikipedia
enabled_plugins:
  - 'No Wikipedia'

Plugin for filtering results from URLs defined in the configuration file

Let's create our own configuration parameter inside the settings.yml file.

no_media:
  - 'cnn.com'
  - 'nytimes.com'
  - 'foxnews.com'
  - 'washingtonpost.com'
  - 'nbcnews.com'

Now let's use these values to create a filter that removes results from these sites:

from flask_babel import gettext
from searx import settings

name = gettext("No Media")
'''Translated name of the plugin'''

plugin_id = 'no_media'

description = gettext(
    "This plugin removes all results from mainstream media websites"
)
'''Translated description of the plugin.'''

preference_section = 'query'
'''The preference section where the plugin is shown.'''

query_keywords = ['no-media']
'''Query keywords shown in the preferences.'''

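media_list = settings.get(plugin_id, [])
'''Domains to filter, read from the no_media key in settings.yml (empty if the key is missing).'''

def on_result(request, search, result):
    # Drop the result if its URL contains any of the configured domains
    if "url" in result:
        for media in media_list:
            if media in result["url"]:
                return False
    return True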
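One thing to keep in mind is that a plain substring test also matches a domain that appears anywhere in the URL, for example in the query string. As a possible refinement, the sketch below compares the hostname instead. The is_blocked helper is a hypothetical name introduced for this example, and media_list is hard-coded here as a stand-in for the values read from settings.yml.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the list read from settings.yml.
media_list = ["cnn.com", "nytimes.com"]


def is_blocked(url, domains):
    """Return True if the URL's host is one of the domains or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in domains)


def on_result(request, search, result):
    """Drop results hosted on a listed domain; keep everything else."""
    return "url" not in result or not is_blocked(result["url"], media_list)


print(on_result(None, None, {"url": "https://edition.cnn.com/world"}))   # False
print(on_result(None, None, {"url": "https://example.org/?q=cnn.com"}))  # True
print(on_result(None, None, {"url": "https://notcnn.com/"}))             # True
```

The leading dot in the endswith check is what keeps a host such as notcnn.com from matching cnn.com.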

Add the plugin to settings.yml:

plugins:
  - no_media
enabled_plugins:
  - 'No Media'

We can now see the plugins and enable them in the preferences section.

[Screenshot: the plugins listed in the preferences section]