Additional documentation

This folder contains additional documentation, primarily aimed at configuring and setting up the application to support only a subset of the use cases.

Server setup

Requirements

Hardware

To run the full app, a sufficiently powerful server is advised. A GPU is only required if you want to run the LLM-based functionality locally. Otherwise, the relevant services should be configured to outsource such functionality to cloud services.

Our server has the following specifications:

  • CPU: 13th Gen Intel(R) Core(TM) i5-13500
  • GPU: NVIDIA RTX 4000 SFF Ada Generation
  • Memory: 64GB
  • Storage: 2TB

Software

This application is a semantic.works app and therefore has limited dependencies. The following software is required to run the application:

  • git to obtain the application source code
  • docker and docker compose to configure and run the application's microservices
  • A reverse proxy that forwards HTTP requests to the app's identifier service. We typically use app-letsencrypt for this purpose.
  • The mu-cli tool can be used to simplify some tasks, e.g. generating migration files.

Updating the app

Generally, updating (parts of) the app consists of pulling the latest version from the remote repository via git pull and recreating and/or restarting the appropriate services. For each service A that was added or updated (version bump or changed environment variables), run docker compose up [-d] A. For each service B whose configuration was updated in the ../config/B folder, run docker compose restart B. Note that up on its own does not cause a service to reload its configuration.
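The update steps above can be sketched as a small shell script. This is only a sketch: the UPDATED_SERVICES and RECONFIGURED_SERVICES variables, the update_app function and the DRY_RUN switch are assumptions made for illustration and are not part of the repository. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the update procedure. Hypothetical variables:
#   UPDATED_SERVICES       services that were added or updated (space-separated)
#   RECONFIGURED_SERVICES  services whose files in the config folder changed
#   DRY_RUN                1 (default) prints the commands, 0 executes them
DRY_RUN="${DRY_RUN:-1}"
UPDATED_SERVICES="${UPDATED_SERVICES:-}"
RECONFIGURED_SERVICES="${RECONFIGURED_SERVICES:-}"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

update_app() {
  run git pull
  # Recreate services that were added or updated.
  for service in $UPDATED_SERVICES; do
    run docker compose up -d "$service"
  done
  # Restart services so they pick up their changed configuration.
  for service in $RECONFIGURED_SERVICES; do
    run docker compose restart "$service"
  done
}

update_app
```

Saved as, say, update.sh, it could be called as DRY_RUN=0 UPDATED_SERVICES="identifier" RECONFIGURED_SERVICES="dispatcher" sh update.sh once the printed commands look right.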

Service configuration

Most of the services in this app are configured via the docker compose configuration files and the appropriate configuration files in the config folder of this project. Note that the gitbook page on UC0.0 contains background on the overall architecture of a semantic.works application.

To simplify configuring the appropriate services we provide partner-specific configurations.

Identifier

The identifier service is an HTTP proxy that acts as the access point to the app. All external requests should be forwarded to this service for further processing in the app. On servers we typically use app-letsencrypt as a reverse proxy to forward incoming requests to the correct app instance. To allow app-letsencrypt to forward requests to the correct app, the app's identifier service should

  • expose the appropriate environment variables; and
  • be part of app-letsencrypt's default network.

This is most easily done in the app's docker-compose.override.yml configuration file. For example, the DECIDe app instance hosted by ABB has the following configuration entries:

services:
  identifier:
    environment:
      VIRTUAL_HOST: "ds.decide.lblod.info,dashboard.decide.lblod.info,yasgui.decide.lblod.info,human-validator.decide.lblod.info"
      LETSENCRYPT_HOST: "ds.decide.lblod.info,dashboard.decide.lblod.info,yasgui.decide.lblod.info,human-validator.decide.lblod.info"
      LETSENCRYPT_EMAIL: "support+servers@redpencil.io"
  # Configuration for other services
  # ...

networks:
  proxy:
    name: letsencrypt_default
    external: true

The example docker compose override files in this folder contain commented template entries that can be used for your app.

Subdomains used for different frontends

This app contains several frontends to which the dispatcher service forwards requests based on subdomains. This can be seen in the dispatcher service configuration, in rules that use reverse_host to match incoming requests. Should you use different subdomains in your app instance, make sure to update the appropriate rules in your app's dispatcher configuration.

| Frontend | Subdomain |
| --- | --- |
| Pipeline dashboard | dashboard |
| Yasgui | yasgui |
| dcat | ds |
| Human Validation Tool | human-validator |
| Smart search | smart-search |
| Policy impact report | policy-impact-report |

Outsource LLM to the cloud

By default, the AI services that rely on LLMs use local models. They can also be configured to outsource these computations to external providers in the cloud. This requires at least that you

  • obtain the appropriate API keys (or other access tokens) from the providers; and
  • configure the necessary services with these via their environment variables.

The READMEs of the individual services describe the necessary configuration in more detail:

  • The named-entity-recognition (NER) service allows configuring providers for several of its features.
  • The entity-linking-backend service README documents how to configure external providers.
  • The codelist-labeling service can be configured to use Mistral as an external provider. Using another external provider requires adding the appropriate langchain-* package to the service by editing its requirements.txt file and building your own image.
  • The Question-answering service can be configured to use different providers. This does require adding the appropriate langchain-* package to the service by editing its requirements.txt file and building your own image.
  • The Embedding service currently does not support using an external provider. Embeddings can be generated locally without a GPU, but this will take considerably longer.

LDES services (DCAT Federation)

The LDES services that produce the data for an app's LDES feed have to be configured with the proper base URL. The intended value is the base dataspace URL of your application instance with /ldes/ as suffix. For example, ABB's application instance has https://ds.decide.lblod.info/ as URL, so the base URL for the two LDES services becomes https://ds.decide.lblod.info/ldes/.

Note that the /ldes/ suffix is important, as the dispatcher relies on it to forward requests to the ldes-serve-feed service. If you use a base URL that does not end in /ldes/, you should also update the corresponding dispatcher rule accordingly.

See the configuration for the ldes-delta-pusher and ldes-serve-feed services in your override file.

Login for pipeline dashboard

Using the pipeline dashboard requires that you create the appropriate user accounts. The overall README in this repository describes how to do this in its "Account management for the pipeline dashboard" section.

As an optional, additional layer of security, it is possible to configure an application-wide salt for all passwords in an app instance, also called a pepper. Enabling this requires some additional configuration:

  • You have to set the MU_APPLICATION_SALT environment variable for the login service to an appropriate value:
# In docker-compose.override.yml
services:
  login:
    environment:
      MU_APPLICATION_SALT: "REPLACE-WITH-A-LONG-SECURE-RANDOM-NUMBER"
  • When generating an account migration using the generate-account script you also have to provide the correct application salt via the --salt argument.

Note that this application salt should not be confused with the regular salts that are generated and appended to each password before hashing it. The latter are already taken care of by the generate-account script.
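One way to obtain a suitable value for MU_APPLICATION_SALT is to draw it from the system's random source. The following is a minimal sketch, assuming that a hex-encoded string of 32 random bytes is acceptable to the login service (this document does not state any format requirement). The generate_pepper function name is hypothetical.

```shell
#!/bin/sh
# Sketch: print a random application salt (pepper) as 64 hexadecimal characters.
generate_pepper() {
  # Read 32 bytes from the kernel's random source, dump them as hex bytes,
  # and strip the spaces and newlines that od adds between them.
  head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
  echo
}

generate_pepper
```

The same value has to be used both for the login service and for the --salt argument of the generate-account script, so store it somewhere safe rather than regenerating it.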

Partner configurations

This folder also contains some pre-configured docker compose configurations that disable services that are unnecessary for the use cases specific partners are interested in. The easiest way to include these configurations is to list them in a COMPOSE_FILE entry added as the last entry in your .env file:

COMPOSE_FILE=docker-compose.yml:./docs/docker-compose.override.NAME.yml:docker-compose.override.yml

Note: take care not to include the docker-compose.dev.yml file here, as this can expose services to the outside world.
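The COMPOSE_FILE entry and the note about docker-compose.dev.yml can be checked with a small helper before starting the app. This is a sketch only: the check_env_file function is hypothetical and simply inspects the last COMPOSE_FILE line of a given .env file.

```shell
#!/bin/sh
# Sketch: verify that the .env file has a COMPOSE_FILE entry and that it does
# not include docker-compose.dev.yml. Returns 0 when the entry looks fine.
check_env_file() {
  env_file="${1:-.env}"
  # Inspect the last COMPOSE_FILE line of the file.
  line="$(grep '^COMPOSE_FILE=' "$env_file" 2>/dev/null | tail -n 1)"
  case "$line" in
    "")
      echo "no COMPOSE_FILE entry found in $env_file"
      return 1
      ;;
    *docker-compose.dev.yml*)
      echo "warning: docker-compose.dev.yml is included and may expose services"
      return 1
      ;;
    *)
      echo "ok: $line"
      return 0
      ;;
  esac
}
```

For example, check_env_file .env run from the project root reports the active entry. Running docker compose config --services afterwards lists the services that remain enabled with that combination of files.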

Bamberg

The city of Bamberg is mostly interested in use cases 0.1 and 2. Therefore, their partner-specific configuration disables unnecessary services and provides some placeholders for configuring specific services. See the comments in the override file for more information.

Data harvesting

Due to technical limitations our pdf-scraper service cannot directly retrieve PDFs from the web portal of the city of Bamberg. A workaround is to obtain the PDFs via another method and feed them into the app from disk using an additional service.

To this end, an internal-files service is configured in docker-compose.override.bamberg.yml. This service mounts the folder data/internal-files, in which PDFs can be placed. Make sure to create this folder.
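Creating the folder and placing PDFs in it could look as follows. This is a sketch under assumptions: the stage_pdfs function and its source-folder argument are hypothetical, and it assumes the command is run from the project root so that data/internal-files is the folder mounted by the internal-files service. For each copied file it prints the URL to enter in the pipeline dashboard.

```shell
#!/bin/sh
# Sketch: copy PDFs from a source folder into the folder served by the
# internal-files service and print the matching input URLs.
stage_pdfs() {
  src_dir="$1"
  target_dir="${2:-data/internal-files}"
  mkdir -p "$target_dir"
  for pdf in "$src_dir"/*.pdf; do
    [ -e "$pdf" ] || continue   # the pattern matched no PDF files
    cp "$pdf" "$target_dir/"
    echo "http://internal-files/$(basename "$pdf")"
  done
}
```

For example, stage_pdfs /path/to/downloaded/pdfs copies the files and lists their URLs. File names containing spaces or other special characters would need URL encoding, which this sketch does not handle.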

In the pipeline dashboard you can use http://internal-files/FILENAME.pdf as input decision URLs. As municipality select Stadt Bamberg from the options in the dropdown, as illustrated in the following screenshot.

Example form for harvesting PDFs

Freiburg

The city of Freiburg is mostly interested in use case 1. Therefore, their partner-specific configuration disables most services for other use cases and provides some placeholders for configuring relevant services. See the comments in the override file for more information.

Data harvesting

To harvest decisions from your OParl endpoint, use the pipeline dashboard to create a "Harvest OParl API & Publish as ELI" job. For example, the following screenshot illustrates the form to harvest all decisions from https://ris.freiburg.de/oparl.

Example form for harvesting from an OParl endpoint

Nominatim

The nominatim service should be configured to retrieve the OpenStreetMap (OSM) Data Extracts for Germany instead of Belgium. To do this, set the PBF_URL environment variable to the correct URL, as illustrated in docker-compose.override.freiburg.yml.

Warning

The OSM Data Extracts are downloaded and processed only when starting the service for the first time. Be sure to configure the correct PBF_URL before starting the service. If you (accidentally) started the service with an incorrect PBF_URL, you can bring down the service, remove the mounted volume, and bring the service up again with the correct configuration.

Note

Downloading and processing the Data Extract for Germany takes a long time (on the order of hours on my development machine) and uses a lot of your machine's resources. You can follow its progress via the service's logs.
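The recovery steps from the warning and the log following from the note can be sketched as below. This is a sketch under assumptions: NOMINATIM_DATA_DIR is a hypothetical placeholder, so look up the actual volume mounted by the nominatim service in your docker compose files first (for a named Docker volume, docker volume rm would be needed instead of rm). The reset_nominatim function and the DRY_RUN switch are likewise illustrative. By default the commands are only printed.

```shell
#!/bin/sh
# Sketch: reset the nominatim service after a start with a wrong PBF_URL.
#   NOMINATIM_DATA_DIR  hypothetical path of the mounted data folder
#   DRY_RUN             1 (default) prints the commands, 0 executes them
DRY_RUN="${DRY_RUN:-1}"
NOMINATIM_DATA_DIR="${NOMINATIM_DATA_DIR:-./data/nominatim}"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reset_nominatim() {
  # Stop the service and remove its container.
  run docker compose rm --stop --force nominatim
  # Remove the previously imported data; :? aborts if the path is empty.
  run rm -rf "${NOMINATIM_DATA_DIR:?}"
  # Start the service again with the corrected PBF_URL.
  run docker compose up -d nominatim
  # Follow the download and import progress (Ctrl+C stops following).
  run docker compose logs -f nominatim
}

reset_nominatim
```

Only set DRY_RUN=0 after checking that the printed rm command points at the right folder and that PBF_URL has been corrected in your override file.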

Ghent

The city of Ghent is mostly interested in use cases 0.1 and 2. Therefore, their partner-specific configuration disables most services for other use cases and provides some placeholders for configuring relevant services. See the comments in the override file for more information.

Data harvesting

Decisions for Ghent are harvested from Lokaal Beslist. The pipeline dashboard can be used to create the relevant jobs to gather data. To initially harvest all decisions, create a "Harvest Lokaal Beslist OSLO & Publish as ELI" job and select "Initial sync" as Sync mode in the form. The URL field will be automatically filled with the correct value.

Example form for initial sync

Warning

The initial sync will take some time as it syncs all data from the configured harvester.

To update your data after an initial sync has completed, again create a "Harvest Lokaal Beslist OSLO & Publish as ELI" job, but select "Delta sync" as Sync mode:

Example form for delta sync

To periodically update your data automatically, you can create a Scheduled job for a delta sync. This can be done via the "Scheduled jobs" tab of the pipeline dashboard.

Example form for scheduled delta sync