This folder contains additional documentation, primarily aimed at configuring and setting up the application to support only a subset of the use cases.
To run the full app, a sufficiently powerful server is advised. A GPU is only required if you want to run the LLM-based functionality locally. Otherwise, configure the relevant services to outsource this functionality to cloud services.
Our server has the following specifications:
- CPU: 13th Gen Intel(R) Core(TM) i5-13500
- GPU: NVIDIA RTX 4000 SFF Ada Generation
- Memory: 64GB
- Storage: 2TB
This application is a semantic.works app and thereby has limited dependencies. The following software is required to run the application:
- git to obtain the application source code
- docker and docker compose to configure and run the application's microservices
- A reverse proxy that forwards HTTP requests to the app's identifier service. We typically use app-letsencrypt for this purpose.
- The mu-cli tool can be used to simplify some tasks, e.g. generating migration files.
Generally, updating (parts of) the app consists of pulling the latest version from the remote repository via a git pull, and then recreating and/or restarting the appropriate services.
For each service A that was added or updated (version bump or changed environment variables), run docker compose up [-d] A. For each service B whose configuration in the ../config/B folder was updated, run docker compose restart B. Note that up on its own does not make a service reload its configuration.
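The update routine described above can be sketched as a small shell function. This is only an illustration: the function name update_app and the service names in the usage line are hypothetical, and it assumes you run it from the root of the app's repository.

```shell
#!/bin/sh
# update_app RECREATE RESTART   (hypothetical helper, not part of the app)
#   RECREATE: space-separated services that were added or updated
#   RESTART:  space-separated services whose files under config/ changed
update_app() {
  # Fetch the latest version of the app.
  git pull || return 1
  # Recreate services whose version or environment variables changed.
  # $1 is intentionally unquoted so it splits into separate service names.
  if [ -n "$1" ]; then docker compose up -d $1 || return 1; fi
  # Restart services that only need to re-read their configuration.
  if [ -n "$2" ]; then docker compose restart $2 || return 1; fi
}
```

For example, `update_app "login" "dispatcher"` would recreate a service named login and restart a service named dispatcher. Pass an empty string for a list that does not apply.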
Most of the services in this app are configured via the docker compose configuration files and the appropriate configuration files in the config folder in this project. Note that the gitbook page on UC0.0 contains background on the overall architecture of a semantic.works application.
To simplify configuring the appropriate services we provide partner-specific configurations.
The identifier service is an HTTP proxy that acts as the access point to the app. All external requests should be forwarded to this service for further processing in the app. On servers we typically use app-letsencrypt as a reverse proxy to forward incoming requests to the correct app instance. To allow app-letsencrypt to forward requests to the correct app, the app's identifier service should
- expose the appropriate environment variables; and
- be part of app-letsencrypt's default network.
This is most easily done in the app's docker-compose.override.yml configuration file. For example, the DECIDe app instance hosted by ABB has the following configuration entries:

    services:
      identifier:
        environment:
          VIRTUAL_HOST: "ds.decide.lblod.info,dashboard.decide.lblod.info,yasgui.decide.lblod.info,human-validator.decide.lblod.info"
          LETSENCRYPT_HOST: "ds.decide.lblod.info,dashboard.decide.lblod.info,yasgui.decide.lblod.info,human-validator.decide.lblod.info"
          LETSENCRYPT_EMAIL: "support+servers@redpencil.io"
      # Configuration for other services
      # ...
    networks:
      proxy:
        name: letsencrypt_default
        external: true

The example docker compose override files in this folder contain commented template entries that can be used for your app.
This app contains several frontends to which the dispatcher service forwards requests based on subdomains. This can be seen in the dispatcher service configuration, in the rules that use reverse_host to match incoming requests. Should you use different subdomains in your app instance, make sure to update the appropriate rules in your app's dispatcher configuration.
| Frontend | Subdomain |
|---|---|
| Pipeline dashboard | dashboard |
| Yasgui | yasgui |
| dcat | ds |
| Human Validation Tool | human-validator |
| Smart search | smart-search |
| Policy impact report | policy-impact-report |
By default, the AI services that rely on LLMs use local models, but they can also be configured to outsource such computations to external providers in the cloud. This requires at least that you
- obtain the appropriate API keys (or other access tokens) from the providers; and
- configure the necessary services with these via their environment variables.
The README of each individual service describes the necessary configuration in more detail:
- The named-entity-recognition (NER) service allows configuring providers for several of its features.
- The entity-linking-backend service README documents how to configure external providers.
- The codelist-labeling service can be configured to use Mistral as external provider. Using another external provider requires adding the appropriate langchain-* package to the service by editing its requirements.txt file and building your own image.
- The Question-answering service can be configured to use different providers. This does require adding the appropriate langchain-* package to the service by editing its requirements.txt file and building your own image.
- The Embedding service currently does not support using an external provider. Embeddings can be generated locally without a GPU, but this will take considerably longer.
The LDES services that produce the data for an app's LDES feed have to be configured with the proper base URL. The intended value is the base dataspace URL of your application instance with /ldes/ as suffix. For example, ABB's application instance has https://ds.decide.lblod.info/ as URL, so the base URL for the two LDES services becomes https://ds.decide.lblod.info/ldes/.
Note that the /ldes/ suffix is important, as the dispatcher relies on it to forward requests to the ldes-serve-feed service. If you use a base URL that does not end in /ldes/, you should also update the corresponding dispatcher rule accordingly.
See the configuration for the ldes-delta-pusher and ldes-serve-feed services in your override file.
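As a small illustration of how the LDES base URL is derived from the dataspace URL, the following sketch appends the /ldes/ suffix regardless of whether the input ends in a slash. The function name ldes_base_url is hypothetical and only serves this example.

```shell
#!/bin/sh
# ldes_base_url DATASPACE_URL   (hypothetical helper)
# Prints the base URL to configure for the two LDES services.
ldes_base_url() {
  # Strip one trailing slash (if any), then append the /ldes/ suffix.
  printf '%s/ldes/\n' "${1%/}"
}
```

Calling `ldes_base_url https://ds.decide.lblod.info/` prints the value used in the ABB example above.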
Using the pipeline dashboard requires you create the appropriate user accounts. The overall README in this repository describes how to do this in its "Account management for the pipeline dashboard" section.
As an optional, additional layer of security, it is possible to configure an application-wide salt for all passwords in an app instance, also called a pepper. Enabling this requires some additional configuration:
- You have to set the MU_APPLICATION_SALT environment variable for the login service to an appropriate value:

      # In docker-compose.override.yml
      services:
        login:
          environment:
            MU_APPLICATION_SALT: "REPLACE-WITH-A-LONG-SECURE-RANDOM-NUMBER"

- When generating an account migration using the generate-account script, you also have to provide the correct application salt via the --salt argument.
Note, this application salt should not be confused with the regular salts generated and appended to each password before hashing it. The latter is already taken care of by the generate-account script.
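One possible way to obtain a random value for the application salt is sketched below. The helper name generate_pepper is hypothetical, and the hex format is a choice made for this example; check the login service documentation for any constraints on the value.

```shell
#!/bin/sh
# generate_pepper [BYTES]   (hypothetical helper)
# Prints BYTES random bytes (default 32) as a lowercase hex string.
generate_pepper() {
  # od -v prints every byte as two hex digits; tr removes spaces and newlines.
  head -c "${1:-32}" /dev/urandom | od -An -v -tx1 | tr -d ' \n'
  echo
}
```

The printed value would then be used both for MU_APPLICATION_SALT and for the --salt argument of the generate-account script, so that the two stay identical.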
This folder also contains some pre-configured docker compose configurations that disable services which are unnecessary for the use cases specific partners are interested in. The easiest way to include one of these configurations is to add it to the COMPOSE_FILE variable, as the last entry in your .env file:

    COMPOSE_FILE=docker-compose.yml:./docs/docker-compose.override.NAME.yml:docker-compose.override.yml

Note: take care not to include the docker-compose.dev.yml file here, as this can expose services to the outside world.
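To double-check the resulting .env file, a sketch such as the following lists the compose files named in COMPOSE_FILE and refuses a value that includes docker-compose.dev.yml. The function name check_compose_file is hypothetical, and the sketch assumes the value is written without surrounding quotes.

```shell
#!/bin/sh
# check_compose_file [ENV_FILE]   (hypothetical helper)
# Lists the compose files named in COMPOSE_FILE, one per line, and fails
# if docker-compose.dev.yml is among them.
check_compose_file() {
  env_file="${1:-.env}"
  # Use the last COMPOSE_FILE assignment found in the file.
  value=$(sed -n 's/^COMPOSE_FILE=//p' "$env_file" | tail -n 1)
  if [ -z "$value" ]; then
    echo "COMPOSE_FILE is not set in $env_file" >&2
    return 1
  fi
  case "$value" in
    *docker-compose.dev.yml*)
      echo "docker-compose.dev.yml must not be listed in COMPOSE_FILE" >&2
      return 1
      ;;
  esac
  # Entries are separated by colons; print one per line.
  printf '%s\n' "$value" | tr ':' '\n'
}
```

Running `check_compose_file` without arguments inspects the .env file in the current folder.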
The city of Bamberg is mostly interested in use cases 0.1 and 2. Therefore, their partner-specific configuration disables unnecessary services and provides some placeholders for configuring specific services. See the comments in the override file for more information.
Due to technical limitations our pdf-scraper service cannot directly retrieve PDFs from the web portal of the city of Bamberg. A workaround is to obtain the PDFs via another method and feed them into the app from disk using an additional service.
To this end, an internal-files service is configured in docker-compose.override.bamberg.yml. This service mounts the folder data/internal-files, in which PDFs can be placed. Make sure to create this folder.
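Creating the folder and placing a PDF in it can be sketched as follows. The function name stage_pdf is hypothetical. The printed URL follows the http://internal-files/FILENAME.pdf pattern used for input decision URLs, and the sketch assumes file names without spaces or other characters that would need URL encoding.

```shell
#!/bin/sh
# stage_pdf FILE [DIR]   (hypothetical helper)
# Copies a PDF into the folder mounted by the internal-files service and
# prints the URL under which the app can retrieve it.
stage_pdf() {
  dir="${2:-data/internal-files}"
  # Create the mounted folder if it does not exist yet.
  mkdir -p "$dir" || return 1
  cp "$1" "$dir/" || return 1
  echo "http://internal-files/$(basename "$1")"
}
```

Run it from the root of the app's repository so that the default folder data/internal-files resolves correctly.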
In the pipeline dashboard you can use http://internal-files/FILENAME.pdf as input decision URLs. As municipality select Stadt Bamberg from the options in the dropdown, as illustrated in the following screenshot.
The city of Freiburg is mostly interested in use case 1. Therefore, their partner-specific configuration disables most services for other use cases and provides some placeholders for configuring relevant services. See the comments in the override file for more information.
To harvest decisions from your OParl endpoint, use the pipeline dashboard to create a "Harvest OParl API & Publish as ELI" job. For example, the following screenshot illustrates the form to harvest all decisions from https://ris.freiburg.de/oparl.
The nominatim service should be configured to retrieve the OpenStreetMap (OSM) Data Extracts for Germany instead of Belgium. To do this, set the PBF_URL environment variable to the correct URL, as illustrated in docker-compose.override.freiburg.yml.
Warning
The OSM Data Extracts are downloaded and processed only when the service is started for the first time. Be sure to configure the correct PBF_URL before starting the service. If you (accidentally) started the service with an incorrect PBF_URL, you can bring the service down, remove the mounted volume, and bring the service up again with the correct configuration.
Note
Downloading and processing the Data Extract for Germany takes a long time, on the order of hours on my development machine, and uses a lot of your machine's resources. You can follow its progress via the service's logs.
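The recovery steps from the warning above can be sketched as a shell function. The function name reset_nominatim and its DATA_DIR argument are hypothetical. The sketch assumes the service's data lives in a host folder mounted into the container; look up the actual volume in your compose files, and note that removing it may require elevated permissions.

```shell
#!/bin/sh
# reset_nominatim DATA_DIR   (hypothetical helper)
#   DATA_DIR: host folder mounted into the nominatim service (assumption).
reset_nominatim() {
  [ -n "$1" ] || { echo "usage: reset_nominatim DATA_DIR" >&2; return 2; }
  # Stop and remove the container that was started with the wrong PBF_URL.
  docker compose stop nominatim || return 1
  docker compose rm -f nominatim || return 1
  # Remove the previously imported data so the next start imports again.
  rm -rf -- "$1" || return 1
  # Start the service again; it now picks up the corrected PBF_URL.
  docker compose up -d nominatim
}
```

After the service is up again, `docker compose logs -f nominatim` shows the download and import progress.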
The city of Ghent is mostly interested in use cases 0.1 and 2. Therefore, their partner-specific configuration disables most services for other use cases and provides some placeholders for configuring relevant services. See the comments in the override file for more information.
Decisions for Ghent are harvested from Lokaal Beslist. The pipeline dashboard can be used to create the relevant jobs to gather data. To initially harvest all decisions, create a "Harvest Lokaal Beslist OSLO & Publish as ELI" job and select "Initial sync" as Sync mode in the form. The URL field will be automatically filled with the correct value.
Warning
The initial sync will take some time as it syncs all data from the configured harvester.
To update your data after an initial sync has completed, create another "Harvest Lokaal Beslist OSLO & Publish as ELI" job, but select "Delta sync" as Sync mode.
To periodically update your data automatically, you can create a Scheduled job for a delta sync. This can be done via the "Scheduled jobs" tab of the pipeline dashboard.




