Modeling digital twin data and architecture: a building guide with FIWARE as enabling technology

A real-time parking occupancy prediction system for Málaga's public parking facilities, built using FIWARE components, Apache Spark ML, and Kubernetes. This project leverages open data from Málaga City Council to train machine learning models and provide occupancy predictions through a web interface.

If you use or base your work on this project, please cite the following article:

@ARTICLE{9346030,
  author={Conde, Javier and Munoz-Arcentales, Andrés and Alonso, Álvaro and López-Pernas, Sonsoles and Salvachúa, Joaquín},
  journal={IEEE Internet Computing},
  title={Modeling Digital Twin Data and Architecture: A Building Guide With FIWARE as Enabling Technology},
  year={2022},
  volume={26},
  number={3},
  pages={7-14},
  keywords={Data models;Computer architecture;Digital twins;Market research;Proposals;Ecosystems;Computer architecture},
  doi={10.1109/MIC.2021.3056923}
}

Overview

This application predicts the occupancy levels of public parking facilities in Málaga, Spain. It demonstrates a complete data pipeline from ingestion to real-time prediction:

Data Ingestion: Fetches real-time parking data from Málaga's Open Data portal
Context Management: Uses FIWARE Orion Context Broker for managing context information
Data Persistence: Stores historical data in MongoDB with Draco (FIWARE's data persistence component)
Machine Learning: Trains Random Forest models using Apache Spark MLlib
Real-time Predictions: Provides instant occupancy predictions via a web interface
Orchestration: Fully containerized and deployed on Kubernetes

Supported Parking Facilities

Salitre (435 spots)
Av. de Andalucía (613 spots)
Cervantes (409 spots)
El Palo (127 spots)
Camas (350 spots)
Alcazaba (378 spots)
Tejón y Rodriguez (187 spots)
Cruz De Humilladero (217 spots)
San Juan De La Cruz (624 spots)
Pz. de la Marina (430 spots)

🏗️ Architecture

┌─────────────────┐
│   User Browser  │
└────────┬────────┘
         │
         ▼
┌─────────────────────────────────────────────────────────────┐
│                     Kubernetes Cluster                       │
│                                                               │
│  ┌──────────────┐        ┌──────────────┐                   │
│  │  Web UI      │◄──────►│    Orion     │                   │
│  │  (Node.js)   │        │Context Broker│                   │
│  └──────────────┘        └──────┬───────┘                   │
│         │                       │                            │
│         │                       ▼                            │
│         │              ┌─────────────────┐                  │
│         │              │     Draco       │                  │
│         │              │  (Historical)   │                  │
│         │              └────────┬────────┘                  │
│         │                       │                            │
│         ▼                       ▼                            │
│  ┌──────────────────────────────────────┐                  │
│  │         MongoDB (Replica Set)        │                  │
│  │    mongodb-0, mongodb-1, mongodb-2   │                  │
│  └────────────┬─────────────────────────┘                  │
│               │                                              │
│               ▼                                              │
│  ┌──────────────────────────────────────┐                  │
│  │        Apache Spark Jobs             │                  │
│  │  ┌──────────┐      ┌──────────┐     │                  │
│  │  │  Train   │      │ Predict  │     │                  │
│  │  │  Model   │      │ (Stream) │     │                  │
│  │  └──────────┘      └─────┬────┘     │                  │
│  └────────────────────────────┼─────────┘                  │
│                               │                             │
│                               ▼                             │
│                    ┌──────────────────┐                    │
│                    │   Orion Update   │                    │
│                    │  (Predictions)   │                    │
│                    └──────────────────┘                    │
│                                                              │
│  ┌──────────────────────────────────────┐                  │
│  │        Data Sink Job (Cron)          │                  │
│  │   Fetches data from OpenData API     │                  │
│  └──────────────────────────────────────┘                  │
└─────────────────────────────────────────────────────────────┘
         ▲
         │
         │
┌────────┴────────────┐
│  Málaga OpenData    │
│   Public API        │
└─────────────────────┘

Technologies

Core Technologies

FIWARE Orion Context Broker: Context information management (NGSI-v2)
FIWARE Draco: Historical data persistence
Apache Spark 3.1.2: Distributed data processing and machine learning
MongoDB 3.6: Document database with replica set configuration
Kubernetes: Container orchestration platform
Minikube: Local Kubernetes cluster

Application Stack

Backend: Node.js (Express.js)
Frontend: HTML5, JavaScript, Socket.io, Bulma CSS
ML Framework: Spark MLlib (Random Forest Classifier)
Data Ingestion: Python (pycurl, pymongo)
Build Tools: SBT (Scala Build Tool), Docker

Libraries & Dependencies

Node.js: express, socket.io, mongoose, node-fetch, cross-fetch
Python: pycurl, pymongo
Scala: spark-core, spark-sql, spark-mllib, mongo-spark-connector

Prerequisites

Before you begin, ensure you have the following installed:

Minikube (v1.20+)
kubectl (v1.20+)
Docker (v20.10+)
Git
curl
bash/zsh shell

System Requirements

RAM: Minimum 8GB (16GB recommended)
CPU: 4 cores minimum
Disk Space: 20GB free space
OS: macOS, Linux, or Windows with WSL2

Minikube Configuration

Start Minikube with adequate resources:

minikube start --cpus=4 --memory=8192 --disk-size=20g

Enable required addons:

minikube addons enable ingress
minikube addons enable storage-provisioner

🚀 Quick Start

1. Clone the Repository

git clone <repository-url>
cd fiware_helloWorld

2. Deploy the Complete Stack

The entire application can be deployed with a single script:

chmod +x create_cluster.sh
./create_cluster.sh

This script will:

Create the tfm namespace
Deploy MongoDB replica set (3 nodes)
Configure MongoDB replication
Deploy MongoDB Express UI
Deploy Orion Context Broker
Deploy Draco for data persistence
Create Spark service accounts and volumes
Deploy the prediction web application
Run the data sink job to populate initial data
Create FIWARE entities and subscriptions
Submit Spark prediction job

Note: The deployment process takes approximately 3-5 minutes.

3. Access the Application

Once deployed, get the service URLs:

minikube service list

Access the web interface:

minikube service web-service -n tfm

Or access via Ingress (if configured):

echo "http://$(minikube ip)"

📁 Project Structure

fiware_helloWorld/
├── create_cluster.sh              # Main deployment script
├── .env                          # Environment variables
│
├── kubernetes/                   # Kubernetes manifests
│   ├── draco-deployment.yaml     # Draco deployment
│   ├── draco-service.yaml        # Draco service
│   ├── mongodb-statefulSet.yaml  # MongoDB stateful set (3 replicas)
│   ├── mongodb-sc.yaml           # MongoDB storage class
│   ├── mongodb-hservice.yaml     # MongoDB headless service
│   ├── mongodb-express.yaml      # MongoDB web UI
│   ├── orion-deployment.yaml     # Orion Context Broker deployment
│   ├── orion-service.yaml        # Orion service
│   ├── prediction-web-deployment.yaml  # Web UI deployment
│   ├── spark-pv.yaml             # Spark persistent volume
│   ├── spark-pvc.yaml            # Spark persistent volume claim
│   ├── spark-hservice.yaml       # Spark headless service
│   ├── jupyterlab-*.yaml         # JupyterLab components (optional)
│   ├── minikube-ingress.yaml     # Ingress configuration
│   └── Jobs/
│       └── sink-job.yaml         # Data ingestion job
│
├── prediction-web/               # Web application
│   ├── app.js                    # Express server with Socket.io
│   ├── package.json              # Node.js dependencies
│   ├── Dockerfile                # Container image
│   ├── build-image.sh            # Image build script
│   ├── public/                   # Static files
│   │   ├── index.html            # Main UI
│   │   ├── predictions.html      # Predictions view
│   │   ├── predict.js            # Client-side logic
│   │   └── *.css, *.js           # Assets
│   └── entities/                 # FIWARE entity scripts
│       ├── createPredictionEntities.sh
│       ├── curlEntities.sh
│       ├── subscribeReqPredictionTicket.sh
│       ├── subscribeResPredictionTicket.sh
│       └── subscribeResDracoPredictionTicket.sh
│
├── spark-job/                    # Spark ML jobs
│   ├── build.sbt                 # SBT build configuration
│   ├── Train.scala               # Model training job
│   ├── Prediction.scala          # Real-time prediction job
│   ├── spark-submit-train.sh     # Training job submission
│   ├── spark-submit-predict.sh   # Prediction job submission
│   ├── spark-create-submit-train.sh
│   └── spark-create-submit-predict.sh
│
├── data-sink-job/                # Data ingestion
│   ├── update-db.py              # Python script to fetch OpenData
│   ├── requirements.txt          # Python dependencies
│   └── Dockerfile                # Container image
│
├── spark-notebook/               # JupyterLab environment
│   ├── Dockerfile
│   └── build_image.sh
│
└── statefulset/                  # MongoDB configuration
    ├── commands.sh
    └── mongodb-rsconfig.sh       # Replica set initialization

🧩 Components

1. Orion Context Broker

Purpose: FIWARE Generic Enabler for managing context information using NGSI-v2 API.

Configuration:

Port: 1026
Version: 2.3.0
Database: MongoDB replica set
Log Level: DEBUG
HTTP Timeout: 15000ms

Entities:

Parking occupancy entities (current state)
Prediction request entities
Prediction response entities

2. Draco

Purpose: FIWARE data persistence component for storing historical context information.

Features:

Subscribes to Orion Context Broker changes
Stores historical data in MongoDB
Enables time-series analysis
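
The subscription that connects Orion to Draco can be sketched as an NGSI-v2 subscription body. This is a minimal illustration, not the contents of the repository's `subscribeResDracoPredictionTicket.sh`: the entity type `ResTicketPrediction` and the Draco endpoint `http://draco:5050/v2/notify` are assumptions, and the attribute names are taken from the notification example in this README.

```python
import json


def build_draco_subscription(entity_type, notify_url, attrs):
    """Return an NGSI-v2 subscription body that forwards attribute changes to a sink."""
    return {
        "description": f"Persist {entity_type} changes through Draco",
        "subject": {
            # Match every entity of the given type
            "entities": [{"idPattern": ".*", "type": entity_type}],
            # Fire a notification when any of these attributes changes
            "condition": {"attrs": list(attrs)},
        },
        "notification": {
            "http": {"url": notify_url},
            "attrs": list(attrs),
        },
    }


subscription = build_draco_subscription(
    entity_type="ResTicketPrediction",         # hypothetical entity type
    notify_url="http://draco:5050/v2/notify",  # assumed Draco listener
    attrs=["predictionValue", "name", "weekday", "time"],
)
print(json.dumps(subscription, indent=2))
```

A body like this would be sent with `POST /v2/subscriptions` to Orion.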

3. MongoDB Replica Set

Purpose: Distributed database for storing parking data and predictions.

Configuration:

Replica Set: MainRepSet
Nodes: 3 (mongodb-0, mongodb-1, mongodb-2)
Port: 27017
Storage: Persistent volumes with StorageClass
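
As an illustration of how clients could address the three nodes, the sketch below builds a replica set connection string from the values above. It assumes that `mongodb-svc` is the headless service governing the StatefulSet and that pods resolve through the usual Kubernetes pattern `<pod>.<service>.<namespace>.svc.cluster.local`; check the manifests before relying on these names.

```python
def replica_set_uri(pods, service, namespace, port, replica_set, database=""):
    """Build a MongoDB URI that lists every replica set member."""
    hosts = ",".join(
        f"{pod}.{service}.{namespace}.svc.cluster.local:{port}" for pod in pods
    )
    return f"mongodb://{hosts}/{database}?replicaSet={replica_set}"


uri = replica_set_uri(
    pods=["mongodb-0", "mongodb-1", "mongodb-2"],
    service="mongodb-svc",  # assumed headless service name
    namespace="tfm",
    port=27017,
    replica_set="MainRepSet",
)
print(uri)
```

Listing all members lets a driver find the current primary after a failover instead of depending on a single pod.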

Data stores:

parking: Collection in the tfm database holding historical parking occupancy data
sth_test: Database written by Draco holding real-time predictions and requests

4. Spark ML Jobs

Training Job (`Train.scala`)

Purpose: Trains a Random Forest classification model to predict parking occupancy levels.

Features:

Reads historical data from MongoDB
Feature engineering: weekday, hour, month
Random Forest with 200 trees
80/20 train-test split
Saves model to persistent volume

Model Parameters:

Algorithm: Random Forest Classifier
Number of trees: 200
Feature subset strategy: log2
Input features: parking name, weekday, hour, month
Output: Occupancy level (0-10 scale)

Prediction Job (`Prediction.scala`)

Purpose: Real-time streaming job that receives prediction requests and returns results.

Features:

Listens for NGSI notifications from Orion (port 9001)
Loads pre-trained model from persistent volume
Processes prediction requests in real-time
Sends results back to Orion Context Broker

Flow:

Receives entity update from Orion
Extracts features (name, weekday, hour)
Applies ML pipeline and model
Sends prediction back to Orion
Web UI receives notification via subscription

5. Web Application

Purpose: User interface for requesting and displaying parking occupancy predictions.

Technologies:

Express.js server
Socket.io for real-time communication
Mongoose for MongoDB integration
Bulma CSS framework

Features:

Interactive parking selection
Date and time picker
Real-time prediction results
Historical predictions view
WebSocket-based updates

Ports:

Internal: 3000
NodePort: 30003

6. Data Sink Job

Purpose: Kubernetes Job that fetches live parking data from the Málaga OpenData API.

Configuration:

Source: https://datosabiertos.malaga.eu/recursos/aparcamientos/ocupappublicosmun/ocupappublicosmunfiware.json
Destination: MongoDB collection parking
Adds timestamp metadata to each record

Schedule: Can be configured as a CronJob for periodic updates.
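
The fetch-and-stamp step can be sketched as follows. The project's `update-db.py` uses pycurl and pymongo; this sketch swaps in the standard library's `urllib` so it stays self-contained, and the field name `timestamp` and the sample record are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from urllib.request import urlopen

SOURCE_URL = (
    "https://datosabiertos.malaga.eu/recursos/aparcamientos/"
    "ocupappublicosmun/ocupappublicosmunfiware.json"
)


def fetch_records(url=SOURCE_URL, timeout=10):
    """Download the feed and return it as a list of dicts."""
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload if isinstance(payload, list) else [payload]


def add_timestamp(records, fetched_at):
    """Return copies of the records with an ISO-8601 fetch time attached."""
    stamp = fetched_at.isoformat()
    return [{**record, "timestamp": stamp} for record in records]


# Offline demonstration with a hand-written record (not real feed data)
sample = [{"name": "Salitre", "availableSpots": 120, "totalSpots": 435}]
stamped = add_timestamp(sample, datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
print(stamped[0]["timestamp"])
```

In the job itself, the output of `add_timestamp(fetch_records(), datetime.now(timezone.utc))` could be handed to pymongo's `insert_many` on the `parking` collection.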

🔄 Data Flow

Training Phase

┌─────────────────┐
│  Data Sink Job  │
│   (Python)      │
└────────┬────────┘
         │
         │ Fetches parking data
         ▼
┌──────────────────────┐
│   Málaga OpenData    │
│   Public API         │
└──────────┬───────────┘
           │
           │ Stores JSON data
           ▼
    ┌─────────────┐
    │   MongoDB   │
    │  (parking)  │
    └──────┬──────┘
           │
           │ Reads historical data
           ▼
    ┌─────────────┐
    │ Spark Train │
    │    Job      │
    └──────┬──────┘
           │
           │ Saves model
           ▼
    ┌─────────────┐
    │ Persistent  │
    │   Volume    │
    └─────────────┘

Prediction Phase

┌────────────┐
│ User Browser│
└──────┬─────┘
       │ Enters prediction request
       ▼
┌─────────────────┐
│   Web UI        │
│  (Socket.io)    │
└─────┬───────────┘
      │
      │ PATCH request
      ▼
┌────────────────┐
│ Orion Context  │◄────┐
│    Broker      │     │
└─────┬──────────┘     │
      │                │
      │ Notification   │ PATCH prediction result
      ▼                │
┌────────────────┐     │
│ Spark Predict  │─────┘
│  (Streaming)   │
└────────────────┘
      │
      │ Model inference
      ▼
┌────────────────┐
│ ML Model       │
│ (Random Forest)│
└────────────────┘
      │
      │ Prediction sent to Web UI
      ▼
┌────────────────┐
│ User receives  │
│   prediction   │
└────────────────┘

Data Persistence Flow

┌────────────────┐
│ Orion Context  │
│    Broker      │
└─────┬──────────┘
      │
      │ Subscription
      ▼
┌────────────────┐
│     Draco      │
└─────┬──────────┘
      │
      │ Stores historical data
      ▼
┌────────────────┐
│    MongoDB     │
│  (sth_test)    │
└────────────────┘

📊 FAIR Principles

This project implements the FAIR Data Principles (Findable, Accessible, Interoperable, and Reusable) to ensure effective data management and maximize the value of the parking occupancy digital twin.

F - Findable

Data and metadata are easy to find for both humans and computers.

✅ Implementation:

Standardized Entity Identification
- Each parking facility has a unique identifier in the FIWARE Orion Context Broker
- Entities follow FIWARE Smart Data Models conventions
- Consistent naming scheme across all parking facilities
Searchable APIs
- RESTful API endpoints for querying entities: GET /v2/entities
- Filter capabilities by entity type, attributes, and metadata
- MongoDB indexes for efficient historical data retrieval
Metadata Documentation
- Entity attributes include type information and metadata
- Timestamps (dateObserved, dateCreated, dateModified) for all records
- Comprehensive API documentation in this README
Data Catalog
- All parking facilities catalogued with name, location, and capacity
- MongoDB collections organized by data type and purpose
- Version-controlled deployment configurations

A - Accessible

Data can be accessed through standardized protocols and remains available over time.

✅ Implementation:

Open Standards & Protocols
- NGSI-v2: Standard context information management protocol
- HTTP/REST: Universal web protocols for all API interactions
- WebSocket (Socket.io): Real-time bidirectional communication
- MongoDB Wire Protocol: Standard database access
Public Data Source
- Data sourced from Málaga City Council's Open Data portal
- Publicly accessible endpoint: datosabiertos.malaga.eu
- No authentication required for public endpoints
Persistent Storage
- Historical data stored in MongoDB replica set (3 nodes)
- Kubernetes Persistent Volumes ensure data durability
- Daily backups possible through MongoDB snapshot mechanisms
Service Availability
- High availability through Kubernetes orchestration
- MongoDB replica set provides automatic failover
- Load balancing and service discovery via Kubernetes services
Multiple Access Patterns
- Web UI for end users
- RESTful APIs for programmatic access
- MongoDB direct access for analytics
- Socket.io for real-time updates

I - Interoperable

Data can be integrated with other data and work with different applications.

✅ Implementation:

Standard Data Formats
- JSON: Universal data interchange format for all APIs
- NGSI-v2: FIWARE standard for context information
- BSON: MongoDB native format, JSON-compatible
- Parquet: Considered for Spark data processing
FIWARE Ecosystem Integration
- Compatible with FIWARE Generic Enablers that speak NGSI-v2
- Orion Context Broker as the central integration point
- Draco for seamless data persistence
- Can integrate with FIWARE IoT Agents, CEP, Big Data components
Standardized Vocabulary
- Follows FIWARE Smart Data Models
- Attributes use common semantic conventions:
  - name, location, capacity, availableSpots, totalSpots
  - Temporal attributes: dateObserved, dateCreated, dateModified
- Occupancy levels normalized to 0-10 scale
Platform-Independent Architecture
- Containerized components run on any Kubernetes cluster
- Docker images portable across environments
- Cloud-agnostic design (works on AWS, Azure, GCP, on-premise)
Open APIs
- RESTful endpoints follow OpenAPI/Swagger conventions
- WebSocket protocol for real-time communication
- No proprietary protocols or vendor lock-in
Data Exchange Capabilities
- NGSI subscriptions for event-driven integration
- HTTP callbacks for asynchronous notifications
- Batch data export via MongoDB queries
- Real-time streaming through Socket.io

R - Reusable

Data and models are well-described and can be replicated and combined in different settings.

✅ Implementation:

Comprehensive Documentation
- Detailed README with architecture diagrams
- Inline code comments in all components
- API endpoint documentation with examples
- Step-by-step deployment instructions
Automation & Reproducibility
- One-command deployment script: ./create_cluster.sh
- Infrastructure as Code (Kubernetes manifests)
- Dockerfiles for reproducible container builds
- Version-pinned dependencies in all package managers
Open Source Components
- FIWARE Orion (GNU Affero General Public License v3.0)
- Apache Spark (Apache 2.0)
- MongoDB (Server Side Public License)
- Node.js ecosystem (MIT/Apache licenses)
Modular Architecture
- Loosely coupled microservices
- Components can be replaced or upgraded independently
- Clear separation of concerns:
  - Data ingestion (Python)
  - Context management (Orion)
  - Persistence (MongoDB, Draco)
  - ML processing (Spark)
  - Presentation (Node.js)
Model Reusability
- ML model saved in Spark's native format
- Feature engineering pipeline persisted separately
- Model can be exported for use in other frameworks
- Training code documented and parameterized
Extensibility
- Add new parking facilities by updating entity configurations
- Extend features by modifying Spark ML pipeline
- Integrate additional data sources through Orion subscriptions
- Plugin architecture for new visualizations
Clear Licensing
- Project licensed for academic and research use
- Third-party dependencies properly attributed
- Citation information provided at the top of README
Version Control
- All code version-controlled in Git
- Docker images tagged with versions
- Component versions explicitly specified in manifests

FAIR Impact

By implementing FAIR principles, this digital twin system:

Enables collaboration across institutions and research groups
Facilitates reproducibility of experiments and results
Promotes innovation by allowing others to build upon this work
Ensures long-term value of the data and infrastructure
Supports open science and transparent research practices
Allows integration with broader smart city initiatives

Future FAIR Enhancements

Potential improvements to strengthen FAIR compliance:

Persistent Identifiers: Add DOIs for datasets and model versions
Rich Metadata: Implement schema.org or DCAT metadata
Data Provenance: Record complete lineage of predictions
FAIR Metrics: Regular assessment using FAIR maturity indicators
Data Catalog: Deploy CKAN or similar catalog software
Semantic Web: Add RDF/Linked Data capabilities
License Headers: Add SPDX license identifiers to all files

🤖 Machine Learning Model

Model Architecture

Algorithm: Random Forest Classifier

Why Random Forest?

Handles non-linear relationships
Robust to outliers
Provides feature importance
Good accuracy for categorical predictions
No need for feature scaling

Features

| Feature | Type | Description | Example |
|---|---|---|---|
| `name` | Categorical | Parking facility name | "Salitre" |
| `weekday` | Categorical | Day of week (1-7) | 3 (Tuesday) |
| `hour` | Numerical | Hour of day (0-23) | 14 |
| `month` | Numerical | Month (1-12) | 6 (June) |

Feature Engineering

String Indexing: Converts categorical features to numerical indices
One-Hot Encoding: Converts indices to binary vectors
Vector Assembly: Combines all features into a single feature vector
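
The three steps can be illustrated without Spark. The plain-Python sketch below mimics what `StringIndexer`, `OneHotEncoder`, and `VectorAssembler` do conceptually; it is not the code in `Train.scala`. Two simplifications are assumed: indices are assigned by descending frequency with alphabetical tie-breaking, and every category keeps its own slot (Spark's encoder drops the last category by default).

```python
from collections import Counter


def fit_string_index(values):
    """Map each category to an index, most frequent first (ties broken by value)."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda value: (-counts[value], value))
    return {value: index for index, value in enumerate(ordered)}


def one_hot(index, size):
    """Return a vector of zeros with a single 1.0 at the given index."""
    vector = [0.0] * size
    vector[index] = 1.0
    return vector


def assemble(row, name_index, weekday_index):
    """Concatenate encoded categorical features with the numerical ones."""
    return (
        one_hot(name_index[row["name"]], len(name_index))
        + one_hot(weekday_index[row["weekday"]], len(weekday_index))
        + [float(row["hour"]), float(row["month"])]
    )


rows = [
    {"name": "Salitre", "weekday": 3, "hour": 14, "month": 6},
    {"name": "Camas", "weekday": 3, "hour": 9, "month": 6},
    {"name": "Salitre", "weekday": 5, "hour": 20, "month": 7},
]
name_index = fit_string_index(row["name"] for row in rows)
weekday_index = fit_string_index(row["weekday"] for row in rows)
features = [assemble(row, name_index, weekday_index) for row in rows]
print(features[0])
```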

Target Variable

Occupancy Level: Discretized into 11 levels (0-10)

Calculation:

available_ratio = availableSpots / totalSpots
occupation_percentage = (1 - available_ratio) * 100
occupation_level = round(occupation_percentage / 10)

0: below 5% occupied (almost empty)
5: about 45-55% occupied (half full)
10: 95% or more occupied (nearly full)
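
A small Python sketch of the same calculation follows. It uses integer arithmetic with half-up rounding, because Python's built-in `round` rounds halves to the nearest even number; whether the Spark job rounds halves the same way is an assumption. Clamping out-of-range readings is also an addition made here for robustness.

```python
def occupancy_level(available_spots, total_spots):
    """Return the 0-10 occupancy level for one parking reading."""
    if total_spots <= 0:
        raise ValueError("total_spots must be positive")
    # Clamp sensor glitches such as negative or oversized availability
    available = min(max(available_spots, 0), total_spots)
    occupied = total_spots - available
    # Equivalent to floor(10 * occupied / total + 0.5), without float error
    return (20 * occupied + total_spots) // (2 * total_spots)


print(occupancy_level(217, 435))  # Salitre with 217 of 435 spots free
```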

Model Performance

The model is evaluated using:

Metric: Multiclass Classification Accuracy
Train-Test Split: 80/20
Validation: Hold-out validation

Typical accuracy: ~75-85% (varies based on training data volume)
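
To make the evaluation concrete, here is a minimal sketch of a hold-out split and the accuracy metric in plain Python. It is not the Spark evaluation code: Spark's `randomSplit` assigns rows probabilistically, so its split is only approximately 80/20, while this sketch cuts the shuffled list exactly. The seed value is arbitrary.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle the rows reproducibly and cut them into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true occupancy level exactly."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


train, test = train_test_split(range(10))
print(len(train), len(test))
print(accuracy([7, 3, 5, 0], [7, 3, 4, 0]))
```

Note that exact-match accuracy counts a prediction of level 5 for a true level 4 as a miss, even though the two levels are adjacent.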

Model Persistence

Models are saved to a Kubernetes Persistent Volume:

Location: /opt/spark/work-dir/models/
Format: Spark ML native format
Components:
- model/: Trained Random Forest model
- pipeline/: Feature engineering pipeline

🔌 API Endpoints

Web Application API

POST /notify

Receives notifications from Orion Context Broker with prediction results.

Request Body:

{
  "data": [
    {
      "socketId": {"value": "socket-123"},
      "predictionId": {"value": "pred-456"},
      "predictionValue": {"value": 7},
      "name": {"value": "Salitre"},
      "weekday": {"value": 3},
      "time": {"value": 14}
    }
  ]
}

Response: 200 OK
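
The handler's core job is to unwrap the NGSI attribute objects into the flat payload that is pushed to the browser. The real handler lives in `app.js` (Node.js); the Python sketch below only illustrates the transformation, and the function names are hypothetical.

```python
def flatten_notification(body):
    """Turn each notified entity into a flat dict of attribute values."""
    return [
        {
            attr: field["value"]
            for attr, field in entity.items()
            # Skip plain fields such as "id" and "type"
            if isinstance(field, dict) and "value" in field
        }
        for entity in body.get("data", [])
    ]


def to_socket_message(prediction):
    """Wrap a flat prediction in the PREDICTION message envelope."""
    return {"type": "PREDICTION", "payload": prediction}


body = {
    "data": [
        {
            "socketId": {"value": "socket-123"},
            "predictionId": {"value": "pred-456"},
            "predictionValue": {"value": 7},
            "name": {"value": "Salitre"},
            "weekday": {"value": 3},
            "time": {"value": 14},
        }
    ]
}
messages = [to_socket_message(p) for p in flatten_notification(body)]
print(messages[0])
```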

GET /predictions.html

Displays historical predictions.

Static Files

GET /: Main prediction interface
GET /predictions.html: Historical predictions

Orion Context Broker API

Base URL: http://<minikube-ip>:30329/v2

Get All Entities

curl http://<minikube-ip>:30329/v2/entities

Get Specific Entity

curl http://<minikube-ip>:30329/v2/entities/<entity-id>

Update Entity

curl -X PATCH http://<minikube-ip>:30329/v2/entities/<entity-id>/attrs \
  -H "Content-Type: application/json" \
  -d '{
    "name": {"value": "Salitre", "type": "String"},
    "weekday": {"value": 3, "type": "Integer"},
    "time": {"value": 14, "type": "Integer"}
  }'
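
The same update can be prepared from Python with the standard library. In this sketch the address `192.168.49.2` is a placeholder for the output of `minikube ip`, and the entity id `ReqTicketPrediction1` is hypothetical; the request is only built here, not sent.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_patch_request(base_url, entity_id, attrs):
    """Build a PATCH request that updates attributes of an existing entity."""
    url = f"{base_url.rstrip('/')}/entities/{quote(entity_id, safe='')}/attrs"
    body = json.dumps(attrs).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


request = build_patch_request(
    "http://192.168.49.2:30329/v2",  # placeholder for http://<minikube-ip>:30329/v2
    "ReqTicketPrediction1",          # hypothetical entity id
    {
        "name": {"value": "Salitre", "type": "String"},
        "weekday": {"value": 3, "type": "Integer"},
        "time": {"value": 14, "type": "Integer"},
    },
)
print(request.get_method(), request.full_url)
```

Passing the object to `urllib.request.urlopen(request)` sends it; Orion answers a successful attribute update with `204 No Content`.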

Socket.io Events

Client → Server

Event: predict

Payload:

{
  name: "Salitre",
  year: 2026,
  month: 2,
  day: 21,
  weekday: 5,
  time: 14,
  predictionId: "unique-id"
}

Server → Client

Event: messages

Types:

CONFIRMATION

{
  type: "CONFIRMATION",
  payload: {
    msg: "Your request is being processed"
  }
}

PREDICTION

{
  type: "PREDICTION",
  payload: {
    socketId: "socket-123",
    name: "Salitre",
    weekday: 3,
    time: 14,
    predictionId: "pred-456",
    predictionValue: 7
  }
}

ERROR

{
  type: "ERROR",
  payload: {
    msg: "There has been a problem with your request"
  }
}

⚙️ Configuration

Environment Variables

Create or modify .env file:

# Orion Context Broker
ORION_PORT=1026
ORION_VERSION=2.3.0

# MongoDB
MONGO_DB_PORT=27017
MONGO_DB_VERSION=3.6

# Web Client
WEB_CLIENT_PORT=3000
CONTEXT_BROKER=http://orion:1026/v2
NGSI_VERSION=ngsi-v2
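
A component written in Python could read these settings as sketched below, falling back to the values listed above when a variable is unset. The `load_config` helper is hypothetical and not part of the repository.

```python
import os

# Fallback values copied from the .env listing above
DEFAULTS = {
    "ORION_PORT": "1026",
    "MONGO_DB_PORT": "27017",
    "WEB_CLIENT_PORT": "3000",
    "CONTEXT_BROKER": "http://orion:1026/v2",
    "NGSI_VERSION": "ngsi-v2",
}


def load_config(env=os.environ):
    """Merge environment variables over the defaults and parse the ports."""
    merged = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    for key in ("ORION_PORT", "MONGO_DB_PORT", "WEB_CLIENT_PORT"):
        merged[key] = int(merged[key])
    return merged


# Passing a dict instead of os.environ keeps the example deterministic
config = load_config({"WEB_CLIENT_PORT": "8080"})
print(config)
```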

Kubernetes Resources

MongoDB StatefulSet

replicas: 3
storageClass: mongodb-sc
volumeClaimTemplate: 1Gi per replica

Orion Deployment

replicas: 1
args:
  - "-dbhost": MongoDB replica set connection string
  - "-rplSet": MainRepSet
  - "-logLevel": DEBUG
  - "-httpTimeout": 15000

Spark Jobs

Resource Requests (adjust based on your cluster):

spark.executor.instances: 1
spark.executor.memory: 2g
spark.driver.memory: 2g

Scaling

Scale MongoDB Replicas

kubectl scale statefulset mongodb --replicas=5 -n tfm

Scale Orion

kubectl scale deployment orion --replicas=3 -n tfm

Scale Web Application

kubectl scale deployment web-deployment --replicas=2 -n tfm

📖 Usage

Making a Prediction

Access the Web Interface:
```
minikube service web-service -n tfm
```
Select Parameters:
- Choose a parking facility from the dropdown
- Select a date using the calendar
- Pick an hour slot (0-23)
Submit Request:
- Click the submit button
- Wait for confirmation message
- Prediction result appears in seconds
Interpret Results:
- Prediction value: 0-10 scale
- 0-3: Low occupancy (easy to find parking)
- 4-7: Medium occupancy (some spots available)
- 8-10: High occupancy (difficult to find parking)
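
The interpretation bands above translate directly into a small helper. This is an illustrative sketch; the function name and the returned labels are assumptions, not part of the web application.

```python
def describe_level(level):
    """Map a 0-10 prediction value to the occupancy band used in this guide."""
    if not 0 <= level <= 10:
        raise ValueError("level must be between 0 and 10")
    if level <= 3:
        return "low"
    if level <= 7:
        return "medium"
    return "high"


print(describe_level(7))
```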

Training a New Model

Ensure sufficient data in MongoDB:

kubectl exec -it mongodb-0 -n tfm -- mongo
> use tfm
> db.parking.count()

Submit training job:
```
cd spark-job
./spark-submit-train.sh
```
Monitor training:
```
kubectl logs -f -n tfm -l job=training
```

Verify model saved to persistent volume:

kubectl exec -it <spark-driver-pod> -n tfm -- ls /opt/spark/work-dir/models/

Restart prediction job to use new model:

kubectl delete pod -n tfm -l job=prediction
./spark-submit-predict.sh

Viewing Historical Data

# Access MongoDB
kubectl exec -it mongodb-0 -n tfm -- mongo

# Query parking data
> use tfm
> db.parking.find().limit(10).pretty()

# Count documents
> db.parking.count()

# Query predictions
> use sth_test
> db.Pred.find().limit(10).pretty()

Monitoring Logs

# Orion logs
kubectl logs -f deployment/orion -n tfm

# Web application logs
kubectl logs -f deployment/web-deployment -n tfm

# Spark prediction logs
kubectl logs -f -n tfm -l job=prediction

# Draco logs
kubectl logs -f deployment/draco -n tfm

🔧 Troubleshooting

Issue: MongoDB pods not starting

Symptoms: Pods stuck in Pending or CrashLoopBackOff

Solutions:

# Check PVC status
kubectl get pvc -n tfm

# Check storage class
kubectl get sc

# Ensure sufficient disk space
minikube ssh "df -h"

# Delete and recreate
kubectl delete statefulset mongodb -n tfm
kubectl delete pvc -n tfm -l app=mongodb
kubectl apply -f kubernetes/mongodb-sc.yaml
kubectl apply -f kubernetes/mongodb-statefulSet.yaml

Issue: Orion cannot connect to MongoDB

Symptoms: Orion logs show connection errors

Solutions:

# Verify MongoDB replica set status
kubectl exec -it mongodb-0 -n tfm -- mongo --eval "rs.status()"

# Check MongoDB service
kubectl get svc mongodb-svc -n tfm

# Verify replica set configuration
sh statefulset/mongodb-rsconfig.sh mongodb-0

Issue: Spark job fails to submit

Symptoms: Error during spark-submit

Solutions:

# Verify service account
kubectl get serviceaccount spark -n tfm

# Check cluster role binding
kubectl get clusterrolebinding spark-role

# Ensure Spark image is accessible
kubectl describe pod <spark-driver-pod> -n tfm

# Check Spark PVC
kubectl get pvc spark-pvc -n tfm

Issue: Predictions not returning

Symptoms: Web UI shows "processing" but no result

Solutions:

# Check Spark prediction job is running
kubectl get pods -n tfm -l job=prediction

# Verify Orion subscriptions
curl http://$(minikube ip):30329/v2/subscriptions

# Check logs
kubectl logs -f deployment/web-deployment -n tfm
kubectl logs -f -n tfm -l job=prediction

# Verify network connectivity
kubectl exec -it <web-pod> -n tfm -- curl http://orion:1026/version

Issue: Data sink job fails

Symptoms: No data in MongoDB parking collection

Solutions:

# Check job status
kubectl get jobs -n tfm

# View job logs
kubectl logs -n tfm job/sink-job

# Verify network access to OpenData API
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -n tfm -- \
  curl https://datosabiertos.malaga.eu/recursos/aparcamientos/ocupappublicosmun/ocupappublicosmunfiware.json

# Manually run job
kubectl delete job sink-job -n tfm
kubectl apply -f kubernetes/Jobs/sink-job.yaml

Issue: Out of memory errors

Symptoms: Pods OOMKilled

Solutions:

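# Increase Minikube memory. Minikube does not resize an existing cluster,
# so delete it, start a larger one, and redeploy
minikube delete
minikube start --cpus=4 --memory=16384
./create_cluster.sh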

# Adjust Spark memory settings in spark-submit scripts
# Edit spark-submit-*.sh files:
--conf "spark.executor.memory=4g"
--conf "spark.driver.memory=4g"

Issue: Cannot access services

Symptoms: minikube service command hangs or fails

Solutions:

# Check Minikube status
minikube status

# Start a tunnel (keep it running in a separate terminal)
minikube tunnel

# Use port-forward as alternative
kubectl port-forward -n tfm deployment/web-deployment 3000:3000

# Check NodePort services
kubectl get svc -n tfm

Complete Reset

If all else fails, completely reset the deployment:

# Delete namespace
kubectl delete namespace tfm

# Clean up persistent volumes
kubectl delete pv --all

# Restart Minikube
minikube stop
minikube delete
minikube start --cpus=4 --memory=8192

# Redeploy
./create_cluster.sh

🤝 Contributing

Contributions are welcome! Here are some ways you can contribute:

Areas for Improvement

Model Enhancement:
- Add more features (weather, events, holidays)
- Try different algorithms (XGBoost, Neural Networks)
- Implement hyperparameter tuning
- Add model versioning
Data Pipeline:
- Implement real-time data ingestion
- Add data validation and cleaning
- Create data quality dashboard
- Implement incremental model training
Infrastructure:
- Add Helm charts
- Implement GitOps with ArgoCD
- Add monitoring (Prometheus, Grafana)
- Implement service mesh (Istio)
Application:
- Add user authentication
- Improve UI/UX
- Add mobile support
- Implement caching layer (Redis)
Testing:
- Unit tests for all components
- Integration tests
- Load testing
- CI/CD pipeline

Contribution Process

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

This project is part of a Master's Thesis at Universidad Politécnica de Madrid (UPM).

🙏 Acknowledgments

Málaga City Council for providing open data
FIWARE Foundation for the open-source components
Apache Spark community
Universidad Politécnica de Madrid
