A real-time parking occupancy prediction system for Málaga's public parking facilities, built using FIWARE components, Apache Spark ML, and Kubernetes. This project leverages open data from Málaga City Council to train machine learning models and provide occupancy predictions through a web interface.
If you use or base your work on this project, please cite the following article:
```bibtex
@ARTICLE{9346030,
  author={Conde, Javier and Munoz-Arcentales, Andrés and Alonso, Álvaro and López-Pernas, Sonsoles and Salvachúa, Joaquín},
  journal={IEEE Internet Computing},
  title={Modeling Digital Twin Data and Architecture: A Building Guide With FIWARE as Enabling Technology},
  year={2022},
  volume={26},
  number={3},
  pages={7-14},
  keywords={Data models;Computer architecture;Digital twins;Market research;Proposals;Ecosystems;Computer architecture},
  doi={10.1109/MIC.2021.3056923}
}
```
- Overview
- Architecture
- Technologies
- Prerequisites
- Quick Start
- Project Structure
- Components
- Data Flow
- FAIR Principles
- Machine Learning Model
- API Endpoints
- Configuration
- Usage
- Troubleshooting
- Contributing
This application predicts the occupancy levels of public parking facilities in Málaga, Spain. It demonstrates a complete data pipeline from ingestion to real-time prediction:
- Data Ingestion: Fetches real-time parking data from Málaga's Open Data portal
- Context Management: Uses FIWARE Orion Context Broker for managing context information
- Data Persistence: Stores historical data in MongoDB with Draco (FIWARE's data persistence component)
- Machine Learning: Trains Random Forest models using Apache Spark MLlib
- Real-time Predictions: Provides instant occupancy predictions via a web interface
- Orchestration: Fully containerized and deployed on Kubernetes
- Salitre (435 spots)
- Av. de Andalucía (613 spots)
- Cervantes (409 spots)
- El Palo (127 spots)
- Camas (350 spots)
- Alcazaba (378 spots)
- Tejón y Rodriguez (187 spots)
- Cruz De Humilladero (217 spots)
- San Juan De La Cruz (624 spots)
- Pz. de la Marina (430 spots)
```
┌─────────────────┐
│ User Browser │
└────────┬────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Kubernetes Cluster │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Web UI │◄──────►│ Orion │ │
│ │ (Node.js) │ │Context Broker│ │
│ └──────────────┘ └──────┬───────┘ │
│ │ │ │
│ │ ▼ │
│ │ ┌─────────────────┐ │
│ │ │ Draco │ │
│ │ │ (Historical) │ │
│ │ └────────┬────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────────────────────────────┐ │
│ │ MongoDB (Replica Set) │ │
│ │ mongodb-0, mongodb-1, mongodb-2 │ │
│ └────────────┬─────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────┐ │
│ │ Apache Spark Jobs │ │
│ │ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Train │ │ Predict │ │ │
│ │ │ Model │ │ (Stream) │ │ │
│ │ └──────────┘ └─────┬────┘ │ │
│ └────────────────────────────┼─────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────┐ │
│ │ Orion Update │ │
│ │ (Predictions) │ │
│ └──────────────────┘ │
│ │
│ ┌──────────────────────────────────────┐ │
│ │ Data Sink Job (Cron) │ │
│ │ Fetches data from OpenData API │ │
│ └──────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
▲
│
│
┌────────┴────────────┐
│ Málaga OpenData │
│ Public API │
└─────────────────────┘
```
- FIWARE Orion Context Broker: Context information management (NGSI-v2)
- FIWARE Draco: Historical data persistence
- Apache Spark 3.1.2: Distributed data processing and machine learning
- MongoDB 3.6: Document database with replica set configuration
- Kubernetes: Container orchestration platform
- Minikube: Local Kubernetes cluster
- Backend: Node.js (Express.js)
- Frontend: HTML5, JavaScript, Socket.io, Bulma CSS
- ML Framework: Spark MLlib (Random Forest Classifier)
- Data Ingestion: Python (pycurl, pymongo)
- Build Tools: SBT (Scala Build Tool), Docker
- Node.js: express, socket.io, mongoose, node-fetch, cross-fetch
- Python: pycurl, pymongo
- Scala: spark-core, spark-sql, spark-mllib, mongo-spark-connector
Before you begin, ensure you have the following installed:
- Minikube (v1.20+)
- kubectl (v1.20+)
- Docker (v20.10+)
- Git
- curl
- bash/zsh shell
- RAM: Minimum 8GB (16GB recommended)
- CPU: 4 cores minimum
- Disk Space: 20GB free space
- OS: macOS, Linux, or Windows with WSL2
Start Minikube with adequate resources:
```bash
minikube start --cpus=4 --memory=8192 --disk-size=20g
```

Enable the required addons:

```bash
minikube addons enable ingress
minikube addons enable storage-provisioner
```

Clone the repository:

```bash
git clone <repository-url>
cd fiware_helloWorld
```

The entire application can be deployed with a single script:

```bash
chmod +x create_cluster.sh
./create_cluster.sh
```

This script will:
- Create the `tfm` namespace
- Deploy the MongoDB replica set (3 nodes)
- Configure MongoDB replication
- Deploy MongoDB Express UI
- Deploy Orion Context Broker
- Deploy Draco for data persistence
- Create Spark service accounts and volumes
- Deploy the prediction web application
- Run the data sink job to populate initial data
- Create FIWARE entities and subscriptions
- Submit Spark prediction job
Note: The deployment process takes approximately 3-5 minutes.
Once deployed, get the service URLs:
```bash
minikube service list
```

Access the web interface:

```bash
minikube service web-service -n tfm
```

Or access via Ingress (if configured):

```bash
echo "http://$(minikube ip)"
```

```
fiware_helloWorld/
├── create_cluster.sh # Main deployment script
├── .env # Environment variables
│
├── kubernetes/ # Kubernetes manifests
│ ├── draco-deployment.yaml # Draco deployment
│ ├── draco-service.yaml # Draco service
│ ├── mongodb-statefulSet.yaml # MongoDB stateful set (3 replicas)
│ ├── mongodb-sc.yaml # MongoDB storage class
│ ├── mongodb-hservice.yaml # MongoDB headless service
│ ├── mongodb-express.yaml # MongoDB web UI
│ ├── orion-deployment.yaml # Orion Context Broker deployment
│ ├── orion-service.yaml # Orion service
│ ├── prediction-web-deployment.yaml # Web UI deployment
│ ├── spark-pv.yaml # Spark persistent volume
│ ├── spark-pvc.yaml # Spark persistent volume claim
│ ├── spark-hservice.yaml # Spark headless service
│ ├── jupyterlab-*.yaml # JupyterLab components (optional)
│ ├── minikube-ingress.yaml # Ingress configuration
│ └── Jobs/
│ └── sink-job.yaml # Data ingestion job
│
├── prediction-web/ # Web application
│ ├── app.js # Express server with Socket.io
│ ├── package.json # Node.js dependencies
│ ├── Dockerfile # Container image
│ ├── build-image.sh # Image build script
│ ├── public/ # Static files
│ │ ├── index.html # Main UI
│ │ ├── predictions.html # Predictions view
│ │ ├── predict.js # Client-side logic
│ │ └── *.css, *.js # Assets
│ └── entities/ # FIWARE entity scripts
│ ├── createPredictionEntities.sh
│ ├── curlEntities.sh
│ ├── subscribeReqPredictionTicket.sh
│ ├── subscribeResPredictionTicket.sh
│ └── subscribeResDracoPredictionTicket.sh
│
├── spark-job/ # Spark ML jobs
│ ├── build.sbt # SBT build configuration
│ ├── Train.scala # Model training job
│ ├── Prediction.scala # Real-time prediction job
│ ├── spark-submit-train.sh # Training job submission
│ ├── spark-submit-predict.sh # Prediction job submission
│ ├── spark-create-submit-train.sh
│ └── spark-create-submit-predict.sh
│
├── data-sink-job/ # Data ingestion
│ ├── update-db.py # Python script to fetch OpenData
│ ├── requirements.txt # Python dependencies
│ └── Dockerfile # Container image
│
├── spark-notebook/ # JupyterLab environment
│ ├── Dockerfile
│ └── build_image.sh
│
└── statefulset/ # MongoDB configuration
├── commands.sh
    └── mongodb-rsconfig.sh       # Replica set initialization
```
Purpose: FIWARE Generic Enabler for managing context information using NGSI-v2 API.
Configuration:
- Port: 1026
- Version: 2.3.0
- Database: MongoDB replica set
- Log Level: DEBUG
- HTTP Timeout: 15000ms
Entities:
- Parking occupancy entities (current state)
- Prediction request entities
- Prediction response entities
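These entities can be queried programmatically over NGSI-v2. A minimal Python sketch using `requests`; the host is an assumption (use `$(minikube ip)`) and the NodePort matches the API Endpoints section later in this README:

```python
# Minimal sketch: list entities via Orion's NGSI-v2 API.
import requests

ORION_URL = "http://192.168.49.2:30329/v2"  # assumed: $(minikube ip) + NodePort

resp = requests.get(
    f"{ORION_URL}/entities",
    params={"limit": 10, "options": "keyValues"},  # flattened attribute values
)
resp.raise_for_status()

for entity in resp.json():
    print(entity["id"], entity["type"])
```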
Purpose: FIWARE data persistence component for storing historical context information.
Features:
- Subscribes to Orion Context Broker changes
- Stores historical data in MongoDB
- Enables time-series analysis
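Draco receives data through a standard NGSI-v2 subscription registered in Orion. The actual subscription scripts live in `prediction-web/entities/`; the sketch below only illustrates the general shape of such a subscription — the entity type and Draco's notification port are assumptions, not taken from those scripts:

```python
# Illustrative NGSI-v2 subscription so Orion notifies Draco of changes.
import requests

ORION = "http://orion:1026/v2"

subscription = {
    "description": "Notify Draco of prediction updates (illustrative)",
    "subject": {
        "entities": [{"idPattern": ".*", "type": "PredictionTicket"}]  # assumed type
    },
    "notification": {
        "http": {"url": "http://draco:5050/v2/notify"}  # assumed Draco listener
    },
}

resp = requests.post(f"{ORION}/subscriptions", json=subscription)
print(resp.status_code)  # 201 on success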
Purpose: Distributed database for storing parking data and predictions.
Configuration:
- Replica Set: `MainRepSet`
- Nodes: 3 (mongodb-0, mongodb-1, mongodb-2)
- Port: 27017
- Storage: Persistent volumes with StorageClass
Collections:
- `parking`: Historical parking occupancy data
- `sth_test`: Real-time predictions and requests
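A quick way to inspect these collections from Python with `pymongo` (already a dependency of the data sink job); the connection string is an assumption based on the StatefulSet's headless service:

```python
# Sketch: inspect the parking data stored by the sink job.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://mongodb-0.mongodb-svc:27017",  # assumed in-cluster DNS name
    replicaSet="MainRepSet",
)

parking = client["tfm"]["parking"]  # historical occupancy records
print(parking.count_documents({}))
for doc in parking.find().limit(3):
    print(doc)
```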
Purpose: Trains a Random Forest classification model to predict parking occupancy levels.
Features:
- Reads historical data from MongoDB
- Feature engineering: weekday, hour, month
- Random Forest with 200 trees
- 80/20 train-test split
- Saves model to persistent volume
Model Parameters:
- Algorithm: Random Forest Classifier
- Number of trees: 200
- Feature subset strategy: log2
- Input features: parking name, weekday, hour, month
- Output: Occupancy level (0-10 scale)
Purpose: Real-time streaming job that receives prediction requests and returns results.
Features:
- Listens for NGSI notifications from Orion (port 9001)
- Loads pre-trained model from persistent volume
- Processes prediction requests in real-time
- Sends results back to Orion Context Broker
Flow:
- Receives entity update from Orion
- Extracts features (name, weekday, hour)
- Applies ML pipeline and model
- Sends prediction back to Orion
- Web UI receives notification via subscription
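A framework-free Python sketch of that loop. The real job is a Scala Spark streaming application; the notification fields follow the payload shown in the API Endpoints section, and the `predict` stub stands in for real model inference:

```python
# Sketch: receive an NGSI notification, infer, and PATCH the result back.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

ORION = "http://orion:1026/v2"

def predict(name: str, weekday: int, hour: int) -> int:
    return 5  # stub; the real job applies the persisted Spark model

class NotifyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        entity = json.loads(body)["data"][0]
        level = predict(
            entity["name"]["value"],
            entity["weekday"]["value"],
            entity["time"]["value"],
        )
        # Writing the attribute back triggers the web UI's subscription
        requests.patch(
            f"{ORION}/entities/{entity['id']}/attrs",
            json={"predictionValue": {"value": level, "type": "Integer"}},
        )
        self.send_response(200)
        self.end_headers()

HTTPServer(("", 9001), NotifyHandler).serve_forever()
```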
Purpose: User interface for requesting and displaying parking occupancy predictions.
Technologies:
- Express.js server
- Socket.io for real-time communication
- Mongoose for MongoDB integration
- Bulma CSS framework
Features:
- Interactive parking selection
- Date and time picker
- Real-time prediction results
- Historical predictions view
- WebSocket-based updates
Ports:
- Internal: 3000
- NodePort: 30003
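The browser talks to this server over Socket.io; for testing outside the browser, a hypothetical Python client could emit the same events described in the API Endpoints section. This assumes the `python-socketio` package and a version compatible with the server's Socket.io release:

```python
# Hypothetical test client for the prediction WebSocket API.
import socketio

sio = socketio.Client()

@sio.on("messages")
def on_messages(msg):
    # msg["type"] is CONFIRMATION, PREDICTION, or ERROR
    print(msg["type"], msg.get("payload"))

sio.connect("http://192.168.49.2:30003")  # assumed: $(minikube ip) + NodePort above
sio.emit("predict", {
    "name": "Salitre",
    "year": 2026, "month": 2, "day": 21,
    "weekday": 5, "time": 14,
    "predictionId": "unique-id",
})
sio.wait()
```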
Purpose: Kubernetes CronJob that fetches real parking data from Málaga OpenData API.
Configuration:
- Source: `https://datosabiertos.malaga.eu/recursos/aparcamientos/ocupappublicosmun/ocupappublicosmunfiware.json`
- Destination: MongoDB collection `parking`
- Adds timestamp metadata to each record
Schedule: Can be configured as a CronJob for periodic updates.
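A condensed sketch of what `data-sink-job/update-db.py` does with `pycurl` and `pymongo`. The timestamp field name, response structure, and connection string are assumptions rather than excerpts from the script:

```python
# Sketch: fetch the OpenData feed and persist it with a timestamp.
import json
from datetime import datetime, timezone
from io import BytesIO

import pycurl
from pymongo import MongoClient

URL = ("https://datosabiertos.malaga.eu/recursos/aparcamientos/"
       "ocupappublicosmun/ocupappublicosmunfiware.json")

buf = BytesIO()
curl = pycurl.Curl()
curl.setopt(curl.URL, URL)
curl.setopt(curl.WRITEDATA, buf)
curl.perform()
curl.close()

records = json.loads(buf.getvalue())  # assumed: a JSON array of facilities
fetched_at = datetime.now(timezone.utc)
for record in records:
    record["fetchedAt"] = fetched_at  # assumed timestamp field name

client = MongoClient("mongodb://mongodb-0.mongodb-svc:27017")  # assumed DNS
client["tfm"]["parking"].insert_many(records)
```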
```
┌─────────────────┐
│ Data Sink Job │
│ (Python) │
└────────┬────────┘
│
│ Fetches parking data
▼
┌──────────────────────┐
│ Málaga OpenData │
│ Public API │
└──────────┬───────────┘
│
│ Stores JSON data
▼
┌─────────────┐
│ MongoDB │
│ (parking) │
└──────┬──────┘
│
│ Reads historical data
▼
┌─────────────┐
│ Spark Train │
│ Job │
└──────┬──────┘
│
│ Saves model
▼
┌─────────────┐
│ Persistent │
│ Volume │
└─────────────┘
```
```
┌────────────┐
│ User Browser│
└──────┬─────┘
│ Enters prediction request
▼
┌─────────────────┐
│ Web UI │
│ (Socket.io) │
└─────┬───────────┘
│
│ PATCH request
▼
┌────────────────┐
│ Orion Context │◄────┐
│ Broker │ │
└─────┬──────────┘ │
│ │
│ Notification │ PATCH prediction result
▼ │
┌────────────────┐ │
│ Spark Predict │─────┘
│ (Streaming) │
└────────────────┘
│
│ Model inference
▼
┌────────────────┐
│ ML Model │
│ (Random Forest)│
└────────────────┘
│
│ Prediction sent to Web UI
▼
┌────────────────┐
│ User receives │
│ prediction │
└────────────────┘
```
```
┌────────────────┐
│ Orion Context │
│ Broker │
└─────┬──────────┘
│
│ Subscription
▼
┌────────────────┐
│ Draco │
└─────┬──────────┘
│
│ Stores historical data
▼
┌────────────────┐
│ MongoDB │
│ (sth_test) │
└────────────────┘
```
This project implements the FAIR Data Principles (Findable, Accessible, Interoperable, and Reusable) to ensure effective data management and maximize the value of the parking occupancy digital twin.
Data and metadata are easy to find for both humans and computers.
✅ Implementation:
- **Standardized Entity Identification**
  - Each parking facility has a unique identifier in the FIWARE Orion Context Broker
  - Entities follow FIWARE Smart Data Models conventions
  - Consistent naming scheme across all parking facilities
- **Searchable APIs**
  - RESTful API endpoints for querying entities: `GET /v2/entities`
  - Filter capabilities by entity type, attributes, and metadata
  - MongoDB indexes for efficient historical data retrieval
- **Metadata Documentation**
  - Entity attributes include type information and metadata
  - Timestamps (`dateObserved`, `dateCreated`, `dateModified`) for all records
  - Comprehensive API documentation in this README
- **Data Catalog**
  - All parking facilities catalogued with name, location, and capacity
  - MongoDB collections organized by data type and purpose
  - Version-controlled deployment configurations
Data can be accessed through standardized protocols and remains available over time.
✅ Implementation:
- **Open Standards & Protocols**
  - NGSI-v2: Standard context information management protocol
  - HTTP/REST: Universal web protocols for all API interactions
  - WebSocket (Socket.io): Real-time bidirectional communication
  - MongoDB Wire Protocol: Standard database access
- **Public Data Source**
  - Data sourced from Málaga City Council's Open Data portal
  - Publicly accessible endpoint: `datosabiertos.malaga.eu`
  - No authentication required for public endpoints
- **Persistent Storage**
  - Historical data stored in a MongoDB replica set (3 nodes)
  - Kubernetes Persistent Volumes ensure data durability
  - Daily backups possible through MongoDB snapshot mechanisms
- **Service Availability**
  - High availability through Kubernetes orchestration
  - MongoDB replica set provides automatic failover
  - Load balancing and service discovery via Kubernetes services
- **Multiple Access Patterns**
  - Web UI for end users
  - RESTful APIs for programmatic access
  - Direct MongoDB access for analytics
  - Socket.io for real-time updates
Data can be integrated with other data and work with different applications.
✅ Implementation:
- **Standard Data Formats**
  - JSON: Universal data interchange format for all APIs
  - NGSI-v2: FIWARE standard for context information
  - BSON: MongoDB's native, JSON-compatible format
  - Parquet: Considered for Spark data processing
- **FIWARE Ecosystem Integration**
  - Compatible with all FIWARE Generic Enablers
  - Orion Context Broker as the central integration point
  - Draco for seamless data persistence
  - Can integrate with FIWARE IoT Agents, CEP, and Big Data components
- **Standardized Vocabulary**
  - Follows FIWARE Smart Data Models
  - Attributes use common semantic conventions: `name`, `location`, `capacity`, `availableSpots`, `totalSpots`
  - Temporal attributes: `dateObserved`, `dateCreated`, `dateModified`
  - Occupancy levels normalized to a 0-10 scale
- **Platform-Independent Architecture**
  - Containerized components run on any Kubernetes cluster
  - Docker images portable across environments
  - Cloud-agnostic design (works on AWS, Azure, GCP, or on-premise)
- **Open APIs**
  - RESTful endpoints follow OpenAPI/Swagger conventions
  - WebSocket protocol for real-time communication
  - No proprietary protocols or vendor lock-in
- **Data Exchange Capabilities**
  - NGSI subscriptions for event-driven integration
  - HTTP callbacks for asynchronous notifications
  - Batch data export via MongoDB queries
  - Real-time streaming through Socket.io
Data and models are well-described and can be replicated and combined in different settings.
✅ Implementation:
- **Comprehensive Documentation**
  - Detailed README with architecture diagrams
  - Inline code comments in all components
  - API endpoint documentation with examples
  - Step-by-step deployment instructions
- **Automation & Reproducibility**
  - One-command deployment script: `./create_cluster.sh`
  - Infrastructure as Code (Kubernetes manifests)
  - Dockerfiles for reproducible container builds
  - Version-pinned dependencies in all package managers
- **Open Source Components**
  - FIWARE Orion (Apache 2.0)
  - Apache Spark (Apache 2.0)
  - MongoDB (Server Side Public License)
  - Node.js ecosystem (MIT/Apache licenses)
- **Modular Architecture**
  - Loosely coupled microservices
  - Components can be replaced or upgraded independently
  - Clear separation of concerns:
    - Data ingestion (Python)
    - Context management (Orion)
    - Persistence (MongoDB, Draco)
    - ML processing (Spark)
    - Presentation (Node.js)
- **Model Reusability**
  - ML model saved in Spark's native format
  - Feature engineering pipeline persisted separately
  - Model can be exported for use in other frameworks
  - Training code documented and parameterized
- **Extensibility**
  - Add new parking facilities by updating entity configurations
  - Extend features by modifying the Spark ML pipeline
  - Integrate additional data sources through Orion subscriptions
  - Plugin architecture for new visualizations
- **Clear Licensing**
  - Project licensed for academic and research use
  - Third-party dependencies properly attributed
  - Citation information provided at the top of this README
- **Version Control**
  - All code version-controlled in Git
  - Docker images tagged with versions
  - Component versions explicitly specified in manifests
By implementing FAIR principles, this digital twin system:
- Enables collaboration across institutions and research groups
- Facilitates reproducibility of experiments and results
- Promotes innovation by allowing others to build upon this work
- Ensures long-term value of the data and infrastructure
- Supports open science and transparent research practices
- Allows integration with broader smart city initiatives
Potential improvements to strengthen FAIR compliance:
- Persistent Identifiers: Add DOIs for datasets and model versions
- Rich Metadata: Implement schema.org or DCAT metadata
- Data Provenance: Record complete lineage of predictions
- FAIR Metrics: Regular assessment using FAIR maturity indicators
- Data Catalog: Deploy CKAN or similar catalog software
- Semantic Web: Add RDF/Linked Data capabilities
- License Headers: Add SPDX license identifiers to all files
Algorithm: Random Forest Classifier
Why Random Forest?
- Handles non-linear relationships
- Robust to outliers
- Provides feature importance
- Good accuracy for categorical predictions
- No need for feature scaling
| Feature | Type | Description | Example |
|---------|------|-------------|---------|
| `name` | Categorical | Parking facility name | "Salitre" |
| `weekday` | Categorical | Day of week (1-7) | 3 (Tuesday) |
| `hour` | Numerical | Hour of day (0-23) | 14 |
| `month` | Numerical | Month (1-12) | 6 (June) |
- String Indexing: Converts categorical features to numerical indices
- One-Hot Encoding: Converts indices to binary vectors
- Vector Assembly: Combines all features into a single feature vector
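As a sketch of those three stages plus the classifier settings described above, here is a PySpark rendering. The actual job is written in Scala (`Train.scala`); the column names, label column, and MongoDB read options are assumptions, and the sketch saves one combined pipeline for brevity while the project persists the feature pipeline and model separately:

```python
# Sketch of the training pipeline (the real job is Train.scala).
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, OneHotEncoder, VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

spark = SparkSession.builder.appName("parking-train-sketch").getOrCreate()
# Assumes spark.mongodb.input.uri is configured for the `parking` collection;
# the format short name depends on the mongo-spark-connector version.
df = spark.read.format("mongo").load()

# 1. String indexing: categorical features -> numeric indices
name_idx = StringIndexer(inputCol="name", outputCol="nameIndex")
day_idx = StringIndexer(inputCol="weekday", outputCol="weekdayIndex")

# 2. One-hot encoding: indices -> binary vectors
encoder = OneHotEncoder(
    inputCols=["nameIndex", "weekdayIndex"],
    outputCols=["nameVec", "weekdayVec"],
)

# 3. Vector assembly: combine everything into a single feature vector
assembler = VectorAssembler(
    inputCols=["nameVec", "weekdayVec", "hour", "month"],
    outputCol="features",
)

rf = RandomForestClassifier(
    labelCol="occupancyLevel",  # assumed name for the 0-10 target below
    featuresCol="features",
    numTrees=200,
    featureSubsetStrategy="log2",
)

pipeline = Pipeline(stages=[name_idx, day_idx, encoder, assembler, rf])
train, test = df.randomSplit([0.8, 0.2], seed=42)  # 80/20 split
model = pipeline.fit(train)
model.write().overwrite().save("/opt/spark/work-dir/models/model")
```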
Occupancy Level: Discretized onto a 0-10 scale.

Calculation:

```
available_ratio = availableSpots / totalSpots
occupation_percentage = (1 - available_ratio) * 100
occupation_level = round(occupation_percentage / 10)
```
- 0: 0-10% occupied (almost empty)
- 5: 50-60% occupied (half full)
- 10: 90-100% occupied (nearly full)
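The same discretization in plain Python, as a sanity check (the function name is illustrative):

```python
def occupancy_level(available_spots: int, total_spots: int) -> int:
    """Map raw spot counts to the 0-10 occupancy level used as the label."""
    available_ratio = available_spots / total_spots
    occupation_percentage = (1 - available_ratio) * 100
    return round(occupation_percentage / 10)

# e.g. 87 free of 435 total spots -> 80% occupied -> level 8
assert occupancy_level(87, 435) == 8
```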
The model is evaluated using:
- Metric: Multiclass Classification Accuracy
- Train-Test Split: 80/20
- Validation: Hold-out validation
Typical accuracy: ~75-85% (varies based on training data volume)
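Continuing the PySpark sketch from the Feature Pipeline section (with `model` and `test` as defined there, and the same assumed label column), the hold-out evaluation is a few lines:

```python
# Evaluate the trained pipeline on the 20% hold-out set.
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

predictions = model.transform(test)
accuracy = MulticlassClassificationEvaluator(
    labelCol="occupancyLevel",
    predictionCol="prediction",
    metricName="accuracy",
).evaluate(predictions)
print(f"Hold-out accuracy: {accuracy:.2%}")
```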
Models are saved to a Kubernetes Persistent Volume:
- Location: `/opt/spark/work-dir/models/`
- Format: Spark ML native format
- Components:
  - `model/`: Trained Random Forest model
  - `pipeline/`: Feature engineering pipeline
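Loading the persisted artifacts back is straightforward; a PySpark sketch assuming the directory layout above (the project's jobs do the equivalent in Scala):

```python
from pyspark.ml import PipelineModel
from pyspark.ml.classification import RandomForestClassificationModel

# Paths per the list above; exact layout is an assumption.
pipeline = PipelineModel.load("/opt/spark/work-dir/models/pipeline")
model = RandomForestClassificationModel.load("/opt/spark/work-dir/models/model")
```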
Receives notifications from Orion Context Broker with prediction results.
Request Body:
```json
{
"data": [
{
"socketId": {"value": "socket-123"},
"predictionId": {"value": "pred-456"},
"predictionValue": {"value": 7},
"name": {"value": "Salitre"},
"weekday": {"value": 3},
"time": {"value": 14}
}
]
}
```

Response: `200 OK`
Displays historical predictions.
- `GET /`: Main prediction interface
- `GET /predictions.html`: Historical predictions
Base URL: `http://<minikube-ip>:30329/v2`

List all entities:

```bash
curl http://<minikube-ip>:30329/v2/entities
```

Retrieve a single entity:

```bash
curl http://<minikube-ip>:30329/v2/entities/<entity-id>
```

Update entity attributes:

```bash
curl -X PATCH http://<minikube-ip>:30329/v2/entities/<entity-id>/attrs \
  -H "Content-Type: application/json" \
  -d '{
    "name": {"value": "Salitre", "type": "String"},
    "weekday": {"value": 3, "type": "Integer"},
    "time": {"value": 14, "type": "Integer"}
  }'
```

Event: `predict`
Payload:
```js
{
name: "Salitre",
year: 2026,
month: 2,
day: 21,
weekday: 5,
time: 14,
predictionId: "unique-id"
}
```

Event: `messages`
Types:
- CONFIRMATION
```js
{
type: "CONFIRMATION",
payload: {
msg: "Your request is being processed"
}
}
```

- PREDICTION
```js
{
type: "PREDICTION",
payload: {
socketId: "socket-123",
name: "Salitre",
weekday: 3,
time: 14,
predictionId: "pred-456",
predictionValue: 7
}
}
```

- ERROR
```js
{
type: "ERROR",
payload: {
msg: "There has been a problem with your request"
}
}
```

Create or modify the `.env` file:

```
# Orion Context Broker
ORION_PORT=1026
ORION_VERSION=2.3.0
# MongoDB
MONGO_DB_PORT=27017
MONGO_DB_VERSION=3.6
# Web Client
WEB_CLIENT_PORT=3000
CONTEXT_BROKER=http://orion:1026/v2
NGSI_VERSION=ngsi-v2
```

MongoDB StatefulSet:

- `replicas`: 3
- `storageClass`: mongodb-sc
- `volumeClaimTemplate`: 1Gi per replica

Orion deployment:

- `replicas`: 1
- Args:
  - `-dbhost`: MongoDB replica set connection string
  - `-rplSet`: MainRepSet
  - `-logLevel`: DEBUG
  - `-httpTimeout`: 15000

Resource Requests (adjust based on your cluster):
```
spark.executor.instances: 1
spark.executor.memory: 2g
spark.driver.memory: 2g
```

Scale MongoDB:

```bash
kubectl scale statefulset mongodb --replicas=5 -n tfm
```

Scale Orion:

```bash
kubectl scale deployment orion --replicas=3 -n tfm
```

Scale the web application:

```bash
kubectl scale deployment web-deployment --replicas=2 -n tfm
```
- **Access the web interface:**

  ```bash
  minikube service web-service -n tfm
  ```

- **Select parameters:**
  - Choose a parking facility from the dropdown
  - Select a date using the calendar
  - Pick an hour slot (0-23)
- **Submit the request:**
  - Click the submit button
  - Wait for the confirmation message
  - The prediction result appears within seconds
- **Interpret the results:**
  - Prediction value: 0-10 scale
  - 0-3: Low occupancy (easy to find parking)
  - 4-7: Medium occupancy (some spots available)
  - 8-10: High occupancy (difficult to find parking)
- **Ensure sufficient data in MongoDB:**

  ```bash
  kubectl exec -it mongodb-0 -n tfm -- mongo
  > use tfm
  > db.parking.count()
  ```

- **Submit the training job:**

  ```bash
  cd spark-job
  ./spark-submit-train.sh
  ```

- **Monitor training:**

  ```bash
  kubectl logs -f -n tfm -l job=training
  ```

- **Verify the model was saved to the persistent volume:**

  ```bash
  kubectl exec -it <spark-driver-pod> -n tfm -- ls /opt/spark/work-dir/models/
  ```

- **Restart the prediction job to use the new model:**

  ```bash
  kubectl delete pod -n tfm -l job=prediction
  ./spark-submit-predict.sh
  ```
```bash
# Access MongoDB
kubectl exec -it mongodb-0 -n tfm -- mongo
# Query parking data
> use tfm
> db.parking.find().limit(10).pretty()
# Count documents
> db.parking.count()
# Query predictions
> use sth_test
> db.Pred.find().limit(10).pretty()
```

```bash
# Orion logs
kubectl logs -f deployment/orion -n tfm
# Web application logs
kubectl logs -f deployment/web-deployment -n tfm
# Spark prediction logs
kubectl logs -f -n tfm -l job=prediction
# Draco logs
kubectl logs -f deployment/draco -n tfm
```

Symptoms: Pods stuck in `Pending` or `CrashLoopBackOff`
Solutions:
```bash
# Check PVC status
kubectl get pvc -n tfm
# Check storage class
kubectl get sc
# Ensure sufficient disk space
minikube ssh "df -h"
# Delete and recreate
kubectl delete statefulset mongodb -n tfm
kubectl delete pvc -n tfm -l app=mongodb
kubectl apply -f kubernetes/mongodb-sc.yaml
kubectl apply -f kubernetes/mongodb-statefulSet.yaml
```

Symptoms: Orion logs show connection errors
Solutions:
```bash
# Verify MongoDB replica set status
kubectl exec -it mongodb-0 -n tfm -- mongo --eval "rs.status()"
# Check MongoDB service
kubectl get svc mongodb-svc -n tfm
# Verify replica set configuration
sh statefulset/mongodb-rsconfig.sh mongodb-0
```

Symptoms: Errors during `spark-submit`
Solutions:
```bash
# Verify service account
kubectl get serviceaccount spark -n tfm
# Check cluster role binding
kubectl get clusterrolebinding spark-role
# Ensure Spark image is accessible
kubectl describe pod <spark-driver-pod> -n tfm
# Check Spark PVC
kubectl get pvc spark-pvc -n tfm
```

Symptoms: Web UI shows "processing" but no result
Solutions:
```bash
# Check Spark prediction job is running
kubectl get pods -n tfm -l job=prediction
# Verify Orion subscriptions
curl http://$(minikube ip):30329/v2/subscriptions
# Check logs
kubectl logs -f deployment/web-deployment -n tfm
kubectl logs -f -n tfm -l job=prediction
# Verify network connectivity
kubectl exec -it <web-pod> -n tfm -- curl http://orion:1026/version
```

Symptoms: No data in the MongoDB `parking` collection
Solutions:
```bash
# Check job status
kubectl get jobs -n tfm
# View job logs
kubectl logs -n tfm job/sink-job
# Verify network access to OpenData API
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -n tfm -- \
curl https://datosabiertos.malaga.eu/recursos/aparcamientos/ocupappublicosmun/ocupappublicosmunfiware.json
# Manually run job
kubectl delete job sink-job -n tfm
kubectl apply -f kubernetes/Jobs/sink-job.yaml
```

Symptoms: Pods OOMKilled
Solutions:
```bash
# Increase Minikube memory
minikube stop
minikube start --cpus=4 --memory=16384
# Adjust Spark memory settings in spark-submit scripts
# Edit spark-submit-*.sh files:
--conf "spark.executor.memory=4g"
--conf "spark.driver.memory=4g"Symptoms: minikube service command hangs or fails
Solutions:
```bash
# Check Minikube status
minikube status
# Verify tunnel
minikube tunnel
# Use port-forward as alternative
kubectl port-forward -n tfm deployment/web-deployment 3000:3000
# Check NodePort services
kubectl get svc -n tfm
```

If all else fails, completely reset the deployment:
```bash
# Delete namespace
kubectl delete namespace tfm
# Clean up persistent volumes
kubectl delete pv --all
# Restart Minikube
minikube stop
minikube delete
minikube start --cpus=4 --memory=8192
# Redeploy
./create_cluster.sh
```

Contributions are welcome! Here are some ways you can contribute:
- **Model Enhancement:**
  - Add more features (weather, events, holidays)
  - Try different algorithms (XGBoost, neural networks)
  - Implement hyperparameter tuning
  - Add model versioning
- **Data Pipeline:**
  - Implement real-time data ingestion
  - Add data validation and cleaning
  - Create a data quality dashboard
  - Implement incremental model training
- **Infrastructure:**
  - Add Helm charts
  - Implement GitOps with ArgoCD
  - Add monitoring (Prometheus, Grafana)
  - Implement a service mesh (Istio)
- **Application:**
  - Add user authentication
  - Improve UI/UX
  - Add mobile support
  - Implement a caching layer (Redis)
- **Testing:**
  - Unit tests for all components
  - Integration tests
  - Load testing
  - CI/CD pipeline
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is part of a Master's Thesis at Universidad Politécnica de Madrid (UPM).
- Málaga City Council for providing open data
- FIWARE Foundation for the open-source components
- Apache Spark community
- Universidad Politécnica de Madrid