**hal/ubuntu/README.md** (99 changes: 94 additions & 5 deletions)
# Ubuntu Cloud-Init for Cube AI

This directory contains cloud-init configurations and QEMU launch scripts for running Cube AI in Ubuntu-based Confidential VMs (CVMs).

## Overview

Ubuntu 24.04 (Noble) ships with built-in guest support for both Intel TDX and AMD SEV-SNP confidential computing; no additional kernel modules or packages need to be installed, as guest support is enabled by default in the stock kernel.
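One quick way to confirm this from inside a running guest is to check the kernel build config and the boot log. This is a diagnostic sketch, not part of `qemu.sh`; config file paths can vary by image:

```shell
# Both guest options are built in on the stock Noble kernel
grep -E 'CONFIG_INTEL_TDX_GUEST|CONFIG_SEV_GUEST' \
  "/boot/config-$(uname -r)" 2>/dev/null || true

# The kernel also logs the detected technology at boot
if command -v dmesg >/dev/null; then
  dmesg 2>/dev/null | grep -iE 'tdx|sev-snp' || true
fi
```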

## Files

- `qemu.sh` - Main QEMU launch script with TDX/SNP support
- `user-data-tdx.yaml` - Cloud-init configuration for Intel TDX VMs
- `user-data-snp.yaml` - Cloud-init configuration for AMD SEV-SNP VMs
- `user-data-regular.yaml` - Cloud-init configuration for regular (non-CVM) VMs

## Usage

### Auto-detect CVM Support

```bash
sudo ./qemu.sh start
```

This will automatically detect available CVM support (TDX or SNP) and launch the VM with the appropriate configuration.

### Force Specific CVM Mode

```bash
# Intel TDX
sudo ./qemu.sh start_tdx

# AMD SEV-SNP
sudo ./qemu.sh start_snp

# Regular KVM (no CVM)
sudo ./qemu.sh start_regular
```

### Environment Variables

```bash
# Force specific CVM mode
ENABLE_CVM=tdx sudo ./qemu.sh start
ENABLE_CVM=snp sudo ./qemu.sh start
ENABLE_CVM=none sudo ./qemu.sh start

# Customize VM resources
RAM=32768M CPU=16 sudo ./qemu.sh start
```

### Detect Available Support

```bash
sudo ./qemu.sh detect
```

## CVM Support Details

### Intel TDX (Trust Domain Extensions)

- Ubuntu 24.04 kernel has `CONFIG_INTEL_TDX_GUEST=y` enabled by default
- Guest attestation available via `/sys/firmware/tdx` or configfs
- Quote generation via vsock (CID=2, port=4050)
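On kernels that expose the configfs-tsm interface, a raw TD report can be fetched from inside the guest. A minimal sketch, guarded so it is a no-op where the interface is absent (requires root; availability depends on the kernel build):

```shell
# Request a TDX report through configfs-tsm
if [ -d /sys/kernel/config/tsm/report ]; then
  r=/sys/kernel/config/tsm/report/r0
  mkdir -p "$r"
  # 64 bytes of caller-chosen REPORTDATA bound into the report
  head -c 64 /dev/urandom > "$r/inblob"
  cat "$r/outblob" > /tmp/td_report.bin
  echo "report written to /tmp/td_report.bin"
fi
```

Turning the report into a full quote goes through the vsock service noted above.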

### AMD SEV-SNP (Secure Nested Paging)

- Ubuntu 24.04 kernel has `CONFIG_SEV_GUEST=y` enabled by default
- Guest attestation available via `/dev/sev-guest`
- Modules: `sev-guest`, `ccp` (loaded automatically)
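A quick in-guest check for the SNP pieces listed above, written as a sketch that is harmless to run on a non-SNP machine:

```shell
# Confirm the SNP guest device and driver modules are present
if [ -e /dev/sev-guest ]; then
  echo "SEV-SNP guest device available"
  lsmod | grep -E 'sev_guest|ccp' || true
else
  echo "no /dev/sev-guest; not running in an SNP guest"
fi
```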

## After First Boot

For local development, update the following in `docker/.env`:

```bash
UV_CUBE_NEXTAUTH_URL=http://<vm-ip-address>:${UI_PORT}
```

Default SSH access:
- **Port**: 6190 (forwarded from guest port 22)
- **User**: ultraviolet
- **Password**: password

## Host Requirements

### For TDX VMs
- Intel CPU with TDX support (4th Gen Xeon Scalable or newer)
- TDX-enabled BIOS/firmware
- Host kernel with TDX module initialized

### For SNP VMs
- AMD EPYC CPU with SEV-SNP support (Milan or newer)
- SEV-SNP enabled in BIOS
- Host kernel with SEV-SNP support
- `/dev/sev` device available

### Common Requirements
- QEMU with confidential computing support
- OVMF firmware (for UEFI boot)
- KVM enabled
**hal/ubuntu/cloud/README.md** (new file, 203 additions)
# Cloud-Init Configuration for Cube AI

This directory contains cloud-init configuration files for deploying Cube AI on Ubuntu-based confidential virtual machines (CVMs) on cloud providers.

## Cloud-Init Files

| File | Backend | Description |
|------|---------|-------------|
| `cube-agent-config.yml` | Ollama | Default configuration with Ollama for easy model management |
| `cube-agent-vllm-config.yml` | vLLM | High-performance configuration with vLLM for production workloads |

## Choosing a Backend

### Ollama (Recommended for Getting Started)

Use `cube-agent-config.yml` for:

- Quick setup and experimentation
- Running multiple models
- CPU or small GPU deployments
- Built-in quantization support (Q4_0, Q4_1, Q8_0)
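Quantized variants are selected through the model tag at pull time. The tag below is illustrative; check the Ollama library for the tags a given model actually publishes:

```shell
# Pull an explicitly 4-bit-quantized build instead of the default tag
# (guarded so the sketch is a no-op where ollama is not installed)
if command -v ollama >/dev/null; then
  ollama pull llama2:7b-chat-q4_0
fi
```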

### vLLM (Recommended for Production)

Use `cube-agent-vllm-config.yml` for:

- Maximum inference throughput
- Large-scale production deployments
- Multi-GPU setups with tensor parallelism
- Continuous batching and PagedAttention
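As an illustration of what the vLLM configuration drives, a multi-GPU launch might look like the sketch below. The actual service command is defined in `cube-agent-vllm-config.yml`; the flags shown are standard vLLM CLI options, and the guard makes the sketch a no-op on machines without vLLM:

```shell
# Serve a HuggingFace model across 2 GPUs with tensor parallelism;
# continuous batching and PagedAttention are vLLM defaults
if command -v vllm >/dev/null; then
  vllm serve meta-llama/Llama-2-7b-hf \
    --tensor-parallel-size 2 \
    --host 0.0.0.0 \
    --port 8000
fi
```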

## Deployment

### Google Cloud Platform (GCP)

```bash
# Clone infrastructure templates
git clone https://github.com/ultravioletrs/cocos-infra.git
cd cocos-infra

# Configure terraform.tfvars
cat >> terraform.tfvars << 'EOF'
vm_name = "cube-ai-vm"
project_id = "your-gcp-project-id"
region = "us-central1"
zone = "us-central1-a"
min_cpu_platform = "AMD Milan"
confidential_instance_type = "SEV_SNP"
machine_type = "n2d-standard-4"
cloud_init_config = "/path/to/cube/hal/ubuntu/cloud/cube-agent-config.yml"
EOF

# Deploy
cd gcp
tofu init && tofu apply -var-file="../terraform.tfvars"
```

### Microsoft Azure

```bash
# Configure terraform.tfvars
cat >> terraform.tfvars << 'EOF'
vm_name = "cube-ai-vm"
resource_group_name = "cube-ai-rg"
location = "westus"
subscription_id = "your-subscription-id"
machine_type = "Standard_DC4ads_v5"
cloud_init_config = "/path/to/cube/hal/ubuntu/cloud/cube-agent-config.yml"
EOF

# Deploy
cd azure
az login
tofu init && tofu apply -var-file="../terraform.tfvars"
```

## Configuration

### Environment Variables

Set these environment variables before deployment to customize the configuration:

| Variable | Default | Description |
|----------|---------|-------------|
| `CUBE_MODELS` | `tinyllama:1.1b` | Comma-separated Ollama models to pull |
| `CUBE_VLLM_MODEL` | `meta-llama/Llama-2-7b-hf` | HuggingFace model ID for vLLM |
| `CUBE_VLLM_GPU_COUNT` | `1` | Number of GPUs for tensor parallelism |
| `CUBE_AGENT_VERSION` | `latest` | Cube Agent release version |
| `HF_TOKEN` | - | HuggingFace token for gated models |
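For example, to customize the Ollama model set and GPU count before deployment (the values here are illustrative, not defaults):

```shell
# Override defaults from the table above; export before deploying
export CUBE_MODELS="llama2:7b,mistral:latest"
export CUBE_VLLM_GPU_COUNT=2
```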

### TLS/mTLS Certificates

For production deployments, replace the self-signed certificate generation with your own certificates:

1. Edit the cloud-init file
2. Uncomment the certificate file sections
3. Replace placeholder content with your certificates
4. Update `/etc/cube/agent.env` to enable TLS
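The resulting `/etc/cube/agent.env` entries might look like the sketch below. The variable names are illustrative; use the keys that the shipped cloud-init file actually writes:

```shell
# Illustrative TLS settings; confirm key names against the
# agent.env generated by the cloud-init file
AGENT_GRPC_SERVER_CERT=/etc/cube/certs/server.crt
AGENT_GRPC_SERVER_KEY=/etc/cube/certs/server.key
AGENT_GRPC_CLIENT_CA_FILE=/etc/cube/certs/ca.crt
```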

### Custom Models (Ollama)

Pull additional models by setting `CUBE_MODELS`:

```bash
export CUBE_MODELS="llama2:7b,mistral:latest,codellama:13b"
```

Or create a custom Modelfile after deployment:

```bash
ssh cubeadmin@<vm-ip>
cat > /tmp/Modelfile << 'EOF'
FROM llama2:7b
PARAMETER temperature 0.7
SYSTEM You are a helpful AI assistant.
EOF
sudo -u ollama /usr/local/bin/ollama create custom-assistant -f /tmp/Modelfile
```

## Verification

After deployment, verify the services are running:

```bash
# Check cloud-init completion
ssh cubeadmin@<vm-ip>
cloud-init status --wait

# Check service status
sudo systemctl status cube-agent
sudo systemctl status ollama # or vllm

# Test health endpoint
curl http://localhost:7001/health

# Test chat completion
curl http://<vm-ip>:7001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tinyllama:1.1b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

## VM Size Recommendations

### GCP

| Use Case | Machine Type | vCPUs | RAM |
|----------|--------------|-------|-----|
| Development | `n2d-standard-2` | 2 | 8GB |
| Production (Ollama) | `n2d-standard-4` | 4 | 16GB |
| Production (vLLM) | `n2d-standard-8` | 8 | 32GB |
| Production (vLLM + GPU) | `n1-standard-8` + T4 | 8 | 30GB |

### Azure

| Use Case | Machine Type | vCPUs | RAM |
|----------|--------------|-------|-----|
| Development | `Standard_DC2ads_v5` | 2 | 8GB |
| Production (Ollama) | `Standard_DC4ads_v5` | 4 | 16GB |
| Production (vLLM) | `Standard_DC8ads_v5` | 8 | 32GB |
| Production (vLLM + GPU) | `Standard_NC6s_v3` | 6 | 112GB |

## Troubleshooting

### Cloud-init not completing

```bash
# Check cloud-init logs
sudo cat /var/log/cloud-init-output.log
sudo cat /var/log/cloud-init.log
```

### Cube Agent not starting

```bash
# Check service logs
sudo journalctl -u cube-agent -f

# Verify configuration
cat /etc/cube/agent.env
```

### Ollama not responding

```bash
# Check service logs
sudo journalctl -u ollama -f

# Check if models are downloaded
sudo -u ollama /usr/local/bin/ollama list
```

### vLLM GPU issues

```bash
# Check NVIDIA driver
nvidia-smi

# Check vLLM logs
sudo journalctl -u vllm -f
```