Setup: AWS → · Azure → · GCP → · All docs →
Find $500–$20K/month of idle cloud waste in 60 seconds — no credentials needed:
# Try instantly (no install):
docker run --rm getcleancloud/cleancloud:latest demo
# Or install locally:
pipx install cleancloud
cleancloud demo

`scan` and `doctor` with Docker require credential mounts → Docker usage →
CleanCloud scans AWS, Azure, and GCP and names specific idle resources as review candidates — with cost per resource. Read-only. No agents. No SaaS.
cleancloud demo --category ai
3 review candidates found:
1. [AWS] Idle GPU EC2 Instance (GPU utilisation <5% over 7 days)
Risk : Critical
Confidence : High
Resource : aws.ec2.instance → i-0a1b2c3d4e5f67890
Region : us-east-1
Rule : aws.ec2.gpu.idle
Reason : GPU utilisation 1.2% for 7 days (p4d.24xlarge — ml-training-cluster-node-1)
Details:
- estimated_monthly_cost: ~$23,374/month
2. [Azure] Idle ML Compute Instance (31 days since last activity)
Risk : High
Confidence : High
Resource : azure.ml.compute_instance → ws-prod/compute/ds-workstation-nc24
Region : eastus
Rule : azure.ml.compute_instance.idle
Reason : No control-plane activity for 31 days while Running (Standard_NC24s_v3, GPU)
Details:
- estimated_monthly_cost: ~$2,190/month
3. [AWS] Idle RDS Instance (Zero connections for 21 days)
Risk : High
Confidence : High
Resource : aws.rds.instance → db-prod-analytics
Region : us-east-1
Rule : aws.rds.instance.idle
Reason : Zero connections for 21 days (db.r5.large, postgres 15.4)
Details:
- estimated_monthly_cost: ~$380/month
--- Scan Summary ---
Total review candidates: 3
By risk: critical: 1 high: 2
Minimum estimated waste: ~$25,944/month
Full 10-finding example: docs/example-outputs.md
- Korben 🇫🇷 — Major French tech publication
- Last Week in AWS #457 — Corey Quinn's weekly AWS newsletter
"Solid discovery tool that bubbles up potential savings. Easy to install and use!" — Reddit user
CleanCloud is a cloud hygiene scanner — reads your inventory, flags specific idle resources as review candidates, and estimates the cost of keeping them running.
- Catches expensive idle AI/ML waste: SageMaker, AML, Vertex AI — GPU-backed resources flagged as higher-risk review candidates ($500–$23K/month)
- Works across AWS, Azure, and GCP in one tool
- Runs entirely in your environment — no agents, no SaaS, no credentials stored
- 46 curated, high-signal detection rules designed to avoid false positives in IaC environments
- CI/CD-ready — enforcement exit codes + JSON/CSV/markdown output
- No deletes or modifications to cloud resources
- No write access to any cloud API
- No credentials stored, no telemetry sent
- No SaaS account or agents required
Fully read-only. Safe for production and regulated environments.
# Add your cloud provider and scan:
pipx install 'cleancloud[aws]' # or [azure], [gcp], [all]
cleancloud scan --provider aws --all-regions
cleancloud scan --provider azure
cleancloud scan --provider gcp --all-projects

Choose your path:
| I want to… | Start here |
|---|---|
| Scan AWS | AWS setup (IAM policy, regions, multi-account) → |
| Scan Azure | Azure setup (RBAC, subscriptions, Workload Identity) → |
| Scan GCP | GCP setup (IAM, projects, ADC) → |
| Run in CI/CD | CI/CD guide (GitHub Actions, GitLab, exit codes) → |
| Suppress findings / set thresholds | Policy config reference → |
| Tag filtering, exception patterns, rollout advice | Best practices → |
| Scan multiple AWS accounts | Multi-account setup → |
| Getting an error | Troubleshooting → |
Not sure if your credentials have the right permissions? Run cleancloud doctor --provider aws first.
Idle AI/ML infrastructure is the fastest-growing source of invisible cloud spend. Unlike compute or storage, these resources bill at full rate even with zero activity — GPU-backed endpoints don't scale to zero.
| Resource | Idle cost range |
|---|---|
| Bedrock Provisioned Throughput | $600 – $7,300+ / MU / month |
| SageMaker endpoint (GPU) | $500 – $23,000 / month |
| SageMaker Notebook Instance (GPU) | $500 – $23,000+ / month |
| SageMaker Studio Apps (KernelGateway/JupyterLab/CodeEditor) | $42 – $1,600+ / month |
| SageMaker Training Job (runaway/hung GPU job) | $670 – $2,360+ / day |
| Azure AML compute cluster (GPU) | $600 – $15,000 / month |
| Azure ML Compute Instance (GPU) | $600 – $15,000+ / month |
| Azure ML Online Endpoint (GPU-backed) | $200 – $2,600+ / month |
| Azure AI Search (Basic+) | $261 – $4,028+ / month |
| Azure OpenAI Provisioned Deployment (PTU) | $1,460+ / PTU / month |
| Vertex AI Online Prediction endpoint (GPU) | $449 – $23,000+ / month |
| Vertex AI Workbench instance (GPU) | $449 – $8,000+ / month |
| Cloud TPU node (v4/v5p) | $188 – $750+ / day |
| Vertex AI Feature Store (Bigtable-backed) | $197 – $591+ / month |
CleanCloud detects zero-invocation / zero-prediction endpoints, stale managed notebook and app activity, and long-running managed training jobs across all three clouds. Native cost tools show the bill — they do not name the specific resource to review.
cleancloud scan --provider aws --category ai # Bedrock PTUs + SageMaker endpoints + notebooks + Studio apps + training jobs + idle GPU EC2
cleancloud scan --provider azure --category ai # AML compute + ML instances + online endpoints + AI Search + OpenAI PTUs
cleancloud scan --provider gcp --category ai # Vertex AI endpoints + Workbench + training jobs + Cloud TPU + Feature Stores
cleancloud scan --provider aws --category all # hygiene + AI/ML together

No setup required beyond the base install — opt-in with --category ai. Works with multi-account and multi-project scans:
cleancloud scan --provider aws --org --all-regions --category all

AI/ML rules → · Full detection details →
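The idle-GPU check in the demo output above boils down to a threshold test over a metric window. The helper below is a sketch of that idea only: the function name, the metric shape, and the 5% / 7-day values mirror the demo finding, not CleanCloud's actual implementation.

```python
from datetime import datetime, timedelta

# Illustrative check mirroring "GPU utilisation <5% over 7 days".
# CleanCloud's real detection logic may differ; this only shows the shape.
def is_idle(datapoints, threshold_pct=5.0, min_days=7):
    """datapoints: list of (timestamp, avg_utilisation_pct) tuples."""
    if not datapoints:
        return False  # no data is not evidence of idleness
    window = max(ts for ts, _ in datapoints) - min(ts for ts, _ in datapoints)
    if window < timedelta(days=min_days):
        return False  # not enough history to judge
    return all(util < threshold_pct for _, util in datapoints)

# Example: a week of samples, all near zero utilisation
now = datetime(2026, 1, 8)
samples = [(now - timedelta(days=d), 1.2) for d in range(8)]
print(is_idle(samples))  # True
```

Note the conservative defaults: missing data and short windows both return False, which matches the README's stated bias toward avoiding false positives.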
- Platform and FinOps teams — run weekly hygiene scans across your AWS Org or Azure tenant, enforce waste thresholds, catch drift before it compounds
- Regulated industries — financial services, healthcare, and government teams that cannot send cloud account data to a SaaS vendor
- Mid-market engineering teams — too large to ignore cloud waste, too lean for enterprise FinOps platforms. Native cost tools show bills; CleanCloud shows what to review
- Cloud consultants and MSPs — run a read-only audit against a client account in minutes, export findings to markdown or JSON
- One-time audits — run in CloudShell, see findings in 60 seconds, no setup required
- Pre-review reports — export findings to markdown before a quarterly cost review or board meeting
Drop a cleancloud.yaml in your repo root. Every exception is a git-reviewable approval — version-controlled alongside your infrastructure.
# cleancloud.yaml
defaults:
  confidence: MEDIUM   # skip low-signal findings globally
  min_cost: 10         # skip findings below $10/month

exceptions:
  - rule_id: aws.ec2.instance.stopped
    resource_id: i-0abc1234567890def
    reason: "Bastion host — started on demand"
    expires_at: "2026-12-31"   # auto-expires — forces periodic review
  - rule_id: aws.rds.instance.idle
    resource_id: "db-test-*"   # glob — suppress all test databases
    reason: "Test databases are intentionally ephemeral"

thresholds:
  fail_on_confidence: HIGH   # exit 2 in CI if any HIGH confidence finding remains
  fail_on_cost: 500          # exit 2 if total estimated waste exceeds $500/month

Enforce in CI/CD:
cleancloud scan --provider aws --org --all-regions # picks up cleancloud.yaml automatically

Full policy config reference → · Best practices →
CleanCloud exits 0 by default — findings are reported, nothing blocked unless you ask.
# Weekly governance: fail if monthly waste crosses $500
cleancloud scan --provider aws --org --all-regions \
--output json --output-file findings.json \
--fail-on-cost 500
# Pre-deploy gate: block on any HIGH confidence waste
cleancloud scan --provider aws --region us-east-1 \
  --fail-on-confidence HIGH

| Exit code | Meaning |
|---|---|
| 0 | No policy violation (or no enforcement flags set) |
| 1 | Configuration error or unexpected failure |
| 2 | Policy violation — threshold breached |
| 3 | Missing credentials or insufficient permissions |
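A CI job can branch on these exit codes directly. The wrapper below is a sketch; the messages and the choice of when to fail the build are ours, not CleanCloud's.

```shell
#!/bin/sh
# Map CleanCloud exit codes (table above) to human-readable CI messages.
explain_exit() {
  case "$1" in
    0) echo "no policy violation" ;;
    1) echo "configuration error or unexpected failure" ;;
    2) echo "policy violation: threshold breached" ;;
    3) echo "missing credentials or insufficient permissions" ;;
    *) echo "unknown exit code: $1" ;;
  esac
}

# In CI, run the scan and branch on its exit code, e.g.:
#   cleancloud scan --provider aws --all-regions --fail-on-cost 500
#   code=$?; echo "cleancloud: $(explain_exit "$code")"
#   [ "$code" -eq 0 ] || exit 1
explain_exit 2
```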
Full CI/CD guide → · AWS → · Azure → · GCP →
| | AWS/Azure/GCP native cost tools | FinOps SaaS platforms | CleanCloud |
|---|---|---|---|
| Shows cost trends | ✅ | ✅ | — |
| Names specific resources flagged for review | ❌ | partial | ✅ |
| Deterministic cost estimate per resource | ❌ | ❌ | ✅ |
| Detects idle AI/ML waste (SageMaker, AML, Vertex AI — including GPU-backed endpoints) | ❌ | ❌ | ✅ |
| Policy-as-code (exceptions + thresholds in git) | ❌ | ❌ | ✅ |
| Git-reviewable exception approvals | ❌ | ❌ | ✅ |
| Read-only, no agents | ✅ | ❌ | ✅ |
| Runs in air-gapped / regulated environments | ❌ | ❌ | ✅ |
| No SaaS account or vendor access required | ❌ | ❌ | ✅ |
| Multi-account / multi-subscription / multi-project | ❌ | ✅ | ✅ |
| CI/CD and scheduled enforcement (exit codes) | ❌ | ❌ | ✅ |
Multi-Account Scanning (AWS)
Built for enterprises running AWS Organizations. Scan every account in parallel — findings aggregated into one report.
# Scan from a config file (commit .cleancloud/accounts.yaml to your repo)
cleancloud scan --provider aws --multi-account .cleancloud/accounts.yaml --all-regions
# Inline account IDs — no file needed
cleancloud scan --provider aws --accounts 111111111111,222222222222 --all-regions
# Auto-discover all accounts in your AWS Organization
cleancloud scan --provider aws --org --all-regions --concurrency 5

Permissions required:
| Role | Permissions |
|---|---|
| Hub account | 16 read-only permissions + sts:AssumeRole on spoke roles |
| Hub account (--org only) | Above + organizations:ListAccounts |
| Spoke accounts | 16 read-only permissions (same as single-account scan — no extra changes) |
.cleancloud/accounts.yaml — commit this to your repo:
role_name: CleanCloudReadOnlyRole
accounts:
  - id: "111111111111"
    name: production
  - id: "222222222222"
    name: staging

Spoke account trust policy — allows the hub to assume the role:
{
"Effect": "Allow",
"Principal": { "AWS": "arn:aws:iam::<HUB_ACCOUNT_ID>:root" },
"Action": "sts:AssumeRole"
}

How it works:
- Hub-and-spoke — CleanCloud assumes `CleanCloudReadOnlyRole` in each target account using STS. No persistent access, no stored credentials.
- Three discovery modes — `.cleancloud/accounts.yaml` for explicit control, `--accounts` for quick ad-hoc scans, `--org` for full AWS Organizations auto-discovery.
- Efficient region detection — active regions are discovered once on the hub account and reused across all spokes. Without this: N accounts × 160 API calls just for region probing. With it: 160 calls once.
- Parallel with isolation — each account runs in its own thread with its own session. One account failing (AccessDenied, timeout) never affects the others.
- Partial-success visibility — if 2 regions fail and 7 succeed within an account, the account is marked `partial` with the failed regions named.
- Live progress — `[3/50] done production (123456789012) — 47s, 12 findings` printed as each account completes.
- Per-account cost breakdown — JSON output includes estimated monthly waste per account, sortable and scriptable.
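That per-account breakdown is easy to post-process. The snippet below assumes a hypothetical findings.json shape (the `accounts` and `estimated_monthly_cost` field names are illustrative; check your real output against docs/example-outputs.md).

```python
import json

# Hypothetical findings.json shape; real CleanCloud fields may differ.
raw = """
{"accounts": [
  {"id": "111111111111", "name": "production", "estimated_monthly_cost": 23374},
  {"id": "222222222222", "name": "staging", "estimated_monthly_cost": 380}
]}
"""

data = json.loads(raw)
# Rank accounts by estimated monthly waste, most expensive first.
ranked = sorted(data["accounts"],
                key=lambda a: a["estimated_monthly_cost"], reverse=True)
for acct in ranked:
    print(f'{acct["name"]:<12} ${acct["estimated_monthly_cost"]:,}/month')
```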
Full setup guide (IAM policy, trust policy, IaC templates): AWS multi-account setup →
Multi-Subscription Scanning (Azure)
Built for enterprises running large Azure tenants. Scan every subscription in parallel with one identity — findings aggregated into one report with a per-subscription cost breakdown.
# Scan all subscriptions the service principal can access (default)
cleancloud scan --provider azure
# Auto-discover via Management Group
cleancloud scan --provider azure --management-group <MANAGEMENT_GROUP_ID>
# Explicit list
cleancloud scan --provider azure --subscription <SUB_1> --subscription <SUB_2>

Permissions required:
| Scope | Role |
|---|---|
| Each subscription | Reader (built-in) |
| Management Group (if using --management-group) | Reader + Microsoft.Management/managementGroups/read |
Assign Reader at the Management Group level and it inherits to all subscriptions underneath — no per-subscription role assignment needed:
az role assignment create \
--assignee <SERVICE_PRINCIPAL_CLIENT_ID> \
--role Reader \
  --scope /providers/Microsoft.Management/managementGroups/<MANAGEMENT_GROUP_ID>

How it works:
- Flat identity model — one service principal, Reader at Management Group level. No cross-subscription role assumption, no hub-and-spoke complexity.
- Three discovery modes — all accessible (default), `--management-group` for auto-discovery, `--subscription` for explicit control.
- Parallel with isolation — each subscription runs in its own thread. One subscription failing (permission denied, timeout) never affects the others.
- Graceful permission handling — rules that fail with 403 are reported as skipped (with the missing permission named), not as scan failures.
- Per-subscription cost breakdown — output shows estimated monthly waste per subscription so you can see exactly which subscription is dirty.
Full setup guide (RBAC, Workload Identity, Management Group): Azure multi-subscription setup →
Multi-Project Scanning (GCP)
Built for teams running multiple GCP projects. Scan all accessible projects in parallel with one identity — findings aggregated into one report with a per-project cost breakdown.
# Scan all projects the identity can access (default — uses ADC project discovery)
cleancloud scan --provider gcp --all-projects
# Scan specific projects
cleancloud scan --provider gcp --project my-project-123 --project another-project-456

Permissions required (per project):
| Permission | Required for |
|---|---|
| compute.disks.list | Unattached persistent disks |
| compute.instances.list | Stopped VM instances |
| compute.addresses.list | Unused regional static IPs |
| compute.globalAddresses.list | Unused global static IPs |
| compute.snapshots.list | Old disk snapshots |
| cloudsql.instances.list | Idle Cloud SQL instances |
| monitoring.timeSeries.list | SQL connection activity check |
All read-only permissions are covered by four predefined roles: `roles/compute.viewer`, `roles/cloudsql.viewer`, `roles/monitoring.viewer`, and `roles/browser` (required for `--all-projects` project enumeration). For CI/CD, use Workload Identity Federation — see GCP setup →.
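Granting those four roles can be scripted with gcloud. The project ID and service-account name below are placeholders; substitute your own.

```shell
# Placeholder identifiers — replace with your project and scanning identity.
PROJECT=my-project-123
SA=cleancloud-scanner@my-project-123.iam.gserviceaccount.com

for role in roles/compute.viewer roles/cloudsql.viewer \
            roles/monitoring.viewer roles/browser; do
  gcloud projects add-iam-policy-binding "$PROJECT" \
    --member="serviceAccount:$SA" --role="$role"
done
```

Repeat per project, or bind at the folder/organization level if you want the roles inherited.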
Full setup guide: GCP setup →
Is it safe to run in production?
Yes. CleanCloud is read-only — it calls only List, Describe, and Get APIs. No writes, no deletes, no changes to your cloud account.
Does CleanCloud send my data anywhere?
No. It runs entirely in your environment. No telemetry, no SaaS, no outbound connections except to your cloud provider's own APIs.
Will it flag resources my team manages with Terraform / CDK?
CleanCloud detects actual idle state (zero connections, zero traffic, zero invocations) — not resource existence. A Terraform-managed RDS instance with zero connections for 30 days is still flagged. Use tag filtering or exceptions to suppress intentional infrastructure.
How do I suppress a specific resource?
Two options: tag it with cleancloud-ignore: true (tag filtering), or add an explicit exception in cleancloud.yaml (policy-as-code). Exceptions support glob patterns and expiry dates. See Policy config →.
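For the tag route, the resource can be tagged straight from the CLI. The instance ID below is a placeholder; the tag key is the one named above.

```shell
# Tag an EC2 instance so CleanCloud's tag filtering skips it.
# Requires the ec2:CreateTags permission; instance ID is a placeholder.
aws ec2 create-tags \
  --resources i-0abc1234567890def \
  --tags Key=cleancloud-ignore,Value=true
```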
My CI is failing on findings I don't care about. How do I fix it?
Don't disable enforcement — suppress the specific noise. Use min_cost to hide cheap findings, confidence: MEDIUM to skip low-signal ones, or add exceptions for known-good resources. See Troubleshooting →.
Can I run it without a cleancloud.yaml?
Yes. Without a config file all rules are enabled with their defaults. The config is optional — you can start with just a CLI flag and add a config later.
Does it work in air-gapped / private environments?
Yes. CleanCloud only needs network access to your cloud provider's API endpoints. No external dependencies, no package downloads at scan time.
46 rules across AWS, Azure, and GCP — conservative, high-signal, designed to avoid false positives in IaC environments.
AWS:
- Compute: stopped instances 30+ days (EBS charges continue)
- Storage: unattached EBS volumes (HIGH), old EBS snapshots, old AMIs, old RDS snapshots 90+ days
- Network: unattached Elastic IPs (HIGH), detached ENIs, idle NAT Gateways, idle load balancers (HIGH)
- Platform: idle RDS instances (HIGH)
- Observability: infinite retention CloudWatch Logs
- Governance: untagged resources, unused security groups
- AI/ML (opt-in: `--category ai`): idle Bedrock Provisioned Throughput (Model Units) with zero invocations 7+ days; idle SageMaker endpoints with no observed `InvokeEndpoint` traffic 14+ days; SageMaker Notebook Instances with stale control-plane timestamps 14+ days; SageMaker Studio apps (KernelGateway/JupyterLab/CodeEditor) with no usable recent activity signal 7+ days; SageMaker training jobs still `InProgress` beyond the 24h threshold
Azure:
- Compute: stopped (not deallocated) VMs (HIGH)
- Storage: unattached managed disks (HIGH), old snapshots
- Network: unused public IPs, empty load balancers (HIGH), empty App Gateways (HIGH), idle VNet Gateways
- Platform: empty App Service Plans (HIGH), idle SQL databases (HIGH), idle App Services, unused Container Registries
- Governance: untagged resources
- AI/ML (opt-in: `--category ai`): idle AML compute clusters with non-zero baseline capacity and no workload activity 14+ days — GPU clusters flagged HIGH risk ($600–$15K/month); idle Compute Instances with no control-plane activity 14+ days — GPU instances CRITICAL risk ($600–$15K+/month); idle ML managed online endpoints with zero scoring requests 7+ days — GPU-backed endpoints flagged HIGH/CRITICAL ($200–$2,600+/month); idle AI Search services (Basic+) with zero queries 90+ days — billed per SKU × replicas × partitions ($261–$4,028+/month); idle Azure OpenAI provisioned deployments (PTUs) with zero API requests 7+ days — bills ~$1,460/PTU/month regardless of traffic
GCP:
- Compute: stopped instances 30+ days (disk charges continue) (HIGH)
- Storage: unattached Persistent Disks (HIGH), old snapshots 90+ days
- Network: unused reserved static IPs — regional and global (HIGH)
- Platform: idle Cloud SQL instances with zero connections 14+ days (HIGH)
- AI/ML (opt-in: `--category ai`): idle Vertex AI Online Prediction endpoints with zero observed predictions 14+ days (dedicated nodes continue billing regardless of traffic) — GPU-backed endpoints flagged HIGH risk ($449–$23K+/month); idle Workbench instances (v1 + v2) with no control-plane activity 14+ days — GPU instances flagged HIGH/CRITICAL ($449–$8K+/month); long-running Vertex AI training jobs (CustomJobs + TrainingPipelines) beyond 24h threshold — CRITICAL risk for GPU/accelerator jobs at 3× threshold; idle Cloud TPU nodes (v2–v6e) in READY state with near-zero duty_cycle for 7+ days — idle v4 costs $12.88/hr, v5p-8 costs $33.60/hr; idle Vertex AI Feature Store online stores with zero `ReadFeatureValues` requests for 30+ days — Bigtable-backed stores bill ~$197/node/month regardless of activity
Rules without a confidence marker are MEDIUM — they use time-based heuristics or multiple signals. Start with --fail-on-confidence HIGH to catch obvious waste, then tighten as your team validates.
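One possible staged rollout, expressed in cleancloud.yaml (the keys match the policy example earlier in this README; the phasing itself is a suggestion, not a CleanCloud requirement):

```yaml
# Phase 1: enforce only obvious, high-signal waste
thresholds:
  fail_on_confidence: HIGH

# Phase 2, once the team has validated findings: tighten
# thresholds:
#   fail_on_confidence: MEDIUM
#   fail_on_cost: 500
```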
Full rule details, signals, and evidence: docs/rules.md
More AI/ML waste rules — orphaned training artifacts in S3
More AWS rules — S3 lifecycle gaps, Redshift idle, NAT Gateway cost leakage (internal services routing through NAT instead of VPC endpoints — S3, DynamoDB, ECR, SSM), unused VPC endpoints
More Azure rules — Azure Firewall idle, AKS node pool idle, Azure Batch unused pools
More GCP rules — GKE node pool idle, BigQuery slot waste, GCS cold storage, Cloud Run idle revisions
Rule filtering — --rules flag to run a subset of rules
- `docs/rules.md` — Detection rules, signals, and evidence
- `docs/aws.md` — AWS IAM policy and OIDC setup
- `docs/azure.md` — Azure RBAC and Workload Identity setup
- `docs/gcp.md` — GCP IAM permissions and Application Default Credentials setup
- `docs/ci.md` — Automation, scheduled scans, and CI/CD integration
- `docs/configuration.md` — Policy-as-code: exceptions, thresholds, tag filtering
- `docs/best-practices.md` — Rollout strategy, tag filtering patterns, exception patterns
- `docs/troubleshooting.md` — Common errors and fixes
- `docs/example-outputs.md` — Full output examples
- `SECURITY.md` — Security policy and threat model
- `docs/infosec-readiness.md` — IAM Proof Pack, threat model
Found a bug? Open an issue
Feature request? Start a discussion
Questions? suresh@getcleancloud.com