RedshiftCreateClusterOperator leaks Redshift cluster on failure with partial IAM permissions #61930

SameerMesiah97 · 2026-02-01T16:57:59Z

Apache Airflow Provider(s)

amazon

Versions of Apache Airflow Providers

apache-airflow-providers-amazon>=9.21.0rc1

Apache Airflow version

main

Operating System

Debian GNU/Linux 12 (bookworm)

Deployment

Other

Deployment details

No response

What happened

RedshiftCreateClusterOperator can successfully create a Redshift cluster even when the AWS execution role has only partial Redshift permissions, for example when it lacks redshift:DescribeClusters.

In this scenario, the operator's create_cluster call succeeds and the Redshift cluster begins provisioning in AWS. However, subsequent steps, such as waiting for the cluster to become available when wait_for_completion=True, fail due to insufficient permissions.

The Airflow task then fails, but the Redshift cluster continues provisioning or remains active in AWS, resulting in leaked infrastructure and ongoing cost.

This can occur, for example, when the execution role allows redshift:CreateCluster but explicitly denies redshift:DescribeClusters, which is required by the waiter used to monitor cluster availability.
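As an illustration of the permission split described above, here is a minimal sketch of an IAM policy document built as a Python dictionary. The variable name, the `Sid` values, and the wildcard `Resource` are assumptions made for this example and are not taken from the reporter's setup; a real role may need further permissions to create a cluster.

```python
import json

# Hypothetical policy for the reproduction role (names and Sids are illustrative).
# It allows cluster creation but explicitly denies the describe call that the
# waiter relies on when wait_for_completion=True.
partial_redshift_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowCreate",
            "Effect": "Allow",
            "Action": "redshift:CreateCluster",
            "Resource": "*",
        },
        {
            "Sid": "DenyDescribe",
            "Effect": "Deny",
            "Action": "redshift:DescribeClusters",
            "Resource": "*",
        },
    ],
}

# Print the JSON form that could be pasted into an IAM policy editor.
print(json.dumps(partial_redshift_policy, indent=2))
```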

What you think should happen instead

If the operator fails after successfully initiating cluster creation (for example due to missing DescribeClusters or other follow-up permissions), it should make a best-effort attempt to clean up the partially created resource by deleting the cluster.

Cleanup should be attempted opportunistically (i.e. only if the cluster identifier is known and the necessary permissions are available), and failure to clean up should not mask or replace the original exception.
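The proposed behaviour could be sketched roughly as follows. This is not the operator's actual code: `create_cluster_with_cleanup` is a hypothetical helper, and `client` is assumed to expose the boto3 Redshift client methods `create_cluster`, `get_waiter` and `delete_cluster`.

```python
import logging

log = logging.getLogger(__name__)


def create_cluster_with_cleanup(client, cluster_identifier, **create_kwargs):
    """Create a cluster, wait for it, and try to delete it if the wait fails.

    Hypothetical helper: `client` is assumed to behave like a boto3
    Redshift client.
    """
    # If create_cluster itself raises, nothing was created and no cleanup is needed.
    client.create_cluster(ClusterIdentifier=cluster_identifier, **create_kwargs)
    try:
        # The waiter polls DescribeClusters, which is where a partial role fails.
        waiter = client.get_waiter("cluster_available")
        waiter.wait(ClusterIdentifier=cluster_identifier)
    except Exception:
        try:
            # Best-effort cleanup of the cluster that was just requested.
            client.delete_cluster(
                ClusterIdentifier=cluster_identifier,
                SkipFinalClusterSnapshot=True,
            )
        except Exception:
            # Log the cleanup failure, but never let it replace the original error.
            log.exception("Best-effort cleanup of %s failed", cluster_identifier)
        raise  # re-raise the original exception from the waiter
```

The bare `raise` at the end re-raises the waiter's exception whether or not the delete succeeded, so the task still fails with the original error. AWS may reject the delete while the cluster is still in a transitional state, so a real implementation might also need retries.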

How to reproduce

1. Create an IAM role that allows redshift:CreateCluster but denies redshift:DescribeClusters.
2. Configure an AWS connection in Airflow using this role.
   (The connection ID aws_test_conn is used for this reproduction.)
3. Ensure a valid Redshift cluster subnet group exists.
   (For example: example-subnet-group.)
4. Use the following DAG:

from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.redshift_cluster import (
    RedshiftCreateClusterOperator,
)

with DAG(
    dag_id="redshift_partial_auth_cluster_leak_repro",
    start_date=datetime(2025, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    create_cluster = RedshiftCreateClusterOperator(
        task_id="create_redshift_cluster",
        aws_conn_id="aws_test_conn",
        cluster_identifier="leaky-redshift-cluster",
        node_type="ra3.large",
        master_username="example",
        master_user_password="example",
        cluster_type="single-node",
        cluster_subnet_group_name="example-subnet-group",
        wait_for_completion=True,  # triggers DescribeClusters via waiter
    )

Trigger the DAG.

Observed Behaviour

The task fails due to missing redshift:DescribeClusters permissions, but the Redshift cluster is successfully created and remains active in AWS. The cluster is not cleaned up automatically and continues incurring cost.

Anything else

Redshift clusters begin incurring cost immediately once creation starts, even if the cluster never reaches an available state. When post-creation failures occur, leaked clusters can therefore result in unexpected and ongoing cost.

This issue follows a broader pattern across AWS operators where resources are created successfully but not cleaned up when subsequent steps fail. Apache Airflow has been introducing best-effort cleanup behavior to address this class of problems consistently across providers.

Are you willing to submit PR?

Yes, I am willing to submit a PR!

Code of Conduct

I agree to follow this project's Code of Conduct

eladkal · 2026-02-02T07:07:24Z

eladkal
Feb 2, 2026
Collaborator

If the operator fails after successfully initiating cluster creation (for example due to missing DescribeClusters or other follow-up permissions), it should make a best-effort attempt to clean up the partially created resource by deleting the cluster.

Use setup and teardown tasks to spin up and clean up resources:

https://airflow.apache.org/docs/apache-airflow/stable/howto/setup-and-teardown.html#setup-and-teardown

There is no need to implement such logic per operator.

potiuk · 2026-02-15T00:51:55Z

potiuk
Feb 15, 2026
Collaborator

converted to a discussion

SameerMesiah97 Feb 15, 2026
Author

@potiuk

Thanks for looking at this issue. There is a brief discussion in the linked PR #61333, where @eladkal, @o-nikolas and I deliberated over this issue. I have no problem with this being kept as a discussion, but I wanted to ensure that the full context is visible to you (and all contributors).

potiuk Feb 16, 2026
Collaborator

Ah. Missed it. Recreated the issue now. #61974
