# Event-Driven Data Pipeline with Lambda Durable Functions

This serverless pattern demonstrates how to build an event-driven data processing pipeline using AWS Lambda Durable Functions with **direct SQS Event Source Mapping** and Lambda invoke chaining.

## How It Works

When a message arrives in the SQS queue, it directly triggers the durable function; no intermediary Lambda is needed. The durable function then orchestrates a series of specialized processing steps using Lambda invoke chaining: first validating the incoming data, then transforming it (converting `data_source` to uppercase), and finally storing the processed results in DynamoDB. Throughout this process, the durable function automatically creates checkpoints, enabling fault-tolerant execution that can recover from failures without losing progress. The entire pipeline operates within the 15-minute ESM execution limit, making it well suited to reliable batch processing workflows.

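The chained steps can be sketched as plain Python. In the deployed pattern each step runs as a separate Lambda function invoked through the durable function's invoke chaining; this local sketch (the function names and the `processed_at` field are illustrative, not the pattern's actual code) shows only the data flow:

```python
import json
from datetime import datetime, timezone

def validate(record: dict) -> dict:
    # Reject messages missing the fields the pipeline expects.
    for field in ("data_source", "processing_type"):
        if field not in record:
            raise ValueError(f"missing required field: {field}")
    return record

def transform(record: dict) -> dict:
    # The sample transformation: uppercase the data_source value.
    return {**record, "data_source": record["data_source"].upper()}

def store(record: dict) -> dict:
    # In the deployed pattern this step writes to DynamoDB;
    # here we just stamp the record with a processing timestamp.
    return {**record, "processed_at": datetime.now(timezone.utc).isoformat()}

def run_pipeline(message_body: str) -> dict:
    record = json.loads(message_body)
    return store(transform(validate(record)))

result = run_pipeline('{"data_source": "test.csv", "processing_type": "standard"}')
print(result["data_source"])  # TEST.CSV
```

Each call boundary in `store(transform(validate(...)))` corresponds to a checkpointed Lambda invocation in the durable version, which is what lets a retry resume after the last completed step.
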
## Architecture Overview

The pattern showcases two key Durable Functions capabilities:
1. **Direct Event Source Mapping**: SQS directly triggers the durable function (15-minute limit)
2. **Lambda Invoke Chaining**: Orchestrates specialized processing functions

## Key Features

- **Direct ESM Integration**: No intermediary function needed
- **15-minute execution constraint**: Demonstrates ESM time limits
- **Fault-tolerant processing**: Automatic checkpointing and recovery
- **Microservices coordination**: Chains specialized Lambda functions
- **Batch processing**: Handles multiple SQS records per invocation
- **Simple storage**: Uses DynamoDB for processed data

## Important ESM Constraints

⚠️ **15-Minute Execution Limit**: When using Event Source Mapping with Durable Functions, the total execution time cannot exceed 15 minutes. This budget covers:
- All processing steps
- Downstream function invocations
- Any wait operations (long waits are therefore not supported)

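One common way to respect that budget is to check the remaining time before starting each record, using the Lambda context's standard `get_remaining_time_in_millis()` method. The safety margin and per-record estimate below are illustrative assumptions, not values taken from this pattern:

```python
SAFETY_MARGIN_MS = 30_000  # stop early rather than be cut off mid-record

def has_time_for_next_record(context, estimated_record_ms: int) -> bool:
    # context.get_remaining_time_in_millis() is the standard Lambda
    # context method; the margin and estimate here are illustrative.
    return context.get_remaining_time_in_millis() - SAFETY_MARGIN_MS > estimated_record_ms

# Local check with a stub standing in for the Lambda context:
class FakeContext:
    def get_remaining_time_in_millis(self):
        return 120_000  # 2 minutes left

print(has_time_for_next_record(FakeContext(), 60_000))   # True
print(has_time_for_next_record(FakeContext(), 100_000))  # False
```

Records skipped this way stay in the queue (or go to the DLQ per the redrive policy) rather than being lost to a hard timeout.
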
## Use Cases

- ETL pipelines with validation and transformation
- Event-driven microservices orchestration
- Batch processing with fault tolerance
- Data processing workflows requiring checkpointing

## Prerequisites

- [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) configured with appropriate permissions
- [AWS SAM CLI](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/install-sam-cli.html) (latest version) installed
- [Python 3.14](https://www.python.org/downloads/release/python-3140/) runtime installed

## Deployment

1. **Build the application**:
   ```bash
   sam build
   ```

2. **Deploy to AWS**:
   ```bash
   sam deploy --guided
   ```

   Note the outputs after deployment:
   - `DataProcessingQueueUrl`: Use this for `<QUEUE_URL>`
   - `ProcessedDataTable`: Use this for `<PROCESSED_DATA_TABLE>`

3. **Test the pipeline**:
   ```bash
   # Send a test message to SQS
   aws sqs send-message \
     --queue-url <QUEUE_URL> \
     --message-body '{"data_source": "test.csv", "processing_type": "standard"}' \
     --region <REPLACE_REGION>
   ```

| 70 | + |
| 71 | +4. **Verify successful processing**: |
| 72 | + ```bash |
| 73 | + # Check if data was processed and stored in DynamoDB |
| 74 | + aws dynamodb scan --table-name <PROCESSED_DATA_TABLE> --query 'Items[*]' --region <REPLACE_REGION> |
| 75 | + ``` |
| 76 | + |
| 77 | + **Success indicators:** |
| 78 | + - You should see at least one item in the DynamoDB table |
| 79 | + - Original input data: `"data_source": "test.csv"` |
| 80 | + - Transformed data: `"data_source": "TEST.CSV"` (uppercase transformation applied) |
| 81 | + - Execution tracking with unique `execution_id` |
| 82 | + - Timestamps showing when data was processed and stored |
| 83 | + |
| 84 | + This confirms the entire pipeline worked: SQS → Durable Function → Validation → Transformation → Storage → DynamoDB |
| 85 | + |
## Components

### 1. Durable Pipeline Function (`src/durable_pipeline/`)
- **Direct SQS Event Source Mapping**: Receives SQS events directly
- **15-minute execution limit**: Must complete all processing within ESM constraints
- **Batch processing**: Handles multiple SQS records per invocation
- **Lambda invoke chaining**: Orchestrates validation, transformation, and storage
- **Automatic checkpointing**: Recovers from failures without losing progress

### 2. Specialized Processing Functions
- **Validation Function**: Simple data validation checks
- **Transformation Function**: Basic data transformation
- **Storage Function**: Persists processed data to DynamoDB

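The validation step's handler might look roughly like the sketch below. The payload shape (a `record` key) and the accepted `processing_type` values are assumptions for illustration, not the pattern's actual contract:

```python
def validation_handler(event, context):
    # Illustrative shape of the validation step: the orchestrating
    # durable function passes the record as the invoke payload.
    record = event.get("record", {})
    errors = []
    if not record.get("data_source"):
        errors.append("data_source is required")
    if record.get("processing_type") not in ("standard", "priority"):
        errors.append("unknown processing_type")
    return {"valid": not errors, "errors": errors, "record": record}

print(validation_handler(
    {"record": {"data_source": "test.csv", "processing_type": "standard"}}, None
))
```

Returning a structured result instead of raising lets the orchestrator decide whether a validation failure should fail the record or route it elsewhere.
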
## Monitoring

- CloudWatch Logs for execution tracking
- DynamoDB table for processed data
- SQS DLQ for failed messages

## Configuration

Key environment variables:
- `ENVIRONMENT`: Deployment environment (dev/prod)
- `PROCESSED_DATA_TABLE`: DynamoDB table for processed data
- `VALIDATION_FUNCTION_ARN`: ARN of the validation function
- `TRANSFORMATION_FUNCTION_ARN`: ARN of the transformation function
- `STORAGE_FUNCTION_ARN`: ARN of the storage function

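A minimal sketch of loading these variables at cold start, failing fast when required configuration is missing (the helper name and the `dev` default are illustrative, not from this pattern):

```python
import os

def load_config() -> dict:
    # Fail fast at import/cold-start time if required configuration is absent,
    # rather than part-way through processing a batch.
    required = (
        "PROCESSED_DATA_TABLE",
        "VALIDATION_FUNCTION_ARN",
        "TRANSFORMATION_FUNCTION_ARN",
        "STORAGE_FUNCTION_ARN",
    )
    missing = [name for name in required if name not in os.environ]
    if missing:
        raise RuntimeError(f"missing environment variables: {missing}")
    config = {name: os.environ[name] for name in required}
    config["ENVIRONMENT"] = os.environ.get("ENVIRONMENT", "dev")
    return config
```
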
## ESM-Specific Considerations

- **Execution Timeout**: Set to 900 seconds (15 minutes) maximum
- **Batch Size**: Configured for optimal processing (5 records)
- **Error Handling**: Uses an SQS DLQ for failed batches
- **Efficient Processing**: Optimized for speed to stay within time limits

## Error Handling

- Automatic retries with exponential backoff
- Dead Letter Queue for failed messages
- Partial batch failure support
- Checkpoint-based recovery

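Partial batch failure support follows Lambda's standard SQS partial-batch-response shape (enabled via `ReportBatchItemFailures` on the event source mapping): the handler returns the IDs of failed messages so SQS redelivers only those, not the whole batch. The `process` stub below is illustrative:

```python
def handler(event, context):
    # With ReportBatchItemFailures enabled on the event source mapping,
    # returning the message IDs that failed makes SQS redeliver only
    # those records instead of retrying the entire batch.
    failures = []
    for record in event.get("Records", []):
        try:
            process(record)  # stand-in for the pipeline steps above
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}

def process(record):
    # Illustrative stand-in: fail on empty message bodies.
    if not record.get("body"):
        raise ValueError("empty body")
```

An empty `batchItemFailures` list signals that every record in the batch succeeded.
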
## Cost Optimization

- Pay only for active compute time
- Efficient batch processing
- Automatic scaling based on queue depth

## Cleanup

```bash
sam delete
```

## Learn More

- [AWS Lambda Durable Functions Documentation](https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html)
- [Event Source Mappings with Durable Functions](https://docs.aws.amazon.com/lambda/latest/dg/durable-invoking-esm.html)
- [Lambda Invoke Chaining](https://docs.aws.amazon.com/lambda/latest/dg/durable-examples.html#durable-examples-chained-invocations)