Description
Describe the bug
The AWS SDK for Ruby crashes with ArgumentError: invalid byte sequence in US-ASCII when S3 returns an XML response with UTF-8 content but the Content-Type header either omits charset=utf-8 or is missing entirely. The error is raised in the Aws::S3::Plugins::Http200Errors plugin when it matches regex patterns against a body string that Net::HTTP has tagged as US-ASCII but that contains UTF-8 byte sequences.
Regression Issue
- Select this option if this issue appears to be a regression.
Expected Behavior
The SDK should gracefully handle S3 XML responses regardless of Content-Type header charset specification, since S3 consistently declares encoding="UTF-8" in the XML declaration.
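One way to act on the XML declaration, sketched under assumptions: read the declared encoding from a binary copy of the body, so the lookup itself cannot raise an encoding error. `declared_xml_encoding` and `XML_DECL_ENCODING` are hypothetical names used for illustration only; they are not part of the SDK.

```ruby
# Hypothetical pattern: captures the encoding name from a leading XML declaration.
XML_DECL_ENCODING = /\A\s*<\?xml[^>]*\sencoding=["']([A-Za-z0-9._-]+)["']/

# Hypothetical helper, not part of aws-sdk-s3.
# Returns the Encoding named in the XML declaration, or nil if none is
# declared or the name is unknown to Ruby.
def declared_xml_encoding(body)
  # String#b returns a binary (ASCII-8BIT) copy, which is safe to match
  # against an ASCII-only regex even when the original tag is wrong.
  name = body.b[XML_DECL_ENCODING, 1]
  name && Encoding.find(name)
rescue ArgumentError
  # Encoding.find raises ArgumentError for unknown encoding names.
  nil
end
```

The result could then decide how to retag the body before any regex matching, instead of relying on the Content-Type header.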
Current Behavior
The SDK crashes with encoding errors when processing valid S3 responses from list_object_versions and delete_objects operations that return inconsistent Content-Type headers.
Reproduction Steps
This issue occurs in production environments under high-volume S3 operations and cannot be reliably reproduced in controlled environments.
Triggering conditions:
- High-frequency calls to list_object_versions or delete_objects APIs
- S3 service returns responses with:
  - Content-Type: application/xml (missing charset=utf-8), OR
  - No Content-Type header at all
- Net::HTTP assigns US-ASCII encoding to response body containing UTF-8 XML
- Http200Errors plugin attempts regex matching on the mismatched encoding
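Although the production failure depends on S3 and load conditions, the encoding mismatch itself can be simulated locally. The sketch below is not a reproduction of the S3 behavior: `SAMPLE_BODY` is a made-up error document and `regex_error_for` is a hypothetical helper. It only shows that UTF-8 bytes tagged as US-ASCII make a regex match raise, and that retagging the same bytes as UTF-8 avoids the error.

```ruby
# Pattern taken from the proposed fix in this issue.
CODE_PATTERN = %r{<Code>(.+?)</Code>}

# Hypothetical sample body; a stand-in, not a captured S3 response.
SAMPLE_BODY = %(<?xml version="1.0" encoding="UTF-8"?>\n<Error><Code>InternalError</Code><Message>caf\u00E9</Message></Error>).freeze

# Hypothetical helper: returns the message of the ArgumentError raised by
# the regex match, or nil if matching succeeds.
def regex_error_for(body)
  body.match?(CODE_PATTERN)
  nil
rescue ArgumentError => e
  e.message
end

# UTF-8 bytes tagged as US-ASCII: the mismatch described above.
mislabeled = SAMPLE_BODY.dup.force_encoding(Encoding::US_ASCII)
puts regex_error_for(mislabeled).inspect # => "invalid byte sequence in US-ASCII"

# Retagging changes only the encoding label; the bytes stay the same.
relabeled = mislabeled.dup.force_encoding(Encoding::UTF_8)
puts regex_error_for(relabeled).inspect  # => nil
puts relabeled[CODE_PATTERN, 1]          # => InternalError
```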
Production evidence:
- 385,000 occurrences per hour during normal operations
- Systematic across multiple S3 APIs (list_object_versions, delete_objects)
- Load-dependent (correlates with high-volume operations)
Possible Solution
Proposed fix with performance improvements:
# A regular expression to match error codes in the response body
CODE_PATTERN = %r{<Code>(.+?)</Code>}
private_constant :CODE_PATTERN

# A list of encodings we force into UTF-8
ENCODINGS_TO_FIX = [Encoding::US_ASCII, Encoding::ASCII_8BIT].freeze
private_constant :ENCODINGS_TO_FIX

# A regular expression to detect errors in the response body
ERROR_PATTERN = /<\?xml\s[^>]*\?>\s*<Error>/
private_constant :ERROR_PATTERN

# A regular expression to match an error message in the response body
MESSAGE_PATTERN = %r{<Message>(.+?)</Message>}
private_constant :MESSAGE_PATTERN

def check_for_error(context)
  xml = context.http_response.body_contents
  # Fix encoding issue when S3 returns UTF-8 content with inconsistent headers
  xml = xml.force_encoding('UTF-8') if xml.is_a?(String) && ENCODINGS_TO_FIX.include?(xml.encoding)

  if xml.match?(ERROR_PATTERN)
    error_code = xml.match(CODE_PATTERN)[1]
    error_message = xml.match(MESSAGE_PATTERN)[1]
    S3::Errors.error_class(error_code).new(context, error_message)
  elsif incomplete_xml_body?(xml, context.operation.output)
    Seahorse::Client::NetworkingError.new(
      S3::Errors.error_class('InternalError').new(context, 'Empty or incomplete response body')
    )
  end
end

Key improvements:
- Encoding normalization: Safely converts problematic encodings to UTF-8 before regex processing
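- Named regex constants: Moves the patterns out of inline literals into named, private constants. Ruby already compiles a non-interpolated regex literal only once, so the benefit is mainly readability rather than reduced allocation.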
- Efficient boolean check: Uses match? instead of match for error detection (no MatchData object creation)
- Safe conversion: Only applies encoding fix to specific problematic encodings
- Backward compatible: No changes to method signature or return values
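The encoding normalization step can also be written as a standalone helper that avoids two edge cases: `force_encoding` mutates its receiver (and raises on a frozen string), and retagging bytes that are not valid UTF-8 would only move the error to "invalid byte sequence in UTF-8". The sketch below is one possible variation, not the SDK's implementation; `normalize_body_encoding` is a hypothetical name, and the fallback to a binary copy is an assumption about what a safe default might be.

```ruby
# Same list as in the proposed fix above.
ENCODINGS_TO_FIX = [Encoding::US_ASCII, Encoding::ASCII_8BIT].freeze

# Hypothetical helper, not part of aws-sdk-s3.
# Returns a string that is safe to match against ASCII-only regexes.
def normalize_body_encoding(body)
  return body unless body.is_a?(String) && ENCODINGS_TO_FIX.include?(body.encoding)

  # dup first so the caller's string (possibly frozen) is left untouched.
  retagged = body.dup.force_encoding(Encoding::UTF_8)

  # If the bytes are not valid UTF-8, fall back to a binary copy:
  # ASCII-8BIT strings never count as broken, so ASCII-only patterns
  # can still be matched against them without raising.
  retagged.valid_encoding? ? retagged : body.b
end
```

With a helper like this, `check_for_error` could call `normalize_body_encoding(context.http_response.body_contents)` once and run all three patterns against the result.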
Additional Information/Context
Production impact:
- 385,000 occurrences in 1 hour of production traffic
- 337,000 list_object_versions responses with Content-Type: application/xml (missing charset)
- 48,000 delete_objects responses with Content-Type: application/xml (missing charset)
- 94 delete_objects responses with no Content-Type header
- Environment dependency: Issue appears specific to production environments under high load. Controlled testing in lower-volume environments does not reproduce the issue, suggesting it's triggered by specific S3 service conditions or load patterns in production.
Technical details:
- Affected APIs: list_object_versions and delete_objects confirmed
- AWS SDK version: aws-sdk-s3 1.208.0
- Region: Observed in us-west-1
Contact: I'm an Amazon employee and can provide additional production data or coordinate internally if needed for investigation.
Gem name ('aws-sdk', 'aws-sdk-resources' or service gems like 'aws-sdk-s3') and its version
aws-sdk-s3
Environment details (Version of Ruby, OS environment)
MRI Ruby 3.3, any OS, any platform (arm, amd)