S3 Http200Errors plugin fails with encoding error when S3 omits charset in Content-Type header #3337

@johvet

Describe the bug

The AWS SDK for Ruby crashes with ArgumentError: invalid byte sequence in US-ASCII when S3 returns an XML response whose body is UTF-8 but whose Content-Type header either omits charset=utf-8 or is absent altogether. The error is raised in the Aws::S3::Plugins::Http200Errors plugin when it matches regex patterns against a string that Net::HTTP has tagged as US-ASCII but that contains UTF-8 byte sequences.

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

The SDK should gracefully handle S3 XML responses regardless of Content-Type header charset specification, since S3 consistently declares encoding="UTF-8" in the XML declaration.
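One way to act on the XML declaration, sketched here for discussion only: read the declared encoding from a binary view of the body and retag the string if the bytes are valid in that encoding. The helper names declared_xml_encoding and retag_from_declaration are hypothetical and are not part of aws-sdk-s3.

```ruby
# Hypothetical helpers (not SDK API): trust the XML declaration when the
# Content-Type header gives no usable charset.

# Returns the Encoding named in the XML declaration, or nil if there is no
# declaration or the name is unknown to Ruby.
def declared_xml_encoding(body)
  # String#b returns a binary copy, so this match cannot raise on bytes
  # that are invalid in the body's current encoding.
  name = body.b[/\A<\?xml[^>]*?encoding=["']([A-Za-z0-9._-]+)["']/, 1]
  name && Encoding.find(name)
rescue ArgumentError
  nil # Encoding.find raises ArgumentError for unknown encoding names
end

# Returns a retagged copy when the declared encoding fits the bytes;
# otherwise returns the original string untouched.
def retag_from_declaration(body)
  enc = declared_xml_encoding(body)
  return body unless enc

  candidate = body.dup.force_encoding(enc)
  candidate.valid_encoding? ? candidate : body
end
```

The sketch works on a copy so the caller's string is never mutated, and it falls back to the original string when the declaration and the bytes disagree.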

Current Behavior

The SDK crashes with encoding errors when processing valid S3 responses from list_object_versions and delete_objects operations that return inconsistent Content-Type headers.

Reproduction Steps

This issue occurs in production environments under high-volume S3 operations and cannot be reliably reproduced in controlled environments.

Triggering conditions:

  1. High-frequency calls to list_object_versions or delete_objects APIs
  2. S3 service returns responses with:
    • Content-Type: application/xml (missing charset=utf-8), OR
    • No Content-Type header at all
  3. Net::HTTP assigns US-ASCII encoding to response body containing UTF-8 XML
  4. Http200Errors plugin attempts regex matching on the mismatched encoding
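Although the production trigger could not be reproduced, the end state described in steps 3 and 4 can be simulated locally. The following is a minimal sketch that builds such a string by hand; the XML payload and the helper names simulated_body and try_match are made up for illustration, and it assumes the plugin receives a body tagged exactly like this.

```ruby
# Builds a body that holds UTF-8 bytes but is tagged US-ASCII, which is the
# state described in step 3. The payload itself is invented.
def simulated_body
  xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" \
        "<Error><Code>InternalError</Code><Message>caf\u00e9</Message></Error>"
  xml.dup.force_encoding(Encoding::US_ASCII)
end

# Runs the same kind of regex check as the plugin and reports what happened
# instead of crashing.
def try_match(xml)
  xml.match?(/<\?xml\s[^>]*\?>\s*<Error>/) ? :error_detected : :no_error
rescue ArgumentError => e
  "#{e.class}: #{e.message}"
end

broken = simulated_body
puts broken.valid_encoding? # false: the bytes are not valid US-ASCII
puts try_match(broken)      # expected to report the ArgumentError from this issue

fixed = broken.dup.force_encoding(Encoding::UTF_8)
puts fixed.valid_encoding?  # true once the same bytes are tagged as UTF-8
puts try_match(fixed)
```

Only the encoding tag differs between the two strings; the bytes are identical, which is why retagging is enough and no transcoding is needed.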

Production evidence:

  • 385,000 occurrences per hour during normal operations
  • Systematic across multiple S3 APIs (list_object_versions, delete_objects)
  • Load-dependent (correlates with high-volume operations)

Possible Solution

Proposed fix with performance improvements:

# A regular expression to match error codes in the response body
CODE_PATTERN = %r{<Code>(.+?)</Code>}
private_constant :CODE_PATTERN

# A list of encodings we force into UTF-8
ENCODINGS_TO_FIX = [Encoding::US_ASCII, Encoding::ASCII_8BIT].freeze
private_constant :ENCODINGS_TO_FIX

# A regular expression to detect errors in the response body
ERROR_PATTERN = /<\?xml\s[^>]*\?>\s*<Error>/
private_constant :ERROR_PATTERN

# A regular expression to match an error message in the response body
MESSAGE_PATTERN = %r{<Message>(.+?)</Message>}
private_constant :MESSAGE_PATTERN

def check_for_error(context)
  xml = context.http_response.body_contents

  # Fix encoding issue when S3 returns UTF-8 content with inconsistent headers
  xml = xml.force_encoding('UTF-8') if xml.is_a?(String) && ENCODINGS_TO_FIX.include?(xml.encoding)

  if xml.match?(ERROR_PATTERN)
    error_code    = xml.match(CODE_PATTERN)[1]
    error_message = xml.match(MESSAGE_PATTERN)[1]

    S3::Errors.error_class(error_code).new(context, error_message)
  elsif incomplete_xml_body?(xml, context.operation.output)
    Seahorse::Client::NetworkingError.new(S3::Errors.error_class('InternalError').new(context, 'Empty or incomplete response body'))
  end
end

Key improvements:

  1. Encoding normalization: Retags US-ASCII and ASCII-8BIT bodies as UTF-8 before regex processing (force_encoding relabels the existing bytes; it does not transcode them)
  2. Named patterns: Moves the regexes into private constants for readability. Ruby already compiles a non-interpolated regex literal only once, so any allocation savings are likely negligible
  3. Efficient boolean check: Uses match? instead of match for error detection (no MatchData object creation)
  4. Narrow scope: Only retags the two listed encodings; strings in any other encoding are left untouched
  5. Backward compatible: No changes to method signature or return values
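For experimenting outside the SDK, the normalization and extraction steps can be pulled into a standalone sketch. The module name EncodingSketch and its methods are hypothetical. Unlike the proposed patch, this version works on a copy rather than mutating the response body, falls back to a binary view if the bytes are not valid UTF-8, and returns nil for a missing Code or Message instead of raising NoMethodError.

```ruby
# Standalone sketch, independent of aws-sdk-s3. All names are hypothetical.
module EncodingSketch
  ENCODINGS_TO_FIX = [Encoding::US_ASCII, Encoding::ASCII_8BIT].freeze
  ERROR_PATTERN    = /<\?xml\s[^>]*\?>\s*<Error>/
  CODE_PATTERN     = %r{<Code>(.+?)</Code>}
  MESSAGE_PATTERN  = %r{<Message>(.+?)</Message>}

  module_function

  # Returns a string that is safe to run the ASCII-only patterns against.
  def normalize_body(xml)
    return xml unless xml.is_a?(String) && ENCODINGS_TO_FIX.include?(xml.encoding)

    utf8 = xml.dup.force_encoding(Encoding::UTF_8)
    # If the bytes are not valid UTF-8 either, use a binary copy, which the
    # ASCII-only patterns can still scan without raising.
    utf8.valid_encoding? ? utf8 : xml.b
  end

  # Returns { code:, message: } for an S3 error document, or nil otherwise.
  def extract_error(xml)
    xml = normalize_body(xml)
    return nil unless xml.is_a?(String) && xml.match?(ERROR_PATTERN)

    # String#[] with a capture index returns nil instead of raising when
    # the pattern does not match.
    { code: xml[CODE_PATTERN, 1], message: xml[MESSAGE_PATTERN, 1] }
  end
end
```

Calling EncodingSketch.extract_error with a US-ASCII-tagged body returns the code and message as UTF-8 strings and leaves the input string's encoding unchanged.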

Additional Information/Context

Production impact:

  • About 385,000 occurrences in 1 hour of production traffic
    • 337,000 list_object_versions responses with Content-Type: application/xml (missing charset)
    • 48,000 delete_objects responses with Content-Type: application/xml (missing charset)
    • 94 delete_objects responses with no Content-Type header
  • Environment dependency: Issue appears specific to production environments under high load. Controlled testing in lower-volume environments does not reproduce the issue, suggesting it's triggered by specific S3 service conditions or load patterns in production.

Technical details:

  • Affected APIs: list_object_versions and delete_objects confirmed
  • AWS SDK version: aws-sdk-s3 1.208.0
  • Region: Observed in us-west-1

Contact: I'm an Amazon employee and can provide additional production data or coordinate internally if needed for investigation.

Gem name ('aws-sdk', 'aws-sdk-resources' or service gems like 'aws-sdk-s3') and its version

aws-sdk-s3 1.208.0

Environment details (Version of Ruby, OS environment)

MRI Ruby 3.3, any OS, any platform (arm, amd)

Labels: bug, investigating
