Optimize email.header.decode_header by avoiding list.pop(0) calls - 1.02x improvement #143542

@heikkitoivonen

Feature or enhancement

Proposal:

The email.header.decode_header function calls parts.pop(0) repeatedly in a loop, where parts is a list. This is slow because each pop(0) has to shift all the remaining elements down by one position.

Locally, I changed the code to stop calling pop(0) and instead walk the list by index, which avoids the shifting.
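
To illustrate the idea, here is a minimal sketch of the two styles. It is not the actual stdlib code or the proposed patch: the function names group_with_pop and group_with_index are hypothetical, and the input layout (an unencoded chunk followed by charset, encoding, and encoded text in groups of three) is an assumption made for the example.

```python
def group_with_pop(parts):
    """Consume the list from the front with pop(0); each call shifts the rest."""
    parts = list(parts)  # copy so the caller's list is left untouched
    words = []
    while parts:
        unencoded = parts.pop(0)
        if unencoded:
            words.append((unencoded, None, None))
        if parts:
            charset = parts.pop(0).lower()
            encoding = parts.pop(0).lower()
            encoded = parts.pop(0)
            words.append((encoded, encoding, charset))
    return words


def group_with_index(parts):
    """Same grouping, but walk the list by index; nothing is shifted."""
    words = []
    i, n = 0, len(parts)
    while i < n:
        unencoded = parts[i]
        i += 1
        if unencoded:
            words.append((unencoded, None, None))
        if i < n:
            charset = parts[i].lower()
            encoding = parts[i + 1].lower()
            encoded = parts[i + 2]
            i += 3
            words.append((encoded, encoding, charset))
    return words


# Assumed sample input, shaped like a split header line.
sample = ["", "ISO-8859-1", "Q", "Andr=E9", " Pirard"]
assert group_with_pop(sample) == group_with_index(sample)
```

Both functions return the same result; the index version only reads from the list, so its cost per element does not depend on how many elements remain.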

This is what pyperf says (the test string is copied from unit tests):

$ python -m pyperf timeit 'from email.header import decode_header;decode_header("=?ISO-8859-1?Q?Andr=E9?= Pirard <[email protected]>")' -o email_decode_header_baseline.json
# apply optimization
$ python -m pyperf timeit 'from email.header import decode_header;decode_header("=?ISO-8859-1?Q?Andr=E9?= Pirard <[email protected]>")' -o email_decode_header_optimized.json
$ python -m pyperf compare_to email_decode_header_baseline.json email_decode_header_optimized.json 
Mean +- std dev: [email_decode_header_baseline] 6.21 us +- 0.05 us -> [email_decode_header_optimized] 6.11 us +- 0.06 us: 1.02x faster
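
As a quick sanity check on the reported figure, the ratio can be recomputed from the two means above. The variable names are only illustrative.

```python
baseline_us = 6.21   # mean reported for the baseline run
optimized_us = 6.11  # mean reported for the optimized run

speedup = baseline_us / optimized_us
saved_us = baseline_us - optimized_us
print(f"{speedup:.2f}x faster, {saved_us:.2f} us saved per call")
```

The difference of about 0.10 us is larger than either reported standard deviation (0.05 us and 0.06 us), though not by a wide margin.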

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

    Labels

    pending (the issue will be closed if no feedback is provided), performance (performance or resource usage), stdlib (Standard Library Python modules in the Lib/ directory), topic-email, type-feature (a feature request or enhancement)