Implement multipart copy and copying a particular version #308
Conversation
bors try
Relevant: JuliaCloud/AWS.jl#695
```
[multipart copy](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html).

# Optional Arguments
- `part_size_mb`: maximum size per uploaded part, in mebibytes (MiB).
```
I wonder if it's worth exposing an option that allows matching the part size between the source and destination. IIUC, that should make the range-based accesses faster while copying. If a file is big enough for a multipart copy, it was probably uploaded with a multipart upload, in which case the parts and their sizes can be obtained with S3.get_object_attributes. Lacking that permission, one can also get the part size with S3.head_object by passing Dict("partNumber" => 1) as a query parameter, and the number of parts will be in the entity tag of the source object.
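For illustration, a rough sketch of that `HeadObject` approach (assuming AWS.jl's generated `S3.head_object`; the response shape varies by AWS.jl version, so you may need to read the raw HTTP headers instead):

```julia
using AWS
@service S3

# Assumed response shape: HEAD the first part of a multipart-uploaded object.
# Content-Length is then the size of part 1, and the object's ETag looks like
# "\"<md5>-<nparts>\"", so the part count can be parsed from the suffix.
resp = S3.head_object("source-bucket", "source-key", Dict("partNumber" => 1))
part_size = parse(Int, resp["Content-Length"])
nparts = parse(Int, split(strip(resp["ETag"], '"'), '-')[end])
```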
```diff
         to_bucket,
         to_path,
-        "$bucket/$path",
+        source,
         Dict("headers" => headers);
         aws_config=aws,
         kwargs...,
```
[JuliaFormatter] reported by reviewdog 🐶
```diff
-        to_bucket,
-        to_path,
-        source,
-        Dict("headers" => headers);
-        aws_config=aws,
-        kwargs...,
+        to_bucket, to_path, source, Dict("headers" => headers); aws_config=aws, kwargs...
```
| "x-amz-copy-source-range" => string( | ||
| "bytes=", first(byte_range), '-', last(byte_range) | ||
| ) |
[JuliaFormatter] reported by reviewdog 🐶
| "x-amz-copy-source-range" => string( | |
| "bytes=", first(byte_range), '-', last(byte_range) | |
| ) | |
| "x-amz-copy-source-range" => | |
| string("bytes=", first(byte_range), '-', last(byte_range)), |
Summary of changes:
- `s3_copy` now supports a `version` keyword argument that facilitates copying a specified version of an object.
- A new function `s3_multipart_copy` to mirror `s3_multipart_upload` has been added, which calls `UploadPartCopy` in the API.
- An explicit `cp(::S3Path, ::S3Path)` method has been implemented, which avoids the fallback `cp(::AbstractPath, ::AbstractPath)` method that reads the source file into memory before writing to the destination.
- `cp(::S3Path, ::S3Path)` allows the user to opt into a multipart copy, in which case multipart is used when the source is larger than the specified part size (50 MiB by default). A multipart copy is unconditionally used when the source is at least 5 GiB. This behavior mimics that of the AWS CLI. Note that this now requires an additional API call to `HeadObject` in order to retrieve the source size.
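A usage sketch based on this summary (the `version` keyword comes from the description above; the multipart opt-in keyword name on `cp` is an assumption for illustration):

```julia
using AWSS3

# Copy a specific version of an object (version ID is a placeholder).
s3_copy("src-bucket", "file.bin";
        to_bucket="dst-bucket", to_path="file.bin", version="VERSION_ID")

# Opt into multipart copy between S3Paths; a single-request copy is used
# when the source is smaller than the part size.
src = S3Path("s3://src-bucket/big.bin")
dst = S3Path("s3://dst-bucket/big.bin")
cp(src, dst; multipart=true)  # keyword name assumed, not confirmed by the PR
```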
Force-pushed from 57bf305 to 27ad265
I've had these changes locally for months (possibly a year or more?) but hadn't committed or pushed them. I don't know if/when I'll have the bandwidth to ensure this gets over the finish line, so if someone is interested in picking this up then please feel free to do so.