Enable data sieving for chunks that can't be cached #6111
base: develop
Conversation
Related writeup: Note that this is a rather quick fix for the issue; a bit of refactoring in the chunked I/O code would be better in the long run.
src/H5Dchunk.c
Outdated
     */
    dset_info->dset->shared->cache.contig.sieve_buf_size = H5F_SIEVE_BUF_SIZE(dset_info->dset->oloc.file);
    dset_info->dset->shared->cache.contig.sieve_loc = HADDR_UNDEF;
    dset_info->dset->shared->cache.contig.sieve_size = 0;
These lines force re-reads into the sieve buffer on each I/O, to account for cases like changing the extent of a dataset between writes and reads.
So the sieve buffer isn't persistent for chunked datasets?
Should we free the sieve buffer after I/O since the data in it is useless?
The sieve buffer is currently freed when the dataset is closed, but we could free it after I/O if you think that's better; it just seemed more efficient to keep it around. Setting sieve_loc to HADDR_UNDEF and sieve_size to 0 is just to make sure that the library doesn't read stale data from the sieve buffer in the case where the dataset extent changes. Note the sieve_size here is just the value tracking how much data is currently in the buffer.
Since we don't need the data in the buffer, keeping it around is effectively treating it like a buffer on the free list. However, the sieve buffer is already integrated with the free list, so I think we should just free it after I/O (it will be held open by the H5FL code if appropriate).
src/H5Defl.c
Outdated
    /* Get the sieve buffer size for this dataset - the smaller of the dataset size and
     * the sieve buffer size from the FAPL is used
     */
    dset->shared->cache.contig.sieve_buf_size =
Updated the EFL code here to set the sieve buffer size to the smaller of the dataset size and the FAPL-specified size. This just matches the logic elsewhere in the library and what's currently documented for H5Pset_sieve_buf_size():
Internally, the library checks the storage sizes of the datasets in
the file. It picks the smaller one between the size from the file
access property and the size of the dataset to allocate the sieve
buffer for the dataset in order to save memory usage.
Force-pushed 7f37860 to cb23842
When a sieve buffer is not used, the test in this file is designed to run long enough to cause a CMake timeout at TestExpress level 0, but to run very quickly at TestExpress level 3. We currently test with TestExpress 0 for every PR, but those tests should probably be moved to a scheduled run as originally intended, since tests like this can be problematic at that level when run for every PR.
Fixed an issue that prevented use of a data sieve buffer for I/O on dataset chunks when those chunks couldn't be cached by the library. This issue could result in worst-case behavior of I/O on a single data element at a time when chunks are non-contiguous with respect to memory layout.
Added a test to attempt to catch performance regressions in I/O on dataset chunks that are non-contiguous with respect to memory layout.
Updated the External File List logic to set the data sieve buffer size to the smaller of the dataset size and the size set in the FAPL, similar to the logic elsewhere in the library.
Force-pushed cb23842 to f5e42d8
I would consider this a new feature, not a fix.
Has data sieving ever been used in this case before? The AI seems to think so, but I think it's wrong.
fortnern left a comment
Marking "request changes", but really I'm just requesting discussion/responses to my comments; then we'll decide whether changes are needed.
I can see that, but it was very clearly a bug due to an oversight in the way the chunking code was written, which led to drastic performance problems.
I haven't gone through the history much, but randomly picking 1.10.3 as a test branch shows the same problem.
Force-pushed b1a99b2 to 680331c
Force-pushed 680331c to 24a543e
    H5FL_EXTERN(H5S_sel_iter_t);

    /* Declare the external PQ free list for the sieve buffer information */
    H5FL_BLK_EXTERN(sieve_buf);
With the sieve buffer being managed by H5Dchunk.c as well, it needs a reference to the free list. This is a bit messy as the sieve buffer is still almost completely controlled by H5Dcontig.c, but refactoring that seems outside the scope of this PR.
    done:
        /* Free dataset sieve buffer and reset cached fields */
        if (dset_info->dset->shared->cache.sieve.sieve_buf) {
While the resetting of fields in these sections is mostly redundant with the same code above, I wanted to keep both so the context (comment) isn't lost in potential future refactoring.
Important
Re-enable data sieve buffer for non-cached dataset chunks to improve I/O performance and add a test for performance regressions.
Re-enabled the sieve buffer in H5D__chunk_read() and H5D__chunk_write() in H5Dchunk.c.
Updated H5D__efl_construct() in H5Defl.c to set the sieve buffer size to the smaller of the dataset size and the FAPL size.
Added chunk_non_contig_mem_io in io_perf.c to catch performance regressions for non-contiguous memory layout chunks.
Updated CHANGELOG.md to document the performance fix.
Added io_perf to H5_EXPRESS_TESTS in CMakeLists.txt.
This description was created by
for cb23842.