optimization datasets usage #3470
Alexandr-Solovev wants to merge 66 commits into uxlfoundation:main
Conversation
I think moving the data to a folder named 'dev' makes it harder to find.
I’m open to moving the data to a different folder. My first suggestion was oneDAL/data, but I’m not sure. Open to other suggestions, @Vika-F.
What's wrong with the current location?
~100 MB of the datasets are duplicates, and I would like to merge them into a single folder.
But why would that require moving them elsewhere? Perhaps they could be grouped by dataset instead of by algorithm, or something like that.
Okay, the problem is: samples. And I want to remove the dataset duplication here. For me it makes sense to move them to a separate folder with common access for all samples and examples.
Could it be under a similar folder as where the CSV reader is?
Having a data/ folder in the root directory makes sense to me.
I don't think so. We have separate CSV readers for oneDAL and DAAL, and they are pretty deep inside.
I am OK with placing all the data files in the ./data folder, but we need to check that BOM generation still works.
samples/daal/cpp/mpi/sources/covariance_dense_distributed_mpi.cpp
samples/daal/cpp/mpi/sources/kmeans_init_csr_distributed_mpi.cpp
size_t skip = rowStart;
while (skip > 0) {
    size_t s1 = trainDataSource.loadDataBlock(skip);
    size_t s2 = trainLabelSource.loadDataBlock(skip);
    if (s1 == 0 || s2 == 0)
        break;
    skip -= s1;
}
This loop requires a comment at least.
And maybe it is possible to load the data without this loop at all?
samples/daal/cpp/mpi/sources/ridge_regression_norm_eq_distributed_mpi.cpp
    DataSource::doAllocateNumericTable,
    DataSource::doDictionaryFromContext);

size_t skip = rowStart;
Please add this comment like in another similar place:
/* Skip rows before rowStart */
Though I do not fully understand why the loop is required, and why loadDataBlock(rowStart) is not enough.
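One possible reading of why a loop appears here, offered only as a sketch: if a block-loading call can return fewer rows than requested, a single call with rowStart would not be guaranteed to reach the target row. Whether loadDataBlock actually behaves that way is an assumption that this thread does not confirm. The MockRowSource class, load_block, and skip_rows below are hypothetical stand-ins written for illustration and are not part of the DAAL API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a row-oriented data source; NOT the DAAL API.
// load_block(n) consumes up to n rows, but never more than max_chunk per
// call, and returns the number of rows actually consumed.
class MockRowSource {
public:
    MockRowSource(std::size_t total_rows, std::size_t max_chunk)
        : total_rows_(total_rows), max_chunk_(max_chunk) {}

    std::size_t load_block(std::size_t requested) {
        const std::size_t remaining = total_rows_ - position_;
        const std::size_t loaded = std::min({requested, max_chunk_, remaining});
        position_ += loaded;
        return loaded;
    }

    // Index of the next row that would be read.
    std::size_t position() const { return position_; }

private:
    std::size_t total_rows_;
    std::size_t max_chunk_;
    std::size_t position_ = 0;
};

/* Skip rows before row_start. Returns the number of rows actually skipped. */
std::size_t skip_rows(MockRowSource& source, std::size_t row_start) {
    std::size_t skip = row_start;
    while (skip > 0) {
        const std::size_t loaded = source.load_block(skip);
        if (loaded == 0)
            break;       // source exhausted before reaching row_start
        skip -= loaded;  // loaded <= skip, so the unsigned value cannot underflow
    }
    return row_start - skip;
}
```

With a mock source that hands out at most 8 rows per call, one load_block(30) call advances only 8 rows, while skip_rows(source, 30) keeps calling until all 30 rows are consumed or the source runs dry. If the real loader always returns the full requested count, the loop collapses to a single iteration and a plain call would be equivalent.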
Vika-F
left a comment
Thanks for this restructuring!
The changes look good to me. Let's wait for the CI and LGTM!
/intelci: run
Description
PR: Dataset Cleanup, De-duplication, and Example Parameter Refactoring
This PR introduces a cleanup and restructuring of datasets and examples, along with parameter de-hardcoding and new data split utilities.
Changes
[...] root/data (or simply data) and removed duplicate files.
[...] implicit_als case where it is still required.
Impact
[...] make onedal_dpc build size by approximately 120 MB.
Checklist:
Completeness and readability
Testing
Performance