Conversation
Hello @aldder! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2022-02-11 12:19:56 UTC
Hi @aldder, please try to add some test cases in the
Codecov Report

@@            Coverage Diff             @@
##           master      #41      +/-   ##
==========================================
+ Coverage   97.51%   97.88%   +0.37%
==========================================
  Files           3        3
  Lines         643      756     +113
==========================================
+ Hits          627      740     +113
  Misses         16       16
==========================================

Continue to review full report at Codecov.
WenjieZ left a comment:
Please see the comments and make changes accordingly.
tscv/_split.py (outdated):

    self.n_groups = N
    self.test_splits = k

    def split(self, X, y=None, groups=None):
The canonical way of doing this is to redefine `_iter_test_indices(self, X, y=None, groups=None)` from the base class and generate only the test indices. The base class will take care of the rest. Please refer to the other derived classes and make modifications accordingly.
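The suggested pattern can be sketched as follows. This is an illustrative stand-in, not the actual tscv code: the class name `CombinatorialGapCV` and the equal-sized group boundaries are assumptions; only `n_groups`, `test_splits`, `_iter_test_indices`, and `split` come from the discussion above.

```python
from itertools import combinations
import numpy as np

class CombinatorialGapCV:
    """Sketch of the reviewer's pattern: the subclass yields only test
    indices; split() derives each training set as the complement."""

    def __init__(self, N=5, k=2):
        self.n_groups = N      # number of contiguous groups
        self.test_splits = k   # groups held out per split

    def _iter_test_indices(self, X, y=None, groups=None):
        n_samples = len(X)
        # boundaries of n_groups contiguous, roughly equal-sized groups
        bounds = np.linspace(0, n_samples, self.n_groups + 1, dtype=int)
        for combo in combinations(range(self.n_groups), self.test_splits):
            yield np.concatenate(
                [np.arange(bounds[g], bounds[g + 1]) for g in combo])

    def split(self, X, y=None, groups=None):
        indices = np.arange(len(X))
        for test_index in self._iter_test_indices(X, y, groups):
            # training set = complement of the test set
            train_index = np.setdiff1d(indices, test_index)
            yield train_index, test_index
```

With N=4 groups and k=2 test groups over 8 samples this yields C(4, 2) = 6 splits, each holding out two groups.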
The problem here is that, in order to check that the training set size is > 0, we need to compute the complement of the test indices after generating them. And if we already do that, there is no point in discarding this information only to recalculate it afterwards.
The check of the training set size (and the test set size in some cases) can be delayed and implemented in GapCrossValidator._iter_train_indices() and its 3 siblings. You can also implement it in GapCrossValidator.split() if you find it a hassle to implement it four times.
On second thought, I don't find it necessary to check the non-emptiness of the training set. Some models/algorithms can output a default estimator/strategy given an empty training set (e.g., equal weight portfolio). If your model/algorithm requires a non-empty training set, it's probably a good idea to check it in the model/algorithm rather than in the cross-validator. That's why I didn't implement this check when I released the package.
That said, I probably forgot to check the non-emptiness of the test set.
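Such a centralized check could look like the following sketch. The wrapper function `split_with_check` is hypothetical (not part of tscv); it only illustrates validating test-set non-emptiness once, in one place, rather than in every index generator.

```python
def split_with_check(cv, X, y=None, groups=None):
    """Hypothetical wrapper: validate each split once, centrally,
    instead of repeating the check in every _iter_*_indices variant."""
    for train_index, test_index in cv.split(X, y, groups):
        if len(test_index) == 0:
            raise ValueError(
                "Empty test set generated; check n_groups/test_splits "
                "against the number of samples.")
        yield train_index, test_index
```

The same guard could equally live inside GapCrossValidator.split() itself, as suggested above.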
Ok then, I will replace the implementation of the split method with _iter_test_indices.
As for the non-emptiness of the test set, I think you would prefer to implement it in the base class in a separate PR, so I won't touch it, OK?
tscv/_split.py (outdated):

        n_splits : int
            Returns the number of splitting iterations in the cross-validator.
        """
        return len(list(combinations(range(self.n_groups), self.test_splits)))
Use the combination number to compute the result directly rather than instantiating all the combinations.
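Concretely, the count of splits is just the binomial coefficient C(n_groups, test_splits), so it can be computed in O(test_splits) time instead of materializing every combination. A minimal sketch (using the standard-library `math.comb`; the PR's import of `scipy.special.comb` with `exact=True` is equivalent):

```python
from math import comb

def get_n_splits(n_groups, test_splits):
    # C(n_groups, test_splits): number of ways to choose the test groups,
    # without building the full list of combinations
    return comb(n_groups, test_splits)
```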
tscv/_split.py:

    from itertools import chain, combinations  # previously: from itertools import chain
    from inspect import signature
    from scipy.special import comb
From the book "Advances in Financial Machine Learning" by Marcos López de Prado, this PR implements Combinatorial Cross-Validation with Purging and Embargoing.
Explanatory video: https://www.youtube.com/watch?v=hDQssGntmFA