Hi,

I am generating 100 rows of data, of which I use 99 for training and only 1 for testing, but I get this error:

```
ValueError: Number of splits 10 is greater than the number of samples: 1.
```
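The message comes from a k-fold check: the data being split must have at least as many rows as there are folds. In the output below, the `ensemble.data` summary prints before the error, so the failing call is probably `ensemble.predict(X_val)` on the single held-out row (an inference, not confirmed against mlens internals). The sketch below uses a hypothetical `kfold_sizes` helper, not the mlens or scikit-learn code. It shows why 99 rows split into 10 folds without trouble while 1 row cannot:

```python
def kfold_sizes(n_samples, n_splits):
    """Hypothetical helper: return the size of each of n_splits folds.

    Mirrors the check k-fold splitters perform: every fold needs at least
    one sample, so n_splits may not exceed n_samples.
    """
    if n_splits > n_samples:
        raise ValueError(
            f"Number of splits {n_splits} is greater than "
            f"the number of samples: {n_samples}."
        )
    base, extra = divmod(n_samples, n_splits)
    # The first `extra` folds receive one additional sample.
    return [base + 1 if i < extra else base for i in range(n_splits)]


print(kfold_sizes(99, 10))  # [10, 10, 10, 10, 10, 10, 10, 10, 10, 9]

try:
    kfold_sizes(1, 10)  # a single row with folds=10
except ValueError as err:
    print("ValueError:", err)
```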
Below is the code snippet:
```python
from math import sqrt

from mlens.ensemble import SuperLearner
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, BaggingRegressor,
                              ExtraTreesRegressor, RandomForestRegressor)
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


# create a list of base-models
def get_models():
    models = list()
    models.append(LinearRegression())
    models.append(ElasticNet())
    models.append(SVR(gamma='scale'))
    models.append(DecisionTreeRegressor())
    models.append(KNeighborsRegressor())
    models.append(AdaBoostRegressor())
    models.append(BaggingRegressor(n_estimators=10))
    models.append(RandomForestRegressor(n_estimators=10))
    models.append(ExtraTreesRegressor(n_estimators=10))
    return models


# cost function for base models
def rmse(yreal, yhat):
    return sqrt(mean_squared_error(yreal, yhat))


# create the super learner
def get_super_learner(X):
    ensemble = SuperLearner(scorer=rmse, folds=10, shuffle=True,
                            sample_size=len(X), random_state=42)
    # add base models
    models = get_models()
    ensemble.add(models)
    # add the meta model
    ensemble.add_meta(LinearRegression())
    return ensemble


# create the inputs and outputs
X, y = make_regression(n_samples=100, n_features=4, noise=0.5)
# split: test_size=1 holds out a single sample
X, X_val, y, y_val = train_test_split(X, y, test_size=1, random_state=42)
print('Train', X.shape, y.shape, 'Test', X_val.shape, y_val.shape)
# create the super learner
ensemble = get_super_learner(X)
# fit the super learner
ensemble.fit(X, y)
# summarize base learners
print(ensemble.data)
# evaluate meta model
yhat = ensemble.predict(X_val)
print('Super Learner: RMSE %.3f' % (rmse(y_val, yhat)))
```
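The `rmse` scorer above is the square root of the mean squared error. The stdlib-only sketch below assumes it matches `sqrt(mean_squared_error(...))` for 1-D targets. It also shows that with a one-row test set the reported RMSE reduces to the absolute error of that single prediction:

```python
from math import sqrt


def rmse(yreal, yhat):
    """Root mean squared error for two equal-length sequences of numbers."""
    if len(yreal) != len(yhat) or len(yreal) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(yreal, yhat)) / len(yreal)
    return sqrt(mse)


# Four samples: squared errors 0.25, 0.25, 0, 1 -> MSE 0.375
print(rmse([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))

# One sample (as with test_size=1): RMSE equals |y - yhat|
print(rmse([12.0], [9.5]))  # 2.5
```

A single held-out row therefore gives a very noisy estimate of generalization error.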
Output:

```
Train (99, 4) (99,) Test (1, 4) (1,)
                               score-m  score-s  ft-m  ft-s  pt-m  pt-s
layer-1  adaboostregressor       67.84     8.31  1.31  0.02  0.02  0.01
layer-1  baggingregressor        65.24     7.93  0.34  0.01  0.00  0.00
layer-1  decisiontreeregressor   80.64    16.22  0.11  0.01  0.00  0.00
layer-1  elasticnet              46.53     8.68  0.08  0.00  0.00  0.00
layer-1  extratreesregressor     56.78    10.63  0.79  0.04  0.00  0.00
layer-1  kneighborsregressor     51.99    13.06  0.00  0.00  0.00  0.00
layer-1  linearregression         0.53     0.07  0.00  0.00  0.00  0.00
layer-1  randomforestregressor   66.39     7.15  0.75  0.03  0.00  0.00
layer-1  svr                    125.71    19.63  0.07  0.00  0.00  0.00
```

and then the `ValueError` shown at the top.
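In mlens's summary table, `score-m` and `score-s` are the mean and standard deviation of the scorer across folds. Here the scorer is RMSE, so lower is better. The `ft-*` and `pt-*` columns are fit and predict times. The sketch below copies the `score-m` column from the output above into a plain dict (`score_m` is an illustrative name) and ranks the base learners:

```python
# score-m values copied from the ensemble.data output above (mean RMSE over folds)
score_m = {
    "adaboostregressor": 67.84,
    "baggingregressor": 65.24,
    "decisiontreeregressor": 80.64,
    "elasticnet": 46.53,
    "extratreesregressor": 56.78,
    "kneighborsregressor": 51.99,
    "linearregression": 0.53,
    "randomforestregressor": 66.39,
    "svr": 125.71,
}

# Lower RMSE is better, so sort ascending by score.
ranking = sorted(score_m, key=score_m.get)
for name in ranking:
    print(f"{name:<24}{score_m[name]:>8.2f}")
```

Linear regression leading by a wide margin is expected with this data. `make_regression` builds the target as a linear combination of the features plus Gaussian noise (`noise=0.5`), so an RMSE near 0.5 is roughly the noise floor.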
When I build the super learner manually, without mlens, as described here:
https://machinelearningmastery.com/super-learner-ensemble-in-python/
it works fine, but the same setup fails with mlens.
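For comparison, the manual approach can be sketched roughly as below. It uses scikit-learn's `KFold` and hypothetical helpers `fit_super_learner` and `predict_super_learner`, and it is neither the tutorial's exact code nor mlens internals. In this sketch, k-fold splitting is applied only to the training data, to build out-of-fold predictions for the meta model. At prediction time, each base model (refit on all training rows) predicts directly, so a one-row input is never split into folds. The meta model is simply the estimator passed in, here `LinearRegression`, in the same role as `ensemble.add_meta(LinearRegression())`:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor


def fit_super_learner(X, y, base_models, meta_model, n_splits=10, seed=42):
    """Hypothetical helper: out-of-fold base predictions, then fit the meta model."""
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    oof = np.zeros((len(X), len(base_models)))  # one column per base model
    for train_ix, test_ix in kfold.split(X):
        for j, model in enumerate(base_models):
            model.fit(X[train_ix], y[train_ix])
            oof[test_ix, j] = model.predict(X[test_ix])
    # The meta model learns how to combine the base models' predictions.
    meta_model.fit(oof, y)
    # Refit every base model on the full training set for prediction time.
    for model in base_models:
        model.fit(X, y)
    return base_models, meta_model


def predict_super_learner(X, base_models, meta_model):
    """Hypothetical helper: no folds here, so any number of rows (even one) works."""
    meta_X = np.column_stack([model.predict(X) for model in base_models])
    return meta_model.predict(meta_X)


X, y = make_regression(n_samples=100, n_features=4, noise=0.5, random_state=42)
X, X_val, y, y_val = train_test_split(X, y, test_size=1, random_state=42)
base = [LinearRegression(), DecisionTreeRegressor(random_state=42),
        KNeighborsRegressor()]
base, meta = fit_super_learner(X, y, base, LinearRegression())
yhat = predict_super_learner(X_val, base, meta)
print(X_val.shape, yhat.shape)
```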
- Could you help me fix this?
- Also, how can I use a random seed to get the same results on every run? I pass `random_state` to `train_test_split` and to the super learner, but the results still change.
- How was linear regression picked in the `ensemble.add_meta(LinearRegression())` line?
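On the seeding question: in the snippet above, `make_regression` is called without `random_state`, so it draws a new dataset on every run. Seeds passed later to `train_test_split` or `SuperLearner` cannot make that repeatable. Several base models (for example `DecisionTreeRegressor`, `RandomForestRegressor`, `ExtraTreesRegressor`, `BaggingRegressor`, `AdaBoostRegressor`) also accept their own `random_state`, which may need to be set as well. A minimal check, assuming scikit-learn and NumPy are installed (`SEED` is an illustrative name):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

SEED = 42

# Same seed -> identical data on every call (and every run).
X1, y1 = make_regression(n_samples=100, n_features=4, noise=0.5, random_state=SEED)
X2, y2 = make_regression(n_samples=100, n_features=4, noise=0.5, random_state=SEED)
print(np.array_equal(X1, X2) and np.array_equal(y1, y2))  # True

# A seeded estimator gives identical predictions when refit on the same data.
preds = [
    RandomForestRegressor(n_estimators=10, random_state=SEED)
    .fit(X1, y1)
    .predict(X1[:5])
    for _ in range(2)
]
print(np.array_equal(preds[0], preds[1]))  # True
```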
Any guidance would be appreciated.