Use uniform sampling over the sample space for equally weighted training & evaluation

Currently, we just use the whole dataset, which has two problems:
1) It contains more samples than we need
2) It's heavily biased against scores that are less common but more informative, such as 90%, and toward very common scores such as 99.5%, which don't tell us much

To even out training, we should try sampling uniformly across the sample space, which would give a more representative evaluation measure and a more balanced training process.
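
One possible way to do this is sketched below. It is only a rough illustration, not a proposal for the final implementation. It assumes scores are fractions in [0, 1], splits that range into equal-width bins, and draws at most a fixed number of samples from each bin. The function name `sample_uniform_by_score`, its parameters (`n_bins`, `per_bin`, `seed`), and the `(id, score)` tuple layout are all hypothetical and not taken from the existing code.

```python
import random
from collections import defaultdict


def sample_uniform_by_score(samples, score_of, n_bins=10, per_bin=100,
                            lo=0.0, hi=1.0, seed=0):
    """Draw up to `per_bin` samples from each equal-width score bin.

    `samples` is any iterable of items and `score_of` maps an item to its
    score. Bins with fewer than `per_bin` items contribute everything they
    have, so rare score ranges are never dropped.
    """
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    bins = defaultdict(list)
    for item in samples:
        frac = (score_of(item) - lo) / (hi - lo)
        # Clamp so that out-of-range scores and score == hi land in an edge bin.
        idx = min(max(int(frac * n_bins), 0), n_bins - 1)
        bins[idx].append(item)

    picked = []
    for idx in sorted(bins):
        bucket = bins[idx]
        picked.extend(rng.sample(bucket, min(per_bin, len(bucket))))
    return picked


# Hypothetical skewed dataset: many 99.5% scores, few lower ones.
dataset = (
    [(i, 0.995) for i in range(1000)]
    + [(1000 + i, 0.85) for i in range(20)]
    + [(1020 + i, 0.45) for i in range(5)]
)
subset = sample_uniform_by_score(dataset, score_of=lambda s: s[1], per_bin=10)
```

With this approach the common 99.5% range is capped at `per_bin` samples, while sparse ranges keep all of theirs, so each populated bin carries similar weight. Whether equal-width bins are the right granularity (versus, say, finer bins near the top of the range) is an open question.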