Why isn't spatial_temporal_view_decomposition used in evaluate_a_set_of_videos.py?
Honestly, it would help to have a shared evaluate.py module so the evaluation code isn't duplicated across scripts. It would also make it easier for others to build predictors on top of this repo; in mine, for example, I have to disentangle gnarly differences between the two implementations.
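Here is roughly what I mean. It's only a sketch, and every name in it (`evaluate_video`, `evaluate_videos`, the `decompose` and `model` callables) is hypothetical, not taken from the repo. The idea is that both the single-video script and evaluate_a_set_of_videos.py go through one function, so spatial_temporal_view_decomposition (or whatever the correct preprocessing is) gets applied the same way in both. I pass the decomposition in as a callable so I don't have to guess its real signature.

```python
# evaluate.py: hypothetical shared module (all names here are illustrative)
from typing import Any, Callable, Dict, Iterable

# Hypothetical aliases: a decomposer turns a video path into model inputs
# (e.g. the views produced by spatial_temporal_view_decomposition), and a
# model turns those inputs into a quality score.
Decomposer = Callable[[str], Any]
Model = Callable[[Any], float]


def evaluate_video(path: str, decompose: Decomposer, model: Model) -> float:
    """Score one video using the single, shared preprocessing path."""
    views = decompose(path)
    return model(views)


def evaluate_videos(
    paths: Iterable[str], decompose: Decomposer, model: Model
) -> Dict[str, float]:
    """Score many videos by reusing evaluate_video, so batch and
    single-video evaluation cannot drift apart."""
    return {path: evaluate_video(path, decompose, model) for path in paths}
```

With something like this, a downstream predictor only needs to import evaluate_video instead of reconciling two slightly different copies of the pipeline.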