GPU = true
NData = 16

After about 40 epochs it gets stuck, getting reward continuously. The action is None. It looks like None is one of the actions in the enum and is being represented in the layer. We should filter that out.
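
One possible way to do the filtering, as a rough sketch only: build the list of actions that get a unit in the layer from the enum while skipping None, then map layer indices through that list. The enum name `Action`, its members, and the helper names below are all hypothetical stand-ins, since the actual enum is not shown in these notes.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical action enum; the real members are not known from the notes.
    NONE = 0
    LEFT = 1
    RIGHT = 2
    JUMP = 3


# Only real actions get a unit in the layer; NONE is filtered out.
VALID_ACTIONS = [a for a in Action if a is not Action.NONE]
NUM_OUTPUTS = len(VALID_ACTIONS)


def action_from_index(index: int) -> Action:
    """Map a layer unit index back to the action it represents."""
    return VALID_ACTIONS[index]


def index_from_action(action: Action) -> int:
    """Map an action to its layer unit index; raises ValueError for NONE."""
    return VALID_ACTIONS.index(action)
```

With this layout the layer would be sized with `NUM_OUTPUTS` rather than `len(Action)`, so None can never be selected as an action.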