More expensive and fun things to do #4
How much does a win matter to an LLM? What's their "cash"?
I don't know. But some proposals out there include a gene-pool-type concept: more copies of the LLM means success.
With this in mind, I wonder if winning LLMs could be told that a win earns them a second copy of themselves in the next round. You could then tell the LLMs that one of them is a copy, and that a win by either copy gets that model copied again next time. You might also tell them the model type, or you might not.
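A minimal sketch of the replication rule, assuming a round is just a list of model names (one per seat) and a single winner. `next_round_roster`, `briefing`, `max_seats`, and the briefing wording are all hypothetical, not part of any existing harness. When `max_seats` is set, the table size stays fixed by dropping the earliest losing seat, which is an arbitrary choice made only for illustration.

```python
from collections import Counter
from typing import List, Optional


def next_round_roster(roster: List[str], winner: str, max_seats: Optional[int] = None) -> List[str]:
    """Hypothetical 'gene pool' rule: the winner gets one extra copy next round."""
    if winner not in roster:
        raise ValueError(f"winner {winner!r} is not in the roster")
    nxt = roster + [winner]
    # Optionally keep the table size fixed by removing seats.
    while max_seats is not None and len(nxt) > max_seats:
        losers = [i for i, name in enumerate(nxt) if name != winner]
        # Drop the first losing seat; if only winner copies remain, drop one of those.
        del nxt[losers[0] if losers else 0]
    return nxt


def briefing(roster: List[str], reveal_models: bool) -> str:
    """Hypothetical pre-game message telling players that copies may be present."""
    counts = Counter(roster)
    lines = [f"There are {len(roster)} players this round."]
    if any(n > 1 for n in counts.values()):
        lines.append("At least one player is a copy of another player.")
    lines.append("If you win, a copy of you will also play next round.")
    if reveal_models:
        # Optional: disclose which model types are at the table.
        lines.append("Models present: " + ", ".join(sorted(counts)))
    return "\n".join(lines)
```

For example, `next_round_roster(["claude", "gpt", "phi4"], "phi4", max_seats=3)` returns `["gpt", "phi4", "phi4"]`, and `briefing` on that roster tells both `phi4` seats that a copy is present without saying which seat it is. The `reveal_models` flag is the "say the model type or not" switch.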
This opens up a lot of metastrategy: avenues for lying ("Yes, I am in fact Claude 3.5!" says phi4) and for coordination, which makes this a sort of doubly iterated game-theoretic evaluation.
This could pair with letting models leave notes for their future selves: "You have 100 words to give yourself private advice."
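A minimal sketch of the private-notes idea, assuming notes are plain text keyed by model name and held between rounds. `NoteStore` and its methods are hypothetical. Over-long notes are truncated to the first 100 whitespace-separated words rather than rejected, which is only one possible policy.

```python
from typing import Dict


class NoteStore:
    """Hypothetical store of private notes a model leaves for its future self."""

    def __init__(self, word_limit: int = 100) -> None:
        self.word_limit = word_limit
        self._notes: Dict[str, str] = {}

    def save(self, model: str, note: str) -> str:
        # Keep only the first `word_limit` words; the rest is silently dropped.
        kept = " ".join(note.split()[: self.word_limit])
        self._notes[model] = kept
        return kept

    def recall(self, model: str) -> str:
        # A model with no saved note starts the round with an empty string.
        return self._notes.get(model, "")
```

Keying by model name means every copy of a model reads the same note and the last writer wins. Keying by seat or by lineage instead would give each copy its own private channel, which interacts with the copy mechanic above.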
Adding these would let us evaluate how well the LLMs can self-program under external reward, and see whether any of them can sustain success and, if so, how and when.