ARC’s current approach to ELK is to point to latent structure within a model by searching for the “reason” for particular correlations in the model’s output. In this post we’ll walk through a very simple example of using this approach to identify gliders in the game of life.
…
An informal description of ARC’s current research approach, and a follow-up to Eliciting Latent Knowledge.
…
ARC has released a paper on Formalizing the presumption of independence, an open problem currently central to our approach to Eliciting Latent Knowledge (ELK).
…
From January to February, the Alignment Research Center offered prizes for proposed algorithms for eliciting latent knowledge. In total we received 197 proposals and are awarding 32 prizes of $5k-20k. We are also giving 24 proposals honorable mentions of $1k, for a total of $274,000.
…
Thank you to all those who have submitted proposals to the ELK proposal competition. We evaluated 30 distinct proposals from 25 people. We awarded a total of $70,000 for proposals from 8 people.
…
Roughly speaking, the goal of ELK is to incentivize ML models to honestly answer “straightforward” questions where the right answer is unambiguous and known by the model. We are offering prizes of $5,000 to $50,000 for proposed strategies for ELK.
…
In this post I’ll describe some possible approaches to eliciting latent knowledge (ELK) not discussed in our report. These are basically restatements of proposals by Davidad, Rohin, Ramana, and John Maxwell. For each approach, I’ll present one or two counterexamples that I think would break it.
…