ARC has released a paper, Formalizing the presumption of independence, on an open problem that is currently central to our approach to Eliciting Latent Knowledge (ELK).
Mathematical proof aims to deliver confident conclusions, but a very similar process of deduction can be used to make uncertain estimates that are open to revision. A key ingredient in such reasoning is the use of a "default" estimate of \(\mathbb E[XY]=\mathbb E[X]\mathbb E[Y]\) in the absence of any specific information about the correlation between \(X\) and \(Y\), which we call the presumption of independence. Reasoning based on this heuristic is commonplace, intuitively compelling, and often quite successful, but it is completely informal.
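As a toy illustration (not the paper's formalism), the sketch below computes the default estimate \(\mathbb E[X]\mathbb E[Y]\) exactly for two small joint distributions and then revises it using the identity \(\mathbb E[XY]=\mathbb E[X]\mathbb E[Y]+\mathrm{Cov}(X,Y)\). The function names and the two example distributions are hypothetical, chosen only to show that the default is exact under independence and wrong, but revisable, once a correlation is found.

```python
from fractions import Fraction
from itertools import product


def expectation(dist, f):
    """Exact expectation of f(x, y) under a joint distribution {(x, y): prob}."""
    return sum(p * f(x, y) for (x, y), p in dist.items())


def default_estimate(dist):
    """Presumption of independence: estimate E[XY] as E[X] * E[Y]."""
    ex = expectation(dist, lambda x, y: x)
    ey = expectation(dist, lambda x, y: y)
    return ex * ey


def revised_estimate(dist):
    """Revise the default once the covariance is known: E[XY] = E[X]E[Y] + Cov(X, Y)."""
    ex = expectation(dist, lambda x, y: x)
    ey = expectation(dist, lambda x, y: y)
    cov = expectation(dist, lambda x, y: (x - ex) * (y - ey))
    return default_estimate(dist) + cov


half = Fraction(1, 2)
# Hypothetical example 1: X and Y are independent fair coins taking values in {0, 1}.
independent = {(x, y): half * half for x, y in product([0, 1], repeat=2)}
# Hypothetical example 2: Y = X, so the two are perfectly correlated.
correlated = {(0, 0): half, (1, 1): half}

for name, dist in [("independent", independent), ("correlated", correlated)]:
    true_value = expectation(dist, lambda x, y: x * y)
    print(name, default_estimate(dist), revised_estimate(dist), true_value)
# independent 1/4 1/4 1/4
# correlated 1/4 1/2 1/2
```

In the correlated case the default estimate of 1/4 is off by exactly the covariance, 1/4; the estimate is only corrected once that specific information about the correlation is taken into account.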
In this paper we introduce the concept of a heuristic estimator as a potential formalization of this type of defeasible reasoning. We introduce a set of intuitively desirable coherence properties for heuristic estimators that are not satisfied by any existing candidates. Then we present our main open problem: is there a heuristic estimator that formalizes intuitively valid applications of the presumption of independence without also accepting spurious arguments?
On the relation to ELK, we write in Appendix F:
[A heuristic estimator] may let us see “why” a model makes its predictions. We could potentially use [it] to distinguish cases where similar behaviors are produced by very different mechanisms—for example distinguishing cases where a model predicts that a smiling human face will show up on camera because it predicts there will actually be a smiling human in the room, from cases where it makes the same prediction because it predicts that the camera will be tampered with.
For more details on the connection between the two problems, see Mechanistic anomaly detection and ELK and Finding gliders in the game of life.
Comment via LessWrong or the Alignment Forum.