Last week, ARC released a paper called Towards a Law of Iterated Expectations for Heuristic Estimators, which follows up on previous work on formalizing the presumption of independence. Most of the work described
…
»
Machine learning systems are typically trained to maximize average-case performance. However, this method of training can fail to meaningfully control the probability of tail events that might cause significant harm. For instance, while
…
»
ARC has released a paper on Backdoor defense, learnability and obfuscation in which we study a formal notion of backdoors in ML models. Part of our motivation for this is an analogy between
…
»
ARC's current research focus can be thought of as trying to combine mechanistic interpretability and formal verification. If we had a deep understanding of what was going on inside a neural
…
»
Earlier this year ARC posted a prize for two matrix completion problems. We received a number of submissions we considered useful, but not any complete solutions. We are closing the contest and awarding
…
»
The Alignment Research Center’s Theory team is starting a new hiring round for researchers with a theoretical background. Please apply here.
Update January 2024: we have paused hiring and expect to reopen
…
»
Here are two self-contained algorithmic questions that have come up in our research. We're offering a bounty of $5k for a solution to either of them—either an algorithm, or a
…
»