{"version":1,"type":"rich","provider_name":"Libsyn","provider_url":"https:\/\/www.libsyn.com","height":90,"width":600,"title":"41 - Lee Sharkey on Attribution-based Parameter Decomposition","description":"What's the next step forward in interpretability? In this episode, I chat with Lee Sharkey about his proposal for detecting computational mechanisms within neural networks: Attribution-based Parameter Decomposition, or APD for short. Patreon: https:\/\/www.patreon.com\/axrpodcast Ko-fi: https:\/\/ko-fi.com\/axrpodcast Transcript: https:\/\/axrp.net\/episode\/2025\/06\/03\/episode-41-lee-sharkey-attribution-based-parameter-decomposition.html Topics we discuss, and timestamps: 0:00:41 APD basics 0:07:57 Faithfulness 0:11:10 Minimality 0:28:44 Simplicity 0:34:50 Concrete-ish examples of APD 0:52:00 Which parts of APD are canonical 0:58:10 Hyperparameter selection 1:06:40 APD in toy models of superposition 1:14:40 APD and compressed computation 1:25:43 Mechanisms vs representations 1:34:41 Future applications of APD? 1:44:19 How costly is APD? 1:49:14 More on minimality training 1:51:49 Follow-up work 2:05:24 APD on giant chain-of-thought models? 
2:11:27 APD and \"features\" 2:14:11 Following Lee's work Lee links (Leenks): X\/Twitter: https:\/\/twitter.com\/leedsharkey Alignment Forum: https:\/\/www.alignmentforum.org\/users\/lee_sharkey Research we discuss: Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-Based Parameter Decomposition: https:\/\/arxiv.org\/abs\/2501.14926 Toy Models of Superposition: https:\/\/transformer-circuits.pub\/2022\/toy_model\/index.html Towards a unified and verified understanding of group-operation networks: https:\/\/arxiv.org\/abs\/2410.07476 Feature geometry is outside the superposition hypothesis: https:\/\/www.alignmentforum.org\/posts\/MFBTjb2qf3ziWmzz6\/sae-feature-geometry-is-outside-the-superposition-hypothesis Episode art by Hamish Doodles: hamishdoodles.com","author_name":"AXRP - the AI X-risk Research Podcast","author_url":"https:\/\/axrp.net","html":"<iframe title=\"Libsyn Player\" style=\"border: none\" src=\"\/\/html5-player.libsyn.com\/embed\/episode\/id\/36828940\/height\/90\/theme\/custom\/thumbnail\/yes\/direction\/forward\/render-playlist\/no\/custom-color\/88AA3C\/\" height=\"90\" width=\"600\" scrolling=\"no\"  allowfullscreen webkitallowfullscreen mozallowfullscreen oallowfullscreen msallowfullscreen><\/iframe>","thumbnail_url":"https:\/\/assets.libsyn.com\/secure\/content\/189144680"}