{"version":1,"type":"rich","provider_name":"Libsyn","provider_url":"https:\/\/www.libsyn.com","height":90,"width":600,"title":"29 - Science of Deep Learning with Vikrant Varma","description":"In 2022, it was announced that a fairly simple method can be used to extract the true beliefs of a language model on any given topic, without having to actually understand the topic at hand. Earlier, in 2021, it was announced that neural networks sometimes 'grok': that is, when training them on certain tasks, they initially memorize their training data (achieving their training goal in a way that doesn't generalize), but then suddenly switch to understanding the 'real' solution in a way that generalizes. What's going on with these discoveries? Are they all they're cracked up to be, and if so, how are they working? In this episode, I talk to Vikrant Varma about his research getting to the bottom of these questions. Patreon: patreon.com\/axrpodcast Ko-fi: ko-fi.com\/axrpodcast &amp;nbsp; Topics we discuss, and timestamps: 0:00:36 - Challenges with unsupervised LLM knowledge discovery, aka contra CCS &amp;nbsp; 0:00:36 - What is CCS? &amp;nbsp; 0:09:54 - Consistent and contrastive features other than model beliefs &amp;nbsp; 0:20:34 - Understanding the banana\/shed mystery &amp;nbsp; 0:41:59 - Future CCS-like approaches &amp;nbsp; 0:53:29 - CCS as principal component analysis 0:56:21 - Explaining grokking through circuit efficiency &amp;nbsp; 0:57:44 - Why research science of deep learning? &amp;nbsp; 1:12:07 - Summary of the paper's hypothesis &amp;nbsp; 1:14:05 - What are 'circuits'? 
&amp;nbsp; 1:20:48 - The role of complexity &amp;nbsp; 1:24:07 - Many kinds of circuits &amp;nbsp; 1:28:10 - How circuits are learned &amp;nbsp; 1:38:24 - Semi-grokking and ungrokking &amp;nbsp; 1:50:53 - Generalizing the results 1:58:51 - Vikrant's research approach 2:06:36 - The DeepMind alignment team 2:09:06 - Follow-up work &amp;nbsp; The transcript: axrp.net\/episode\/2024\/04\/25\/episode-29-science-of-deep-learning-vikrant-varma.html Vikrant's Twitter\/X account: twitter.com\/vikrantvarma_ &amp;nbsp; Main papers: &amp;nbsp;- Challenges with unsupervised LLM knowledge discovery: arxiv.org\/abs\/2312.10029 &amp;nbsp;- Explaining grokking through circuit efficiency: arxiv.org\/abs\/2309.02390 &amp;nbsp; Other works discussed: &amp;nbsp;- Discovering latent knowledge in language models without supervision (CCS): arxiv.org\/abs\/2212.03827 - Eliciting Latent Knowledge: How to Tell if your Eyes Deceive You:&amp;nbsp;https:\/\/docs.google.com\/document\/d\/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8\/edit - Discussion: Challenges with unsupervised LLM knowledge discovery:&amp;nbsp;lesswrong.com\/posts\/wtfvbsYjNHYYBmT3k\/discussion-challenges-with-unsupervised-llm-knowledge-1 - Comment thread on the banana\/shed results:&amp;nbsp;lesswrong.com\/posts\/wtfvbsYjNHYYBmT3k\/discussion-challenges-with-unsupervised-llm-knowledge-1?commentId=hPZfgA3BdXieNfFuY - Fabien Roger, What discovering latent knowledge did and did not find:&amp;nbsp;lesswrong.com\/posts\/bWxNPMy5MhPnQTzKz\/what-discovering-latent-knowledge-did-and-did-not-find-4 - Scott Emmons, Contrast Pairs Drive the Empirical Performance of Contrast Consistent Search (CCS):&amp;nbsp;lesswrong.com\/posts\/9vwekjD6xyuePX7Zr\/contrast-pairs-drive-the-empirical-performance-of-contrast - Grokking: Generalizing Beyond Overfitting on Small Algorithmic Datasets:&amp;nbsp;arxiv.org\/abs\/2201.02177 - Keeping the Neural Networks Simple by Minimizing the Description Length of the Weights (Hinton 1993 
L2):&amp;nbsp;dl.acm.org\/doi\/pdf\/10.1145\/168304.168306 - Progress measures for grokking via mechanistic interpretability:&amp;nbsp;arxiv.org\/abs\/2301.0521 &amp;nbsp; Episode art by Hamish Doodles:&amp;nbsp;hamishdoodles.com ","author_name":"AXRP - the AI X-risk Research Podcast","author_url":"https:\/\/axrp.net","html":"<iframe title=\"Libsyn Player\" style=\"border: none\" src=\"\/\/html5-player.libsyn.com\/embed\/episode\/id\/30988158\/height\/90\/theme\/custom\/thumbnail\/yes\/direction\/forward\/render-playlist\/no\/custom-color\/88AA3C\/\" height=\"90\" width=\"600\" scrolling=\"no\"  allowfullscreen webkitallowfullscreen mozallowfullscreen oallowfullscreen msallowfullscreen><\/iframe>","thumbnail_url":"https:\/\/assets.libsyn.com\/secure\/content\/171189798"}