About this episode
Paul Christiano is the world’s leading AI safety researcher. My full episode with him is out! We discuss:

- Does he regret inventing RLHF, and is alignment necessarily dual-use?
- Why he has relatively modest timelines (40% by 2040, 15% by 2030)
- What we want the post-AGI world to look like (do we want to keep gods enslaved forever?)
- Why he’s leading the push to get labs to develop responsible scaling policies, and what it would take to prevent an AI coup or bioweapon
- His current research into a new proof system, and how this could solve alignment by explaining models’ behavior
- and much more.

Watch on YouTube. Listen on Apple Podcasts, Spotify, or any other podcast platform. Read the full transcript here. Follow me on Twitter for updates on future episodes.

Open Philanthropy

Open Philanthropy is currently hiring for twenty-two different roles to reduce catastrophic risks from fast-moving advances in AI and biotechnology, including grantmaking, research, and operations. For more information and to apply, please see the application: https://www.openphilanthropy.org/research/new-roles-on-our-gcr-team/. The deadline to apply is November 9th; make sure to check out those roles before they close.

Timestamps

(00:00:00) - What do we want the post-AGI world to look like?
(00:24:25) - Timelines
(00:45:28) - Evolution vs. gradient descent
(00:54:53) - Misalignment and takeover
(01:17:23) - Is alignment dual-use?
(01:31:38) - Responsible scaling policies
(01:58:25) - Paul’s alignment research
(02:35:01) - Will this revolutionize theoretical CS and math?
(02:46:11) - How Paul invented RLHF
(02:55:10) - Disagreements with Carl Shulman
(03:01:53) - Long TSMC but not NVIDIA

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe