- Can you have a conversation with an AI
[00:02] where it feels like you're talking to Einstein or Feynman,
where you ask them a hard question,
they're like, "I don't know."
[00:10] And then after a week, they've done a lot of research-
- They disappear and come back.
[00:13] Yeah. - And they come back
and just blow your mind.
If we can achieve that,
that amount of inference compute
[00:19] where it leads to a dramatically better answer
as you apply more inference compute,
I think that will be the beginning
of, like, real reasoning breakthroughs.
(graphic whooshing)
AI-generated overview
Aravind Srinivas, CEO of Perplexity, discusses building an answer engine that combines search with large language models (LLMs) to provide cited, Wikipedia-style responses. Srinivas explains how Perplexity forces AI to cite sources for every claim, reducing hallucinations through retrieval-augmented generation (RAG). He draws inspiration from Google's founding principles, particularly Larry Page's focus on latency and user experience. The conversation explores technical advances in AI including transformers, self-attention mechanisms, and post-training techniques like RLHF. Srinivas argues that future AI breakthroughs will come from iterative inference compute rather than just pre-training scale. He discusses business models, contrasting Google's ad-based approach with Perplexity's subscription model, and addresses open-source AI, safety concerns, and the path toward AGI through bootstrapped reasoning and self-play mechanisms.
Perplexity's core innovation is forcing LLMs to cite sources for every sentence, similar to academic paper writing, which dramatically reduces hallucinations by grounding answers in verifiable web content.
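The citation-grounding idea above can be sketched as a minimal retrieval-augmented generation (RAG) loop: retrieve relevant snippets, then build a prompt that instructs the model to cite a numbered source for every sentence. This is an illustrative sketch only; the function names, the word-overlap scoring heuristic, and the prompt wording are assumptions, not Perplexity's actual pipeline.

```python
def retrieve(query, documents, k=2):
    """Rank documents by naive word overlap with the query and return the top k."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]


def build_cited_prompt(query, sources):
    """Build a prompt forcing the LLM to end each sentence with a source number."""
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(sources, start=1))
    return (
        "Answer using ONLY the sources below. "
        "End every sentence with the bracketed number of the source it came from.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {query}\nAnswer:"
    )


docs = [
    "The transformer architecture was introduced in 2017.",
    "Perplexity is an AI-powered answer engine.",
    "RAG grounds model answers in retrieved documents.",
]
query = "What is Perplexity?"
prompt = build_cited_prompt(query, retrieve(query, docs))
```

In a real system the retriever would be a web search or vector index rather than word overlap, but the grounding principle is the same: the model can only cite what was actually retrieved.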
The key to disrupting Google is not competing on their terms with traditional search, but creating an entirely different UI where answers, not links, occupy the primary real estate.
Future AI reasoning breakthroughs will come from applying more inference compute iteratively—allowing models to 'think' for extended periods (days or weeks) rather than just scaling pre-training data.
The transformer's success came from combining WaveNet's insight about parallel computation during training (via masked, causal convolutions) with attention mechanisms, enabling efficient GPU utilization through self-attention, an operation that itself has no learned parameters.