PaLM: Scaling Language Modeling with Pathways
Scaling language models using Google's Pathways system, achieving state-of-the-art performance across hundreds of language understanding and generation tasks.
Researcher | Engineer | Investor | Entrepreneur | Dad
Hi, I'm Vedant. I work on advancing the state of the art in AI and using it to build new products.
I am an AI researcher at Google DeepMind and a founding member of the Gemini core team. I led Minerva and worked on PaLM and PaLM 2. I led the Algorithms and Reasoning teams at OpenAI and co-developed Codex. I founded and was CEO of Kemvi (acquired by HubSpot), and led the ML Labs team at HubSpot, focused on developing new machine learning products with language. I studied physics at Columbia.
My work has been covered by TechCrunch, Fortune, Wired, Technology Review, and others, and I have publications and patents spanning machine learning, natural language processing, program synthesis, medical imaging, human-computer interaction, black hole physics, and quantitative finance.
2023 – Present: Google DeepMind
Founding member of the Gemini core team. Focused on large-scale model pretraining and capability induction in post-training.
2021 – 2023: Google
Led Minerva and worked on PaLM and PaLM 2.
2019 – 2021: OpenAI
Led the Reasoning, Algorithms Core, and Multimodal teams. Co-developed Codex. Managed 20+ researchers responsible for shipping Image GPT, Jukebox, GPT-f, DALL-E, and CLIP, and contributing to GPT-3.
2017 – 2019: HubSpot
Joined via acquisition of Kemvi to lead the Labs team. Launched new applications of deep learning in sales and marketing, including Smart Compose, transcription, automated image captioning, search typo correction, and signup fraud prevention.
2013 – 2017: Kemvi
Founded a deep learning startup, later acquired by HubSpot. Raised angel and venture rounds, and led product strategy and the research and engineering team, which focused on generating customized sales and marketing content with deep learning.
2003 – 2010: Columbia
Studied physics with research in theoretical cosmology (exotic black holes), experimental astrophysics (cosmic microwave background), and experimental particle physics (electron bubbles in liquid helium).
I invest in and advise technology companies
Introduced the HumanEval benchmark for evaluating the code generation capabilities of large language models; foundational work for GitHub Copilot and similar tools.
Technical report on Gemini, Google DeepMind's multimodal AI model family with state-of-the-art capabilities across text, image, audio, and video understanding.
Long-context understanding with a context window of up to 10M tokens, enabling new applications in document analysis and reasoning.
Next generation of PaLM with improved multilingual, reasoning, and coding capabilities.
Comprehensive benchmark (BIG-bench) with over 200 tasks for evaluating language model capabilities beyond simple imitation.
Demonstrated how language models can solve complex mathematical and quantitative reasoning problems through improved training approaches.
Latest generation Gemini model with enhanced reasoning, multimodality, and agentic capabilities.
Open-source language model family designed for responsible AI development and deployment.
Discovered the "grokking" phenomenon where neural networks suddenly generalize long after overfitting, with implications for understanding deep learning.
Analysis of how language models generalize to longer sequences than seen during training.
Classification of eccentric timelike orbits in charged black hole spacetime using dynamical systems theory, with applications to gravitational wave astronomy.
Patent for systems and methods to automatically generate personalized content for sales and marketing communications using machine learning.
Patent for methods and systems for dynamic visualization of complex information and data patterns.
Statistical analysis revealing evidence of market manipulation ("bear raids") at the beginning of the 2007 financial crisis.
Application of AI and statistical methods to improve diagnosis and treatment of glaucoma.
Research on adversarial attacks against transformer-based language models.
Analysis of vulnerability patterns in complex high-dimensional systems using network theory and statistical methods.
Analysis of short selling regulations and their impact on market stability, presented to the SEC.