Episode 58: Building GenAI Systems That Make Business Decisions with Thomas Wiecki (PyMC Labs)
While most conversations about generative AI focus on chatbots, Thomas Wiecki (PyMC Labs, PyMC) has been building systems that help companies make actual business decisions. In this episode, he shares how Bayesian modeling and synthetic consumers can be combined with LLMs to simulate customer reactions, guide marketing spend, and support strategy.
Drawing from his work with Colgate and others, Thomas explains how to scale survey methods with AI, where agents fit into analytics workflows, and what it takes to make these systems reliable.
We talk through:
- Using LLMs as “synthetic consumers” to simulate surveys and test product ideas
- How Bayesian modeling and causal graphs enable transparent, trustworthy decision-making
- Building closed-loop systems where AI generates and critiques ideas
- Guardrails for multi-agent workflows in marketing mix modeling
- Where generative AI breaks (and how to detect failure modes)
- The balance between useful models and “correct” models
If you’ve ever wondered how to move from flashy prototypes to AI systems that actually inform business strategy, this episode shows what it takes.
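For a concrete feel for the synthetic-consumer idea, here is a minimal sketch, not code from the episode: it assumes an OpenAI-style chat API, and the persona and survey question are invented for illustration.

```python
# Sketch of a "synthetic consumer": an LLM answers a survey question
# in character. Persona and question are hypothetical; assumes the
# `openai` Python package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

persona = (
    "You are a 34-year-old parent of two in Ohio who shops on a "
    "tight budget and is skeptical of premium brands."
)
question = (
    "On a scale of 1-10, how likely are you to buy a whitening "
    "toothpaste that costs 20% more than your usual brand? "
    "Answer with the number first, then one sentence of reasoning."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": persona},
        {"role": "user", "content": question},
    ],
    temperature=1.0,  # keep some variance, as real respondents differ
)
print(response.choices[0].message.content)
# In practice you'd sample hundreds of personas and aggregate the
# scores into a distribution, rather than trusting any single reply.
```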
LINKS
The AI MMM Agent: An AI-Powered Shortcut to Bayesian Marketing Mix Insights (https://www.pymc-labs.com/blog-posts/the-ai-mmm-agent)
AI-Powered Decision Making Under Uncertainty Workshop w/ Allen Downey & Chris Fonnesbeck (PyMC Labs) (https://youtube.com/live/2Auc57lxgeU)
The podcast livestream on YouTube (https://youtube.com/live/so4AzEbgSjw?feature=share)
Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
🎓 Learn more:
Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338)
--------
1:00:45
--------
Episode 57: AI Agents and LLM Judges at Scale: Processing Millions of Documents (Without Breaking the Bank)
While many people talk about “agents,” Shreya Shankar (UC Berkeley) has been building the systems that make them reliable. In this episode, she shares how AI agents and LLM judges can be used to process millions of documents accurately and cheaply.
Drawing from work on projects ranging from databases of police misconduct reports to large-scale customer transcripts, Shreya explains the frameworks, error analysis, and guardrails needed to turn flaky LLM outputs into trustworthy pipelines.
We talk through:
- Treating LLM workflows as ETL pipelines for unstructured text
- Error analysis: why you need humans reviewing the first 50–100 traces
- Guardrails like retries, validators, and “gleaning”
- How LLM judges work — rubrics, pairwise comparisons, and cost trade-offs
- Cheap vs. expensive models: when to swap for savings
- Where agents fit in (and where they don’t)
If you’ve ever wondered how to move beyond unreliable demos, this episode shows how to scale LLMs to millions of documents — without breaking the bank.
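To make the pairwise-judge pattern concrete, here is a minimal sketch, assuming an OpenAI-style chat API; the rubric and candidate outputs are invented for illustration.

```python
# Sketch of a pairwise LLM judge: compare two candidate outputs
# against a rubric with a cheap model. Rubric and candidates are
# hypothetical; assumes the `openai` package and an API key.
from openai import OpenAI

client = OpenAI()

rubric = "The summary must name the officer, the date, and the alleged misconduct."
candidate_a = "Officer Smith was accused of excessive force on 2021-03-04."
candidate_b = "An officer did something bad at some point."

prompt = f"""You are grading two summaries against a rubric.

Rubric: {rubric}

Summary A: {candidate_a}
Summary B: {candidate_b}

Which summary better satisfies the rubric? Reply with exactly "A" or "B"."""

verdict = client.chat.completions.create(
    model="gpt-4o-mini",  # a cheap judge model is often good enough
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # keep the grading as deterministic as possible
)
print(verdict.choices[0].message.content)  # expected: "A"
# To control for position bias, run the comparison again with A and B
# swapped and keep only the verdicts that agree.
```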
LINKS
Shreya's website (https://www.sh-reya.com/)
DocETL, a system for LLM-powered data processing (https://www.docetl.org/)
Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
Watch the podcast video on YouTube (https://youtu.be/3r_Hsjy85nk)
Shreya's AI evals course, which she teaches with Hamel "Evals" Husain (https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME)
🎓 Learn more:
Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338)
--------
41:27
--------
Episode 56: DeepMind Just Dropped Gemma 270M... And Here’s Why It Matters
While much of the AI world chases ever-larger models, Ravin Kumar (Google DeepMind) and his team build across the size spectrum, from billions of parameters down to this week’s release: Gemma 270M, the smallest member yet of the Gemma 3 open-weight family. At just 270 million parameters, a quarter the size of Gemma 1B, it’s designed for speed, efficiency, and fine-tuning.
We explore what makes 270M special, where it fits alongside its billion-parameter siblings, and why you might reach for it in production even if you think “small” means “just for experiments.”
We talk through:
- Where 270M fits into the Gemma 3 lineup — and why it exists
- On-device use cases where latency, privacy, and efficiency matter
- How smaller models open up rapid, targeted fine-tuning
- Running multiple models in parallel without heavyweight hardware
- Why “small” models might drive the next big wave of AI adoption
If you’ve ever wondered what you’d do with a model this size (or how to squeeze the most out of it), this episode will show you how small can punch far above its weight.
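If you want to try it yourself, the model linked below runs in a few lines of Hugging Face transformers; a minimal sketch (the prompt is arbitrary, and you may need to accept the Gemma license on the Hub and log in first):

```python
# Minimal sketch: running Gemma 3 270M locally with Hugging Face
# transformers (pip install transformers torch). The model ID matches
# the Hugging Face link below.
from transformers import pipeline

generator = pipeline("text-generation", model="google/gemma-3-270m")

out = generator(
    "Write a one-line product description for a pocket flashlight:",
    max_new_tokens=40,
)
print(out[0]["generated_text"])
# At 270M parameters the weights are roughly half a gigabyte in
# bfloat16, small enough to fine-tune quickly on a single GPU.
```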
LINKS
Introducing Gemma 3 270M: The compact model for hyper-efficient AI (Google Developer Blog) (https://developers.googleblog.com/en/introducing-gemma-3-270m/)
Full Model Fine-Tune Guide using Hugging Face Transformers (https://ai.google.dev/gemma/docs/core/huggingface_text_full_finetune)
The Gemma 270M model on HuggingFace (https://huggingface.co/google/gemma-3-270m)
The Gemma 270M model on Ollama (https://ollama.com/library/gemma3:270m)
Building AI Agents with Gemma 3, a workshop with Ravin and Hugo (https://www.youtube.com/live/-IWstEStqok) (Code here (https://github.com/canyon289/ai_agent_basics))
From Images to Agents: Building and Evaluating Multimodal AI Workflows, a workshop with Ravin and Hugo (https://www.youtube.com/live/FNlM7lSt8Uk) (Code here (https://github.com/canyon289/ai_image_agent))
Evaluating AI Agents: From Demos to Dependability, an upcoming workshop with Ravin and Hugo (https://lu.ma/ezgny3dl)
Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
Watch the podcast video on YouTube (https://youtu.be/VZDw6C2A_8E)
🎓 Learn more:
Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) ($600 off early bird discount for November cohort available until August 16)
--------
45:40
--------
Episode 55: From Frittatas to Production LLMs: Breakfast at SciPy
Traditional software expects 100% passing tests. In LLM-powered systems, that bar isn’t just unrealistic; falling short of it is a feature, not a bug. Eric Ma leads research data science in Moderna’s data science and AI group, and over breakfast at SciPy we explored why AI products break the old rules, what skills different personas bring (and miss), and how to keep systems alive after the launch hype fades.
You’ll hear the clink of coffee cups, the murmur of SciPy in the background, and the occasional bite of frittata as we talk (hopefully also a feature, not a bug!).
We talk through:
• The three personas — and the blind spots each has when shipping AI systems
• Why “perfect” tests can be a sign you’re testing the wrong thing
• Development vs. production observability loops — and why you need both
• How curiosity about failing data separates good builders from great ones
• Ways large organizations can create space for experimentation without losing delivery focus
If you want to build AI products that thrive in the messy real world, this episode will help you embrace the chaos — and make it work for you.
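One way to make the “perfect tests” point concrete in code: assert a pass *rate* rather than perfection, so the suite tolerates expected flakiness while still catching regressions. A hypothetical pytest-style sketch (run_llm and the cases are stand-ins, not anything from the episode):

```python
# Hypothetical sketch: an eval-style test that expects *most*, not
# all, outputs to pass. `run_llm` and the cases stand in for your
# real system and labeled examples.
def run_llm(question: str) -> str:
    # Placeholder for a real model call.
    return "Paris" if "France" in question else "unsure"

CASES = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Spain?", "Madrid"),
    ("Name the capital of France.", "Paris"),
]

def test_accuracy_above_threshold():
    passed = sum(run_llm(q).strip() == expected for q, expected in CASES)
    pass_rate = passed / len(CASES)
    # A 100% pass rate here would be suspicious: it usually means the
    # cases are too easy to catch real failures. Alert on regressions.
    assert pass_rate >= 0.6, f"pass rate dropped to {pass_rate:.0%}"
```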
LINKS
Eric's website (https://ericmjl.github.io/)
More about the workshops Eric and Hugo taught at SciPy (https://hugobowne.substack.com/p/stress-testing-llms-evaluation-frameworks)
Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
🎓 Learn more:
Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) ($600 off early bird discount for November cohort available until August 16)
--------
38:08
--------
Episode 54: Scaling AI: From Colab to Clusters — A Practitioner’s Guide to Distributed Training and Inference
Colab is cozy. But production won’t fit on a single GPU.
Zach Mueller leads Accelerate at Hugging Face and spends his days helping people go from solo scripts to scalable systems. In this episode, he joins me to demystify distributed training and inference — not just for research labs, but for any ML engineer trying to ship real software.
We talk through:
• From Colab to clusters: why scaling isn’t just about training massive models, but serving agents, handling load, and speeding up iteration
• Zero-to-two GPUs: how to get started without Kubernetes, Slurm, or a PhD in networking
• Scaling tradeoffs: when to care about interconnects, which infra bottlenecks actually matter, and how to avoid chasing performance ghosts
• The GPU middle class: strategies for training and serving on a shoestring, with just a few cards or modest credits
• Local experiments, global impact: why learning distributed systems—even just a little—can set you apart as an engineer
If you’ve ever stared at a Hugging Face training script and wondered how to run it on something more than your laptop, this one’s for you.
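For a sense of what Accelerate abstracts away, here is a minimal sketch with a toy model and random data; the same script runs on CPU, one GPU, or several via accelerate launch.

```python
# Minimal Hugging Face Accelerate sketch: wrap the model, optimizer,
# and dataloader once, and the same loop runs on CPU, one GPU, or
# multiple GPUs (launched with `accelerate launch`). Model and data
# are toys; real training swaps in your own.
import torch
from accelerate import Accelerator

accelerator = Accelerator()

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = torch.utils.data.TensorDataset(torch.randn(256, 10), torch.randn(256, 1))
loader = torch.utils.data.DataLoader(dataset, batch_size=32)

# prepare() moves everything to the right device(s) and shards the
# dataloader across processes when running distributed.
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for x, y in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    accelerator.backward(loss)  # handles scaling / distributed sync
    optimizer.step()

accelerator.print(f"final loss: {loss.item():.4f}")  # main process only
```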
LINKS
Zach on LinkedIn (https://www.linkedin.com/in/zachary-mueller-135257118/)
Hugo's blog post on Stop Building AI Agents (https://www.linkedin.com/posts/hugo-bowne-anderson-045939a5_yesterday-i-posted-about-stop-building-ai-activity-7346942036752613376-b8-t/)
Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
Hugo's recent newsletter about upcoming events and more! (https://hugobowne.substack.com/p/stop-building-agents)
🎓 Learn more:
Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338)
Zach's course (45% off for VG listeners!): Scratch to Scale: Large-Scale Training in the Modern World (https://maven.com/walk-with-code/scratch-to-scale?promoCode=hugo39)
📺 Watch the video version on YouTube (https://youtube.com/live/76NAtzWZ25s?feature=share)
A podcast about all things data, brought to you by data scientist Hugo Bowne-Anderson.
It's time for more critical conversations about the challenges in our industry in order to build better compasses for the solution space! To this end, this podcast will consist of long-format conversations between Hugo and other people who work broadly in the data science, machine learning, and AI spaces. We'll dive deep into all the moving parts of the data world, so if you're new to the space, you'll have an opportunity to learn from the experts. And if you've been around for a while, you'll find out what's happening in many other parts of the data world.