
Learning Bayesian Statistics

Alexandre Andorra
Latest episode

206 episodes

  • Learning Bayesian Statistics

    Bayesian Statistics vs Epistemology, with Vaden Masrani

    29/06/2026 | 1 h 40 min
    Support & Resources
    → Support the show on Patreon
    → Bayesian Modeling Course (first 2 lessons free)

    Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work

    Takeaways:
    Q: What's the difference between Bayesian statistics and Bayesian epistemology?
    A: Bayesian statistics uses Bayes' theorem on actual data: you put a prior over parameters, combine it with a likelihood, and the data is allowed to tell you your model is wrong. Vaden loves it. Bayesian epistemology, in his tongue-in-cheek phrase, is "Bayesian statistics minus the statistics" - taking Bayes' theorem as a general account of how anyone should reason under uncertainty, including about events where there is nothing to count. The first is falsifiable and grounded; the second, he argues, lets people attach authoritative-sounding numbers to pure belief.

    Q: Why is it a problem to put a probability on a one-off future event like human extinction?
    A: Because there are no statistics behind it. Vaden's trigger example is Toby Ord's The Precipice, where a data-derived probability (supervolcanoes per millennium) is placed side by side with a probability of extinction-by-superintelligence that came from no data at all. His reaction is the statistician's first instinct: where are the numbers coming from, and what could ever make them come out differently? A subjective degree of belief is fine as a hunch. The trouble starts when it is communicated as though it were an objective, data-grounded frequency.

    Q: What does Vaden Masrani actually like about Bayesian statistics?
    A: The freedom to encode domain knowledge as a prior and have the result respect common sense - estimating an average human height, you can rule out zero and a hundred feet before seeing a single measurement. But the part he keeps stressing is falsifiability: you fit the model, compare it to data, and the data can tell you the model was bad. That contact with reality is exactly what makes the statistics legitimate and what the epistemology lacks. On Bayesian-versus-frequentist for engineering problems, he says he has no dog in the fight -- both are useful, and any working statistician uses both.
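The height example above can be made concrete with a small sketch. Everything numeric here is an assumption for illustration (the prior, the six measurements, the known measurement spread), and the conjugate normal update is just the simplest model that shows the two points Vaden stresses: the prior rules out absurd values before any data arrive, and the data get a chance to flag a bad prior.

```python
import math
from statistics import NormalDist, fmean

# Hypothetical prior over the average adult height, in centimetres.
prior = NormalDist(mu=170.0, sigma=10.0)

# Prior mass on absurd averages: below 0 cm or above 100 ft (3048 cm).
prior_mass_impossible = prior.cdf(0.0) + (1.0 - prior.cdf(3048.0))

# Made-up measurements and an assumed, known measurement spread.
heights_cm = [162.0, 175.5, 168.2, 181.0, 158.4, 171.3]
SIGMA_OBS = 8.0


def update_normal_mean(prior, data, sigma_obs):
    """Conjugate normal-normal update for a mean with known sigma_obs."""
    prior_precision = 1.0 / prior.stdev ** 2
    data_precision = len(data) / sigma_obs ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (
        prior_precision * prior.mean + data_precision * fmean(data)
    )
    return NormalDist(post_mean, math.sqrt(post_var))


def prior_predictive_z(prior, data, sigma_obs):
    """How many standard deviations the sample mean sits from what the
    prior predicted for it. A large value means prior and data conflict."""
    spread = math.sqrt(prior.stdev ** 2 + sigma_obs ** 2 / len(data))
    return (fmean(data) - prior.mean) / spread


posterior = update_normal_mean(prior, heights_cm, SIGMA_OBS)
z_score = prior_predictive_z(prior, heights_cm, SIGMA_OBS)

print(f"prior mass on impossible heights: {prior_mass_impossible:.3g}")
print(f"posterior mean: {posterior.mean:.1f} cm, sd: {posterior.stdev:.2f} cm")
print(f"prior predictive z-score of the sample mean: {z_score:.2f}")
```

The z-score is the falsifiability step in miniature: had the measurements averaged 250 cm, it would be large and the prior would be the thing to revise.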

    Full takeaways here

    Chapters:

    00:24:01 What's the difference between Bayesian statistics and Bayesian epistemology?
    00:33:12 How can Bayesian epistemology lead to bad real-world decisions?
    00:36:36 Is Bayesian or frequentist statistics better for real-world problems?
    00:39:31 What is the problem of induction, and how does Bayesian epistemology try to solve it?
    00:43:50 What are the main logical problems with Bayesian epistemology?
    00:48:40 What is Popper's critical rationalism, and how does falsifiability fit in?
    00:52:31 How does critical rationalism work when you can't run a clean experiment?
    01:15:03 Why should you treat criticism as a gift, even when it hurts?
    01:19:54 How do Stoicism and equanimity help you handle criticism?
    01:23:19 Why does critical rationalism apply to everyday life, not just science?

    Thank you to my Patrons for making this episode possible!

    Links from the show here
  • Learning Bayesian Statistics

    Why Bayesian Statistics Is More Computational Than Ever

    19/06/2026 | 4 min
    Today's clip is from Episode 158 featuring Stefan Radev. In this conversation, Alex Andorra and Stefan break down a core argument from their paper: Bayesian statistics has never been more computational than it is now, and simulation is the thread that ties the whole workflow together.

    Stefan divides the Bayesian workflow into four stages, and this clip covers the first two. Stage one is model specification, where the workflow community has long recommended prior predictive checks. You can do this informally, just running simulations from your model and eyeballing whether the output meets your expectations, or formally, à la Michael Betancourt, by pushing your model's high-dimensional output through a transformation into a low-dimensional, interpretable space and checking it against reality.

    The punchline: a surprising number of models can be discarded before you've even seen real data, yet Stefan notes these checks remain underused in practice.
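A prior predictive check of the informal kind described above can be sketched in a few lines. The model (normally distributed heights in centimetres), both sets of priors, the dataset mean as the low-dimensional summary, and the plausibility window are all assumptions chosen for illustration, not anything taken from the paper.

```python
import random
from statistics import fmean


def simulate_dataset(rng, mu_prior, sigma_scale, n=50):
    """Draw parameters from the prior, then a dataset from the model."""
    mu = rng.gauss(*mu_prior)
    sigma = abs(rng.gauss(0.0, sigma_scale))  # half-normal prior on the spread
    return [rng.gauss(mu, sigma) for _ in range(n)]


def plausible_fraction(mu_prior, sigma_scale, n_sims=2000, seed=1):
    """Share of prior-simulated datasets whose mean height is plausible.

    The summary statistic (dataset mean) and the 100-250 cm window are
    hypothetical stand-ins for domain expertise.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        data = simulate_dataset(rng, mu_prior, sigma_scale)
        if 100.0 <= fmean(data) <= 250.0:
            hits += 1
    return hits / n_sims


# A "vague" prior versus one that encodes basic knowledge about heights.
vague = plausible_fraction(mu_prior=(0.0, 1000.0), sigma_scale=1000.0)
informed = plausible_fraction(mu_prior=(170.0, 10.0), sigma_scale=10.0)

print(f"vague prior:    {vague:.1%} of simulated datasets look plausible")
print(f"informed prior: {informed:.1%} of simulated datasets look plausible")
```

No real data are involved at any point, which is the sense in which a model can be discarded before data collection.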

    Stage two is model verification, where the question shifts to whether your inferences are well calibrated. This is the territory of simulation-based calibration and parameter recovery studies, classic tools that have always carried a steep computational price. You simulate thousands of synthetic datasets and run inference on every single one, which is exactly why these checks are so often skipped in papers, even though doing one well can be a contribution in its own right.

    Here's where amortized simulation-based inference changes the math entirely. Checks that used to take days now take seconds, and instead of laboriously running inference dataset by dataset, you get millions of posterior samples essentially for free. The calibration checks that the field has always known it should be doing finally become cheap enough to actually do.
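The simulation-based calibration loop from stage two can be sketched as follows. This is a toy under stated assumptions: the model is a conjugate normal one, so the "inference" step is an exact formula rather than an MCMC run, and the constants are arbitrary. In a realistic workflow, the line that computes the posterior is a full model fit per simulated dataset, which is where the computational price comes from.

```python
import math
import random

N_OBS = 10      # observations per simulated dataset
N_SIMS = 1000   # number of simulated datasets
N_DRAWS = 19    # posterior draws per dataset, giving ranks 0..19


def posterior_params(data):
    """Exact posterior for theta ~ N(0, 1), y_i ~ N(theta, 1)."""
    post_var = 1.0 / (len(data) + 1)
    return post_var * sum(data), math.sqrt(post_var)


def sbc_ranks(seed=0):
    rng = random.Random(seed)
    ranks = []
    for _ in range(N_SIMS):
        theta = rng.gauss(0.0, 1.0)                        # draw from the prior
        data = [rng.gauss(theta, 1.0) for _ in range(N_OBS)]
        mean, sd = posterior_params(data)                  # the "inference" step
        draws = [rng.gauss(mean, sd) for _ in range(N_DRAWS)]
        ranks.append(sum(d < theta for d in draws))        # rank of the true value
    return ranks


ranks = sbc_ranks()
histogram = [ranks.count(r) for r in range(N_DRAWS + 1)]
mean_rank = sum(ranks) / len(ranks)

# With calibrated inference the ranks are uniform on 0..N_DRAWS,
# so the histogram should look flat and the mean rank sit near N_DRAWS / 2.
print(histogram)
print(f"mean rank: {mean_rank:.2f} (uniform would give {N_DRAWS / 2})")
```

A skewed or U-shaped histogram would indicate biased or over-confident posteriors.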

    Get the full discussion here

  • Learning Bayesian Statistics

    Exact GPs vs Approximations: When to Use Each (and Why It Matters)

    10/06/2026 | 4 min
    Today's clip is from episode 159 featuring Matthijs Hollanders. In this conversation, Alex and Matthijs dig into a deceptively practical question: when you're modeling wildlife across space and time with Gaussian Processes, how do you keep the math from becoming computationally unbearable - and what does good engineering actually look like in the field?

    Matthijs explains that for most real camera trapping datasets, exact GPs still hold up fine. The reason is less about clever math and more about ecological reality: researchers are usually resource-constrained, so datasets tend to be a few hundred sites, not thousands.

    And when datasets do get large, they're rarely one giant connected grid - they're clusters of independent regions. That structure is exploitable. Run a separate, smaller GP per region, share the hyperparameters, and you avoid building the massive covariance matrix that makes exact GPs expensive in the first place.
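The per-region trick can be illustrated with a minimal pure-Python sketch. The squared-exponential kernel, the coordinates, the observations, and the hyperparameter values are all made up, and the 1-D inputs stand in for real site locations. The point is only that when regions are independent, the joint log-density is the sum of small per-region log-densities evaluated with the same shared hyperparameters.

```python
import math


def se_cov(xs, amp, ell, jitter=1e-6):
    """Squared-exponential covariance matrix with a small diagonal jitter."""
    n = len(xs)
    return [
        [
            amp ** 2 * math.exp(-0.5 * ((xs[i] - xs[j]) / ell) ** 2)
            + (jitter if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


def cholesky(a):
    """Lower-triangular Cholesky factor; this is the O(n^3) step."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def gp_logpdf(ys, xs, amp, ell):
    """Log-density of ys under a zero-mean GP evaluated at xs."""
    low = cholesky(se_cov(xs, amp, ell))
    z = []  # forward substitution: solve low @ z = ys
    for i in range(len(ys)):
        z.append((ys[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(len(ys)))
    return -0.5 * (sum(v * v for v in z) + log_det + len(ys) * math.log(2 * math.pi))


# Two hypothetical regions, far apart relative to the lengthscale.
regions = {
    "north": ([0.0, 1.0, 2.0, 3.0], [0.2, -0.1, 0.4, 0.0]),
    "south": ([1000.0, 1001.5, 1002.0], [-0.3, 0.1, 0.5]),
}
AMP, ELL = 1.0, 1.0  # hyperparameters shared across regions

# One small GP per region: factorises a 4x4 and a 3x3 matrix.
per_region = sum(gp_logpdf(ys, xs, AMP, ELL) for xs, ys in regions.values())

# One big GP over everything: factorises a 7x7 matrix.
all_xs = [x for xs, _ in regions.values() for x in xs]
all_ys = [y for _, ys in regions.values() for y in ys]
joint = gp_logpdf(all_ys, all_xs, AMP, ELL)

print(f"sum of per-region log-densities: {per_region:.6f}")
print(f"single joint log-density:        {joint:.6f}")
```

Here the two numbers agree because the cross-region covariance is numerically zero; with real data, treating regions as independent is a modeling decision rather than a free lunch.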

    But the more interesting thread is where this is heading. Alex introduces Hilbert Space Gaussian Processes (HSGPs) - an approximation that makes compute time nearly linear in dataset size, rather than cubic. The catch, as Matthijs points out, is that approximations aren't always better: if your dataset isn't large enough to be in the regime where the approximation accuracy kicks in, you're better off with the exact GP and its mathematical guarantees. The rule of thumb is simple - if you can use the vanilla GP, just do it.
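For readers who want to see what the Hilbert space approximation does, here is a minimal 1-D sketch for a squared-exponential kernel, using the standard construction from sine basis functions on an interval [-L, L] weighted by the kernel's spectral density. The number of basis functions `m`, the boundary `L`, and the hyperparameters are arbitrary choices for illustration. This is not how any particular library implements HSGPs.

```python
import math


def se_kernel(x1, x2, amp, ell):
    """Exact squared-exponential kernel."""
    return amp ** 2 * math.exp(-0.5 * ((x1 - x2) / ell) ** 2)


def se_spectral_density(w, amp, ell):
    """Spectral density of the 1-D squared-exponential kernel."""
    return amp ** 2 * math.sqrt(2 * math.pi) * ell * math.exp(-0.5 * (ell * w) ** 2)


def hsgp_features(x, m, boundary, amp, ell):
    """m scaled basis functions at x; the kernel is approximated by
    the dot product of two such feature vectors."""
    feats = []
    for j in range(1, m + 1):
        sqrt_lam = j * math.pi / (2 * boundary)  # sqrt of the j-th eigenvalue
        phi = math.sqrt(1.0 / boundary) * math.sin(sqrt_lam * (x + boundary))
        feats.append(math.sqrt(se_spectral_density(sqrt_lam, amp, ell)) * phi)
    return feats


def hsgp_kernel(x1, x2, m, boundary, amp, ell):
    f1 = hsgp_features(x1, m, boundary, amp, ell)
    f2 = hsgp_features(x2, m, boundary, amp, ell)
    return sum(a * b for a, b in zip(f1, f2))


AMP, ELL = 1.0, 1.0
M, BOUNDARY = 40, 5.0                      # hypothetical tuning choices
grid = [-2.0 + 0.5 * i for i in range(9)]  # inputs well inside [-L, L]

max_abs_error = max(
    abs(hsgp_kernel(a, b, M, BOUNDARY, AMP, ELL) - se_kernel(a, b, AMP, ELL))
    for a in grid
    for b in grid
)
print(f"largest kernel approximation error on the grid: {max_abs_error:.2e}")
```

Because each input is reduced to a fixed-length feature vector, the work grows linearly with the number of observations for a fixed `m`, instead of requiring the cubic-cost factorisation of an n-by-n covariance matrix. The accuracy depends on `m` and `L` being adequate for the lengthscale, which is one reason the exact GP remains the safer default when it is affordable.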

    Get the full discussion here

  • Learning Bayesian Statistics

    #159 Bayesian Occupancy Models, with Matthijs Hollanders

    08/06/2026 | 1 h 26 min

    Takeaways:
    Q: What is a Bayesian occupancy model and what problem does it solve?
    A: An occupancy model accounts for the fact that you don't always detect a species when surveying for it, especially when the species is rare. A naive count of where you found it underestimates true occupancy. The model adds a repeated-measures component: you visit each site multiple times, and from the pattern of detections vs. non-detections it estimates a detection probability. Matthijs framed it as a zero-inflation structure where the zero-inflation happens at the site level rather than the observation level -- which keeps the model conceptually simple, just a standard GLM with a Bernoulli “is the species here at all?” stacked on top of a detection-rate process.

    Q: What are Automated Recording Units and why don't traditional occupancy models handle them well?
    A: ARUs are camera traps and acoustic monitors that record continuously over deployment periods of days, weeks, or months. The data they produce isn't a sequence of discrete human-led surveys; it's a continuous-time observation stream. Traditional occupancy models were designed for the discrete case -- a human visits a site, records yes or no, goes home. With ARUs, the question becomes how to bin or threshold the continuous data without losing the richer signal it actually contains.

    Q: When should you not reach for occARU?
    A: When your dataset is large and your survey interval is fine-grained. The bottleneck is Stan's fitting speed -- years of daily count data across many sites will fit slowly. The workaround is to bin coarser (weekly or monthly), which doesn't hurt occupancy estimation at all and only loses some detection-rate resolution. If you're only interested in occupancy, big grouping windows are fine.
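The site-level zero-inflation structure from the first takeaway, and the claim that coarser binning leaves occupancy estimation intact, can both be sketched with a toy count model. This is not occARU's actual likelihood: it assumes a single occupancy probability `psi` and a detection rate per day that is constant over time, and the counts are made up. Under those assumptions, daily and weekly binning give log-likelihoods that differ only by a constant, so they rank parameter values identically.

```python
import math


def poisson_logpmf(y, mu):
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def site_loglik(counts, days_per_bin, psi, rate):
    """Marginal log-likelihood of one site's binned detection counts.

    psi  : probability the site is occupied (hypothetical parameter name)
    rate : expected detections per day at an occupied site
    """
    mu = rate * days_per_bin
    ll_occupied = math.log(psi) + sum(poisson_logpmf(y, mu) for y in counts)
    if sum(counts) > 0:
        # Any detection proves the site is occupied.
        return ll_occupied
    # All zeros: either occupied but never detected, or truly unoccupied.
    return math.log(math.exp(ll_occupied) + (1.0 - psi))


def total_loglik(sites, days_per_bin, psi, rate):
    return sum(site_loglik(c, days_per_bin, psi, rate) for c in sites)


def bin_counts(daily, width):
    """Sum consecutive daily counts into bins of `width` days."""
    return [sum(daily[i:i + width]) for i in range(0, len(daily), width)]


# Made-up daily counts for three sites over two weeks.
daily_sites = [
    [0, 1, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0] * 14,
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
]
weekly_sites = [bin_counts(site, 7) for site in daily_sites]

theta_a = (0.6, 0.2)  # (psi, rate)
theta_b = (0.3, 0.5)

diff_daily = total_loglik(daily_sites, 1, *theta_a) - total_loglik(daily_sites, 1, *theta_b)
diff_weekly = total_loglik(weekly_sites, 7, *theta_a) - total_loglik(weekly_sites, 7, *theta_b)

print(f"log-likelihood difference, daily bins:  {diff_daily:.6f}")
print(f"log-likelihood difference, weekly bins: {diff_weekly:.6f}")
```

What coarser bins do give up is the ability to see detection rates that vary within a bin, which matches the loss of detection-rate resolution mentioned above.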

    Full takeaways here

    Chapters:
    00:12:14 What is an occupancy model and what problem does it solve?
    00:16:16 What are Automated Recording Units and why do they need different models?
    00:18:45 What is the occARU R package and why does it exist?
    00:23:55 Why does occARU model counts directly rather than binary detection?
    00:26:38 What does multi-species hierarchical modeling with Gaussian processes look like?
    00:32:22 How does occARU implement Gaussian processes efficiently?
    00:41:01 Why are Gaussian processes such a powerful but tricky modeling tool?
    00:44:11 What is variance decomposition with global-local shrinkage priors?
    00:49:02 How does occARU leverage recent Stan features for zero-sum constraints?
    00:57:37 When does within-chain parallelization actually help?
    01:01:30 How does Monte Carlo integration reduce high Pareto-k values?
    01:15:27 When does occARU underperform and what's on the roadmap?

    Thank you to my Patrons for making this episode possible!
    Links from the show here.
  • Learning Bayesian Statistics

    Can AI Learn What Experts Know? Automating Prior Elicitation with Generative Models

    02/06/2026 | 4 min
    Today's clip is from episode 158 featuring Stefan Radev. In this conversation, Alex and Stefan explore a genuinely fascinating problem: how do you turn an expert's intuition into a mathematically valid prior distribution - and can AI help automate that process?

    Alex explains that prior elicitation is essentially a translation problem. Experts don't walk around thinking in probability distributions - their knowledge lives in intuitions, rules of thumb, and rough ranges. The challenge is converting that into something a Bayesian model can actually use.

    The traditional approach? Ask an expert for quantiles or a mean, then parameterize your prior with hyperparameters and simulate until the model-implied quantities match what the expert described. If your pipeline is differentiable end-to-end, you use gradient descent. If not, you fall back to something like Bayesian optimization. Either way, you're iterating toward a prior that genuinely reflects expert knowledge - not just a convenient assumption.
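The traditional loop described above can be sketched as a small optimisation problem. The expert's quantiles, the Gaussian prior family, the starting values, and the learning rate are all hypothetical. To keep the sketch short, the model-implied quantiles come from a closed-form formula rather than from simulation, and gradients are taken by finite differences so that `implied_quantiles` could be swapped for any black-box pipeline.

```python
from statistics import NormalDist

# Hypothetical expert statement: 5%, 50% and 95% quantiles of the quantity.
EXPERT_QUANTILES = {0.05: 150.0, 0.50: 170.0, 0.95: 190.0}


def implied_quantiles(mu, sigma):
    """Quantiles implied by a Normal(mu, sigma) prior."""
    dist = NormalDist(mu, sigma)
    return {p: dist.inv_cdf(p) for p in EXPERT_QUANTILES}


def loss(mu, sigma):
    """Squared mismatch between implied and elicited quantiles."""
    implied = implied_quantiles(mu, sigma)
    return sum((implied[p] - q) ** 2 for p, q in EXPERT_QUANTILES.items())


def fit_prior(mu=100.0, sigma=5.0, lr=0.05, steps=500, eps=1e-4):
    """Plain gradient descent on the hyperparameters (mu, sigma)."""
    for _ in range(steps):
        grad_mu = (loss(mu + eps, sigma) - loss(mu - eps, sigma)) / (2 * eps)
        grad_sigma = (loss(mu, sigma + eps) - loss(mu, sigma - eps)) / (2 * eps)
        mu -= lr * grad_mu
        sigma = max(sigma - lr * grad_sigma, 1e-3)  # keep the scale positive
    return mu, sigma


mu_hat, sigma_hat = fit_prior()
print(f"elicited prior: Normal(mu={mu_hat:.2f}, sigma={sigma_hat:.2f})")
print({p: round(q, 1) for p, q in implied_quantiles(mu_hat, sigma_hat).items()})
```

With a non-differentiable pipeline, the same `loss` function could be handed to a gradient-free optimiser instead.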

    But the really exciting part is what came next. In a follow-up paper, they pushed this further: instead of optimizing within a fixed parametric family (say, a Gaussian), they replaced the prior entirely with a normalizing flow - a flexible generative network - and ran the same procedure. No assumed distribution family. Just let the data and the expert's knowledge shape the prior from scratch.

    The catch? More flexibility means more non-identifiability and stability headaches. But the direction is clear: a fully automated, end-to-end pipeline for building priors from non-probabilistic expert knowledge. And in 2026, that pipeline could theoretically be driven by an agent.

    Get the full discussion here

About Learning Bayesian Statistics
Are you a researcher or data scientist / analyst / ninja? Do you want to learn Bayesian inference, stay up to date or simply want to understand what Bayesian inference is? Then this podcast is for you! You'll hear from researchers and practitioners of all fields about how they use Bayesian statistics, and how in turn YOU can apply these methods in your modeling workflow. When I started learning Bayesian methods, I really wished there were a podcast out there that could introduce me to the methods, the projects and the people who make all that possible. So I created "Learning Bayesian Statistics", where you'll get to hear how Bayesian statistics are used to detect dark matter in outer space, forecast elections or understand how diseases spread and can ultimately be stopped. But this show is not only about successes -- it's also about failures, because that's how we learn best. So you'll often hear the guests talking about what *didn't* work in their projects, why, and how they overcame these challenges. Because, in the end, we're all lifelong learners! My name is Alex Andorra by the way. By day, I'm a senior data scientist. By night, I don't (yet) fight crime, but I'm an open-source enthusiast and core contributor to the Python packages PyMC and ArviZ. I also love Nutella, but I don't like talking about it -- I prefer eating it. So, whether you want to learn Bayesian statistics or hear about the latest libraries, books and applications, this podcast is for you -- just subscribe! You can also support the show and unlock exclusive Bayesian swag on Patreon!
Podcast website
