How to Perform Bayesian Linear Regression in Python + R

Matt Rosinski

Senior Data Scientist | Business Insights | Causal AI

Published: April 4, 2023

In the previous edition of Data Science Code in Python + R we walked through how to build your own data set with spotifyr. This week we will be performing Bayesian linear regression with this dataset.

One of the biggest challenges to getting started with Bayesian regression is setting up your environment. DataCamp-style courses on Bayes tend to gloss over this issue. But before we launch into that, let's start with a little motivation. Why on earth would you go to all the trouble of installing Stan and a C++ toolchain when you can simply use ordinary classical linear regression?

Why go to the trouble of Bayesian regression

There are several potential reasons for using Bayesian approaches, but one is that you can estimate probability distributions for your model parameters. The classical linear regression approach does not give you this, since it assumes that the parameters are fixed and your data are random. The Bayesian approach takes the opposite view: the data you have observed are fixed and the parameters are random, with your priors describing what you believe about the parameters before seeing the data. This leads to different abilities. Of particular interest to me is the ability to quantify the uncertainty in your parameters. This can be very useful for decision making, because you can estimate the probability that a parameter lies in a given range. The other is the ability to extend your data simulation skills.
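To make the idea of a parameter distribution concrete, here is a minimal sketch (not from the original article) that approximates the posterior of a single slope by brute force on a grid. The toy data, the flat prior over the grid, the line through the origin, and the known noise standard deviation are all assumptions made for illustration. Stan uses MCMC sampling rather than a grid, but the output has the same flavour: a distribution you can query for probabilities.

```python
import math

# Hypothetical toy data (not the Spotify data set): y is roughly 2 * x plus noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
SIGMA = 1.0  # assumed known noise standard deviation


def log_likelihood(slope):
    """Gaussian log-likelihood of the data for a line through the origin."""
    return sum(-0.5 * ((yi - slope * xi) / SIGMA) ** 2 for xi, yi in zip(x, y))


# Candidate slopes from 0 to 4 in steps of 0.01, with a flat prior over the grid.
grid = [i / 100 for i in range(401)]
log_post = [log_likelihood(b) for b in grid]

# Subtract the maximum before exponentiating for numerical stability,
# then normalise so the grid probabilities sum to 1.
max_lp = max(log_post)
weights = [math.exp(lp - max_lp) for lp in log_post]
total = sum(weights)
posterior = [w / total for w in weights]

posterior_mean = sum(b * p for b, p in zip(grid, posterior))
prob_slope_above_1_9 = sum(p for b, p in zip(grid, posterior) if b > 1.9)

print(f"posterior mean slope: {posterior_mean:.3f}")
print(f"P(slope > 1.9 | data): {prob_slope_above_1_9:.3f}")
```

The last line is the kind of statement a classical point estimate does not give you directly: a probability that the parameter exceeds a threshold you care about.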

Bayesian regression with Stan

Now that you might have a little more motivation, today we are simply going to get you started. Our goals are to:

Install C++ toolchain
Install PyStan and/or RStanArm
Verify your installation is working with your first model

On top of getting your environment ready for Bayesian regression you may also need to learn a new programming language. The overwhelming majority of recommendations I have found are to use Stan, primarily because of its popularity and community support. Stan is a language developed by Andrew Gelman and collaborators. The documentation and installation instructions can be found on the Stan website (mc-stan.org).

If you don't fancy launching into learning a new language the R code utilises a package that has abstracted the Stan language away from the user. This will allow you to get started with building Bayesian models quickly. The Python code example will show you how simple linear regression can be performed using Stan.

Installation

You need to configure your R installation to be able to compile C++ code. Follow the instructions for your operating system in the RStan Getting Started guide.

To work with PyStan we also need a C++ compiler: gcc ≥9.0 or clang ≥10.0. The first caveat, as mentioned in the documentation, is that installing PyStan on a Windows machine is challenging, so please don't waste your time on it. If you are using Windows, my recommendation is to set up Windows Subsystem for Linux (WSL). This video from Learn Stan with Ric will run you through the entire process, including gcc installation.

Following the pystan 3.6.0 documentation, you can install with pip:

python3 -m pip install pystan
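Before trying a model, a quick sanity check can save some frustration. The sketch below is an addition to the article, and the helper name check_environment is hypothetical. It only reports whether a compiler executable is on your PATH and whether the stan module can be found. It does not check compiler versions.

```python
import importlib.util
import shutil


def check_environment():
    """Report whether a C++ compiler and the PyStan module can be found."""
    return {
        "gcc": shutil.which("gcc") is not None,
        "clang": shutil.which("clang") is not None,
        # PyStan 3 is installed as `pystan` but imported as `stan`.
        "stan": importlib.util.find_spec("stan") is not None,
    }


status = check_environment()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'not found'}")
```

You need at least one of gcc or clang, plus stan, to report as found before moving on.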

Installing RStanArm should be far simpler and I recommend you install the current CRAN version following the instructions in the Stan documentation.

The last tip if you are working in Windows is to launch VSCode from WSL using the terminal command >> code. This should install and open a VSCode IDE within your WSL virtual Linux machine. An alternative is to install and launch Jupyter lab but I found running Bayesian simulations with PyStan to be simpler directly in VSCode.

Performing Simple Bayesian Linear Regression in R

Once you have completed the installations you should be ready to attempt your first Bayesian regression. To begin gently we will use the RStanArm package which is a high level API for Stan using standard R functions. In the code below we will import the spotifyr data we generated in the previous edition of Data Science Code in Python + R. As you can see the syntax is very similar to a standard linear regression.

To perform the same task using PyStan in a Jupyter notebook we will need to do quite a bit more work.

Bayesian Linear Regression Using PyStan on a Jupyter Lab Notebook

The first step is simple enough.

But in the next step we need to import stan and set up the code using Stan syntax.

Please refer to the Stan documentation for more details about the Stan language. ChatGPT also does a great job of translating RStanArm code into Stan so you might want to try that approach!
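As a rough illustration of what setting up the code using Stan syntax can look like, here is a sketch of a simple linear regression written as a Stan program inside a Python string, plus a helper that builds the data dictionary Stan expects. This is not the article's original code: the variable names, the priors, and the helper make_stan_data are assumptions, and the numbers in the usage lines are placeholders rather than Spotify data.

```python
# Stan program for simple linear regression, held as a Python string.
# The weakly informative priors below are illustrative assumptions.
STAN_CODE = """
data {
  int<lower=0> N;      // number of observations
  vector[N] x;         // predictor
  vector[N] y;         // outcome
}
parameters {
  real alpha;          // intercept
  real beta;           // slope
  real<lower=0> sigma; // residual standard deviation
}
model {
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  sigma ~ exponential(1);
  y ~ normal(alpha + beta * x, sigma);
}
"""


def make_stan_data(x, y):
    """Build the data dictionary matching the data block of STAN_CODE."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return {"N": len(x), "x": list(x), "y": list(y)}


# Placeholder values only; swap in two numeric columns from your own data set.
stan_data = make_stan_data([0.1, 0.4, 0.7, 0.9], [0.2, 0.5, 0.6, 1.0])
print(stan_data["N"])
```

The keys of the dictionary must match the names declared in the data block, which is the most common place for a first model to fail.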

The next thing you need to do is compile the model. As you can see below, we need to import nest_asyncio to run Bayesian regression from within a Jupyter lab notebook. I've used the interactive jupyter window within VSCode for the example below.
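The compile-and-sample step might look roughly like the sketch below, which takes a Stan program string and a data dictionary as inputs. It is a reconstruction under assumptions rather than the article's original code: the function name fit_stan_model and its defaults are hypothetical, and the summary comes from pandas describe() as a stand-in for however the article's summary_df was produced. The imports sit inside the function so it can be defined on a machine without PyStan. Calling it requires pystan, nest_asyncio, and pandas to be installed.

```python
def fit_stan_model(program_code, data, num_chains=4, num_samples=1000, seed=1):
    """Compile a Stan program with PyStan 3, sample, and summarise the draws."""
    # Imported lazily so this function can be defined without PyStan installed.
    import nest_asyncio
    import stan

    # Jupyter already runs an asyncio event loop; nest_asyncio lets PyStan
    # run its own loop inside it.
    nest_asyncio.apply()

    # stan.build compiles the model, which can take a minute on first run.
    posterior = stan.build(program_code, data=data, random_seed=seed)
    fit = posterior.sample(num_chains=num_chains, num_samples=num_samples)

    # One row per draw, one column per parameter (plus sampler diagnostics).
    draws = fit.to_frame()

    # Transposed describe() gives one row per column of draws,
    # including a "mean" column for each parameter.
    return draws.describe().T
```

A call such as fit_stan_model(stan_code, stan_data), where both arguments are your own, returns a table whose mean column can be compared with the estimates from RStanArm.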

model fit parameters for stan model from Python

If you look at the mean column in the resulting summary_df above you will notice that these parameters are very close to those estimated using RStanArm, but with a lot of additional work!

Summary

If you just want to try Bayesian Regression out I highly recommend the RStanArm package in R. It will get you up and running far quicker than running on Python. If you want to try out the Python example, be prepared for some false starts and a bit of additional pain, especially if you are working in Windows. Next week we will discuss post simulation checks and making predictions with our Bayesian model. If you made it this far, however, take a well-earned break and we'll see you next week!

If you enjoyed this edition of Data Science Code in Python + R, please like, subscribe, and share it with friends who are interested in Bayesian regression.

~ Matt

Alessio Polymeropoulos

1 year ago

Awesome work, but I believe it is more valuable, especially for beginners, to provide all the steps in R with pure MCMC rather than with Stan. Bayesian model selection within linear regression is also a very hot topic, but a very challenging one.
