Longitudinal Multilevel Modeling: A Fundamental Pillar in the Architecture of Machine Learning and Deep Learning Algorithms
Kay Chansiri, Ph.D.
Research Scientist | ML & GenAI for Social Impacts | Human-Computer Interaction
In today's post, we will step away from our usual focus on machine learning and AI to explore longitudinal multilevel modeling (MLM). This statistical approach is a foundational element underpinning ML methods such as Generalized Linear Mixed-Model (GLMM) trees, neural networks with hierarchical structures, deep learning models for structured data, and cluster analysis in unsupervised learning. A thorough understanding of MLM provides an excellent starting point for advancing further in your machine learning journey.
My previous article on cross-sectional multilevel modeling, published by Towards Data Science, has amassed over 40,000 reads so far. Almost three years after its publication, it's time to revisit the topic, this time focusing on longitudinal data. But before delving deeper, let's start with a quick quiz.
Imagine you're the lead data scientist at a streaming service company. Your team wants to test whether a new feature on the company's streaming channel improves customer satisfaction over three months. Two junior data scientists present you with different plots of user satisfaction. The first shows the average change in satisfaction over time for all users as a single line (Plot 1). The second is more complex (Plot 2): it shows how users vary in satisfaction at baseline and how their trajectories differ over time.
As the lead data scientist, which plot indicates that your junior colleague has built a model that accurately reflects real-world consumer behavior? And what are the mathematical equations behind each plot?
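Plot 1: [Figure: average user satisfaction over time, shown as a single line for all users]
Plot 2: [Figure: user-level satisfaction trajectories with differing baselines and slopes]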
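As a hedged sketch, assuming the standard two-level growth-model notation (repeated measurements t nested within user j), the two plots could correspond to the equations below. The article's exact notation may differ, and W_j is a hypothetical user-level predictor (for example, whether a user has access to the new feature) included only to show where γ01 and γ11 enter.

```latex
% Plot 1: a single average trajectory (every user shares one intercept and one slope)
Y_{tj} = \gamma_{00} + \gamma_{10}\,\mathrm{Time}_{tj} + r_{tj},
\qquad r_{tj} \sim N(0, \sigma^2)

% Plot 2: random intercepts and random slopes (W_j is hypothetical)
\begin{aligned}
\text{Level 1:}\quad & Y_{tj} = \beta_{0j} + \beta_{1j}\,\mathrm{Time}_{tj} + r_{tj},
  \qquad r_{tj} \sim N(0, \sigma^2) \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j} \\
                     & \beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j} \\
& \begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} \sim
  N\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{pmatrix}\right)
\end{aligned}
```

Setting γ01 = γ11 = 0 and u0j = u1j = 0 collapses the Plot 2 model back to the single line of Plot 1. In Plot 2, τ00 captures the spread of baselines across users, τ11 the spread of their slopes, and τ01 how baseline and change covary.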
In my newly published article on GitHub, I explore these questions and discuss the step-by-step process of conducting longitudinal MLM. Using the streaming service example, the article covers the following topics:
• Why does MLM's flexibility give it an edge over repeated measures ANOVA in longitudinal projects? Are the error terms in these two methods similar? I discuss the contexts in which you should choose one method over the other.
• What are the key MLM terms? If you're confused about fixed versus random effects, level 1 versus level 2 equations, residuals versus random effects, balanced versus unbalanced designs, and variance versus covariance in model interpretation, this article is for you.
• I also discuss general notation in MLM. If you find it challenging to tell apart γ01, γ11, β0j, β1j, and so on, let's unpack what these subscripts and Greek symbols mean and how reading them correctly helps you understand the clustering patterns in your data.
• The article walks through the end-to-end model-building process, from null models to models with random effects. You will explore how the Likelihood Ratio Test (LRT), Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC) help you compare these models and how to interpret their values.
• Lastly, I discuss the types of Intraclass Correlation Coefficient (ICC). What does it mean when you encounter an ICC close to zero or close to one?
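For the model-comparison step, here is a minimal standard-library sketch of how the LRT, AIC, and BIC are computed from each fitted model's log-likelihood and parameter count. The helper names and log-likelihood values are hypothetical; in practice the log-likelihoods would come from your mixed-model software. Two assumptions worth flagging: models that differ in fixed effects should both be fit with full maximum likelihood rather than REML, and n in BIC is taken here as the number of observations, a convention that varies across packages.

```python
import math


def aic(log_lik: float, k: int) -> float:
    """Akaike Information Criterion: 2k - 2 log L (lower is better)."""
    return 2 * k - 2 * log_lik


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian Information Criterion: k ln(n) - 2 log L (lower is better)."""
    return k * math.log(n) - 2 * log_lik


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: e^{-y} * sum_{i=0}^{df/2 - 1} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) plus a half-integer gamma series.
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def likelihood_ratio_test(ll_reduced, ll_full, k_reduced, k_full):
    """LRT for nested models; returns (statistic, df, p_value)."""
    stat = 2.0 * (ll_full - ll_reduced)
    df = k_full - k_reduced
    return stat, df, chi2_sf(max(stat, 0.0), df)


# Hypothetical fits (numbers invented for illustration only).
ll_ri, k_ri = -1250.0, 4  # random-intercept growth model: gamma00, gamma10, tau00, sigma^2
ll_rs, k_rs = -1240.0, 6  # adds slope variance tau11 and covariance tau01
n_obs = 1500

stat, df, p = likelihood_ratio_test(ll_ri, ll_rs, k_ri, k_rs)
print(stat, df)                             # 20.0 2
print(p)
print(aic(ll_ri, k_ri), aic(ll_rs, k_rs))   # 2508.0 2492.0
print(bic(ll_ri, k_ri, n_obs), bic(ll_rs, k_rs, n_obs))
```

In this made-up example, lower AIC and BIC both favor the random-slope model. Because the null hypothesis that a slope variance equals zero sits on the boundary of the parameter space, the plain chi-square p-value from this sketch tends to be conservative when testing random effects.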
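For the ICC question, here is a minimal sketch of the most common variant, the unconditional ICC from a null (random-intercept-only) model, ICC = τ00 / (τ00 + σ²). The function name and variance values are hypothetical illustrations, not output from the article, and other ICC types it covers may be computed differently.

```python
def icc_null_model(tau00: float, sigma2: float) -> float:
    """Unconditional ICC from a random-intercept (null) model.

    tau00:  between-user variance of the intercepts (level 2)
    sigma2: within-user residual variance (level 1)
    Returns the share of total variance that lies between users.
    """
    if tau00 < 0 or sigma2 < 0:
        raise ValueError("variance components must be non-negative")
    total = tau00 + sigma2
    if total == 0:
        raise ValueError("total variance is zero; the ICC is undefined")
    return tau00 / total


# Hypothetical variance components, invented for illustration.
print(icc_null_model(tau00=1.0, sigma2=3.0))  # 0.25
print(icc_null_model(tau00=0.0, sigma2=2.0))  # 0.0 -> no clustering by user
print(icc_null_model(tau00=9.0, sigma2=1.0))  # 0.9 -> most variance lies between users
```

An ICC near zero means repeated ratings from the same user are barely more alike than ratings from different users. An ICC near one means most of the variation lies between users, so each user's repeated ratings are nearly interchangeable.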
Dive into my full article on GitHub for a blend of theory and practical examples for your future multilevel modeling projects.
Link to my GitHub post
#DataScience #MachineLearning #Statistics #LongitudinalData #AI