Ask an Analyst | What is a Model?
John P. Gough
Assistant Vice President, Editor of the Journal of Advancement Analytics
Do you work in fundraising? Have you ever wondered what all those people in development/advancement services are doing at your organization? Have you ever sat in a meeting with your data people and thought, "What on earth are they talking about?" If so, this mini-series is for you! Over the course of the next 12 months I’ll be answering questions frequently asked by non-technical fundraising professionals. If you have a question you would like answered, feel free to leave it in the comments section below.
What is a Model?
'Model' is a term that gets thrown around quite a lot in our world of big data: model this, model that; models seem to be the magical solution for everything. But what is a model, really? Surely your data analyst doesn’t have the figure for professional modeling (at least the analyst writing this doesn’t), so what could they possibly be doing?
Put simply, a model is a depiction or description of behaviors, relationships, and/or processes found in the real world. These models are built using either mathematical or graphical representations of real-world relationships.
When data people start talking about models there are two broad categories of model they are likely referring to:
- Data Model
- Predictive Model
Data Models
A data model is a representation of the relationships that exist within or among the data themselves, and it is created to help design information storage systems, like databases, that store facts about real-world processes. Within this category there are three different levels of model: conceptual, logical, and physical.
Conceptual models are high-level representations of relationships that exist between entities. An entity is simply an object, person, process, or concept that interacts with other objects, people, processes, or concepts. For example, a conceptual model describing the relationship between a major gift officer and a donor would look something like this:
This model is constructed using something called Chen notation. The rectangles represent entities, the diamonds represent relationships, and the 'M:N' describes the cardinality of the relationship. In this case we have what is called a many-to-many relationship, meaning many donors can be solicited by the same gift officer, and a donor can be solicited by many different gift officers. These models can become very complex as they are built to describe real-world processes:
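For the programmatically inclined, the many-to-many idea can be sketched in a few lines of Python; the officer and donor names here are made up purely for illustration:

```python
# Each row records one "solicits" relationship between an officer and a donor.
solicits = [
    ("Officer A", "Donor 1"),
    ("Officer A", "Donor 2"),
    ("Officer B", "Donor 2"),  # Donor 2 is solicited by two different officers
]

# Build both directions of the many-to-many relationship.
donors_by_officer = {}
officers_by_donor = {}
for officer, donor in solicits:
    donors_by_officer.setdefault(officer, set()).add(donor)
    officers_by_donor.setdefault(donor, set()).add(officer)

print(sorted(donors_by_officer["Officer A"]))  # ['Donor 1', 'Donor 2']
print(sorted(officers_by_donor["Donor 2"]))    # ['Officer A', 'Officer B']
```

One officer maps to many donors and one donor maps to many officers, which is exactly what the M:N on the diagram asserts.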
The next level of data model is called a logical model. Derived from the conceptual model, it shows the actual data structures, their attributes, and their relationships to each other. A simple logical model looks something like this:
Finally, the physical model maps out the location of each data point in a specific database system, with the system's table and column names and something called a domain, which, among other things, specifies an attribute’s data type (e.g. birthdays are dates, ages are integers, and names are text).
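To make this concrete, here is a rough sketch, using Python's built-in SQLite, of what a physical model for the donor/gift-officer example might look like. The table and column names are hypothetical, each column is assigned a domain (its data type), and notice how the many-to-many 'solicits' relationship becomes its own junction table:

```python
import sqlite3

# A hypothetical physical model: every attribute gets a domain (data type),
# and the M:N "solicits" relationship is stored in a junction table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE donor (
        donor_id   INTEGER PRIMARY KEY,
        full_name  TEXT NOT NULL,   -- domain: text
        birth_date TEXT             -- domain: date (SQLite stores dates as ISO text)
    );
    CREATE TABLE gift_officer (
        officer_id INTEGER PRIMARY KEY,
        full_name  TEXT NOT NULL    -- domain: text
    );
    CREATE TABLE solicitation (     -- junction table for the M:N relationship
        officer_id INTEGER REFERENCES gift_officer(officer_id),
        donor_id   INTEGER REFERENCES donor(donor_id),
        PRIMARY KEY (officer_id, donor_id)
    );
""")
```

A real CRM schema is far larger, of course, but every table in it is answering the same question: where, physically, does each fact about the real world live, and what type of value is it?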
Having good data models that accurately reflect the reality of your processes and business rules is key to building efficient systems that make data both easy to enter and extract.
Predictive Models
Statistical Models
There are many different types of predictive model out there. Perhaps the most well known are statistical models, the simplest of which belong to the general linear model family. Most of us will have gone through at least one statistics course in our lives and will hopefully have encountered linear regression. Put simply, linear regression is a method of fitting a line to a set of data that best describes the relationships between the variables in that data, so that we can predict an outcome. Once the relationships have been described, we can use those descriptions to make predictions about the future. To illustrate this, I’m going to leave the world of fundraising and enter the world of diamonds, with no justification other than to say that diamonds are shiny and fun to talk about.
Suppose I was looking to get engaged and wanted to buy a solitaire diamond ring. I don’t want to overspend, but I also want to get the best bang for my buck. Further, suppose I also had at my disposal a large data set describing attributes of many individual diamonds and their sale price in dollars.
I could use this information to build a model that would predict the price I should be prepared to pay given any combination of characteristics. There are many assumptions associated with linear regression which I won’t go into here, but let’s assume that for now I’m only interested in the relationship between the size of a diamond in carats and its price, and want to build a model that will predict a diamond’s price based only on its carat weight. Here’s where a data scientist (which is what we like to call ourselves) would turn to a software tool like SPSS or R to fit (fit is a fancy term for build) the model so we don’t have to do all of the math by hand. In R, I would write a piece of code that would take my two variables and produce a model:
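For readers who want to see the machinery rather than trust the software, the same least-squares arithmetic can be written out by hand. Here is a minimal sketch in Python on a handful of made-up diamonds (the real fit above uses R on a much larger data set, so the numbers this toy example produces are illustrative only):

```python
# Simple linear regression (ordinary least squares) computed by hand
# on a few made-up diamonds. Illustrative only.
carat = [0.3, 0.5, 0.7, 1.0, 1.2, 1.5]
price = [500, 1500, 2500, 5500, 7000, 9500]

n = len(carat)
mean_x = sum(carat) / n
mean_y = sum(price) / n

# Slope b = covariance(x, y) / variance(x); intercept a = mean_y - b * mean_x
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(carat, price)) / \
    sum((x - mean_x) ** 2 for x in carat)
a = mean_y - b * mean_x

# R-squared: the share of price variability the fitted line explains
predicted = [a + b * x for x in carat]
ss_res = sum((y - p) ** 2 for y, p in zip(price, predicted))
ss_tot = sum((y - mean_y) ** 2 for y in price)
r_squared = 1 - ss_res / ss_tot
```

R's `lm()` function is doing essentially this (plus the standard errors and significance tests) when it fits the model.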
Out of all this confusing output, I’m really only looking for a few key things. First, I want to see that the 'Pr(>|t|)' values are all below .05; R will flag this by placing a '*' next to the value. This simply means that the estimated relationships are very unlikely to be the result of random chance. Next I want to see the 'Adjusted R-squared'; here it is .84, which means that the model explains 84% of the variability in the price. This is really good: by carat weight alone the model can account for the majority of the variability in price between the different diamonds. (I guess size really does matter.) Finally, I want to look at the 'Estimate' column, which gives us what we refer to as the model coefficients. For carat the coefficient is 7756.43, which means that for every increase of one carat in weight, we should expect the price of the diamond to increase by $7,756.43.
All of this can be reduced to a simple equation:
y = a + bx
or
Diamond Price = -2256.36 + (7756.43 * Carat)
So, if I wanted to predict the price of a one carat diamond I would enter the number into the equation as follows:
5500.07 = -2256.36 + (7756.43 * 1)
According to the model, for a one carat diamond I should expect to pay about $5,500. A quick Google search at the time of writing found that the average price of a loose one carat diamond on Google Shopping was $5,523. Not too bad!
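For the curious, the fitted equation is trivial to wrap in a small prediction function using the coefficients from the model above:

```python
def predict_price(carat):
    """Predicted diamond price in dollars, from the fitted model above."""
    return -2256.36 + 7756.43 * carat

print(round(predict_price(1.0), 2))  # 5500.07, the one-carat prediction above
print(round(predict_price(2.0), 2))  # about 13256.50 for a two-carat stone
```

Note that predictions only make sense within the range of carat weights the model was trained on; the negative intercept, for instance, would "predict" a negative price for a vanishingly small diamond.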
Regression isn't just good for pricing diamonds; I could use it to predict the number of gifts a gift officer will close in a given year based on several key performance indicators (KPIs), like the number of prospects they have in a given stage, how long those prospects have been in that stage, the number of in-person visits they've made this fiscal year, the number of written solicitations they have delivered, and so on.
Machine Learning
Another branch of predictive modeling is called machine learning. This type of modeling is often referred to as black-box modeling because it is left to the computer to find relationships and build models that predict behavior, but these models often do not explain behavior. One such technique is the neural network. Virtual nodes, much like the synapses in our brain, are created, trained, and then used to predict an outcome. By training we mean that we provide the computer an example where the outcome is known; for instance, we show it an image of an apple and tell it that it is looking at an apple. We then show it another image and let it determine for itself whether it is looking at an apple. If it answers correctly, the nodes that predicted the apple are reinforced; if it answers incorrectly, the nodes are readjusted and the computer moves on to the next image. At the end of the exercise we are left with a series of nodes with numerical weights that don’t really explain what makes a picture an image of an apple (they won’t tell us that apples are round and red, for example), but they will allow the computer to correctly identify one.
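The train-and-adjust loop described above can be sketched with the simplest possible "network": a single node, known as a perceptron. The apple-classification task and all of the numbers below are made up purely for illustration:

```python
# A single artificial "node" (perceptron) trained on labeled examples.
# Each example is (features, label): the features might be crude image
# measurements (say, redness and roundness); label 1 = apple, 0 = not apple.
examples = [
    ((0.9, 0.8), 1),  # red and round -> apple
    ((0.8, 0.9), 1),
    ((0.2, 0.3), 0),  # neither red nor round -> not apple
    ((0.1, 0.4), 0),
]

weights = [0.0, 0.0]
bias = 0.0
rate = 0.1  # how strongly each mistake adjusts the weights

for _ in range(20):  # several passes over the training examples
    for features, label in examples:
        activation = sum(w * f for w, f in zip(weights, features)) + bias
        guess = 1 if activation > 0 else 0
        error = label - guess  # 0 if correct, +1 or -1 if wrong
        # Wrong answers nudge the weights; right answers leave them alone.
        weights = [w + rate * error * f for w, f in zip(weights, features)]
        bias += rate * error
```

After training, the node classifies these examples correctly, yet the final weights are just a few numbers: they tell us nothing human-readable about what an apple is, which is exactly the black-box quality described above.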
All of these techniques can be applied to fundraising to predict donor behavior and provide insight into the relationships that exist between donors, institutions, gift officers, and philanthropic behaviors.
In Conclusion
Data is all around us, and the way it is structured and then analyzed is important for providing insight into the work that we do as fundraisers. Structuring our day-to-day business data in a way that supports our processes, and then ensuring the quality of that data, allows us to ask robust and sometimes complex questions with confidence. The tools and methods we use to answer those questions are rapidly evolving, and the frontier for analytics in every industry, not just fundraising, is expanding and full of potential. So the next time you are entering your contact reports, remember: you are contributing to the models that will predict the donors of tomorrow.
About the Author
John Gough is the Director of Reporting and Analytics in the Office of Development at the University of Texas at Austin. He is also on faculty at the University of Illinois' School of Information Science where he teaches graduate courses as an adjunct lecturer in database design and business analytics.