Messy, complicated, and gold.
Meenakshi (Meena) Das
CEO at NamasteData.org | Advancing Human-Centric Data & AI Equity
Welcome to Data Uncollected, a newsletter designed to enable nonprofits to listen, think, reflect, and talk about data we missed and are yet to collect. In this newsletter, we will talk about everything the raw data is capable of – from simple strategies of building equity into research+analytics processes to how we can make a better community through purpose-driven analysis.
Remember last week I asked you a question:
What is the common theme in the picture?
Answer: pure, raw text generated from each scenario.
I mean, there could be more themes, sure. So the guesses you made are going into my list of topics. For today, this is what I want to talk to you about – the text we generate.
Messy, complicated, and gold – that's how I think about text, or qualitative data. Reading up on it kept me busy enough that I paused writing last week (gosh, I missed you last week!).
Let’s start with some examples to understand just how much text data we are talking about.
And the boxes you see above are only four scenarios where you are already collecting text data. There are plenty more. Remember, all the text data collected are in different formats, available in varying degrees of cleanliness, and perhaps not all point to the same constituents. For example, text data from donor surveys is about a different population than post-event surveys, where not all your donors become your event attendees.
So, why is this data important? What insights could it provide, and why is the processing of this data a challenge? Let's break down those questions into smaller components.
3 reasons why text data is essential:
1. You get to know the sentiments of your constituents beyond the limitations of quantitative data.
For example, take the Likert-scale question ("How would you rate product A for its effectiveness?", with options being Very effective, Effective, Somewhat effective, Not very effective, and Not at all effective). When asked of a diverse population vs. a homogeneous population, Likert-scale questions have different outcomes. A strategically placed follow-up text question can help capture anecdotes around the extreme responses. For example, say those who selected "Not at all effective" are asked, "Please consider sharing your thoughts around the effectiveness of Product A". We'll talk more about this in a near-future edition.
2. You get to augment your coded/quantitative data in predictive modeling.
Now I know this is a broad and subjective topic – that is, to what extent your text data can augment your models. It depends especially on the source of the text data – text from 25 gala members is going to look different from text from 350 survey respondents.
Remember, a simple way to tell whether your text data supports your modeling is to check that the source of the text is relevant to the question your model is trying to answer. When it is, this data can be beneficial in the model. Processed and coded text data can help you capture some of those points that were never captured quantitatively in the first place.
For example, say you are building a predictive model for the next likely major gift donors from your pool of active supporters over the past 5 years. An uncaptured data point could be an indicator of non-committal prospect affinity. Say a fundraiser met a prospect who didn't make a gift right away. However, in his post-meeting notes, the fundraiser described the prospect as "keen, made the meeting on time, interested, and asked questions". Texts like these can give you a directional indication of people interested in your work.
3. Consciously coding text data obtained from a diverse population can help check those biases that creep into data.
I will share a recent real-life experience I had with a local nonprofit. One of the organizations that I support financially works towards helping journalists and ethical journalistic practices. I have been consistently giving to their mission in the past few months – all within $100 (including the amount to make a point). I have also left lots of positive comments and feedback whenever I could (feedback pages, emails, Google reviews, etc.). A few days ago, in one of their monthly newsletters, I found a program that got my interest. They were raising money for journalists from marginalized communities. The newsletter had a picture and description of the program. I found a contact email in the description and reached out to ask for more details. The response I received back within a day was, "This is a program for high net worth selected donors. We don't think this is an appropriate program for you. However, you may be interested in [xyz]….".
I admit – I was deeply disappointed to read that response. Not being included for learning more details, despite my regularly uplifting comments and (albeit small) gifts, seemed like a missed opportunity for them to have a potentially more vital supporter.
Though this example has many points to unpack, something to reiterate here is: code that text data! You may have supporters in your data who are giving you feedback, positive comments, suggestions, and constructive questions, and who only want to engage further.
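To make "code that text data" a little more concrete, here is a minimal sketch of a keyword-based codebook applied to the fundraiser note from the earlier example. The code names, the keywords, and the `code_text` helper are all my own illustrative assumptions, not a validated coding scheme; a real codebook would come from reading your own data.

```python
import re

# Hypothetical codebook: each code maps to keywords that suggest it.
# These codes and keywords are illustrative assumptions only.
CODEBOOK = {
    "interest": {"keen", "interested", "curious"},
    "engagement": {"questions", "asked", "feedback"},
    "reliability": {"on time", "punctual"},
}


def code_text(text):
    """Return the set of codes whose keywords appear in the text."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z']+", lowered))
    codes = set()
    for code, keywords in CODEBOOK.items():
        for kw in keywords:
            # Multi-word keywords are matched as phrases,
            # single-word keywords as whole words.
            if (" " in kw and kw in lowered) or kw in words:
                codes.add(code)
                break
    return codes


note = "Keen, made the meeting on time, interested, and asked questions."
print(sorted(code_text(note)))
```

Each code can then sit as a yes/no column next to your quantitative fields. Keyword matching is crude (it misses sarcasm, negation, and context), so treat the output as a first pass for a human to review.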
Challenges of processing text data:
So, if text data is so valuable, what are the challenges in processing it?
There are primarily two ways you can code your text data – manually/semi-manually or using modeling.
Manually/semi-manually means you or someone from your team will go over the text data, line by line, to code it. This way of processing the text can be either be
This is useful but time-consuming.
Alternatively, you can also use modeling to classify, segment, and analyze text. However, the challenge of using modeling is that you need to build in time to learn and experiment with this technique. Also, every time you make a new text classification model, you may need to spend time on exploration, some cleaning, and pre-processing. Using a modeling technique does not mean you can skip manually exploring your data. You may still need to do that; what you save is the time spent on coding.
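As a rough illustration of the exploration, cleaning, and pre-processing step mentioned above, here is a small sketch using only Python's standard library. The stop-word list, the sample comments, and the helper names `clean` and `top_terms` are made up for this example.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real project would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "was", "is", "it", "i", "for", "in"}


def clean(text):
    """Lowercase the text, strip punctuation, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def top_terms(responses, n=3):
    """Count the most frequent remaining terms across all responses."""
    counts = Counter()
    for response in responses:
        counts.update(clean(response))
    return counts.most_common(n)


# Made-up post-event survey comments, for illustration only.
responses = [
    "The event was inspiring, and the speakers were great!",
    "Great speakers. Parking was difficult.",
    "Inspiring event; parking needs work.",
]

print(top_terms(responses, n=5))
```

Even a simple count like this can hint at which themes deserve a code before you invest in a model.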
3 points to remember for using text data in predictive modeling:
Some AI-based products do help in this classification. Now, we can dedicate a whole separate edition to text processing with modeling techniques. However, here are 3 essential things I want you to remember for now when you are exploring text data with predictive modeling in real-time:
For example, texts with explicit statements around specific ethnicity, race, or sexual orientation are often found with toxic sentiments. Therefore, unless you check your text data carefully enough before feeding it into your models, chances are texts related to some marginalized social identifiers may be misclassified and rendered as "not useful".
2. Before leveraging modeling on text data, define an outcome to use the coded text data from modeling. Do you have enough text supporting the different categories/codes you are looking for to achieve that outcome?
3. Understand the evaluation metrics your model uses – what metrics of success are used to confidently classify and segment the text.
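Points 2 and 3 can be sketched in a few lines. This is only an illustration under my own assumptions: the labels are invented, the `min_examples` threshold is arbitrary, and precision and recall are just two common evaluation metrics, not necessarily the ones your tool reports.

```python
from collections import Counter


def sparse_codes(labels, min_examples=5):
    """Return codes that have fewer than min_examples coded texts."""
    counts = Counter(labels)
    return sorted(code for code, n in counts.items() if n < min_examples)


def precision_recall(actual, predicted, code):
    """Precision and recall for one code, from paired actual/predicted labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == code and p == code)  # correctly assigned
    fp = sum(1 for a, p in pairs if a != code and p == code)  # wrongly assigned
    fn = sum(1 for a, p in pairs if a == code and p != code)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented labels: what a human coder assigned vs. what a model assigned.
actual = ["praise", "praise", "complaint", "question", "praise", "complaint"]
predicted = ["praise", "complaint", "complaint", "praise", "praise", "complaint"]

print(sparse_codes(actual, min_examples=3))
print(precision_recall(actual, predicted, "praise"))
```

Precision asks, "of the texts the model labeled with this code, how many were right?" Recall asks, "of the texts that truly belong to this code, how many did the model find?" A code with very few examples will make both numbers unstable.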
I understand I am exploring lots of mini themes here. Now, you see why I needed more time last week to read? Send me a message if you are reading this, agree with the general topic here, and yet feel overwhelmed. Text data in the modeling ecosystem is a broad topic, made of many minor points about data collection, intent, technique, and analysis. Start small and build your comfort with this topic in easy steps.
Like I said, messy, complicated, and gold – that's text data for you.
***So, what do I want from you today (my readers)?
Today, I want you to
***For those reading this newsletter for the first time, here is some intro of this newsletter for you. :)
Content Consultant for Nonprofits and the People and Companies Who Serve Them
3 years ago: Your experience with the journalism nonprofit reminds me of our conversation last week about what nonprofits miss out on when they only look at one kind of data, or only consider giving capacity as a generosity marker–passionate and engaged donors like you!
Nonprofit fundraiser focused on equity and community
3 years ago: (in best Mommie Dearest voice) No more dirty data!!!
CEO at NamasteData.org | Advancing Human-Centric Data & AI Equity
3 years ago: Marisa DeSalles You were right last week: flawed assumptions do deliver dirty data, which supports making bad decisions. I am pulling that theme around texts/qualitative data. Guess we have more to chat about in the next coffee!