Don't forget small data

“Great things are done by a series of small things brought together.” - Vincent van Gogh

At practically every meeting I have been to lately, someone drops “big data” and “AI” into the conversation (like they’re sprinkling shaved truffles on spaghetti). I have to resist the urge to roll my eyes. Please don’t get me wrong: I completely agree that big data analytics, AI and even blockchain can revolutionize the way we store and use data, and all of us will be irrelevant if we don’t understand and incorporate these rapidly evolving tools. However, in the social impact sector where we operate, it is often the small nodes of data that tell us the most about organizational capacity and potential impact. Not to mention, small data is usually all we have.

What exactly is small data? In this context, “small” data refers to small sets of particular indicators with a low volume of responses. In other words, big data is essentially a lot of small data. Where big data enables extensive statistical analysis and has immense predictive capability, small data can be misleading. Small data sets leave more room for selection and sampling bias, outliers, human inference and manipulation. All this makes small data appear unreliable and therefore irrelevant, but quite the opposite is true. Even if it remains unreliable at times, small data is critical to ensuring social impact is achieved.
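To make the "small data can be misleading" point concrete, here is a minimal simulation sketch in Python. The population, sample sizes and seed are invented for illustration: it repeatedly draws small samples and large samples from the same hypothetical pool of outcome scores and compares how widely the sample averages swing.

```python
import random
import statistics

random.seed(42)

# Hypothetical population of program-outcome scores (invented: mean ~70).
population = [random.gauss(70, 15) for _ in range(10_000)]

def mean_range(sample_size, trials=500):
    """Draw repeated samples and return the spread of their averages."""
    means = [statistics.mean(random.sample(population, sample_size))
             for _ in range(trials)]
    return min(means), max(means)

small_lo, small_hi = mean_range(15)     # "small data": ~15 responses
large_lo, large_hi = mean_range(1500)   # "big data": ~1,500 responses

# The small samples' averages wander far more than the large samples'.
print(f"n=15:   averages range from {small_lo:.1f} to {small_hi:.1f}")
print(f"n=1500: averages range from {large_lo:.1f} to {large_hi:.1f}")
```

Two survey rounds of fifteen respondents can honestly report quite different averages for the same underlying reality, which is exactly why small data demands the careful collection and analysis discussed below.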

Small data is critical for the very simple reason that it allows us to measure impact for very specific target groups, particularly vulnerable or underserved populations. It forces us to recognize unique issues faced by those individuals or communities, issues that might be discounted as outliers in a bigger data set. In these cases, we need small data to be representative.

Our job is to make small data more efficient to gather and more reliable to analyze despite the low volumes. We use small data to diagnose the capacity of organizations, to monitor social impact initiatives and to measure their impact. We employ techniques that are highly customizable and adaptable to the context and population we are measuring. Because of these challenges, the approach used to collect and analyze a small data set is what determines its value. The devil is in the details.

Firstly, when we set out to collect small data, the design of our collection tools and processes needs to account for potential blind spots. Effective data collection tools are often those that can be incorporated into regular operations rather than built as an entirely new process. The most effective data collection is supported by efficient systems. This doesn’t always require a robust, fully customized database, but it should have an element of technology to store and organize data securely and consistently. Data collection tools and processes should follow an iterative innovation process. Repetition, trial and error, redesign: all of these are essential. We don’t know what we don’t know, and we’ll only find out by bumping into it. Most importantly, and this is a common pitfall, we shouldn’t overcompensate for small data by trying to collect lots of data. A clear, condensed data set is far easier to dissect and analyze.

Secondly, how we analyze these data sets determines whether we fall into the traps created by the absence of larger data sets. Systems are still essential, but with a caveat. Systems are great for analyzing big data sets because they have a lot to work with. For small data sets, they are great for organizing data, but they are likely to be less effective than a trained human eye at drawing conclusions. That human eye needs to be conscientious in the analysis process, leveraging the system’s analytic tools without relying solely on them. Experience counts for a lot with small data: recognizing patterns and outliers, identifying valid causal links and disqualifying irregular correlations. In the analysis of small data, a researcher should borrow from multiple frameworks with a trial-and-error approach. Thus far, there has been no silver bullet in measuring impact. The monitoring and evaluation of organizations depends on an individual’s ability to sample intelligently from an a la carte menu of frameworks, incorporating the most relevant into the analysis.
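As a sketch of how an analytic tool can assist, rather than replace, the trained eye: the snippet below applies one standard technique (Tukey's interquartile-range rule) to a hypothetical indicator with twelve survey responses. The numbers are invented for illustration; the point is that the tool only *flags* the unusual response so a human can decide whether it is noise or a community with a unique issue.

```python
import statistics

# Hypothetical small data set: 12 survey responses for one indicator
# (values invented for illustration).
responses = [62, 58, 71, 64, 69, 60, 95, 66, 63, 59, 68, 61]

def iqr_outliers(values, k=1.5):
    """Flag values falling more than k * IQR outside the quartiles
    (Tukey's rule) for human review; do not discard them automatically."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

flagged = iqr_outliers(responses)
print(flagged)  # the one unusually high response is surfaced for review
```

In a big data set that value might be discarded as statistical noise; in a set of twelve, deciding what it means is precisely the judgment call the analyst's experience exists to make.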

We recognize that new approaches and new technology will continue to rapidly influence how we collect and analyze data. Big data will continue to provide insights that drive our approach and behavior. There are a multitude of ways that AI can contribute to better implementation of, and responsiveness to, data. Blockchain has changed, and will keep changing, the way we store and transfer data. While we celebrate these advances, it is crucial that the value of small data, and the emphasis on improving how we work with it, is not overrun by the latest buzzwords. To quote Confucius: “Don’t use a cannon to kill a mosquito.” Or in our case, sometimes all you need is a little bit of data to measure your impact.

 
