Data Literacy for Digital Transformation: Lessons from a Pharmacy Fraud Business Case
Navin Sinha, MS, MBA
Owner and CEO at Double Check Consulting (BPO): #AI 4 #Healthy #Food and #Humans
Is it possible to have #data democratization without data literacy? Having data all over the company is not data democratization; making sense of that data is! Hence data literacy is critical for an organization in the digital economy (https://www.gartner.com/smarterwithgartner/a-data-and-analytics-leaders-guide-to-data-literacy/ ). This applies to everyone, from data scientists to developers to data analysts to other stakeholders, who compete for analytics while complementing one another's knowledge and getting better every day. #Pharmacy #fraud analytics is difficult: heavy data duplication, too much data, and a lot of missing data all get in the way. Clinical codes can have an error rate of up to 30%, and the false positive rate is high! Graduate talent gravitates toward algorithms, when low-grade clinical codes are telling you that data quality and completeness are poor and that trust in the data is hard to earn. Building data literacy is slow and full of friction. Since no university in the USA teaches healthcare fraud, how can new talent get started in this industry? When people change industries, say from retail to healthcare, it can be an overwhelming experience at first. So here are some tips to hit the ground running.
It is said that actionable insights are the outcome of a good blend of people, business process, and technology. A lack of business knowledge about pharmacy fraud hurts insights; they don’t pan out from the data. It is hard to realize the business value of #AI in pharmacy fraud without clinical analytics knowledge. Unfortunately, clinical data may have up to 30% coding errors. Because of data quality challenges, the data can carry industry-specific bias, such as data duplication, which in turn biases the results of machine learning algorithms. Healthcare fraud work also requires soft skills, such as empathy for patients who are over-treated and a sense of urgency for companies bleeding dollars to fraudsters. Pharmacy fraud in the USA is increasing at an alarming pace; one person acting alone can do considerable damage (https://www.beckershospitalreview.com/pharmacy/former-pharma-sales-rep-pleads-guilty-to-50m-prescription-drug-fraud.html?fbclid=IwAR1uIgEnZ9j1elMzBOREFYYAGi6HixvlYhjtZn8vGO6cUGR4Ogsg2Nvq77o )! Furthermore, groups of people collaborate in pharmacy fraud schemes that stay below the radar and hence go undetected for a long time, close to a decade in some cases (https://www.justice.gov/opa/pr/nine-pharmacists-charged-role-121-million-health-care-fraud-scheme?fbclid=IwAR28ZYLC-z39ILfNNP720wg1htn2iMZtaHvgJ5fTfdINC_rr90pUgecoUgQ ).
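The duplication and missing-data problems described above can be measured before any modeling starts. What follows is a minimal sketch, not the method used in the case study: the function name `profile_claims`, the field names (`pharmacy_id`, `drug_code`, `fill_date`, `paid`), and the four sample rows are all hypothetical and made up purely for illustration. It counts rows that repeat on a chosen set of key fields and reports the share of missing values per field.

```python
from collections import Counter


def profile_claims(rows, key_fields):
    """Summarize duplication and missingness in a list of claim dicts.

    rows       : list of dicts, one per claim (hypothetical layout)
    key_fields : fields that together should identify a unique claim
    """
    n = len(rows)

    # Rows that share the same values on key_fields are treated as duplicates.
    key_counts = Counter(tuple(r.get(f) for f in key_fields) for r in rows)
    duplicate_rows = sum(c - 1 for c in key_counts.values() if c > 1)

    # A value counts as missing when it is None or an empty string.
    fields = sorted({f for r in rows for f in r})
    missing_rate = {
        f: (sum(1 for r in rows if r.get(f) in (None, "")) / n) if n else 0.0
        for f in fields
    }
    return {
        "rows": n,
        "duplicate_rows": duplicate_rows,
        "missing_rate": missing_rate,
    }


# Made-up sample rows, for illustration only (not real claims data).
claims = [
    {"pharmacy_id": "P1", "drug_code": "D10", "fill_date": "2020-01-05", "paid": 120.0},
    {"pharmacy_id": "P1", "drug_code": "D10", "fill_date": "2020-01-05", "paid": 120.0},
    {"pharmacy_id": "P2", "drug_code": "", "fill_date": "2020-01-06", "paid": 80.0},
    {"pharmacy_id": "P3", "drug_code": "D11", "fill_date": "2020-01-07", "paid": None},
]

report = profile_claims(claims, key_fields=("pharmacy_id", "drug_code", "fill_date"))
print(report)
```

Running a profile like this on a small sample first is one way to get a feel for how much duplication and missingness could bias a model before trusting its output.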
Dr. Andrew Ng said, “big data is small data also.” Once, we sampled 15,000 unique rows to learn the data’s characteristics and check for possible clinical coding errors and bias. Without taking these steps, there’s little chance to “Grab Dollar from Hypotheses (GDFH).” With small data, a sense of urgency is easy to express; we became intimate with the pharmacy fraud data and learned which hypotheses were plausible. We follow the Steve Jobs quote, “keep what you need and cut out the rest.” In pharmacy fraud, we start with a method that’s easy to execute, easy to present, and easy to communicate when explaining how we do GDFH. Sure, combining traditional statistics with machine learning algorithms and mathematics is powerful and will find more fraud. Healthcare fraud can run up to 10%, and it has a lot of randomness. Combining algorithms is complicated and takes years of practice. While we’re quite successful in this area, it is hard for many companies! Starting with linear regression has advantages: in about 5 clicks in any tool you can find outlier data points in the residual plot, so you’re fast. Plus, stakeholders understand that easier solutions are applied first to stop the bleeding of dollars ASAP, and everyone can follow the simple but penetrating visual analytics of linear regression. Stakeholders may then feel that you’re putting yourself in their place; human skills combined with hard skills have the potential to unlock the value hidden in pharmacy fraud data.
Let’s take an example from a case study. The fitted linear regression line was y = 14.581x + 2003.1, where x, the independent variable, was the number of claims for certain drug combinations (ranging from 0 to 350) and y was paid dollars. That means fraudsters were comfortably taking home a baseline of about $2,003.10 regardless of claim count. R-square was 6% for the linear and quadratic models and 7% for the cubic and quartic models.
Such a grouping of the nonlinear models’ R-square values was interpreted in one of two ways: either paid dollars on the y-axis stayed nearly flat as Total_claim_counts on the x-axis increased, or few of the large claim counts involved this drug combination. That explains why, as the claim count for this prescription moved from 0 to 50 to 100 and so on, only $14.58 was gained per additional claim. But given the considerable height of paid dollars at low claim counts, it was easy to red-flag at least $2.0 million and to look into this pharmacist’s past behavior. Needless to say, the fraud behavior was intentional: trying hard to stay below the radar and never get caught. After all, a threshold of $2,003.10 is high, and the fraudster needs to be discreet in order not to be found. It is this kind of storytelling and sense of urgency that gets the attention of a corporate lawyer. In healthcare fraud, data literacy is complete when data and analytics lead to fraud dollars being returned and/or someone paying the price by going to jail. Hence no statistical or machine learning method is big or small; it’s always about risk management and dollars added to the bottom line.
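The regression check in the case study above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the original analysis: the data are synthetic (generated around the line y = 14.581x + 2003.1 quoted above, with five artificially planted high-dollar points), the function names `r_squared` and `fit_and_flag` are hypothetical, and flagging residuals more than three standard deviations above the fitted line is a swapped-in stand-in for eyeballing a residual plot.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination for observed y and fitted y_hat."""
    ss_res = float(np.sum((y - y_hat) ** 2))
    ss_tot = float(np.sum((y - np.mean(y)) ** 2))
    return 1.0 - ss_res / ss_tot


def fit_and_flag(x, y, z_cut=3.0):
    """Fit a straight line, flag large positive residuals, compare degrees 1-4."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)

    # Standardize residuals; points far above the line are candidates for review.
    z = (residuals - residuals.mean()) / residuals.std()
    flagged = np.where(z > z_cut)[0]

    # R-square for linear, quadratic, cubic, and quartic fits on the same data.
    r2_by_degree = {
        d: r_squared(y, np.polyval(np.polyfit(x, y, deg=d), x))
        for d in range(1, 5)
    }
    return slope, intercept, flagged, r2_by_degree


# Synthetic data only: claim counts from 0 to 350 and noisy paid dollars.
rng = np.random.default_rng(0)
x = rng.integers(0, 351, size=500).astype(float)
y = 14.581 * x + 2003.1 + rng.normal(0.0, 1500.0, size=500)
y[:5] += 40000.0  # planted high-dollar points, assumed for illustration

slope, intercept, flagged, r2_by_degree = fit_and_flag(x, y)
print("slope:", slope, "intercept:", intercept)
print("flagged row indices:", flagged.tolist())
print("R-square by polynomial degree:", r2_by_degree)
```

If the higher-degree fits barely improve R-square over the straight line, as in the case study, the simpler model is usually the easier one to present to stakeholders, and the flagged rows become the starting list for review.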
In pharmacy fraud, what matters is how fast the bleeding of dollars is stopped! For the picture presented here, our null hypothesis was “high-low”: high pharmacy fraud dollars billed at low frequency. That null hypothesis is refuted by the visual analytics. Not all visualizations are the same, and a good one helps conceptualize business use cases. Another hypothesis was that high fraud dollars billed at high frequency produced these spikes, as tall as the Empire State Building. But this billing pattern went on for more than two years, so that hypothesis is unlikely: such frauds are sitting ducks, and in our long experience we have never seen one last more than 6 months. So the alternate hypothesis, that low dollars billed at high frequency produced these fraud spikes, appears more likely. An essential component of data literacy, then, is to analyze more than one hypothesis against the evidence from visual analytics. The pharmacy analytics picture presented in this article comes from a simple calculus equation applied to Big Data that helped us find $200,000 in a week. In today’s economy, that would be at least $300,000! Never underestimate calculus on pharmacy fraud data, and on healthcare fraud data in general.