Ethics in Big Data Analytics

Data is information in its raw form. One only needs to figure out the patterns within it to arrive at the information it contains. And information is power; information is money. Using computers and specialized programs, it is now possible to see patterns within large data pools, something that would otherwise have been prohibitively time-consuming. But let us first grasp the difference between ‘data’ and ‘big data’, as these terms are bandied about frequently.

Consider a school that has Classes 1 to 10, 4 Sections (A to D) for each Class and 20 students per Section. Thus, the school would have a total of 20 × 4 × 10, i.e., 800 students. The names, dates of birth and marks scored by each of the 800 students in each examination make up the school’s basic Student Database. This would be augmented by any number of other data fields, like the gender of each student, the professions and earnings of each student’s parents, medical information, hobbies and strengths, etc. This is still a tiny data set compared to the larger databases, some of which we will discuss next. Yet although the School Database is tiny, the information it holds is vital to both the school and the student. For example, the school would know the right students to select for a nationwide poetry-writing contest if its database included every student’s hobbies and strengths. And weak students would benefit if the school identified each student’s weak areas and offered extra assistance, so that the student may catch up with the rest of the class.
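The school example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Student` fields, the helper function names and the pass mark of 40 are hypothetical and do not come from any real school system.

```python
from dataclasses import dataclass, field
from typing import Dict, List

CLASSES = range(1, 11)            # Classes 1 to 10
SECTIONS = ["A", "B", "C", "D"]   # 4 Sections per Class
STUDENTS_PER_SECTION = 20
PASS_MARK = 40                    # assumption: not stated in the article


@dataclass
class Student:
    student_id: int
    school_class: int
    section: str
    hobbies: List[str] = field(default_factory=list)
    marks: Dict[str, int] = field(default_factory=dict)


def build_roster() -> List[Student]:
    """Create one empty record per student: 20 x 4 x 10 = 800 records."""
    roster = []
    next_id = 1
    for school_class in CLASSES:
        for section in SECTIONS:
            for _ in range(STUDENTS_PER_SECTION):
                roster.append(Student(next_id, school_class, section))
                next_id += 1
    return roster


def students_with_hobby(roster: List[Student], hobby: str) -> List[Student]:
    """Candidates for a contest, e.g. hobby='poetry'."""
    return [s for s in roster if hobby in s.hobbies]


def students_needing_help(roster: List[Student], subject: str,
                          pass_mark: int = PASS_MARK) -> List[Student]:
    """Students whose recorded mark in a subject is below the pass mark."""
    return [s for s in roster
            if subject in s.marks and s.marks[subject] < pass_mark]


roster = build_roster()
roster[0].hobbies.append("poetry")   # made-up sample data
roster[1].marks["maths"] = 25        # made-up sample data
print(len(roster))                   # prints 800
```

Even in this toy form, the same records that help the school pick contest entrants or offer extra coaching would reveal a good deal about each child if they left the school.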

Big Data

Now consider the database of a large Credit Card company like Visa or Mastercard. Their database would include crores of Credit Card holders, with their names, addresses, mobile numbers, how much money each person spends using the credit card, exactly what the person spends on, how much money the person pays back every month, etc., at the very least. This is ‘big data’. Every time a card holder buys air tickets, makes hotel reservations, or buys groceries, clothes, medicines, etc., the Card company learns the person’s vacation preferences, the items the person buys as groceries for the month, where the person buys clothes and what kind of clothes, perhaps even that someone in the card user’s family has diabetes or cardiac problems, depending on the medicines purchased using the card!

As credit card users, few, if any of us, want any of this data to find its way into the public domain. Like humans, our data should also have rights to privacy!

Again, take the case of online portals – be it real estate, used cars, matrimonial services, career services, or anything else. When one signs up for such online services, one does not only part with one’s money by way of subscription charges, but one would also be sharing precious data about oneself! Big data such as these fetch handsome money in the marketplace!

There are data aggregators who are constantly on the lookout for the databases of Credit Card companies and online portals. They sift through the data to create different databases out of such data sets. For example, a list of frequent flyers can be created from a Credit Card users database. A database of luxury car owners can be created from the database of car portals. A list of young men and women, along with their purchasing power and purchasing habits, can be created by marrying their credit card spends with their data on a matrimony portal, wherever they have availed of both services. Various databases can be created by identifying patterns within a source or by combining big data from multiple sources.
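To see how little effort such "marrying" of data sets takes, here is a minimal sketch in Python. Everything in it is made up for illustration: the records are fictitious, and the field names and the use of a phone number as the shared key are assumptions, not a description of how any real aggregator works.

```python
# Fictitious records; field names and the shared "phone" key are assumptions.
card_spends = [
    {"phone": "90000-00001", "monthly_spend": 85000, "top_category": "air travel"},
    {"phone": "90000-00002", "monthly_spend": 12000, "top_category": "groceries"},
    {"phone": "90000-00003", "monthly_spend": 40000, "top_category": "clothing"},
]
matrimony_profiles = [
    {"phone": "90000-00001", "age": 29, "city": "Pune"},
    {"phone": "90000-00003", "age": 31, "city": "Kochi"},
    {"phone": "90000-00004", "age": 27, "city": "Delhi"},
]


def link_records(left, right, key):
    """Inner join: keep only people who appear in both sources."""
    right_by_key = {row[key]: row for row in right}
    linked = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Merge the two records into one richer profile.
            linked.append({**row, **match})
    return linked


# Pattern within a single source: a list of likely frequent flyers.
frequent_flyers = [r for r in card_spends if r["top_category"] == "air travel"]

# Combination of two sources: age and city joined with spending habits.
profiles = link_records(card_spends, matrimony_profiles, key="phone")
for profile in profiles:
    print(profile)
```

Neither source on its own says much, but the joined profile ties a person's age and city to their spending habits, which is exactly why a shared identifier such as a phone number or email address deserves care.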

Data aggregators purchase such databases for lakhs of rupees and create dozens of different databases from them by data mining. These databases are then sold in the marketplace for affordable sums of money. For example, a database of High-Net-Worth Individuals (HNIs) is sold at about Rs. 1/- per record. A company that wants to market its products to 1,000 HNIs using email or WhatsApp campaigns can buy the list for a mere Rs. 1,000/-, without the HNIs ever coming to know how their names, emails or phone numbers came into the possession of the marketer!

Unethical use of big data analytics for marketing is in fact the lesser worry. What is truly terrifying is the number of frauds that may be possible here. One’s financial data, such as Aadhaar, PAN, Income Tax and bank details, are all targets for fraudsters. We have heard reports of the hacking of government databases like the Aadhaar database more than once before. Even the Pentagon database has been subject to hacking.

Hackers would turn to unethical big data analytics to steal hard-earned money from the unsuspecting public by various means, including spam calls, SMSs, email, etc. We have seen an explosion of such activities in recent years. Here are some reports of such frauds in the recent past:

  • Research by PwC India finds that 57% of all frauds in India are ‘Platform’ frauds: article in Business Standard dated August 3, 2024.
  • Nearly 800 online financial frauds are reported in India daily: article in Hindustan Times dated June 25, 2024.
  • Bank frauds went up by nearly 300% and digital frauds by over 700% in the last 2 years, as per RBI: article in Economic Times dated May 31, 2024.

Clearly, one cannot stop using credit cards or online service portals simply because there is a danger of unethical data mining: any method of transportation is potentially dangerous, but can we stop using them all and rely on our legs alone, to get from place to place?

Such incidents certainly should not be allowed to happen, and the government has set up guidelines for reporting and prosecuting such cases. Here are some of the government and non-governmental initiatives taken for reporting cyber crimes in India.

  • National Cyber Crime Reporting Portal, a Government of India portal that includes all kinds of cyber crimes, including financial crimes.
  • Indian Cyber Squad, a non-governmental bilingual portal detailing the process of reporting a cyber crime.

In addition, a number of excellent non-governmental websites have also explained how to report a cyber crime in detail.

Despite the loftiest intentions, no law can prevent the hacking of big data, just as no law has prevented murders: the returns of crime are too tempting to criminals! One should be cautious at all times and not sign up for unwanted or suspicious services. One should be extremely cautious about responding to calls that ask one to part with one’s data: once the callers get the data they ask for, they may have enough information to empty out the victim’s bank account!

Big Data for the Social Sector

The social sector has also started using big data analytics to improve operational efficiency, develop a better understanding of the regions and communities where it intervenes, and plan evidence-based programming and policymaking. Such efforts are under way in areas such as education, health care, skilling, and environment and climate change. “Big data can help us cut through politically charged debates and find out what policies actually work from a scientific perspective, making the often-discussed notion of ‘evidence-based policymaking’ a reality”, states Harvard Online.

Multilateral and bilateral institutions have also supported global collaboration for social good. One such example is the UN's Open SDG Data Hub, which enables data providers, managers and users to discover, understand, and communicate patterns and interrelationships in the wealth of Sustainable Development Goal data. Refer to the article marked 6 at the end, which discusses in detail how many of the Sustainable Development Goals (SDGs) we may be able to attain through the use of Big Data Analytics. The article also discusses the role of the UN and of ‘Global Pulse’, an innovation initiative of the UN Secretary-General on data science, in putting Big Data to beneficial use for humanitarian causes.

The ‘UN Global Pulse’ platform lists several projects ranging from operational response simulation tools for epidemics to how disaggregated data can help in providing more inclusive transport.

References:

1 https://www.business-standard.com/india-news/57-of-all-fraud-incidents-in-india-are-platform-frauds-pwc-india-123051100526_1.html

2 https://www.hindustantimes.com/business/indians-report-nearly-800-online-financial-fraud-cases-a-day-are-you-protected-report-101719309683432.html

3 https://economictimes.indiatimes.com/industry/banking/finance/banking/bank-frauds-up-nearly-300-in-last-two-years-digital-frauds-up-708-rbi/articleshow/110555108.cms?from=mdr

4 https://cybercrime.gov.in/

5 https://www.indiancybersquad.org/

6 https://www.un.org/en/global-issues/big-data-for-sustainable-development

7 https://www.harvardonline.harvard.edu/course/big-data-social-good

8 https://surveypoint.ai/blog/2022/05/18/big-data-an-introduction-and-application-in-the-social-sector/

Andrew Johnson

Computer Science Major | Data Science & AI Enthusiast | Open to Opportunities in Data

5 months

This is an excellent post on ethics in that data world! I love how you also noted how big data isn't all bad, but can yield positive results for the greater good.

Sanjiv Tare

Aspiring CSR Sherpa - NuSocia

6 months

Excellent write up Surendran. On one hand, gives a comprehensive overview of the risks, also comments about how the big data can really be leveraged for common good and serve the under served.

Manju Menon,Ph.D.

Social Impact Enthusiast | CEO & Co-Founder NuSocia | Ph.D. in Social Entrepreneurship(TISS) | GS10K Alumnus

6 months

Insightful!
