Can you use AI for good where sensitive data is concerned?

I think we can all agree that the COVID-19 pandemic has dramatically changed the way we live and work. Notably, there has been a shift towards the digital: many companies are now embracing remote working technologies at a rate that was previously unthinkable. There are many positive aspects to this change. The situation may have been forced on us, but remote working is dispelling the myth that we need to be in the office five days a week to be productive. Hopefully, our increased willingness to let people work from home will outlast this awful virus and leave a lasting positive impact on our work-life balance and mental health.

But, in a world where I spend the majority of my day sat in front of a webcam, I can’t help but ask: at what cost?

For many years now there has been growing concern about how organisations collect our data and what they use it for. Artificial Intelligence (AI) techniques are key to how large organisations exploit the vast quantities of personal data they collect. As such, AI has earned quite a reputation in this space, and not always a good one (see Cambridge Analytica). At a time when we are putting more information about ourselves online than ever before, whilst also being more conscious of social justice issues than ever before, it has never been more relevant to re-examine the question: how can we use AI for good?

Working as a consulting data scientist, I regularly find myself entrusted with (often highly sensitive) client data. As data professionals, we have an important duty of care, not only to our clients as organisations but also to the employees and customers whose personal information these organisations hold. Fortunately, this is where technology can help us. One example is InsightMaker’s ability to detect Personally Identifiable Information (PII) in documents and manage file permissions appropriately. But whilst anonymising documents by blanking out the sensitive personal information detected this way is great for compliance purposes, life is not quite so simple when building an AI model. Even if PII is masked from me as a data scientist, how can I be sure that the AI model I create is using that information responsibly?
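To make the idea of blanking out detected PII concrete, here is a minimal sketch in Python. It is not how InsightMaker works; the two regular expressions, the `PII_PATTERNS` table and the `mask_pii` helper are all hypothetical, and real PII detection needs far broader coverage than this.

```python
import re

# Hypothetical, illustrative patterns only: an email address and a
# UK-style 11-digit phone number. Names, addresses and other PII are
# not covered and would need more sophisticated detection.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b0\d{4}\s?\d{6}\b"),
}


def mask_pii(text):
    """Replace each pattern match with a [LABEL] placeholder.

    Returns the masked text and a count of replacements per label.
    """
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


masked, counts = mask_pii("Contact Jane on 07700 900123 or jane.doe@example.com.")
print(masked)  # Contact Jane on [PHONE] or [EMAIL].
print(counts)  # {'EMAIL': 1, 'PHONE': 1}
```

Notice that the name "Jane" survives the masking. Even in this toy case, simple pattern matching leaves identifying information behind, which is part of why masking alone does not settle the question of responsible use.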

The answers to this question are rarely simple. Complex AI algorithms can be notoriously hard to explain, and it’s even harder to pin down the exact logic behind their key decisions. Even removing potentially sensitive fields from the model’s inputs does not always work, as the model runs the risk of basing its decisions on seemingly innocuous data which is highly correlated with more sensitive fields (ones we should not be making decisions based on, such as gender and ethnicity). In the end it falls to us as data scientists to carefully consider these implications for every model we build, and to put in place safeguards which are appropriate to the individual use case for each model.
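One simple safeguard against this kind of proxy effect is to check how strongly each remaining feature correlates with the sensitive field before training. The sketch below is only an illustration under stated assumptions: the data is made up, the feature names and the 0.7 threshold are arbitrary choices, and `pearson` and `flag_proxy_features` are hypothetical helpers rather than part of any particular library.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def flag_proxy_features(features, sensitive, threshold=0.7):
    """Return {name: correlation} for features whose absolute correlation
    with the sensitive column is at or above the threshold."""
    flagged = {}
    for name, values in features.items():
        r = pearson(values, sensitive)
        if abs(r) >= threshold:
            flagged[name] = r
    return flagged


# Made-up toy data: a sensitive attribute encoded as 0/1, which has
# already been removed from the model inputs below.
sensitive = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "part_time_hours": [0, 0, 2, 0, 10, 12, 8, 12],
    "tenure_years": [1, 5, 3, 7, 2, 6, 4, 8],
}

flagged = flag_proxy_features(features, sensitive)
print(flagged)  # only "part_time_hours" is flagged as a likely proxy
```

A check like this only catches linear, one-feature-at-a-time relationships. Several individually innocuous fields can still encode a sensitive attribute in combination, so it is a starting point for the careful consideration described above, not a replacement for it.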

This week Milton Keynes Artificial Intelligence (MKAI) are addressing this very issue. In the run-up to the event, I was fortunate enough to be invited onto the Boundless Podcast to discuss it with Rudradeb Mitra of Omdena, David Troy of 410Labs and John Kamara of the Machine Learning Institute of Africa. Later in the week, Richard Foster-Fletcher, AI keynote speaker and founder of MKAI, and I also spent some time discussing the complex issues surrounding data privacy in more depth.

This blog was prepared in the lead-up to MKAI’s AI For Good Expert Forum, June 2020. If you enjoyed this blog or the above podcasts and would like to learn more, you can still catch the MKAI event on YouTube.

As published on the Aiimi website.

Joshua Greenslade

Software Developer at ONYX InSight

4 years ago

The New York Times did a really interesting investigative piece on anonymised phone tracking data, and demonstrated how easy it was to de-anonymise it. Whilst the data was decoded by hand in the article, it's easy to see how you could apply a reasonably simple network to this data to de-anonymise everything! I suspect you're right that sensitive data, and what counts as securely sensitive data, is going to be a big topic in the years to come! https://www.nytimes.com/interactive/2019/12/19/opinion/location-tracking-cell-phone.html

