Who really owns data?

Data is one of my favorite things. But data is a bit of a nebulous object. It takes many different forms. It comes into existence through many different means. It's a loose, broad word that describes many different things in our society. As a practitioner of machine learning, I work with data all the time. Most AI companies are founded on building unique, hard-to-reproduce datasets. But I've always struggled with the question of how to apply the concept of ownership to many types of data. There are some cases where it is very clear-cut, but there are many cases where it is not.

I'll try to convey the basic conflict for me with a simple example. Company A works with Company B. They exchange emails all the time. Later, Company A uses all the emails it has received en masse to train some sort of superintelligent AI algorithm. Who owned the emails?

This gets at the heart of one of the problems with the ownership of data. For most data, it takes at least two parties for the data to come into existence. The data is about a social event. The data is a transaction, an exchange between two parties. It takes two to tango, they say.

If I go onto my favorite social media website and type in a post, that post becomes data. The social media website would never have gotten this data if I didn't exist. I would never have had this data if the social media website didn't exist. The data exists only as an exchange between me and the social media website. Who owns the data? I'm not asking for the legal answer, which is probably complicated and varies by jurisdiction. But philosophically, who do you think should own the data?

Just to complicate things further, there was technically another party here too: the ISP, for which the very same post, transmitted many times across its network, is now scooped up as data into larger aggregate statistics about the types of data being transferred across that network. And let's not complicate things any further with the chain of hardware products involved.

Part of me tries to answer the question by looking back to my first-year political science classes. I believe it was John Locke who described the basis of ownership as combining your labor with something in nature. For example, by putting in the effort to cut down the tree, you get to own the tree. I like this point of view. In some cases, it helps me to clearly answer the question of who owns the data.

If I go into nature and snap a bunch of photos, it's clearly me putting the effort in. We don't usually give nature many ownership rights. Therefore I get to own the photos. I don't give the camera manufacturer much credit since I paid them for the camera, and presumably the rights to anything produced by the camera were transferred when I bought it.

In other cases, it's absolutely not clear. Take the exact same situation, but now I'm a photojournalist for a nature magazine and the camera is a specialized, custom-built device on loan to me that takes photographs with insane realism and captures the ultraviolet and infrared spectrum. But I take the photograph in my spare time, off work hours, for my personal nature blog. Who owns the data now?

Or let's say I take my specialized on-loan super camera, but now I'm in an art gallery and I photograph the big crowd of people. I capture most of their faces in such detail that I could use the photo as feed data for a facial recognition algorithm. The background of my photo shows several of the most important art pieces in full view, and they can easily be zoomed in on. I can crop it and post it on my blog. Now suddenly the question of who owns the data is pretty grey, or even leaning away from me. In some sense the data now only exists because the magazine loaned me their one-of-a-kind super camera, the artist created the paintings, the crowd of people gathered at that one spot and took the time to face the camera all at once, and I put the effort in to go there and photograph it in my free time. Each of these parties has, in some sense, a greater or lesser claim to the ownership of the photo as a piece of data.

When I think about how our society should evolve, I think one of the fundamental questions we need to answer collectively is what exactly the rights are that different stakeholders have in the vast swarms of data being generated by our devices every day. The fact is, for me, the very concept of ownership breaks down. No one entity can ever really own data. Data is a social phenomenon. It exists in and among the people, and we each play various roles in making it happen.

As our society collectively moves to regulate the technology industry, we need to, as a group, come to a common understanding of what exactly data is and what we think the associated set of rights around that data should be. The current free-for-all where companies set the rules has been great for business (wonderful for AI), but it ultimately has to come to a close. All businesses should play by a common set of rules on how ownership of data should actually work.

Jeff Thomas

Data Scientist at Abt

4 years ago

I'd like to add one concept to this idea of who owns the data: democratization of data. I think the major issue with data ownership arises when one large tech company (FB or Google) owns most of the data. Data practitioners cannot make the same findings as the large tech company because they don't have access to the data; that is, the data is not "democratized". Further yet, these companies have then created a market mechanism to sell the data to advertisers and marketers (one that no one can compete with). Because I think this is the greatest problem with data ownership, I think a lot of the issue goes away if large tech companies agree to give everybody access to the aggregated data and low-level data they acquire from us. Otherwise the markets created are completely unfair for other tech companies and small businesses in general. Rant over, lol.

Martijn Spronk

Tech Innovator | Serial Co-Founder | Entrepreneurial Polymath

4 years ago

One thought that comes to mind as I read your article is that "ownership", when it comes to data, has a fundamentally different property compared to the tree cut down in the forest: data can be replicated ad infinitum. Because of this, your question might have answered itself: is the photo owned by you, by the people whose photo was taken, or by the artist of the painting you photographed? Well, there does not have to be a single owner. It just opens up a whole other question: how do you deal with multiple ownership of something?

Pedro Pessoa

Creating Partnerships Globally. Hiring Talented Software Engineers in North America. #videorecruiter #newjobsahab

4 years ago

Bradley Arsenault, that was a great read. The major thought I had is that in each of these cases, most of the parties you mentioned had a good claim to ownership over the data. If there had been a conversation ahead of time, they could have properly laid out expectations. A lot of people and companies like to take advantage of ambiguity, but a conversation ahead of time is typically good practice and the beginning of a solution.
