Before you use the Sample Data Set

Welcome People, Bots & Algos of THE INTERWEBS. This is the Thursday Edition of The Analyst.

The Data You’re Given: Now What?

Before we explore the database tool, we're sketching out the type of data that will eventually populate this database and imagining what criteria good sample data should meet.

This will help build confidence in our data model. When we need data, we simply go to the internet.

We search,

we find,

we download,

we use.

Well, not so fast. Be cautious of using existing datasets. There are three crucial questions to ask.

Number one, where did it come from and for what original purpose?

Number two, what data-gathering procedures were used, and how much harm do they introduce?

And number three, do you own the rights to do what you want with the data?

Each question helps you pinpoint where the procured dataset matches your needs and where it leaves gaps.

Let's say that I decide to use my social media data as a representative sample to help imagine Instagram’s database.

Since the dataset is my social media content, I can answer questions one to three with ease.

For question one, this data was requested from a social media company to serve as a local copy of all the posts I sent on that platform.

There's no sensitive information shared in these posts. For question two, the data-gathering processes remain unknown.

The social media company provides an archive request portal, so I can retrieve my content as a zip file within 48 hours of the request.

The answer to question three is yes, though the social media company can also do what it wants with this content. If the answer to question three is yes, proceed.
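To make that decision rule concrete, here is a minimal Python sketch, assuming you record each answer as a field and treat a missing answer as a gap. The DatasetProvenance class and review function are hypothetical names for illustration, not part of any tool mentioned in this article. Only an explicit yes to question three lets you proceed, and every unanswered question is reported as a gap, mirroring the social media example above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DatasetProvenance:
    """Hypothetical record of the answers to the three vetting questions."""
    source: Optional[str]               # Q1: where did it come from?
    original_purpose: Optional[str]     # Q1: what was it originally for?
    gathering_procedure: Optional[str]  # Q2: how was it collected? (None = unknown)
    harm_notes: Optional[str]           # Q2: what harm might that procedure introduce?
    have_rights: Optional[bool]         # Q3: may we do what we want with it?


def review(p: DatasetProvenance) -> Tuple[bool, List[str]]:
    """Return (proceed, gaps).

    proceed is True only when question three is an explicit yes;
    gaps names every answer that is still missing (None).
    """
    answers = [
        ("source", p.source),
        ("original_purpose", p.original_purpose),
        ("gathering_procedure", p.gathering_procedure),
        ("harm_notes", p.harm_notes),
        ("have_rights", p.have_rights),
    ]
    gaps = [name for name, value in answers if value is None]
    return p.have_rights is True, gaps


# The social media archive example, with question two left unanswered.
my_posts = DatasetProvenance(
    source="archive requested from a social media company",
    original_purpose="local copy of my posts",
    gathering_procedure=None,  # unknown, per the answer to question two
    harm_notes="no sensitive information in the posts",
    have_rights=True,
)

proceed, gaps = review(my_posts)
print(proceed, gaps)  # True ['gathering_procedure']
```

Keeping the rights check separate from the gap list reflects the text: a yes to question three is the gate, while unknowns such as the gathering procedure are still worth tracking rather than silently ignored.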

Using Existing Datasets

Number one, determine data ownership, preferably with informed consent.

Number two, identify the privacy, confidentiality, and security aspects of any dataset early on.

And number three, pinpoint vulnerable data features early on.
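One way to start pinpointing vulnerable data features early is a quick pattern scan over text fields. This is a heuristic sketch, assuming records arrive as Python dictionaries; flag_vulnerable_fields and the two regular expressions are hypothetical and will both miss sensitive values and raise false positives, so treat the output as a starting point for manual review, not a privacy guarantee.

```python
import re
from typing import Dict, Iterable, List, Mapping, Set

# Hypothetical, deliberately simple patterns; a real screen needs a reviewed, broader list.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def flag_vulnerable_fields(records: Iterable[Mapping[str, object]]) -> Dict[str, List[str]]:
    """Map each field name to the kinds of sensitive-looking values found in it."""
    found: Dict[str, Set[str]] = {}
    for record in records:
        for field, value in record.items():
            if not isinstance(value, str):
                continue  # only scan text values
            for kind, pattern in PATTERNS.items():
                if pattern.search(value):
                    found.setdefault(field, set()).add(kind)
    # Sort the kinds so the result is stable and easy to compare.
    return {field: sorted(kinds) for field, kinds in found.items()}


posts = [
    {"caption": "Sunset at the beach", "location": "Durban"},
    {"caption": "DM me at someone@example.com", "location": None},
]
print(flag_vulnerable_fields(posts))  # {'caption': ['email']}
```

Reporting which field and which kind of value was flagged, rather than a single yes/no, makes it easier to decide early whether to drop, mask, or keep a feature.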

Beyond these steps, we need to consider further factors about whether the data is fit for building out test scenarios.

By Shrikesh M.