DIK(I)W? Start with Data

Seeking wisdom?

You begin with data, organize related data into information, correlate and analyze information into knowledge, and then build insight. In time, you achieve wisdom. [appropriate sound effect – maybe the Windows ta-da?]

The concept of this flow from...

data->

information->

knowledge->

insight->        

wisdom.

...is called the DIKW pyramid. I’ve added that insight part because I think it takes specificity to reach wisdom. In other words, wisdom has context that happens after knowledge.

Your goal is wisdom for your organization, and that path to wisdom begins with good, clean, organized, known data for integration between systems.

Data’s one of my favorite things. When I was in my early 20s, I earned (?) the nickname of Datahead. The things I can remember and classify, putting everything in its mental place, were, I guess, impressive.

For example, I know my library card number. Do you? Credit card number and all of the peripherals?

Many of the things we used to memorize are now abstracted; we keep phone numbers in our contacts, and we just need to know how to reference the person. Our phones supply their names when they call us.

But – data. Today’s topic is data. So, we’ll discuss a few of its properties before we talk – another time – about using and integrating it.

Data Types

We start with the type of the data.

String, Boolean, Numeric data and its subtypes. BLOB!

I’ll unpack these.

We have to start with BLOB – don’t we? It stands for Binary Large Object. Think of images, video, audio, etc. Now you know a thing. BLOB! For a video, you could consume part of it, but to do so, you have to have access to the whole to get your part. So, the BLOB is considered a single piece of data.

Most data stored in a database for use in systems is structured.

Boolean

Boolean – it’s binary. Yes or no for data if you choose this type. If data people are lazy, they don’t default to a “Yes” or “No,” and you run into a three-condition Boolean, which is a logical nightmare and the cause of not only all the moral problems in the world but also the prevalence of skinks in my yard and black cats in my home. Definitely. Maybe.

So, anyway – the three-condition Boolean tomfoolery is Yes, No, or Unset. It’s best practice to default to yes or no…because unset is just evil…see skinks, cats.

A Boolean is the answer to any question you can ask a Magic 8 Ball. You can substitute True for Yes and False for No. Same thing. Truly.
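Here’s a minimal Python sketch of the three-condition problem and one way to tame it. The field name `opted_in` and the helper `normalize_flag` are made up for illustration, and the choice of `False` as the default is an assumption; your own default deserves a conversation.

```python
from typing import Optional


def normalize_flag(value: Optional[bool], default: bool = False) -> bool:
    """Collapse a three-condition value (True, False, None) into a real Boolean."""
    # None stands in for "unset"; replace it with the agreed default.
    if value is None:
        return default
    return value


# Hypothetical records: the third one never had the flag set.
records = [{"opted_in": True}, {"opted_in": False}, {"opted_in": None}]
cleaned = [normalize_flag(r["opted_in"]) for r in records]
print(cleaned)
```

The point of the helper is that the default is decided once, on purpose, instead of every downstream consumer guessing what "unset" means.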

Numeric

Numeric data types include integers (and subsets thereof) and floating point numbers – those with decimals.

With numeric datatypes, there are no letters.

Sometimes in data integration or use, numbers are converted to text, and when they are, they’re called strings. That leads us to the string datatype.

String

Alphanumeric text is called a string. It’s a best practice to give it a length constraint when it’s a single piece of data. When it’s something like a document, a size limit may suffice.

I’ll give you an address length sample below.

Date (date/time)

Date formats vary, so stored dates should make clear which time zone applies and which format is in use.

Dates can also be transmitted as strings and reconverted to dates if their format supports that – and the information about that format’s known.

For example, much of the world considers dates as dd/mm/yyyy, where dd = day, mm = month, and yyyy (or yy) = year. In America, we use mm/dd/yyyy.

So, if you merely see 01/02/2024, are you working with January 2nd or February 1st? You need to know – via specification – what the data represents.
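To see the ambiguity in action, here’s a small Python sketch that parses the same text under both specifications using the standard library’s `datetime.strptime`. The variable names are mine, purely for illustration.

```python
from datetime import datetime

raw = "01/02/2024"

# The same text parsed under two different, equally plausible specifications.
as_us = datetime.strptime(raw, "%m/%d/%Y")    # mm/dd/yyyy
as_intl = datetime.strptime(raw, "%d/%m/%Y")  # dd/mm/yyyy

print(as_us.date().isoformat())    # 2024-01-02
print(as_intl.date().isoformat())  # 2024-02-01
```

Both parses succeed without complaint, which is exactly the danger. One common way to sidestep the question is to store or transmit dates in ISO 8601 form (yyyy-mm-dd), though the specification still has to say so.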

Data Cleanliness and Quality

If data types are clearly articulated in structured data (think database), then we know what to expect for each field and can ensure that structure within storage and rules around what’s stored.

A lot of data, though, is unstructured, and that cleanliness and quality and decisions regarding use have to be made long after the data is stored in its original form.

Think about a driver’s license number. It’s a string with a certain length it won’t exceed, but the length may vary by state. If I need to transmit that data somewhere, and the receiving system expects one fewer character than my actual driver’s license number, what happens? Discussing that question as early as possible helps determine the answer.

(Yes, things like this do happen).

Another good example is that US export data lengths were established back in the EDI (electronic data interchange) era. An address line is 35 characters. Along came the Automated Export System (AES) from the US government, demanding address data. What’d it allow per line? 32 characters.

What’s the solution in this case? Truncation. Send it the first 32. But decisions like this require discussion and best practices.

For numbers, truncation to the left of any decimals is disastrous. Does your company make 7 million each year or 70 million?
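A rough Python sketch of the two cases above: text that can be truncated to fit, and numbers that should never be. The function names, the sample address, and the choice to raise an error for oversized numbers are all my assumptions for illustration, not anything AES prescribes.

```python
AES_LINE_LIMIT = 32  # per-line limit described above


def fit_address_line(line: str, limit: int = AES_LINE_LIMIT) -> str:
    """Truncate a text field to the receiver's limit (keep the first N characters)."""
    return line[:limit]


def fit_whole_number(value: int, max_digits: int) -> str:
    """Refuse to truncate a number: dropping digits changes its meaning."""
    text = str(value)
    if len(text.lstrip("-")) > max_digits:
        raise ValueError(f"{value} does not fit in {max_digits} digits")
    return text


address = "1234 North Example Boulevard, Suite 500"  # made-up address
print(len(address), "->", len(fit_address_line(address)))
```

Truncating the address loses a little detail; truncating 70000000 to seven digits would quietly turn it into 7000000, so the sketch fails loudly instead.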

This turned into storytelling with Heather, but hopefully the discussion of cleanliness and quality at time of use is helpful in understanding some challenges in implementation if we’re looking for accuracy.

Data Classification and Labeling

When we store data in files, typically it’s in the form of information – data intermingled together. In databases, it’s structured for use in a variety of contexts; some data is the result of computations of combined data, logically or mathematically.

We talk about data classification and labeling for use within (or restriction from) AI and also for privacy.

In other words, classification and labeling are required to keep your sensitive HR information outside of generative AI computations. No access for you, Microsoft Copilot.
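As a toy illustration of label-based restriction, here’s a Python sketch. The label names, the file names, and the allow-list policy are all hypothetical; real tools have their own labeling schemes, and this isn’t how any particular product works.

```python
from dataclasses import dataclass


@dataclass
class Document:
    name: str
    label: str  # hypothetical labels: "public", "internal", "hr-sensitive"


# Labels an AI assistant may read, under this sketch's assumed policy.
AI_ALLOWED_LABELS = {"public", "internal"}


def ai_visible(documents: list[Document]) -> list[Document]:
    """Return only documents whose label permits AI use; unknown labels are excluded."""
    return [d for d in documents if d.label in AI_ALLOWED_LABELS]


docs = [
    Document("product-faq.txt", "public"),
    Document("salary-review.xlsx", "hr-sensitive"),
    Document("old-notes.txt", "unlabeled"),
]
print([d.name for d in ai_visible(docs)])
```

Using an allow-list means anything unlabeled or mislabeled stays out by default, which is the safer failure when the labeling work isn’t finished.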

Data Bias

A piece of data – a datum – is just a thing. Bias enters in the context of usage and what’s measured/determined and stored in that datum. So, a piece of data is just a representation of a very simple single thing.

In other words, it’s the key and how we might derive the value. Back to key/value pairs.

What do we do about that?

I saw Phaedra Boinodiris speak last October at the Northwest Arkansas Tech Summit about bias in AI. Her book can be found here – it’s on my “to acquire and read” list…which is an immense list.

https://www.amazon.com/AI-Rest-Us-Phaedra-Boinodiris/dp/B0C6W1KJ49/

Data, AI. Yeah – outside of scope here today while we’re hanging out with each individual datum.

Data Privacy

Do we even want to go here today? It’s the foundation of everything we try to achieve in working with data – ensuring the right people have access to their own data’s use. Others are restricted without explicit permission.

So, one paragraph. If you’d like to follow an expert in the field, look at Brian Blakley.

The Data of Grammar

I’ll keep this short – this extension of the rather long opening piece.

Grammar has “data types.” So, engineers who’re writing, or working to sharpen their writing skills so they can do more of it, think of it that way.

· Parts of speech (noun, verb, adjective, adverb, pronoun, preposition)

· Specialized usage (verbals, such as gerunds and participles)

· Advanced readability concepts – including when to break the rules. I love sentence fragments because they introduce conversation to writing; we don’t, after all, speak in complete sentences to each other at all times. They only work, though, when you’ve established language mastery.

Some definitely complex rules in here. Things like American English versus British English. Organization versus organisation on spelling. Collective nouns in American English that feel “weird” to the rest of the world.

An example is to refer to staff as singular – very American. Staff is versus staff are. A fix? Say "staff members."

Keeping Up Appearances

Erik Boemanns and I recorded a podcast, and that’ll be released soon. We talk about reducing the profitability of cybercrime. Let’s do THAT, right?

I also wrote about that at Elnion this week, since it’s on my mind and heart. You’ll recognize themes from posts I do here. https://elnion.com/2024/08/05/devaluing-cybercrime/

Another Elnion piece from last week. I write there once a week. My birthday article – My Library, Your Library - https://elnion.com/2024/07/29/my-library-your-library/

Upcoming? Tomorrow I’m talking with a group in the morning about my long software career.

Codistac, Redux

About a year ago I procured an office at Missouri State University’s efactory, an amazing entrepreneurial space. I had grand plans of offering training about cyber hygiene and doing culture work within organizations, especially with HR teams from hiring practices through to knowledge of security awareness at the level we need and really making the case for it.

Efactory offers great training rooms. I was super excited. After making some inroads and planning, a life event made it difficult to get to that office on a regular basis. That, coupled with expanded work with Missouri Cybersecurity Center of Excellence, is leading me to abandon the office and that business model and refocus Codistac on the 3 exact areas where I shine and where work comes to me.

· Writing (please act shocked). Specifically brand-boosting messaging for technology companies that speaks to customers in their language instead of yours.

· Software requirements, specifically for data integration. Yes, with security considerations in here, much like Moms try to sneak nutrition into good-tasting food.

· Business strategy work for technology companies.

You can take a look at the redesign at https://www.codistac.com. I’m seeking business in all of these areas. A 4th and natural extension is software testing – mostly from a quality “how brittle is your happy path” perspective.

Some Quick Asks

Will you follow two pages? One is Codistac, where I plan to start writing regularly soon. The other is the Missouri Cybersecurity Center of Excellence – same.

Also, please follow this newsletter. It arrives every other Tuesday.

I write every day, so please follow me, too, and interact if you see something that's worthy.

If you find my services intriguing, let's talk. I'm part time at Missouri Cybersecurity Center of Excellence and do have some space for additional work at Codistac beyond the clients I'm already serving.

