Getting Item Data Right

A friend whose opinion I greatly trust recently questioned my use of a negative tone in my blogs. The easy answer is that it is easier to find things that are wrong in the presentation of an item than it is to find things that are right. I am my own confirmation bias, so to speak. I have spent so long looking for data issues that the bright spots no longer stand out to me.

The more difficult answer is that it is genuinely hard to show when a business does item data the right way. If the data is right it is just... "right". There should be nothing special about data that is correct, because correct is the expectation. Incorrect data can stand out like a veggie burger at a Texas barbecue, so it is much easier to point out.

So, to reverse the trend of only showing what is wrong with data, here are some examples of data done correctly. I have included comparisons where the data is not so right, but only to illustrate my point. As usual, the screenshots are limited to avoid calling out individual businesses or websites, and I have pulled from a variety of sites to avoid being accused of bias or partiality.

#1 Case Conundrum

One of the most common errors in item data is mixing case standards. What usually happens is that a business decides to use Title Case and then, mid-stream, decides to change to Sentence case or some other variation. When that happens, this is what you end up with:

Notice that the top three values are in title case and the other two are all lower case. Because of this, the title case values appear on top: the faceting engine sorts capital letters before lower case. This makes it very difficult to find the feature you are looking for, because you cannot know whether it will appear in title case or lower case.
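If you want to see that sorting behavior for yourself, here is a minimal Python sketch of what a case-sensitive sort does to a mixed-case facet list. The values themselves are made up for illustration; they are not pulled from the site in the screenshot.

```python
# A case-sensitive sort puts every capital letter ahead of every lower case
# letter, which is why the title case values float to the top of the facet.
facet_values = ["Adjustable Straps", "Cup Holder", "Machine Washable",
                "canopy", "infant body support"]

print(sorted(facet_values))
# ['Adjustable Straps', 'Cup Holder', 'Machine Washable',
#  'canopy', 'infant body support']

# Normalizing to a single case standard (title case here) keeps the list in
# one predictable alphabetical order, so "Canopy" is always in one place.
print(sorted(value.title() for value in facet_values))
# ['Adjustable Straps', 'Canopy', 'Cup Holder',
#  'Infant Body Support', 'Machine Washable']
```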

Here is an example of this done correctly:

All 42 values in this list are in title case. In fact, I found no instances of mixed case errors anywhere on this website. That is not to say there are none; I just could not find an experience where those errors occurred. This matters because I know exactly where to find Infant Body Support in this list. In the previous example you might have to look in two places, if you even scrolled far enough to notice the case issue.

This issue links directly to item findability. If someone in the first example needed a car seat with a canopy, they might not find it in the list, depending on how the word canopy is cased. In the second example they know exactly where to find it. In the first example the customer has two choices: keep looking through the list if they do not see canopy spelled with a capital "C", or look somewhere else. In the second example the customer never has to make that choice.

Secondly, it simply looks nicer. A single case standard unifies the presentation and makes it look professional and complete. Faceting is an example I like to use a fair amount because those facets come from item data. An enormous amount of time goes into the presentation layer of the website, building the right mix of filtering and container development, and the data has to support that resource spend. Would you say the first example justifies that spend, based on the presentation of the data shown?

#2 Classification Chaos

Another findability enabler that comes from item data is classification on a website. During item data collection, an item is classified into a hierarchy alongside other common items. That classification is then used to place items into the hierarchy of the e-commerce or CMS experience. If the item is classified incorrectly in the data collection hierarchy, there is little chance of it finding its way to the correct location in the presentation. You cannot fix bad data with presentation, and these kinds of errors become very visible very fast. A rough sketch of that flow follows.
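In most systems the site placement is little more than a lookup on the collected classification. The classification names and category paths in this sketch are invented for illustration, not taken from any particular retailer.

```python
# Hypothetical mapping from a collection-side classification to a site
# category path; the classifications and paths are invented for illustration.
SITE_CATEGORY = {
    "Convertible Car Seat": "Baby > Car Seats > Convertible",
    "Infant Car Seat": "Baby > Car Seats > Infant",
    "Booster Seat": "Baby > Car Seats > Boosters",
}

def place_item(item: dict) -> str:
    # If the classification collected for the item is wrong, the item lands
    # on the wrong shelf here; the presentation layer cannot repair that.
    return SITE_CATEGORY.get(item["classification"], "Unmapped")

print(place_item({"name": "Backless Booster", "classification": "Booster Seat"}))
# Baby > Car Seats > Boosters
```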

However, sometimes terminology trips us up. For example, the industry might refer to an item with one classification name while consumers understand it by another. We all know the word plastic, but there are dozens of types of plastic, from ABS (Acrylonitrile Butadiene Styrene) to PTFE (Polytetrafluoroethylene) to PS (Polystyrene) to HDPE (High-Density Polyethylene). If you saw a website listing ABS, PTFE, PS, and HDPE as the materials for an item, it would make far less sense than simply stating Plastic. The same can happen with classifications, such as with induction cooktops.
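One way to bridge that terminology gap in the data itself, rather than on the page, is a consumer-facing rollup. This is a hedged sketch; the mapping below is purely illustrative.

```python
# Hypothetical rollup from industry material codes to the consumer-facing
# term a shopper would actually filter on.
CONSUMER_MATERIAL = {
    "ABS": "Plastic",    # Acrylonitrile Butadiene Styrene
    "PTFE": "Plastic",   # Polytetrafluoroethylene
    "PS": "Plastic",     # Polystyrene
    "HDPE": "Plastic",   # High-Density Polyethylene
}

def display_material(collected_value: str) -> str:
    # Keep the precise industry term in the item record, but present the
    # broader consumer term on the site.
    return CONSUMER_MATERIAL.get(collected_value, collected_value)

print(display_material("HDPE"))   # Plastic
print(display_material("Oak"))    # Oak (no rollup needed)
```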

Let us start once again with the bad example. This experience shows induction cooktops and smooth surface cooktops together in the same presentation. Technically an induction cooktop has a smooth surface, but a smooth surface cooktop is not an induction cooktop. They are two different electric cooking technologies that need to be classified separately. See the image below:

There are two possible reasons this experience looks the way it does. Either the classification system separates smooth surface from induction, but not everyone who selected the induction classification selected the correct cooktop surface type, or smooth surface cooktops have simply been classified as induction. Because induction cooktops are smooth, that is a rational choice in the data as long as the context is missing. Now let's see this done correctly:

You may notice that they do not show separate smooth top and induction categories at the top of this facet. That is because both smooth top and induction are electric, and therefore belong in the same classification. They then appropriately break out electric and gas cooktops by technology in the Cooking Surface attribute, and even allow for combination electric and gas cooktops. There is no confusion over which cooktop belongs to which classification. This is done correctly, avoiding confusing faceting situations and the clutter that occurs when incorrect items appear in a presentation experience.
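A simple consistency rule between the classification and the Cooking Surface attribute can catch mismatches like the ones in the first example before they ever reach the site. The attribute names and allowed values in this sketch are my own assumptions, not taken from either website.

```python
# A consistency check between the collected classification and the
# Cooking Surface attribute; names and allowed values are assumptions.
VALID_SURFACES = {
    "Electric Cooktop": {"Induction", "Radiant Smooth Top", "Coil"},
    "Gas Cooktop": {"Gas Burner"},
    "Dual Fuel Cooktop": {"Gas and Electric"},
}

def surface_matches_classification(item: dict) -> bool:
    allowed = VALID_SURFACES.get(item.get("classification", ""), set())
    return item.get("cooking_surface") in allowed

# An item carrying a surface value its classification does not allow gets
# flagged for review before it clutters the facet.
item = {"classification": "Gas Cooktop", "cooking_surface": "Induction"}
print(surface_matches_classification(item))   # False -> send back for review
```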

There are two important reasons this works. First, the relationship is documented somewhere that both the item data collection program and the website navigation program have access to. Second, the item taxonomy team and the web navigation program are clearly communicating with each other to arrive at the best presentation experience based on the terminology in the data collection taxonomy. Do you think the first example shows good communication or documentation?

#3 Normalization Nonsense

One of the keys to any data collection system is the ability to normalize values across like items. It is important to use the same terminology across your platform, with the same constraints, for the same data point. As comparison engines have grown more complex, they highlight exactly where this normalization process fails, such as below:

Here are three items in a comparison engine with their battery requirements listed. Notice that the middle item has quotes around AAA but the outer two do not. Also notice that (not included) and (sold separately) mean exactly the same thing but look different in this presentation element. It is a small issue, but an issue nonetheless.
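As a small illustration of what cleaning these values up after the fact might look like, here is a sketch that strips the stray quotes and maps the two equivalent phrasings to one. The exact strings are stand-ins for what the comparison engine shows, not copied from it.

```python
import re

# Stand-in values for the battery requirement strings described above.
raw_values = [
    "2 AAA batteries (not included)",
    '2 "AAA" batteries (sold separately)',
    "2 aaa batteries (Not Included)",
]

# Phrases that mean the same thing get mapped to one canonical form.
PHRASE_MAP = {"sold separately": "not included", "not included": "not included"}

def normalize_battery_requirement(raw: str) -> str:
    value = raw.strip().replace('"', "")          # drop stray quotes around AAA
    match = re.match(r"(\d+)\s+(\w+)\s+batteries\s*\((.+)\)", value)
    if not match:
        return value                               # leave anything unexpected alone
    count, battery_type, note = match.groups()
    note = PHRASE_MAP.get(note.strip().lower(), note)
    return f"{count} {battery_type.upper()} batteries ({note})"

for raw in raw_values:
    print(normalize_battery_requirement(raw))      # all three print the same string
```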

Next we can see normalization in a comparison engine working correctly:

Since this is a discussion of normalization, we will leave aside the choice of attributes to display and how to display them. All the Yes/No attributes show up as either "Y" or "N", even if "Yes" or "No" might be preferable. The name of the Print Width attribute is awkward, but the values displayed are consistent. The Shipping Description attribute uses the same value in a consistent format.

This presentation works from an item data perspective because the system constraints on data collection enforce normalization. Many systems give you an open text box you can type anything into, which means item-to-item variation should be expected. A good controlled vocabulary, backed by systems that enforce data normalization, will make even the dullest data at least seem consistent.
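A hedged sketch of what that enforcement might look like at collection time follows. The attribute names and allowed values are illustrative only, not from any particular system.

```python
# Enforcing a controlled vocabulary at entry time instead of accepting free
# text; the attributes and allowed values here are illustrative only.
CONTROLLED_VOCABULARY = {
    "batteries_included": {"Yes", "No"},
    "connectivity": {"Wi-Fi", "Bluetooth", "USB", "Ethernet"},
}

def validate(attribute: str, value: str) -> str:
    allowed = CONTROLLED_VOCABULARY.get(attribute)
    if allowed is None:
        raise KeyError(f"Unknown attribute: {attribute}")
    if value not in allowed:
        raise ValueError(f"{value!r} is not an allowed value for {attribute}; "
                         f"expected one of {sorted(allowed)}")
    return value

validate("batteries_included", "Yes")     # accepted as-is
# validate("batteries_included", "yes")   # rejected before it ever reaches the site
```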

In Summary 

My blogs about the results of item data issues are becoming more difficult to write. Simply put, the industry is starting to see the value of normalized, controlled data. It used to be that I could go to any website, search for 10 minutes, and come away with multiple examples of bad data or bad data practices. Now it takes longer. The issues are harder to find, and it takes someone more trained in looking for them to spot them.

This is great news for businesses that see presentation as more than just making items available on their website. It means a data-centric approach has provided value, and they see their item data as an asset. However, it also comes with a cautionary tale: it is easy to become lax and let data quality slip. Some of the issues I found for this blog were not there months ago. A data quality program is an investment in your data assets, but so is putting the resources into maintaining that program. Treating your data quality systems as a project rather than a process and a program can severely hinder the return on investment in that asset.

 

Charles Meyer Richter

Principal information architect & diagnostician at Ripose Pty Limited

8 years ago

Daniel, thank you for posting this article, and Winston S., thank you for bringing this to my attention. As far as I understand, 'normalised controlled data' is used to remove redundancies. The examples you have provided are 'things' which may or may not have 'data' elements such as colour, heat_source etc. There are two ways of looking at this classification conundrum: one has to do with object orientation and the other with data normalisation.

Object orientation programming (c1990) requires the viewer to use three terms of reference, namely encapsulation (binding an object to a higher level object, e.g. 'Cooking_surface'), polymorphism (finding similarities, e.g. electric_sourced, battery_sourced, carbon_sourced) and inheritance (navigation, e.g. a gas_sourced surface is a type of carbon_sourced surface). The problem with polymorphism is that it only (according to my research) included the mutually exclusive property (the OR) but not the mutually inclusive property (the AND), nor the capability of recursion (or many-to-many relationships between mutually exclusive types). If this had been done properly, then the induction property of the cooking surface would be a mutually inclusive property of the cooking surface, but only if the cooking surface was electric or battery, which then means that the original classification of the Cooking_surface is incorrect.

In data normalisation, where data is known, Ted Codd, the 'father' of data normalisation, taught that the 'value' of the third normal form was to ensure that the data item (for example date_of_birth) depended on the key, the whole key and nothing but the key (i.e. non-redundancy). The problem with this is that the 'key' is an artificially selected attribute, whereas a data item like date_of_birth is a real attribute. The late Ted Codd (1923-2003) may have realised this conundrum in 1974 when he and Raymond F. Boyce (1947-1974) discovered the BCNF, or 3.5 normal form. Unfortunately, with the early death of Boyce, Codd probably had no one else to collaborate with on this nf and so probably left it there. If Ted Codd had realised the similarity with object orientation's flaw in polymorphism, he may have pushed the 3.5 nf to become 4nf (mutually exclusive), 5nf to cover the mutually inclusive property and 6nf to cover recursion. Perhaps Codd would also have discovered that normalisation required data whereas an artifact like knowledge did not; then perhaps he would have avoided the trap he set for the unwary data analyst.

In 1984 I discovered the potential flaw in the 3.5nf conundrum and introduced my intermediate solution, which was to state that a typed entity could also play multiple roles. But in 1990 I dropped the role approach and settled on the mutually inclusive form (5nf). The AI engine I wrote (1990) thus included both the extended polymorphic property and the 5nf, as they were one and the same, as well as the 6nf. These discoveries will (if implemented) greatly reduce the time and effort to produce high-quality, non-redundant databases and processes. However, they require a more advanced form of business modelling than the current enterprise architectural paradigms provide, as these not only use 7 different starting positions but take far too long to produce a strategic plan that DevOps can then focus their attention on to find the relevant data items and thus completely avoid the data normalisation step.
From the pen of a 45+ year veteran in the domains of business information (objectives, knowledge & strategies) & information projects (data & applications)
