POLICE UNPREDICTABILITY
By W H Inmon
When a person crosses the bridge from structured data to unstructured data, that person leaves one world behind and enters a completely different domain. Structured data has data models. Text has taxonomies. Structured data has fields and keys. Unstructured data and text have random words and thoughts. Structured data is well organized and well behaved. Unstructured data has text and context.
Crossing the boundary between structured data and unstructured data is as profound a change as walking ten miles on land and then jumping into the ocean for a swim. One minute you are doing one activity and the next minute you are doing a completely different one.
Nowhere does that difference show up more poignantly than in dealing with police reports.
Consider the police report. There is certainly some structured data in the report: the date of the incident, the reporting officer, the time the report was created, the location of the incident being reported. Every report carries this information. But the interesting part of the report is the description of the incident itself. That is where the meat is. And this information is decidedly not structured.
Consider the problem the police department faces when looking at thousands, or hundreds of thousands, of reports. Suppose the police want to ask: is there a pattern that recurs throughout these reports? Is there a perpetrator who is committing more than one crime?
Classical data modelling works by finding recurrent patterns. That approach works well if the perpetrator commits many crimes and always commits them the same way. But life is such that this rarely happens. Instead the police have to look through unpredictable data to find the patterns that exist among different reports. The patterns change with every crime and every perpetrator. The patterns even change from day to day.
One day the police are looking for a man who is 45, who weighs over 300 pounds, who is white, and who smokes Marlboros. The next day the police are looking for a woman under 25 who is blonde, who takes meth, and who has tattoos of James Dean on her back. She has a Scotty dog named Tipper. The next day the police are looking for a gang member who has braided hair, who wears a Disney World T-shirt, who drives a 2006 Camaro, and who is armed and dangerous.
In other words, there is no predictability whatsoever in what the police want to search for. One day the police want to search for one thing and the next day the police want to search for something entirely different. The standard approach of doing data modelling and defining standard keys just will not meet the needs of the police department. The police are looking for unpredictable data, not predictable data.
The classical methods of organizing data, using data models and fields and keys and indexes, just don’t meet the needs of the police department.
Fortunately, there is technology that does meet the needs of the police. With textual ETL you can strip ALL of the relevant text off into a database. Then you can look for one thing one day and something completely different the next. Furthermore, the computer doesn’t care how many documents you have to analyze. If you have a lot of documents to analyze, you just get another computer. Or a bigger, faster computer. Looking at a huge number of documents, and looking at them quickly, is not a problem.
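To make the idea concrete, here is a minimal sketch in Python. It is not textual ETL itself, which also deals with context and taxonomies. It is only a simple word index, used here to show what it means to load all of the text once and then ask a different question every day. The reports, the Report class, and the function names are all hypothetical and invented for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Report:
    report_id: int
    narrative: str  # the free-text description of the incident


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(reports):
    """Map every word to the set of report ids whose narrative contains it."""
    index = defaultdict(set)
    for report in reports:
        for word in tokenize(report.narrative):
            index[word].add(report.report_id)
    return index


def search(index, *terms):
    """Return the sorted ids of reports containing every word in every term."""
    words = [word for term in terms for word in tokenize(term)]
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


# Hypothetical narratives, made up for this example.
REPORTS = [
    Report(1, "Witness saw a heavyset white male, about 45, smoking Marlboros near the store."),
    Report(2, "Suspect is a blonde woman under 25 with a James Dean tattoo on her back."),
    Report(3, "Clerk reports a white male smoking Marlboros left in a 2006 Camaro."),
    Report(4, "Gang member with braided hair drove off in a 2006 Camaro."),
]

INDEX = build_index(REPORTS)

# Monday's question and Tuesday's question need no change to the index.
print(search(INDEX, "white", "Marlboros"))  # [1, 3]
print(search(INDEX, "2006 Camaro"))         # [3, 4]
```

Nothing about the search terms is decided when the index is built, and that is the whole point. Note that this sketch matches exact words only, so "Marlboro" would not find "Marlboros"; a real system would have to handle word variants, synonyms, and context.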
Structured data and structured processing work well where both the data and the processing are predictable. But a whole different approach is needed where the data and the processing are unpredictable.
___________________________________________________________________________
Bill Inmon is the founder of Forest Rim Technology, a company in Denver, Colorado that helps customers hear and interpret the voice of the customer. In addition, Bill teaches a free class, PRACTICAL TEXT ANALYTICS, given over the Internet. To find out more about the class, contact [email protected].