How Business Requirements Become Data Models

This article covers the following:

  1. A brief history of IT data systems
  2. How IT projects are run the correct way with business requirements taking precedence over everything else
  3. How business requirements are transformed into data models
  4. How changes in business requirements affect data structure



Data History

First of all, one must know the history of data modeling, computing, and the Internet to understand how the information technology industry descended into chaos as far as data projects are concerned. Long ago, when the Internet was young before cell phones were common, and when almost no one had a computer at home, five people were primarily responsible for the creation of modern real-time data systems and the supporting methodologies and technologies. From the perspective of major corporations entrenched in their existing profitable ways, they were often treated like the four horsemen of the apocalypse and the devil himself. They were Lewis Latimer, Dr. Phillip Emeagwali, Dr. Edgar F. Codd, Ralph Kimball, and William Inmon.

Dr. Edgar F. Codd wrote the standards for relational databases and ACID compliance, which enabled multiple users to use the same data, the same database, and the same database structures at the same time without interfering with each other's work or the viability of the databases themselves. The Codd standards enabled technology to leap from mere files in a file system to in-memory, real-time data processing with encoded data validation rules based on business requirements. After Codd published his standards, many companies claimed to be compliant as a marketing ploy but were not. Even today, there are only three databases that are fully compliant with the Codd standards, ACID compliant, in-memory data capable, and real-time data capable for millions of concurrent users: PostgreSQL, IBM DB2/UDB (Enterprise version only), and Oracle.

William Inmon wrote the standards for data modeling in the area of data warehousing using normalized relational data. At about the same time, Ralph Kimball wrote the standards for multidimensional data modeling, which is used for high-performance real-time data systems, business intelligence, advanced predictive analytics, and AI data cores. The Kimball strategy is taken directly from object-oriented programming and enables real-time, high-performance transactions when paired with a major object-oriented language such as Java, Objective-C, or C++. William Inmon borrowed part of the Kimball data modeling strategy for data marts, which are used to communicate with applications, leaving the bulk of the data structure focused solely on master data management (MDM), data integrity, long-term storage, and data quality. The Inmon strategy is not a real-time strategy and relies on repeatable ETL (extract, transform, and load) of data from the master data source into data marts on a schedule via batch processing or automated triggering systems. Inmon's strategy also includes data lakes and metadata for unstructured data in original formats such as PDF and MS Word documents. Most companies and governments require, but often do not have, a hybrid system with a mixture of both strategies. Most databases do not conform to any known strategy and are in custom, disorganized, or poorly designed formats, but they are closest to the Inmon strategy without the data marts, data lake, and metadata.

Most of the developments in data management took place in the 1970s. However, big companies such as IBM already had a monopoly on most computer systems and relied heavily on file-system databases and text file processing systems such as the AS/400. In the early days of computing, "computers," as they were called, were actually human beings, usually black people who were gifted at mathematics and engineering and had enormous patience. Some of those "computers" became the first Fortran programmers and punch card system operators. The IBM mechanical machines were called "Business Machines," not computers, in those days. IBM stands for International Business Machines, which, ironically, they no longer make. Since the 1950s, IBM had sold mechanical computers and later text file processing systems. However, since almost nobody had a computer at home, and the Internet was young and virtually unknown, there was no need for advanced security. To hack an old IBM computer, one would literally have to be standing beside it and know how to operate it. Additionally, most organizations had less than five megabytes of digital data from the 1950s through the 1960s.

Way back in the late 1800s, a little-known inventor, Lewis Latimer, the fifth man and the furthest back in time, invented carbon fiber and the carbon-based filament that made the light bulb actually work continuously. The invention of carbon fiber would eventually lead to the printed computer circuit board and the microcomputer age. All iPhones, televisions, and computers use carbon fiber printed circuit boards and other components derived from carbon fiber and the Latimer light bulb. On a side note, the Latimer light bulb is the one used everywhere, even by Nikola Tesla to light the World's Fair. The Edison bulb used a paper filament and only lasted a fraction of a second. Edison hired Latimer, stole his work, and then didn't pay his wages, as he did with many inventors of the time. American history almost never mentions Latimer for the usual reasons. At the time of the invention of the light bulb, it seemed that nothing else could possibly be invented.

However, by the 1970s, with new computers being created, it was clear that using digital data was a major change that would save millions of dollars in cost. Most companies believed they would eliminate paper completely within a few years; many companies have now been saying that for over 50 years, and strangely, nobody has eliminated paper. However, companies and governments started to collect large amounts of digital data, which caused the old file-system database systems to run very slowly. Additionally, they only supported one user at a time because the files in the file system had to be completely overwritten on every update to the data. That led to problems with one person accidentally overwriting the work of another. Suppose two people worked on the same file: one person removed a mistake while the other was simply adding something new. If the person who removed the mistake saved the file first, the mistake would be put back when the second person saved the same file from their system back to the file-system-based database. A relational database allows two people to work on single records that may be part of the same data set but are not dependent on the same file for updates, reads, and writes. If user one modified line 5 and user two added line 8, both could have their work saved without affecting the other person. If they happened to be working on the same exact line, the system could be configured to notify the users of a pessimistic or optimistic lock on the line. Since relational databases operate at the millisecond level or faster, one-thousandth of a second, the latter situation is very unlikely. The users would have to try to modify the same exact line within the same one-thousandth of a second.
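To make the locking idea concrete, here is a minimal sketch of an optimistic lock, assuming a hypothetical table with a version column; the table and column names are mine for illustration, not from any system described in this article.

-- Hypothetical sketch: optimistic locking with a version column
UPDATE customer_note
   SET note_text   = 'corrected text',
       row_version = row_version + 1
 WHERE id_note     = 42
   AND row_version = 7;
-- If another user has already saved version 8, zero rows are updated and the
-- application can warn the second user instead of silently overwriting the first save.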

Dr. Edgar F. Codd and his ideas for multi-concurrent-user relational databases were revolutionary, but they were also seen as a threat to IBM's existing sales and profit despite the fact that he was an IBM employee. As it would happen, he was pushed out of IBM by internal politics. However, a young company called Oracle adopted his ideas along with the data modeling standards of Kimball and Inmon. Additionally, a company called Ingres began to adopt new standards to increase performance. Very few people have heard of Ingres today because they eventually stopped innovating and updating their data systems after major successes and sales to the US Federal government and NASA. However, Oracle continued to innovate. Other companies, such as Sybase, followed the example of Ingres and stopped at basic support for the William Inmon strategy with little thought to multi-user scalability because, again, almost nobody had a computer at home and usually only a few scientists were using the databases. Most people could not foresee seven billion people operating on an internationally connected network. Today, people can get email in the middle of the woods, via motorbike messenger, cell phone, and even wireless computers floating in orbit on the International Space Station. However, most technology did not keep up with the growth of the Internet and the demand for data processing speed.

Eventually, Sybase licensed its database technology to Microsoft for the creation of MS SQL Server, but the two products split apart in the 1990s, around the time of the anti-trust actions against Microsoft. Additionally, IBM and Microsoft have constantly, including presently, seeded the market with false information to push companies backward in technological development in order to compete with Oracle, which usually holds over 46 percent of the commercial database market. That false information may include promoting NoSQL tabular databases, columnar databases, Intel CPU-based hardware for high-capacity computing, and file system databases such as Hadoop. Unfortunately, most people are willingly swallowing the propaganda whole instead of doing actual research.

Backing up to the late seventies and early 1980s, enterprise data modeling (EDM) became an international mandatory standard for almost all software development with persistent storage of data. By then, IBM had been forced to create the DB2 database to fight off competition from the new company, Oracle, but only after failing to fight them off by seeding the market with false information about columnar databases, a middle technology between file system NoSQL tabular databases and relational databases. Eventually, it was clear that the in-memory relational database was superior to all other databases because of in-memory encryption security, encoded business rules, and performance. However, such a system would require real experts at very high wages. As recently as 2015, Oracle has hired resources that cost five million dollars per year for tasks such as cloud services development for databases.

Into the early 1980s, most information technology jobs were still low-paying jobs staffed mostly by clerical workers, who were mostly women and minorities, some of whom were retrained from manual mathematical computing groups, telephone switchboard operator pools, and secretarial pools. However, something happened, and companies realized that they could not only store their own data digitally, but also send their data to other parts of the world instantly across the global network, which would eventually be called the Internet. At the time, a company called Tyco Electronics, not the toy company, was responsible for laying undersea cables for the Internet. Tyco was later re-branded TE Connectivity. The TE subsidiary SubCom, the name of its undersea cabling operation, was recently sold to Cerberus Capital Management LP for a mere $325 million. Today, they sell cables to Facebook and Google to increase the dominance of single corporations. Meanwhile, Amazon is planning a global wireless network that may make the entire system obsolete.

Back at the beginning of the Internet, once the Department of Defense's Advanced Research Projects Agency adopted the standards created by Dr. Phillip Emeagwali, the Internet became a viable global network that could be used for almost anything, including e-commerce, online services, and real-time long-distance collaboration in the scientific community. By the 1980s, the non-mechanical microcomputer had almost completely replaced IBM's mechanical systems. Around the same time, a microcomputer predicted election results far faster than a mechanical computer, proving the value of the microcomputer and printed circuit board technology. Lastly, student education software and entertainment software became available on computers, which made computers begin to spread into homes.

Unfortunately, that caused the salaries of information technology workers to increase dramatically. Information technology began to attract more people looking to "steal" a high salary even if they could not actually do the work or even like the work. Like most industries, hiring switched from hiring for competence to hiring based on connections, age, ethnicity, sex, fraud scams, staffing agencies, etc. Very few IT workers are hired purely based on competence discovered through a practical exercise. Because the workforce changed so drastically, a key component required for software and data system development faded from the workforce very quickly: patience. People looking for fast money usually have little to no patience and want everything immediately.

In the late 1980s, the new workforce decided that enterprise data modeling was too time-consuming, and they were able to convince a new generation of managers that it was a waste of time with no return on investment. However, they were just too impatient and too inexperienced to see the benefits and return on investment in the form of stable software with maximum security. By the mid-1990s, almost none of the workers in information technology knew what enterprise data modeling was. By that time, the Internet was a global force and had created new international industries of trade. Suddenly, major corporations had tens of millions of international customers trying to buy everything on the Internet or trying to get support on the Internet. At the same time, consumer demand and the expansion of Internet use created massive security risks because anybody could connect to almost any computer, even from another country.

Unfortunately, most home computer systems still use file systems to store data, and most have no encryption and very little security, if any. Because the new money-hungry workforce had pushed out the older, more diverse, and more experienced workers, along with international standards for data governance such as data modeling, most companies suddenly realized they had a problem. They had thrown the proverbial baby out with the bath water when they eliminated data modeling in an attempt to do everything fast instead of correctly with a plan, which includes data governance, project management, SDLC or similar methodologies, staff separation of duties by skill set, security and encryption design, etc. Many IT workers now fake performing as many as 12 jobs while collecting one-third of the salary that one job is worth.

However, if they don't pretend to do the dozen or so jobs, work free overtime, etc., they will likely be replaced by H1B workers who will resume faking the jobs at a lower rate. It is common for code development, data modeling, ETL, security, database administration, system administration, interface development, requirements documentation, and several other aspects of IT to be assigned to one person. Clearly, nobody can do that many jobs in eight hours a day, but the "bosses" keep assigning the work so they can pocket the pay that should be going to a dozen resources. Instead of paying a team of twelve people an average of six million dollars, which is what their skills are worth based on the revenue IT earns combined with the average cost of living, they will pay one person $80,000 to $130,000, or even less to an H1B worker, and simply pocket the rest of the money. Most people also work under constant threat of being fired, so they have to pretend to do things they cannot actually do in order to survive, and managers will only retain employees who tell them what they want to hear, even if they know it's a lie and scientifically impossible. That's how billionaires are made: stealing wages in a way that is semi-legal where they operate. The pressure usually comes from the top down and is triggered by greed.

Unfortunately, due to the aforementioned behavior, by the start of the twenty-first century, billions of people had had their data and identities compromised. Global fraud exceeded five trillion dollars by the year 2019 and is still hard to detect because of poorly designed data systems backed by staff that are inadequate in numbers, skill level, or both. The common chaotic IT situation has also led vampire-like staffing agencies, which routinely keep up to 95% of the pay from a contract and often leave workers with less than a living wage, to exhibit illegal behavior in order to force resources to work on their contracts. For example, if a resource has a high skill level, especially if they are female or in a minority group, it is easy for a staffing agency to have them fired from their current job to make them available for a job with their agency. They simply have to make a phone call to HR, spread some false information that is emotionally inflammatory to most people, or something related to legal litigation that is published on the Internet, and then wait for the company to shoot itself in the foot by firing its best resources.

The HR personnel, stirred by pure emotion and without validating the false information, will have the person fired, usually with a lie or a fake termination letter stating fictitious budget cuts for just one position. Anyone who knows corporate accounting knows that department budgets are rarely governed with funding for a single position; funding almost never changes in the middle of a quarter and is usually allocated an entire quarter, if not a full year, in advance. HR will then falsify their records to indicate that the employee stopped showing up for work, both to evade paying unemployment benefits and to evade having to explain their unprofessional behavior to their own superiors. Departments of labor are notorious for taking the "word" of HR representatives when they claim a worker simply stopped showing up for work, even when the job pays $300,000 a year or more.

The staffing agencies then cannot guarantee employment, but may obtain permission to submit a resume on behalf of the exploited employee to their client, often at a low rate that may even be below a living wage. This practice has created mass homelessness in places such as San Francisco because it is rarely punished by the California Department of Labor. There are literally millions of such cases in litigation and most state departments of labor get hundreds of thousands of similar labor complaints from employees. However, most departments of labor do nothing to help their citizens. Staffing agencies and biased HR representatives have been like a wave of cancer on the global workforce.

Meanwhile, the owners of such companies are becoming short-term billionaires at the expense of global data security and quality. Ironically, many companies have gone out of business because of long-term costs from poor data quality, fraud, and constant computer system breaches. It’s almost like HR is terminating themselves long-term with their behavior that is designed to destroy their own employees. That behavior may also include firing people at age 64 so they cannot collect their retirement benefits after 20 to over 30 years of service at a reduced pay rate, or keeping women and minorities in lower positions or completely out of the workforce.

Meanwhile, the quality of Information Technology, all of which is in some way based on the intellectual property of women and/or minorities, continues to degrade and cost trillions of dollars globally. Meanwhile, department heads continue to hire based on age, ethnicity, sex, or in some cases appearance. I once managed a department head whose entire team somehow had F-cup breasts. There seemed to be lots of strange talk around his cubicle about fighter jets such as the F-35. Eventually, the CIO asked me, "What the hell is going on there?" "Time to fire up the anti-bias training," I responded. Terrible hiring practices, workplace harassment, and hostile work environments typically cost companies over $64 billion / year, in addition to the costs of bad data.

As it would happen, companies and governments began to hire research firms such as Gartner to "figure out what the hell is going on!" Gartner and similar firms concluded that the reason IT systems are so easy to compromise and data issues are so rampant is the lack of adherence to the original time-consuming, but necessary, policies and processes such as data modeling, data governance, and separation of duties with work-life balance. Most importantly, the new IT workforce was found to be skipping the analysis and design phases of over 93 percent of IT projects, stating, "We don't have time." In actuality, they should be saying, "We don't have the patience and wisdom that come from long-term experience, nor an experienced technical management staff that supports us."

Without an analysis, there can be no business requirements documents. Without business requirements documents, there can be no data modeling, no data governance, no high performance, no scalability, and no adequate security for data. That means that business rules, which include security rules, are almost never encoded into the structure of the database using an enterprise data model. What usually happens is that someone, usually a code developer, haphazardly throws together disconnected groups of unsecured database tables without telling anyone.
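As a hedged example of what encoding a business rule into the structure means (the table, columns, and rules below are illustrative assumptions, not from this article), a requirement such as "every customer account has exactly one unique account number and a non-negative credit limit" can be enforced by the database itself rather than by application code:

-- Hypothetical sketch: a business rule encoded directly in the table structure
CREATE TABLE CustomerAccount (
    iDCustomerAccount NUMBER(38, 0) NOT NULL,
    AccountNumber     VARCHAR2(38)  NOT NULL,
    CreditLimit       NUMBER(12, 2) DEFAULT 0 NOT NULL,
    CONSTRAINT PK_CustomerAccount PRIMARY KEY (iDCustomerAccount),
    CONSTRAINT UQ_AccountNumber   UNIQUE (AccountNumber),     -- one and only one account number per account
    CONSTRAINT CK_CreditLimit     CHECK (CreditLimit >= 0)    -- the rule lives in the database, not the application
);

With the rule in the table definition, every application that touches the data inherits the same validation, which is the point of deriving the structure from a data model instead of scattering checks across code.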

Essentially, most companies and governments went backward in technology to non-secured text files, databases designed like tabular single-table databases, and even file system management databases. Also, the newer technologies, such as Oracle, are being run in legacy compatibility mode for those older technologies and therefore recreate the performance issues that Oracle, PostgreSQL, and DB2 were invented to prevent. No matter how advanced the relational database management system is, it requires a real data model design derived from business rules in order to perform well. Unfortunately, real data modelers also cost real money, which no one wants to pay. Most people who falsely call themselves data modelers are secretly implementing legacy technologies without explicitly informing anyone.

Most importantly, none of those legacy technologies can defend themselves against modern computer systems. It's like fighting a battle against a modern fighter jet using the Wright brothers' first airplane. No matter what the people using the Wright brothers' plane say, they will die in the battle if the plane itself doesn't kill them first. However, it's easy to find thousands of people who support outdated technologies and even re-brand them with new names as if they were new, while the actual truth is that they have neither the skills nor the experience to use modern, secure technologies that operate in real time. That's why it's so easy to crack most computer systems.

Unfortunately, most people are in such a hurry to make money by pushing products to market long before they are ready that they end up costing themselves far more in legal fees and government fines after a data breach. One bank, Capital One, declined to hire an elite team of specialists who would have used in-memory databases to manage their data for five million dollars in labor. They chose instead to use cheap labor from mediocre employees, college students, and H1B workers with Hadoop on AWS. Their HR representatives were also seen in public as overtly racist and age-biased in areas such as Richmond, Virginia. The fines alone after the breach were sixteen times higher than the labor cost of the elite team, 75 percent of which was either female or minorities, which nearly matches the demographics of the population of the United States, which is over 82 percent female and minority combined. When one excludes 82 percent of the workforce from IT, data system security, quality, and performance will always be reduced significantly, if not completely eliminated.

They say that the root of all evil is money. The same is definitely true in information technology. Money caused most of the competent workforce to be replaced by people hungry for money. Then, billionaires and governments devised a scheme to steal most of the wages from IT. Since most IT workers were originally women and minorities, nobody did anything to stop it. In 2020, during the COVID-19 pandemic, the effects of discrimination were very apparent in the death tolls among poor and minority communities. At the same time, those with white-collar IT jobs could work from home, even though for decades they have had a 68 percent failure rate for IT projects and an average 25 percent cancellation rate due to failure to meet business requirements, which include time constraints and budget constraints. The combined failure and cancellation rate is 93 percent and is traceable back to a lack of analysis and professional enterprise data modeling.

The lack of data modeling is due to manufacturing-style management, which seeks to cut labor at all costs, forgetting that data modeling is a cost-reduction strategy. Instead of paying the realistic $1.5 million to $36 million budget for data modeling, most companies attempt to hire a fake data modeler, a code developer, or someone with a computer science degree, which is for manufacturing computers, for less than $200,000. If funds are cut from a cost-reduction strategy, which is what IT is, costs are exponentially higher everywhere else.

Some of the IT issues were also rooted in money because companies wanted to lower the cost of software development and maintenance by importing cheaper workers and exempting IT workers from overtime pay in federal law. Those "tricks" came from the manufacturing industry and did not take into account the effect of lower-quality work on advanced IT systems. Those cost-reduction measures drastically decreased the quality of information technology work and transferred approximately three hundred billion dollars per year from the United States to India and billions more to China. Foreign workers routinely send money back to their own countries and, again, staffing agencies routinely keep 50 percent to over 95 percent of the pay of the workers they subcontract.

The transfer of some of that wealth out of the country, which is about 25 percent of the cost of using US citizen IT professionals with fairly paid overtime, increased mass poverty, crime, and other social issues in the United States. Meanwhile, Indian and other foreign workers were being "programmed" to say, "We are only taking jobs Americans don't want," and "We are so smart, our children are not challenged in American schools," which are obvious lies, since American IT worker unemployment is over 35 percent but reported as 3.5 percent by ignoring most of the unemployed after a few months. Unfortunately, some of the discarded US citizen workers became the hackers responsible for major breaches of banks, governments, and corporations. At the same time, the Indian and Chinese workers are leaving most systems unsecured and unencrypted, and they often sell access to hackers. Additionally, many companies are discriminating against US citizens by citizenship in favor of seemingly cheap foreign labor, which is actually far more expensive due to quality issues and government fines for discrimination.

However, the amount of wealth stolen from the citizen IT workers in the US using foreign workers and unfair overtime laws is over one trillion dollars/year. That causes many home-related supply businesses to go out of business in America, which increases unemployment.

The hackers are not stealing that amount, but the compromised and weak computer systems are costing US businesses approximately three trillion dollars per year due to poor data, the cost of constantly fixing bad data, costs from global supply chain issues caused by data problems, and other data-related issues. Even without the hacking, the costs would still be well over three trillion dollars. Add to that the five trillion per year in global fraud, and you can see how big the data issue has become. So, effectively, the plan to steal labor wealth has backfired at triple the amount that is usually stolen, plus more.

The new trend to combat the three trillion in losses from bad data is to bring back EDM (enterprise data modeling) and data governance. However, a real EDM specialist has an hourly cost of $500 to over $2,500, depending on the type of job, time constraints, etc. Being accustomed to paying peanuts to foreign workers, most of whom are paid at IT rates from the 1990s without 30 years of inflationary adjustment, nobody wants to pay the market value for an enterprise data modeler. There are people in San Francisco who make $100,000 a year but cannot afford housing costs that are in the millions and taxes that are effectively over 50 percent of gross pay.

Additionally, most people claiming to be data modelers are actually just data analysts and database administrators. They are not actually capable of doing the job correctly, if at all. That makes the problems worse and reinforces the saying that money is the root of all evil. So, in a rather large nutshell, that's how we got to where we are: an 85 percent or higher annual data project failure rate, a workforce without the experience and patience to do IT jobs correctly, five trillion dollars in annual global fraud, an annual average of three trillion dollars in losses due to bad data for businesses in the United States alone, and an annual average of sixty-four billion dollars in workplace discrimination lawsuits. All of the aforementioned issues reduce the size of the US GDP by an estimated 28 trillion dollars.

Why does the US allow the loss of 28 trillion / year? The short answer is resistance to change. To understand this, one must understand that the US was built with slave labor and the practice never ended. The US always tries to implement slavery, which has the opposite effect on any industry that is not based solely on manual labor. If one reads the 13th Amendment of the US Constitution, slavery is still legal in the United States. They simply make the enslaved people out to be criminals to circumvent internal laws against trading products created using slavery. The US has the world's largest prison population for this reason.

Because slavery and war always worked to make trillions of dollars when the US was founded, it's nearly impossible to change how Americans conduct business even when such methods have been obsolete and illegal for centuries. Even when American technology changes the world, they still try to use slavery to mass produce the technology by exploiting foreign workers, or imprisoning domestic workers inside of prison factories. Most business owners don't even bother to question the slavery practices, which may be actual abduction slavery, prison slavery, food price fixing, paying below living wages, employing the homeless, employing foreign workers with wages far below domestic wages and local living wages, etc. To survive, millions of people choose to break the law, commit fraud, etc.

Because of this ignorance on the part of managers and business owners, I have often seen middle management speak to a senior data modeler or architect as if they were a slave working on a factory floor and cause entire companies, such as First Union Bank, Wachovia Bank, Deutsche Bank, Silicon Valley Bank, Bear Stearns, etc., to go completely out of business, costing trillions of dollars. Managers don't seem to understand that real IT professionals are rare and not easy to replace. A data modeler can simply resign and withhold intellectual labor to cause an entire business to fail. Because many people in management have an enslaver mentality, they cannot succeed at managing IT; 68% of IT projects fail and 25% are canceled, for a total failure rate of 93%. NO COMPANY CAN SURVIVE WITHOUT ITS DATA OR THE USE OF ITS DATA.

28 TRILLION US GDP Reduction Due to Poor IT & Data Management

  1. 5 trillion in fraud against minorities, made possible by politics and poor data systems, causes a 16 trillion dollar reduction in GDP
  2. 2 trillion in fraud against senior citizens has an 8 trillion dollar effect on GDP, especially because retirement accounts are heavily targeted.
  3. 1.3 trillion stolen by fake professionals from India and China, some of whom have been arrested for spying for Russia and/or China.
  4. 3 trillion in direct losses to companies due to poor data systems
  5. The issues above create massive national debt through annual deficits driven by fraud, discrimination, and direct theft.

In summary, that’s why everybody has bad data. However, the fraud and theft due to bad data are larger than the entire United States GDP.

Running an IT Project Correctly

There are many aspects to running an IT project, but the first three steps are the most critical.

  1. Throw away manufacturing style management principles
  2. Complete an exhaustive analysis and document business requirements in actual business requirements documents
  3. Hire the right people at living wages without discrimination, nepotism, and fraud schemes using corrupt staffing agencies

Manufacturing Management Style Issues

The aspects of manufacturing-style management, which are designed to reduce the cost of manufacturing in order to increase profit per sale, have the opposite effect on information technology. Since software is not tangible, the replication cost is almost zero. Cost-cutting in information technology using a manufacturing style of management is actually cutting the equivalent of research and development costs. However, manufacturing managers do not realize that they are not in manufacturing. First, they try to get the cheapest labor instead of the best labor, as one would require for research and development. The company with the best R&D department usually dominates the market. Therefore, it is not logical to cut labor costs at the R&D level.

Another major aspect of manufacturing-style management that can derail an IT project is the substitution of resources for critical labor. In a manufacturing plant, one can easily retrain a worker to attach bracket A to bracket B instead of bracket C to bracket D because it is manual labor that requires minimal or no knowledge of the finished product. However, in IT, each person is specialized intellectually for a specific type of labor that usually takes years to more than a decade to learn. Therefore, when a code developer, for example, is substituted for a data modeler, a resource that has thirty times the skills of a code developer, the work simply will never be done correctly no matter how much time is allotted, because the developer may not even be capable of learning the abstract thinking processes of a data modeler. That is especially true because code developers often develop interfaces that look for similarities in data for generating reports and services, whereas a data modeler must primarily look for how data is different and uniquely identifiable to create a system with data integrity. The thinking processes are totally opposite. It would be the equivalent of substituting a flight attendant for a pilot.

Another major aspect of manufacturing-style management that causes IT project failure is adding additional cheap labor to a project after the project's time constraints are exceeded or other issues arise. The added collaboration costs of a larger IT team will cause the project to move even more slowly. Although it sounds counterintuitive to a manager trained with an MBA or in manufacturing-style management, making the team smaller, with more experienced professionals, would be the correct action. Software development is not a process of manual labor but a process of intellectual artistry. The bigger the team becomes, the more likely the project will fail. Additionally, one senior professional with more than twenty years of experience can be equivalent to, or exceed, the performance of over three hundred recent college graduates to mid-level workers.

One such project that proves that point is the Pennsylvania state welfare, unemployment, and health care system project conducted by IBM with over three hundred H1B junior-level workers. The project ended with IBM being fired and a loss of over 170 million dollars. It was later discovered that the state could have hired exactly one person to do it right, or a team of five people, for one to five million dollars. However, all six of those people were either minorities, over forty, or "physically unattractive," according to one state employee, or a "weirdo," according to her male counterpart. I find that people make hiring decisions using everything except one's ability to do the job. In many cases, managers are hiring people they hope to use for sex later. Remember the guy with the team of F-cup-breasted women? Every one of them complained of sexual harassment. The push to return to the office, despite COVID-19, 88 fungal pathogens replicating in building HVAC systems from melting ice due to climate change, new polio outbreaks, a candida pandemic, an RSV pandemic, and several other outbreaks, is solely for sexual harassment in most cases.

Additionally, most people don't know that normal people are not usually good at writing software. Most people who would voluntarily choose to do such a thing, and be good at it, will probably be eccentric in some way. Additionally, fraud and kickback schemes were later discovered between state employees and contractors such as Deloitte. The common practice of fraud often excludes women, minorities, or anyone over 35 from employment on major contracts. Anyone who is seemingly "too honest" is also excluded. Managers expect employees and contractors to lie when told to lie. These are very common characteristics of manufacturing operations. One master data modeler, who can do the job well, on time, and correctly, costs between $5,000,000 and $36,000,000 a year depending on the project type. A team of 300 low-waged, mediocre employees or H1B workers costs about $50,000 each, which adds up to at least $15,000,000, yet produces nothing usable.


Analysis Design & Business Requirements Documents

In a proper project, an IT team must have a business analyst with the authority to collect business requirements from any employee or contractor. Since employees routinely withhold information in an attempt to create their own job security, even if it harms the company, the business analyst must have the support of senior levels of management to remove any employee who refuses to cooperate with the collection of requirements. Employees who say, "I don't have time," when told to document requirements should be given 100% free time at home without pay and be replaced with a real professional. If the requirements are incomplete in any way, the entire project will fail. For accounting-related projects, a failed project will eventually force the entire company out of business, even if it is a major bank.

The objective is to document how the business or government works, the special processes that set it apart from its competitors, legal requirements, and its objectives. The end result of the analysis should be a business requirements document with a conceptual data model that lists each field of data that must be collected with a narrative of how the data is uniquely identified, collected, stored, secured, and used.

The business requirements document (BRD) is then delivered to an enterprise data modeler for the logical data model design, which transforms the words in the document into a relational data model that a database system can understand. The objective of the logical model is to capture the business logic. A code developer or fake data modeler will usually try to use a logical model as a physical model, put actual data into it, and then connect the applications. At the end of the project, they will discover that the entire system does not perform. Logical models are NOT for holding data or serving data to applications; they solely capture business logic. If you hear someone say they don't have to do a physical model because the logical and physical models are the same thing, you need to get rid of them. Logical models are for capturing and understanding logic; physical models are for ACID compliance, performance, implementation of the logic from the logical model, and constraining actual data to the documented requirements.

The BRD must include at least the following aspects from the analysis. The business analyst should know all of the following aspects of a BRD and cannot be a business analyst in title only.

  1. Project scope – project boundaries, problems to be addressed, ROI
  2. Executive Summary – with measurements for success, focus, and goals
  3. SMART Objective - specific, measurable, achievable, realistic and time-dependent.
  4. Needs Statement
  5. Business Process Analysis
  6. Functional requirements
  7. Personnel requirements
  8. Financial statement
  9. Schedule - with timelines and deadlines (based on actual analysis, not emotions nor arbitrary deadlines)
  10. Stakeholders
  11. Cost-benefit analysis
  12. Risk analysis
  13. Number of expected concurrent end-users
  14. Expected data sizes
  15. Expected performance and response times
  16. Expected data consumption interface type (Web-based, Cognos, etc.)
  17. QA / Test Strategy & Unit Test Strategy
  18. QA Performance Test Strategy
  19. Expectations and assumptions
  20. Conceptual Data Model


How Business Requirements are Transformed into Data Models

This is the part where the "magic" happens. This skill is part of a neural pattern that naturally exists in specific types of people with specific ways of thinking. The only way to find such a person is to look for someone who is already like that: a data modeler. Usually, people naturally capable of data modeling will ask the same exact questions and produce the same exact logical results even if they have never met. To demonstrate, I will present a short list of business requirements and create a conceptual model, a logical model, and then a physical model. Then I will change the requirements and repeat the process to show how the data model changes based on the requirements.


Requirements

  1. Capture customer contact data. (Note: This is a typical request given by someone less experienced with software development. The requirement is too simple with very little detail. However, again, it is very common to get this type of request. At the end of the project, whoever gave this request will expect far more than what is in the requirement.)


Conceptual Data Model

In this case, the requirement is simple enough to define the conceptual data model as a set of data fields.

First Name

Last Name

Address

City

State

Zip Code

Phone



Logical Data Model



The logical model is much like the conceptual model, except that it is in UML format with a primary key and table name defined. The primary key is an object code compatibility feature that also uniquely identifies each row of data. The primary key also makes the data model compatible with cluster scalability via APIs such as serializable Enterprise JavaBeans (EJB). The primary key in this case is an automatically generated integer that gets set when data is inserted. Note that the zip code is an integer value; compare how that definition changes as the requirements change in the next iteration of this exercise. This model is in 1NF: a single flat table with no further normalization.
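Since the diagram may not render in this text version, a plain-text sketch of the same single-entity logical model, based on the conceptual field list above and the DDL below, would be:

Customer
    iDCustomer : Integer   (primary key, auto-generated)
    FirstName  : String
    LastName   : String
    Address    : String
    City       : String
    State      : String
    ZipCode    : Integer
    Phone      : Integer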


Physical Model



In this case, the physical model looks identical to the logical model because it is so simple. If the requirements change, it is possible and likely that the physical model will be structurally different from the logical model because it is designed for performance and requirements, while the logical model’s sole mission is to capture business logic.


Data Definition Language – DDL

The DDL is the code that actually runs on the relational database management system to create the table and constraints inside a particular database schema. In this case, the code is Oracle-specific, but it can easily be rendered for any database.

CREATE TABLE Customer(
    iDCustomer NUMBER(38, 0) NOT NULL,
    FirstName  VARCHAR2(65),
    LastName   VARCHAR2(65),
    Address    VARCHAR2(65),
    City       VARCHAR2(65),
    State      VARCHAR2(30),
    ZipCode    NUMBER(38, 0),
    Phone      NUMBER(38, 0)
);

ALTER TABLE Customer ADD
    CONSTRAINT PK1 PRIMARY KEY (iDCustomer);
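As an illustration of that portability point, a rendering of the same table for PostgreSQL might look like the following; this is my sketch under the assumption of a straight type-for-type translation, not part of the original article.

-- Hypothetical PostgreSQL rendering of the same table
CREATE TABLE Customer (
    iDCustomer NUMERIC(38, 0) NOT NULL,   -- Oracle NUMBER(38,0) maps to NUMERIC(38,0)
    FirstName  VARCHAR(65),
    LastName   VARCHAR(65),
    Address    VARCHAR(65),
    City       VARCHAR(65),
    State      VARCHAR(30),
    ZipCode    NUMERIC(38, 0),
    Phone      NUMERIC(38, 0),
    CONSTRAINT PK1 PRIMARY KEY (iDCustomer)
);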

Requirements Set 2

Any of the following requirements would cause drastic changes to the structure of the data model. In this example, all of them will be considered in the design.

  1. Capture customer contact data.
  2. Reduce data size, due to over one million customers with the same name and other commonly repeated values. For example, there are 44,935 people named John Smith in the United States. Other sets of names are also identical in the database. Additionally, street names are identical across most cities. For example, there are 19,495 cities, and they all have a Main Street; "Main Street" may appear in the address column millions of times. (physical model only; see the sketch after this list)
  3. Customers can have more than one address of different types which are home, work, shipping, credit card billing, and temporary forwarding.
  4. Addresses can be in foreign countries such as Canada.
  5. Customers in foreign countries can have two first names, two last names, or middle names.
  6. Applications will handle names differently based on the country of origin.
  7. Track customer name changes and name type, such as maiden name.
  8. Customers have one and only one unique alphanumeric account number.
  9. Provide filtering capabilities by name, account number, date of birth, and phone number. (physical model only)
  10. Address data must be GIS-compatible. (More details added later)
  11. Provide a way to track changes in address over time.
  12. A customer can be a company or have a company name associated
  13. A customer can have multiple email addresses
  14. Phones can be associated with an address, but it is optional.
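As a sketch of how requirement 2 plays out in the physical model, repeated values such as street names can be moved into small dimension tables and referenced by surrogate keys, so "Main Street" is stored once instead of millions of times. The table and column names below are illustrative assumptions; the actual design is in the physical model diagram further down.

-- Hypothetical sketch for the physical model only
CREATE TABLE DimStreetName (
    iDStreetName NUMBER(38, 0) NOT NULL,
    StreetName   VARCHAR2(65)  NOT NULL,
    CONSTRAINT PK_DimStreetName PRIMARY KEY (iDStreetName),
    CONSTRAINT UQ_StreetName    UNIQUE (StreetName)            -- each street name stored exactly once
);

CREATE TABLE CustomerAddress (
    iDCustomerAddress NUMBER(38, 0) NOT NULL,
    iDCustomer        NUMBER(38, 0) NOT NULL,                  -- reference to the customer entity
    HouseNumber       VARCHAR2(20),
    iDStreetName      NUMBER(38, 0) NOT NULL,                  -- small integer instead of a repeated string
    CONSTRAINT PK_CustomerAddress PRIMARY KEY (iDCustomerAddress),
    CONSTRAINT FK_Address_Street  FOREIGN KEY (iDStreetName) REFERENCES DimStreetName (iDStreetName)
);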


Conceptual Data Model 2

In this case, the requirements can still be expressed at the conceptual level as a set of data fields.

Customer Account Number (38 character limit)

First Name

Last Name

First Name Secondary

Last Name Inner

Middle Names

Customer Name For Address (What does the customer call this place)

House Number (GIS compatible)

Street Name (GIS compatible)

Address2

Address3

Unit_Apt Number

Building Number

City

State

Zip code

Phone

Phone country code

Phone Extension

Work mail stop

Country

email

Date of Birth (this would be encrypted in a real live system)

Address types: home, work, shipping, credit card billing, temporary forwarding


Logical Data Model 2

Figure: Logical Data Model with Many-to-Many Relationships (Copyright Hanabal Khaing 2024)


This logical model is in 3NF, third normal form, with many-to-many relationships that show how a customer can have many versions of their name, such as a name change after marriage, multiple email addresses, multiple phone numbers, etc. Again, this model is solely for capturing the logic from the written requirements. The logic from this model is used to create the physical model, which is designed for performance.
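As one concrete example of those many-to-many relationships (a minimal sketch with assumed names; the diagram above carries the authoritative design), the "multiple email addresses" requirement resolves into an associative table between the customer entity and an email entity:

-- Hypothetical sketch of one many-to-many relationship resolved with a bridge table
CREATE TABLE EmailAddress (
    iDEmailAddress NUMBER(38, 0)  NOT NULL,
    EmailAddress   VARCHAR2(320)  NOT NULL,
    CONSTRAINT PK_EmailAddress PRIMARY KEY (iDEmailAddress)
);

CREATE TABLE CustomerEmailAddress (
    iDCustomer     NUMBER(38, 0) NOT NULL,                     -- reference to the customer entity
    iDEmailAddress NUMBER(38, 0) NOT NULL,
    CONSTRAINT PK_CustomerEmail   PRIMARY KEY (iDCustomer, iDEmailAddress),
    CONSTRAINT FK_CustEmail_Email FOREIGN KEY (iDEmailAddress) REFERENCES EmailAddress (iDEmailAddress)
);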

Physical Data Model 2


Figure: Physical Data Model (Copyright Hanabal Khaing 2024)


This physical model is in 2NF, a multi-dimensional format. One change in data types is that the zip code is now alphanumeric instead of numeric only, to accommodate global zip codes. The structure is drastically different from both the original 1NF design and the 3NF logical design upon which it is based. The dimensions are usable as filters, as requested in the requirements. For BI tools, such as IBM Cognos for advanced predictive analytics, each dimension can be used as a filter. This strategy is known as an extended star schema and is part of the Ralph Kimball strategy. Additionally, data in one section of the model can be used to filter all other sections in a dashboard or report. For example, one could filter all customers born between certain dates in a certain area and create a targeted advertising campaign. It would also be very easy to connect sales data to this model and create sales filters by location, highest-paying customers by location, products purchased, etc.
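As a hedged illustration of using the dimensions as filters (the table and column names are assumptions for illustration; the diagram above shows the actual structure), a targeted-campaign query over such a model might look like this:

-- Hypothetical sketch: filter customers by city and date of birth for a campaign
SELECT c.FirstName, c.LastName, e.EmailAddress
  FROM Customer              c
  JOIN CustomerAddress       ca ON ca.iDCustomer     = c.iDCustomer
  JOIN DimCity               ci ON ci.iDCity         = ca.iDCity
  JOIN CustomerEmailAddress  ce ON ce.iDCustomer     = c.iDCustomer
  JOIN EmailAddress          e  ON e.iDEmailAddress  = ce.iDEmailAddress
 WHERE ci.CityName = 'Richmond'
   AND c.DateOfBirth BETWEEN DATE '1980-01-01' AND DATE '1989-12-31';
-- Every dimension (city, date of birth, address type, etc.) can be swapped in as a filter
-- the same way, which is what makes the extended star schema useful for BI dashboards.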

In the following articles, I will show how to generate base code for a CRM application from this model. From start to finish, after the customer data model is complete, a CRM can be created in approximately 12 to 24 weeks, depending on features and complexity.

Later, I will also add data governance explanations to the structure of this model.





Thank you,

Hanabal Khaing

May your data always have integrity.
