How To Tell if a Data Modeler is a Real Data Modeler
Hanabal Khaing
Senior Enterprise Data Modeler, Data Gov, CDO, CTO, Law & BRD to Multidimensional legal compliance real-time anti-fraud ERP Data Model, 60 SKILLSETS, $53,000,000+ annual, Bank/F500/Gov/AI Data fix $4 BILLION Min Deposit
The job title "Enterprise Data Modeler" is counterintuitive. Data Modelers do not "model data." In fact, one should hire a data modeler before the data exists to make sure it is collected properly, stored in the correct format according to business rules, and trustworthy. Data modelers create, or model, the structure that holds the data based on documented business requirements, then generate up to 90 percent of the code required for applications, including cluster, scalability, and high availability capabilities at the code and database level. Professional data models are also encryption and security ready.
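As a rough illustration of "modeling the structure that holds the data" and then generating code from it, here is a minimal Python sketch. It is not a real modeling tool: the `Column` and `Entity` classes, the `to_ddl` and `build_database` helpers, and the `customer` example with its business rules (unique email, non-negative credit limit) are all hypothetical names invented for this example, and SQLite stands in for an enterprise database.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Column:
    """One attribute of an entity, with the business rules attached to it."""
    name: str
    sql_type: str
    nullable: bool = False
    unique: bool = False
    check: Optional[str] = None  # SQL expression for a business rule


@dataclass
class Entity:
    """A hypothetical logical entity that becomes one physical table."""
    name: str
    primary_key: str
    columns: List[Column]


def to_ddl(entity: Entity) -> str:
    """Generate a CREATE TABLE statement from the entity definition."""
    parts = []
    for col in entity.columns:
        piece = f"{col.name} {col.sql_type}"
        if col.name == entity.primary_key:
            piece += " PRIMARY KEY"
        elif not col.nullable:
            piece += " NOT NULL"
        if col.unique:
            piece += " UNIQUE"
        if col.check:
            piece += f" CHECK ({col.check})"
        parts.append(piece)
    return f"CREATE TABLE {entity.name} (\n  " + ",\n  ".join(parts) + "\n);"


def build_database(entities: List[Entity]) -> sqlite3.Connection:
    """Create an in-memory SQLite database from the generated DDL."""
    conn = sqlite3.connect(":memory:")
    for entity in entities:
        conn.execute(to_ddl(entity))
    return conn


# Assumed business rules: every customer has a unique email address
# and a credit limit that can never be negative.
CUSTOMER = Entity(
    name="customer",
    primary_key="customer_id",
    columns=[
        Column("customer_id", "INTEGER"),
        Column("email", "TEXT", unique=True),
        Column("credit_limit", "NUMERIC", check="credit_limit >= 0"),
    ],
)

if __name__ == "__main__":
    print(to_ddl(CUSTOMER))
    conn = build_database([CUSTOMER])
    conn.execute("INSERT INTO customer VALUES (1, 'a@example.com', 100)")
    try:
        # Violates the credit-limit rule, so the database rejects it.
        conn.execute("INSERT INTO customer VALUES (2, 'b@example.com', -5)")
    except sqlite3.IntegrityError as exc:
        print("Rejected by the model:", exc)
```

The point of the sketch is only that the rules live in the model, so bad data is rejected by the structure itself rather than by whichever application happens to write to the table.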
Unfortunately, the vast majority of people claiming to be data modelers are not real data modelers and do not possess the 36 to 50+ skill sets of an actual data modeler. Most do not bother to learn data modeling because it is time-consuming and can take as many as 15 years of on-the-job experience and training to learn. Because it is so time-consuming, there is no degree that teaches the full set of skills for data modeling. One could get a PhD faster than one could learn data modeling.
A single ERP data model can take five to eight years or longer to complete, depending on the business requirements and complexity. Most students in their teens and early twenties hear the word "years" and run in the opposite direction. Most people in IT do not have the patience of a data modeler, and very few can execute a project lasting five years or longer.
To tell if your "data modeler" is real, simply ask, "What is data modeling?" and "What is temporal data modeling?" If he or she says, "It's modeling data," he or she is not a real data modeler. If he or she says, "It's creating tables in a database," he or she is not a real data modeler and has little to no understanding of what it actually is. Table creation in an actual database does not usually occur until the implementation phase, which occurs near the end of the project. Before implementation comes analysis, which should be at least 50% of the project, followed by design, which should be approximately 40% of the project. There are cases, such as ERP and medical systems, in which the analysis is 85% of the project due to the complexity of business requirements. However, most do not have the patience to plan well before acting. Failure to plan is a plan to fail.
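To make those proportions concrete, here is a small arithmetic sketch in Python. The `phase_schedule` function is a hypothetical helper written for this example. It uses the 50% analysis and 40% design figures above and assumes that whatever time remains goes to implementation, which the percentages imply but which is not stated as a rule.

```python
def phase_schedule(total_months, analysis_pct=50, design_pct=40):
    """Split a project into phases using whole months.

    Assumption: the time left after analysis and design is
    the implementation phase.
    """
    if analysis_pct + design_pct > 100:
        raise ValueError("analysis and design cannot exceed 100 percent")
    analysis = total_months * analysis_pct // 100
    design = total_months * design_pct // 100
    implementation = total_months - analysis - design
    return {
        "analysis": analysis,
        "design": design,
        "implementation": implementation,
    }


if __name__ == "__main__":
    # A five-year (60-month) project with the default split.
    for phase, months in phase_schedule(60).items():
        print(f"{phase}: {months} months")
```

Under these assumptions, a five-year project spends 30 months on analysis and 24 on design, so tables are not created in an actual database until roughly the last 6 months.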
One should also know the difference between a Data Architect, a Data Modeler, a Data Analyst, an Application Code Developer or Code Programmer, and a Database Administrator. Those roles are not the same and are not usually interchangeable. Of all of the roles, the Data Modeler is the most difficult to learn because it encompasses 36 to 50 or more skill sets. Although the modeler's skill set covers all of the other data roles, one can't simply hire a modeler and tell one person to do four or more jobs, due to time constraints and the fact that slavery is illegal and unethical.
It is important to find a real data modeler to avoid issues with customer data theft and identity theft, interruptions in the supply chain, internal fraud in the accounting system, exploding data sizes due to duplicate data, data integrity issues, performance issues, interoperability issues, and integration and consolidation issues. If one has all or most of those issues, one must consider whether one has a real data modeler, or whether some other resources are interfering with the data modeler and preventing him or her from doing his or her job. It is extremely common for entire departments to "attack" the data modeler because Data Modeling is a Data Governance Role, and most IT professionals unwisely fail to comply with and heavily resist Data Governance Policies. If one has a data modeler or data modeling group, absolutely NOBODY should be changing or creating databases outside that group. If you hear the words "We don't have time" and then follow the advice of whoever says them, your project will have a 93 percent chance of failure due to an incomplete analysis, an incomplete logical data model, and the lack of a professionally designed physical data model.
I recall working for a major manufacturer whose employees used to say, "We used to have a data modeling group, but we got rid of it because it was taking too long to complete projects." Today, that entire IT organization has been shut down and outsourced, with a total global job loss of over 26,000 jobs. Removing the professional data modelers also removed the quality controls and performance for the data. The lack of analysis, design, and QA produced literally dozens of failures in production every single day. Because management was too inexperienced to figure out that they could not eliminate data quality and data governance to give projects the appearance of speed, the entire IT division failed.
I also worked for a major manufacturer of network systems, which was also a major telecommunications company, that did the same thing. Today, that entire company is completely out of business and gone! Does anyone remember Nortel?
Any department or company that does not take data seriously will eventually fail: the department will be outsourced or the company will go out of business. No industry is immune to data issues. In the US alone, over 500 banks have failed due to data issues; sound data systems would have prevented problems such as fraud and overextended risk. Even long-established investment banks, such as Bear Stearns, were not immune to the aforementioned issues. Due to very poor data systems, no proactive management, and poor risk analysis data, Bear Stearns collapsed and was purchased by JP Morgan.
Even with new rules such as CCAR (Comprehensive Capital Analysis and Review), which is designed to prevent the financial markets from failing, most banks are still not proactively making good management decisions based on real-time data combined with historical data, because they never hired a real data modeler. The top excuse for not hiring a data modeler is cost, which is far below the billions companies lose due to poor data systems. As of 2021, bad data and poor data systems are costing companies an average of 33 percent of their revenue and an annual combined total of 3 TRILLION dollars. Even so, they won't invest one to thirty million dollars to fix their data, because most people truly don't understand it.
The Major Skillsets of a Data Modeler (The total skills count is at minimum 36 and may exceed 56 skills combined into one resource to create a data modeler, which is far more than what is now called a "full stack" developer.)
- Interpretation and transformation of business requirements, processes, and rules into conceptual, logical, and physical database designs. This skill is a combination of Business Analyst skills, Data Analyst Skills, and DBA skills with a way of thinking and designing systems that is unique to data modelers. (Minimum skill count 5 - UML Data Model Design, Business Collaboration, ETL, SQL, Technical Writing, extreme patience and planning skills)
- Full Business Analyst skills with the ability to collect and understand business requirements with the skill to collaborate with other Business Analysts. (Minimum skill count 4 - Multicultural Communications skills, Problem Solving, Negotiation, Critical Thinking)
- Data Architecture Design; Full Data Architect skill set is required for database design because use, growth, and performance must be considered in the database design. (Minimum Skill count 4 - "Big Picture" Hardware design, cost analysis for hardware purchase, long-term reports on hardware needs based on use and growth, implementation of hardware systems design)
- Database Administration; Full DBA skill set is required for performance tuning the data model design and recommending performance parameters for the physical database. (Minimum skills count 10 - Communications skills, SQL, knowledge of database theory, database engine design, knowledge of specific database types, knowledge of multiple types of queries, client-server model, operating systems, storage technologies, networking, and maintenance, which includes recovery, failover, clustering, etc.)
- Object-Oriented Code programming skills; This is required for generating and editing the base code for the physical data model. Data Modelers should be able to generate a complete, working interface to the physical database with separation between application code and interface code. The top language is currently Java, followed by C++. (Minimum skills count 3 - Object-Oriented Code Design, UML and Technical writing, writing code, unit testing)
- Web Development skills; used for generating and editing interface forms for data entry, and reading from the physical database. (Minimum skills count 3 - HTML, XML, graphics and graphics integration, JavaScript)
- Data Analytics (Minimum 5 skills - Scripting languages such as Python, MATLAB, R, and SAS, statistics, reporting skills, charts and graphs, desktop tools such as Excel and Tableau, SQL)
- Business Intelligence; This can include custom made products, Cognos, Hyperion, etc. (Minimum skill count 3 - BI Tools, SQL, Communications and Presentation of Reports)
- Multiple Operating Systems; Multiple specialized types of Linux, Solaris UNIX, HP-UX UNIX, IBM AIX. (Minimum Major skills count 4; major skills have internal required skills. For example, UNIX administration also requires knowledge of shell scripting, Perl, disk tools such as the dd utility, etc. Therefore, when one indicates UNIX as a skill, there are usually over a dozen skills within that one skill.)
- Multiple Databases with solid knowledge for use and administration of the top three enterprise databases which have a query result cache and in-memory database features: Oracle, DB2, PostgreSQL. A Data Modeler must also understand the evolution of databases and be familiar with all forms of databases: file system databases, such as Hadoop and 1970s legacy database systems; tabular databases, which are now commonly known as NoSQL databases, such as Excel, Access, and the 1970s legacy databases upon which they are based; columnar databases, which were replaced by relational databases by the 1980s; relational databases, which are the most commonly used type of database; and relational in-memory databases with both columnar and row-based indexing, which are the best type of database for real-time reports and business intelligence. (Minimum Major skills 8)
- QA, Data Governance, and Policy Writing Skills (Minimum skills count 3 - Business requirements documentation, technical writing, UML, policy writing)
- Team management, peer review, practical exercise and testing for job screening, staffing profiles, project and portfolio management, electronic document management. (Minimum skills count 7)
- Application server configuration for clustering, high availability, and scalability. This could include Glassfish/iPlanet/SunOne, IBM WebSphere, BEA WebLogic, Oracle 10g application server, Wildfly / JBoss, etc. Application deployment to a cluster. (Minimum skills count 2)
So, what does such a person cost to employ?
Full-time rate from 1999: $85 / hour
Full-time for 30-year permanent employee, which almost never occurs: $90 - $250 / hour or up to $500,000 / year
Contractor C2C (regular small to medium project): $250 - $500 / hour or $500,000 to $1,000,000
ERP/SCM Contractor C2C (large projects 5,000+ tables): $3,800 - $5,000 / hour or up to $10,000,000 (includes enterprise level hardware for a commercial data modeling station)
The average cost to train a new data modeler is $2,000,000.