Advanced Temporal Data Modeling for High Speed Currency Systems, Financial Transaction Systems, & Real-Time Banking
Hanabal Khaing
Senior Enterprise Data Modeler | Data Governance, CDO, CTO | Law & BRD to multidimensional, legally compliant, real-time anti-fraud ERP data models | 60 skill sets | $53,000,000+ annual | Bank/F500/Gov/AI data fixes | $4 billion minimum deposit
Temporal Data Modeling is the integration of temporal data types into the tables of Enterprise Data Models to synchronize object-oriented application states with persistently stored data in real time, which is what makes real-time OLTP possible. It should be an integral part of every data governance policy. However, most professionals have never heard of Temporal Data Modeling or data governance, yet anyone who has traded stocks, used online banking, or traded currency has unknowingly relied on it. I know, because I have been subcontracted to covertly and confidentially repair temporal issues in bank data models dozens of times over 27 years. Knowledge of Temporal Data Modeling and real-time OLTP enables development teams to write enterprise-level code that avoids deadlocks and data integrity issues and keeps stored data synchronized with in-memory application transaction states. To enable real-time banking, applications must be capable of real-time data processing and transaction states, which requires a multi-dimensional data model.
First, consider that an application must be able to use the speed, encryption capabilities, and multi-threading features of a CPU, and consider just how fast a modern CPU runs. Current CPU speeds are measured in gigahertz (GHz), or one billion cycles per second. A modern 5 GHz Sun/Oracle SPARC M8 32-core CPU runs five billion processing cycles per second on each of its 32 cores and can process 256 threads per CPU simultaneously at nanosecond speeds. The CPU also has 32 DAX engines (Data Analytics Accelerators) for in-memory query acceleration, combined with a query result performance and scalability cache, in-line decompression, and Java streams acceleration, plus encryption instruction accelerators in each core with direct support for 16 industry-standard cryptographic algorithms. A single blade server can have eight CPUs and process 2,048 threads simultaneously. Since Java and C++ both have built-in encryption capabilities, an enterprise server can run everything encrypted, from the database to the network to the graphical user interface, and do it far faster than older CPUs could run unencrypted.
With a well-designed multi-threaded object-oriented application, an Enterprise Data Model, and a well-tuned in-memory database, a single blade server can handle over two million concurrent users with sub-second response times across 2,048 threads, each able to respond in 0.0001 seconds or faster while encrypted. A thread is the path followed when executing a program in a high-level, multi-thread-capable programming language such as Java or C++. For example, every Java program has at least one thread, known as the main thread, which the Java Virtual Machine creates when the program starts and the main() method is invoked. The program can then launch multiple threads to do multiple things at the same time, making a single 8-CPU multi-threaded blade server, with 256 cores, 8 or more threads per core, and one-millisecond or faster response times, greater than or equivalent to 2,048,000 nodes configured for distributed computing with a typically single-threaded file-system database such as Hadoop, which usually delivers a one-to-three-second response time per thread even when the data is perfectly exported from a well-designed data model.
Enterprise appliances can have 36 CPUs, 1,152 cores, and 9,216 threads, with response times under one millisecond even from a relational in-memory database holding trillions of unique records. It would take over 9,000,000 x86 distributed-computing nodes running Hadoop or other file-system databases to approach the capacity of one Enterprise Appliance or Enterprise Blade Server. The appliance would cost approximately three to six million dollars if a duplicate server is used as a fail-over, but the 9,000,000 nodes would cost approximately 900 million to over 13.5 billion dollars for the hardware alone, plus the cost of several dozen data centers instead of just one or two, bringing the total to approximately 20 billion dollars per year for distributed computing, a fact Google has discovered the hard way with approximately 900,000 x86 blade servers. The distributed system would only have the capacity to serve approximately one user per server with a one-to-three-second response time if the data was exported from a well-designed database: about 900,000 concurrent end-users, or 1.1 to 3.2 billion users per hour depending on response time and network latency. If the data was exported to Hadoop without proper data modeling, the response time would usually exceed three minutes per user, about 50,000 concurrent users. The advantage of the distributed system is that it can be grown over time at a low initial cost. The Enterprise Appliance requires millions of dollars immediately, but its long-term cost is much lower, and it could serve over 9.2 million end-users per second, more than the population of the world over one hour of service, with full encryption. Capacity for either system will of course fluctuate with the complexity of the applications and services provided, but the 9,000,000 distributed nodes would still have less computing capacity than a single 8-CPU SPARC RISC Enterprise Blade Server.
Imagine your mom as one 256-thread RISC CPU, but with 256 arms moving at five billion kilometers / second while cleaning the house, taking care of the children, doing the laundry, cooking food, and working 10 hours/day on a full-time job all simultaneously, but she gets it all done in one one-thousandth of a second, then folds her arms and smiles as if it were nothing. That's how fast an application can be when paired with an Enterprise Data Model with Temporal Data Modeling features and a multi-threaded object code-capable application on enterprise-level hardware.
Why? The application needs to know when a row of data was created or updated down to a fraction of a second to tell if the transaction state came before or after the record was updated. Based on that information, the application can be programmed to update its own instance of the object that represents the data or the persistently stored data. If it does not have the information, it cannot know what to do. Also, if it is not object-capable and multi-threaded, it cannot utilize the capabilities and speed of the CPU and it will run as if it's on an old computer from 40 years ago no matter how much power the physical hardware makes available.
Imagine you own a hospital network with 100 hospitals. At 1:00 PM, a nurse tries to get medication for a patient who has been treated at five different hospitals. An automated prescription dispenser system supplies the prescription based on patient data, prescriptions, medical history, and so on. Data in the system is updated showing a fatal interaction between the medicine the patient is about to receive, a medication they may have received a day before, and the medicine a different specialist at a different hospital prescribed and entered into the system at 12:59:50.2032 PM, less than 30 seconds before the nurse arrives. The data further shows that the medications must be given 48 hours apart at specific times, and at which times the system releases the other medication. When the nurse goes to get the pills, the system prevents the dispenser from releasing the medication, because it would be fatal when mixed with the medication the patient has already received.
The system can also send an alert and prevent a patient's death. After getting the alert, the doctors may simply restructure the times, validate and update the data, or change the prescriptions. If the data model cannot track multiple doctors, medications, prescriptions, and treatments for the same patient, with time logs down to a fraction of a second for when the patient received medication versus when the fatal-interaction records were updated, it may not be able to prevent the patient's death, because it cannot determine whether the patient received the medication at a specific time or what their previous and future treatments and medications are. The system also legally protects the doctor, because the fatal-interaction information may not have been known or released by the pharmaceutical companies until 30 seconds after the patient was dead. If the system only recorded the day the information was released, the doctor might appear guilty of a crime.
Therefore, a real-time OLTP level of precision would be very useful to hospital networks and to insurance companies that provide medical malpractice insurance. With Temporal Data Modeling, the system would always track that information to a fraction of a second. The reason the system must operate in fractions of a second is that there are literally millions of medications in different dosages and billions of patients using medication; many things happen simultaneously across many locations, and a system delay or incomplete information could be fatal. A nurse could also get medication 10 seconds before a fatal-interaction update and almost give the patient the medication before the system sends a real-time OLTP alert to the doctor, the front desk, and a real-time monitoring screen in the patient's room. Because medical errors are the third leading cause of death in the US, causing between 250,000 and 440,000 fatalities annually, a real-time OLTP electronic medical records and prescription drug dispensary system may eventually become legally mandatory. Imagine a doctor yelling, "Stop that nurse!" and running to tackle him or her after getting an emergency page that their prescription may kill a patient.
Similarly, imagine you are trading currencies as an investment. If the value of a currency changes within fractions of a second because hundreds of millions of people are trading it, your system must also operate in fractions of a second to get the best prices. If you bid $100 on a foreign currency at 1:00 PM, but the price dropped to $50 at 12:59:59.903202 PM, less than one-tenth of a second before 1:00 PM, the trading system should buy at $50 and leave the other $50 in your account. A less advanced trading system would buy at $100, and you would instantly lose $50.
Real-time OLTP is a requirement for most modern IT systems: bank systems, investment systems, health care systems, currency systems, and so on. Additionally, e-commerce has become so widespread that logistics and inventory systems also require Temporal Data Modeling to monitor sales versus inventory in real time. Otherwise, a vendor can sell more products than physically exist in inventory, or lose sales because inventory was not replenished faster than it sold. Have you ever gone to a store and found an empty shelf where a product should be, or seen a web page message saying an item is "out of stock"? That happens when the store's logistics system lacks Temporal Data Modeling features combined with a real-time-OLTP-capable application connected to internal and external supply chain management systems. I keep waiting for companies to figure out that all supply chain systems need to be interoperable. I once worked for a company that received a two-hundred-million-dollar order for parts it had not yet made, because the SCM system did not track customer usage versus inventory in real time. After that, my temporal changes were approved. Although I had told multiple managers about the issues two years before the incident, no change was approved until there was "concrete proof," in their words, that it was needed. It is always needed, no matter what, and it should be a mandatory requirement in every data governance policy.
Unfortunately, many company and government supply chain systems and their suppliers may not have Temporal Data Modeling or real-time OLTP at all. In case you're wondering: if you're implementing Hadoop as your transaction system, you do not and cannot have these features, because they are not supported on distributed file-system databases. Hadoop does not support column data types, primary keys, foreign keys, and so on; it was designed for archiving data, not for OLTP with thousands to trillions of transactions. Only relational database management systems (RDBMS) with in-memory capabilities, such as an in-memory database, in-memory query, and a query result cache, support advanced Temporal Data Modeling and Enterprise Data Modeling. Only three databases have those features: Oracle 9i or higher, IBM DB2/UDB, and PostgreSQL. Others have temporal data types, but the lack of an object-code-compatible query result cache and in-memory capabilities makes them too slow to operate as real-world, real-time OLTP systems. Real-world systems routinely hold more than twelve billion to trillions of unique records. Systems such as Microsoft SQL Server and Analysis Services have limits inherited from their core technology, often dating back to designs from the 1960s to 1980s. If one reads the documentation, one finds warnings like the following.
Quote from Microsoft about Analysis Services: "Warning: With the restriction that no single column in the table can have more than 1,999,999,997 distinct values." (note that most companies have more than two billion distinct values)
Quote from Microsoft about Analysis Services: "The following are theoretical limits. Performance will be diminished at lower numbers."
Unfortunately, companies routinely lose hundreds of millions to billions of dollars in a single quarter because their IT staff lacks an Enterprise Data Modeler and advanced code developers who understand Temporal Data Modeling and real-time OLTP, respectively, and because they lack long-term IT strategies. Even when companies have those resources, they often do not have a Data Governance VP, CTO, or CIO who runs QA via management policies and enforces policies requiring Temporal Data Modeling and real-time OLTP. Most companies also make the mistake of placing their data modeling resources under the same management structure that must comply with data governance policies. Data Modelers should always hold a VP-level position and never report into the same management structure as the code development team or any other team that must comply with data governance policies. That would be like placing QA, Quality Assurance, under the management of a developer whose code QA can reject.
Beware that most IT teams will seek to remove the data modeler and other QA resources from the company so they do not have to comply with the policies. If they do not comply, no one will know until after a major failure, because the data modeler or Data Governance VP is not there to point out the issue and enforce the policy. Most CIOs are managers only, without deep hands-on IT skills, and usually cannot detect policy violations or malicious attacks on QA resources. Additionally, the team that failed will keep the reason for the failure secret and will scapegoat the physical hardware, the network team, the vendor technology, or the DBA and system administration teams. Legal would call that management structure a conflict of interest. IT is not immune to the usual employee dramas in which lies, deception, and high-school-style clique alliances are used to win promotions while sabotaging the most competent employees.
That behavior also contributes to the 93-to-95-percent IT project failure and cancellation rate (68 to 70 percent failure plus roughly 25 percent cancellation) and to the roughly 62 billion dollars companies pay annually in EEOC discrimination lawsuits. Some single settlements have exceeded 250 million dollars. One of the top types of suits is age discrimination. Keep in mind that it takes about 15 years to become a master Physical Data Modeler, so most of those resources will be at least 35 to 50+ years old, which is at least 10 years older than most IT workers, unless their training began around age 16 or 17, as mine did. Many of the top data modelers are currently over 70 years old with nearly 50 years of experience. Data Modelers routinely have experience far beyond the level of a Ph.D., CIO, or CTO. No degree covers complete conceptual, logical, and physical data modeling, because the time to master the skill is longer than most degree programs. Even most computer science curricula spend about one hour on data modeling, if any at all. Don't believe it? Look up a few CS degree programs and search for "Data Modeling." Most CS courses teach only basic database system knowledge with little or no data modeling.
The following is the only database course in the MIT Computer Science curriculum, which focuses primarily on designing and manufacturing physical computing hardware. The course does not cover conceptual, logical, or physical data modeling with a concentration on the multi-dimensional modeling required for real-time OLTP. It covers only the normalization aspect of data modeling, the most basic skill used in logical data modeling. A logical model is usually third normal form (3NF) or higher to capture business logic, but the data model that actually holds real data and is designed for performance is a 2NF extended-star-schema multi-dimensional model. People with CS degrees routinely try to use a logical model as the physical model while it is still in 3NF or higher form. Because the course does not teach the transformation of requirements into conceptual, logical, and physical models, CS graduates often do not know the difference between a logical and a physical model or when to use each; many have never heard of a conceptual model. The results are extremely slow performance, zero object-code compatibility, poor application performance under light to moderate end-user loads, zero Temporal Data Modeling, and poor data integrity, even with advanced databases such as Oracle and DB2.
MIT CS DEGREE COURSE 6.814, Database Systems (1.5 hours/week for only a few weeks), states: "This is not a course on database design or SQL programming."
Data Modeling requires 5000 hours of training!
Click on the course links and note that the CS curriculum focuses almost exclusively on hardware and code algorithms, ignoring the data, its structure, its security, and its performance. The data modeling lectures deliver only surface knowledge and do not teach full conceptual, logical, or physical data modeling. The lectures also do not make clear that the primary objective is to transform business requirements into a data model that enforces those requirements. This shows that real-world experience and specialized job training are far more valuable than any college degree, even one from MIT.
Most organizations have far more complex data issues than hardware and code issues. Even when they believe they have hardware and code issues, the root cause analysis almost always leads back to data structure, data security, data integrity, and data access performance issues. All of those issues should have been pre-solved with a data governance policy.
U (Fall)
12 Units
Topics related to the engineering and design of database systems, including data models; database and schema design; schema normalization and integrity constraints; query processing; query optimization and cost estimation; transactions; recovery; concurrency control; isolation and consistency; distributed, parallel and heterogeneous databases; adaptive databases; trigger systems; pub-sub systems; semi-structured data and XML querying. Lecture and readings from original research papers. Semester-long project and paper. Students taking the graduate version complete different assignments. Enrollment may be limited.
==END MIT SECTION==
Additionally, because advanced coding and Temporal Data Modeling take significant time and extreme skill, because most IT projects are staffed with manufacturing-style managers hired to cut labor costs, and because most people in IT are there not because they love the work but for the money, most IT projects are rushed under arbitrary time constraints and do not allow time to do the job correctly with a formal SDLC: business requirement analysis, design, and implementation, in that order. IT resources who are only in it for the money do not care about the fine details of application and data model design; they just want to get something EASY done, get paid a lot of money, and go home at 4:30 PM or earlier. Again, the current failure and cancellation rate for IT projects is 93 to 95 percent, and it is directly caused by these issues. Beware the word "EASY" and the phrase "We don't have time." If you hear them, your project WILL FAIL!
One way to "fix" those issues is the implementation of Business Information Modeling, BIM, systems to predict and visually model the effects of decisions before decisions are made. Anyone with a Ph.D. should have learned the terms Conceptual, Theoretical, and Analytical Frameworks. BIM software combines those three frameworks to analyze business decisions, business rules, business processes, and business requirements to predict an outcome with probability statistics. Additionally, companies should have a BIM that contains historical decisions and results to prevent the mistakes of the past from being repeated in the future and to augment the analytical framework metrics. For example, a BIM would show financial and employee retention losses if the aforementioned Temporal Data Modeling and real-time OLTP resources are not hired or arbitrarily laid off for short-term cost-cutting gimmicks due to high salaries or declining stock prices. A BIM can be connected to project management systems and combined with staffing profiles to show exactly what employees do and the company-wide impact. CEOs may ask, "How much money is that employee making for the company and how much does he or she cost?" Without a BIM, data-driven project management, and staffing profiles, that question would be nearly impossible to answer. So, most people rely on their "feelings" to answer the question. Try counting the number of times you hear the word "feel" at work. I'm sure CIOs and CEOs would agree that actual data and metrics are better than "feelings."
Usually, despite "feelings," the long-term company losses are millions of times greater than the salaries of critical resources that are laid off purely due to seemingly high salaries, which are not usually actually high in comparison to housing and living wages required for most areas, or to boost stock prices. Unfortunately, almost all companies and governments are managed for the short term with no thought of long-term survival. That may be because people have the same mentality. Similarly, most people run their lives for the short term with no thought of the 2.1 million dollars they may need to retire at age 65. Most people need to save $50,000 to $100,000 / year to retire at age 65 but do not negotiate their salary nor update their skills to compensate.
Unfortunately, most don't realize that the average person alive today will live approximately 35 years beyond retirement age, due to advances in medical stem-cell technologies, and have an average of $5000 / month in expenses due to rising living costs, taxes, commercial health care system, poor public and national mass transit, rising energy costs, natural disasters, etc. Most people cannot withstand the commercial health care system in which the monthly prescription payments can be at least $800 for people over age 65. Additionally, there are single pills that can cost over $2500, and other medications, such as Soliris, which costs approximately $409,500 per year. Have you ever multiplied $2000 or $5000 / month in expenses by 35 years? (($5000*12 months)*35 years) = $2,100,000. The $800 / month in medication expenses alone would cost $336,000 over 35 years. Even if one plans to somehow just die at 80 years old, fifteen years of medication would still cost $144,000, which is far more than most have saved for retirement. The monthly $5000 figure is for someone completely healthy. One would need to save over $20,000,000 if one is in poor to moderately good health. Like IT systems, especially data systems, life requires long-term planning and foresight.
However, when companies realize the value of Enterprise Data Modeling and Temporal Data Modeling, data modeling jobs pay over $250,000 to $300,000+ for full-time employees in areas such as Washington, DC, according to glassdoor.com, although the title is usually mislabeled Data Engineer or Data Architect. Most assume the words Engineer or Architect imply more skill, but the Data Modeler has approximately 38 to 56+ times more skills than an Engineer or Architect, including all the skills of a Data Architect. Most managers, and sometimes CIOs, have never researched the skill sets of a Data Modeler and do not truly understand what they are. A data modeling contractor could cost over one million dollars per year and be worth every penny and more.
Now, let's get technical! How does one build a Temporal Data Model? First, one must know how to make standard conceptual, logical, and physical data models with business logic encoded into the model. Depending on one's learning capacity, it takes about a year to master conceptual data modeling, two to five years to master logical data modeling, and about 15 years, including those first five, to reach mastery of physical multi-dimensional temporal data modeling for multiple industries, all major databases, and multiple operating system strategies, including the 38 to 56+ skill sets required to collect business requirements, understand them, and transform them into a data model.
Additionally, one must learn to make ACID-compliant data models for ACID-compliant relational database management systems (RDBMS). ACID stands for Atomicity, Consistency, Isolation, and Durability. For example, all tables must have primary keys, indexing, and foreign keys to constrain data according to business requirements. The structure must match the purpose: a business intelligence model should be in second normal form (2NF) star-schema format, while a data warehouse in the William Inmon strategy can be in third normal form (3NF) or higher, with data marts in 2NF star-schema format.
Atomicity means that changes are accepted as complete changes or not at all. Even if the disk or some other part of the system fails, the system records only complete changes, never partial data or partial changes.
Consistency means the system must have relationships in the data that enforce consistency. For example, foreign keys from customers to orders and from orders to payments ensure that every order is associated with a customer and a payment; an order placed without a customer or payment is rejected automatically, which is also an anti-fraud feature (see the sketch after this paragraph). If you look at your database diagram and see no relationships/foreign keys, no primary keys, and no unique indexes, it is wrong and will definitely produce massive data issues.
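As an illustration, here is a minimal sketch of that customer-order-payment chain in Oracle-style DDL; the table and column names are hypothetical, not taken from any specific model.
CREATE TABLE customer (
  customer_id NUMBER        PRIMARY KEY,
  full_name   VARCHAR2(200) NOT NULL
);

CREATE TABLE orders (
  order_id    NUMBER       PRIMARY KEY,
  customer_id NUMBER       NOT NULL REFERENCES customer (customer_id),
  order_ts    TIMESTAMP(6) DEFAULT SYSTIMESTAMP NOT NULL
);

CREATE TABLE payment (
  payment_id NUMBER       PRIMARY KEY,
  order_id   NUMBER       NOT NULL REFERENCES orders (order_id),
  amount     NUMBER(18,2) NOT NULL
);

-- An order with an unknown customer_id, or a payment with an unknown
-- order_id, is rejected by the foreign keys automatically.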
Isolation requires that more than one person can execute transactions on the same data at the same time without conflicts, with full transaction logging, using optimistic locking as opposed to pessimistic locking. In optimistic locking, end-users can read a record that is being updated, but the changes in progress are visible only to the person making them; once that user commits, the changes become visible to everyone on the system. In pessimistic locking, which can be enabled manually, no one can read the data section being updated until the update is complete and the lock is released. Multiple users should be able to read and update the system at the same time without affecting each other, as the sketch below illustrates.
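A minimal SQL sketch of the difference, assuming a hypothetical account table that carries a version column:
-- Pessimistic locking: the selected row stays locked until COMMIT or
-- ROLLBACK, and other writers must wait.
SELECT balance
  FROM account
 WHERE account_id = 42
   FOR UPDATE;

-- Optimistic locking: no lock is held while the user works; the version
-- predicate detects a conflicting commit. If zero rows are updated, another
-- session changed the row first and the application must re-read and retry.
UPDATE account
   SET balance = 900,
       version = version + 1
 WHERE account_id = 42
   AND version = 7;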
Durability means that the system can recover all committed transactions even if there is a major failure. Most of the durability features are built into the RDBMS, but the data model must also be compliant with data modeling strategies for full compatibility. It is possible to create a bad data model that duplicates, corrupts, and loses data no matter how good the durability of the RDBMS may be.
The accompanying images show a sample ACID-compliant star-schema data model with a time dimension, a version column, and a locking-transaction timestamp column. Remember that each table must have a matching application object capable of multiple multi-threaded instances in memory.
Once one has mastered the 38 to 50+ skills of a data modeler, specifically physical data modeling and multi-dimensional modeling, one must ensure that each table has a timestamp column with at least six digits of fractional-second precision. The locking-transaction timestamp in the time dimension can be used to filter all transactions for a specific time by simply clicking on a time in a BI tool such as Cognos, and the column can also be used to generate charts and graphs. One must also know how to use the timestamp functions of every type of database. The following shows how to select the timestamp from Oracle with different session time zones; a minimal table-creation sketch follows the example. Most databases default to six-digit fractional seconds, and Oracle can be extended to nine digits, nanoseconds, using TIMESTAMP(9) in the table-creation DDL. The .927950 and .366065 values below are the fractions of a second.
ALTER SESSION SET TIME_ZONE = '-5:0';
ALTER SESSION SET NLS_DATE_FORMAT = 'DD-MON-YYYY HH24:MI:SS';
SELECT SESSIONTIMEZONE, CURRENT_TIMESTAMP FROM DUAL;
SESSIONTIMEZONE CURRENT_TIMESTAMP
--------------- ---------------------------------------------------
-05:00 04-MAY-2018 02.17.56.927950 PM -05:00
ALTER SESSION SET TIME_ZONE = '-8:0';
SELECT SESSIONTIMEZONE, CURRENT_TIMESTAMP FROM DUAL;
SESSIONTIMEZONE CURRENT_TIMESTAMP
--------------- ----------------------------------------------------
-08:00 04-MAY-2018 10.18.21.366065 AM -08:00
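For reference, here is a minimal sketch of the table-creation DDL referred to above, using a hypothetical currency_quote table; TIMESTAMP(9) extends the fractional-second precision to nanoseconds.
CREATE TABLE currency_quote (
  quote_id      NUMBER       PRIMARY KEY,
  currency_code VARCHAR2(3)  NOT NULL,
  quote_price   NUMBER(18,6) NOT NULL,
  lock_txn_ts   TIMESTAMP(9) DEFAULT SYSTIMESTAMP NOT NULL  -- nine-digit fractional seconds
);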
To automate the timestamp, define the table so that the column is refreshed on every update. The example below uses MySQL syntax, where the ON UPDATE CURRENT_TIMESTAMP column option acts as a built-in update trigger and the DATETIME data type defaults to NULL unless defined with the NOT NULL attribute; Oracle has no such column option and needs an explicit BEFORE UPDATE trigger, sketched further below. Also note that you may have to add this manually if you use a tool such as Erwin or ER/Studio for data modeling: check whether your version can define a trigger or function as the default value for a timestamp, and if not, simply type it into the DDL before executing it.
CREATE TABLE t1 (
ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
CREATE TABLE t2 (
ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
dt DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
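The t1 and t2 definitions above use MySQL's ON UPDATE CURRENT_TIMESTAMP column option. In Oracle, the equivalent behavior comes from a BEFORE UPDATE trigger; here is a minimal sketch, assuming a hypothetical table t3.
CREATE TABLE t3 (
  id NUMBER       PRIMARY KEY,
  ts TIMESTAMP(6) DEFAULT SYSTIMESTAMP NOT NULL
);

CREATE OR REPLACE TRIGGER t3_touch_ts
  BEFORE UPDATE ON t3
  FOR EACH ROW
BEGIN
  -- refresh the row timestamp automatically on every update
  :NEW.ts := SYSTIMESTAMP;
END;
/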
After you have ensured that each table has a timestamp kept current on update, you must also capture the changes using a change-data-capture (CDC) extended star. Change data capture records which changes were made, where, by whom, when, and why, using a built-in security mechanism that should also be encoded in the data model. I use this in conjunction with group- and role-level security in my models to enable CRUD (create, read, update, delete) and function-level security limited by group and role for each row of data. The CDC can also be designed to back up the changed data to an alternate schema with a version number so that the entire database can roll back individual changes by version number. Obviously, there must be a version-number column with an auto-increment trigger on each table and a trigger that moves data to a backup schema before it is updated (a rough sketch follows below). Next, connect the CDC, the change-data-capture star, to the data dictionary so that code developers can use the alt text from the data dictionary to explain automatically, in the graphical user interface, what was changed and where it is in the database. If your database does not have an internal data dictionary with alt text for interfaces, you may need to augment your staff by hiring a more experienced Enterprise Data Modeler / Data Governance VP, CTO, or CIO.
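As a rough sketch of that version-and-backup mechanism in Oracle, the order_fact table, the cdc backup schema, and the columns shown here are hypothetical, for illustration only.
CREATE OR REPLACE TRIGGER order_fact_cdc
  BEFORE UPDATE ON order_fact
  FOR EACH ROW
BEGIN
  -- copy the pre-update image, with its version number, to the backup schema
  INSERT INTO cdc.order_fact_history
    (order_id, amount, version, update_ts, changed_by)
  VALUES
    (:OLD.order_id, :OLD.amount, :OLD.version, :OLD.update_ts,
     SYS_CONTEXT('USERENV', 'SESSION_USER'));

  -- increment the version and refresh the timestamp on the live row
  :NEW.version   := :OLD.version + 1;
  :NEW.update_ts := SYSTIMESTAMP;
END;
/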
Most importantly, one must ensure that every fact table has at least one time dimension for temporal filtering of the data. The design of the time dimension is heavily affected by the business requirements, and all requirements must be documented before the model can be completed. Data modeling is the transformation of business requirements into data model designs for databases: if your database exists, your business requirements document should also exist. The extended-star CDC allows filtering across the entire database, while the time dimension allows filtering directly on a specific star's fact table with less object code, and the design of the dimension can differ from star to star (a stripped-down sketch follows below). Both the extended-star CDC and the time dimension are compatible with business intelligence tools such as Cognos, Hyperion, etc.
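A stripped-down sketch of a fact table keyed to a time dimension; the names are hypothetical, and a production time dimension would carry many more attributes driven by the business requirements.
CREATE TABLE time_dim (
  time_key         NUMBER    PRIMARY KEY,
  calendar_date    DATE      NOT NULL,
  hour_of_day      NUMBER(2) NOT NULL,
  minute_of_hour   NUMBER(2) NOT NULL,
  second_of_minute NUMBER(2) NOT NULL
);

CREATE TABLE trade_fact (
  trade_id     NUMBER       PRIMARY KEY,
  time_key     NUMBER       NOT NULL REFERENCES time_dim (time_key),
  currency_key NUMBER       NOT NULL,
  amount       NUMBER(18,4) NOT NULL,
  lock_txn_ts  TIMESTAMP(9) DEFAULT SYSTIMESTAMP NOT NULL
);

-- A BI tool such as Cognos can then filter trade_fact by any time_dim
-- attribute with a single click, while lock_txn_ts carries the
-- fractional-second locking-transaction timestamp for each row.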
Below is a sample of Java code using JPA, the Spring Framework, and transaction management to synchronize application transactions with the database. Note that you will need to build an EJB/JPA entity to match each table in the database, then use the get() method of that entity's timestamp column, combined with a try/catch/finally block and if/then/else logic, to compare the transaction-state timestamp to the timestamp in the data. Also pay close attention to your object-oriented design (OOD), specifically mutable versus immutable objects. For example, a String in Java is immutable, meaning it is not supposed to change once it has been set; if you need a loop or a variable whose value changes many times over the life of an application, use a mutable type such as StringBuffer (or StringBuilder), or your applications will be mysteriously slow and leak memory. This can get far more complex, but this base code is enough to get started. Finally, you must match the Java or C++ time functions to the timestamp precision defined in the database, which can be 1 to 9 fractional digits: System.currentTimeMillis() resolves only to milliseconds (three digits), System.nanoTime() measures elapsed time rather than wall-clock time, and java.time.Instant.now() carries nanosecond fields for finer-grained wall-clock comparisons.
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;
import org.aspectj.lang.annotation.Pointcut;
import org.springframework.transaction.TransactionDefinition;

@Aspect
public class AspectTimeStampTransactions {

    // Matches every call to getTransaction() on Spring's transaction manager base class
    @Pointcut("execution(* org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(..))")
    public void newTransaction() {}

    @Before("newTransaction() && args(transactionDefinition,..)")
    public void beforeStartOfTransaction(TransactionDefinition transactionDefinition) {
        // here goes your code to check the database timestamp in a table using an EJB/JPA entity
        // and compare it to the in-memory transaction state before the transaction starts
        long start = System.nanoTime();
        // long start = System.currentTimeMillis(); // use this for millisecond precision
        System.out.println("Transaction " + transactionDefinition.getName() + " started at " + start);
    }
}
Good luck and may your data always have integrity!
Hanabal Khaing
Enterprise Data Modeler, Data Governance SVP, CTO