Why EDC systems generate too many queries
Triggered by a post from Brad Hightower, I would like to explain the background to why some EDC systems generate what may be considered too many queries.
Not all queries are bad or have a great cost impact. If a query remains open for less than 30 seconds, then typically it is doing its job to clean data at source of entry. Resolution is measured in seconds.
Queries become a problem when they stay open, either because they cannot be resolved at the moment they are raised, or because they are created in some kind of batch operation - Brad Hightower's point.
The reasons for too many queries fall into two groups: the product itself, and the implementation of the product. First of all, a few words on what queries and edit checking are. Skip this if you are an edit check/query pro.
1. Background
1.1 What are edit checks and queries?
In the context of an EDC system, a query is a question raised about the correctness of a clinical data value. Queries are raised either manually or through edit checks. Manual queries are typically raised by CRA Monitors or Data Managers, and less commonly by Medical Monitors and Pharmacovigilance. Edit checks are prepared either in programming languages like JavaScript/VBScript (InForm) or configured through a point-and-click interface (Rave/Veeva). Some products offer a hybrid of the two methods. Some software allows queries to be raised to an organisation (e.g. a site) or a user, but more typically to user role(s). Some products allow queries to be forwarded, providing a means of triage or adjudication.
1.2 How do edit checks work?
For those less interested in the tech part, skip to Study Build
An example of a JavaScript edit check on a Date of Birth, confirming that the patient is over 18 at the date of consent (using date-processing functions pre-built within the EDC product), might be:
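// addYear and dtLessThan are date helpers pre-built within the EDC product
const dobPlus18Years = addYear(BRTHDTC, 18);
// true (pass) when the 18th birthday falls before the date of consent
return dtLessThan(dobPlus18Years, ICDTC);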
The example uses CDISC standards for date of birth (BRTHDTC) and date of consent (ICDTC).
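For readers who want to try the check outside an EDC product, here is a minimal sketch in plain JavaScript. The addYear and dtLessThan implementations below are hypothetical stand-ins for the vendor's pre-built date functions, which will differ by product, and the sketch assumes complete ISO 8601 dates (YYYY-MM-DD).

```javascript
// Hypothetical stand-ins for the date helpers an EDC product would supply.
// Assumes complete ISO 8601 dates (YYYY-MM-DD); partial dates are not handled.
function addYear(isoDate, years) {
  const d = new Date(isoDate + "T00:00:00Z");
  d.setUTCFullYear(d.getUTCFullYear() + years);
  return d.toISOString().slice(0, 10);
}

function dtLessThan(a, b) {
  // Complete ISO 8601 date strings sort chronologically when compared as text.
  return a < b;
}

// The edit check itself: true = pass, false = raise a query.
function checkAdultAtConsent(BRTHDTC, ICDTC) {
  const dobPlus18Years = addYear(BRTHDTC, 18);
  return dtLessThan(dobPlus18Years, ICDTC);
}

console.log(checkAdultAtConsent("1990-05-01", "2024-01-15")); // true
console.log(checkAdultAtConsent("2010-05-01", "2024-01-15")); // false
```

Note that with a strict less-than comparison, a subject whose 18th birthday falls on the consent date fails the check. Whether that is intended depends on the protocol.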
Edit checks are typically true/false boolean expressions. Depending on the EDC product, false typically means an edit check failure, and a query with a pre-defined text phrase is sent to the stated user(s).
Because the check is a boolean, if any of the referenced values is changed so that the expression passes, the query closes automatically.
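The open-and-close behaviour can be sketched as follows. This is an illustrative in-memory model, not how any particular product stores queries; evaluateCheck, openQueries and the AGE18 check are all hypothetical names.

```javascript
// Hypothetical in-memory model: one query slot per edit check per subject.
const openQueries = new Map();

// Re-evaluate a check after any referenced value changes.
// check: { id, queryText, expression(data) -> boolean }
function evaluateCheck(check, subjectId, data) {
  const key = subjectId + ":" + check.id;
  const passed = check.expression(data);
  if (!passed && !openQueries.has(key)) {
    // Check failed and no query exists yet: raise one with the pre-defined text.
    openQueries.set(key, { text: check.queryText, status: "open" });
  } else if (passed && openQueries.has(key)) {
    // Data corrected: the query closes automatically.
    openQueries.delete(key);
  }
  return passed;
}

const ageCheck = {
  id: "AGE18",
  queryText: "Subject appears to be under 18 at consent. Please verify.",
  expression: (d) => d.AGE >= 18,
};

evaluateCheck(ageCheck, "S001", { AGE: 17 });
console.log(openQueries.size); // 1
evaluateCheck(ageCheck, "S001", { AGE: 21 });
console.log(openQueries.size); // 0
```

Keying the query on subject plus check id also means that re-running a failing check does not raise a duplicate query.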
Some edit checking tools check data across forms and visits. Some do not.
Most edit checking tools support multi-variate checking, meaning multiple values can be compared in an expression.
Some edit checking tools support wildcard field references, meaning a single definition of an edit check can be re-used across visits or instances of a form.
Some edit check tools check data as it is entered (or at least as soon as the user tabs out of the field). Others only check data when the form is submitted.
1.3 Study Build
When an EDC system is set up, typically a set of definitions is prepared - forms, fields, edit checks, visits, constraints etc. They are tested and then released as a version of 'metadata'. This metadata is then used to 'drive' the product. When a patient is added, the metadata is examined to determine what visits are shown, what forms appear in each visit and the fields, edit checks and constraints that apply to each form. Many EDC firms pride themselves on their speed of build. However, much of the time is actually spent determining requirements and testing that the implementation addresses the real requirements.
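As a rough illustration of metadata 'driving' the product, the sketch below expands a small study definition into the forms a newly added patient would see. The metadata shape, the form and check identifiers, and buildCasebook are all assumptions made for the example; real products use their own, far richer formats.

```javascript
// Hypothetical study metadata: visits list forms, forms list fields and checks.
const metadata = {
  version: "1.0",
  visits: [
    { id: "SCREENING", forms: ["DM", "VS"] },
    { id: "WEEK4", forms: ["VS"] },
  ],
  forms: {
    DM: { fields: ["BRTHDTC", "SEX"], editChecks: ["AGE18"] },
    VS: { fields: ["SYSBP", "DIABP"], editChecks: ["BP01"] },
  },
};

// When a patient is added, expand the metadata into that patient's casebook.
function buildCasebook(meta, subjectId) {
  return meta.visits.flatMap((visit) =>
    visit.forms.map((formId) => ({
      subjectId,
      visit: visit.id,
      form: formId,
      fields: meta.forms[formId].fields,
      editChecks: meta.forms[formId].editChecks,
    }))
  );
}

const casebook = buildCasebook(metadata, "S001");
console.log(casebook.map((f) => f.visit + "/" + f.form));
// [ 'SCREENING/DM', 'SCREENING/VS', 'WEEK4/VS' ]
```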
2. Product
2.1 Edit check engine limitations
Some EDC tools simply produce too many queries because they struggle to differentiate between incorrect data and missing data. The best approach is for an EDC product to handle missing data differently from incorrect data. Edit checks should not handle missing data. Other 'incomplete' indicators supported by batch handling do this better.
Related to the above, good edit check implementations default to firing edit checks when data is recorded rather than on empty data. Checkbox overrides cater for situations where blanks should be assessed.
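A minimal sketch of that default, assuming a hypothetical runCheck wrapper and a fireOnBlank flag standing in for the checkbox override. The field names are illustrative only.

```javascript
// Hypothetical wrapper: skip the expression when any referenced field is blank,
// unless the check is flagged to assess blanks (the "checkbox override").
function isBlank(v) {
  return v === null || v === undefined || v === "";
}

function runCheck(check, data) {
  const anyBlank = check.fields.some((f) => isBlank(data[f]));
  if (anyBlank && !check.fireOnBlank) {
    // Missing data is left to the separate 'incomplete' handling.
    return { fired: false, reason: "skipped: missing data" };
  }
  const passed = check.expression(data);
  return { fired: !passed, reason: passed ? "passed" : "failed" };
}

const bpCheck = {
  fields: ["SYSBP", "DIABP"],
  fireOnBlank: false,
  expression: (d) => Number(d.SYSBP) > Number(d.DIABP),
};

console.log(runCheck(bpCheck, { SYSBP: "120", DIABP: "" }).reason);    // skipped: missing data
console.log(runCheck(bpCheck, { SYSBP: "80", DIABP: "120" }).reason);  // failed
```

With this split, a half-entered form raises no query, while genuinely inconsistent values still do.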
2.2 Protocol Updates / Changes / Migration Management
This single capability tends to be the differentiator between good and bad EDC systems: how do they handle changes, or rather, how do they handle the deployment of new versions of an eCRF study definition against pre-existing data? Ideally, it is possible to change the eCRF and apply these changes to new and, potentially, existing patients. Some systems handle the deployment of different versions of an eCRF per country, site and patient, though ideally the functionality within the EDC solution allows changes within an eCRF to be controlled for all sites/patients.
The poorest systems re-fire all edit checks - raising many new queries. The better systems perform difference analysis and only re-fire edit checks when a change is detected. This can be complex and even impossible if the system is not architected correctly.
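The idea of difference analysis can be sketched as below. This is a deliberately simplified assumption about how metadata versions might be compared (by check id and expression text); a real system would also need to account for changes to the fields a check references. checksToRefire and the version objects are hypothetical.

```javascript
// Hypothetical metadata shape: each version maps a check id to its definition.
// Only checks that are new or whose definition changed are returned for re-firing.
function checksToRefire(oldVersion, newVersion) {
  const changed = [];
  for (const [id, def] of Object.entries(newVersion)) {
    const previous = oldVersion[id];
    if (previous === undefined || previous.expression !== def.expression) {
      changed.push(id);
    }
  }
  return changed;
}

const v1 = {
  AGE18: { expression: "AGE >= 18" },
  BP01: { expression: "SYSBP > DIABP" },
};
const v2 = {
  AGE18: { expression: "AGE >= 18" },          // unchanged: not re-fired
  BP01: { expression: "SYSBP >= DIABP + 10" }, // changed: re-fired
  WT01: { expression: "WEIGHT > 0" },          // new: fired for the first time
};

console.log(checksToRefire(v1, v2)); // [ 'BP01', 'WT01' ]
```

A system that skips this comparison and simply re-runs everything is the one that floods sites with new queries after an amendment.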
2.3 Locks and Signatures
Strictly speaking this is not specific to queries and edit checking, but it is a problem area when deploying changes. The system needs to be clever about when forms and fields may become unlocked or unsigned as a result of a change. Inappropriate re-signing of patients as a result of a minor change can be very frustrating for PIs.
2.4 Delayed data submission
One method of reducing the number of unnecessary queries is to allow data to be 'saved' but not 'submitted'. This is a feature of Veeva EDC and some older EDC solutions. There is a case for this. Rather than firing off queries as a result of partially completed data, the data can be saved without submission. I see the potential value, and Veeva's UX handling of this is pretty good.
However, I think this complicates the data flow. If a product has missing data handling separated from edit checking, then in my experience this artificial delay of Saved but not Submitted is not required. I prefer full multi-variate cross form edit checking that runs as soon as data is entered.
3. Implementation
3.1 Insufficient dynamics/constraints
Some EDC products support what are called either 'Constraints' or 'Dynamics'. In some products these are boolean expressions that run in a similar fashion to edit checks, except that rather than raising a query they hide/show or disable/enable a field, section, form, visit or group of visits.
The better implementations tend to reduce the required number of edit checks by around 50%. The edit checks are not required because the user will never reach the fields that might otherwise fire an edit check and raise a query.
A simple pseudo example:
constrain <hide> <PregTest> if <gender != "F">
The Pregnancy Test question would be hidden if the subject is not Female.
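The same pseudo example could be expressed in JavaScript roughly as follows. The constraint object shape and the visibleFields function are hypothetical, intended only to show how a constraint removes a field before any edit check on it could fire.

```javascript
// Hypothetical constraint definition mirroring the pseudo example above.
const constraints = [
  { action: "hide", target: "PregTest", when: (d) => d.gender !== "F" },
];

// Returns the fields that should be visible for the given subject data.
function visibleFields(allFields, data) {
  const hidden = new Set(
    constraints
      .filter((c) => c.action === "hide" && c.when(data))
      .map((c) => c.target)
  );
  return allFields.filter((f) => !hidden.has(f));
}

const formFields = ["gender", "PregTest", "Weight"];
console.log(visibleFields(formFields, { gender: "M" })); // [ 'gender', 'Weight' ]
console.log(visibleFields(formFields, { gender: "F" })); // [ 'gender', 'PregTest', 'Weight' ]
```

Because the hidden field is never presented, no 'Pregnancy Test missing' style check is needed for male subjects.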
On a side note here, I have seen difficulties in the design / specification of dynamics when working with an offshore resourcing model. Edit checks are more straightforward, but it can be difficult for a sponsor or requirement owner to describe dynamics adequately. An interactive or conference room design/build process generally works better for this.
3.2 Too Many Edit Checks
I have heard countless statements like '70% of edit checks never raise a query'. My response tends to be: great - they weren't needed... but if you took away these 70%, would the data be as good...? It might only take 2-3 minutes to interactively create an edit check, so maybe their cost is negligible. If the EDC system slows down because of them, it is a poor EDC system.
That said, sometimes we do have too many edit checks that raise too many queries. In my humble experience, this comes down to poor eCRF design, often where the effective use of dynamics/constraints would have avoided the situation.
Last point on this - risk-based / end-point-driven edit checking should be considered. A few systems, some supported by AI, are emerging that can tell us where edit checking is most needed. Eventually they will apply this to automate edit checking.
3.3 Incomplete study deployments
One of the most common reasons for the generation of a large number of queries is a phased EDC implementation. Sometimes there is insufficient time between the protocol being finalized and the date the EDC system needs to be ready for First Patient In. A workaround is to deploy the EDC configuration (also called the eCRF) with no edit checks, or only a limited number. Some time later, once the eCRF is complete, it is deployed, causing the new edit checks to fire against the existing data.
3.4 Incorrect change handling
Even if an EDC system is good at handling protocol updates and eCRF amendments, sometimes the implementation teams carry out the migration in the wrong way. This may be down to inexperience, poor training or time pressures. All changes should be tested with new patient data and pre-existing patients for all change scenarios. Unexpected query generation should be investigated and resolved before the change goes live.
4. The future of Query management and edit checking
I wholly believe that the way we check data will be massively impacted through the application of AI. Initially it will be 3rd party systems, but eventually these will be embedded into general data capture platforms. In the meantime, applying the right tools using the right methods will achieve good data and happy sites.
A great take on balancing data cleaning with reducing unnecessary queries. Smart eCRF design, dynamic constraints, and thorough testing during amendments are really important in improving workflows and site satisfaction. Also, AI-driven solutions will only enhance this further!
Thanks for this post. It is always good to hear from the end-user side. While designing EDC studies, we have seen sponsors' reluctance to dynamically populate a question inside a CRF. I don't know how true it is for a site person, but from what we heard, we perceived that something popping up based on a response is not preferred by site users. This is one of the major reasons for creating many edits within a CRF. From the EDC design aspect, we make sure that no more than one query fires on a question for a single scenario. Also, reconciling data within the EDC mandates writing cross-CRF queries, which also accounts for multiple queries. When we write and program a query, we think that it will aid clean data collection and help sites capture accurate, reliable data.