
Xceptor Developer Interview | Q & A | Volume - 3

Shivaram Nomula

RPA/IPA Developer-Consultant |Freelancer|Xceptor|BluePrism|Alteryx|Duco|Power Automate|Tableau|Knime|AutomationAnywhere|ETL|I build workflow processes using business process automation tools to automate repetitive tasks

Published: October 27, 2024

#021

In-Memory and In-Database Processing

Xceptor provides two methods of executing Input Formats:

In-memory processing executes Input Formats in memory on the application server.
In-database processing executes Input Formats within the database.

Although in-memory and in-database processing are broadly the same, they will produce different results in certain cases.

In-Memory vs. In-Database Processing:

Key Differences:

- Text Data Handling:

  - Case Sensitivity: In-database mode typically uses case-insensitive collation, treating uppercase and lowercase values as equal. In-memory mode is case-sensitive.

  - Trailing Spaces: In-database mode ignores trailing spaces, while in-memory mode considers them part of the value.

- Sorting Data: In-memory processing orders output by the first occurrence of each group value. With in-database processing, SQL Server may return the data in ascending order, but this is not guaranteed.

- Data Types:

  - Numeric Data: In-database uses SQL Server’s decimal(28,10), while in-memory uses .NET’s decimal, which supports more significant digits.

  - Date/Time Data: SQL Server’s datetime type and .NET’s DateTime type have different storage limits (datetime starts at 1753-01-01 with roughly 3.33 ms accuracy; DateTime starts at 0001-01-01 with 100 ns ticks).

  - Data Type Conversions: Automatic type conversions between non-string types may differ in formatting between in-database and in-memory modes.

- Enrichments and Functions:

  - NOW Calculation: In-database, this value is consistent across rows; in-memory, it may vary slightly for each row.

  - Function Limitations: Some functions are exclusive to either in-memory or in-database modes.
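The text and numeric differences above can be illustrated with a small Python sketch. This is not Xceptor code: the function names are hypothetical, the in-database comparison assumes a case-insensitive collation, and the rounding mode used for decimal(28,10) is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP


def equal_in_memory(a: str, b: str) -> bool:
    """In-memory style comparison: case and trailing spaces both matter."""
    return a == b


def equal_in_database(a: str, b: str) -> bool:
    """In-database style comparison under an assumed case-insensitive
    collation: case is ignored and trailing spaces are not significant."""
    return a.rstrip(" ").casefold() == b.rstrip(" ").casefold()


def to_decimal_28_10(value: Decimal) -> Decimal:
    """Keep 10 decimal places, as a decimal(28,10) column would.
    The rounding mode is an assumption made for this sketch."""
    return value.quantize(Decimal("0.0000000001"), rounding=ROUND_HALF_UP)


print(equal_in_memory("ABC ", "abc"))    # False
print(equal_in_database("ABC ", "abc"))  # True
print(to_decimal_28_10(Decimal("1.123456789012345")))  # 1.1234567890
```

A grouping or lookup keyed on "ABC " and "abc" would therefore produce two groups in memory but one group in the database.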

#022

The <dataLoad> element is used to specify how input files are loaded into the database when using a data format.

Insert is the standard database load technique.

However, 'auto' is the suggested default method for the <dataLoad> element: it performs an insert when there are fewer than 100 rows and uses bulk copy otherwise.
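The 'auto' rule can be expressed as a simple threshold check. The sketch below is illustrative only: the function name and the "bulkcopy" label are hypothetical, and only the 100-row threshold comes from the description above.

```python
AUTO_BULK_THRESHOLD = 100  # row count at which 'auto' switches to bulk copy


def resolve_load_method(configured: str, row_count: int) -> str:
    """Return the load technique that would be used for one input file.

    An explicitly configured method is returned unchanged; 'auto' picks
    insert for small inputs and bulk copy (hypothetical label) otherwise.
    """
    if configured != "auto":
        return configured
    return "insert" if row_count < AUTO_BULK_THRESHOLD else "bulkcopy"


print(resolve_load_method("auto", 99))   # insert
print(resolve_load_method("auto", 100))  # bulkcopy
```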

#023

Xceptor Managed Translation Tables

When the Translation Table's data source is managed by Xceptor, it means that the data for the translation table is stored in a generic database structure, specifically the 'XCTranslationTableEntry' table. This structure is flexible and can accommodate all the data for Translation Tables managed by Xceptor. While this makes configuration simple, the trade-off is that the underlying database structure is not optimized for performance based on the specific translation table structure.

In their analysis of the most expensive database queries in a production environment, Xceptor frequently found that the XCTranslationTableEntry table was responsible for many of the worst-performing queries.

Large or poorly configured Translation Tables are a commonly found source of performance bottlenecks.

A significant performance benefit is available by replacing the Xceptor-managed Translation Table with a Data Set. Data Sets have the advantage of storing their data in an associated database table, allowing performance optimization through the addition of appropriate indexes to the database table.
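The benefit of an index on a dedicated table can be seen in a query plan. The sketch below uses SQLite from the Python standard library as a stand-in for the real database, and the table, column, and index names are hypothetical; it only shows the general effect of indexing a lookup column, not how Xceptor creates Data Set tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical dedicated lookup table, standing in for a Data Set table.
conn.execute("CREATE TABLE country_map (source_code TEXT, target_name TEXT)")
conn.executemany(
    "INSERT INTO country_map VALUES (?, ?)",
    [(f"C{i:05d}", f"Country {i}") for i in range(10_000)],
)


def query_plan(connection: sqlite3.Connection) -> str:
    """Return SQLite's plan description for a lookup by source_code."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT target_name FROM country_map WHERE source_code = ?",
        ("C00042",),
    ).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


plan_before = query_plan(conn)  # full table scan
conn.execute("CREATE INDEX ix_country_map_source ON country_map (source_code)")
plan_after = query_plan(conn)   # search using the new index

print(plan_before)
print(plan_after)
```

Before the index exists the plan reports a scan of the whole table; afterwards it reports a search that uses ix_country_map_source.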

Large Data Sets

Data Sets are the component that Xceptor uses to store data in database tables. They take the structure of an Xceptor Data Format and map it onto an underlying database table. This allows for optimization, enabling efficient data storage and retrieval. The types of database columns can match the underlying Data Format, and indexes can be created on the table to enhance search efficiency.

OCR Analysis

The OCR integration allows Xceptor to extract the text from the images and then submit the text for processing as per normal documents.

Install options with Xceptor:

Web OCR - Recommended for use cases with heavy OCR use, as it allows the OCR component to run on its own dedicated server.
Direct Integration - This setup is recommended for simple Xceptor environments with only one web application server.

OCR analysis is highly CPU-intensive, taking approximately 5 to 10 seconds per page per CPU core. For this reason, it is recommended to isolate OCR processing on dedicated server hardware in instances where a large number of OCR documents are received.
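A rough capacity estimate follows directly from the 5 to 10 seconds per page per core figure. The sketch below assumes pages spread evenly across cores with no other overhead, which is an idealization; the page and core counts are made-up inputs.

```python
def ocr_seconds(pages: int, cores: int, seconds_per_page: float) -> float:
    """Estimated wall-clock seconds, assuming perfect spread across cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return pages * seconds_per_page / cores


# Hypothetical daily load: 1,200 pages on a 4-core OCR server.
best_case = ocr_seconds(1200, 4, 5.0)
worst_case = ocr_seconds(1200, 4, 10.0)
print(best_case / 60, worst_case / 60)  # 25.0 50.0 (minutes)
```

Under these assumptions the same load on a shared single-core budget would take between 100 and 200 minutes, which is why a dedicated server is suggested for heavy OCR volumes.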

Mapping

Data Mapping is the process by which you connect the captured fields from the Input Format to the correct target fields in the Internal Format.

This allows you to move all appropriate data into a single Data Format for ease of manipulation and output.

#024

Xceptor has three key timeout settings that need to be configured to ensure that timeout failures occur in the correct order, allowing processes to run effectively and as expected.

The timeouts should cascade as follows:

Message Hub serviceTimeout > Web Application executionTimeout > Database commandTimeout.

Database: commandTimeout - This timeout indicates how long, in seconds, any given SQL command is allowed to run before it is terminated.
Web Application: executionTimeout - This timeout specifies the maximum number of seconds that a request is allowed to execute before being terminated.
Message Hub: serviceTimeout - This timeout indicates how long the Message Hub will wait for a message to be processed.
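The cascade can be checked mechanically. The sketch below is a hypothetical helper, not an Xceptor utility; it assumes all three values are expressed in seconds, and the example numbers are invented.

```python
def timeouts_cascade(service_timeout: int,
                     execution_timeout: int,
                     command_timeout: int) -> bool:
    """True when serviceTimeout > executionTimeout > commandTimeout."""
    return service_timeout > execution_timeout > command_timeout


# Invented example values, all assumed to be in seconds.
print(timeouts_cascade(3600, 1800, 900))  # True
print(timeouts_cascade(1800, 3600, 900))  # False
```

With a correct cascade, the innermost operation (the SQL command) times out first, so the outer layers can report a meaningful failure instead of being cut off themselves.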

Assuming the key timeouts have been configured in the correct order, any of the three could be the reason a running process is terminated; which one applies depends on the context of the process:

commandTimeout - If the process involves SQL commands, the answer is commandTimeout.

executionTimeout - If it involves a web application request, the answer is executionTimeout.

serviceTimeout - If it involves message processing, the answer is serviceTimeout.

However, the most direct answer for terminating a running process in the Xceptor application is executionTimeout, as it specifically pertains to processes running within the web application.

#025

Xceptor REST Connectivity offers no-code integration of RESTful API services between Xceptor processes and external systems. It comprises two main features:

1. The REST Integration Framework: This allows Xceptor processes to call external REST APIs provided by other systems.

2. The REST Gateway: This enables other systems to call Xceptor's own REST API through the Xceptor gateway.

The REST Integration Framework provides no-code integration of RESTful API services exposed by external applications into Xceptor processes.

The Xceptor workflow is built using Process Orchestration rather than Message Processors.

#026

For the translation tables managed by Xceptor as the data source, Xceptor stores their data in a generic database table, namely 'XCTranslationTableEntry.' This table has a default of 100 columns, so Xceptor does not allow the creation of a translation table with more than 100 fields.

#027

Dashboards - User Configurable Dashboards, commonly abbreviated as Dashboards, allow users to create and view summary screens that contain the data that matters most to them, typically presented as charts and tables.

For Xceptor versions earlier than 4.13.10, users were required to update certain back-end configurations to create dashboards. However, starting with version 4.13.10, Xceptor introduced a UI component that allows users to implement dashboards without the need for back-end coding.

#028

First, create an input channel for the shared drive folder path, ensuring that the folder has subfolders named ERRORS and PROCESSED. With this as the input channel, use the "Deliver Input File" action to generate the output files.

Deliver Input File

Delivers the original input file. Additionally, rules and logic can be used to create a dynamic filename or a dynamic delivery location.

#029

Data Change Events are used to trigger an event on a Data Item. A Data Change Event can occur when a user alters data in the Data Item through manual updates on the front end.

In Xceptor, a Data Change Event can be triggered in the following ways:

Via Back-end: Events can be initiated programmatically through back-end configurations.
Frontend Manual Update: Users can trigger events by manually updating data through the user interface.
Auto Update by Another Process: Events can also be triggered automatically when another process updates the Data Item via back-end coding configuration.

Based on this, the correct answer would be "All of the Above," since all these methods can potentially trigger a Data Change Event in Xceptor. However, since we don't have the option for "All of the Above," it's worth noting that "Via Back-end" and "Auto Update by Another Process" involve extending the default functions provided by Xceptor, as they require back-end changes.

If you had to choose one, "Frontend Manual Update" is the most direct way to trigger a Data Change Event.

If custom Data Change Event functionality is required, a new implementation of IDataChangeEventRunner should be created.

#030

Download Sites

Download Sites are used to capture website information or access files from a local file system or an FTP server.

Adding a Download Site prompts Xceptor to download data from a specified location at specified times. This data can then be used in a Message Processor/Process via a Web Channel.

Download Sites can access websites (http://, https://), FTP servers (ftp://, ftps://, sftp://), or the local file system (file://) to download files or perform web scraping.
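The mapping from URL scheme to source type can be sketched as a lookup. This is an illustration only: the function name and the category labels are hypothetical, and http is assumed to be supported alongside https.

```python
from urllib.parse import urlparse

# Scheme-to-source mapping based on the schemes listed above.
SOURCE_KINDS = {
    "http": "website",
    "https": "website",
    "ftp": "FTP server",
    "ftps": "FTP server",
    "sftp": "FTP server",
    "file": "local file system",
}


def classify_location(url: str) -> str:
    """Return the kind of source a Download Site URL points at."""
    scheme = urlparse(url).scheme.lower()
    if scheme not in SOURCE_KINDS:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    return SOURCE_KINDS[scheme]


print(classify_location("sftp://files.example.com/in/report.csv"))  # FTP server
print(classify_location("file:///data/in/report.csv"))  # local file system
```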

Download Sites can be triggered to retrieve input from a specified location on a scheduled basis.

Each Download Site is assigned a run time that recurs at regular intervals, along with a set number of retries. The retries allow the system to re-run the data capture if the initial attempt is unsuccessful.

#031

Message Hub handles connections to the input channels, picking up messages and passing them to the Xceptor application. It waits for a response from the Xceptor application regarding the result of the process before moving the message to either Errors or Processed.

#032

Web Application: executionTimeout - This timeout specifies the maximum number of seconds that a request is allowed to execute before being terminated.

#033

The sheet name of an output format can be set to a field by surrounding the field reference [FieldName] with curly braces, like this: {[FieldName]}.

Xceptor will then take the first row result and use it as the sheet name. In the case of a filter, the first result from the filter will populate the sheet name.

#034

No, an Excel file with the required highlight formatting must be used as a template to emphasize cells in the output.

#035

No, Xceptor won't have an issue if the input has multiple columns with the same name; it updates the captured field names with a numerical suffix based on their sequence.

For example, if there are multiple columns with "Field" as their field name, Xceptor captures them and updates their field names to Field1, Field2, and so on. See the attached screenshots below for your reference.
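The renaming behaviour described above can be mimicked in a few lines. This is a sketch of the described outcome, not Xceptor's implementation; it assumes that every occurrence of a repeated name receives a suffix and that unique names are left unchanged.

```python
from collections import Counter


def rename_duplicates(names: list[str]) -> list[str]:
    """Append 1, 2, 3... to each occurrence of a repeated column name."""
    totals = Counter(names)  # how often each name appears overall
    seen = Counter()         # how many occurrences have been emitted so far
    renamed = []
    for name in names:
        if totals[name] > 1:
            seen[name] += 1
            renamed.append(f"{name}{seen[name]}")
        else:
            renamed.append(name)
    return renamed


print(rename_duplicates(["Id", "Field", "Field", "Field"]))
# ['Id', 'Field1', 'Field2', 'Field3']
```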

#036

Validation is an Enrichment type that uses rule-based checks on Data Items to ensure they conform to specific data requirements.

Validation Enrichment is used to report a customized error notification in the Input Activity based on business logic.

When to Use Validation Enrichment:

To reject an input if it meets or does not meet certain conditions.
To mark Data Items with actions required to help resolve errors.

The Error Message field holds the error notification that is displayed when data fails validation. The Selector Type determines whether the data identified by the selector is treated as valid or invalid.

#037

Download Sites use the credentials provided in the Username and Password fields to log into sites that require network-level security, where a dialog box prompts for credentials before any content is displayed.

#038

Two optional tags, activeFrom and activeTo, can be defined when configuring an input channel so that it processes inputs only during a specific time period.

activeFrom: Defines the time of day from which the channel is active.
activeTo: Defines the time of day until which the channel is active.
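The effect of the two tags can be sketched as a time-of-day window check. The function below is hypothetical; treating the boundaries as inclusive, treating a missing tag as "no limit", and allowing a window that wraps past midnight are all assumptions of this sketch rather than documented Xceptor behaviour.

```python
from datetime import time
from typing import Optional


def channel_is_active(now: time,
                      active_from: Optional[time] = None,
                      active_to: Optional[time] = None) -> bool:
    """True if `now` falls inside the configured active window."""
    start = time.min if active_from is None else active_from
    end = time.max if active_to is None else active_to
    if start <= end:
        return start <= now <= end
    # Window wraps past midnight, e.g. 22:00 to 06:00 (assumed behaviour).
    return now >= start or now <= end


print(channel_is_active(time(9, 30), time(8, 0), time(18, 0)))  # True
print(channel_is_active(time(19, 0), time(8, 0), time(18, 0)))  # False
```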

#039

The execution of processing rules in a scenario where multiple message processors have received multiple files is influenced by the priority assigned to each rule. The key points to consider are:

1. Priority Levels: Each processing rule within a message processor is assigned a priority value. The higher the value, the higher the priority of that rule. For instance, a rule with priority 10 will execute before a rule with priority 1.

2. Message Processor Hierarchy: When an input arrives, the message processors are evaluated based on their highest priority rule. The highest priority rule within a Message Processor determines the priority of that Message Processor, along with all its associated processing rules.

3. Execution Order:

If multiple message processors receive the same input, the one with the highest priority will process its rules first.
After the rules of that message processor are executed, the next highest priority message processor will run its rules, and so on.

4. Multiple Files: If multiple files are being processed by different message processors, the execution order still depends on the highest priority rules across those processors.

For example, if Message Processor A has a priority 10 rule and Message Processor B has a priority 5 rule, then Processor A will execute its rules first for any file it handles, regardless of other processors.

5. Rule Selection Options: Additionally, if a message processor is configured to run the "highest applicable" rule, it will select the rule with the highest priority that matches the input criteria.
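The ordering described in the points above can be sketched as two sorts. The data structure is hypothetical: each Message Processor is represented only by the priorities of its rules, and each processor is assumed to have at least one rule.

```python
def processor_order(processors: dict[str, list[int]]) -> list[str]:
    """Order Message Processors by their highest-priority rule, highest first."""
    return sorted(processors, key=lambda name: max(processors[name]), reverse=True)


def rule_order(priorities: list[int]) -> list[int]:
    """Order the rules of one Message Processor, highest priority first."""
    return sorted(priorities, reverse=True)


# Processor A owns a priority 10 rule, processor B tops out at 5.
processors = {"B": [5, 2], "A": [10, 1]}
print(processor_order(processors))  # ['A', 'B']
print(rule_order(processors["A"]))  # [10, 1]
```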

Given these points, the answer to whether priority in processing rules affects the process run when multiple message processors have received multiple files is:

Depends on Priorities Across

This means that the specific priorities assigned will determine the order of execution, which may vary based on how those priorities are set across the different message processors handling the inputs.

#040

The METADATA() function retrieves values for metadata stored within fields. This function is only available for Document Input Formats and can be used when running the format in memory.

To get the last page number of a PDF document:

MAXVALUE (GETNUMERICMETADATA([FieldName],"Page"))+1

Page: The page number from which the text in a field is extracted. Page numbers start from 0, so for rows extracted from page 4, the METADATA() function returns 3.
MAXVALUE: Returns the maximum value of a field across all rows. Return type: string.
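The arithmetic behind the formula is simply "largest zero-based page index plus one". The Python sketch below mirrors that logic on a plain list of page values; it is not Xceptor syntax, and the sample values are invented.

```python
def last_page_number(page_metadata: list[int]) -> int:
    """Convert zero-based page indexes into the last one-based page number."""
    return max(page_metadata) + 1


# Page metadata for five extracted rows (zero-based), in arbitrary order.
print(last_page_number([0, 0, 1, 3, 2]))  # 4
```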

Click on the links below to check out the previous installments of the "Xceptor Developer Interview | Q & A" series:

Xceptor Developer Interview | Q & A | Part - 1

Xceptor Developer Interview | Q & A | Part - 2
