SAS Visual Text Analytics
Text Mining (Text Analytics)
Text mining is the process of exploring and analyzing large amounts of unstructured text data aided by software that can identify concepts, patterns, topics, keywords, and other attributes in the data. It's also known as text analytics, although some people distinguish the two terms; in that view, text analytics is an application enabled by the use of text mining techniques to sort through data sets. Text mining has become more practical for data scientists and other users due to the development of big data platforms and deep learning algorithms that can analyze massive sets of unstructured data. Mining and analyzing text help organizations find potentially valuable business insights in corporate documents, customer emails, call center logs, verbatim survey comments, social network posts, medical records, and other sources of text-based data. Increasingly, text mining capabilities are also being incorporated into AI chatbots and virtual agents that companies deploy to provide automated responses to customers as part of their marketing, sales, and customer service operations.
SAS Visual Text Analytics
SAS Visual Text Analytics in SAS Viya is a web-based text analytics application that uses context to provide a comprehensive solution to the challenge of identifying and categorizing key textual data. SAS Visual Text Analytics provides the following analysis nodes for building and automating models. The input data can be in any format, structured or unstructured, such as an unprocessed news feed:
- Concepts:- Enables you to extract predefined concepts or create additional custom concepts that you can discover in a document or set of documents
- Text Parsing:- Finds all the terms that are in your document collection
- Sentiment:- Determines whether documents express positive, neutral, or negative attitudes
- Topics:- Groups similar documents in a collection into related themes (for example, motorcycle accidents, computer graphics, or weather patterns)
- Categories:- Labels documents based on their content (for example, “motorcycle + accident + dead” or “weather + heavy + rain”)
You can then customize your models to realize the value of your text-based data.
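To make the Categories idea concrete, here is a minimal Python sketch of rule-based labeling, assuming a category matches when every term in its rule appears in the document. This is not how SAS Visual Text Analytics implements categories; the rule names and the `categorize` helper are hypothetical and only illustrate the "term + term + term" examples above.

```python
import re

# Hypothetical category rules modeled on the examples in the list above:
# a document gets the label only when ALL terms of the rule are present.
CATEGORY_RULES = {
    "motorcycle accident": {"motorcycle", "accident", "dead"},
    "heavy rain": {"weather", "heavy", "rain"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def categorize(text, rules=CATEGORY_RULES):
    """Return the sorted names of all categories whose terms all occur in text."""
    tokens = tokenize(text)
    return sorted(name for name, terms in rules.items() if terms <= tokens)


if __name__ == "__main__":
    print(categorize("Two dead after a motorcycle accident on the highway"))
    print(categorize("The weather forecast warns of heavy rain tonight"))
```

A real category model would also handle word forms (for example "accidents") and term proximity, which this sketch ignores.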
SAS Visual Text Analytics in SAS Viya combines the visual programming flow of SAS Text Miner with the rules-based methods of categorization and concept extraction in SAS Contextual Analysis. These capabilities, along with document-level scoring for each component, are combined in a single user interface. Using SAS Visual Text Analytics in SAS Viya, you can identify key textual data in your document collections, build concept and categorization models, and remove meaningless textual data. By default, words that provide little or no informational value (stop words) are excluded from the topic analysis. A default stop list is included and automatically applied for several languages. Examples of these words in English include the articles a and an, and conjunctions such as and, or, and but. Other terms that are specific to your document collection but provide little or no value due to their low frequency are also identified and excluded.
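The two kinds of exclusion described above (stop words and low-frequency terms) can be sketched in plain Python. This is only an illustration of the idea, not the SAS implementation: the stop list contains just the example words named in the text, and the `min_doc_freq` threshold is an assumed parameter.

```python
import re
from collections import Counter

# Only the example stop words mentioned in the text; a real stop list is much longer.
STOP_WORDS = {"a", "an", "and", "or", "but"}


def filter_terms(documents, stop_words=STOP_WORDS, min_doc_freq=2):
    """Drop stop words and terms that appear in fewer than min_doc_freq documents.

    min_doc_freq is an assumed cutoff chosen for illustration.
    """
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in documents]
    # Document frequency: in how many documents each term occurs at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        [t for t in tokens if t not in stop_words and doc_freq[t] >= min_doc_freq]
        for tokens in tokenized
    ]


if __name__ == "__main__":
    docs = ["A liver tumor and a cyst", "The liver tumor or an abscess"]
    print(filter_terms(docs))
```

With the two sample documents, only terms shared by both documents and not on the stop list survive the filter.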
EXAMPLE
The best example I can show you is finding out, from news feeds or from a survey of a pathology lab, “who is suffering from a tumor in the liver.”
SAMPLE TEXT: “The liver parenchyma is compressed by a cream-colored 3.5cm tumor. This liver tumor is solitary with well-defined margins.”
CONCEPT DEFINITION RULES
CONCEPT_RULE:(SENT, "_c{liver}", "tumor")
C_CONCEPT:_c{liver} tumor
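As a rough plain-Python approximation of what the two rules look for (assuming the SENT rule requires "liver" and "tumor" in the same sentence, and the C_CONCEPT rule requires "liver" immediately followed by "tumor"), the sketch below applies both checks to the sample text. It does not run the SAS rule engine; the function names and the naive sentence splitter are illustrative assumptions.

```python
import re

SAMPLE_TEXT = (
    "The liver parenchyma is compressed by a cream-colored 3.5cm tumor. "
    "This liver tumor is solitary with well-defined margins."
)


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def same_sentence_matches(text):
    """Approximates the SENT rule: both words occur in one sentence."""
    hits = []
    for sentence in split_sentences(text):
        words = re.findall(r"[a-z]+", sentence.lower())
        if "liver" in words and "tumor" in words:
            hits.append(sentence)
    return hits


def adjacent_matches(text):
    """Approximates the C_CONCEPT rule: 'liver' directly followed by 'tumor'."""
    return [
        s for s in split_sentences(text)
        if re.search(r"\bliver\s+tumor\b", s, flags=re.IGNORECASE)
    ]


if __name__ == "__main__":
    print(len(same_sentence_matches(SAMPLE_TEXT)))
    print(len(adjacent_matches(SAMPLE_TEXT)))
```

The looser same-sentence check catches the first sentence, where "liver" and "tumor" are several words apart, while the adjacency check only catches the second.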
RESULTS
The power of analytics
Suppose I want to create a report for a survey of a pathology lab, based on data derived from entity extraction of surgical pathology reports in the SAS Visual Text Analytics output table. The report shows how many patients were tested in a year and how many of their reports mention tumors at different body sites. Data are filtered by selecting one of the bars in the horizontal bar graph. The example illustrates data from malignant tumors.
This example shows how SAS Visual Analytics can be used to explore relationships between tumor diagnoses and affected tissues. In this figure, tumor behavior is color-coded. Red nodes are associated with a diagnosis of adenocarcinoma, osteosarcoma, or fibrosarcoma (malignant behavior). Blue nodes are associated with a diagnosis of mast cell tumor (uncertain behavior). Yellow nodes are associated with both types of behavior. Arrow thickness represents relative tumor size.
VISUALIZATIONS
SAS Visual Analytics provides an excellent reporting platform for exploring and visualizing data extracted from medical reports. The capabilities of the SAS Visual Analytics software are beyond the scope of this article, but below we have included an example of a report that might be used to understand data derived using entity extraction. The figure below is a report driven by the bar graph, which shows the frequency of tumor diagnoses segmented by behavior. Selecting one of the bars filters the remaining graphics to reveal data about tumors with benign, malignant, or uncertain behavior. In this context, uncertain behavior refers to tumors with the potential to transform from benign to malignant. Data associated with the malignant category are shown in Figure
AUTOMATING THE DOCUMENT SCORING PROCESS
SAS Visual Text Analytics provides the means for users to create concepts and their associated definition rules. Once the rules are compiled, the project is uploaded to a server that scores the text documents via SAS Visual Text Analytics. The results are output as a tab-delimited flat file or as a table. We used a SAS Data Integration ETL process to automate scoring, which improved overall performance.
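A downstream step that consumes the tab-delimited scoring output might look like the following Python sketch. The column names (`doc_id`, `concept`, `match_text`) and the sample rows are made up for illustration; the actual layout of the output file is not described in this article.

```python
import csv
import io
from collections import Counter

# Made-up sample of a tab-delimited scoring output; column names are assumptions.
SAMPLE_OUTPUT = (
    "doc_id\tconcept\tmatch_text\n"
    "1\tBODY_SITE\tliver\n"
    "1\tDIAGNOSIS\ttumor\n"
    "2\tBODY_SITE\tliver\n"
)


def count_matches_by_concept(tsv_file):
    """Count how many extracted matches each concept produced."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return Counter(row["concept"] for row in reader)


if __name__ == "__main__":
    # For a real file, replace the StringIO with:
    #   open(path, newline="", encoding="utf-8")
    counts = count_matches_by_concept(io.StringIO(SAMPLE_OUTPUT))
    for concept, n in sorted(counts.items()):
        print(concept, n)
```

Counts like these are the kind of structured data that a reporting tool can then chart and filter.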
CONCLUSION
The purpose of the Health Outcomes Analysis Text Analytics Project was to recover key elements of unstructured medical records and convert them to a structured format for data analysis. We accomplished this by linking extracted text to standardized terms that are aggregated or associated with similar or synonymous findings. We used SAS Visual Text Analytics to perform named entity extraction after creating custom concept definition rules. With the aid of subject matter experts and SAS Visual Text Analytics, the Health Outcomes Analysis Text Analytics Project achieved a high level of precision and recall when tested on multiple types of pathology reports.