Hilite and Find in Tables in KNIME
In this article I cover two features that I feel I could be using more to make my life easier as a KNIME user. I think they can help you too!
For this article I am using KNIME 4.2.1 on macOS Catalina and a movie titles dataset from IMDB found here. I also mention a little bit about Regular Expressions; to learn more about them, I highly recommend this course.
These features are accessible when you view tables in KNIME. They give you a lot of control over and interactivity with your data, and they save you from adding unnecessary nodes to your workflow.
The first is the Find feature.
After reading in the dataset I wanted to find my favorite movie, The Matrix, in the list. Previously I may have dropped some kind of filter node into my workflow and searched that way, but I really just want to see where it is in the table view of the dataset.
Right-click on the executed File Reader node and select File Table. In the menu bar at the top click Navigation -> Find. You can also invoke this by pressing ⌘+F in macOS or Ctrl+F in Windows.
You will see a few options around where to search and ways you can search. I just want to search the data so I uncheck the Search RowID and Column Names options. I enter the words 'The Matrix' as my search string and click OK. The table is searched, and the resulting matching cell is selected (filled with a darker color than the rest of the table showing the result).
Now you may have noticed that your search window disappeared the second you clicked OK to start the search. There may be more results for 'The Matrix' in this dataset. To step through them, simply press F3. Each time you press F3, the next cell that meets the search criteria is selected.
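If it helps to picture what Find and F3 are doing, here is a rough Python sketch. It is not KNIME's implementation: it assumes a plain search string is matched as a substring of each cell and that the table is walked row by row. The function name `find_matches` and the sample rows are made up for illustration.

```python
def find_matches(rows, needle, case_sensitive=False):
    """Yield (row_index, column_index) for each cell that contains needle."""
    if not case_sensitive:
        needle = needle.lower()
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            text = str(cell) if case_sensitive else str(cell).lower()
            if needle in text:
                yield (r, c)


# Made-up sample rows (title, year); not the IMDB dataset from the article.
rows = [
    ["The Matrix", 1999],
    ["Toy Story", 1995],
    ["The Matrix Reloaded", 2003],
]

hits = find_matches(rows, "The Matrix")
first = next(hits)   # like clicking OK in the Find dialog
second = next(hits)  # like pressing F3 once
print(first, second)
```

Each call to `next(hits)` plays the role of one F3 press: it moves to the next matching cell without rescanning from the top.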
Now for something with more title variation. Let's search for 'Star Wars' and write a regular expression as the search string.
Here is the Regular Expression I wrote: '.*Star Wars.*'
For this example, I have written a very basic search string and de-selected 'Case Sensitive' to maximize my search results. As I press F3, each title that contains Star Wars in the dataset is found and I can take a look at the resulting selected cells.
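To see what this pattern does outside of KNIME, here is a minimal Python sketch. The titles are made-up sample values, and the sketch assumes the pattern has to match the whole cell, which is why the leading and trailing '.*' are there. KNIME uses Java regular expressions, so treat Python's `re` module as an approximation.

```python
import re

# Made-up sample titles; not taken from the IMDB dataset.
titles = [
    "Star Wars: Episode IV - A New Hope",
    "Robot Chicken: Star Wars",
    "star wars holiday special",
    "Star Trek",
    "The Matrix",
]

# Same pattern as in the article, with 'Case Sensitive' switched off.
pattern = re.compile(r".*Star Wars.*", re.IGNORECASE)


def matching_titles(values):
    """Return the values that the pattern matches in full."""
    return [v for v in values if pattern.fullmatch(v)]


matches = matching_titles(titles)
for title in matches:
    print(title)
```

Under that whole-cell assumption, the bare pattern 'Star Wars' without the '.*' on either side would only match a cell that contains exactly those two words.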
Now, say that I am really interested in just 'Star Wars: Episode IV - A New Hope' and 'Star Wars: Episode V - The Empire Strikes Back', but I want to filter my dataset to all Star Wars titles while still keeping track of Episodes IV and V.
One way to do that is through highlighting. You may notice as you configure nodes that there is an option to enable highlighting (a good example is in the Joiner node's settings). This comes in really handy when you want to spot check your work, for instance to make sure your expressions are having the desired effect.
In the menu next to Navigation you will see Hilite. I select the cell I want to highlight and from the Hilite dropdown select Hilite Selected. Pressing F3 takes me to the next Star Wars entry, which is Episode V, and I highlight it in the same way. I close that window and, using the same Regular Expression in the Row Filter node, filter my dataset to just Star Wars titles.
When I look at the output table of the Row Filter node, I have 3034 results. My highlighted results are on top here, but this may not always be the case. To see only the highlighted rows, select Hilite from the menu bar, then Filter -> Show Hilited Only. This narrows the view down to my highlighted Episode IV and V entries.
As you work through datasets, you may want to highlight more cells. Just open the output and highlight the cells. You should not need to re-execute your nodes. To clear all highlighting select Hilite -> Clear Hilite.
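One way to picture why the highlighting survives the Row Filter is the rough Python sketch below. It assumes the hilite state is kept apart from the data as a set of RowIDs, so any table that still contains those RowIDs can show them as highlighted. The row IDs, the sample rows, and the function names `row_filter` and `show_hilited_only` are hypothetical, and this is not KNIME's API.

```python
import re

# Made-up table: RowID -> title.
table = {
    "Row0": "The Matrix",
    "Row1": "Star Wars: Episode IV - A New Hope",
    "Row2": "Star Wars: Episode V - The Empire Strikes Back",
    "Row3": "Star Wars: Episode VI - Return of the Jedi",
}

# Hilite state lives outside the data, keyed by RowID (an assumption).
hilited = {"Row1", "Row2"}


def row_filter(rows, pattern):
    """Keep rows whose title matches the pattern in full, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    return {rid: title for rid, title in rows.items() if regex.fullmatch(title)}


def show_hilited_only(rows, hilited_ids):
    """Keep only the rows whose RowID is in the hilite set."""
    return {rid: title for rid, title in rows.items() if rid in hilited_ids}


star_wars = row_filter(table, r".*Star Wars.*")
spot_check = show_hilited_only(star_wars, hilited)
print(list(spot_check.values()))
```

The filter step never touches the `hilited` set, which mirrors how the two highlighted episodes are still marked in the filtered output.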
Additionally, you will find some other really helpful nodes that deal specifically with highlighted data. I highly encourage you to take a look at them!
These are two powerful features with countless applications. In very large datasets where you are applying text replacements, expressions, or even math and date manipulations, you may have specific cells you want to spot check. Highlighted cells stay highlighted throughout your workflow (just make sure you enable highlighting in nodes that give you the choice), and you can filter down to only those highlighted rows to check your work without adding another node.
I hope this was informative! If you have questions or want to see other specific examples of KNIME in action please let me know!
As always, download KNIME here and happy KNIME-ing!