When Excel Isn't Enough...
This is the application dependency network diagram generated by the supplied script...

Everyone uses Excel, some better than others. I've written more VBA than I care to recall, and I learned a lot about the Microsoft productivity suite application models. I still use Excel for a lot of things, but sometimes it just isn’t enough.

When I need greater control, interactivity, visualization, more complex statistical analysis, repeatability, or just better ways to explore data, then I use "R".

If you aren't familiar with it:

"R is widely used in data science by statisticians and data miners for data analysis and the development of statistical software. R is one of the most comprehensive statistical programming languages available, capable of handling everything from data manipulation and visualization to statistical analysis."

Source: https://builtin.com/data-science/r

"R" can be difficult to understand/learn whether you're new to programming or experienced in a procedural or object-oriented language.

While you can write procedural "R", if that is how you write most of your code then you're not leveraging its full power. R is very powerful and takes very few instructions once you start to utilize its ability to manipulate matrices.
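
To make the procedural versus vectorized point concrete, here is a minimal sketch in base R. The premium values and variable names are made up for illustration; they are not part of the demo data.

```r
# Hypothetical example data: monthly premiums for five policies
premiums <- c(120.50, 89.99, 240.00, 65.25, 310.10)

# Procedural style: loop over each element and fill in the result
annual_loop <- numeric(length(premiums))
for (i in seq_along(premiums)) {
  annual_loop[i] <- premiums[i] * 12
}

# Vectorized style: one expression operates on the whole vector
annual_vec <- premiums * 12

# Both approaches give the same result
stopifnot(isTRUE(all.equal(annual_loop, annual_vec)))

# Matrices work the same way: multiply every element by 10, then sum each row
m <- matrix(1:6, nrow = 2)    # 2 x 3 matrix, filled column by column
row_totals <- rowSums(m * 10)
print(row_totals)             # 90 120
```

The loop and the one-liner do the same work; the second form is usually what people mean by "idiomatic" R.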

For those interested in learning to use R, I’m going to introduce some “R” basics while giving you a very simple but useful tool to explore application dependencies.

Application Dependency Network Diagram

You can do something similar with just tables of data with filters, etc. but I find a graphical interface like this to be much better for me and the way I like to process information.

First, you need data. You can’t do anything without it.

I have defined two entities: 1) Application 2) Dependency

Application

Application as I chose to define it for this demo has the following attributes:

ID, Name, Purpose, Hosting Type, Arch Type, Programming Language, Data Store Type, Alternate Name, Acronym, Bus Owner, Tech Owner, Arch Owner

For our simple network diagram, we’ll only use ID and Name to represent the NODES.

Dependency

Dependency as I chose to define it for this demo has the following attributes:

ProducerAppID, ConsumerAppID, IntegrationType

Naturally the ProducerAppID and ConsumerAppID are foreign keys to the primary key in our Application list.

These represent our EDGES.
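
Because the two ID columns in Dependency are foreign keys into Application, it can be worth checking that every ID used in a dependency exists in the application list. The following is a small sketch in base R, assuming the column names defined above. The three-row data frames are a deliberately incomplete excerpt, so the check has something to find.

```r
# Illustrative excerpt: three applications and three dependencies
apps <- data.frame(
  ID   = c(1, 2, 3),
  Name = c("PolicyManager", "ClaimsProcessor", "UnderwritingTool")
)
deps <- data.frame(
  ProducerAppID   = c(1, 1, 1),
  ConsumerAppID   = c(2, 3, 6),
  IntegrationType = c("REST", "MessageQueue", "Database")
)

# Foreign-key check: every ID used in a dependency must exist in the application list
known   <- apps$ID
orphans <- deps[!(deps$ProducerAppID %in% known) | !(deps$ConsumerAppID %in% known), ]

# Application 6 is missing from this excerpt, so the 1 -> 6 row is reported
print(orphans)
```

An empty `orphans` data frame means every edge points at a known node.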

Test Data

I didn’t have the datasets I needed and I didn’t want to spend the time needed to manually create “realistic” data, so I had ChatGPT 3.5 do it for me.

Note: I did not include the CSVs I used, so you'll need to create or generate your own. Update: I changed my mind and embedded my test files here, but you should play with ChatGPT to create test data.

ChatGPT Prompt #1:

"Generate a CSV list of 50 fictitious business applications at an insurance company. Each application should have the following fictitious attributes ID, Name, Purpose, Hosting Type, Arch Type, Programming Language, Data Store Type, Alternate Name, Acronym, Bus Owner, Tech Owner, and Arch Owner."

It did a pretty good job, but it used roles instead of a person’s name for the various owner columns. No problem…

ChatGPT Prompt #2:

"This is great but can you provide pretend people names for the owners? Also, don't use the same name for each owner type."

Perfect. Save the application list as applications.csv in your R project directory. Next, I need the dependency list based on the application list ChatGPT just generated.

ID,Name,Purpose,Hosting Type,Arch Type,Programming Language,Data Store Type,Alternate Name,Acronym,Bus Owner,Tech Owner,Arch Owner
1,PolicyManager,Manage insurance policies,Cloud,Microservices,Java,Relational Database,PolicyMgmt,PM,John Thompson,Sarah Rodriguez,David Mitchell
2,ClaimsProcessor,Process insurance claims,On-premises,Monolithic,C#,NoSQL,ClaimProcessor,CP,Emily Wilson,Michael Patel,Alice Turner
3,UnderwritingTool,Assess insurance risks,Cloud,Serverless,Python,Document Database,UnderwriteTool,UT,Brian Davis,Megan Taylor,Chris Parker
4,CustomerPortal,Provide online customer services,Cloud,SPA (Single Page Application),JavaScript,Key-Value Store,CustomerServicePortal,CSP,Olivia White,Christopher Hill,Emma Martinez
5,AgentDashboard,Support for insurance agents,On-premises,Client-Server,.NET,Relational Database,AgentTool,AT,Alexander Lee,Grace Carter,Liam Rodriguez
6,FinanceAnalyzer,Analyze financial data,Cloud,Data Warehouse,SQL,Data Warehouse,FinanceAnalysisTool,FAT,Chloe Wright,James Robinson,Ava Stewart
7,ComplianceTracker,Track regulatory compliance,Cloud,Event-Driven Architecture,Java,Kafka,RegulatoryCompliance,RC,Isaac Phillips,Ella Adams,Logan Turner
8,QuoteGenerator,Generate insurance quotes,On-premises,Monolithic,Python,Relational Database,QuoteGen,QG,Samantha Harris,Nathan Foster,Abigail Taylor
9,CommunicationHub,Facilitate internal communication,Cloud,Microservices,Node.js,Message Queue,InternalCommsHub,ICH,David Turner,Grace Wilson,Liam Parker
10,PolicyAnalytics,Perform policy analytics,Cloud,Big Data,Scala,Hadoop,PolicyInsights,PI,Aria Hill,Benjamin Stewart,Victoria Turner
11,ClientOnboarding,Onboard new clients,On-premises,Client-Server,Java,Relational Database,OnboardClient,OC,Ethan Adams,Lily White,Logan Mitchell
12,RiskAssessmentTool,Assess insurance risks,Cloud,Microservices,C#,NoSQL,RiskAnalyzer,RA,Aiden Turner,Scarlett Martinez,Mason Robinson
13,MobileClaimsApp,Submit claims via mobile,Cloud,Mobile App,Swift,SQLite,MobileClaims,MC,Mia Parker,Elijah Foster,Emily Thompson
14,PolicyRenewal,Manage policy renewals,On-premises,Monolithic,Python,Relational Database,RenewalManager,RM,Noah Wilson,Ava Robinson,Oliver Mitchell
15,DocumentRepository,Store policy documents,Cloud,Distributed File System,Java,Document Database,DocRepo,DR,Emma Harris,Lucas Turner,Lily Mitchell
16,CustomerAnalytics,Analyzing customer data,Cloud,Big Data,Scala,Hadoop,CustomerInsights,CI,Aiden Foster,Chloe Taylor,Benjamin Turner
17,QuoteComparisonTool,Compare insurance quotes,On-premises,Client-Server,C#,Relational Database,QuoteComparator,QCT,Logan Taylor,Abigail Mitchell,Aria Robinson
18,ClaimsAnalytics,Analyze claims data,Cloud,Big Data,Python,Hadoop,ClaimsInsights,CI,Mason Robinson,Aria Adams,Chloe Foster
19,WorkflowAutomation,Automate internal workflows,Cloud,Workflow Engine,JavaScript,Document Database,WorkflowAuto,WFA,Ava Turner,Ethan Harris,Grace Parker
20,SecurityAuditTool,Conduct security audits,On-premises,Client-Server,Java,Relational Database,SecurityAudit,SA,Oliver Turner,Scarlett Foster,Liam Thompson
21,PolicyComparisonApp,Compare insurance policies,Cloud,SPA (Single Page Application),JavaScript,Key-Value Store,PolicyComparator,PCA,Lily Adams,Logan Thompson,Emma Robinson
22,AgentManagementSystem,Manage insurance agents,On-premises,Client-Server,.NET,Relational Database,AgentManager,AMS,Lucas Turner,Scarlett Adams,Ethan Robinson
23,DataQualityChecker,Check data quality,Cloud,Microservices,Python,NoSQL,DataQualityCheck,DQC,Sarah Taylor,Alexander Foster,Grace Adams
24,ClientFeedbackApp,Collect client feedback,On-premises,Client-Server,Java,Relational Database,FeedbackApp,FA,Nathan Turner,Aria Martinez,Ella Foster
25,PolicyApprovalSystem,Approve insurance policies,Cloud,Workflow Engine,C#,Document Database,PolicyApproval,PA,Megan Robinson,Noah Adams,Christopher Foster
26,QuoteRequestPortal,Request insurance quotes,On-premises,Monolithic,JavaScript,Relational Database,QuotePortal,QRP,Victoria Turner,Lucas Adams,Logan Foster
27,CommunicationGateway,Integrate communication channels,Cloud,Message Broker,Node.js,Message Queue,CommGateway,CG,Benjamin Mitchell,Emma Turner,Liam Harris
28,PolicyCancellationTool,Cancel insurance policies,On-premises,Client-Server,Java,Relational Database,PolicyCancel,PC,Ella Adams,Christopher Robinson,Aiden Thompson
29,ClaimApprovalSystem,Approve insurance claims,Cloud,Workflow Engine,C#,Document Database,ClaimApproval,CA,James Turner,Samantha Adams,Victoria Harris
30,DataVisualizationTool,Visualize insurance data,On-premises,Client-Server,JavaScript,Data Warehouse,DataVizTool,DVT,Grace Robinson,Oliver Adams,Mia Thompson
31,PolicyQuoteCalculator,Calculate insurance quotes,Cloud,Serverless,Python,NoSQL,QuoteCalculator,QC,Elijah Thompson,Sarah Foster,Lily Robinson
32,ClientNotificationApp,Send client notifications,On-premises,Client-Server,.NET,Relational Database,ClientNotify,CNA,Mason Foster,Ava Adams,Benjamin Thompson
33,RegulatoryReporting,Generate regulatory reports,Cloud,Microservices,Java,Kafka,RegReportTool,RRT,Liam Harris,Aiden Adams,Scarlett Robinson
34,PolicyRenewalReminder,Send policy renewal reminders,On-premises,Client-Server,JavaScript,Relational Database,RenewalReminder,RR,Victoria Adams,Ella Robinson,Ethan Harris
35,DataBackupSystem,Backup and restore data,Cloud,Distributed File System,Python,Document Database,BackupSystem,BS,Christopher Foster,Mia Adams,Aria Thompson
36,AgentTrainingPortal,Provide training for agents,On-premises,Client-Server,Java,Relational Database,TrainingPortal,TP,Ava Robinson,Logan Adams,Samantha Harris
37,PolicyExpiryTracker,Track policy expirations,Cloud,Serverless,JavaScript,NoSQL,ExpiryTracker,ET,Benjamin Thompson,Chloe Adams,Lucas Robinson
38,MobileUnderwritingApp,Underwrite policies via mobile,On-premises,Mobile App,C#,SQLite,MobileUnderwrite,MU,Mia Harris,James Adams,Aria Foster
39,DataIntegrationHub,Integrate data from various sources,Cloud,Message Broker,Python,Message Queue,DataHub,DH,Sarah Adams,Liam Robinson,Emma Foster
40,ComplaintsManagement,Manage customer complaints,On-premises,Monolithic,Java,Relational Database,ComplaintsManager,CM,Logan Harris,Ella Adams,Christopher Robinson
41,PolicyCancellationPortal,Cancel policies online,Cloud,SPA (Single Page Application),JavaScript,Key-Value Store,CancelPortal,CP,Ella Robinson,Aiden Adams,Victoria Foster
42,CustomerFeedbackAnalyzer,Analyze customer feedback,On-premises,Client-Server,C#,Relational Database,FeedbackAnalyzer,FA,Scarlett Harris,Liam Adams,Aria Robinson
43,DataGovernanceTool,Enforce data governance policies,Cloud,Microservices,Java,NoSQL,GovernanceTool,GT,Ethan Foster,Samantha Adams,Logan Harris
44,AgentPerformanceTracker,Track agent performance,On-premises,Client-Server,JavaScript,Relational Database,PerformanceTracker,PT,Lucas Robinson,Aria Adams,Benjamin Harris
45,PolicyDocumentGenerator,Generate policy documents,Cloud,Serverless,Python,Document Database,DocumentGen,DG,Ava Harris,Ethan Adams,Sarah Robinson
46,CommunicationAnalysisTool,Analyze communication patterns,On-premises,Monolithic,C#,Data Warehouse,CommAnalysisTool,CAT,Scarlett Robinson,Liam Adams,James Harris
47,ClaimSettlementSystem,Settle insurance claims,Cloud,Workflow Engine,Java,Document Database,ClaimSettle,CS,Liam Harris,Emma Adams,Ella Robinson
48,PolicyRecommendationEngine,Provide policy recommendations,On-premises,Client-Server,JavaScript,Relational Database,PolicyRecEngine,PRE,Ella Robinson,Logan Adams,Benjamin Harris
49,DataPrivacyCompliance,Ensure data privacy compliance,Cloud,Event-Driven Architecture,Python,Kafka,PrivacyCompliance,PC,Christopher Harris,Aria Adams,Samantha Robinson
50,QuoteApprovalSystem,Approve insurance quotes,On-premises,Monolithic,C#,Relational Database,QuoteApproval,QAS,Emma Robinson,Logan Adams,Victoria Harris
        

ChatGPT Prompt #3:

"Generate another csv using the IDs from the previous one to reflect application dependencies. The new file will have the columns: ProducerAppID, ConsumerAppID, IntegrationType. Each application from the first list should have 0-5 dependencies with other applications in the first list. Sometimes the application will be a consumer and sometimes a producer."

I was a little worried it wouldn’t “get it”, but it did!

Save this one to a file named dependencies.csv in your R project directory.

ProducerAppID,ConsumerAppID,IntegrationType
1,2,REST
1,3,MessageQueue
1,6,Database
2,4,REST
2,5,Database
2,7,MessageQueue
3,8,REST
3,10,MessageQueue
4,9,REST
4,11,Database
4,14,MessageQueue
5,13,REST
5,15,Database
5,16,MessageQueue
6,17,REST
6,19,Database
6,20,MessageQueue
7,21,REST
7,23,Database
7,26,MessageQueue
8,27,REST
8,29,Database
8,30,MessageQueue
9,31,REST
9,33,Database
9,34,MessageQueue
10,35,REST
10,37,Database
10,38,MessageQueue
11,39,REST
11,40,Database
11,42,MessageQueue
12,43,REST
12,44,Database
12,45,MessageQueue
13,46,REST
13,47,Database
13,49,MessageQueue
14,50,REST
14,1,Database
14,3,MessageQueue
15,5,REST
15,6,Database
15,7,MessageQueue
16,8,REST
16,9,Database
16,11,MessageQueue
17,12,REST
17,13,Database
17,14,MessageQueue
18,15,REST
18,16,Database
18,17,MessageQueue
19,18,REST
19,19,Database
19,20,MessageQueue
20,21,REST
20,22,Database
20,23,MessageQueue
21,24,REST
21,25,Database
21,26,MessageQueue
22,27,REST
22,28,Database
22,29,MessageQueue
23,30,REST
23,31,Database
23,32,MessageQueue
24,33,REST
24,34,Database
24,35,MessageQueue
25,36,REST
25,37,Database
25,38,MessageQueue
26,39,REST
26,40,Database
26,41,MessageQueue
27,42,REST
27,43,Database
27,44,MessageQueue
28,45,REST
28,46,Database
28,47,MessageQueue
29,48,REST
29,49,Database
29,50,MessageQueue
30,1,REST
30,2,Database
30,4,MessageQueue
31,6,REST
31,8,Database
31,10,MessageQueue
32,11,REST
32,12,Database
32,14,MessageQueue
33,16,REST
33,18,Database
33,20,MessageQueue
34,21,REST
34,22,Database
34,23,MessageQueue
35,24,REST
35,26,Database
35,28,MessageQueue
36,29,REST
36,30,Database
36,31,MessageQueue
37,32,REST
37,34,Database
37,36,MessageQueue
38,37,REST
38,38,Database
38,39,MessageQueue
39,40,REST
39,42,Database
39,44,MessageQueue
40,45,REST
40,46,Database
40,48,MessageQueue
41,49,REST
41,50,Database
41,1,MessageQueue
42,2,REST
42,3,Database
42,5,MessageQueue
43,6,REST
43,7,Database
43,9,MessageQueue
44,10,REST
44,11,Database
44,13,MessageQueue
45,14,REST
45,15,Database
45,17,MessageQueue
46,18,REST
46,19,Database
46,21,MessageQueue
47,22,REST
47,23,Database
47,25,MessageQueue
48,26,REST
48,27,Database
48,29,MessageQueue
49,30,REST
49,31,Database
49,33,MessageQueue
50,34,REST
50,35,Database
50,37,MessageQueue

I just created a very nice set of synthetic data in just a couple of minutes.
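
Generated data is still worth a quick sanity check. The listing above contains rows where an application is both producer and consumer (for example 19,19,Database), which show up as self-loops in the diagram. Here is a minimal sketch of such checks in base R, using five rows copied from the dependency listing. The variable names are my own.

```r
# Five rows copied from the dependency listing
deps <- data.frame(
  ProducerAppID   = c(18, 19, 19, 19, 20),
  ConsumerAppID   = c(17, 18, 19, 20, 21),
  IntegrationType = c("MessageQueue", "REST", "Database", "MessageQueue", "REST")
)

# Rows where an application depends on itself (a self-loop in the graph)
self_loops <- deps[deps$ProducerAppID == deps$ConsumerAppID, ]
print(self_loops)

# Number of dependencies per producer in this excerpt
per_producer <- table(deps$ProducerAppID)
print(per_producer)

# The prompt asked for 0-5 dependencies per application; list any producer over 5
too_many <- names(per_producer)[per_producer > 5]
print(too_many)
```

Whether to drop self-loops or keep them is a modeling choice; for a demo diagram they are harmless.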

Note to self: Remember this experience!!!

Originally, I was going to break down “R” basics in this article, but I wasn’t sure it was worth my time given I didn’t know if people would be interested enough to engage. As a result, I’m going to just provide the simple commented script for now.

Depending on response to this article, I may or may not delve into more “instruction”.

App Network Visualization in R Script

Here's the simple R script I called "app-network-demo.R".

You'll naturally need to change the file paths based upon your install/project location.

rm(list = ls()) # Clears Environment

# Install the 'readr' package, if not already installed
if (!requireNamespace("readr", quietly = TRUE)) {
  install.packages("readr")
}
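# Install the 'visNetwork' package, if not already installed
if (!requireNamespace("visNetwork", quietly = TRUE)) {
  install.packages("visNetwork")
}
# Install the 'igraph' package, if not already installed (visIgraphLayout() needs it)
if (!requireNamespace("igraph", quietly = TRUE)) {
  install.packages("igraph")
}

# Load R Packages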
library(readr)
library(visNetwork)

# File paths to your CSV files provided by our test data synthesizer: ChatGPT
app_file_path <- "D:\\R\\Reading Data from Excel\\applications.csv"
dep_file_path <- "D:\\R\\Reading Data from Excel\\dependencies.csv"

# Read the CSV file 
apps <- read_csv(app_file_path)
deps <- read_csv(dep_file_path)

# Nodes
nodes <- as.data.frame(apps[1:2])  # Loads the first two columns (i.e. ID and Name) as data frame for visNetwork
colnames(nodes) <- c("id", "label") # Renames the columns to those expected by visNetwork

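# Edges
edges <- as.data.frame(deps) # Loads the data into a data frame for visNetwork
# Renames the columns to those expected by visNetwork.
# ProducerAppID becomes "to" and ConsumerAppID becomes "from", so each arrow
# points from the consumer to the producer it depends on.
colnames(edges) <- c("to","from","label")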

#visNetwork(nodes, edges) # Unformatted View but commented out...

# Formatted version of our graph. There are a lot more options. Check them out!!!
# The syntax below uses pipes (i.e. %>%) to send the output of one function/method to another
visNetwork(nodes, edges) %>%
  visIgraphLayout() %>%
  visNodes(
    shape ='dot',
    color = list (
      background = "#0085AF",
      border = "#013848",
      highlight = "#FF8000"
    ),
    shadow = list(enabled = TRUE, size = 10)
  ) %>%
  visEdges(
    arrows = list(to=list(enabled = TRUE, scaleFactor=1)),
    shadow = FALSE,
    color = list(color = "#0085AF", highlight = "#C62F4B")
  ) %>%
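  # Note: selectedBy = "group" looks for a "group" column in nodes. This script does not
  # create one, so add it (e.g. from Hosting Type) if you want the select-by dropdown.
  visOptions(highlightNearest = list(enabled = TRUE, degree = 1, hover = TRUE),
             selectedBy = "group") %>%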
  visLayout(randomSeed = 11)
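
The script above passes selectedBy = "group", but its nodes data frame only has id and label columns. One possible way to give the dropdown something to work with is to carry an attribute such as Hosting Type into a group column. This is only a sketch under that assumption: the three-row data frames stand in for the CSV files, and using Hosting Type as the group is my choice, not something the original script does.

```r
# Stand-in for applications.csv: three rows with the columns we need
apps <- data.frame(
  ID             = c(1, 2, 3),
  Name           = c("PolicyManager", "ClaimsProcessor", "UnderwritingTool"),
  `Hosting Type` = c("Cloud", "On-premises", "Cloud"),
  check.names    = FALSE   # keep the space in the "Hosting Type" column name
)

# Nodes with an extra 'group' column (assumption: group by hosting type)
nodes <- data.frame(
  id    = apps$ID,
  label = apps$Name,
  group = apps[["Hosting Type"]]
)

# Edges using the same to/from mapping as the script above
edges <- data.frame(
  to    = c(1, 1),
  from  = c(2, 3),
  label = c("REST", "MessageQueue")
)

# Only build the widget if visNetwork is installed
if (requireNamespace("visNetwork", quietly = TRUE)) {
  library(visNetwork)
  graph <- visNetwork(nodes, edges) %>%
    visOptions(selectedBy = "group")
  # Type `graph` at the console to display it in the RStudio Viewer
}
```

Note that once nodes carry a group, the group's colors may take precedence over the colors set in visNodes.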
        

The resulting diagram after you “Run” the entire script is interactive within the RStudio Viewer.

  • You can zoom in or out using your mouse wheel.
  • You can move the viewport by left click, hold and drag.
  • You can select specific nodes and edges. Selecting a node allows you to drag it around the viewport.
  • Double clicking a node highlights all dependencies with other nodes and their edges, while greying out the rest of the “network”.
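
To see what the highlighting corresponds to in the data, here is a small sketch in base R that lists the direct neighbors of one node. It is roughly what degree = 1 selects, assuming edge direction is ignored. The helper name direct_neighbors is hypothetical, and the five edges are rows copied from the dependency listing, renamed the same way as in the script.

```r
# Five edges from the dependency listing, using the script's to/from mapping
edges <- data.frame(
  to    = c(18, 19, 19, 19, 20),   # ProducerAppID
  from  = c(17, 18, 19, 20, 21),   # ConsumerAppID
  label = c("MessageQueue", "REST", "Database", "MessageQueue", "REST")
)

# Hypothetical helper: IDs of all nodes one edge away from node_id, in either direction
direct_neighbors <- function(edges, node_id) {
  touching <- edges[edges$from == node_id | edges$to == node_id, ]
  ids <- unique(c(touching$from, touching$to))
  sort(ids[ids != node_id])   # drop the node itself (also removes self-loops)
}

print(direct_neighbors(edges, 19))   # 18 20
print(direct_neighbors(edges, 18))   # 17 19
```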

Try it out. Hearing about it is NOT the same as experiencing/interacting with it.

If there is enough interaction with this post, I'll look to enhance this script and show how you can use Shiny to provide additional interactivity that can be run OUTSIDE of RStudio.

R/RStudio Installation & Versions used

1) Download and install R

    cran.rstudio.com

    R-4.3.2 for Windows

2) Download and install RStudio

    2023.09.1+494

    Released: 2023-10-17

3) Rtools43

    https://cran.rstudio.com/bin/windows/Rtools/




Nancy Chourasia

Intern at Scry AI

10 months ago

Great share. The challenges in obtaining real data for training AI systems, particularly for DLNs, suggest the usage of synthetic data as a viable alternative. Synthetic data is generated using Generative Adversarial Networks, Diffusion Models, GPTs and computer simulations. Two prevalent techniques for creating synthetic data involve DLNs, specifically GANs and Diffusion Models. GANs, despite limitations, offer realistic data generation across various domains, from image-to-image translation to self-driving car training and retail applications. Diffusion Models, introduced as an alternative in 2020, have demonstrated high-quality synthetic image generation. Since 2018, GPTs have also been used to generate synthetic data. However, since synthetic data has only 90% resemblance with real data, it still needs to be improved substantially. Nevertheless, progress in this field is encouraging with one option being training an AI model first with synthetic data and then fine-tuning this partially trained model on a smaller set of real data. This approach proves valuable when dealing with limited-sized original datasets, and it reduces the need for labor-intensive manual labeling. More about this topic: https://lnkd.in/gPjFMgy7

Not only can you create test data with the free ChatGPT 3.5, but try having it write some "R" scripts for you. I did not use it to write my script; I only used it for the synthetic data in my example. However, this got me thinking about WHEN RStudio would have integrations with generative AI/LLMs like many other IDEs have added in recent months. Well, it looks like it's already there...not a surprise. Check this out: https://www.youtube.com/watch?v=rZB0_53Cm1w Disclaimer: I have not played with it yet.

Krol (Kevin) Mathias, PhD

Sr. Manager, Data & AI practice

1 year ago

This is a real good application of generating test data and working through a network discovery model

Kelvin Meeks

Consulting Architect/CTO - Leadership in Enterprise Architecture and Software Engineering Innovation (US Army Veteran)

1 year ago

Highly recommend Dr. Robert Kabacoff's Manning Publications Co. book, "R in Action, 3rd Edition: Data analysis and graphics with R and Tidyverse" for folks that want to continue exploring the R programming language. https://www.amazon.com/Action-Third-Robert-I-Kabacoff/dp/1617296058/

Robert Blaszak

Director, IT Delivery Consultant

1 year ago

Putting a bookmark on this one!
