Unlocking Advanced Data Analysis for SWMM5 with ChatGPT's Beta Version of GPT-4: A Possible Game-Changer for Students and Beginners
Robert Dickinson
Autodesk Water Technologist for Storm Sewer and Flood | Expert in ICM InfoWorks ICM SWMM/Ruby | 18 Years at Innovyze/Autodesk | 52 Years with EPASWMM TAC for CIMM.ORG SWMM5+
What is GPT-4 with Advanced Data Analysis, and when was it added? On August 28, 2023, OpenAI introduced improvements to GPT-4 (ChatGPT Plus) that incorporate Advanced Data Analysis. This exploratory project uses SWMM5 and the beta version of GPT-4 to determine whether the model can understand the formats of both the INP and RPT files and combine them without extensive manual assistance. This LinkedIn article investigates various prompts.
Prompt 1/ What Files Can You Upload to ChatGPT?
It can read and process the following file formats:
Text Files (.txt): Plain text files.
CSV Files (.csv): Comma-separated values.
JSON Files (.json): JavaScript Object Notation format.
Excel Files (.xls, .xlsx): Microsoft Excel spreadsheets.
XML Files (.xml): Extensible Markup Language files.
YAML Files (.yml, .yaml): YAML Ain't Markup Language files.
HTML Files (.html): HyperText Markup Language files for web content (for table extraction).
Markdown Files (.md): Text-based markup language files.
SQL Files (.sql): SQL query files.
You upload the files by clicking on the plus and choosing a file from your PC or elsewhere.
Prompt 2/ Does it understand the File I uploaded?
My Prompt - Can you summarize the sections in the inp file?
Success! Here's a summary of the sections in the INP file along with the number of lines in each section:
Prompt 3/ Does it understand How to Fix data problems?
Yes, it can fix various text or CSV files on the fly, which means you do not have to reformat the data yourself before uploading it.
It appears that the INP file contains non-UTF-8 encoded characters, which is causing the UnicodeDecodeError when attempting to read the file.
To address this, I'll attempt to read the file using a different encoding, such as ISO-8859-1 (also known as Latin-1), which is a common encoding for many files and can handle a wider range of characters.
Let's try reading the file again using the ISO-8859-1 encoding.
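The encoding fallback ChatGPT describes can be reproduced locally. The following is a minimal sketch, not the code ChatGPT actually ran; the function name read_text_with_fallback and the order of encodings are my own assumptions.

```python
def read_text_with_fallback(path, encodings=("utf-8", "iso-8859-1")):
    """Return (text, encoding_used), trying each encoding in order.

    Hypothetical helper: UTF-8 is tried first, then ISO-8859-1 (Latin-1),
    which accepts every byte value and therefore never raises a decode error.
    """
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as handle:
                return handle.read(), enc
        except UnicodeDecodeError as err:
            last_error = err  # remember the failure and try the next encoding
    raise last_error
```

Because Latin-1 never fails, a character that was really saved in another encoding may be read as the wrong symbol, so it is worth checking names and notes after loading.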
Prompt 4/ Does it understand more than me?
Yes, I had two versions of the Greenville model: US and SI units. When I saw meters for length, I asked it why it had made that mistake. It turned out to be my mistake, and I was politely told why the lengths were in meters in the INP file I uploaded.
The flow units specified in the [OPTIONS] section of your INP file are LPS (Liters Per Second), which is part of the metric unit system. In this case, the lengths for links, such as conduits, are likely to be in meters.
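The same check can be scripted. In SWMM5 the FLOW_UNITS option selects the unit system: CFS, GPM and MGD imply US units, while CMS, LPS and MLD imply SI units. The sketch below is my own illustration (the function name unit_system is hypothetical) and only looks at the [OPTIONS] section.

```python
US_FLOW_UNITS = {"CFS", "GPM", "MGD"}
SI_FLOW_UNITS = {"CMS", "LPS", "MLD"}

def unit_system(inp_text):
    """Return (flow_units, description) from the [OPTIONS] section of INP text."""
    in_options = False
    for raw in inp_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Track whether we are inside the [OPTIONS] section
            in_options = line.upper().startswith("[OPTIONS]")
            continue
        if not in_options or not line or line.startswith(";"):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0].upper() == "FLOW_UNITS":
            units = parts[1].upper()
            if units in SI_FLOW_UNITS:
                return units, "SI (lengths in meters)"
            if units in US_FLOW_UNITS:
                return units, "US (lengths in feet)"
            return units, "unknown"
    return None, "unknown"
```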
Prompt 5/ Does it understand the column headings in the SWMM5 inp files?
Yes, it does for both nodes and links.
Certainly! Here are the typical variables (or headers) for links and nodes in a SWMM5 INP file:
For Links (Conduits):
Name: Identifier for the conduit
FromNode: Start node for the conduit
ToNode: End node for the conduit
Length: Length of the conduit
Roughness: Manning's roughness coefficient
InOffset: Offset height at the inlet
OutOffset: Offset height at the outlet
InitFlow: Initial flow rate in the conduit
MaxFlow: Maximum flow rate in the conduit
For Nodes (Junctions):
Name: Identifier for the junction
Elevation: Elevation of the junction
MaxDepth: Maximum depth for water at the junction
InitDepth: Initial depth of water at the junction
SurDepth: Surface depth at the junction
Aponded: Ponded area for the junction
Prompt 6/ It understands the column headers but can it make some meaningful statistics and graphs so I can understand better?
My Prompt - show me histograms of the node invert elevation and node max depth. This is a common issue for students and beginning SWMM5 users: not understanding the need for inverts, or forgetting to import them, leaves the inverts at zero.
Yes, it can make histograms and statistical tables that find data errors.
Here are the histograms showing the distributions of node invert elevations and node max depths in your SWMM5 INP file:
Left Histogram: Shows the distribution of node invert elevations. The units are in meters.
Right Histogram: Shows the distribution of node max depths. The units are also in meters.
Both histograms are divided into 10 intervals for a clearer view of the data distribution. If you have any more questions or need further details, feel free to ask!
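For readers who want to repeat the zero-invert check without ChatGPT, here is a minimal sketch using only the Python standard library. It assumes the [JUNCTIONS] column order listed earlier (Name, Elevation, MaxDepth, ...); the function names and the tiny sample values are made up for illustration.

```python
def junction_column(inp_text, index):
    """Collect one numeric column (0-based index) from the [JUNCTIONS] section."""
    values = []
    in_section = False
    for raw in inp_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            in_section = line.upper().startswith("[JUNCTIONS]")
            continue
        if in_section and line and not line.startswith(";"):
            values.append(float(line.split()[index]))
    return values

def histogram(values, bins=10):
    """Count values in equal-width bins between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[i] += 1
    return counts

# Made-up sample: the second junction has a missing (zero) invert
sample = """[JUNCTIONS]
;;Name Elevation MaxDepth InitDepth SurDepth Aponded
J1 10.0 2.0 0 0 0
J2 0.0 3.0 0 0 0
J3 12.5 2.0 0 0 0
"""
inverts = junction_column(sample, 1)
zero_inverts = sum(1 for v in inverts if v == 0.0)
print("zero inverts:", zero_inverts, "histogram:", histogram(inverts))
```

A spike in the lowest bin, or a non-zero count of zero inverts in a network that is not at sea level, is the signal that the inverts were never imported.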
Prompt 7/ Can it supply information not read from the SWMM5 inp file but derived from the inp data?
My Prompt - Now make a mean, min and max CFL value where CFL is length / (velocity + (9.8*Depth)^1/2) assuming a velocity of 1, 3 and 5 m/sec
The Courant-Friedrichs-Lewy (CFL) time step plays a pivotal role in the SWMM5 (Storm Water Management Model 5) simulation process. It serves as the foundation for the variable time step employed during the simulation, exerting a significant influence on two crucial aspects: the speed of the simulation and its continuity error. In essence, the CFL time step is a balancing act that optimizes both the computational efficiency and the accuracy of the simulation results.
Additionally, the CFL time step governs the time intervals used to generate summary tables. It's worth noting that if the CFL time step is set to a very small value, the number of time steps between the reported graphical intervals may be exceedingly large, which could impact the granularity and readability of the summary tables.
Yes, it can make estimates of the CFL time steps based on the velocity assumption and the pipe length and maximum diameter. The CFL values have been successfully calculated for velocities of 1 m/s, 3 m/s, and 5 m/s. Here are the mean, max, and min values for each set:
For a velocity of 1 m/s:
For a velocity of 3 m/s:
For a velocity of 5 m/s:
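The estimate in my prompt, length / (velocity + sqrt(9.8 * depth)), is easy to reproduce. The sketch below is an illustration only: the conduit lengths and depths are made-up values, not the Greenville data, and the function names are my own.

```python
import math
from statistics import mean

G = 9.8  # gravitational acceleration in m/s^2, the value used in the prompt

def cfl_seconds(length_m, depth_m, velocity_ms):
    """CFL time step estimate: length / (velocity + sqrt(g * depth))."""
    return length_m / (velocity_ms + math.sqrt(G * depth_m))

def cfl_summary(conduits, velocities=(1.0, 3.0, 5.0)):
    """Return {velocity: {"mean", "min", "max"}} for (length, depth) pairs."""
    summary = {}
    for v in velocities:
        steps = [cfl_seconds(length, depth, v) for length, depth in conduits]
        summary[v] = {"mean": mean(steps), "min": min(steps), "max": max(steps)}
    return summary

# Hypothetical conduits as (length in m, full depth in m)
example_conduits = [(120.0, 0.6), (45.0, 1.2), (300.0, 2.0)]
for velocity, stats in cfl_summary(example_conduits).items():
    print(velocity, {name: round(value, 2) for name, value in stats.items()})
```

The minimum value is the one to watch, since the shortest and deepest conduit limits the routing time step.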
Prompt 8/ Fantastic, but how does that compare to the values actually used by SWMM5 during the Simulation?
Very well. The CFL estimate is based on the full depth, while the CFL used during the simulation is based on the current velocity and the depth at each time step. ChatGPT has given us a good starting point for the routing time step.
Prompt 9/ Can we also read the text report file of SWMM5?
My Prompt - Show me all of the output tables in the file I just uploaded that are bracketed by asterisk lines.
Yes, it both reads the data and finds the tables. Note this includes the input file echo summary.
Here are the output tables in the report file that are bracketed by asterisk lines:
Element Count
Pollutant Summary
Landuse Summary
Raingage Summary
Subcatchment Summary
LID Control Summary
Node Summary
Link Summary
Cross Section Summary
Shape Summary
Transect Summary
Analysis Options
Rainfall Dependent I/I
Runoff Quantity Continuity
Runoff Quality Continuity
Groundwater Continuity
Flow Routing Continuity
Quality Routing Continuity
Highest Continuity Errors
Time-Step Critical Elements
Highest Flow Instability Indexes
Most Frequent Nonconverging Nodes
Routing Time Step Summary
Subcatchment Runoff Summary
LID Performance Summary
Groundwater Summary
Subcatchment Washoff Summary
Node Depth Summary
Node Inflow Summary
Node Surcharge Summary
Node Flooding Summary
Storage Volume Summary
Outfall Loading Summary
Link Flow Summary
Flow Classification Summary
Conduit Surcharge Summary
Pumping Summary
Link Pollutant Load Summary
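Finding these tables is a simple pattern match: a title line with a line of asterisks directly above and below it. The following is a minimal sketch of that idea, with a hypothetical function name and a made-up two-table sample; it is not the code ChatGPT ran.

```python
def find_rpt_sections(rpt_text):
    """Return [(title, line_number)] for titles bracketed by asterisk lines.

    line_number is the 1-based line of the title itself.
    """
    lines = rpt_text.splitlines()

    def is_stars(text):
        stripped = text.strip()
        return len(stripped) >= 3 and set(stripped) == {"*"}

    sections = []
    for i in range(1, len(lines) - 1):
        title = lines[i].strip()
        if title and not is_stars(title) and is_stars(lines[i - 1]) and is_stars(lines[i + 1]):
            sections.append((title, i + 1))
    return sections

# Made-up fragment in the style of a SWMM5 report file
sample_rpt = """
  *************
  Element Count
  *************
  Number of nodes ........... 3

  ************
  Node Summary
  ************
"""
print(find_rpt_sections(sample_rpt))
```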
Prompt 10/ We can read the output file but can we make something new that is not in the SWMM5 output or input?
My Prompt - Now find the full flow from the cross section summary and the maximum flow from the link flow summary and make a new output called qOverQfull which is maximum flow divided by full flow and then show me a ten bar histogram with the mean, max and min of the new output
Problem: ChatGPT cannot find the right columns, so I have to add "in the link flow summary table - the maximum flow is the third column with a header that starts with maximum. In the cross section table the Full flow is the last column and has the word full". Suggestion: CSV files may be better. My internal solution is a new prompt - Okay, can you save what I uploaded to you as a csv file?
Finally, I can extract the data (Q full is large because the model includes rivers). "I've successfully captured the "Full Flow" values from the eighth column in lines 728 to 950 of the rpt file. Here are the calculated statistics:"
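Once the two columns are extracted, the new qOverQfull output is a per-link division. Here is a minimal sketch assuming the values have already been read into dictionaries keyed by link name; the names and numbers are invented for illustration.

```python
from statistics import mean

def q_over_qfull(max_flow, full_flow):
    """Return {link: max_flow / full_flow}, skipping links without a usable full flow."""
    ratios = {}
    for link, q in max_flow.items():
        qfull = full_flow.get(link)
        if qfull:  # skips links missing from the cross section table and zero full flow
            ratios[link] = q / qfull
    return ratios

# Hypothetical values in LPS; P1 is a pump, so it has no full flow
max_flow = {"C1": 50.0, "C2": 30.0, "P1": 5.0}
full_flow = {"C1": 100.0, "C2": 20.0}

ratios = q_over_qfull(max_flow, full_flow)
values = list(ratios.values())
print("mean:", mean(values), "min:", min(values), "max:", max(values))
```

A ratio above 1 means the peak flow exceeded the full flow capacity of the conduit, which is why links such as rivers with a very large Q full pull the mean down.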
Prompt 11/ It looks like I have to specify the start, stop and number of columns for each section.
"The data appears to have a complex header spanning multiple lines, which likely caused the issue when creating the Pandas DataFrame. The actual data starts from the eighth line, with the header information on lines 6 and 7."
End Result - my means of Maximum flow, q/Q and d/D
Column 3 (Maximum Flow_LPS)
Column 6 (Max_Veloc_m_per_s)
Column 7 (q/Q)
Column 8 (d/D)
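Selecting columns by position, as I ended up doing, can be sketched as follows. This assumes the data rows are whitespace separated, that the first data line number and the column count are supplied by the user, and that a blank line ends the table. The function names and sample rows are hypothetical.

```python
def read_table_rows(lines, first_data_line, n_columns):
    """Read whitespace-separated rows starting at a 1-based line number.

    Stops at the first blank line and keeps only rows with n_columns tokens.
    """
    rows = []
    for raw in lines[first_data_line - 1:]:
        tokens = raw.split()
        if not tokens:
            break  # a blank line marks the end of the table
        if len(tokens) == n_columns:
            rows.append(tokens)
    return rows

def column_mean(rows, column):
    """Mean of a 1-based column, converting each token to float."""
    values = [float(row[column - 1]) for row in rows]
    return sum(values) / len(values)

# Made-up section: 7 lines of title and header, data from line 8
section = [
    "", "  *****************", "  Link Flow Summary", "  *****************", "",
    "  header line one", "  header line two",
    "  C1 CONDUIT 10.0 0 01:00 1.0 0.50 0.60",
    "  C2 CONDUIT 30.0 0 01:05 2.0 0.70 0.80",
    "",
    "  text after the table",
]
rows = read_table_rows(section, first_data_line=8, n_columns=8)
print(column_mean(rows, 3), column_mean(rows, 7), column_mean(rows, 8))
```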
Prompt 13/ It seems I need to give you more information for the RPT file.
My Prompt - Show me the starting line of each section that has a line of **** around the name of the section, the estimated number of columns for each section, and the line number where the section data starts and ends.
It answered, but without the column counts:
Sections with Asterisks in the Report File
==========================================
Section: Element Count
- Start Line: 24
- End Line: 26
- Estimated Columns: None
Section: Pollutant Summary
- Start Line: 35
- End Line: 37
- Estimated Columns: None
Section: Landuse Summary
- Start Line: 49
- End Line: 51
- Estimated Columns: None
Section: Raingage Summary
- Start Line: 59
- End Line: 61
- Estimated Columns: None
Section: Subcatchment Summary
- Start Line: 69
- End Line: 71
- Estimated Columns: None
Section: LID Control Summary
- Start Line: 106
- End Line: 108
- Estimated Columns: None
Section: Node Summary
- Start Line: 300
- End Line: 302
- Estimated Columns: None
Section: Link Summary
- Start Line: 480
- End Line: 482
- Estimated Columns: None
Section: Cross Section Summary
- Start Line: 722
- End Line: 724
- Estimated Columns: None
Section: Shape Summary
- Start Line: 953
- End Line: 955
- Estimated Columns: None
Section: Transect Summary
- Start Line: 993
- End Line: 995
- Estimated Columns: None
Section: Analysis Options
- Start Line: 1383
- End Line: 1385
- Estimated Columns: None
Section: Highest Continuity Errors
- Start Line: 1526
- End Line: 1528
- Estimated Columns: None
Section: Time-Step Critical Elements
- Start Line: 1536
- End Line: 1538
- Estimated Columns: None
Section: Highest Flow Instability Indexes
- Start Line: 1542
- End Line: 1544
- Estimated Columns: None
Section: Most Frequent Nonconverging Nodes
- Start Line: 1551
- End Line: 1553
- Estimated Columns: None
Section: Routing Time Step Summary
- Start Line: 1561
- End Line: 1563
- Estimated Columns: None
Section: Subcatchment Runoff Summary
- Start Line: 1578
- End Line: 1580
- Estimated Columns: None
Section: LID Performance Summary
- Start Line: 1619
- End Line: 1621
- Estimated Columns: None
Section: Groundwater Summary
- Start Line: 1815
- End Line: 1817
- Estimated Columns: None
Section: Subcatchment Washoff Summary
- Start Line: 1857
- End Line: 1859
- Estimated Columns: None
Section: Node Depth Summary
- Start Line: 1899
- End Line: 1901
- Estimated Columns: None
Section: Node Inflow Summary
- Start Line: 2082
- End Line: 2084
- Estimated Columns: None
Section: Node Surcharge Summary
- Start Line: 2266
- End Line: 2268
- Estimated Columns: None
Section: Node Flooding Summary
- Start Line: 2279
- End Line: 2281
- Estimated Columns: None
Section: Storage Volume Summary
- Start Line: 2286
- End Line: 2288
- Estimated Columns: None
Section: Outfall Loading Summary
- Start Line: 2314
- End Line: 2316
- Estimated Columns: None
Section: Link Flow Summary
- Start Line: 2332
- End Line: 2334
- Estimated Columns: None
Section: Flow Classification Summary
- Start Line: 2578
- End Line: 2580
- Estimated Columns: None
Section: Conduit Surcharge Summary
- Start Line: 2811
- End Line: 2813
- Estimated Columns: None
Section: Pumping Summary
- Start Line: 2823
- End Line: 2825
- Estimated Columns: None
Section: Link Pollutant Load Summary
- Start Line: 2839
- End Line: 2841
- Estimated Columns: None
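The missing column estimate can be approximated by counting tokens on the data rows of each table and taking the most common count. This is a rough heuristic of my own, not something SWMM5 or ChatGPT provides: it treats a row as data only if at least one token parses as a number, and it ignores separator lines.

```python
from collections import Counter

def is_number(token):
    """True if the token parses as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False

def estimate_columns(table_lines):
    """Return the most common token count among data rows, or None."""
    counts = Counter()
    for raw in table_lines:
        line = raw.strip()
        if not line or set(line) <= {"-", "*", "="}:
            continue  # skip blank lines and separator lines
        tokens = line.split()
        if any(is_number(token) for token in tokens):
            counts[len(tokens)] += 1
    return counts.most_common(1)[0][0] if counts else None

# Made-up table fragment
table = [
    "  ---------------------------------",
    "  Link   Type   Maximum Flow",
    "  ---------------------------------",
    "  C1  CONDUIT  12.5  0  01:00  1.2  0.5  0.6",
    "  C2  CONDUIT  8.0   0  01:10  0.9  0.3  0.4",
]
print(estimate_columns(table))
```

Header text that spans several lines has no numeric tokens, so it is skipped rather than counted, which is the case that seemed to confuse the automatic parsing.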
Prompt 14/ Let us revisit the INP file based on what we learned.
My Prompt - find all sections that start with a [ and give me the section name, the starting line of the section and the number of lines in each section; the column labels start with a ; and are on the first line after the section name
Yes, it works, but with a lot of work by me.
Detailed Sections with First Line Column Labels in the INP File
=============================================================
Section: TITLE
- Start Line: 1
- Line Count: 4
- First Line Column Labels: Project Title/Notes
Section: OPTIONS
- Start Line: 5
- Line Count: 38
- First Line Column Labels: Option Value
Section: EVAPORATION
- Start Line: 43
- Line Count: 6
- First Line Column Labels: Data Source Parameters
Section: TEMPERATURE
- Start Line: 49
- Line Count: 8
- First Line Column Labels: Data Element Values
Section: RAINGAGES
- Start Line: 57
- Line Count: 7
- First Line Column Labels: Name Format Interval SCF Source
Section: SUBCATCHMENTS
- Start Line: 64
- Line Count: 34
- First Line Column Labels: Name Rain Gage Outlet Area %Imperv Width %Slope CurbLen SnowPack
Section: SUBAREAS
- Start Line: 98
- Line Count: 34
- First Line Column Labels: Subcatchment N-Imperv N-Perv S-Imperv S-Perv PctZero RouteTo PctRouted
Section: INFILTRATION
- Start Line: 132
- Line Count: 34
- First Line Column Labels: Subcatchment Param1 Param2 Param3 Param4 Param5
Section: LID_CONTROLS
- Start Line: 166
- Line Count: 37
- First Line Column Labels: Name Type/Layer Parameters
Section: LID_USAGE
- Start Line: 203
- Line Count: 190
- First Line Column Labels: Subcatchment LID Process Number Area Width InitSat FromImp ToPerv RptFile DrainTo FromPerv
Section: AQUIFERS
- Start Line: 393
- Line Count: 6
- First Line Column Labels: Name Por WP FC Ksat Kslope Tslope ETu ETs Seep Ebot Egw Umc ETupat
Section: GROUNDWATER
- Start Line: 399
- Line Count: 34
- First Line Column Labels: Subcatchment Aquifer Node Esurf A1 B1 A2 B2 A3 Dsw Egwt Ebot Wgr Umc
Section: SNOWPACKS
- Start Line: 433
- Line Count: 8
- First Line Column Labels: Name Surface Parameters
Section: JUNCTIONS
- Start Line: 441
- Line Count: 160
- First Line Column Labels: Name Elevation MaxDepth InitDepth SurDepth Aponded
Section: OUTFALLS
- Start Line: 601
- Line Count: 10
- First Line Column Labels: Name Elevation Type Stage Data Gated Route To
Section: DIVIDERS
- Start Line: 611
- Line Count: 5
- First Line Column Labels: Name Elevation Diverted Link Type Parameters
Section: STORAGE
- Start Line: 616
- Line Count: 25
- First Line Column Labels: Name Elev. MaxDepth InitDepth Shape Curve Type/Params SurDepth Fevap Psi Ksat IMD
Section: CONDUITS
- Start Line: 641
- Line Count: 231
- First Line Column Labels: Name From Node To Node Length Roughness InOffset OutOffset InitFlow MaxFlow
Section: PUMPS
- Start Line: 872
- Line Count: 13
- First Line Column Labels: Name From Node To Node Pump Curve Status Sartup Shutoff
Section: ORIFICES
- Start Line: 885
- Line Count: 7
- First Line Column Labels: Name From Node To Node Type Offset Qcoeff Gated CloseTime
Section: WEIRS
- Start Line: 892
- Line Count: 8
- First Line Column Labels: Name From Node To Node Type CrestHt Qcoeff Gated EndCon EndCoeff Surcharge RoadWidth RoadSurf Coeff. Curve
Section: OUTLETS
- Start Line: 900
- Line Count: 6
- First Line Column Labels: Name From Node To Node Offset Type QTable/Qcoeff Qexpon Gated
Section: XSECTIONS
- Start Line: 906
- Line Count: 233
- First Line Column Labels: Link Shape Geom1 Geom2 Geom3 Geom4 Barrels Culvert
Section: TRANSECTS
- Start Line: 1139
- Line Count: 110
- First Line Column Labels: Transect Data in HEC-2 format
Section: LOSSES
- Start Line: 1249
- Line Count: 227
- First Line Column Labels: Link Kentry Kexit Kavg Flap Gate Seepage
Section: CONTROLS
- Start Line: 1476
- Line Count: 53
- First Line Column Labels: None
Section: POLLUTANTS
- Start Line: 1529
- Line Count: 10
- First Line Column Labels: Name Units Crain Cgw Crdii Kdecay SnowOnly Co-Pollutant Co-Frac Cdwf Cinit
Section: LANDUSES
- Start Line: 1539
- Line Count: 7
- First Line Column Labels: Sweeping Fraction Last
Section: COVERAGES
- Start Line: 1546
- Line Count: 35
- First Line Column Labels: Subcatchment Land Use Percent
Section: LOADINGS
- Start Line: 1581
- Line Count: 4
- First Line Column Labels: Subcatchment Pollutant Buildup
Section: BUILDUP
- Start Line: 1585
- Line Count: 16
- First Line Column Labels: Land Use Pollutant Function Coeff1 Coeff2 Coeff3 Per Unit
Section: WASHOFF
- Start Line: 1601
- Line Count: 16
- First Line Column Labels: Land Use Pollutant Function Coeff1 Coeff2 SweepRmvl BmpRmvl
Section: INFLOWS
- Start Line: 1617
- Line Count: 11
- First Line Column Labels: Node Constituent Time Series Type Mfactor Sfactor Baseline Pattern
Section: DWF
- Start Line: 1628
- Line Count: 142
- First Line Column Labels: Node Constituent Baseline Patterns
Section: HYDROGRAPHS
- Start Line: 1770
- Line Count: 41
- First Line Column Labels: Hydrograph Rain Gage/Month Response R T K Dmax Drecov Dinit
Section: RDII
- Start Line: 1811
- Line Count: 59
- First Line Column Labels: Node Unit Hydrograph Sewer Area
Section: CURVES
- Start Line: 1870
- Line Count: 112
- First Line Column Labels: Name Type X-Value Y-Value
Section: TIMESERIES
- Start Line: 1982
- Line Count: 6204
- First Line Column Labels: Name Date Time Value
Section: PATTERNS
- Start Line: 8186
- Line Count: 35
- First Line Column Labels: Name Type Multipliers
Section: REPORT
- Start Line: 8221
- Line Count: 7
- First Line Column Labels: Reporting Options
Section: TAGS
- Start Line: 8228
- Line Count: 2
- First Line Column Labels: None
Section: MAP
- Start Line: 8230
- Line Count: 4
- First Line Column Labels: None
Section: COORDINATES
- Start Line: 8234
- Line Count: 176
- First Line Column Labels: Node X-Coord Y-Coord
Section: VERTICES
- Start Line: 8410
- Line Count: 181
- First Line Column Labels: Link X-Coord Y-Coord
Section: Polygons
- Start Line: 8591
- Line Count: 5460
- First Line Column Labels: Subcatchment X-Coord Y-Coord
Section: SYMBOLS
- Start Line: 14051
- Line Count: 5
- First Line Column Labels: Gage X-Coord Y-Coord
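The listing above follows a simple rule that can be scripted: a section starts at a line beginning with [, its line count runs up to the next section (including the section name line), and the column labels are on the first line after the name if that line starts with a semicolon. A minimal sketch, with a hypothetical function name and a made-up sample:

```python
def summarize_inp_sections(inp_text):
    """Return a list of dicts with name, start line, line count and labels."""
    sections = []
    current = None
    for number, raw in enumerate(inp_text.splitlines(), start=1):
        line = raw.strip()
        if line.startswith("[") and "]" in line:
            current = {
                "name": line[1:line.index("]")],
                "start": number,   # 1-based line of the [SECTION] name
                "count": 1,        # the name line itself is counted
                "labels": None,
            }
            sections.append(current)
            continue
        if current is None:
            continue  # text before the first section
        current["count"] += 1
        if current["count"] == 2 and line.startswith(";"):
            # Labels are on the first line after the section name
            current["labels"] = " ".join(line.lstrip(";").split())
    return sections

sample_inp = "[TITLE]\n;;Project Title/Notes\nGreenville\n\n[OPTIONS]\n;;Option Value\nFLOW_UNITS LPS\n"
for section in summarize_inp_sections(sample_inp):
    print(section)
```

With this counting convention each start line plus its line count gives the start line of the next section, which matches the pattern in the listing (TITLE starts at 1 with 4 lines, OPTIONS starts at 5).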
Prompt 15/ Fantastic, but do I have to copy and paste all of this information?
No, you can save the work to a document, csv or text file. You have to download it soon, however, or it seems to go away.
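If you prefer to keep a local copy that does not expire, a summary table can be written to CSV with the Python standard library. This is a small sketch with a hypothetical function name; the rows are three entries copied from the INP section listing in this article.

```python
import csv

def write_rows_to_csv(path, header, rows):
    """Write a header and rows to a CSV file encoded as UTF-8."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)

# Section name, start line and line count from the INP listing
section_rows = [
    ("TITLE", 1, 4),
    ("OPTIONS", 5, 38),
    ("JUNCTIONS", 441, 160),
]
# Example call (the file name is an assumption):
# write_rows_to_csv("inp_sections.csv", ["Section", "StartLine", "LineCount"], section_rows)
```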