Use a watched folder to convert JSON to another format
Ferry van der Vorst
When a system produces JSON output and that information needs to be shared with users or teams, the raw output might not be easy to read.
Take, for instance, the output of security and vulnerability scans, such as Trivy scans, which can be a Static Analysis Results Interchange Format (SARIF) file, a standardised JSON format. The output of a Trivy scan could look similar to the structure below.
{
  "version": "<sarif version number>",
  "$schema": "<sarif schema>",
  "runs": [
    {
      "tool": {
        "driver": {
          "fullName": "Trivy Vulnerability Scanner",
          "informationUri": "<trivy artefacts & documentation>",
          "name": "Trivy",
          "rules": [
            {
              "id": "<rule-id>",
              "name": "<type>",
              "shortDescription": {
                "text": "<summary>"
              },
              "fullDescription": {
                "text": "<description>"
              },
              "defaultConfiguration": {
                "level": "<rule level>"
              },
              "helpUri": "<url to security rule>",
              "help": {
                "text": "<help text>",
                "markdown": "<help text (markdown)>"
              },
              "properties": {
                "precision": "<precision>",
                "security-severity": "<severity value>",
                "tags": [
                  "<tags>"
                ]
              }
            },
            ...
          ],
          "version": "<trivy version>"
        }
      },
      "results": [
        {
          "ruleId": "<rule-id>",
          "ruleIndex": "<index number (int)>",
          "level": "<rule level>",
          "message": {
            "text": "<result info>"
          },
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {
                  "uri": "<file>",
                  "uriBaseId": "<uri base>"
                },
                "region": {
                  "startLine": "<start line of the error (int)>",
                  "startColumn": "<start column of the error (int)>",
                  "endLine": "<end line of the error (int)>",
                  "endColumn": "<end column of the error (int)>"
                }
              },
              "message": {
                "text": "<file>"
              }
            }
          ]
        },
        ...
      ],
      "columnKind": "<column type>",
      "originalUriBaseIds": {
        "ROOTPATH": {
          "uri": "<file>"
        }
      }
    }
  ]
}
While viewers and online converters are available, and it is possible to use libraries to convert SARIF to other formats, it is also possible to create a Power Automate flow that watches a folder (e.g. a SharePoint folder or a OneDrive folder) for newly created files (such as an uploaded file) and converts the content to a format that can subsequently be shared, filtered, analysed, and so on. This means users without the right tooling or knowledge can still convert the file, and the resulting outputs can be standardised.
Approach
Note that while this example seems very specific, a similar process can be used for other complex JSON files. For instance, it is possible to change the trigger, create a different output (such as a document), or handle a different JSON structure altogether, with or without filtering.
In this article, a process is described to:
- Set a trigger to watch a SharePoint folder for new files with extension .sarif or .sarif.json and, when a file is added, retrieve its content
- Parse the JSON, so the data can be used more easily in subsequent actions
- Retrieve the "rules" and "results" arrays from the JSON object (see sample above) and apply a filter, so the rule and result information can be combined and output as a single row in a CSV file
- Flatten the JSON objects, so the CSV output becomes easier to read (no brackets, etc. in the text due to nested arrays and objects)
- Enhance the output with the additional info in the original JSON object and output the content as a CSV in the watched folder, with the same name as the original file
Implementation
The Trigger
For this scenario, a Cloud flow is created that checks whether a file is created in SharePoint (or in a Teams folder under Files) using the SharePoint trigger "When a file is created (properties only)". The folder selected will be the watched folder.
The output provides header and body information, but the information needed can be derived from the following keys:
{
  "headers": {
    ...
  },
  "body": {
    ...,
    "{Link}": "<full URL to the file>",
    "{Name}": "<file name without extension>",
    "{FilenameWithExtension}": "<file name with extension>",
    "{Path}": "<folder path after the site name (no file name)>",
    "{FullPath}": "<folder path & file with extension (excl. site name)>",
    ...
  }
}
Creating Variables
For this flow, four variables are initialized using the Initialize Variable action:
- Variable "Content", which is an Object variable, and will be used to store the content of the file that was created
- Variable "Rules", which is an Array variable, and will be used to store the content of the Rules array (as per the structure above ['runs'][0] -> ['tool'] -> ['driver'] ->['rules'])
- Variable "Results", which is an Array variable, and will be used to store the content of the Results array (as per the structure above ['runs'][0] -> ['results'])
- Variable "Output", which is an Array variable, and will be used to store the content of the output of the reformatted array containing the flattened information of the combined Rules and Results objects.
For the moment, the variables are left empty, as the JSON is to be parsed first and the variables only need to be populated if the file format is identified (i.e. the content is JSON).
Set a condition
Since the aim is to only convert SARIF files, which can have the extension .sarif or .sarif.json, a Condition action is added to check whether the "File Name With Extension" (triggerOutputs()?['body/{FilenameWithExtension}']) ends with either extension.
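As a minimal sketch, the same check could be expressed in a single expression (assuming the trigger output key shown earlier; in the flow itself this is configured through the Condition action's fields):
or(
  endsWith(triggerOutputs()?['body/{FilenameWithExtension}'], '.sarif'),
  endsWith(triggerOutputs()?['body/{FilenameWithExtension}'], '.sarif.json')
)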
All subsequent actions will be added to the If Yes branch of the condition, while the If No branch will be kept empty (do nothing).
Getting the file content
The first step in the Yes branch is to obtain the file content. This is done in two steps:
- Get the file content using its path
- Extract and convert the content from within the file and store it in the Content variable
The file content is obtained via the SharePoint action "Get file content using path". The File Path is set to the Full Path key from the trigger action (triggerOutputs()?['body/{FullPath}']).
While the Site Name can be selected via a dropdown, a custom function is used to extract this instead. This means that when the watched folder changes, it would only need to be changed in the trigger and the flow still works as expected.
replace(
  triggerBody()?['{Link}'],
  replace(
    triggerBody()?['{FullPath}'],
    ' ',
    '%20'
  ),
  ''
)
Note: The function for the site address in the SharePoint actions contains an inner replacement that replaces spaces with %20. This is because the Link output uses %20 in the path, while the FullPath uses spaces.
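As an illustration with hypothetical values (the actual site and folder names will differ): if {Link} is https://contoso.sharepoint.com/sites/Security/Shared%20Documents/Scans/scan.sarif and {FullPath} is Shared Documents/Scans/scan.sarif, the inner replace encodes the FullPath as Shared%20Documents/Scans/scan.sarif, and removing that encoded path from the Link leaves the site address, https://contoso.sharepoint.com/sites/Security/.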
The output body of the step would look something like this:
{
  "statusCode": 200,
  "headers": {
    ...
  },
  "body": {
    "$content-type": "...",
    "$content": "..."
  }
}
The only value that is of interest for this flow is the value of $content. The value itself is base64 encoded.
To store the value in the Content variable, the following function is used. It takes the value of $content, converts it from base64 to a string, and turns it into a JSON object in the format shown at the top of the article:
json(
  base64ToString(
    body('Get_file_content_using_path')?['$content']
  )
)
After this, the content of the variable is parsed with the Parse JSON action, using the Content variable as its input. The schema is generated from a Trivy file by pasting a sample JSON payload into the action's "Generate from sample" option.
Retrieving the Results and Rules Arrays
Now that the content of the file has been parsed, the Results and Rules arrays are retrieved.
The Rules Array can be retrieved and stored in the Rules variable via the following function:
body('Parse_JSON')?['runs'][0]?['tool']?['driver']?['rules']
And the Results array can be stored in the Results variable as follows:
body('Parse_JSON')?['runs'][0]?['results']
Note that this works because a run contains only one object with the rules and results for that specific run. If there were more objects in the "runs" array (i.e. multiple runs, each with its own rules and results), the flow would be more complex, as each run and its results would have to be looped through, and all subsequent steps would also be repeated for each run.
Creating a new array with the rules and results for the output
Now that all the required information is available, the data can be modelled for output. This is done in a For Each loop that iterates over each result in the Results variable.
The first step in the loop is to filter the Rules array to return the rule where the id of the rule matches the ruleId of the result. As the id of the rule is unique, each ruleId of a result has an association with a single rule.
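A minimal sketch of that Filter array action (assuming it is named "Filter Rules array by RuleId", as referenced in later expressions, and that the condition is entered in advanced mode):
From: variables('Rules')
Filter query: @equals(item()?['id'], items('Apply_to_each_Result')?['ruleId'])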
Next, the Rules and Results arrays are flattened. This is done via Compose actions, here named "Flatten Rules Array" and "Flatten Results Array" respectively.
Note that while flattening the arrays is not required, it makes the output CSV file more readable. If not flattened, the CSV would contain the nested objects as-is in a single cell (with braces, quotes, etc.), whereas with flattening each nested value ends up in its own column.
The Flatten Rules Array Compose action uses a nested dynamic function; a partial sketch of it is shown after the tag handling below.
The flatten action essentially copies the info of a nested object into new top-level properties and subsequently removes the nested object. As this is done for multiple objects, the nesting of the function might look quite daunting. If preferred, this could also be done via multiple separate Compose actions.
Take, for example, the expression below, which handles the message object of a result:
removeProperty(
  addProperty(
    items('Apply_to_each_Result'),
    'outcome',
    items('Apply_to_each_Result')?['message']?['text']
  ),
  'message'
)
Here, a property named outcome is added to the result object (items('Apply_to_each_Result'), as one item is returned per iteration of the For Each loop) with the value of the text key within the message object (items('Apply_to_each_Result')?['message']?['text']), and subsequently the message object (which includes the text key/value pair) is removed.
The Tags key in the flatten function is converted to a string, after which the brackets and quote marks are removed. It would have been possible to target individual tags (e.g. body('Filter_Rules_array_by_RuleId')[0]?['properties']?['tags'][0], body('Filter_Rules_array_by_RuleId')[0]?['properties']?['tags'][1], etc.) and combine them, but that makes it easier to miss information, as it might not be clear how many strings each Tags array will contain.
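A sketch of what that tag conversion could look like (an assumption, as the exact expression is only visible in the original screenshot):
replace(
  replace(
    replace(
      string(body('Filter_Rules_array_by_RuleId')[0]?['properties']?['tags']),
      '[',
      ''
    ),
    ']',
    ''
  ),
  '"',
  ''
)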
Finally, the severity is extracted; in this example it was always the third tag (body('Filter_Rules_array_by_RuleId')[0]?['properties']?['tags'][2]). This was done because the end user might need to filter on severity.
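Putting the pattern together, a partial sketch of the Flatten Rules Array expression could look like the following. This is an assumption based on the output shown below and covers only the summary and description fields; the full expression repeats the same addProperty/removeProperty pattern for the other nested keys and the tags:
removeProperty(
  removeProperty(
    addProperty(
      addProperty(
        body('Filter_Rules_array_by_RuleId')[0],
        'summary',
        body('Filter_Rules_array_by_RuleId')[0]?['shortDescription']?['text']
      ),
      'description',
      body('Filter_Rules_array_by_RuleId')[0]?['fullDescription']?['text']
    ),
    'shortDescription'
  ),
  'fullDescription'
)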
The output of an object in the Flatten Rules Array would be something like:
{
  "id": "<rule-id-1>",
  "name": "<type>",
  "helpUri": "<url to security rule>",
  "summary": "<summary>",
  "description": "<description>",
  "configurationLevel": "<rule level>",
  "helpText": "<help text>",
  "precision": "<precision>",
  "securitySeverity": "<severity value>",
  "ruleTagsTags": "<tag1>,<tag2>,<tag3>",
  "severity": "<tag3>"
}
The Flatten Results Array Compose action follows the same process as the Flatten Rules Array action; a partial sketch of its function is shown below.
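A partial sketch (again an assumption based on the output below), covering only the location URI and the start line; the full expression would also include the message/outcome handling shown earlier and the remaining region fields:
removeProperty(
  addProperty(
    addProperty(
      items('Apply_to_each_Result'),
      'locationUri',
      items('Apply_to_each_Result')?['locations'][0]?['physicalLocation']?['artifactLocation']?['uri']
    ),
    'startLine',
    items('Apply_to_each_Result')?['locations'][0]?['physicalLocation']?['region']?['startLine']
  ),
  'locations'
)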
The output of an object of the Flatten Results Array action would look something like this:
{
  "ruleId": "<rule-id-1>",
  "ruleIndex": 1,
  "level": "<rule level>",
  "outcome": "<result info>",
  "location": "<file>",
  "locationUri": "<file>",
  "locationBaseId": "<uri base>",
  "startLine": 1,
  "startColumn": 1,
  "endLine": 1,
  "endColumn": 1
}
Note: While the flattening in these specific flatten actions worked for this particular file, key names, the order of items in nested arrays within objects, etc. might differ and should be checked and tested before running the flow.
Now that the Rules and Results objects are flattened, they can be joined via the Join action, here named "Join Results and Rules". The From field is as follows:
createArray(
  outputs('Flatten_Rules_Array'),
  outputs('Flatten_Results_Array')
)
and will be joined with a comma (,).
Finally, the output from the Join action is appended to the Output variable. Since the Join concatenates the two literal objects, and the aim is to end up with a single object, the output is converted to a string, the substring },{ is replaced with a comma, and the result is converted back into a JSON object.
The corresponding function will be:
json(replace(string(body('Join_Results_and_Rules')),'},{',','))
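As a small illustration with placeholder values, the Join output is a string of two objects separated by a comma, and replacing },{ with a comma merges them into one object:
Join output: {"id":"<rule-id-1>","severity":"<tag3>"},{"ruleId":"<rule-id-1>","startLine":1}
After merge: {"id":"<rule-id-1>","severity":"<tag3>","ruleId":"<rule-id-1>","startLine":1}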
This is the end of the loop. Since the loop runs through all results, the Output variable will contain an array of objects, each comprising the flat key/value pairs of a Result and its associated Rule.
Note that depending on the number of rules and results, this loop can take some time to complete when running.
Create the output
In this example, the intent is to create a CSV file in the same folder where the file was added. The first step is to create a CSV table from the information in the Output variable, using the Create CSV Table action.
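As a minimal sketch, the From field of the Create CSV Table action simply references the Output variable (an assumption based on the steps above):
variables('Output')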
However, since there is also information in the original JSON file that is not contained in the Rules and Results arrays, an additional Compose action is added to combine this information with the generated CSV table; a sketch of this Compose input is given after the structure below.
Note that since a CSV comprises comma-separated values, that format is replicated in the Compose action, so for example rootPathUrl would end up in the first column and the associated value in the second column.
As can be seen, some values are taken directly as field values from the Parse JSON action, e.g. $schema and version, while others use a function. The functions all reference paths into the parsed JSON, as these values cannot be selected under Dynamic Content. For instance, the fullName value can be retrieved via:
body('Parse_JSON')?['runs'][0]?['tool']?['driver']?['fullName']
based on the following structure:
{
  ...
  "runs": [
    {
      "tool": {
        "driver": {
          "fullName": "Trivy Vulnerability Scanner",
          ...
        }
      }
    }
  ],
  ...
}
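A sketch of what that Compose input could look like, assuming the row labels used here (rootPathUrl, schema, sarifVersion, scanner) and the default action name Create_CSV_table; the labels and exact references in the original are only visible in the screenshot and may differ:
rootPathUrl,@{body('Parse_JSON')?['runs'][0]?['originalUriBaseIds']?['ROOTPATH']?['uri']}
schema,@{body('Parse_JSON')?['$schema']}
sarifVersion,@{body('Parse_JSON')?['version']}
scanner,@{body('Parse_JSON')?['runs'][0]?['tool']?['driver']?['fullName']}

@{body('Create_CSV_table')}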
Now that all the content is available, the final step is to create the actual CSV file in the required folder via the SharePoint "Create file" action.
The Path is the same path as that of the file that was originally retrieved, the File Name uses the name part of the retrieved file (the file name without extension) with the extension .csv, and the File Content is the output from the last Compose action.
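As a minimal sketch, the File Name could be built with an expression like the following (an assumption; the name part can also be taken from the dynamic Name field in the designer):
concat(triggerBody()?['{Name}'], '.csv')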
Similarly to when the content was retrieved, the Site Address uses:
replace(
  triggerBody()?['{Link}'],
  replace(
    triggerBody()?['{FullPath}'],
    ' ',
    '%20'
  ),
  ''
)
Alternatively, it can be selected from a dropdown.
The overall flow thus consists of the trigger, the four variable initialisations, the condition and, within its Yes branch, the actions to get and parse the content, retrieve the arrays, loop through the results and create the output file.
Usage
Its use is pretty simple: the user with the SARIF file drops it in the watched folder and the process converts the file automatically. After a short while (depending on the size of the file), a CSV file will appear in the folder, containing one row per result with the combined, flattened rule and result information from the original JSON file.
Notes
- In this example, it is assumed that the original JSON object has a single object in the "runs" array, which is why no loop was required for the original results and rules arrays. Instead, e.g. body('Parse_JSON')?['runs'][0]?... could be used to target the first run object (['runs'][0]). If there are more runs, this adds complexity, as the runs need to be looped through, etc.
- For the Site Address in the SharePoint-related steps after the trigger, it would have been possible to select the site address from a dropdown. However, in this case a function was used. This makes it easier to move the watched folder to a different location, as only the site address and folder in the trigger need to be changed instead of in all SharePoint actions.
- The function for the site address in the SharePoint actions, replace(triggerBody()?['{Link}'],replace(triggerBody()?['{FullPath}'],' ','%20'),''), contains an inner replacement that replaces spaces with %20. This is because the Link output uses %20 in the path, while the FullPath uses spaces. As the Link returns the full URL to the file (site address + full path) and the FullPath is only the path after the site address, removing the FullPath from the URL leaves the site address.
- In this example, separate arrays are created for results and rules. The main reason is that the ruleId of a result is essentially linked to the id of a rule, so filtering the rules on the ruleId means only one rule is returned by the filter action. If no filtering/mapping is needed, it might not be necessary to add the additional variables, filters, etc.
- When building the flow, some adjustments might be needed as the input might differ (key names, JSON structure, tag order and key availability). Always test with a representative dataset, make sure the JSON is parsed with that representative dataset, and adjust the flow accordingly. The areas where issues are most likely to arise are the flatten actions and the Compose action that enhances the CSV, as they use specific paths and key names.