Mining Jira with Jupyter: Extracting and rendering hidden process information with Jira API and Python
For many of us, Jira is the "Source of Truth" for all Planning, Development, Quality Assurance and even the Triage and Resolution of support and feature requests.
We define Processes and build Workflows, sometimes in tandem. Performing Retrospectives on these to gauge alignment or find fault in the resulting processes-at-scale is hard to do, and a roadmap to perform one isn't clearly defined. This article alone won't change that, but I do hope to fan the flames of Process Discovery and Mapping for Jira.
Here are the problems as I saw them:
Before continuing, I'd like to acknowledge the work that inspired this article. By combining the approaches in these articles, I have been able to make and share some eye-opening discoveries about the status of my Jira Projects and Issues, and about the dynamics of the teams working there.
Process Mining in Jira by Kjell Tore Guttormsen
Using Jupyter Notebooks to Access Jira by Michael March/Isos Technology
Reading and Visualizing Data from Jira by Sergei Dmitriev
Tools
As our careers advance and we face different challenges in Computer Science and Information Technology, we often solve the problem with software. The advancement of the science and software you discover on one project may weave its way into a future challenge and offer new capabilities. That's what happened here.
Disco: While following up on the state of Customer Journey Mapping tools like XESame, I came across the Process Mining article above. In it, Kjell discussed their challenge, some findings and a tantalizing diagram produced with Fluxicon's Disco software from Jira data such as issue key, summary, assignee, date of status transition, status, department, and a few other properties.
Jupyter: The Disco article discussed using SQL queries to extract the necessary data, something that would make some Admins seethe. That is not an option here, either. There are several apps for Jira, like Midori Better Exporter, which can export the desired changelog data, but that alone is not enough to carry a recommendation to Production. I prefer to use the API, and the Python Jira API wrapper delivers on general utility and has appeal for other use cases. IPython/Jupyter notebooks have been around for almost a decade now, and I've always looked for a practical application for them, and lo!
!pip install jira
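With the package installed, connecting and paging through a search might look roughly like the sketch below. It is only an illustration, not code from any of the articles above: the environment variable names (JIRA_URL, JIRA_USER, JIRA_API_TOKEN) and the helper names connect_from_env and fetch_all are my own placeholders, and the only library calls assumed are the JIRA(...) constructor and the startAt/maxResults arguments of search_issues.

```python
import os


def connect_from_env():
    """Build a Jira client from environment variables (variable names are hypothetical)."""
    # Imported here so the paging helper below can be used and tested without the package.
    from jira import JIRA

    return JIRA(
        server=os.environ["JIRA_URL"],
        basic_auth=(os.environ["JIRA_USER"], os.environ["JIRA_API_TOKEN"]),
    )


def fetch_all(search, jql, block_size=100, **kwargs):
    """Call `search` block by block until it returns nothing more.

    `search` is any callable that accepts startAt and maxResults the way
    jira.search_issues does. Extra keyword arguments (fields, expand, ...)
    are passed through unchanged.
    """
    issues = []
    start = 0
    while True:
        block = list(search(jql, startAt=start, maxResults=block_size, **kwargs))
        if not block:
            return issues
        issues.extend(block)
        # Advance by what was actually returned, in case the server caps the block size.
        start += len(block)
```

In a notebook this would be used along the lines of fetch_all(connect_from_env().search_issues, jql, expand="changelog"). Keeping the credentials in environment variables means the notebook can be shared without leaking an API token.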
More Jupyter & hello to Gephi: Michael's article (though lacking a Notebook to deploy) offers a good example of how to get started using Jupyter and Jira together. Tip: If you're looking for a quick way to go from 0 to Jira data in a pandas DataFrame, consider his Docker method.
If you have a Jupyter environment (even if you're a Cloud customer) you can step up your analysis by exploring Sergei's excellent step-by-step example. There are a few sneaky code edits to make that aren't discussed in the article, and I will try to comment on them below. It will get you up and running with a Jira data extract containing the desired changelog data:
from datetime import datetime

# Read data from Jira with changelog
jira_search = jira.search_issues(jql, startAt=block_num*block_size, maxResults=block_size,
                                 fields="issuetype, created, resolutiondate, reporter, assignee, status",
                                 expand='changelog')

# Get information from changelog (for each issue returned by the search)
for issue in jira_search:
    history_assignee = []
    histories = issue.raw['changelog'].get('histories', None)
    if histories is not None:
        for history in histories:
            for item in history['items']:
                if item['field'] == 'assignee':
                    # Get history author, previous assignee, new assignee
                    history_author = history.get('author', None)
                    if history_author is not None:
                        # Note: Jira Cloud user objects carry 'accountId' rather than 'key'
                        history_author = history_author['key']
                    # Keep only the first 19 characters of the timestamp (drops milliseconds and UTC offset)
                    history_assignee.append([history_author, item['from'], item['to'],
                                             datetime.strptime(history['created'][:19], "%Y-%m-%dT%H:%M:%S")])
By using the search_issues method we can read not only fields but also, for example, the changelog. To do this, the parameter expand='changelog' must be passed. The changelog is interesting for its history of statuses and assignees. Suppose we would like to know the assignee history for a given issue: who made the change, the previous assignee, the new assignee, and the date and time of the change.
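The same walk over the changelog works for status changes, which is the shape of data a process mining tool like Disco wants: a case ID, an activity and a timestamp per row. The sketch below is my own illustration rather than code from either article. The function names, the column names and the sample issue are made up, and it assumes the raw changelog layout described above, with fromString/toString holding the readable status names.

```python
import csv
import io
from datetime import datetime


def status_events(raw_issue):
    """Return (issue key, new status, timestamp) rows for one raw issue dict."""
    rows = []
    for history in raw_issue.get("changelog", {}).get("histories", []):
        # Same trick as above: the first 19 characters drop milliseconds and UTC offset.
        when = datetime.strptime(history["created"][:19], "%Y-%m-%dT%H:%M:%S")
        for item in history["items"]:
            if item["field"] == "status":
                rows.append((raw_issue["key"], item["toString"], when.isoformat(sep=" ")))
    # Sort by timestamp so the result does not depend on the order Jira returns histories in.
    return sorted(rows, key=lambda row: row[2])


def to_csv(rows):
    """Render rows as CSV text with hypothetical column names."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["case_id", "activity", "timestamp"])
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up issue in the shape of issue.raw, for illustration only.
sample_issue = {
    "key": "DEMO-1",
    "changelog": {"histories": [
        {"created": "2021-03-02T10:30:00.000+0000",
         "items": [{"field": "status", "fromString": "In Progress", "toString": "Done"}]},
        {"created": "2021-03-01T09:15:00.000+0000",
         "items": [{"field": "assignee", "from": None, "to": "alice"},
                   {"field": "status", "fromString": "Open", "toString": "In Progress"}]},
    ]},
}

events = status_events(sample_issue)
csv_text = to_csv(events)
```

Called on issue.raw for every issue in the extract, this gives one long table that can be saved and imported as an event log.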
This change and history data in CSV can be used not only in Disco. The example here also uses the NetworkX library to generate a GraphML file which can be opened in the (frankly very entertaining) Gephi tool, where it is possible to explore and filter these data from Jira in very insightful ways not currently possible (or desirable) in the app.
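As a rough idea of what the NetworkX step can look like, here is a minimal sketch that turns assignee history rows into a weighted hand-off graph and serializes it as GraphML. It is not Sergei's code. The function name handoff_graph, the "Unassigned" label and the sample rows are my own assumptions, and the rows follow the [author, previous assignee, new assignee, timestamp] layout built in the loop above.

```python
from collections import Counter
from datetime import datetime

import networkx as nx


def handoff_graph(history_assignee):
    """Build a directed graph with one edge per (previous, new) assignee pair."""
    counts = Counter(
        (previous or "Unassigned", new or "Unassigned")
        for _author, previous, new, _when in history_assignee
    )
    graph = nx.DiGraph()
    for (previous, new), count in counts.items():
        # The weight is the number of times work moved from one person to the other.
        graph.add_edge(previous, new, weight=count)
    return graph


# Made-up rows for illustration only.
rows = [
    ["carol", None, "alice", datetime(2021, 3, 1, 9, 0)],
    ["alice", "alice", "bob", datetime(2021, 3, 2, 14, 30)],
    ["carol", "alice", "bob", datetime(2021, 3, 5, 11, 15)],
    ["bob", "bob", "alice", datetime(2021, 3, 6, 16, 45)],
]

graph = handoff_graph(rows)
# generate_graphml yields the document line by line; join it to get the full text.
graphml_text = "\n".join(nx.generate_graphml(graph))
```

To get a file for Gephi, nx.write_graphml(graph, "assignee_handoffs.graphml") writes the same document to disk (the file name is just an example). Empty assignees are mapped to a label because GraphML has no way to store a missing value.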
Looking ahead
The approach shows a lot of promise as a way for Admin and Consulting teams to deliver and share tools with a lot of value, particularly in situations where out-of-the-box software is either cost-prohibitive or too limited.
We often "meet users where they are" which can mean Excel & CSV files and Jira Filters.
Even if you only ever pull Issues out into a table or make a call to your favorite REST APIs, you can save the notebook, copy it, commit it to a git repository, share it with a friend, post it to Slack, email it, blog about it or forget about it until next year when you might have otherwise forgotten how you built it (or the Confluence page wasn't sufficient).
Speaking of Confluence, did you know you can also view Jupyter Notebooks there?
Additional Links
Here is a GitHub repo containing the GraphML example with the edits I mentioned. You can also refer to this pasted-together version of Sergei's code if you want to learn along. Happy hacking! https://github.com/wjkennedy/JupyterNotebooks/blob/master/JiraJupyterPython-Gephi_sdmitriev.ipynb