From cow paths to cobbled road: A Pythonic data journey
I spent a good chunk of the past few months gathering and analyzing developer productivity data for an AI-assisted software delivery initiative. The data come from all kinds of sources, from ad hoc CSV exports and opinionated spreadsheets from the program manager to APIs such as the GitHub REST API.
The data closer to the CSV end of that spectrum are quite diverse. In Agile-speak, they represent the cow paths: people tend to use convenient methods they are familiar with, whether or not those methods are effective or efficient.
Sometimes there are only cow paths, for example when a business is just starting operations or when a workflow is put together urgently.
While cow paths are natural in the early stages and can inform design, it is important that the actual road does not just "pave over" the cow path because a well-designed road should consider factors beyond just getting from point A to point B.
Handling cow paths with Python
One thing is obvious: Python is the language to reach for when dealing with data cow paths. The first thing I did:
import pandas as pd
xl = pd.ExcelFile('resources/pre.xlsx')
df = xl.parse('Form1')
Now that I have the data in a DataFrame (with 3 lines of code!), it is time to think about what to do with it. At this juncture, the temptation to start "paving over" the cow path is strong: "Why don't I just follow the structure of these cow paths?" Don't! That is a recipe for a lot of tech debt. At this point it is important to get the model right.
This is when Peewee comes to the rescue:
from peewee import CharField, DateField, ForeignKeyField, Model, SqliteDatabase, TextField

database = SqliteDatabase('database.db')

class BaseModel(Model):
    class Meta:
        database = database

class Project(BaseModel):
    name = CharField(unique=True)
    description = TextField(null=True)
    start_date = DateField()
    end_date = DateField(null=True)

class Participant(BaseModel):
    project = ForeignKeyField(Project, backref='participants')
    name = CharField()
    role = CharField()
    email = CharField(null=True)
As you can see, Peewee and Python allow me to express the model in a clean, readable and easily maintainable way: the Pythonic way.
I also now have a model that is more efficient and effective. With this model, I can start building my cobbled road and it will probably scale well enough for a highway too.
What's more important is that I am writing very little code that isn't relevant to what I actually want to do. That is the beauty of the Pythonic way: it lets me focus on the domain at hand rather than on the nitty-gritty of peripheral aspects. For example, to turn the rows of the DataFrame I created earlier into persistent data objects, I just iterate and map the cow path fields to my model's naming convention as follows:
database.create_tables([Project, Participant])  # no-op if the tables already exist
for _, row in df.iterrows():
    Project.create(name=row['name'], description=row['description'],
                   start_date=row['initiation'], end_date=row['signoff'])
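The mapping step can also be made explicit and a bit more defensive. The following is only a sketch: the COLUMN_MAP dictionary, the to_project_records helper and the sample rows are hypothetical names and data I am introducing for illustration. It renames the cow path columns, turns blank cells into None, and converts date-like values to plain dates before anything touches the database.

```python
import datetime as dt

import pandas as pd

# Hypothetical mapping from spreadsheet column names to model field names.
COLUMN_MAP = {
    'name': 'name',
    'description': 'description',
    'initiation': 'start_date',
    'signoff': 'end_date',
}
DATE_FIELDS = ('start_date', 'end_date')


def to_project_records(df):
    """Return a list of dicts keyed by model field name, with None for blanks."""
    renamed = df[list(COLUMN_MAP)].rename(columns=COLUMN_MAP)
    records = []
    for row in renamed.to_dict('records'):
        clean = {}
        for field, value in row.items():
            if pd.isna(value):
                clean[field] = None  # blank cell -> NULL in the database
            elif field in DATE_FIELDS:
                clean[field] = pd.Timestamp(value).date()
            else:
                clean[field] = value
        records.append(clean)
    return records


# Hypothetical sample data standing in for the spreadsheet.
sample = pd.DataFrame({
    'name': ['Alpha', 'Beta'],
    'description': ['Pilot team', None],
    'initiation': ['2024-01-15', '2024-03-01'],
    'signoff': ['2024-06-30', None],
})
records = to_project_records(sample)
```

Each record can then be passed to Project.create(**record), or all of them at once to Project.insert_many(records).execute() inside a database.atomic() block, so that a bad row does not leave a half-imported table behind.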
In my opinion, this is why Python won data. The beauty of Pythonic abstractions is present in many data tools, from libraries such as pandas, to ORMs such as Peewee, to notebooks in Microsoft Fabric.
Building the cobbled road with Streamlit
Once the cow paths are tackled, the proper road construction can begin.
Python has great tools here as well; the one I used is Streamlit. Streamlit pretty much settles the visualization side of the "cobbled road" phase with convenient components that work directly with DataFrames.
For example, to retrieve my projects' data as a DataFrame and visualize it with Streamlit:
import streamlit as st
import pandas as pd
# ... Peewee related code from earlier
project = Project.select()
project_df = pd.DataFrame(project.dicts())
project_df = project_df.rename(columns=lambda name: name.title().replace('_', ' '))
st.title('Project')
st.dataframe(project_df, hide_index=True)
With those few lines (which could be even fewer without the header formatting, itself very readable), we get a table of projects with human-readable column headers.
Besides spreadsheet-style grids, DataFrames open up a whole range of programmatic analysis and visualization possibilities.
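As one small illustration of that kind of analysis, here is a sketch that derives project durations with pandas. The sample rows are hypothetical and simply mimic the shape of the output of Project.select().dicts(); the duration_days column is a name I made up for this example.

```python
import pandas as pd

# Hypothetical project rows, shaped like the output of Project.select().dicts().
project_df = pd.DataFrame({
    'name': ['Alpha', 'Beta', 'Gamma'],
    'start_date': ['2024-01-15', '2024-03-01', '2024-04-10'],
    'end_date': ['2024-06-30', '2024-05-15', None],
})

start = pd.to_datetime(project_df['start_date'])
end = pd.to_datetime(project_df['end_date'])

# Duration in days; projects without an end date get a missing value.
project_df['duration_days'] = (end - start).dt.days

# Only finished projects contribute to the average.
finished = project_df.dropna(subset=['duration_days'])
average_days = finished['duration_days'].mean()
```

In the Streamlit app, a frame like finished could then be handed to st.bar_chart(finished, x='name', y='duration_days') to chart the durations next to the grid.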
Again, the delight of using Python stems from the Pythonic approach of its surrounding ecosystem of data and related libraries, each of which is amazing in its own right for various aspects of "building cobbled roads". I have shown just a couple here.
Next steps
I am pretty happy with this Pythonic stack for my data app thus far. Next, I am looking at secure and continuous delivery onto Azure App Service to see if the Pythonic experience extends to the cloud as well. If you're interested in delving into the origins of the Pythonic way and the Zen of Python, here's an interesting easter egg to try out in your Python interpreter:
>>> import this