From cow paths to cobbled road: A Pythonic data journey

Vito Chin

Published: June 29, 2024

I spent a good chunk of the past few months gathering and analyzing developer productivity data for an AI-assisted software delivery initiative. The data come from all kinds of sources: ad hoc CSV exports, opinionated spreadsheets from the program manager, and APIs such as the GitHub REST API.

The former data (the ones closer to the CSV side of things) are really quite diverse. In Agile-speak, they represent the cow paths: people tend to use convenient methods they are familiar with, whether or not those methods are effective or efficient.

Sometimes there are only cow paths, for example when a business is just starting operations or when a workflow is put together urgently.

While cow paths are natural in the early stages and can inform design, it is important that the actual road does not just "pave over" the cow path because a well-designed road should consider factors beyond just getting from point A to point B.

Handling cow paths with Python

One thing is obvious: Python is the language to go for when dealing with data cow paths. The first thing I did:

import pandas as pd

# Open the workbook and load the 'Form1' sheet into a DataFrame
xl = pd.ExcelFile('resources/pre.xlsx')
df = xl.parse('Form1')

Now that I have the data in a DataFrame (with 3 lines of code!), it is time to think about what to do with it. At this juncture, the temptation to start "paving over" the cow path will be strong: "Why don't I just follow the structure of these cow paths?" Don't! It is a recipe for a lot of tech debt. At this point, it is important to get the model right.

This is when Peewee comes to the rescue:

from peewee import CharField, DateField, ForeignKeyField, Model, SqliteDatabase, TextField

database = SqliteDatabase('database.db')

class BaseModel(Model):
    class Meta:
        database = database

class Project(BaseModel):
    name = CharField(unique=True)
    description = TextField(null=True)
    start_date = DateField()
    end_date = DateField(null=True)

class Participant(BaseModel):
    project = ForeignKeyField(Project, backref='participants')
    name = CharField()
    role = CharField()
    email = CharField(null=True)

As you can see, Peewee and Python allow me to express the model in a clean, readable and easily maintainable way: the Pythonic way.

I also now have a model that is more efficient and effective. With this model, I can start building my cobbled road and it will probably scale well enough for a highway too.

What's more important is that I am writing very little code that isn't relevant to what I actually want to do. That is the beauty of the Pythonic way: it lets me focus on the domain at hand, not the nitty-gritty of incidental details. For example, to turn the rows of the DataFrame I created earlier into persistent data objects, I just iterate and map the cow path fields to my model's naming convention as follows:

# Map each spreadsheet row to a Project, renaming cow path columns on the way
for _, row in df.iterrows():
    Project.create(
        name=row['name'],
        description=row['description'],
        start_date=row['initiation'],
        end_date=row['signoff'],
    )

In my opinion, this is why Python won Data. The beauty of Pythonic abstractions is present in many data tools, from libraries such as pandas, to ORMs such as Peewee to notebooks in Microsoft Fabric.

Building the cobbled road with Streamlit

Once the cow paths are tackled, the proper road construction can begin.

Python has great tools here as well; the one I used is Streamlit. Streamlit pretty much settles the visualization side of the "cobbled road" phase with convenient components that work directly with DataFrames.

For example, to retrieve my projects' data as a DataFrame and visualize it with Streamlit:

import streamlit as st
import pandas as pd

# ... Peewee related code from earlier

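# Fetch all projects as dictionaries and load them into a DataFrame
projects = Project.select()
project_df = pd.DataFrame(list(projects.dicts()))
# Turn column names such as 'start_date' into 'Start Date' for display
project_df = project_df.rename(columns=lambda name: name.title().replace('_', ' '))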

st.title('Project')

st.dataframe(project_df, hide_index=True)

With those few lines (which could be even fewer without the header formatting, which is itself very readable), we get a spreadsheet-style grid of the projects.

Besides spreadsheet-style grids, DataFrames open up a whole range of programmatic analysis and visualization possibilities.
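As a small illustration of that kind of programmatic analysis, the sketch below derives each project's duration in days. The DataFrame is a hypothetical stand-in with the same title-cased column names as the display table, and the rows are made up.

```python
import pandas as pd

# Hypothetical stand-in for project_df, with made-up rows.
project_df = pd.DataFrame({
    'Name': ['Pilot', 'Rollout'],
    'Start Date': ['2024-01-15', '2024-03-01'],
    'End Date': ['2024-02-28', None],
})

# Parse the date columns; missing end dates become NaT.
for column in ('Start Date', 'End Date'):
    project_df[column] = pd.to_datetime(project_df[column])

# Duration is undefined (NaN) for projects that have not ended yet.
project_df['Duration (days)'] = (
    project_df['End Date'] - project_df['Start Date']
).dt.days

# Keep only the projects that have an end date.
completed = project_df.dropna(subset=['End Date'])
```

A derived column like this can be handed to `st.dataframe` in the same way as the original columns.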

Again, the delight of using Python stems from the Pythonic approach of its surrounding ecosystem of data and related libraries, each of which is amazing in its own right for various aspects of "building cobbled roads". I've shown just a couple here.

Next steps

I am pretty happy with this Pythonic stack for my data app thus far. Next, I am looking at secure and continuous delivery onto Azure App Service to see if the Pythonic experience extends to the cloud as well. If you're interested in delving into the origins of the Pythonic way and the Zen of Python, here's an interesting easter egg to try out in your Python interpreter:

>>> import this
