GPT May Help Developers Understand Code Base

Getting a unified view of a complex tech stack is hard. Here is why.

Name variations occur

Name variations of a concept signal in a complex code stack (for example, an ML application) cause a lot of pain when trying to understand that signal's behavior. If we want to add "article count" as a new signal, then from input parsing to training data preparation, all the way to model deployment, we would have to touch many different modules, and the same article count would have a different representation in each environment: for example, article_count, articleCount, ArticleCount, ARTICLE_COUNT, articlecount, etc. Often, for compatibility purposes, there may also be version info such as article_count_v1. And there may be mapping logic in a config file such that article count is mapped to an entirely different name, like KnowledgeArticleVoteCount.
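To see why this is painful, here is a minimal sketch (all identifiers are hypothetical) showing that naive string normalization collapses the case, underscore, and version variants, but cannot unify a config-level alias like KnowledgeArticleVoteCount:

```python
import re

# Hypothetical variants of the same "article count" signal across the stack.
variants = [
    "article_count", "articleCount", "ArticleCount",
    "ARTICLE_COUNT", "articlecount", "article_count_v1",
]

def naive_normalize(name: str) -> str:
    """Lowercase, strip underscores, and drop a trailing version suffix."""
    name = re.sub(r"_?[vV]\d+$", "", name)  # article_count_v1 -> article_count
    return name.replace("_", "").lower()

# Case/underscore/version variants all collapse to one key...
assert {naive_normalize(v) for v in variants} == {"articlecount"}

# ...but a config-level alias does not, so pure string rules are not enough.
assert naive_normalize("KnowledgeArticleVoteCount") != "articlecount"
```

This is exactly the gap a semantics-aware model can fill: the alias is only connectable through meaning, not spelling.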

Name variations cause friction

The different identifier strings of the same entity usually have strong semantic similarities; after all, referring to the same entity in a consistent manner improves code readability. However, traditional code search does not recognize these similarities, so it is tricky to surface all related code in a single view. The resulting segmented views of the same entity cause visibility problems and make code understanding harder.

For a manual code study of a single entity, this may not be a big problem, but it makes automation much harder. Suppose one wants to study many such signals, each with its own variations: how can we spot outlier signals, given that an outlier can only be judged by the emergent properties of a signal across the stack? Without effectively stitching the contexts together under one unified identity, it is really hard.

The Need for Text Simplification

Suppose we want to study N conceptual signals that span M code components, and suppose each component has its own way of referring to the same concept signal. Then there are N*M different representations that refer to N unique things.

To effectively study the N signals, we cannot work with N*M representations; rather, we need to find their connections and their normative representations.
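Concretely, once a raw-to-normative mapping exists, the N*M strings collapse back to N keys. A hypothetical sketch (the identifier names and components are illustrative, not from a real code base):

```python
# Map each of the N*M raw identifiers to its normative representation.
raw_to_normative = {
    "article_count": "ArticleCount",   # e.g. input parsing
    "articleCount":  "ArticleCount",   # e.g. training data prep
    "ARTICLE_COUNT": "ArticleCount",   # e.g. deployment config
    "file_name":     "FileName",
    "fileNameV1":    "FileName",
}

# Studying the N signals now means iterating over N keys, not N*M strings.
signals = set(raw_to_normative.values())
assert signals == {"ArticleCount", "FileName"}  # N = 2, not N*M = 5
```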

GPT Helps with Text Simplification

I tried the text-to-command model in the OpenAI Playground:

article_count normative representation is ArticleCount.
articleCount normative representation is ArticleCount.
articleCountV1 normative representation is ArticleCount.
fileNameV1 normative representation is [FileName.]

(Brackets represent the auto-completed text by GPT.)

fileNameV1 is correctly auto-completed to FileName, given my pre-staged text.

It is straightforward to run the same auto-completion over the rest of the N*M representations, so that they are linked together into N things rather than remaining N*M isolated things.
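The few-shot setup above can be scripted. Here is a sketch that builds the same kind of priming prompt; the actual call to a completion endpoint is left out because the client API varies by SDK version, and the example names are the ones from the Playground session:

```python
# Few-shot examples priming the model with the normalization pattern.
EXAMPLES = [
    ("article_count", "ArticleCount"),
    ("articleCount", "ArticleCount"),
    ("articleCountV1", "ArticleCount"),
]

def build_prompt(raw_name: str) -> str:
    """Assemble the priming text plus an open-ended line for raw_name,
    which the model is expected to complete with the normative name."""
    lines = [f"{raw} normative representation is {norm}."
             for raw, norm in EXAMPLES]
    lines.append(f"{raw_name} normative representation is")
    return "\n".join(lines)

prompt = build_prompt("fileNameV1")
# Sent to a completion model, the last line should be finished as "FileName."
```

Looping build_prompt over all N*M raw identifiers and collecting the completions yields the raw-to-normative mapping automatically.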

When Complexity Fades, Emergent Properties Shine

After one correlates the same thing across contexts, a 360-degree view of that thing can be established.

This 360-degree view can take many forms:

  1. A central repo that tracks each signal's cross-stack life cycle
  2. Flow charts that track each signal's cross-stack life cycle
  3. Or even AI summarization of each signal's story, based on its cross-stack appearances
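As a minimal sketch of form 1, grouping cross-stack occurrences by normative name is all it takes to assemble a per-signal view (the file names and mapping here are hypothetical):

```python
from collections import defaultdict

# (raw identifier, component) pairs, e.g. gathered by plain code search.
occurrences = [
    ("article_count", "input_parser.py"),
    ("articleCount",  "feature_pipeline.py"),
    ("ARTICLE_COUNT", "deploy_config.yaml"),
]

# The GPT-derived raw-to-normative mapping.
normalize = {
    "article_count": "ArticleCount",
    "articleCount":  "ArticleCount",
    "ARTICLE_COUNT": "ArticleCount",
}

# One entry per signal, listing every component it touches.
view = defaultdict(list)
for raw, component in occurrences:
    view[normalize[raw]].append(component)

assert view["ArticleCount"] == [
    "input_parser.py", "feature_pipeline.py", "deploy_config.yaml",
]
```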

With that, we can spend more of our time studying the emergent properties of the full system:

  1. Understanding a signal's SLA
  2. Security review
  3. Architectural soundness

Takeaway

  • Name variations cause friction in code reading, and they can block advanced code study across the tech stack.
  • Using GPT correctly, we can cost-effectively eliminate name variations and thereby build connections across a siloed code base.
  • Emergent properties only show up with a unified view.
