My Summer at Sourcegraph
Alex Bildner
CS at Duke University | Prev: Software / Machine Learning Engineer Intern at Sourcegraph
Hi! My name is Alex Bildner and I am a Machine Learning Engineer Intern at Sourcegraph. I would like to share why I joined Sourcegraph and my experience here this summer.
I first learned about Sourcegraph through an episode of their Dev Tool Time interview series, in which they discuss software developer workflows and productivity techniques. I looked into the company afterward, and its focus on building cutting-edge developer tools piqued my interest. After meeting with my recruiter Trevor, I was particularly drawn to the opportunity to make an impact on Cody, Sourcegraph’s new AI developer assistant built on large language models (LLMs). Further, Sourcegraph’s values aligned with those I sought in a company, particularly its developer-first emphasis and the high agency given to teammates.
Throughout my summer internship, I worked on a few Cody projects. The first was programmatically building a dataset for evaluating Cody’s code generation capabilities. As we iterate on Cody, one challenge is determining whether our changes improve or degrade its outputs. Because personalized code generation is such a new space, we lack canonical evaluation methods and datasets. I generated a dataset for the team by programmatically creating synthetic queries similar to those that users might ask Cody, and then pairing those queries with code snippets of varying levels of relevance.
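To make the idea concrete, here is a minimal sketch of how such a dataset could be assembled. It is not the actual pipeline: the query templates, the symbol names, the snippets, and the three-level relevance scale are all hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass

# Hypothetical query templates; real user questions would be far more varied.
TEMPLATES = [
    "How does {name} work?",
    "Where is {name} defined?",
    "Write a unit test for {name}.",
]

# Hypothetical symbols, each with a definition and a call site.
SYMBOLS = {
    "parse_config": {
        "definition": "def parse_config(path): ...",
        "usage": "cfg = parse_config('app.json')",
    },
    "retry_request": {
        "definition": "def retry_request(url, attempts=3): ...",
        "usage": "resp = retry_request(api_url)",
    },
    "format_date": {
        "definition": "def format_date(ts): ...",
        "usage": "label = format_date(created_at)",
    },
}


@dataclass
class EvalExample:
    query: str
    snippet: str
    relevance: int  # assumed scale: 2 = definition, 1 = call site, 0 = unrelated


def build_dataset(seed: int = 0) -> list[EvalExample]:
    """Pair every synthetic query with snippets of three relevance levels."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    examples = []
    for name, code in SYMBOLS.items():
        for template in TEMPLATES:
            query = template.format(name=name)
            examples.append(EvalExample(query, code["definition"], 2))
            examples.append(EvalExample(query, code["usage"], 1))
            # An unrelated snippet: the definition of a different symbol.
            other = rng.choice([n for n in SYMBOLS if n != name])
            examples.append(EvalExample(query, SYMBOLS[other]["definition"], 0))
    return examples
```

With labeled pairs like these, a change to Cody's context retrieval can be scored by checking whether the more relevant snippets are ranked above the less relevant ones for each query.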
The second project I worked on was developing a machine-learning model to identify which types of context would be useful for Cody based on a user’s query. As our Head of Engineering Steve Yegge explained in his excellent blog post, appropriate and high-quality context is paramount for getting good results from an LLM. Accordingly, a significant problem is determining the proper context. Does the LLM need context from the codebase as it exists now? Does it need context about what changed recently in the repository? Does it need context from docs? Does it need context about who owns a piece of code?
To solve this problem for Sourcegraph, I created a machine-learning model that answers these questions (and more) by determining which context types are most relevant to the user’s message and most helpful for Cody. I also integrated this model into Cody, after discussing with other developers how best to fit this functionality into the existing Cody architecture.
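Choosing context types for a query can be framed as multi-label text classification, since one question may need several kinds of context at once. The sketch below illustrates that framing only. It is a toy one-vs-rest logistic regression over word features, not the model I built; the label names and training queries are invented for this example.

```python
import math

# Hypothetical context types, loosely based on the questions above.
LABELS = ["codebase", "history", "docs", "ownership"]

# Toy training data invented for this sketch.
TRAIN = [
    ("what does this function do", {"codebase"}),
    ("where is the parser implemented", {"codebase"}),
    ("what changed in the last commit", {"history"}),
    ("how do I configure the docs site", {"docs"}),
    ("who owns the billing service", {"ownership"}),
    ("who changed this file recently", {"history", "ownership"}),
]


def featurize(text):
    """Binary bag-of-words features plus a constant bias feature."""
    return set(text.lower().split()) | {"<bias>"}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, tokens):
    return sum(weights.get(t, 0.0) for t in tokens)


def train(examples, epochs=300, lr=0.2):
    """Fit one independent logistic regression per label with SGD."""
    weights = {label: {} for label in LABELS}
    for _ in range(epochs):
        for text, labels in examples:
            tokens = featurize(text)
            for label in LABELS:
                w = weights[label]
                target = 1.0 if label in labels else 0.0
                error = target - sigmoid(score(w, tokens))
                for t in tokens:
                    w[t] = w.get(t, 0.0) + lr * error
    return weights


def predict(weights, text, threshold=0.5):
    """Return every label whose predicted probability clears the threshold."""
    tokens = featurize(text)
    return {
        label
        for label in LABELS
        if sigmoid(score(weights[label], tokens)) >= threshold
    }
```

Because each label gets its own independent decision, a query such as "who changed this file recently" can be tagged with both the history and ownership context types. A production model would more plausibly be a fine-tuned transformer with one sigmoid output per context type, but the thresholding step would look the same.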
The last project I worked on at Sourcegraph was exploring additional data sources we could connect to Cody, including project management tools such as GitHub issues and Jira, and knowledge base or code documentation sources such as Confluence and Docusaurus docs. I created a web app that used semantic similarity to display the issues in a GitHub repo most related to a user's query. This work can serve as a starting point for giving Cody context from more data sources like GitHub issues.
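The ranking step behind a tool like this can be sketched as follows. To keep the example self-contained, `embed` is a stand-in that builds word-count vectors, so it measures word overlap rather than true semantic similarity; a real app would swap in a learned embedding model. The issue titles and function names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in for an embedding model: a sparse word-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_issues(query, issues, top_k=3):
    """Return up to top_k (score, title) pairs, most similar first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(title)), title) for title in issues]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical issue titles for illustration.
ISSUES = [
    "Login page crashes on empty password",
    "Add dark mode to settings page",
    "Search results are slow for large repositories",
]

ranked = rank_issues("search is slow on large repositories", ISSUES)
```

Here the search-performance issue ranks first because it shares the most words with the query. With real embeddings, issues phrased differently from the query (for example "queries take a long time on big repos") could also score highly, which is the point of using semantic similarity.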
I thoroughly enjoyed my experience at Sourcegraph this summer and learned a lot about machine learning and software development. I substantially developed my technical skills: fine-tuning models using Hugging Face, creating my first service using FastAPI, reading and writing Go and TypeScript for the first time, and working on collaborative projects across a large software organization. I also witnessed firsthand what building software in an obsessively customer-first environment looks like. The communication channels at Sourcegraph are open and transparent, and it was incredible to see Sourcegraph teammates, from individual contributors through the company’s cofounders, CEO Quinn Slack and CTO Beyang Liu, hacking on things, iterating, and shipping, all in the name of making Cody “more loveable” for customers.
I am grateful to have had the opportunity to work at Sourcegraph this summer. I would like to thank my manager Erika Rice Scherpelz, my mentor Rok Novosel, my recruiter Trevor Houghton, my onboarding guide Kemper Hamilton, and all my Sourcegraph teammates who helped me learn and grow this summer.