Colimit

Software Development

Brooklyn, New York · 79 followers

Autofix Failed Builds: AI-powered Root Cause Analysis for CI, turning failing builds into passing ones.

About us

Colimit is an AI janitor that does the boring work necessary to fix your CI builds, turning failing builds into passing ones.

Website
https://colimit.io
Industry
Software Development
Company size
2-10 employees
Headquarters
Brooklyn, New York
Type
Self-Owned
Founded
2023

Locations

Employees at Colimit

Updates

  • Colimit reposted this

    View Larry Diehl's profile

    PhD | Formal Verification | Neuro-Symbolic AI

    The increased adoption of agentic (or dynamic, AI-based decision-making) software has opened new challenges for verification. Most folks rely on integration-test-style "evals" to ensure that agents do what you expect, but a handful of us are looking at verifying agentic correctness more rigorously. The idea is to adapt techniques from formal verification (model checking, proof checking, etc.) and see whether they can be usefully extended to the more black-box and dynamic nature of agentic software. You can see this as a special kind of neuro-symbolic AI, applied to verifying agentic correctness. The companies that I'm aware of that are trying to solve this problem include Colimit, Imandra, and Informal Systems. I also know that Erik Meijer is trying to address similar problems at his new startup. It's a small community so far, but if you know others please share them so we all can learn from each other :)

  • Colimit reposted this

    View Ivan Gavran's profile

    formal methods to make software correct; AI to make formal methods usable

    When to trust machines? Here is one loose but interesting analogy between autonomous cars and code synthesizers (cross-posting from https://lnkd.in/d-Cxzwym ):

    When will self-driving cars be ready enough for roads (without any restrictions)? There were many claims of accomplishing full autonomy, but they always turned out to be false. The main challenge lies in handling very rare scenarios, those never seen in the training data.

    And yet, how do we decide if a human is ready to drive autonomously? Well, we mostly give candidates a pen-and-paper exam, followed by an hour-long drive around their area. And that's it, they're deemed fully autonomous.

    The paradox is obvious: contemporary AI systems would pass those exams easily. Why are we still full of doubt about them? It is because our criteria for trusting a human candidate are not based only on the fact that they passed the exam. We trust the candidate because they passed the exam *and* because they are human. Humans are predictable enough that we can extrapolate their general driving skill from their behaviour in that short exam. (And as soon as there are signs of unpredictability: e.g., progressing short-sightedness, the influence of alcohol, or similar, we limit the human driver's autonomy.)

    **The case of AI programmers**

    Loosely analogous is the question of when to trust a program written by an AI agent. For a human programmer, we are often happy enough if there are some tests witnessing that the program behaves correctly under basic scenarios and under predictable edge cases. Should we expect the same from (statistically) generated programs? Definitely not. Like in the case of driving, the tests are not the only reason we trust the program. They just support our idea of how humans typically think, and what mistakes they typically make. This lets us anticipate errors and judge when a set of tests is likely sufficient. An AI agent, on the other hand, could make much weirder and less predictable mistakes.

    This unpredictability is why AI agents must give us much stronger assurance of their programs' correctness. For instance, by generating loads of tests. Better yet, by creating formal models (e.g., in TLA+, Alloy, or Quint) for the implementation and connecting the two through model-generated tests (a sketch of that connection follows below).

    In short, AI coders—like AI drivers—must do far more than their human counterparts, because of our scepticism of their failure modes. And the scepticism is fully justified, I believe.
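
    A minimal sketch of that model-to-implementation connection, in plain TypeScript rather than TLA+/Alloy/Quint tooling (all names hypothetical): a pure model of a counter is run alongside the implementation on random operation sequences, and any divergence is reported.

      // A tiny "formal model": a pure specification of a counter.
      type Op = "inc" | "reset";

      function modelStep(state: number, op: Op): number {
        return op === "inc" ? state + 1 : 0;
      }

      // The implementation under test (hypothetical; imagine real app code).
      class Counter {
        private n = 0;
        inc() { this.n += 1; }
        reset() { this.n = 0; }
        get value() { return this.n; }
      }

      // Model-generated tests: run random operation sequences through both
      // model and implementation, and flag any point where they disagree.
      function checkAgainstModel(runs = 1000, maxLen = 50): void {
        for (let run = 0; run < runs; run++) {
          const counter = new Counter();
          let modelState = 0;
          const len = Math.floor(Math.random() * maxLen);
          for (let i = 0; i < len; i++) {
            const op: Op = Math.random() < 0.8 ? "inc" : "reset";
            if (op === "inc") counter.inc(); else counter.reset();
            modelState = modelStep(modelState, op);
            if (counter.value !== modelState) {
              throw new Error(`model/implementation divergence after ${op}`);
            }
          }
        }
      }

      checkAgainstModel(); // silent if the implementation matches the model

    Real tooling would derive the sequences from a checked model rather than Math.random(), but the shape of the check is the same.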

  • View Colimit's organization page

    79 followers

    A small but mighty change: we've now got a copy-patch-to-clipboard button if you want to quickly try out Colimit fixes locally. Your clipboard will now have a command that you can paste into your terminal to review, and then hit 'enter' to conveniently apply the patch:

      cat << 'EOF' | git apply --ignore-whitespace
      --- a/apps/web/trigger/echo.task.ts
      +++ b/apps/web/trigger/echo.task.ts
      @@ -6,7 +6,8 @@
      ...
      EOF

  • View Colimit's organization page

    79 followers

    Check out our latest development: an attempt to improve on the existing DX of managing which files are in context for devtools, this time with a Git-inspired twist!

    View Larry Diehl's profile

    PhD | Formal Verification | Neuro-Symbolic AI

    While working on Colimit, we've experimented with different approaches to handling file context management in LLM-powered devtools. The DX of most current solutions feels bolted onto existing IDEs, with clunky interactions.

    One interesting approach that we landed on is applying Git's mental model to this problem. By separating viewing files (like browsing) from managing context (like Git's staging of changes), we get two more focused modes of interaction:

    - View mode: see what's in context without risk of accidental changes
    - Edit mode: add/remove files with familiar Git-like +/- visual indicators and the ability to undo (like un-staging)

    This separation lets you quickly see which files are important when debugging complex issues, and reduces the anxiety of accidentally deleting important file references from context. Even more interestingly, the set of files you settle on becomes a useful artifact that documents what you needed to understand the problem, something Git history alone doesn't capture (it only contains what ultimately changed).

    ↓ There's a more detailed blog post with screenshots and a YouTube video linked in the comments if you're interested. I'm also curious what other similar approaches people have been trying.
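
    To make the staging analogy concrete, here is a rough TypeScript sketch of the idea (a hypothetical API, not Colimit's actual implementation):

      // Hypothetical API (not Colimit's actual implementation).
      type Mode = "view" | "edit";

      class FileContext {
        private staged = new Set<string>();   // files currently in context
        private history: Set<string>[] = [];  // snapshots, for undo (like un-staging)

        constructor(private mode: Mode = "view") {}

        setMode(mode: Mode) { this.mode = mode; }

        // View mode: read-only inspection, no risk of accidental changes.
        list(): string[] { return [...this.staged].sort(); }

        // Edit mode: add/remove files, snapshotting first so changes can be undone.
        add(path: string) { this.mutate((s) => { s.add(path); }); }
        remove(path: string) { this.mutate((s) => { s.delete(path); }); }
        undo() {
          const prev = this.history.pop();
          if (prev) this.staged = prev;
        }

        private mutate(change: (s: Set<string>) => void) {
          if (this.mode !== "edit") throw new Error("switch to edit mode first");
          this.history.push(new Set(this.staged)); // snapshot before changing
          change(this.staged);
        }
      }

      // Stage files while debugging, then flip to view mode to browse safely.
      const ctx = new FileContext("edit");
      ctx.add("apps/web/trigger/echo.task.ts");
      ctx.add("apps/web/trigger/other.task.ts"); // hypothetical path
      ctx.undo();              // un-stage the last change
      ctx.setMode("view");
      console.log(ctx.list()); // ["apps/web/trigger/echo.task.ts"]

    The snapshot-per-mutation design is what makes undo cheap: un-staging is just restoring the previous set.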

    • Git-inspired File Context Management for LLMs
  • View Colimit's organization page

    79 followers

    We're excited to announce Colimit's newly exposed Root Cause Analysis reports, Analysis Mode/Depth selection, and Deep analysis mode to debug the gnarliest of issues! This expands Colimit's functionality from autofixing shallow bugs to also helping you research why deep, complex bugs are happening!

    View Larry Diehl's profile

    PhD | Formal Verification | Neuro-Symbolic AI

    One major lesson everyone learned from the recent popularity of DeepSeek R1's exposed thinking is that transparency matters. At Colimit, we just took that lesson to heart and exposed the entire debugging process that's performed when we generate fixes for your failing CI builds (YouTube video walkthrough in the comments!). It's obvious in retrospect, but engineers don't just want fixes; they want to understand, debate, learn, and interact.

    What's New

    1. Root Cause Analysis Reports: Colimit now shares its internal logic: hypotheses, evidence (supporting/contradictory), confidence scores, and more. Like a code review for AI reasoning, so you can validate its logic or challenge its assumptions.
    2. Deep Analysis Mode: for perplexing bugs like race conditions, CI-only failures, and other layered bugs, Colimit recursively chases dependencies, critiques its own hypotheses, and debates the merits of alternative fixes.
    3. Analysis Mode Selection: choose "Quick" (e.g., for linter errors), "Standard" (e.g., for most bugs), or "Deep" (e.g., for perplexing behavior).

    Case Study

    Recently Anchor.dev @anchor was debugging a failed CI build for two days straight (involving shared test-only DNS servers, parallel tests, mutexes, etc.), until Colimit's exposed analysis reports helped them:
    - Review 3 root cause hypotheses with confidence scores
    - Rank potential fixes by their merits vs. risk/complexity
    - Paste Slack threads to refine context with pre-existing investigations
    The result: they were able to isolate the root cause and fix it the next morning.

    Engineers Hate Black Boxes

    This isn't just about automation, it's about closing the gap between "fixed" and understood. It's important to:
    - Learn why a failing build happened, not just how to patch it
    - Use the AI's hypotheses as a starting point for your own investigation
    - Add team context (more logs, prior attempts) to steer the analysis

    Why This Resonates

    Tools that hide their logic breed distrust. Colimit's transparency turns fixes into teachable moments, so you debug with the AI, not just delegate to it.

    Demo video walkthrough in the comments ↓

    P.S. Big thanks to the Anchor team for trying out early versions of this tech and giving feedback on the value of visibility into the debugging process.
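
    For a concrete picture of what such a report carries, here is one hypothetical shape it could take, sketched in TypeScript (illustrative only; not Colimit's actual schema):

      // Illustrative only; not Colimit's actual schema.
      type AnalysisMode = "quick" | "standard" | "deep";

      interface Evidence {
        kind: "supporting" | "contradictory";
        source: string; // e.g., a CI log excerpt or stack trace line
        note: string;
      }

      interface Hypothesis {
        summary: string;     // e.g., "parallel tests share one DNS socket"
        confidence: number;  // confidence score in [0, 1]
        evidence: Evidence[];
      }

      interface FixCandidate {
        patchSummary: string;
        merits: string[];
        risks: string[]; // lets fixes be ranked by merit vs. risk/complexity
      }

      interface RootCauseReport {
        mode: AnalysisMode;
        hypotheses: Hypothesis[]; // ranked by confidence
        fixes: FixCandidate[];
        userContext: string[];    // e.g., pasted Slack threads, prior attempts
      }

    Ranking hypotheses by confidence and fixes by merits vs. risks mirrors the workflow described in the Anchor case study.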

  • View Colimit's organization page

    79 followers

    Support for fixing complex and deep bugs is in the works!

    View Larry Diehl's profile

    PhD | Formal Verification | Neuro-Symbolic AI

    Feeling pretty proud today—Colimit's improved architecture is taking us from shallow bug fixes to deep bug fixes. A dev was stuck on a bug for two days straight before using Colimit for the first time and fixing it the next morning. The bug was a gnarly race condition involving parallel access to a DNS server in CI—threads, forked processes, shared sockets, file descriptors… lots of painful printf debugging. Even worse, it appeared to fail only in CI and worked locally. But Colimit got to the root cause and helped fix it! Now onto making the new architecture generally available.

  • View Colimit's organization page

    79 followers

    We keep developers in their flow state, proactively fixing CI in the background while you move on to the next thing!

    View Larry Diehl's profile

    PhD | Formal Verification | Neuro-Symbolic AI

    Ever run into that moment where you've pushed your code, grabbed a coffee, and then realized CI failed because of a tiny oversight—like an outdated test expectation? It's a real flow-breaker to switch back branches, rebuild the project, and fix something trivial. That's why we built Colimit, an AI janitor for your CI. We focus on an often overlooked "pre-PR" stage, where small mistakes can slip through and cost you time. How it works:
    - If a GitHub Action suddenly fails, Colimit automatically picks up and implements the menial fix.
    - No need to interrupt your local flow—just click "Push Fix" in our web UI.
    - Stay focused on your new task while our AI takes care of the housekeeping.
    Here's a quick demo: https://lnkd.in/eikkkzhp
    Let me know what you think—and if you're curious, feel free to message me or start a free trial. Happy to answer any questions!
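
    For a concrete sense of the "outdated test expectation" failure class mentioned above, here is an invented TypeScript example (hypothetical code, not from any real repo):

      import { strict as assert } from "node:assert";

      // Hypothetical code, invented for illustration: the implementation
      // changed, but the test's expectation lagged behind.
      function greet(name: string): string {
        return `Hello, ${name}!`; // updated from the old `Hi, ${name}`
      }

      // This outdated assertion is exactly the kind of trivial CI failure
      // described above; the menial fix is updating the expected string.
      assert.equal(greet("CI"), "Hi, CI"); // fails until updated to "Hello, CI!"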

    Colimit: Autofix GitHub Actions
    https://www.loom.com

  • View Colimit's organization page

    79 followers

    We just shipped a highly requested feature: the ability to add additional context about the bug found, or extra instructions for how to fix it, and regenerate the fix. And if you haven't seen it, here's a demo of how Colimit's autofixer for failed GitHub Actions works more generally: https://lnkd.in/e2kncw87

  • View Colimit's organization page

    79 followers

    Big news here at Colimit HQ: our first public self-serve product is an autofixer for failed CI builds (initially GitHub Actions)!

    View Larry Diehl's profile

    PhD | Formal Verification | Neuro-Symbolic AI

    Hi everyone, I've got a big announcement: my startup Colimit recently pivoted from being a bug-finding service to a bug-fixing service for failed CI builds (initially GitHub Actions). Links to the site + demo + Discord in the comments :) What follows is a story about how this came about (likes/comments/shares are very appreciated!):

    The initial goal was to make model-checking and model-based testing (and even formalization) more widespread by creating an accessible TypeScript-like modeling language, paired with a cloud that executed verification and tests at scale. We found real bugs for our design partners, giving them room to refactor with confidence. At first we wrote the models for them, but when trying to hand off the modeling we learned a hard lesson: even if modeling in our DSL is approximately the same effort as writing tests, it's still something new to learn. For business uptake, any amount of extra work quickly kills adoption/onboarding, even if the results are great.

    The answer was clear: use LLMs to auto-model/auto-formalize. But that gives Colimit as a company even more work and less runway. Luckily, we noticed that during our demos one part of our product lit up people's eyes: the auto-generated "debugging notes", and the associated auto-fixer for the bugs that Colimit's bug-finder identified. With LLMs these days, this part is a lot easier to get off the ground. So we made a hard choice and reversed our roadmap:

    1. First, auto-fixing as a service.
    2. Auto-modeling. Models act as "context compression" of a codebase, and can be checked for logical consistency much more quickly, prior to longer-running builds and test runs.
    3. Reintroduce bug-finding, now palatable via auto-generated models.

    The mission is the same, increasing software reliability by popularizing [semi]formal methods, but this prioritization allows us to grow as a business. Our current autofixer integrates with GitHub Actions and I'm excited to announce that we just launched it for Early Access! Right now it's mostly useful for tedious but frequently occurring bugs, like linting failures and misaligned expectations in basic unit or integration tests. As our auto-modeling takes shape, we will expand from shallow bug fixing to deep bug fixing, using formal models as our technical leverage.

    Please join us by checking out the site, demo, or joining our Discord :)

  • View Colimit's organization page

    79 followers

    Some demo videos of Colimit in action and some behind-the-scenes content about how it works are now available :)

    View Larry Diehl's profile

    PhD | Formal Verification | Neuro-Symbolic AI

    Folks have asked me what automatically testing an API with Colimit looks like, so in the spirit of #buildinpublic, here are a couple of videos! If you think this kind of thing is cool, please like/comment/repost :D #testing #api #rubyonrails

    First, here is a value-centric demo of finding some bugs in an open source Rails API based on something similar to a Swagger spec: https://lnkd.in/eZdFC3GG

    Second, if you're into behind-the-scenes content, here's a more rambly video about how our SDK works, our vision for spec-driven development, and how theorem-prover-based testing complements LLM-based coding: https://lnkd.in/eyQBKh_v

    Finally, if you think this type of tech could be useful to you, don't hesitate to reach out! We're running a limited-capacity white-glove private beta, but we're also happy to connect with folks who might be interested when GA is ready.

    Colimit: Testing a Rails API (Part 1)
    https://www.loom.com

