We're thrilled to announce Tilde. We're building interpreter models and control technology to unlock deep reasoning and command of models, enabling the next generation of human-AI interaction.

The bottleneck in AI today isn't intelligence; it's communication. Today's interactions rely on black-box prompting, an inherently lossy method. Imagine trying to optimize a software system without access to its source code: it's not only inefficient, but absurdly limiting.

Our vision at Tilde is to go beyond these constraints with applied interpretability: we're building solutions that reveal a model's inner workings, enabling us to directly steer behavior and enhance performance beyond what's possible with traditional post-training or fine-tuning methods. So far, our interpreter models have already enabled improvements in reasoning with Llama-3.1-8B and more fine-grained control in generation with SOTA text-to-video models. By advancing a new paradigm of insight and control, we're creating AI that can tackle tasks genuinely beyond human reach, elevating rather than replacing existing methods.

Our team combines unique backgrounds, spanning academia and both small and large AI infrastructure companies, all driven by a shared goal: achieving a deeper understanding of the universe within these models.

If this sounds interesting to you, come build the future with us: tilderesearch.com/join
About us
Interpretability is the most important problem in the world. We build moonshot, applied interpretability solutions to make maximally safe and performant AI.
- Website: https://www.tilderesearch.com/
- Industry: Technology, Information and Internet
- Company size: 2-10 employees
- Headquarters: Palo Alto, CA
- Company type: Self-Employed

Locations
- Primary: US, CA, Palo Alto, 94305
Updates
-
Over the past few weeks, we've been using this graph theory problem in interviews and figured we'd open it up to everyone here: https://lnkd.in/gXGdMpYW If you solve it, we'll move you directly to the last rounds of our process! And if you don't like graph theory but do like interpretability, we have plenty of other fun problems, so feel free to email us at [email protected]. We are doing a lot of applied interpretability work like this: https://lnkd.in/gGEEWYQC, which was the first application of SAE-based interventions to a downstream task that outperforms classical baselines.
-
Can Mechanistic Interpretability Be Useful? At Tilde, we've been exploring how Sparse Autoencoders (SAEs), a mechanistic interpretability tool for surfacing monosemantic, interpretable features in AI models, can directly improve downstream performance. Today, we're excited to share Sieve, our latest research demonstrating the practical power of interpretability (joint work with Adam Karvonen). Read our post here: https://lnkd.in/gGEEWYQC

Our work focuses on a real-world challenge provided by Benchify, a YC-backed code-generation company. The task? Write code to fuzz test Python functions under strict constraints (e.g., no regex usage). Unfortunately, models often fail to follow instructions when operating in long-context environments with multiple constraints. Here's where SAEs come in: we can use mechanistic insights surfaced by SAEs to intervene and improve instruction following in code generation, demonstrating Pareto dominance over existing techniques like steering and prompting.

Why this matters: This is the first time mechanistic interpretability has counterfactually enabled a downstream task, demonstrating that SAEs can do more than provide insights; they can directly drive performance improvements. By leveraging interpretable features surfaced by SAEs, we:
1. Conditionally targeted failure modes (e.g., regex usage) without disrupting other behaviors.
2. Achieved surgical precision, with over 99.9% activation accuracy, preventing interference with unrelated tasks.
We delivered lightning-fast interventions that require less than 20 minutes end-to-end to implement, with minimal additional overhead.

Results: Using GemmaScope SAEs (for Gemma) and custom lightweight SAEs (for Llama), we compared against strong baselines across thousands of code generations. Conditional SAE-based interventions showed clear Pareto dominance:
- 99.9% precision in eliminating regex usage.
- Models remained unaffected during standard downstream tasks.
This means we can precisely target undesirable behaviors without compromising performance, a breakthrough in controllable AI.

In the spirit of open science, we're sharing all models, data, evaluation results, and analyses. Mechanistic interpretability is often seen as theoretical, but our work shows how it can deliver practical, actionable benefits. This research is an exciting step toward closing the gap between theory and application, empowering fine-grained control over AI models. Let us know your thoughts; we'd love to hear your feedback and ideas!
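For readers curious what a conditional SAE intervention looks like in practice, here is a minimal, hypothetical sketch of the idea rather than Tilde's actual Sieve implementation: a small SAE encodes one layer's residual-stream activations, and whenever an illustrative "regex usage" feature fires above a threshold, that feature is clamped and only the resulting change is written back. The feature index, threshold, clamp value, and SAE dimensions below are placeholders, not values from the paper.

```python
# Minimal sketch of a conditional SAE-based intervention (illustrative only).
# REGEX_FEATURE, THRESHOLD, and CLAMP_VALUE are hypothetical placeholders.
import torch


class SparseAutoencoder(torch.nn.Module):
    """Toy SAE: encode residual-stream activations to sparse features and decode back."""

    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.W_enc = torch.nn.Parameter(torch.randn(d_model, d_features) * 0.01)
        self.W_dec = torch.nn.Parameter(torch.randn(d_features, d_model) * 0.01)
        self.b_enc = torch.nn.Parameter(torch.zeros(d_features))
        self.b_dec = torch.nn.Parameter(torch.zeros(d_model))

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)

    def decode(self, f: torch.Tensor) -> torch.Tensor:
        return f @ self.W_dec + self.b_dec


REGEX_FEATURE = 1234   # hypothetical index of a "uses regex" feature
THRESHOLD = 1.0        # only intervene where the feature actually fires
CLAMP_VALUE = 0.0      # suppress the behavior by zeroing the feature


def conditional_intervention(resid: torch.Tensor, sae: SparseAutoencoder) -> torch.Tensor:
    """Conditionally edit residual-stream activations of shape (batch, seq, d_model).

    Positions where the target feature is below threshold pass through untouched;
    elsewhere the feature is clamped and only the resulting delta is added back,
    so the SAE's reconstruction error is not reintroduced.
    """
    feats = sae.encode(resid)                         # (batch, seq, d_features)
    mask = feats[..., REGEX_FEATURE] > THRESHOLD      # where the feature fires
    if not mask.any():
        return resid

    clamped = feats.clone()
    clamped[..., REGEX_FEATURE] = CLAMP_VALUE
    # Apply only the difference between the clamped and original reconstructions.
    delta = sae.decode(clamped) - sae.decode(feats)
    return resid + mask.unsqueeze(-1).to(delta.dtype) * delta


if __name__ == "__main__":
    sae = SparseAutoencoder(d_model=64, d_features=512)
    resid = torch.randn(2, 16, 64)        # dummy (batch, seq, d_model) activations
    edited = conditional_intervention(resid, sae)
    print(edited.shape)                   # torch.Size([2, 16, 64])
```

In a real pipeline, an edit like this would run as a hook on the chosen transformer layer during generation; the conditional mask is what keeps unrelated prompts and tasks untouched, which is the property the post above emphasizes.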