Rubber Ducky Labs (YC W23)

Software Development

San Francisco, CA · 1,058 followers

Better product metadata in minutes.

About us

We built Rubber Ducky Labs to help e-commerce teams effortlessly improve product discovery through better metadata. Our tool enables non-technical users to leverage multi-modal AI on product catalogs—just upload a CSV and start tagging metadata in minutes.

Website
https://www.rubberduckylabs.io/
Industry
Software Development
Company size
2-10 employees
Headquarters
San Francisco, CA
Type
Privately held
Founded
2022

Locations

Rubber Ducky Labs (YC W23) employees

Updates

  • View Alexandra Johnson's profile

    Founder and CEO of Rubber Ducky Labs (YC W23)

    E-commerce companies: how are you marketing to your customers on St Patrick's Day? With Rubber Ducky Labs (YC W23), it only takes minutes to find and tag every St Patrick's Day item in your product catalog. You could use this to:

    • Send a push notification linking to a custom St Patrick's Day collection
    • Generate SEO traffic to a St Patrick's Day landing page
    • Create a St Patrick's Day email campaign with personalized product recommendations

    Consumers love fresh content. Get started early on your next marketing event or occasion, and DM / comment for a demo.

  • Rubber Ducky Labs (YC W23) reposted this

    View Alexandra Johnson's profile

    Founder and CEO of Rubber Ducky Labs (YC W23)

    For Rubber Ducky Labs (YC W23)'s recently launched product metadata tagger, I let AI take the lead with infrastructure. Do you think it went well? It did not.

    We built a system that uses AI to add tags -- like "St Patrick's Day", "Summer Weddings", or "Beach Vacation" -- to product catalogs. The infrastructure sounds simple: farm out tagging jobs to a queue, and have a Lambda function download images, call Gemini, and write results to a database. Using AI to write Terraform code for our AWS deployment was not so simple. Among the more egregious mistakes I saw while letting AI take the lead:

    • orphaning an entire production deployment in us-east-1
    • circling through the same three fixes while debugging networking and IAM
    • continually re-setting our database to allow public access

    AI isn't ready to take the lead on writing infrastructure code. Unlike on the frontend, each iteration can take minutes. Networking and security permissions are too nuanced and poorly documented for code-gen AIs to have enough training data to operate correctly. And security vulnerabilities are very costly. What should you do instead?

    • Read every line of code
    • Test end-to-end frequently
    • Let Rubber Ducky Labs handle your product metadata tagging infrastructure

    I know it's a shameless plug, but if you're interested in building or using something like this, we'd love to give you a demo. DMs open.

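    The pipeline described above -- a queue feeding a Lambda worker that downloads images, calls a vision model, and writes to a database -- is easy to sketch. Here is a minimal, hypothetical Python version; the table name, job fields, and the call_gemini stub are illustrative assumptions, not Rubber Ducky Labs' actual code:

    ```python
    # Minimal sketch of an SQS-triggered tagging worker (assumptions noted above).
    import json
    import urllib.request

    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("product-tags")  # hypothetical table name


    def call_gemini(image_bytes: bytes, tag: str) -> bool:
        """Placeholder for the multi-modal model call, e.g. asking
        whether this product matches the tag 'St Patrick's Day'."""
        raise NotImplementedError("wire up your model client here")


    def handler(event, context):
        # Each SQS record carries one tagging job: a product id, an
        # image URL, and the tag to evaluate.
        for record in event["Records"]:
            job = json.loads(record["body"])
            with urllib.request.urlopen(job["image_url"]) as resp:
                image_bytes = resp.read()
            table.put_item(
                Item={
                    "product_id": job["product_id"],
                    "tag": job["tag"],
                    "matches": call_gemini(image_bytes, job["tag"]),
                }
            )
    ```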
  • Quack!!!

    View Alexandra Johnson's profile

    Founder and CEO of Rubber Ducky Labs (YC W23)

    We just launched! Rubber Ducky Labs (YC W23): Better Product Metadata in Minutes is live, and John McDonnell would love to show you what we've been working on.

    The Problem: E-commerce companies often struggle with inconsistent and unreliable product catalog metadata. Whether data comes from fluctuating merchandising partners or manual labeling, keeping it fresh and relevant is a constant battle. This mismatch creates a scarcity mindset around data, negatively impacting search, recommendations, and overall customer experience.

    The Solution: We built Rubber Ducky Labs to help e-commerce teams effortlessly improve product discovery through better metadata. Our tool enables non-technical users to leverage multi-modal AI on product catalogs: just upload a CSV and start tagging metadata in minutes. And there are many ways you can use this metadata:

    • Search: Expand culturally relevant keyword coverage.
    • Recommendations: Recommend seasonally appropriate items.
    • Marketing: 100x your whimsical campaign creativity.
    • SEO: Boost rankings with thousands of category pages.

    If you see potential for your own team, book a meeting to see our demo and get started: https://lnkd.in/gpFsCh_t

  • Rubber Ducky Labs (YC W23) reposted this

    View John McDonnell's profile

    AI & Recsys at Rubber Ducky Labs

    One cool feature of LLMs is that you can get zero-shot labeling based on any English sentence. For example, I can ask ChatGPT "is this dress appropriate for Valentine's day?" and expect a somewhat reasonable answer (see pic 1). This is pretty fun to do! I can get an ad hoc label for any category I can think of, and it doesn't have to correspond to obvious features of the image. Lake house weekend? Trip to New Orleans? Get labels instantly, no training set required. Alexandra Johnson and I tagged thousands of inventory items this week for an e-commerce client and had a blast.

    However, it turns out that if you run this sort of prompt on a thousand different dresses, odd patterns pop up. For example, in the Valentine's example, 30% of the time Claude responds with "65". "95", the highest score observed, pops up 15% of the time (see pic 2). Scores like this can be a nuisance because Claude makes no distinction in rank among big chunks of items: if I wanted the top 5%, I'd have to randomly select among the items that got 95. And every single rating was a multiple of 5; Claude made no attempt to use the full range of scores available. I've seen similar patterns with GPT-4. Also, anecdotally, we've seen big differences in score distribution across models: some models are harsh graders and others are more lax.

    Important questions for working with zero-shot classification:

    • Is there a way to get a proper ranking instead of just chunky scores?
    • What's the best way to evaluate the quality of what we're generating? We don't need labeled examples to generate, but do we need them to evaluate?
    • What variables matter? Is it better to use a big model, or pick a smaller one and save compute? Is CLIP just as good?

    I'd be curious if anyone out there has experience doing this, and I'd love to talk shop if so!

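    A quick way to see the "chunky score" pattern described above is to run one zero-shot prompt over many items and histogram the results. A minimal sketch, with score_with_llm standing in for whichever model client you use (Claude, GPT-4, ...):

    ```python
    # Histogram zero-shot scores to spot clumping (e.g. 30% of items at 65).
    from collections import Counter

    PROMPT = (
        "On a scale of 0-100, how appropriate is this dress for "
        "Valentine's Day? Reply with a single integer."
    )


    def score_with_llm(image_url: str) -> int:
        """Placeholder for the zero-shot model call using PROMPT."""
        raise NotImplementedError("wire up your model client here")


    def score_distribution(image_urls: list[str]) -> Counter:
        # If a handful of values (all multiples of 5) dominate, the raw
        # scores cannot rank items within a chunk: picking the top 5%
        # means sampling randomly among everything that scored 95.
        return Counter(score_with_llm(url) for url in image_urls)
    ```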
  • Rubber Ducky Labs (YC W23) reposted this

    View Alexandra Johnson's profile

    Founder and CEO of Rubber Ducky Labs (YC W23)

    It’s been a pleasure working with Laura Rose (Bloxham) Barr on the legal side of Rubber Ducky Labs! But it was also exciting to get a chance to dive back into the founding story, the product, and how, almost 15 years after learning about recommender systems, I’m still excited about using that tech to solve problems for consumer companies. Thanks for capturing our conversation!

  • View Alexandra Johnson's profile

    Founder and CEO of Rubber Ducky Labs (YC W23)

    How do I get started learning about recommender systems? Glad you asked! Here's a dump of a few articles that John McDonnell and I pulled out the last time we had the chance to talk about this.

    • First, this oldie-but-goodie Google paper that produced the (by now) well-known image attached to this post, reminding us that machine learning is more than just model building: https://bit.ly/googlepager
    • As a software engineer, I love this blog post from NVIDIA because it lays out the systems architecture for a recommender system: https://lnkd.in/gh_Kz3BK
    • If you already understand a bit about evaluating machine learning models, learning about NDCG and other information retrieval metrics helps you translate from ML to RecSys: https://lnkd.in/ghdsgkUb
    • This paper from 2003 is the OG in RecSys algorithms: https://lnkd.in/gA7h3ZgU
    • Similar topics are covered in this fast.ai collaborative filtering tutorial: https://lnkd.in/gctzuRUz

    What resources do you keep returning to? Let me know in the comments!

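    To make the NDCG pointer above concrete, here is a minimal NDCG@k implementation in Python (a standard textbook formulation, not taken from any of the linked articles):

    ```python
    # NDCG@k: discounted gain of the ranking you produced, normalized by
    # the gain of the ideal ordering of the same relevance grades.
    import math


    def dcg(rels: list[float], k: int) -> float:
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))


    def ndcg(rels: list[float], k: int) -> float:
        ideal = dcg(sorted(rels, reverse=True), k)
        return dcg(rels, k) / ideal if ideal > 0 else 0.0


    # Example: the most relevant item was ranked third instead of first.
    print(ndcg([0, 0, 3, 1], k=4))  # ~0.53, well below a perfect 1.0
    ```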
  • Rubber Ducky Labs (YC W23) reposted this

    View Alexandra Johnson's profile

    Founder and CEO of Rubber Ducky Labs (YC W23)

    Do you know how long you'll need to run an A/B test to reach statistical significance on your latest recommender system improvement? I'll tell you how to figure it out!

    If you've been following along with the last two days of pre-test planning checklist posts, then you have a great idea of 1) what metric you're tracking, and 2) how much lift you want to see. Plug the values from those two questions into an A/B test calculator, like the one in Evan's Awesome A/B Test Tools (leave alpha and beta at their default values). This gives you a sample size per variation. (A/B Test Calculator: https://lnkd.in/gXh4ggfA)

    The definition of your metric tells you what a "sample" is: a user, a session, a search query. Assuming you're running a standard A/B test with one control and one test variant, you need to observe two times that sample size in users / sessions / queries before you declare victory. So go into your analytics tool and see how many calendar days it took you to gather two times the sample size in the past. That's how many days you should plan to run your test for.

    Are you upset with your answer? Will it take more than two weeks to reach significance? Read our blog post to learn what to do about it: https://lnkd.in/g-XX-tSi

    Want to work with the experts? DM me to talk about leveling up your recommender system!

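    For anyone who prefers code to a web calculator, here is a sketch of the same calculation using the standard two-proportion sample size formula, with the usual defaults (alpha = 0.05, power = 0.8) that such calculators also use; the traffic figure is a made-up example:

    ```python
    # Sample size per variant for detecting a lift from p_base to p_target,
    # then calendar days assuming a standard two-variant (A/B) split.
    import math

    from scipy.stats import norm


    def sample_size_per_variant(p_base, p_target, alpha=0.05, power=0.8):
        z_alpha = norm.ppf(1 - alpha / 2)
        z_beta = norm.ppf(power)
        variance = p_base * (1 - p_base) + p_target * (1 - p_target)
        return math.ceil((z_alpha + z_beta) ** 2 * variance
                         / (p_target - p_base) ** 2)


    # Example: 5% baseline conversion, hoping to detect a lift to 5.5%.
    n = sample_size_per_variant(0.05, 0.055)   # ~31,000 per variant
    daily_samples = 4_000  # hypothetical: users/sessions observed per day
    print(f"{n} per variant -> run ~{math.ceil(2 * n / daily_samples)} days")
    ```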
  • Rubber Ducky Labs (YC W23) reposted this

    View Alexandra Johnson's profile

    Founder and CEO of Rubber Ducky Labs (YC W23)

    Yesterday, today, and tomorrow, I'm sharing how to successfully plan A/B tests, with a focus on ML. Let's dive in with question two of the three A/B test pre-planning questions.

    What's the minimum win I need on this metric to make it worth investing in launching this experiment to production? 1%? 5%? 20%?

    From our experience in ML, we've seen teams launch A/B tests on prototypes that would require a significant productionisation effort. When the test doesn't achieve a big enough win to get leadership buy-in on actually doing the production work, the team either has to maintain the shitty prototype infrastructure or watch the project die on the backlog. In larger cross-company efforts, we've seen changes to recommender systems boost engagement but lower revenue ever so slightly. You don't want to deal with an angry ads executive after finishing a test, and you don't have to! Negotiate with your team and the teams around you BEFORE launching a test. Ask: are we OK with incurring X cost for Y win?

    Follow along for the next post tomorrow, or jump ahead to the blog: https://lnkd.in/g-XX-tSi

    Want to work with the experts? DM me to talk about leveling up your recommender system!

  • Rubber Ducky Labs (YC W23) reposted this

    View Alexandra Johnson's profile

    Founder and CEO of Rubber Ducky Labs (YC W23)

    Over the next three days, I'll show you how to plan A/B tests. Before running a test, there are three key questions you should answer. Today, we'll start with the first one.

    What is the exact metric I want to move, and what is its current value?

    We've talked with multiple teams who cannot measure their primary test metric split out by A/B test variant. Any number of reasons, from inconsistent metric definitions, to missing allocation assignments, to "was it revenue per user or revenue per session?", can lead to confusion. We've also seen misalignment between PMs and leadership, or across teams, where two groups disagree on which metric a team should target. This lack of clear goals usually leads to post-test analysis paralysis: unable to launch, yet unable to iterate. Go into your database and calculate your control metric and your test metric, or build a dashboard in your analytics tool of choice, BEFORE you launch a test.

    Follow along for the next post tomorrow, or jump ahead to the blog: https://lnkd.in/g-XX-tSi

    Want to work with the experts? DM me to talk about leveling up your recommender system!

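    As a sketch of that pre-launch check, here is how you might compute "revenue per user" split by variant with pandas; the file and column names are hypothetical, so map them onto your own schema:

    ```python
    # Compute the exact test metric per variant BEFORE launching.
    import pandas as pd

    events = pd.read_csv("ab_test_events.csv")  # hypothetical event export

    # "Revenue per user": aggregate to one row per user first, so the unit
    # of analysis matches the unit of randomization (user, not session).
    per_user = events.groupby(["variant", "user_id"])["revenue"].sum()
    metric = per_user.groupby(level="variant").mean()
    print(metric)  # current value for control; the baseline for your test
    ```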
  • View Alexandra Johnson's profile

    Founder and CEO of Rubber Ducky Labs (YC W23)

    Founders, software engineers, and product managers: you're probably running A/B tests wrong. I've been there! My CS degree didn't teach me how to properly plan and run A/B tests. Sure, I could get the infrastructure right, create feature flags, and allocate traffic. But I didn't understand basic concepts, like planning out how many weeks a test would run for. Rubber Ducky Labs (YC W23) design partners need to run A/B tests to evaluate recommender systems, so I've gotten to learn a lot about what to do -- and what not to do! This is the guide that I wish I had a decade ago when I was building recommender systems in fashion tech: https://lnkd.in/g-XX-tSi Want to see how this applies to your team? DMs open!

Similar pages

View jobs

Funding