Our randomized controlled trial journey

As companies and organizations grow and ossify, they tend to take fewer risks and make decisions more slowly. There's a logic to that—decisions that larger organizations make tend to affect more people's lives.

That said, from the outside it's often mysterious why certain things take so long. Is the problem internal to the company or a product of external circumstances? Startups bring clarity to this analysis. If a startup tries something and it takes a really, really long time, the reason is (hopefully) not that decision-making at the company is too slow.

Let's get specific. When I co-founded Once, it seemed obvious to me that the best way out of the gate to prove mission alignment, demonstrate impact, and ultimately build a strong case for sales was to run a randomized controlled trial. It seemed problematic, and somewhat mysterious, that larger education companies and organizations waited so long to publish results of rigorous randomized controlled trials.

Since large companies and organizations run so few randomized controlled trials on their products, there's just not a lot of rigorous education product research published. (To be clear, there is a lot of rigorous education research produced in universities across the world each year, just not about the products used in classrooms.) The most famous case in point these days of a large education organization not conducting rigorous research on its own products is Lucy Calkins' work, which was immortalized (in a bad way) by Emily Hanford's popular podcast Sold a Story. There just wasn't efficacy research behind what had for so long been the dominant approach, curriculum, and materials for teaching US students to read. Those were the materials I taught middle school English with in the NYC Department of Education back in the day. Those were the materials everyone I knew taught with.

I had assumptions about why rigorous education product research wasn't happening, particularly at larger companies.

First, it's expensive. A randomized controlled trial costs hundreds of thousands of dollars. That's money that could otherwise go into product development or customer support or sales. But that didn't feel persuasive. An arm's-length relationship between the product company and the researcher seems preferable to a company spending its own resources on this research. All things being equal, wouldn't we prefer that a company isn't paying the third-party researcher running its randomized controlled trial? The good news is that there are grant-funded researchers at universities looking to run randomized controlled trials at no cost to the product developer. We've been fortunate to work with Susanna Loeb's fantastic team at the National Student Support Accelerator for the last two years.

Second, the exact effect the researchers will find in the randomized controlled trial is uncertain. It may be zero. It could be negative. A lot of curricula and software products have barely discernible effects, often due to variation in implementation. Education is hard. If you're the VP of marketing at a for-profit or non-profit organization, why would you sign up for that headache? Well, if you're Goliath it's not rational to take that risk. But if you're David, there's less to lose.

Third, "the customers [school districts] don't care." "Research is a check box for district purchases," etc. This feels chicken-and-egg to me. District leaders want to promote student learning; that's their job. If they were given persuasive research showing that option A is a slam dunk and option B will have little to no effect, they'd choose option A. The problem is that there just aren't many slam dunks in the product research out there. District leaders have to read through pages and pages of disclaimers qualifying minimal effects, and when you do that for long enough, research really does become a check box.

I wrote several posts about these problems on LinkedIn. I pointed out that in healthcare, an industry that, like education, has high stakes and opaque financial processes, regulator-reviewed randomized controlled trials are a requirement for bringing a product to market. Pharma's exorbitant costs notwithstanding, R&D investment in healthcare over the last 50 years is orders of magnitude greater than in education, and its impacts are far greater than what we've seen in education, at least if you use NAEP as education's measuring stick. While I won't claim that the lack of rigor in education product research is the only, or even the primary, cause of these disparities, it does seem like a cause, and something a startup, unburdened by layers of risk-averse management, could tackle.

Wait a second, Jason Becker wrote back on those posts. Education is psychology; it's not biology like medicine. What works in one place might not work in the next. Results won't scale the way we want them to, so randomized controlled trials aren't the measuring stick we believe them to be. A related objection, one that's actually optimistic about products, points out that product development isn't static. If you plan a product in year 1, develop and optimize it in year 2, run a randomized controlled trial in year 3, and get the results back in year 4, you're looking a long way back in the rearview mirror. That's why we don't buy computers with four-year-old specs, even if those specs were impressive four years ago.

All that being said, it seemed reasonable to expect that a good intervention would produce a strong effect: maybe not the strongest effect in every implementation, but a strong effect on average, one you could optimize over time.

So we partnered with Susanna Loeb's team to run our first randomized controlled trial in a large urban district. Susanna's team is the best. It's a privilege to work with all of them. Accelerate made that work possible by funding the implementation in a way that we could not have otherwise done as an early-stage startup. We believed that 150 randomly selected students in treatment and 150 randomly selected students in control would be sufficient to power the study.
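
As a rough sanity check on a sample size like that, here is a back-of-the-envelope power calculation. This is a sketch, not the research team's actual analysis: the simple two-arm normal approximation below ignores clustering, attrition, and baseline covariates, all of which the real study would account for.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(n_per_arm, alpha=0.05, power=0.80):
    """Smallest standardized effect size (Cohen's d) a two-arm trial with
    n_per_arm students per group can reliably detect, using the standard
    normal approximation (no clustering, no covariate adjustment)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * sqrt(2 / n_per_arm)


# With 150 treatment and 150 control students, effects smaller than
# roughly d = 0.32 would likely go undetected.
print(round(minimum_detectable_effect(150), 2))
```

In other words, 150 per arm is enough to detect a strong effect, but a modest one, the kind many education products produce, could easily come back as "not statistically significant."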

It quickly became clear to me that random selection is an ethical quagmire in districts. If you randomly select a student in the 50th percentile for treatment and a student in the 13th percentile as a control, and your intervention is effective, you are purposefully not serving your neediest students. Utilitarianism aside, you are now asking a district leader to do, for a year, the opposite of their job, which is to promote student learning, particularly for the students who need it most. So we agreed with the district that we would select treatment and control students from a pool of students who initially tested below benchmark on beginning-of-year diagnostics. Unfortunately, at the individual student level, kindergarten diagnostics are imprecise, so as the randomized controlled trial begins, you still have school staff looking at the students being treated versus the students who aren't and asking: how were those students chosen?
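
The agreement we reached amounts to a two-step procedure: screen first, then randomize only within the below-benchmark pool. The sketch below is a hypothetical illustration; the field names, the cutoff score, and the even split are my assumptions, not the study's actual protocol.

```python
import random


def randomize_from_pool(students, benchmark=30, seed=1):
    """Restrict randomization to students scoring below benchmark on the
    beginning-of-year diagnostic, then split that pool at random into
    treatment and control halves."""
    pool = [s for s in students if s["diagnostic"] < benchmark]
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]  # (treatment, control)


students = [{"name": f"s{i}", "diagnostic": score}
            for i, score in enumerate([12, 45, 28, 51, 19, 33, 8, 22])]
treatment, control = randomize_from_pool(students)
# Only below-benchmark students (scores 12, 28, 19, 8, 22) are randomized;
# above-benchmark students simply stay outside the study.
```

The design guarantees that every control student is also below benchmark, but as the paragraph above notes, an imprecise diagnostic means school staff can still look at two similar-seeming kids on either side of the coin flip and wonder why one is served and one isn't.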

If Once were a pure software company and certain people at a school participating in a randomized controlled trial looked askance at which students were getting treatment and which ones weren't, we might not have the smoothest implementation, but it wouldn't be a dealbreaker. But we're not a pure software company. At Once, we train school support staff (paraprofessionals, aides, assistants, interventionists) to provide daily one-on-one reading instruction to kindergarten students. If the school isn't 100% on board with which students are being served, those staff members will be frequently repurposed as substitute teachers covering other classrooms or pulled out for other responsibilities. That first randomized controlled trial didn't achieve statistical significance in part because students who were served received only a small fraction of the instructional sessions they should have. We can split hairs and point out that the effect size per day of instruction was actually on par with what we'd expect from high-dosage tutoring, but it's less interesting to extrapolate what might have happened if students had received a full year of instruction than to actually run a study on kids who receive a full year of instruction.

So this past year we redesigned the randomized controlled trial to randomize by school and by class instead of by student. I'm very curious to see the results as they come back, but I don't know that this study redesign solved all of the logistical problems of the first one. At the beginning of the year, inspired by the vision of having kindergarten students well above grade level by the end of the year, school leaders want to implement Once with as many students as possible. The researchers take those plans and run the randomization. Then the year begins and a teacher leaves and a paraprofessional needs to cover their class. The students that paraprofessional was teaching through Once are designated as treatment, but aren't actually receiving Once, which muddies the results...
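
A cluster design like this randomizes whole classes rather than individual students, and, crucially for the muddying problem above, the standard analysis is intention-to-treat: a student in a treatment-designated class counts as treated even if staffing changes mean they never receive a session. A minimal sketch, with made-up class names (not our actual randomization code):

```python
import random


def randomize_by_class(class_names, seed=7):
    """Assign whole classes to treatment or control; every student in a
    class inherits that class's assignment (cluster randomization)."""
    rng = random.Random(seed)
    shuffled = sorted(class_names)  # canonical order, then shuffle
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    assignment = {c: "treatment" for c in shuffled[:half]}
    assignment.update({c: "control" for c in shuffled[half:]})
    return assignment


assignment = randomize_by_class(["K-A", "K-B", "K-C", "K-D"])
# Under intention-to-treat, students in a treatment class are analyzed as
# treated even if their paraprofessional gets pulled to cover another
# classroom all year -- which is exactly why low implementation fidelity
# drags the measured effect toward zero.
```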

The advice I've gotten from many corners is to stop focusing on randomized controlled trials if they're so hard to implement in our context. If we had taken that path I probably wouldn't have written this post and wouldn't have put Stanford's first randomized controlled trial write-up on our website. But I'm writing this post, and I'm committed to randomized controlled trials (and sharing their results) for three reasons.

First, other results we've seen this year are incredibly strong. Last month, Dr. Rachel Schechter at LXD Research published a study, using i-Ready data, of three schools that delivered Once to every kindergarten student (168 students): those kindergarteners' median percentile rank at the end of this school year was 29 points higher than it was at the beginning of the year.

I'll say that again. At the beginning of the year, the students' median rank nationally was at the 38th percentile (i.e., they scored worse than 62% of the nation’s kindergarteners), but, by the end of the year, their median rank was at the 67th percentile (i.e., better than all but 33% of the nation’s kindergarteners). If these students had made a year’s worth of growth (which is sadly not the norm for so many American students), their percentile rank would have remained at the 38th percentile. Instead, these students accelerated well past their national peer group.

Second, I believe K-12 education deserves some knock-your-socks-off randomized controlled trial results, and I want to serve them up. Yes, education is more psychology than biology (harder to standardize, harder to scale), but education's impact on people's lives is no less than healthcare's, and the two are often directly related through complex causal relationships.

Finally, I think it's riskier not to run randomized controlled trials than it is to run them. If you run a randomized controlled trial that doesn't hit statistical significance you gain information you can use to design a better trial and build a better product. If you don't run randomized controlled trials you risk building a large company or organization that has lots of happy adult customers but serves students only minimally or not at all. We've all seen that happen in literacy. Personally, I don't want to be part of another chapter of that.

If you're as fired up about this topic as I am, here's how you can help. We've got an entire school year to plan the next phase of our randomized controlled trial. We'll need willing district and school partners. If that's you, please let me know. If that's not you, it's probably someone in your network. Please share with them. And we'll need financial support for this next phase of research, which I predict will be more costly but will ultimately yield results worth the investment. Again, if that's you, please let me know. And again, if that's not you, it's probably someone in your network. Please share!

Doug Roberts

Founder/CEO at Institute for Education Innovation

1mo

Interested. So what's in it for the districts? We might be interested to help. Please email me [email protected].

Andrew Poggio

EdTech Leader | Researcher | Consultant

2mo

What an inspiring and candid account of your journey w/ RCTs! It’s clear that your commitment to rigorous research and transparency is setting a high standard in the industry. The challenges you’ve highlighted resonate deeply—especially the ethical complexities and logistical hurdles that often make RCTs so elusive in education. While RCTs undoubtedly provide a "gold standard" of evidence, after 20+ years of pursuing this work I find myself leaning heavily on the importance of evaluating programs under less-than-ideal, real-world conditions these days. After all, in the day-to-day of education, variables are rarely controlled, and outcomes are influenced by a myriad of factors. Perhaps there’s equal value in understanding how products perform in these natural, messy environments where educators and students operate. It’s not just about proving what works in theory, but ensuring it works in practice—across diverse contexts and challenges. Your work is a crucial part of any evidence portfolio, and I applaud your dedication to pushing the envelope. At the end of the day, blending the rigor of RCTs with practical, on-the-ground evaluations might just be the comprehensive approach we need to truly drive impactful change in education.

Michael Rutkowski

VP Marketing I Leadership I EdTech I Startup to Post-IPO

2mo

Ben Gibbs Jacob Wixom check this out. Think you'll find it interesting.

Allison Maudlin, MBA

Marketing and Strategic Partnerships in EdTech

3mo

This is a must read for districts and edtech vendors looking to partner on RCTs. This is hard. Like, really hard. I've managed a few studies and there were hurdles and bumps every step of the way. Even when we tried giving the product away for free, some districts couldn't navigate the operational processes that RCTs require. And that is completely understandable given the lack of resources and staff districts are dealing with every day. As a marketing leader, as the article points out, it's a risky and expensive endeavor without guaranteed results. Good results are gold. Bad results... well... we don't talk about those. There are excellent research partners that make the processes easier (looking at you LearnPlatform by Instructure) but at the end of the day, it's the relationship between the district and the vendor that makes the difference. This process requires months and months of planning. I wouldn't try to do an RCT without six months of planning, minimum. As the article says, if you want to do an RCT, start planning now for Fall 2025; you and the district need the time.

Karim Kuperhause

Ex-classroom teacher, currently VP of Growth for Hoot Reading. On a mission to change children's lives through literacy.

3mo
