A brief history of Data at Coinbase (a small step towards the Web3 data era following Data 3.0)

[Disclaimer] The present document has been adapted from an initial version previously shared exclusively within Coinbase. In order to ensure its suitability for wider consumption, all company internal links and confidential information have been excluded. Additionally, I have incorporated some previously published external content, as well as relevant elements that I believe may be of value to the general public.


Preface

I joined Coinbase as VP of Data in Sep 2018 (after leaving LinkedIn as the Global Head of Data science) at the beginning of a crypto winter, with the aim of “leveraging existing technologies like machine learning and AI, as well as creating data innovations for emerging blockchain use cases to keep up with the ever-changing industry landscape. I look forward to advancing the company’s leadership position in the crypto industry through the power of data and will share key learnings along the way” (full post on Medium). I think in general we have done a reasonable job for the past four plus years. If you are interested in how we got here, here’s the timeline and major events:


2018 - Creation of the Data 3.0 vision, a few early key decisions, and ramping up experimentation as the first focus


Snowflake - Upon joining Coinbase, one of my first actions was to secure a contract with Snowflake as our centralized data warehouse platform, replacing another provider. Snowflake offered the innovative concept of cloud-based data compute optimization, with a seasoned team and state-of-the-art technology that I had been familiar with for some time. Under the leadership of our Data platform team, we migrated seamlessly within a mere two weeks. The outcome was a stark improvement in both performance (e.g. query speed and failure rate) and cost-efficiency, a rare occurrence in the industry, i.e. “getting both sides of the coin”. This served as a precursor to the numerous achievements we would later attain at Coinbase.


Looker - Very soon after, I was tasked with selecting a Business Intelligence (BI) platform. I decided to go with Looker because of its advanced technical capabilities, such as GitHub sync, and the ease of migrating to another platform in the future if needed (a rare capability for a BI platform back then). While this decision was met with some controversy, I firmly believed that it would alleviate future difficulties.


Understand Causation vs. Correlation: A/B test/Experimentation - Another thing I noticed immediately after arriving at Coinbase was a lack of standardization in data analysis and its application, which led to inconsistent quality in decision-making. Recognizing the crucial role this played in our business growth, I initiated a "Bi-weekly experimentation review" to educate our PMs and EMs in the basics of experimentation and to review key product launch decisions. Additionally, I formed a team of three (one full-time, two part-time) to develop a basic experimentation platform named CIFER (i.e. our Causal Inference Framework), which we launched as an MVP in a matter of months. This allowed us to consistently measure the impact of product experiments and make informed decisions, resulting in enhanced user experience and improved business outcomes. The platform was highly successful, becoming the most visited go link and remaining in the top spot for years.
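To make the idea of measuring experiment impact concrete, here is a minimal sketch of the kind of check an experimentation review might look at: a two-proportion z-test comparing conversion rates between a control and a treatment group. This is a generic textbook method, not a description of how CIFER works; the function name and the sample numbers are hypothetical and chosen only for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (a) and treatment (b).

    Returns (absolute lift, z statistic, two-sided p-value).
    Hypothetical helper for illustration; not CIFER's actual method.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0% or 100%")
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Made-up example: 10.0% vs 11.0% conversion with 10,000 users per arm.
lift, z, p_value = two_proportion_z_test(1000, 10000, 1100, 10000)
print(f"lift={lift:.3f}, z={z:.2f}, p={p_value:.3f}")
```

A randomized split is what lets a result like this be read as causal rather than as a correlation; the test itself only says whether the observed difference is larger than chance would plausibly explain.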


Data certified dashboards - Along with the efforts on the experimentation side, we also needed a solid set of metrics to monitor our business health on a daily basis and to understand the drivers of large changes. So we started building a list of what we call “Data certified” dashboards, meaning that they are actively maintained and their data quality is checked so stakeholders can read and use the information with confidence (it’s a lot harder than it may look). It was a jump-start effort to bring focus to a small set of dashboards that we all look at, versus thousands of dashboards that people must search through while guessing which may be reliable. It quickly brought clarity to a chaotic information landscape.


Birth of the Data 3.0 strategy - With the series of decisions and efforts above, I had the opportunity to look across every aspect of the data stack at Coinbase with first-hand experience. After a few iterations, the Data 3.0 strategy came to life, and it was presented to and signed off by the leadership team in November. The vision is to “Make Data a competitive advantage unique for Coinbase to win in the crypto economy”, via the strategy to build cohesive and efficient data systems working seamlessly together to enable “scaling impact through automation and intelligence”. This has guided all our efforts around data in the years since.


2019 - Demonstrate the power of data, build the new foundation and architecture, and ML's major debut at Coinbase

2019 soon came. Building on all the efforts from 2018 (looking back now, it’s amazing how much we were able to achieve in less than four months) and with renewed determination, we set our sights on new opportunities to further enhance the value of our business.


How to communicate business results with data - In our efforts to establish a solid foundation for data analysis and interpretation, we implemented a comprehensive approach that included experimentation, certified dashboards, and a set of common principles for data communication. While the technical aspect of data management was relatively straightforward, we faced a more challenging task in fostering a culture of data-driven decision-making. To achieve this, we created a playbook of principles and examples, and provided live training sessions for product and marketing teams. This approach was met with great success, as evidenced by the rigorous use of data in decision-making throughout the company and external recognition from Brian’s tweet.

Fighting Fraud with ML - Combating fraud is a crucial challenge in the crypto industry, as the high volume of money involved attracts fraudulent activity. Our fraud team had been relying on manual efforts and heuristic-based rules, but it was overwhelmed and unable to keep pace with the increase in trading volume and the influx of fraudsters on our platform. I was brought on as the Head of Payments Risk (in addition to my “normal” job as VP of Data), drawing on my previous experience in fraud prevention and mitigation at eBay. My primary objective was to stabilize the team and introduce more rigor to our operations. I recognized that the key to success was a machine learning model that could detect and prevent fraud in real time. I set this as a priority and convinced our only ML engineer, who would go on to evolve fraud ML at Coinbase into state-of-the-art (see this blog), to focus all his efforts on developing our first fraud ML model. Through tireless efforts, we launched the model by the end of Q1, which resulted not only in a steady decrease in fraud loss but also in an increase in traders and revenue on the platform (another example of “getting both sides of the coin”). This success put us on par with leading fintech companies such as PayPal and Cash App in terms of fraud loss rate, while requiring significantly fewer resources. Additionally, the project led to the creation of Nostradamus, Coinbase's in-house machine learning platform (the whole effort took only 1.5 engineers), which has since been expanded to serve all ML models at Coinbase. This effort also led to the creation of FAL.ai, an AI startup that helps you easily create ML models at scale.


Amping up Data Science practices - On the data science front, we implemented several initiatives that greatly improved our “craft”. One such initiative was the data semantics layer, which established a standardized set of YAML files to define all business and experimentation metrics, including ML features (see the blog later published here). This removed a significant amount of operational burden from data scientists, allowing them to focus on more strategic insights that drive the business forward. Additionally, this standardization resulted in timely delivery of critical business insights to our leadership team and across key product operations. It also eliminated discrepancies from duplicated or similarly defined metrics. As a result of this initiative, we were able to conduct in-depth analysis of our product ecosystem, providing data-driven recommendations to improve user experience and grow the business (e.g. Recurring buys, Notifications, and Premium subscriptions). This analysis later became a tradition for our data science team, providing our leadership team with a renewed high-level view of our product landscape and influencing critical product strategy decisions. On the experimentation front, we continued our bi-weekly experimentation review series and incorporated educational content on experimentation into our engineering bootcamp, ensuring that every engineer learns how to set up, run, and conclude an experiment as part of their onboarding.
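As a rough illustration of why a semantics layer removes duplicated metric logic, the sketch below defines each metric once and generates its query from that single definition. Everything here is an assumption made for illustration: the field names, metric names, table name, and the `compile_metric` helper are hypothetical and do not reflect Coinbase's actual YAML schema. The definitions are written as the Python dict a YAML parser would produce, so the example runs without third-party packages.

```python
# Hypothetical metric definitions, shown as the dict a YAML file would parse into.
# All field, table, and metric names are illustrative assumptions.
METRICS = {
    "weekly_active_traders": {
        "table": "trades",
        "aggregation": "count_distinct",
        "measure": "user_id",
        "filters": ["status = 'completed'"],
    },
    "trading_volume_usd": {
        "table": "trades",
        "aggregation": "sum",
        "measure": "amount_usd",
        "filters": [],
    },
}

# Supported aggregations mapped to SQL expression templates.
AGGREGATIONS = {
    "count_distinct": "COUNT(DISTINCT {col})",
    "sum": "SUM({col})",
}


def compile_metric(name, metrics=METRICS):
    """Turn one metric definition into a SQL string (hypothetical helper)."""
    spec = metrics[name]
    expr = AGGREGATIONS[spec["aggregation"]].format(col=spec["measure"])
    sql = f"SELECT {expr} AS {name} FROM {spec['table']}"
    if spec["filters"]:
        sql += " WHERE " + " AND ".join(spec["filters"])
    return sql


print(compile_metric("weekly_active_traders"))
print(compile_metric("trading_volume_usd"))
```

Because dashboards, experiment readouts, and ML features would all call the same definition, two teams cannot end up with slightly different versions of the "same" metric.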


2020 - Transform our game from defense to offense on data

Revamping data infrastructure - As we entered the 2020 planning phase, I recognized that it was time to shift our focus from defense to offense and to invest in revamping our data infrastructure to be ready for the future. I initiated Project Renaissance, which involved cleaning up technical debt and optimizing our systems, as well as building a foundation for streaming capabilities using the Kafka ecosystem. It was challenging to gain support for this project, as many believed that other solutions would suffice and the concept of near real-time data services was not yet well known at Coinbase. However, we started the project with a small team and were able to create an MVP within a quarter, demonstrating its potential. As we continued to develop and implement more use cases, particularly in the area of machine learning, the value of the project became increasingly apparent. Today, we have a robust ecosystem of Kafka services at Coinbase that are widely used and deeply integrated into many services, yet often go unnoticed, a testament to their smooth and seamless operation (like it or not, it’s almost always a good sign when a key piece of infrastructure is used all the time without being talked about).


Created an ML brand for Coinbase - The application of machine learning in the Web3 world was a relatively new and uncharted territory (I’d say it is the case even today), and Coinbase was among the first to implement it in critical production usage, specifically in our fraud models, with great success. We continued to expand our use of ML across various internal and external product surfaces, including notifications and marketing campaigns. In order to attract more talent and establish ourselves as a leader in the field, we began to create an ML brand for Coinbase, including a keynote speech at KDD, one of the most prestigious conferences for ML, and publishing a technical blog detailing our efforts to build state-of-the-art ML technology with efficient execution.


Continued to amplify a data culture at Coinbase - Cultivating a strong data culture takes time and consistent investment, and we made efforts on several fronts to continue to strengthen the data culture at Coinbase. We created a "Data Day" as part of the Engineering bootcamp during engineering onboarding to cover various technical topics, and launched analytics training for product managers that focused on consuming data in the right way and self-service. Additionally, we introduced the concept of "Core Metrics" to help leaders focus on the most critical aspects of Coinbase's businesses. We officially launched the "Core Metrics Monthly Meeting" series, which brought together PED leaders and executives to discuss key business trends, major drivers, and action items to facilitate timely decision making and effective collaboration across teams. This greatly improved the operational rigor for PED teams and became a standard practice in running our businesses.


2021 - Fruition comes, along with upgrades needed as a public company

Data Governance: 2021 marked a significant milestone for Coinbase as we embarked on our journey as a publicly traded company. This event, undoubtedly one of great magnitude, required significant preparation and adjustments to ensure compliance with the various regulations and requirements that come with being a publicly traded company. One of the critical tasks that needed to be addressed was the implementation of robust controls for the management and access of Material Non-Public Information (MNPI) data. Despite the tight timeframe and the lack of prior planning, our team of experts from various departments, including Data, Legal, Compliance, Security, Finance, and Engineering, came together to develop a comprehensive strategy, establish clear milestones, and execute efficiently. The result was the successful rollout of our MNPI policy and controls in time for our public debut, enabling us to confidently navigate the complexities of being a publicly traded company.


Deep Bull Run Insights: As part of our ongoing efforts to shift our focus from defensive to offensive strategies in the realm of data, the Data Science team has undertaken a comprehensive research initiative called "Deep Bull Run Insights". This research delves into the key aspects of the recent bull run that began at the end of 2020, which reached its peak during Coinbase's public debut in 2021, and compares it to previous bull runs. The findings of this study are not only highly informative but also fascinating.


New chapter for ML - smart notifications: On the machine learning front, we achieved a notable breakthrough in optimizing the user experience with ML. By leveraging a machine learning-based intelligence engine, we were able to introduce price alerts for our members that are both timely and unobtrusive, a capability that sets us apart as the only crypto company to have this feature. This accomplishment was made possible through the close collaboration between the ML team and the Growth team and resulted in a significant business impact for Coinbase, while also providing a superior user experience. Additionally, our efforts in fraud detection have continued to mature and advance, with the addition of several new models such as ATO and WBL. These models have played a crucial role in establishing a competitive advantage (“moat”) for Coinbase in the crypto industry, and we remain a clear leader in this area.


Externalizing/decentralizing the Data 3.0 strategy - Lastly, after three years of successfully implementing it internally at Coinbase, we decided to share our Data 3.0 strategy with our industry peers. We published it on both the Coinbase blog and TechCrunch (which, for some reason, decided to put it behind a paywall without sharing any of the profit with us).


2022 - Cost optimization and building for the future

Data resources cost optimization - The year 2022 was a tumultuous one for the crypto and broader markets, as we are all aware. The crypto market, along with the broader economy, experienced a sharp decline in the latter half of the year, entering a prolonged period of difficulty. The rapidity of this downturn was unexpected and caught many by surprise. In response, we acted swiftly and were able to successfully pivot our strategy in order to mitigate the negative effects on our operations. We implemented cost-saving measures on Snowflake, while continuing to onboard new users to our platform without any negative impact on developer experience. Bluesky has been a great partner for our journey here.


Crypto data foundation (CDF) arrived - Ever since my first day at Coinbase, my goal has been to make data a fundamental aspect of the company's mission and strategy, enabling us to provide the most trusted and user-friendly crypto products and services. Through my involvement in various blockchain data projects, particularly Coinbase Analytics (now Tracer), we established the Crypto Data Foundation (CDF) at the end of 2021, with the initial presentation to executive leadership. Our CEO Brian quickly recognized the potential of this initiative and tasked us with collaborating with the Cloud team to bring it to fruition (which resulted in the launch of the Cloud Node product later that year). We quickly realized that this was a monumental undertaking that would require a phased approach, starting with the delivery of solid use cases in production before expanding our efforts more broadly. We focused our initial efforts on empowering the Coinbase NFT marketplace, with all data services provided by CDF. We also began partnering with various teams to address their needs for blockchain data services, which demonstrated improved performance and reduced costs. Gradually, it became clear that we should bring all blockchain data needs under the purview of CDF, regardless of the time it would take. This is a remarkable story of an innovative project that has transitioned from a venture to a strategic initiative with tangible impact in just a year. The driving force behind this success has been a strong motivation to pursue a great vision and a relentless focus on execution.


ML for every product - The year 2022 was also a seminal one for machine learning (ML) at Coinbase. We introduced CoinRecs, a personalization and recommendation engine, in our Feed and Notifications, and rolled out Coinbase Chat, powered by our own chatbot, across all platforms. ML has become a vital component in all of our major product offerings. We also launched the Web3 search MVP for Retail by the end of the year, and integrated our NFT collection floor price prediction into ChainLink's oracle to empower Web3 developers. This is just a small sample of our ML achievements in 2022. The overarching theme is that ML has become a mainstream aspect of Coinbase's products, and we have a distinct advantage over our competitors in the Web3 world as we continue to innovate and build upon the solid foundation we have established.


First company-level OKR for Data governance: Data deletion - Lastly, for the first time in Coinbase’s history, we had a company-level OKR from Data Governance around data deletion, set at the beginning of the year. I am pleased that we achieved this goal: all PED teams completed testing and launched their deletion workers in accordance with the data deletion guidance by December 23rd. This is a remarkable achievement for a crypto company, as we are currently the only one in the industry to have reached this level of data governance.


2023 - Setting industry standards and empowering the next phase of growth for Web3

Despite all the challenges that 2022 brought, I remain steadfast in my belief in the bright future of Web3. I believe that this is the perfect time to focus on building for the future, and the best way to do that is to double down on our efforts to empower the next phase of growth for Web3. From a data infrastructure perspective, we see ChainStack, "the data platform for Web3," becoming an industry standard within and outside of Coinbase (e.g. we have open sourced the code for Chainstorage, which is the infrastructure piece of ChainStack). I believe that this will become not only the technical standard for data, but also a vital component of Coinbase's business in the coming years. Together, we can overcome any obstacle and pave the way for a bright and successful future.


Postface

If you have read the doc up to this point, thank you! I want to wrap up by mentioning another very important point: I cannot overstate the importance of the talented individuals at Coinbase who have made all of our successes possible. We are truly blessed with "top talent for every seat", and it is this that continues to fill me with optimism and excitement for Coinbase and Web3 as a whole. The people I have had the pleasure of working with on a daily basis are the highlight of my time at Coinbase, and it is their contributions that bring me joy and fulfillment in coming to work each day. Thank you for taking the time to read this brief history of data at Coinbase. I am honored to have been a part of this journey, and I will continue to cheer you all on from the sidelines as you continue to make progress and achieve great things!

Yuan Yao

Sr. Merchant Risk Manager at Affirm

2 years ago

I learned a lot from this, thank you for sharing!

Frank Yang

VP of Machine Learning at Upwork

2 years ago

Michael, what an amazing journey! Resonated with so many key learnings you have shared. Data is core in shaping product development, strategic decision making, and a company culture! Thanks for sharing the (open) secret that a small but strong team can achieve so much and so fast! All the best with your next adventure!

Tushar Shanbhag

Co-founder & President/CPTO @ Relevvo | Product Executive, AI Pioneer, General Manager | Ex-LinkedIn, Microsoft, Cloudera, VMware

2 years ago

Wish you the best Michael Li !

Shanshan Liu

Making the World a Better Place

2 years ago

Cool! Excited to hear more about your next adventures!
