SCIENTIFIC METHOD IN CREATIVE BUSINESS: HOW NETFLIX APPLIES BIG DATA ACROSS BUSINESS VERTICALS
Photo by @ventiviews, Unsplash

This article reflects the growing importance of business analytics in the film and media industry and shows how big data applications have shaped business models as well as wider industry disruption. The discussion covers both the technical aspects of business analytics and broader reflections from media studies. It is aimed at management-level readers across the film and media industry, to build an understanding of big data applications and to demystify the notion of algorithms as absolute knowledge. It offers a refresher on the place of data in film and media and on the new industry cycle that technology has unlocked.

The context of the Netflix disruption

Since the dawn of the Lumière brothers’ Cinématographe, technological development has been closely tied to the development of the film and media industry, mainly through advances in the means of production and broadcasting. The uncertainty of film and media performance remained the key unconquered variable. The disruption brought by Netflix shifted the focus to data and its smart deployment. The early 2000s marked the crisis of linear television in the US (Lotz, 2014) and propelled many to turn to big data in an attempt to predict performance. The reliance on traditional film ratings, box office predictions, critics' reviews, TV ratings and metrics was enriched by marketing strategies that deployed sentiment analysis of social media to determine audience size and the corresponding investment in marketing efforts (Simon and Schroeder, 2019).

The revolution that Netflix brought about, in contrast to traditional media conglomerates, was its ability to institute this cognitive business approach. Television ratings relied on small sample sizes, often with a time lag, while Netflix gradually built its capacity to collect data on almost all subscribers’ viewing habits in near real-time (Netflix Technology, 2020). Furthermore, Netflix's algorithm differentiated itself by being post-demographic. This means it does not rely on demographic data but on the direct relationship with subscribers and the capacity to track all activity on the video streaming service (Wayne, 2022). In simple language, the type and number of devices you use to log in help Netflix predict your income bracket without you telling them. This Silicon Valley-influenced orientation towards data prompted discussion in both industry and academic media studies, which labelled Netflix as pursuing the “myth of big data” (Couldry, 2017) or characterized the company as the epitome of "algorithmic culture" (Hallinan and Striphas, 2016). The debate over the “Moneyball mindset”, where data-driven policies replace human creativity (van Es, 2022), has since expanded into many aspects of the public sphere and our contemporary life on the internet.

In summary, one side of the Netflix story is the approach to big data applications across company verticals, while the other reflects the data culture surrounding its brand. This constituted the demarcation line and launch into the streaming wars with dominant media conglomerates trying to catch up in technology. However, the focus here remains on the scope of big data in film and media in the last five years. Sources used here aim to generate a rich industry context, and while some of the technical solutions might have been upgraded in the past year, the overall approach to technology is what is more important. The analysis includes the history around the Netflix prize, a milestone in the company trajectory, as well as broader social, ethical and legal issues surrounding it.

What data does Netflix capture?

Suffice it to say that data is embraced in all decision-making processes across the company: from engineering to UX, to marketing and finance. Moreover, Netflix’s data operations are entirely cloud-based, with Amazon Web Services hosting the main data lake alongside dozens of data processing tools. Some of those solutions are very niche, like Druid, an open-source data store designed for real-time exploratory analytics on large data sets. It outperformed the widely known Hadoop in the speed of ingesting and exploring large quantities of transactional events and time series.

The Privacy Policy Statement indicates three types of data captured:

1) explicit data - factual data about the users necessary to deliver the service, such as email, home address, credit card number (named “information you have provided us with”),

2) implicit data - data that is automatically collected, including cookies, IP address, and shopping or browsing history. This group of data is the largest by volume, variety, velocity and veracity. It gradually replaced the role of explicit data in the improvement of algorithms. It is collectively referred to as "events" and extends to behavioural data such as browsing, minutes played, pause, rewind, rewatch etc. According to the Netflix Data and Science blog in 2020, the company was ingesting over 2 million events per second and querying over 1.5 trillion rows to produce a highly detailed insight into how users experience the service (Netflix Technology, 2020).

3) information from third-party partners, which include a long list of advertising agencies but, more importantly, partners among broadcasting and telecommunication companies and device manufacturers that help provide a seamless experience. This includes geo-location and device identifiers that track where the consumer is (on a plane, at home, in a hotel), how they access the service (internet provider, browser) and on what type of screen (laptop, tablet, TV, mobile). In return, this rich data has enabled the company and its partners to capture, measure and enhance performance at every step of the user experience, sometimes including the offline world.

Netflix data ecosystem and data volume, Blake Irvine, Tableau Conference 2018


Having said that, an analysis of big data applications takes into account not only algorithms and infrastructure but also human elements: how teams are organized, what principles are used in designing the organisation and management of big data, and ultimately what balance is struck between data outputs and human logic when making decisions.

How is the analytics ecosystem organized?

Data management and infrastructure would not be possible without the data culture. Netflix's analytic ecosystem is accessible to all employees. Visualization plays an important role in the high adoption rate of analytic tools by employees in non-data-related roles (Irvine, 2018). In 2019, approximately 10% of the company's employees worked in the Data Science and Engineering group, ingesting data and delivering solutions for all company verticals. For those fluent in technical language, Netflix Research and the Netflix Research blog provide an abundance of articles and insights with the aim of generating knowledge and attracting talent in machine learning and data science.

Business analytics are aligned with the following business verticals. The Netflix app (PRODUCT) delivers high-quality streaming and bespoke customer service. The catalogue of licensed and originally produced content constitutes the CONTENT vertical, separate from the STUDIO vertical, which covers in-house productions known as Netflix Originals. The MEMBERSHIP vertical aggregates acquisition marketing, sign-up flow, pricing, partnering with other companies, and messaging. In contrast, the MARKETING vertical covers corporate partnerships, consumer merchandising, traditional media outlets, and editorial content on digital platforms. Finally, the PLATFORM vertical embodies the engineering infrastructure that ensures efficient, secure and state-of-the-art use of tools and compute resources.

Under each of the verticals, there are multiple sub-segments and types of applications that are combined to provide input for data analysis. This cross-functional organizational structure around business analytics is enforced through the company culture that favours "context not control" (Netflix Technology Blog, 2020). Individual freedom to contribute to chosen projects is of high value. Employees are regarded as responsible domain experts empowered to direct project priority to those with the most business impact, echoing the legacy of kaizen.

The Netflix prize – the evolution of big data in product development

In the Silicon Valley paradigm, data is ubiquitous across the company, from the top down. Even in the early days, when DVD rentals were processed through the website, Reed Hastings was persistent in deploying A/B testing in product development, sign-up methods, payments, product changes, and messaging. This could be described as a mainly operational support system. However, there was also a tactical level that focused on capturing star ratings to feed the database and develop the predictive analysis that would later become the video recommendation algorithm.

It is important to note the historical evolution of Netflix's approach to big data in video recommendation to fully understand the scalability the company has acquired. The in-house video-recommendation algorithm Cinematch used a variant of Pearson's correlation between each movie and all other movies to determine a list of "similar" movies. Once a user provided ratings, a multivariate regression was computed in real time. Based on these correlations, a unique, personalized prediction was produced for each predictable movie (Bennett et al., 2009). The accuracy of the system was measured by the Root Mean Square Error (RMSE) statistic, which at the time stood at 0.9514. This could be interpreted as a moderate error given that ratings range from 1 to 5.
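
For readers who want to see the two statistics named above in concrete terms, here is a minimal sketch in Python. It is not Cinematch itself: the ratings are invented for illustration, the function names are hypothetical, and it uses plain Pearson correlation rather than the unspecified variant Netflix deployed.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant rating vector has no defined correlation
    return cov / math.sqrt(var_x * var_y)

def rmse(predicted, actual):
    """Root Mean Square Error between predicted and actual ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical star ratings (1 to 5) given by the same five users to two movies.
movie_a = [5, 4, 3, 2, 1]
movie_b = [4, 5, 3, 1, 2]
similarity = pearson(movie_a, movie_b)  # close to 1 means "similar" movies

# Hypothetical predicted ratings compared with what users actually gave.
error = rmse([4, 3, 5, 2], [5, 3, 4, 2])

print(f"similarity={similarity:.2f}, rmse={error:.4f}")
```

A correlation near 1 marks two titles as candidates for the "similar" list, while the RMSE expresses, in stars, how far predictions typically land from the ratings users actually give.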

The data-driven aura of the startup came into the spotlight of the machine learning and computer science community in 2006. The Netflix Prize of $1 million was announced for the first developer who could beat the existing customer rating predictions by lowering the RMSE statistic by more than 10%. The open competition released a dataset of 100,000,000 ratings for 17,770 movies given by 480,189 users. The dataset consisted of four variables: user ID, movie, date of rating and rating value in the range from 1 to 5. Users and movies were represented by integer IDs, and personal data was removed (Bennett et al., 2009). After several iterations, in 2009 the winning team produced an improved algorithm that combined hundreds of predictive models and beat Cinematch by 10.89%. However, Netflix decided not to deploy the algorithm, for several reasons. First, speculation arose about a violation of US Fair Trade laws and the legal proceedings that followed. Second, the improved algorithm did not improve the subscriber retention rate when put through A/B testing (Biddle, 2021). Third, by the time engineering was able to fit the algorithm into the existing pipeline, the company had dropped the 5-star rating, having realized the potential of capturing a much larger variety of behavioural data through the streaming media service launched in 2007.
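
To make the prize threshold tangible, the short sketch below works out what a 10% reduction of the Cinematch RMSE quoted earlier amounts to. The constant and function names are illustrative only, and the calculation assumes the improvement is measured relative to the 0.9514 baseline.

```python
CINEMATCH_RMSE = 0.9514       # baseline quoted in the text
REQUIRED_IMPROVEMENT = 0.10   # the 10% threshold for the prize

def relative_improvement(baseline, candidate):
    """Share by which a candidate RMSE undercuts the baseline RMSE."""
    return (baseline - candidate) / baseline

# The RMSE a competing model had to get below to qualify.
target_rmse = CINEMATCH_RMSE * (1 - REQUIRED_IMPROVEMENT)

print(f"target RMSE: {target_rmse:.4f}")
```

In other words, contestants had to shave roughly a tenth of a star off the typical prediction error, which helps explain why it took years and hundreds of combined models to get there.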

However controversial it was, the Netflix Prize generated traction among researchers and the data science community, which on the one hand increased the knowledge generated around big data predictive analytics and on the other supported the building of the Netflix brand as an innovative and data-driven company. This is an important element that benefited the company's marketing towards investors and its ability to attract talent, but it also prompted an abundance of critical media studies questioning the societal role of algorithms.


Photo by Marcus Spiske, Unsplash

Post Netflix Prize big data application

The key element of Netflix’s product, i.e., the application, is the user interface (UI) powered by the content recommendation algorithm. The key performance indicator (KPI) that all solutions are tested against is the member retention rate. While the company relied heavily on A/B testing in the development of the UI, the in-house recommendation algorithm Cinematch was eventually replaced by collaborative filtering algorithms, and star ratings were dropped thanks to the volume of implicit behavioural data that could be captured via streaming. A less-known fact is that, to deploy collaborative filtering, Netflix used manual tagging in an attempt to crack the “movie genome” (Biddle, 2021). The project included 30 “moviecologists” hired to tag various attributes of movies and TV shows, similar to the method deployed by Pandora's Music Genome Project. This demonstrates the complexity of developing big data applications, which had to be assisted by human direction, as a sort of Mechanical Turk, for the analysis to take place.

In the iterative process of improving its predictive model, Netflix also realized that age or gender did not contribute to improving the predictive models of audience taste. Instead, "supplemental information", which might include online and offline data, turned out to be of greater value to predictive models (Bellanova and González Fuster, 2018). Predictive analysis feeds on historical data. Thus, the larger the backlog of data on an individual user, the higher the success rate of the predictive analysis (Biddle, 2021).

This is a very important insight with regard to the criticism of Netflix’s scientific approach to the consumption of cultural content and its never-ceasing attempt to improve the accuracy of its predictions through big data. As stated in the introduction, Netflix did not invent this approach. Through this pursuit, the rate at which users chose a suggested title skyrocketed from 2% to 80% (Biddle, 2022), almost conquering the idiosyncratic nature of people's taste in film, a subject that even cognitive psychology researchers have struggled to confirm (Wallisch and Whritner, 2017).

At the same time, this reveals that the myth of big data was at times pursued even though the algorithms were not as sophisticated as they were marketed to be. In 2013, Netflix was awarded a technical Emmy for Personalized Recommendation Engines for Video Discovery. The only two other companies that have been awarded such a prize are YouTube and Amazon. This underlines the company’s determination and contribution to something radically innovative, even before big data capture and the volume of implicit data via streaming reached the extent we know today.

Big data application in the content vertical

A more significant disruption in film market relations happened when Netflix entered the production arena with original content. The direct relationship with the audience through data capture is a building block of Netflix’s ability to predict audience size. Consequently, it enhances the decision-making process that determines the right-sized expenditure on producing content. This created a contest with leading media companies like Disney and Time Warner, which worked with big data applications in their own right but were less agile than Netflix. Here are some examples of Netflix’s right-sizing with big data (Biddle, 2022): Stranger Things (predicted 100 million watchers, invested $500 million in the series), BoJack Horseman (predicted 20 million watchers, investment $100 million), Everest climbing documentaries (prediction one million members, investment $5 million in the genre).

Probably the most well-known case was the House of Cards TV series being commissioned without a pilot in 2013, which set a precedent in the US TV business. The historical data was enough to demonstrate the simple power of big data: the company identified a large overlapping population of members who had shown interest in films featuring Kevin Spacey, in the aesthetics of David Fincher's directorial style, and in rentals of the original British TV series House of Cards. The overlap in the Venn diagram of these populations represented enough market revenue for the company to outbid its competitor HBO for the rights to the TV series and to invest a further $100 million in the first two seasons.
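
The Venn diagram logic described here can be sketched with ordinary set operations. The member IDs and variable names below are invented for illustration; the real audiences would hold millions of entries, and nothing here reflects Netflix's actual figures or tooling.

```python
# Hypothetical member IDs for three audiences of interest.
spacey_fans = {1, 2, 3, 4, 5, 6}        # watched films featuring the lead actor
fincher_fans = {2, 3, 4, 6, 7, 8}       # watched films by the director
uk_series_renters = {3, 4, 6, 8, 9}     # rented the original British series

# The centre of the Venn diagram: members who belong to all three audiences.
core_audience = spacey_fans & fincher_fans & uk_series_renters

# Everyone who showed at least one of the three interests.
any_interest = spacey_fans | fincher_fans | uk_series_renters

overlap_share = len(core_audience) / len(any_interest)
print(sorted(core_audience), f"{overlap_share:.0%}")
```

The size of the intersection is the number a commissioning team would care about, since it approximates the audience most likely to watch the new series.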

This prescriptive use of data in the creation of new products goes beyond surfacing similar titles and predicting audience size, drawing on transfer learning, embedding representations, natural language processing, and supervised learning (Netflix Technology Blog, 2021). As already mentioned, A/B test experimentation is widely used across Netflix, but more importantly, it is the investment in infrastructure to support and scale this experimentation that makes it compatible with big data.
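
As an illustration of how an A/B test on member retention might be evaluated, the sketch below applies a standard two-proportion z-test. The choice of this test, the function name and the member counts are all assumptions made for the example; the sources cited here do not describe which statistical procedure Netflix uses.

```python
import math

def two_proportion_z_test(retained_a, total_a, retained_b, total_b):
    """Compare retention rates of control (A) and variant (B).

    Returns the z statistic and the two-sided p-value.
    """
    p_a = retained_a / total_a
    p_b = retained_b / total_b
    # Pooled retention rate under the hypothesis that A and B do not differ.
    pooled = (retained_a + retained_b) / (total_a + total_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 10,000 members per group,
# 90.0% retained in control versus 91.0% in the variant.
z_score, p_value = two_proportion_z_test(9000, 10000, 9100, 10000)
print(f"z={z_score:.2f}, p={p_value:.3f}")
```

A small p-value suggests the difference in retention is unlikely to be chance alone, which is the kind of evidence a product team would want before rolling a change out to all members.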

On the other hand, Netflix has often been criticized for a lack of transparency in market performance data by industry analysts, trade magazines and peers (Wayne, 2022). Until recent years, company policy was to explicitly refuse to disclose the number of subscribers by country or the methodology for counting how many views each film scored, in a form comparable to standard metrics such as Nielsen audience measurement and box office reports. In response to this pressure, the company introduced semi-transparent data, namely the “country top 10” feature list. This still does not disclose the number of actual views per title, which, according to some industry sources, is not even shared with the showrunners and creators of particular shows. For this reason, the real economic impact of the company's predictive analysis cannot be fully assessed. The global popularity of certain content and total subscription growth are the only external criteria that put Netflix in the same field as other content producers.

Big data application in the platform vertical

To provide a global service at all times, Netflix runs on tens of thousands of servers. As the scale of the cloud infrastructure increases, automating operational decisions becomes key to reducing human intervention. Because features of the platform depend in part or entirely on one or more micro-services, which users see as rows on their personal home page, outlier analysis is performed against three criteria: security, reliability and efficiency (Aswani, 2019).

In the security space, the focus of the data teams’ effort is to detect suspicious or malicious activity using a collection of machine learning and statistical models. Concerning reliability, the focus is on prevention and diagnosis. Data teams measure the impact of outages and expose patterns across their occurrence, as well as provide a connected view of micro-service-level availability. Regarding efficiency, the focus is on transparency and optimization. This is where, once again, Netflix’s culture of “Freedom and Responsibility” plays an important role, as every micro-service has an owner.

Engineering teams are under a continuous imperative to build solutions that lighten the cognitive load, and outlier detection is handled with cluster analysis. The specific method used is Density-Based Spatial Clustering of Applications with Noise (DBSCAN), which identifies outlier servers and has significantly simplified operations in running the platform (Netflix Technology, 2017).

Test results show that outliers cannot be perfectly isolated in this environment. Still, the results are close enough to be deemed acceptable, as the cost of an individual mistake is considered low (Netflix Technology, 2017).
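
To show the idea behind DBSCAN-based outlier detection, here is a compact, self-contained sketch of the algorithm in plain Python. It is a teaching version, not Netflix's implementation: the server metrics, the `eps` and `min_samples` values and all names are hypothetical, and a production system would more likely rely on an optimized library.

```python
import math

NOISE = -1  # label given to points that belong to no cluster (outliers)

def dbscan(points, eps, min_samples):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
        cluster += 1
    return labels

# Hypothetical, normalized (error rate, latency) readings for six servers.
servers = [(0.10, 0.20), (0.12, 0.22), (0.11, 0.19),
           (0.09, 0.21), (0.13, 0.20), (0.90, 0.85)]

labels = dbscan(servers, eps=0.05, min_samples=3)
outliers = [i for i, label in enumerate(labels) if label == NOISE]
print(labels, outliers)
```

Servers that behave alike end up in one dense cluster, while a server far from every cluster is labelled as noise and can be flagged for automated remediation. Since the choice of `eps` and `min_samples` decides what counts as "far", the separation is never perfect, which matches the caveat above.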

Discussion - Benefits and challenges

Three clear benefits of using big data stand out. First, market performance and penetration rest on a hard-to-copy technological advantage that puts customer experience in focus better than any competitor in film and broadcasting. Second, access to thousands of touchpoints enables Netflix to expand into interactive content or gaming, as announced last year. Third, a data culture that promotes constant experimentation and the adoption of analytics throughout the company benefits its positioning with stakeholders and its capacity to attract investors and high-quality talent. Generally, we know the brand as one of the most talked about in the industry, even when it was criticized.

From the point of view of the infrastructure and data pipeline, there are two objective challenges that many big data companies face: data scale and data lineage (Binging on Data, 2018). The data and analytics team is constantly balancing the volume of data that can enter reporting. In-house development combines various tools to enable increased interactivity. The real challenge of data scale is integrating various commercial and in-house applications so that they provide enough depth and response time. Second, the challenge of data lineage concerns where the data comes from and how trustworthy it is. Netflix's data landscape stores various data artefacts whose relationships are too complex to establish reliably. To solve this issue, Netflix is working with various solutions, collaborating with data catalogue companies, and at times turning to the "brute force" of its teams in creating manuals to overcome the challenge (Irvine, 2018).

Social, legal and ethical dilemmas of big data applications

Being at the forefront of big data means that the Netflix brand is closely associated with emerging social, legal and ethical dilemmas. Operating in the realm of cultural production raised early concerns in media studies regarding social influence. The general critique of algorithmic predictive systems, including the one deployed at Netflix, is that over-personalization creates an echo chamber where users' existing views are reinforced rather than challenged. Since the 2019 antitrust scrutiny that Facebook, Google, Amazon and other big tech companies faced in the US Congress, this debate has become more visible in the public sphere, becoming the new internet normal. For a while, Netflix positioned itself among companies that claim to play on both sides of tech and media. However, public policies are moving towards regulatory frameworks for operators in the media sector to level the playing field between national broadcasters and streamers. Netflix turned to repositioning and addressed concerns over the negative impact of algorithms on society in its communications. The company claims exploration and diversity are built into its algorithms (Sudeep, 2019) and wants recommendations to be diverse and to “appeal to your range of interests and moods” (Netflix Technology Blog, 2017).

The Netflix Prize itself gave the first signs of future privacy issues triggered by pervasive data mining practices and the massive amount of sometimes highly sensitive data being processed. At the time of the Netflix Prize release, datasets were disseminated for various other research purposes, which led Narayanan and Shmatikov to suggest that the dataset enabled them to identify the persons behind the ratings, to the extent of revealing sensitive data like political preferences, religious affiliation or even sexuality. Nowadays, with the variety of data that is scraped from browsing history, the profiles created under one subscription and the type of device used, it becomes even easier for Netflix to predict information such as the annual earnings or marital status of its users without having to ask for it.

In the long list of ethical dilemmas concerning the impact of algorithms on creative production, curation as an alternative to algorithmic recommendation brought hope for new business models. However, the competition from streaming services that choose curation of content (e.g. MUBI, Criterion Channel, TCM) poses only a societal challenge, while the business model has not been rivalled yet.

Conclusions

There is nothing new in the media industry's obsession with tracking viewership and ratings and predicting box office sales. Netflix has tapped into this segment of the industry and made it its core, with content being only of secondary importance. The evolution of big data applications in the company's history has always corresponded to strategic goals and contributed at macro as well as micro levels. Netflix describes its data analysis as a truly scientific method used to inform a wide range of questions, where hypotheses are formulated and then tested. Netflix takes pride in the amount of data that it processes, even when these sophisticated methods are combined with more traditional ones, such as A/B testing or Venn diagrams, or even the use of a Mechanical Turk, all with the aim of building the company's brand as a data-driven company. In times when big tech companies were called to account, Netflix backed down and shifted the narrative away from being data-driven, replacing it with "data in support of gut instinct".

As laid out in this report, the success of Netflix analytics is the result of well-combined domain knowledge, technology, analytic techniques and a corporate culture that reinforces the commitment and dedication of teams to their domains. This is to say that successful big data applications are not just a matter of the capacity to build a robust infrastructure to capture data but require smart design and a combination of different analytics methodologies. This becomes an important takeaway for smaller companies that cannot afford large investments and human resources in business analytics. There is no “one size fits all” solution, as seen in Netflix's case, and the mix often includes quantitative methods alongside in-depth interviews to uncover the human perspective and ultimately improve algorithmic similarity and product innovation. But how to compete in the film and media market with big data companies remains a challenge.

The high sensitivity of the data leaves missing links that made an in-depth evaluation difficult and prevented comparison against some classical film industry KPIs. In a way, Netflix made itself a stand-alone category by putting big data at the heart of its business model and drew linear television into a "streaming war" that spread beyond the US market to national broadcasters in Europe. The dust has not yet settled.

***This article was originally written as part of the coursework in Business Analytics and Strategy in 2022, and abridged for the LinkedIn audience. The academic references are removed with only key in-line citations left for those interested in further reading.

Miljana Jovović

CEO / Founder at Materriya

1 year ago

Thank you, Milica, for this comprehensive analysis! I finally had the chance to read it in full and it provides valuable insights. Netflix's deployment of the data model as a business strategy is undeniably lucrative, and kudos to them for it. However, this approach led them towards mainstream content, avoiding risks. Consequently, much of their content appears similar, drawn from the same recipe book, made with the same ingredients, and it ends up tasting somewhat the same. And what’s worse, it molds public taste into a love of a single content cuisine.

Milos Tomic

Driving Efficiency, Growth & Revenue

1 year ago

"In times when big tech companies were called for responsibility, Netflix backed down and shifted the narrative away from data-driven, replacing it with 'data in support of gut instinct'."
