Nasdaq Data Link: Retail Trading Activity Tracker (RTAT) Dataset Review
I have been studying for an AWS certification for the past few days, and I needed to take a break from learning about the AWS Services and Products Catalog. I checked out my favorite Data Provider: Nasdaq Data Link (formerly Quandl) and saw the announcements for a new dataset: Retail Trading Activity Tracker. I decided to give it a go.
Dataset Overview
The dataset seeks to provide insight into the trading activity of self-directed individual investors. Retail trading flow is typically considered an uninformed, late participant in market movements: retail investors primarily enter or exit the market in response to news events, advice from financial news programs, or media experts.
The claim that retail has emerged as a significant market participant, with the ability to influence market directionality and not just provide liquidity to informed flow, piqued my interest and deserved further analysis.
The dataset only reflects a portion of the overall market. It covers flows on Nasdaq exchanges, representing 30B USD of daily retail flow across 9,500+ US names, American Depositary Receipts, and Exchange Traded Products. The history goes back five years, and the data lags by one day. A complete list of covered names is available; one can use it to enrich the dataset with reference data and perform more insightful and creative analysis.
The premium feed is available for 10 USD/month, and the top 10 names are provided daily for free to those looking to evaluate the dataset. Its fields are the trading “date”; “ticker”; “activity,” a measure of daily retail flow within a name vs. all names; and “sentiment,” a score based on retail net flows.
Getting into it
The first question I wanted to answer was: What names have been favored by retail investors over the past five years? What has the crowd favored over the period? I am confident that other questions will arise as I work with the dataset, but I needed a starting point. I selected Stata to perform my analysis, given that I needed fast answers.
I downloaded the Retail Trading Activity Tracker – Daily Top 10 (Free) (RTAT10) dataset via API as a CSV from the site and loaded it into Stata. A JSON file with the table metadata is available too, which gives some idea of the structure. I could see I needed to do some wrangling.
I wanted to focus on the simple question of what names have been the crowd favorites; therefore, I only needed two data elements: date and ticker. I dropped the rest.
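The analysis itself was done in Stata, but the load-and-trim step can be sketched in Python for readers without a Stata license. This is a minimal sketch, not the commands actually used: the sample rows are made up, the ISO date format is an assumption about the file, and load_date_ticker is a hypothetical helper name. Only the four column names come from the dataset description above.

```python
import csv
import io

# Made-up rows in the four-column layout described above.
# These are NOT real RTAT values; the ISO date format is an assumption.
SAMPLE_CSV = """date,ticker,activity,sentiment
2018-01-02,AAA,0.05,3
2018-01-02,BBB,0.03,-1
2018-01-03,AAA,0.04,2
"""


def load_date_ticker(text):
    """Parse CSV text and keep only the date and ticker columns."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"date": row["date"], "ticker": row["ticker"]} for row in reader]


rows = load_date_ticker(SAMPLE_CSV)
print(rows[0])
```

With a real download, the same function could be fed the contents of the CSV file instead of SAMPLE_CSV.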
The date was imported as a string and needed to be converted to the Stata date type. Stata stores a daily date as the number of days since January 1, 1960, not to be confused with the Unix epoch (January 1, 1970). I also needed to make sure the display format stayed readable while manipulating the data and for later use.
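To make the day-count encoding concrete, here is a small Python sketch of the same arithmetic. It illustrates the encoding rather than reproducing the Stata conversion; the function names are hypothetical, and the ISO input format is an assumption.

```python
from datetime import date

STATA_EPOCH = date(1960, 1, 1)  # day 0 of Stata's daily date encoding
UNIX_EPOCH = date(1970, 1, 1)   # a different origin, easy to mix up


def to_stata_days(iso_string):
    """Convert an ISO date string (YYYY-MM-DD) to days since 1960-01-01."""
    return (date.fromisoformat(iso_string) - STATA_EPOCH).days


def from_stata_days(days):
    """Convert a day count back to a calendar date for display."""
    return date.fromordinal(STATA_EPOCH.toordinal() + days)


# The two epochs are 3,653 days apart (ten years, three of them leap years).
print(to_stata_days("1970-01-01"))
print(from_stata_days(to_stata_days("2021-11-15")))
```

Keeping the original string next to the numeric value, as done here with the round trip, gives a cheap way to check that the conversion did what was intended.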
The transformation looks good. I could have replaced the original date variable, but I wanted to retain a way to validate that the operations were performed correctly, so I kept it for the moment. Next, I need to bin the data by year, since I only want to focus on which names interest retail on an annual basis. I will then save the dataset to avoid repeating the transformation as I work with the data.
Now, on to establishing baselines.
I can see that 2017 might not be a whole year compared to the rest of the years, and I know that 2021 still has 45+ days to go. I need to validate the 2017 assumption.
As expected, 2017 is not a whole year, and 2021 behaves as expected too. I will keep 2021 but drop 2017 from the dataset. A simple drop if Year==2017, and all 2017 observations are gone: the beauty of Stata.
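One way to check whether a year is complete is to look at the first date, last date, and row count per year, then drop the partial year. The Python sketch below mirrors that check and the drop step; the observations are made up for illustration, and span_per_year and drop_year are hypothetical helper names.

```python
from collections import defaultdict
from datetime import date

# Made-up (date, ticker) observations; NOT real RTAT rows.
observations = [
    ("2017-12-28", "AAA"),
    ("2017-12-29", "BBB"),
    ("2018-01-02", "AAA"),
    ("2018-12-31", "BBB"),
    ("2019-01-02", "AAA"),
]


def span_per_year(obs):
    """Return {year: (first_date, last_date, row_count)}."""
    by_year = defaultdict(list)
    for iso_string, _ticker in obs:
        d = date.fromisoformat(iso_string)
        by_year[d.year].append(d)
    return {y: (min(ds), max(ds), len(ds)) for y, ds in sorted(by_year.items())}


def drop_year(obs, year):
    """Keep every observation that does not fall in the given year."""
    return [(d, t) for d, t in obs if date.fromisoformat(d).year != year]


spans = span_per_year(observations)
for year, (first, last, n) in spans.items():
    print(year, first, last, n)

kept = drop_year(observations, 2017)
```

A year whose first date falls late in the calendar, like 2017 in this toy data, stands out immediately in the printed spans.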
Now, on to focus on the symbols.
A look at the ticker data shows that not all symbols were relevant every year. I need to establish a baseline to understand how many symbols will be removed. There are 327 unique symbols after dropping 2017.
Looking at the data since 2018, I can already get a pretty good idea of what names retail seems to participate in the most to date.
Now I only need to determine which names retail participated in every year, but I can already see an investment theme in the top symbols: the FAANG trade.
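The two baselines above, the number of distinct symbols and the names that appear most often, come down to a distinct count and a frequency table. A minimal Python sketch with placeholder tickers (not real results) could look like this:

```python
from collections import Counter

# Placeholder tickers, one entry per daily top-10 appearance.
# Made up for illustration only.
appearances = ["AAA", "BBB", "AAA", "CCC", "AAA", "BBB"]

# Baseline 1: how many distinct symbols appear at all.
unique_symbols = len(set(appearances))

# Baseline 2: which symbols appear most often.
top_two = Counter(appearances).most_common(2)

print(unique_symbols)
print(top_two)
```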
Arriving at the answer
To find my final answer, I need to work with the frequency distribution of each ticker per year. That means creating new variables that capture this statistic and also identifying the instances in which a ticker had no events within a given year. Stata makes this process super fast; it has taken me longer to write this article than it took to analyze the data and arrive at my answer.
I use Stata to fill in zero (no participation) values, “adding” every missing year for each ticker. I then sort by ticker and year and generate a new variable capturing the total frequency for each symbol, excluding the rows that were “added.” Next, I calculate the minimum annual frequency for each ticker, which I use to exclude all tickers that did not participate every year. With those steps taken, I have my answer.
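The steps above can be sketched in Python as follows: build the full ticker-by-year grid with zeros for missing combinations, take the minimum annual count per ticker, and keep only tickers whose minimum is above zero. This is an illustration of the logic, not the Stata code that was run; the data is made up and annual_favorites is a hypothetical name. (Stata has a fillin command for this kind of rectangularization; whether it was the one used here is an assumption.)

```python
from collections import Counter

# Made-up (year, ticker) observations; NOT real RTAT rows.
year_ticker = [
    (2018, "AAA"), (2018, "AAA"), (2018, "BBB"),
    (2019, "AAA"), (2019, "CCC"),
    (2020, "AAA"), (2020, "BBB"), (2020, "CCC"),
]


def annual_favorites(pairs):
    """Return {ticker: total_count} for tickers present in every year."""
    years = sorted({y for y, _ in pairs})
    tickers = sorted({t for _, t in pairs})
    freq = Counter(pairs)  # (year, ticker) -> number of appearances

    # Fill in the full grid: zero where a ticker never appeared in a year.
    table = {t: {y: freq.get((y, t), 0) for y in years} for t in tickers}

    # A minimum of zero means the ticker skipped at least one year.
    min_freq = {t: min(by_year.values()) for t, by_year in table.items()}

    totals = {t: sum(table[t].values()) for t in tickers if min_freq[t] > 0}
    # Most frequent names first.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


favorites = annual_favorites(year_ticker)
print(favorites)
```

In this toy data, BBB is missing in 2019 and CCC is missing in 2018, so only AAA survives the filter. The filled-in zeros add nothing to the totals, which matches the idea of excluding the “added” rows.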
From the data we have wrangled, we can see that while GOOG is not on the list, the rest of the FAANG trade components have been a significant focus of retail activity since 2018. We can also easily see that retail flow has focused on technology-related names and the broad indices (Nasdaq and S&P 500).
The participation of technology names was not a huge surprise, considering technology has been the best-performing sector globally since 2009. It was shocking, however, to see GE make the top list.
The Conclusion
In summary, did I learn something game-changing from this dataset? No, but it was certainly interesting. Perhaps more time to explore, expanding the scope to the premium dataset, enriching it with reference data, and looking at daily granularity will provide more powerful insights.
I would be curious to know the impact of retail trading activity on daily returns - this might be difficult to assess given this seemingly never-ending bull market in which we find ourselves. At the very least, we can test if the data has any predictive abilities and can signal trading opportunities.
Would I switch my focus from attempting to understand and follow informed flow over to focusing on Retail Trading Activity? No, but I will continue to look at the data to see where it can lead and how it can be leveraged within my overall investment strategy, as one never knows what trading opportunity could arise at any point.
You can find more details about the new Retail Trading Activity Tracker dataset from Nasdaq Data Link at the URL below.
Disclaimer: The views expressed are solely those of the author in his private capacity and do not in any way represent the views of past or current employers.