Our Data Problem - and a cheap way around it

Mika: "Guru, how will we defeat the almighty large tech and data in our quest for the holy grail that is dynamic game?"
Guru: "Seek not in others' words but go to the Source and you will soon harness knowledge and enlightenment of data. Go now Mika, for I need to code, that is, meditate."

Facing the limitations of data and Bloomberg's power

The most pressing problem with our neural network and the connected game theory code has been the lack of extensive, relevant data. You can use APIs to get basic financial and trading data - a few Yahoo Finance-based APIs and Alpha Vantage, for example. Then you realise the free tiers allow roughly 5 queries per minute. Unless you automate everything and take a six-month vacation, that is a problem.
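For illustration, here is a minimal Python sketch of what working inside such a free-tier limit looks like. The endpoint and query parameters are Alpha Vantage's public ones, but the API key, the tickers and the 13-second pause are placeholder assumptions, not our actual setup.

```python
# Minimal sketch of fetching prices while respecting an assumed ~5 requests/minute cap.
import time
import requests

API_KEY = "YOUR_ALPHAVANTAGE_KEY"   # placeholder key
TICKERS = ["AAPL", "MSFT", "NOKIA.HE"]  # illustrative symbols

def fetch_daily(symbol):
    """Fetch daily prices for one symbol from Alpha Vantage's public endpoint."""
    url = "https://www.alphavantage.co/query"
    params = {
        "function": "TIME_SERIES_DAILY",
        "symbol": symbol,
        "apikey": API_KEY,
    }
    return requests.get(url, params=params, timeout=30).json()

results = {}
for symbol in TICKERS:
    results[symbol] = fetch_daily(symbol)
    time.sleep(13)  # ~4-5 calls per minute keeps us under the free-tier throttle
```

At this pace, pulling data for a few thousand tickers already takes days, which is exactly the problem described above.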

Then there are excellent commercial databases - very costly if you need hundreds of millions of data points. And then good old Bloomberg devastated us minor players by launching BloombergGPT, trained on decades of extensive data. Should we just give in? Unlikely. So, there are two options:

  1. compete head-on with them, or
  2. eventually purchase API rights and build on their work (extremely tempting).

However, you still have to train your own models to reach the ultimate objective. For us, that is being able to play dynamic financial games with a trained neural network - knowing what to do while taking everyone else's optimal actions in a changing environment into account.

Get to the Basics and Do your Scraping

The solution was clear. Not having a Guru to resort to, I did the next best thing: what I was taught at Oxford.

You do not use secondary sources. You go to the original sources of everything - no shortcuts. We did that. We have been doing the tedious work of scraping extensive, free financial data from mandatory securities filings and other public sites, pre-processing it, building pandas DataFrames, and arranging and evaluating the data throughout. As a result, we are currently at hundreds of millions of data items. Thanks, SEC!
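To give a flavour of the "go to the Source" approach, here is a minimal sketch of pulling one filer's XBRL facts from SEC EDGAR's public companyfacts endpoint and flattening them into a pandas DataFrame. The CIK (Apple's, as an example) and the User-Agent string are illustrative; our actual pipeline loops over thousands of filers and many more fields.

```python
# Minimal sketch: one company's XBRL facts from SEC EDGAR -> flat pandas DataFrame.
import requests
import pandas as pd

HEADERS = {"User-Agent": "research contact@example.com"}  # SEC asks for a contact in the User-Agent
CIK = "0000320193"  # Apple, purely as an example

url = f"https://data.sec.gov/api/xbrl/companyfacts/CIK{CIK}.json"
facts = requests.get(url, headers=HEADERS, timeout=30).json()

rows = []
for tag, concept in facts["facts"].get("us-gaap", {}).items():
    for unit, observations in concept["units"].items():
        for obs in observations:
            rows.append({
                "tag": tag,          # e.g. AssetsCurrent, LiabilitiesCurrent
                "unit": unit,        # e.g. USD
                "end": obs.get("end"),
                "value": obs.get("val"),
                "form": obs.get("form"),  # 10-K, 10-Q, ...
            })

df = pd.DataFrame(rows)
print(df.head())
```

One filer alone yields thousands of rows like this, which is how the item count climbs into the hundreds of millions once you cover the whole filing universe.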

Because we are not building large language models - unless that proves to be the only game in town - we'll crunch the data using low-key supervised learning. In addition, we will add a (legal) reinforcement learning module and a Q-learning module to our software over the summer. Thus, we do not need a hundred NVIDIA GPUs to dance with the data - I'm going to regret saying this, though.
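As a taste of what the planned Q-learning module is about - this is a generic textbook sketch, not our production code, and the states, actions and rewards are purely illustrative assumptions - here is tabular Q-learning on a toy environment.

```python
# Tabular Q-learning sketch: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
import numpy as np

n_states, n_actions = 5, 2          # e.g. coarse "liquidity regime" states, two toy actions
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def step(state, action):
    """Toy environment: random next state, reward favours action 1 in high states."""
    next_state = rng.integers(n_states)
    reward = 1.0 if (action == 1 and state >= 3) else 0.0
    return next_state, reward

state = 0
for _ in range(10_000):
    # epsilon-greedy action selection
    action = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[state]))
    next_state, reward = step(state, action)
    # Q-learning update
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

print(np.round(Q, 3))
```

The point of the sketch: the whole thing runs in seconds on a CPU, which is why we do not need that rack of GPUs - yet.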

[Image] The cash saved by getting to the basics buys my daughters a lot of ice cream - the highest game-theoretic utility.

But where, where does it all lead (in a Scottish accent)!

Next month we'll get additional hands just for quantitative-finance machine learning - with a cool title: Machine Learning Quantitative Finance Analyst, or 'Maleqfa' - perhaps not a good acronym. I'll think again.

Pre-processing the data is a huge task, as we are not investing in providers of ready-made data. It may appear tedious, but it combines financial skills with a bit of Python, pandas and NumPy acrobatics, and a general analysis of what is relevant for the solvency and liquidity of the companies.
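A tiny example of what that acrobatics looks like in practice - with illustrative column names and made-up numbers, not our actual schema: clean a few raw balance-sheet items and derive the current ratio.

```python
# Minimal pre-processing sketch: drop incomplete rows, guard against divide-by-zero,
# and compute a basic liquidity measure (current ratio) with pandas/NumPy.
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],           # illustrative companies
    "current_assets": [1_200_000, np.nan, 850_000],
    "current_liabilities": [800_000, 500_000, 0],
})

clean = raw.dropna(subset=["current_assets", "current_liabilities"]).copy()
clean = clean[clean["current_liabilities"] > 0]  # avoid division by zero
clean["current_ratio"] = clean["current_assets"] / clean["current_liabilities"]
print(clean)
```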


Such work is material. See what happens when the data is not properly vetted, pre-processed and 'relevancy-checked': it leads to the demise of the coder and to mandatory meditative practices to restore the willingness to plough through the intricate neural networks built over the months.

What we are testing at the moment is the simple categorisation of very large company datasets based on target categories derived from certain financial ratios. We started with solvency/liquidity measures such as the current ratio, before moving on to multiple financial ratios. Oddly enough (not a surprise), good old corporate finance basics provide the best solutions. The 'bad thing': a lot of financial analysis for the Quant Analyst(s).
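A minimal sketch of that categorisation step, with random data, made-up ratio thresholds and an off-the-shelf classifier rather than our actual model: label companies by current-ratio band and fit a simple supervised model on a handful of ratios.

```python
# Supervised categorisation sketch: derive a liquidity class from the current ratio
# (illustrative thresholds) and train a classifier on a few financial ratios.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "current_ratio": rng.uniform(0.2, 4.0, 1_000),
    "quick_ratio": rng.uniform(0.1, 3.0, 1_000),
    "debt_to_equity": rng.uniform(0.0, 5.0, 1_000),
})
# Target categories: derived here from the current ratio purely to keep the example self-contained
df["liquidity_class"] = pd.cut(df["current_ratio"],
                               bins=[0, 1.0, 2.0, np.inf],
                               labels=["weak", "adequate", "strong"])

X = df[["current_ratio", "quick_ratio", "debt_to_equity"]]
y = df["liquidity_class"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")
```

In our case the interesting part is not the classifier but the financial analysis behind the labels - which is exactly the 'bad thing' mentioned above.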


As for myself, given that our systems are slow compared to most competitors', I'll just leave the computers to iterate through 100 epochs and head for the best place in Hämeenlinna - 'Appara', as we locals say. There, I do not need a Guru, for Appara makes you a Guru. Happy weekend!

