Whose sentiment? One or two annoying things about "sentiment indicators" in macro


I come from the Economics field, where one uses data to test a theory.

Data science mostly works the other way around: one builds a model then tries to make sense of it.


There is no absolute right or wrong way of doing things, but we need to be aware of the pitfalls inherent in each approach.


"Sentiment" indicators are very popular in quantitative finance these days. They often rely on Machine Learning and other statistical techniques to process large amounts of data. I am a big fan of the technology, Natural Language Processing in particular, but only when it comes with a clear sense of purpose.


The single most important thing in a Machine Learning application (in the fields I know at least) is to establish the use case. Whose "sentiment" is this? What does it add to well-known market prices? Are robots better at "sentiment" than humans now, really?


Too often, a "sentiment" indicator is called that for lack of a better word and, sometimes, for lack of proper thinking about the concept and the goal.


Let's take an example. Someone builds a "sentiment" indicator aggregating all sorts of news about a few companies, using Natural Language Processing. It correlates well with stock prices.


Great.


Now, looking at the data a bit more closely, it appears that a good portion of these news items relate to the stock price itself, like:

- "Company X rose 5% yesterday on the back of..."

or

- "Traders turned negative on Company Y".

There may or may not be a new external reason mentioned in the article.


That can be a recipe for disaster. The whole "correlation" with stock prices may well be coming from this type of news. In our experience, such items can represent a surprisingly large share of the sample.


These news items really need to be taken out of the sample. Then, if there is still a good "correlation" between what's left of the sentiment and stock prices, we have got something to work with.
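To make the filtering step concrete, here is a minimal sketch in Python. The keyword patterns and the function names `is_price_echo` and `drop_price_echo` are assumptions made for illustration only; a rule this crude would miss many cases, and in practice a trained classifier is likely needed.

```python
import re

# Hypothetical patterns for headlines that merely report a price move
# or the market mood. Illustrative only, far from exhaustive.
PRICE_ECHO = re.compile(
    r"\b(rose|fell|gained|lost|jumped|slid|dropped|rallied|tumbled)\b.*\d+(\.\d+)?\s*%"
    r"|\btraders? turned (negative|positive|bearish|bullish)\b",
    re.IGNORECASE,
)


def is_price_echo(headline: str) -> bool:
    """True if the headline looks like a report of the price move itself."""
    return PRICE_ECHO.search(headline) is not None


def drop_price_echo(headlines):
    """Keep only the headlines that are not flagged as price reports."""
    return [h for h in headlines if not is_price_echo(h)]


sample = [
    "Company X rose 5% yesterday on the back of strong earnings",
    "Traders turned negative on Company Y",
    "Company Z announces a new factory in Spain",
]
kept = drop_price_echo(sample)
print(kept)
```

The sentiment score would then be recomputed on `kept` only, before looking at any correlation with prices.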


If not, "sentiment" is just a lagging indicator of stock prices, by about a day or so. The correlation with stocks is a spurious relationship.
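One simple way to check for this is to compare the correlation of sentiment with past returns against its correlation with future returns. The sketch below is in Python and uses made-up numbers: the `returns` series is invented, and `sentiment` is deliberately built as yesterday's return to mimic the worst case. The helper names `pearson` and `lead_lag_corr` are assumptions, not part of any library.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def lead_lag_corr(sentiment, returns, shift):
    """Correlation between sentiment[t] and returns[t + shift].

    shift < 0 compares sentiment with past returns,
    shift > 0 compares sentiment with future returns.
    """
    if shift >= 0:
        s, r = sentiment[: len(sentiment) - shift], returns[shift:]
    else:
        s, r = sentiment[-shift:], returns[: len(returns) + shift]
    return pearson(s, r)


# Invented daily returns (in %), for illustration only.
returns = [0.5, -1.0, 0.3, 1.2, -0.7, 0.1, -0.4, 0.9]
# Worst case: "sentiment" is nothing but yesterday's return.
sentiment = [0.0] + returns[:-1]

backward = lead_lag_corr(sentiment, returns, -1)  # vs previous day's return
forward = lead_lag_corr(sentiment, returns, 1)    # vs next day's return
print(f"vs past returns: {backward:.2f}, vs future returns: {forward:.2f}")
```

If the indicator lines up much better with past returns than with future ones, it is mostly echoing prices rather than adding information.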


When we aggregate inflation news in the News Inflation Pressure Indices, we have a special model trained to detect news about official inflation releases, so that we can take them out of our sample.


It's not a nice-to-have feature: it must be done. To put numbers on it: we have 1.3 million news items relating to inflation over a three-year period. Of these, just over 100k are selected as inflation-relevant. And of these, 22k, or around 22% of the selected sample, need to be taken out because they relate to inflation releases.
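The funnel is easy to check with a few lines of Python. The counts below are the rounded figures quoted above ("just over 100k" is taken as 100,000), so the shares are approximate.

```python
# Rounded counts from the text; the exact figures are not given.
total_inflation_news = 1_300_000   # news items relating to inflation, three years
selected_relevant = 100_000        # "just over 100k" selected as inflation-relevant
release_news = 22_000              # items about official inflation releases

selected_share = selected_relevant / total_inflation_news
release_share = release_news / selected_relevant
remaining = selected_relevant - release_news

print(f"selected as relevant: {selected_share:.1%} of all inflation news")
print(f"release news: {release_share:.0%} of the selected sample")
print(f"left after filtering: {remaining:,} items")
```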


If we were not to do that, we'd have a spurious indicator.


What's the bottom line?


Doing Machine Learning is no excuse for not thinking about a model strategy.


Data Science should start with... knowing the data, which requires field expertise. To build a useful Machine Learning model, the data scientist and the analyst need to work together.




Nicolas Woloszko

Research Vice President @CFM

4 years ago

I couldn't agree more. Thanks, this is a very useful example!
