24 Algorithms & Data Structures you need to know for Finance

The Goal of this article - Provide the Building Blocks

How to read this Article

Read the full list and critique the usability of each definition.

If you have at least 2 hours a day to dedicate to programming, compare each definition with the one given in the recommended resources. Not every term will be defined there.

Practice each object through the recommended exercises. Then, try to redefine each term weekly.

What to read next

Your goal is to build a trading assistant. To do so, you need to learn four programming fundamentals: Data Structures, Algorithms, Design Patterns and Neural Networks.

After this article, you can read 12 Design Patterns for Finance.

Then, you can read Introduction to Deep Learning for Trading.

Finally, you can read Neural Networks for Finance. You can find Neural Networks for the Trading Assistant we'll build here.


12 Data Structures for Finance and Automatic Trading

Nature, Function and 3 most common uses

Here you'll find a list of 12 Data Structures for Finance and Automatic Trading defined by Nature, Function and 3 most common uses.

They have been listed in a typical order to facilitate comparison with definitions from other resources. We may group them in upcoming articles for faster reading.

  1. Lists are a dynamic collection of elements. Their Function is adding, removing, and accessing elements. They are more flexible than arrays because they can grow or shrink and can contain different types of elements, but they are less suited to bulk numeric work. Lists can be used for Portfolio Management, Risk Management and Market Analysis.
  2. Linked lists are made up of nodes that contain information and a reference to the next node. They provide efficient insertion and deletion once the position is known, but no direct indexing. 3 Common uses of linked lists are implementing high-frequency trading, managing real-time streaming and building event-driven systems.
  3. Arrays are a collection of elements of the same data type stored in contiguous memory locations. Their Function is to provide efficient indexing and random access to these elements. 3 Common uses of arrays in finance are storing price data, managing portfolio holdings and implementing indicators.
  4. Matrices are two-dimensional arrays, organized in rows and columns. Their Function is to support mathematical (linear algebra) operations in data analysis. 3 Common uses of matrices in finance are multi-dimensional analysis, portfolio optimization and risk modeling.
  5. Hash tables are tables using key-value pairs to store and retrieve data. Their Function is to provide fast lookup and insertion times. 3 Common uses of these tables are Storing symbol mappings, caching market information and modeling order books.
  6. Queues are a First-In-First-Out structure: elements are added at the rear and removed from the front. Queues enable processing of elements in the order they arrive. 3 Common uses of queues are Managing order execution, handling event-driven systems and implementing message queues.
  7. Stacks are a Last-In-First-Out structure: elements are added and removed from the same end. Mirroring Queues, Stacks are useful for tracking function calls, managing undo/redo operations, and handling recursive algorithms. 3 Common uses of stacks are Evaluating expressions, simulating financial instruments and tracking trading strategies.
  8. Trees are hierarchical structures composed of nodes. Trees provide efficient searching, insertion, and deletion operations. 3 Common uses of Trees are representing market structures, building decision trees and organizing option chains. Most Neural Networks will be displayed in Tree or Graph form.
  9. Tries are tree structures used to store and retrieve strings. Tries facilitate prefix-based searches and provide autocomplete functionality. 3 Common uses of Tries are Symbol lookup, implementing search engines for financial news and storing options chains.
  10. Heaps are specialized binary trees in which each parent node's value is greater than or equal to (max-heap) or less than or equal to (min-heap) the values of its children. Heaps are useful to read the maximum or minimum element in constant time, and to insert or remove it in logarithmic time. 3 Common uses of Heaps are establishing priority for order execution, optimizing portfolio rebalancing and implementing market structures.
  11. Graphs consist of nodes connected by edges, representing relationships between entities. Graphs facilitate analysis of interconnected information and support pathfinding algorithms. 3 Common uses of Graphs are modeling financial networks, analyzing correlations between assets and detecting trading patterns.
  12. Sets are collections of unique elements. Sets facilitate membership testing and operations such as intersection, union, and difference. 3 Common uses of Sets are filtering duplicate data, identifying unique trading signals and managing risk exposure.
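To make a few of these definitions concrete, here is a minimal Python sketch of a queue, a stack, a heap and a set, using only the standard library. The order strings, prices and tickers are made-up illustration data, not part of the original list.

```python
from collections import deque
import heapq

# Queue (FIFO): orders are processed in the order they arrive.
order_queue = deque()
order_queue.append("BUY AAPL")
order_queue.append("SELL MSFT")
first_order = order_queue.popleft()   # the oldest order leaves first

# Stack (LIFO): the most recent action is undone first.
undo_stack = []
undo_stack.append("set stop-loss")
undo_stack.append("raise position size")
last_action = undo_stack.pop()        # the newest action leaves first

# Min-heap: the smallest item always sits at index 0.
ask_prices = [101.5, 100.2, 102.0]
heapq.heapify(ask_prices)
best_ask = ask_prices[0]              # lowest ask, read in constant time

# Set: unique elements with fast membership tests and set algebra.
portfolio_a = {"AAPL", "MSFT", "GOOG"}
portfolio_b = {"MSFT", "TSLA"}
overlap = portfolio_a & portfolio_b   # tickers held in both portfolios
```

A `deque` is used for the queue because removing from the front of a plain Python list is slow on long lists.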

Remember that in programming there are 6 main concerns and therefore 6 main operations : Creating, Reading, Updating, Deleting, Getting & Setting. Each of these structures has been created to respond to at least 1 of these concerns.
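As a rough illustration of these operations, the sketch below applies them to a Python `dict`, which is the language's built-in hash table. The symbol mapping and company names are hypothetical example data.

```python
# Hypothetical symbol-to-name mapping stored in a dict (a hash table).
symbols = {}

# Create: add new key-value pairs.
symbols["AAPL"] = "Apple Inc."
symbols["MSFT"] = "Microsoft Corp."

# Read / Get: .get returns None instead of raising when the key is missing.
name = symbols.get("AAPL")
missing = symbols.get("XXXX")

# Update / Set: assigning to an existing key overwrites its value.
symbols["MSFT"] = "Microsoft Corporation"

# Delete: remove a key-value pair.
del symbols["AAPL"]
```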

You'll learn these Structures by coding and will be able to use them in websites, chatbots, games or generative software.

If you want to know more about these structures, learn about Discrete Mathematics applied to Probabilities & Statistics.

Main Applications of these data structures

Here you'll find a list of the main Financial applications for each Data Structure. This list is kept separate for faster access.

If you re-read this article, you'll likely want to focus on Applications.

  1. Lists allow you to manage order queues when executing trades. They also track and classify trade history for auditing and performance evaluation. Finally, they maintain watchlists of securities for monitoring price movements.
  2. Linked Lists allow you to implement high-frequency trading systems. This is done by managing streaming information and updating it in a linear manner. You can then code event-driven systems.
  3. Arrays generally store price information for multiple securities. This is essential to update and track the performance of portfolio holdings. The main way to exploit them is to implement technical indicators such as moving averages or Bollinger Bands.
  4. Matrices support multi-dimensional analysis of financial information. One of the most common is mean-variance analysis of portfolios, which allows you to optimize portfolio allocations. Thanks to covariance matrices, you can also model risk factors.
  5. Hash Tables store symbol mappings for getting and setting. This facilitates caching, which reduces latency in subsequent retrieval operations. They are also used to implement order books for high-frequency trading.
  6. Queues are used to manage trades' order execution. They queue incoming events in event-driven systems. They help to implement message queues for asynchronous communication. It is interesting to implement queues in chatbots to test the complexity of the software.
  7. Stacks allow fast evaluation of option pricing and risk calculation formulas. This is done by simulating and tracking the states of financial instruments. They also help manage trading strategies, with support for undo/redo operations and backtesting.
  8. Trees represent hierarchies, as in price trees or option trees. By building decision trees, you can start your algorithmic trading and risk assessment. Trees also allow you to organize stock and option chains. Trees are the hierarchy in your system.
  9. Tries serve for symbol lookup, giving quick retrieval of security information. You can also use them in search engines for articles. They can store option chains and facilitate efficient retrieval based on strike prices or expiration dates.
  10. Heaps will help you implement priority queues for order execution and trade matching. You'll also use them for portfolio rebalancing by identifying deviations. Finally, they serve to build graphics and tables for time-series analysis or event-driven trading.
  11. Graphs are meant for representing networks, such as lending networks or stock market correlations. Analyzing correlations between assets helps to identify opportunities and risks. This is done by detecting patterns and anomalies in markets.
  12. Sets serve to group elements by nature, function and uses. They help to find, filter and update information to ensure integrity. Identifying unique signals or patterns is essential for optimal strategies and security purposes. You'll be able to manage risk by identifying overlapping positions or assets within a portfolio.
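The symbol lookup use of Tries mentioned in the list can be sketched as follows. This is a minimal illustration, not a production design: the class names `TrieNode` and `SymbolTrie` and the method `starts_with` are hypothetical names chosen for this example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}       # maps one character to the next node
        self.is_symbol = False   # True if a stored symbol ends here


class SymbolTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, symbol):
        """Store a ticker symbol one character at a time."""
        node = self.root
        for ch in symbol:
            node = node.children.setdefault(ch, TrieNode())
        node.is_symbol = True

    def starts_with(self, prefix):
        """Return all stored symbols beginning with prefix, sorted."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # Walk the subtree below the prefix and collect complete symbols.
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_symbol:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)


trie = SymbolTrie()
for ticker in ["AAPL", "AMZN", "AMD", "MSFT"]:
    trie.insert(ticker)

matches = trie.starts_with("AM")   # symbols an autocomplete box could offer
```

The lookup cost depends on the length of the prefix and the size of the matching subtree, not on the total number of stored symbols.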

You'll find new uses and applications for these Structures by coding your trading assistant.
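As one example taken from the applications above, the covariance-matrix use of Matrices can be sketched in plain Python. The function name `portfolio_variance`, the two-asset covariance matrix and the weights are assumptions made for illustration; in practice you would likely use a numerical library.

```python
import math


def portfolio_variance(weights, cov):
    """Sum of w_i * cov[i][j] * w_j over every pair of assets (i, j)."""
    n = len(weights)
    return sum(
        weights[i] * cov[i][j] * weights[j]
        for i in range(n)
        for j in range(n)
    )


# Hypothetical annualized covariance matrix for two assets.
cov = [
    [0.04, 0.01],
    [0.01, 0.09],
]
weights = [0.6, 0.4]   # portfolio weights, assumed to sum to 1

variance = portfolio_variance(weights, cov)
volatility = math.sqrt(variance)   # portfolio standard deviation
```

With these numbers the variance works out to about 0.0336, which is lower than the variance of the second asset alone. That reduction is the diversification effect that mean-variance analysis tries to exploit.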

How to Practice these Data Structures

In a code editor

Create a folder in your Documents named 'Programming Tools for Finance'. In this folder create 3 subfolders named 'Data Structures & Algorithms', 'Design Patterns' & 'Neural Networks'.

At least 3 days per week, review your different files using the Python/JavaScript/C# documentation. Comment your code to indicate relationships between each type of programming tool.

If you want an easily accessible online option, you can use Google Colaboratory (Colab).

On paper

Buy a dedicated notebook for your programming, mathematics and finance practice. Note your definitions in this notebook following the Nature/Function/Uses pattern. If you can already, note sample code for each structure.

At least 3 days per week, do a proof or set exercise. Then try to find a corresponding Memory optimization or Computer Architecture problem.


12 Algorithms for Finance and Automatic Trading

A Greater Specificity of Use

Unlike Data Structures, which are elementary and general, the following Algorithms are specific to Finance & Trading. The names used in this article are the popular names. You can find more specific names or variants of these algorithms if you specialize in one asset.

  1. The Moving Average is an algorithm averaging a series of data points. For instance, it averages price data, identifies trends, and generates trading signals. This serves the whole Price Action process. This is the most important indicator in this list, and many others depend on it.
  2. Bollinger Bands is a volatility-based algorithm consisting of a moving average and upper/lower bands. Bollinger Bands indicate overbought and oversold conditions and help to identify potential price reversals. It is used to identify Breakouts and Reversions.
  3. The Moving Average Convergence Divergence (MACD) is another trend-following algorithm, which calculates the difference between two moving averages. It identifies potential trend reversals and generates trade signals based on crossovers. Like the two preceding indicators, it is used for trend identification. Unlike them, it is also used for divergence analysis.
  4. Relative Strength Index is a momentum oscillator measuring the speed of price movements. It identifies overbought and oversold conditions to anticipate trend reversals. Use it to Identify extrema, generate mean reversion signals and confirm trend strength.
  5. Arbitrage is a strategy that exploits price differences between markets. Its function is to profit from market inefficiencies by buying low and selling high. Its 3 main forms are Statistical, Triangular and price-discrepancy arbitrage.
  6. Pair Trading is a strategy involving trading two correlated instruments simultaneously. Its function is to profit from inefficiencies between these instruments. Its main difference from arbitrage is the hedging against market risk.
  7. Mean Reversion is a strategy that assumes prices tend to revert to their mean. It identifies overextended price movements to indicate when the trade should be made. It is used in day/swing trading, identifying market extrema and exploiting price mean reversion.
  8. Trend Following is a strategy identifying established market trends. It generates trading signals based on the direction and strength of the trend. It's used in Swing trading to identify large market moves and exploit momentum trends.
  9. Monte Carlo Simulations are a category of statistical algorithms modeling random outcomes. They generate probabilistic outcomes for pricing and risk analysis. They're mostly used in option pricing, portfolio optimization and risk management. They can also be used for trading high-volatility instruments.
  10. The Kalman Filter is an algorithm used for state estimation in the presence of measurement noise. It predicts and updates the state of a dynamic system based on noisy observations. Its most common uses are estimating hidden states in models, tracking asset prices and filtering noise.
  11. Genetic Algorithms are a fundamental type of optimization algorithm. They iteratively improve a population of candidate solutions through selection, crossover and mutation. They're used for parameter optimization of trading strategies, portfolio optimization and feature selection.
  12. Multiple Machine Learning Algorithms are used to anticipate price, volume and speed. They adapt to information to generate predictions or make trading decisions. They can be used to predict asset prices and perform sentiment analysis of news, leading to more reliable risk modeling.
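The first two algorithms in the list can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the function names are hypothetical, the bands use the population standard deviation, and the 20-period window with 2 standard deviations is only a commonly quoted default, not a recommendation from this article.

```python
from statistics import mean, pstdev


def simple_moving_average(prices, window):
    """Return one average per full window of prices."""
    return [
        mean(prices[i - window:i])
        for i in range(window, len(prices) + 1)
    ]


def bollinger_bands(prices, window=20, num_std=2.0):
    """Return (lower, middle, upper) for each full window of prices."""
    bands = []
    for i in range(window, len(prices) + 1):
        chunk = prices[i - window:i]
        middle = mean(chunk)                # the moving average
        spread = num_std * pstdev(chunk)    # volatility of the window
        bands.append((middle - spread, middle, middle + spread))
    return bands


# Made-up closing prices for illustration.
prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
sma3 = simple_moving_average(prices, 3)
bands3 = bollinger_bands(prices, window=3)
```

Both functions return fewer values than there are prices, because the first `window - 1` prices do not form a full window.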

If you want to know more about these Algorithms you can read our summary of 'Technical Analysis of the Financial Markets'.

If you are following this newsletter you are likely interested in Finance. These algorithms will help you to revise each step of your procedure. They may also make the trading process clearer, and therefore more enjoyable.

3 Main Applications of these Algorithms

Here you'll find a list of the main Financial applications for each Algorithm. Because each Algorithm already has specific uses, this list may look redundant. If so, critique the repetition in each application to clarify it for yourself.

If you are re-reading this article, you'll likely want to focus on Applications.

  1. The Moving Average is used for trend identification. First, it indicates the direction and strength of trends, which is crucial for sentiment analysis. Then, moving averages act as dynamic support and resistance levels, providing traders with entry and exit points. Finally, they produce crossover signals. The most common are the golden cross (a shorter average crossing above a longer one, read as bullish) and the death cross (crossing below, read as bearish), which are used to generate trading signals. Moving Average = Sentiment, Support/Resistance & Crossover
  2. Bollinger Bands expand and contract based on volatility, enabling you to assess the volatility levels and anticipate price breakouts. Bollinger Bands are used to identify periods of low volatility (narrow bands) followed by price expansion (wide bands), indicating potential breakout opportunities. When prices extend beyond the upper or lower Bollinger Bands, it suggests overbought or oversold conditions, potentially signaling a mean reversion trade. Bollinger = Volatility, Breakout & Reversion
  3. Moving Average Convergence Divergence crossovers, specifically the MACD line crossing the signal line, are used to identify trend reversals and generate trade signals. MACD is also used for divergence analysis: divergences between it and price movement indicate trend weakness or strength. Finally, it is used for histogram analysis: its histogram, representing the difference between the MACD line and the signal line, is used to gauge the momentum of price movements. MACD = Trend Reversal, Divergence, Histogram
  4. Relative Strength Index indicates overtraded conditions: overbought and oversold. Above 70 it indicates overbought conditions, suggesting a possible price decrease. Below 30 it indicates oversold conditions, signaling a possible price rebound. Divergences between the RSI and price movement indicate reversals or continuations. The RSI is also used to confirm trend strength. Higher RSI values during uptrends indicate strong buying pressure. Lower RSI values during downtrends indicate strong selling pressure. Relative Strength Index = Overtraded, Divergence, Trend Strength
  5. Arbitrage involves finding inconsistencies in rate, price, volume or behavior between multiple items to generate profits. Market-making algorithms identify price/volume discrepancies across different trades and profit from the bid-ask spread. This may contribute to market liquidity. These algorithms have a comparative, probabilistic basis. They commonly work through triangulation. Arbitrage = Profit from irregularity in Behavior or Asset's price/volume/valuation
  6. Pair Trading is a special form of probabilistic arbitrage. It involves taking long and short positions in 2 correlated instruments to profit from their relative price movements while remaining market-neutral. Pair trading can be used as a hedging strategy to mitigate market risk. This is called a hedge against market risk. By taking offsetting positions, you can reduce exposure to systemic movements. Identifying price imbalances between the two instruments is essential. Pair Trading = Correlation, Hedging & Offsets
  7. Mean reversion involves identifying assets that have deviated from their mean values and taking positions to profit from the price correction. Mean reversion identifies extrema in overbought/oversold conditions, indicating when to enter or exit. You can take advantage of price dislocations by expecting the price to revert to its mean, generating profit from its correction. Mean Reversion = Deviation, Extrema & Dislocation
  8. Trend strategies aim to profit from sustained trends over months or years by taking positions based on their strength and direction. Trend following algorithms identify and capitalize on significant price movements. They rely on momentum indicators to confirm the sustainability of a trend. Note that direction seems to be more important than strength. Trend = Sustainability, Movement & Momentum
  9. Monte Carlo simulations are used to model and price options. They factor in underlying asset price movements, volatility, and interest rates. Their main use is Portfolio Optimization: they can be employed to assess the performance and risk characteristics of portfolios, aiding in the optimization of asset allocations. This is essential to assess risk under adverse market conditions. Monte Carlo = Volatility, Portfolio, Allocation
  10. Kalman filters are used to identify hidden states of volatility or price movements. Therefore, they can be used to track prices. This operation is performed by filtering noisy information: they smooth out noise to estimate the underlying value. This improves the accuracy of time series predictions. Kalman Filter = State Estimation, Price Tracking, Noise Filtering
  11. Genetic Algorithms optimize other algorithms. This is essential to determine moving average lengths or stop-loss levels for your trading assistant. Therefore, they are used for Feature Selection in Portfolio Optimization. Genetic = Self-Renewing, Features & Coordinating
  12. Machine Learning Algorithms optimize models: other Algorithms, Design Patterns or Neural Networks. One of their main uses is Sentiment Analysis. They can analyze text, tables or images based on sentiment indicators. Machine = Self-Renewing, Features/Anomalies & Sentiment
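The 70/30 reading of the Relative Strength Index described in the list can be sketched as follows. This is a simplified variant that uses plain averages of gains and losses over the last `period` changes, rather than the smoothed averages used in many charting packages, so its values may differ from those tools. The function names and the 14-period default are assumptions for illustration.

```python
def rsi(prices, period=14):
    """RSI over the last `period` price changes, using simple averages."""
    if len(prices) < period + 1:
        raise ValueError("need at least period + 1 prices")
    changes = [
        prices[i] - prices[i - 1]
        for i in range(len(prices) - period, len(prices))
    ]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        # No losses in the window. A completely flat window is treated
        # as neutral here, which is a convention chosen for this sketch.
        return 100.0 if avg_gain > 0 else 50.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def classify(value, overbought=70.0, oversold=30.0):
    """Map an RSI value to the labels used in the list above."""
    if value > overbought:
        return "overbought"
    if value < oversold:
        return "oversold"
    return "neutral"


# Made-up prices that alternate up and down by the same amount.
choppy = [1.0, 2.0, 1.0, 2.0, 1.0]
choppy_rsi = rsi(choppy, period=4)
label = classify(choppy_rsi)
```

Equal gains and losses give a value in the middle of the 0 to 100 scale, while a window containing only gains gives 100 and a window containing only losses gives 0.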

How to Practice these Algorithms

In a code editor

Create a folder in your Documents named 'Trading Assistants'. In this folder create 3 subfolders named 'Trading Assistant - Test', 'Trading Assistant - Tools' & 'Trading Assistant - Deployed'.

At least 3 days per week, update your algorithms, patterns and networks. If you want an easily accessible online option, you can use Google Colaboratory (Colab).

On paper

As for Data Structures, define these algorithms in your dedicated notebook following the Nature/Function/Uses pattern. If you already can, note sample code for each algorithm. Copy ours if it helps you.

At least 3 days per week, do a Logic or Algorithmic exercise. Then try to find a corresponding Software Development problem/feature and see if you need to implement it in your Assistant.


Additional Data Structures & Algorithms for Finance to consider

Data Structures

  1. Bloom filters are probabilistic data structures used for membership testing. Their Function is to provide a space-efficient representation of a set: a lookup answers either "definitely not present" or "possibly present", so false positives can occur but false negatives cannot. They are used to filter and remove duplicates in anti-money laundering systems.
  2. Count-Min Sketch is a probabilistic data structure used to approximate frequency counting. It estimates the frequency of elements, even with limited memory. It is used to track order flow and analyze volumes.
  3. Quotient Filter is a probabilistic data structure used for set membership queries. It provides membership testing with low false positive rates and supports deletions. It is used for Symbol lookups, order books, and transaction records.
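A Bloom filter, the first structure in this list, can be sketched in a few lines. This is a toy version for study purposes: the class name, the bit-array size, the number of hash functions and the trick of salting SHA-256 with a seed are all assumptions made for illustration, not a tuned design.

```python
import hashlib


class BloomFilter:
    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, item):
        """Derive num_hashes bit positions by salting one hash function."""
        for seed in range(self.num_hashes):
            data = f"{seed}:{item}".encode("utf-8")
            digest = hashlib.sha256(data).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical transaction identifiers.
seen = BloomFilter()
empty_lookup = seen.might_contain("TX-1001")   # nothing stored yet
seen.add("TX-1001")
stored_lookup = seen.might_contain("TX-1001")  # now reported as present
```

The filter never stores the items themselves, only bits, which is why it is compact and why it cannot rule out false positives.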

Algorithms

  1. Random forests are ensembles of decision trees used to make predictions or classifications. They improve the accuracy and consistency of predictions by reducing overfitting and combining the information from multiple trees. They're used for stock price prediction, credit risk modeling and fraud detection.
  2. Dynamic programming is an algorithmic technique that divides a problem into overlapping subproblems and reuses their solutions. Its Function is to optimize resource allocation by finding an optimal sequence of decisions. Its common uses are portfolio optimization, option pricing and risk management.
  3. Longest Common Subsequence finds the longest subsequence common to two or more sequences. In pairs trading and correlation analysis it identifies similar price patterns/trends.
  4. Network Analysis represents relationships between financial entities as a network or graph. It detects and models contagion effects. It's used in risk assessment, systemic risk analysis and portfolio diversification.
  5. Stochastic Gradient Descent is an optimization algorithm that iteratively updates model parameters. Today you will likely use it, or an adaptive variant such as Adam, together with activation functions such as the Rectified Linear Unit (ReLU). These optimizers are used to train machine learning models.
  6. HyperLogLog is a probabilistic algorithm used to estimate the cardinality (number of distinct elements) of a set or multiset. It offers a memory-efficient way to count distinct elements. You can guess by now that duplicate detection and counting are used in optimization and security. HyperLogLog has 2 unusual uses: counting unique users and analyzing market liquidity.
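Two entries in this list, dynamic programming and Longest Common Subsequence, fit naturally in one example, since the classic way to compute the subsequence length is a dynamic programming table. The sketch below is illustrative: the function name and the encoding of daily price moves as "U" (up) and "D" (down) are assumptions made here.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    rows, cols = len(a), len(b)
    # table[i][j] holds the answer for the prefixes a[:i] and b[:j].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if a[i - 1] == b[j - 1]:
                # Matching items extend the best answer for shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise reuse the better of the two smaller subproblems.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[rows][cols]


# Hypothetical up/down move sequences for two instruments.
moves_a = "UUDUD"
moves_b = "UDUUD"
shared = lcs_length(moves_a, moves_b)
similarity = shared / max(len(moves_a), len(moves_b))
```

Each cell is computed once from cells already filled in, which is the reuse of overlapping subproblems that defines dynamic programming.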

You may have noticed a redundancy in the Uses of each Tool presented in this article. Since we are focused on Buying/Selling signals through Trends in order to update our Portfolio, all uses will be subdivisions of these 3 concerns.

Optimizing your Portfolio is your main Goal.

Additional Resources to learn about this topic

  1. Python Programming for Economics and Finance by Thomas Sargent & John Stachurski
  2. The Book of Proof by Richard Hammack
  3. Discrete Mathematics : An Open Introduction by Oscar Levin
  4. Schaum's Outline of Python Programming by John R. Hubbard and Anne B. Kromer (paid)
  5. Schaum's Outline of Data Structures with C++ by John Hubbard (paid)

If you can, form a Practice group

A person for each topic

It will be more sustainable to practice in a group. If you question each other, you're likely to find more unusual questions and answers, and ultimately develop new software structures.

Ideally, try to find a person interested in each of the following topics:

  • Interface, Accessibility & Ergonomics
  • Algorithmic, Logic & Games
  • Deep Learning & Probabilities/Statistics
  • Computer Architecture, Security & Networks

Each step of the Process

The goal is to give you an overview of the production and deployment of your software.

Being able to consult people at each step makes your learning more synergistic.

Ideally, you want to be able to ask future customers for feedback at each step.


Code Samples

In upcoming Articles, you'll find code samples of each Structure & Algorithm listed here. This is also true for the upcoming articles on Design patterns & Neural Networks.

The Goal is to provide a repertoire for your tests.

Most of them will be written in Python. In later updates, examples in JavaScript, C# and C++ will be added.


