"Transforming Asset Prices into Tertiary and Binary Sequences: An Intuitive Approach"
Asset prices in their raw form are essentially an array of numbers indexed by time. What would be the most natural way to convert these into a tertiary or binary sequence? Recall the trinomial tree from option pricing: at each node we have an up factor (u), a down factor (d) and a factor m = 1 (for when the asset price does not change at the next node), and to get the potential values at the next node we multiply the current asset price by these respective factors (see below).
Throughout, we assume no discounting or dividends and use the mid-price as our asset price. The above factors are determined by two inputs, the volatility of the asset σ and the size of the user-specified time step Δt, and are given by u = exp(σ√(2Δt)), d = 1/u and m = 1.
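As a minimal sketch of the factor formula above, the snippet below computes u, d and m from σ and Δt. The function name `trinomial_factors` and the sample inputs (20% annualised volatility, one trading day out of 252) are illustrative assumptions, not values from the article; σ and Δt only need to be expressed in consistent time units.

```python
import math

def trinomial_factors(sigma: float, dt: float) -> tuple[float, float, float]:
    """Return (u, d, m) with u = exp(sigma * sqrt(2 * dt)), d = 1 / u, m = 1."""
    u = math.exp(sigma * math.sqrt(2.0 * dt))
    d = 1.0 / u
    m = 1.0
    return u, d, m

# Hypothetical inputs: annualised sigma of 20%, dt of one trading day.
u, d, m = trinomial_factors(sigma=0.20, dt=1 / 252)
print(u, d, m)
```

Because d = 1/u, the up and down thresholds are symmetric in log space, which is what makes an up move followed by a down move return to the starting price.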
However, we will not need to calibrate the risk-neutral probabilities pu, pd, pm for this application, as we will be observing the real-world outcomes of these trials. We can still back out real-world probabilities from our observed tertiary sequences, since each path now encodes what is essentially a sequence of rolls of a (possibly biased) 3-sided die. We can also find path-dependent (Bayesian) probabilities if we have enough data to make these meaningful; for example, we can find the observed probability of an up move given a particular sequence of previous moves.
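To make the idea of backing out observed probabilities concrete, here is a small sketch that counts move frequencies in an already-encoded sequence of 1, 0 and -1, both unconditionally and conditional on the preceding moves. The function names, the `history_len` parameter and the sample sequence are assumptions made up for illustration.

```python
from collections import Counter, defaultdict

def move_probabilities(sequence):
    """Observed frequency of each move (1, 0, -1) in a non-empty sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    return {move: counts.get(move, 0) / len(sequence) for move in (1, 0, -1)}

def conditional_probabilities(sequence, history_len=1):
    """Observed frequency of the next move given the previous history_len moves."""
    counts = defaultdict(Counter)
    for i in range(history_len, len(sequence)):
        history = tuple(sequence[i - history_len:i])
        counts[history][sequence[i]] += 1
    return {
        history: {move: n / sum(nxt.values()) for move, n in nxt.items()}
        for history, nxt in counts.items()
    }

# Made-up sequence, purely for illustration.
seq = [1, 0, -1, 0, 1, 1, -1, 0, 1, 0]
probs = move_probabilities(seq)
cond = conditional_probabilities(seq, history_len=1)
print(probs)
print(cond[(1,)])   # distribution of the move that follows an up move
```

With short histories the counts are dense enough to be meaningful; as `history_len` grows the number of possible histories grows as 3 to that power, so far more data is needed.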
Let's start with a simple model (which we later extend to include a time-dependent σ) to calculate the tertiary sequence for a time interval specified by the user. In this example let it be daily closing prices, i.e. Δt = 1 day:
First, find the most recent tick before our specified daily closing time, call it P(t), and the closest tick to the previous closing time, which we will call our starting tick S(t). Then calculate
P(t) - u*S(t). If this is ≥ 0,
we append a 1 onto our array (1,...) and update our starting tick, S(t+1) = P(t). Conversely,
if P(t) - d*S(t) ≤ 0,
we append a -1 onto our array (-1,...) and update our starting tick, S(t+1) = P(t).
If the price satisfies S(t)*d < P(t) < S(t)*u, we append a 0 onto our array (0,...) and again our new starting tick is S(t+1) = P(t).
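The three rules above can be sketched as a short function. This is a minimal illustration, assuming the closing prices have already been extracted into a list; the function name `encode_tertiary` and the sample prices are hypothetical.

```python
import math

def encode_tertiary(prices, sigma, dt):
    """Encode successive closes as 1 / 0 / -1 against fixed thresholds u and d."""
    u = math.exp(sigma * math.sqrt(2.0 * dt))
    d = 1.0 / u
    sequence = []
    start = prices[0]                 # S(t): the previous close
    for price in prices[1:]:          # P(t): the current close
        if price >= u * start:        # P(t) - u*S(t) >= 0
            sequence.append(1)
        elif price <= d * start:      # P(t) - d*S(t) <= 0
            sequence.append(-1)
        else:                         # S(t)*d < P(t) < S(t)*u
            sequence.append(0)
        start = price                 # S(t+1) = P(t) in every case
    return sequence

# Made-up closes; sigma is annualised and dt is one trading day.
closes = [100.0, 103.0, 102.5, 99.0, 99.2, 101.5]
encoded = encode_tertiary(closes, sigma=0.20, dt=1 / 252)
print(encoded)  # [1, 0, -1, 0, 1]
```

The output has one entry per time step, so the length of the array equals the number of steps observed.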
What is important here is choosing a volatility factor that captures the size of the moves between time steps. If σ is too small, you will lose information about the size of the moves: you will append a 1 or -1 whether the upward or downward change was x or 10x. Both scenarios give us a 1 or -1 by the above logic, yet one move is 10 times the size of the other and nothing in our sequence captures this.
How do we go about improving this? If we log a 0 at t we can take away the residual, S(t+1) = P(t) - S(t), so that the next starting tick incorporates the previous move. We could also append multiple 1s or -1s to the array at each time step if the price move is a positive (1) or negative (-1) integer multiple of Su or Sd, again incorporating any residual amount left over into the next starting tick. These refinements come at the expense of losing track of the time dimension, i.e. t will no longer be the length of the array.
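One possible reading of this refinement is sketched below. It is an interpretation, not the author's exact rule: moves are measured in whole multiples of ln(u) rather than of Su or Sd, one symbol is appended per whole threshold crossed, and the starting tick is advanced only by the thresholds actually logged so that any residual carries into the next step. The function name `encode_with_multiples` and the sample prices are hypothetical.

```python
import math

def encode_with_multiples(prices, u):
    """Append one 1 or -1 per whole threshold crossed; carry the residual forward.

    Assumes u > 1 and d = 1 / u. Thresholds are compared in log space, which is
    an assumption of this sketch.
    """
    log_u = math.log(u)
    sequence = []
    start = prices[0]
    for price in prices[1:]:
        steps = math.log(price / start) / log_u   # signed move in units of ln(u)
        k = int(abs(steps))                       # whole thresholds crossed
        sign = 1 if steps > 0 else -1
        if k == 0:
            sequence.append(0)                    # start is unchanged: residual carries
        else:
            sequence.extend([sign] * k)
            start *= u ** (sign * k)              # advance only by the logged part
    return sequence

multi = encode_with_multiples([100.0, 105.0, 105.5, 99.0], u=1.02)
print(multi)  # [1, 1, 0, -1, -1]
```

Three time steps produce five symbols here, which illustrates the cost noted above: the array length no longer equals t.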
Another, more intuitive way to capture realistic factors at each time step is to update the volatility used to calculate the multiplicative factors by backing out a volatility measure σ(t) from observed option prices of the asset in question, so that u and d in the above simple model change at each node t, becoming the time-dependent factors ut, dt. We could also use GARCH, historical volatility, a hybrid method or some function of order book imbalance at t to give us an estimate for σt (and therefore ut, dt), in the hope of retaining as much information as possible about the size of the moves (if our volatility predictions are accurate). When expected volatility is larger, our threshold for appending 1s and -1s to the sequence increases.
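As a hedged illustration of time-dependent thresholds, the sketch below swaps in a trailing-window historical volatility of log returns for σt, which is one of the alternatives mentioned above rather than the implied-volatility approach. The function names, the `window` parameter and the sample prices are assumptions. Volatility is measured per step, so Δt = 1, and only returns observed before each move are used to set its thresholds.

```python
import math

def rolling_sigma(prices, window):
    """Sample standard deviation of log returns over each trailing window (window >= 2)."""
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    sigmas = []
    for end in range(window, len(log_returns) + 1):
        chunk = log_returns[end - window:end]
        mean = sum(chunk) / window
        variance = sum((r - mean) ** 2 for r in chunk) / (window - 1)
        sigmas.append(math.sqrt(variance))
    return sigmas

def encode_time_varying(prices, window):
    """Encode moves against u_t = exp(sigma_t * sqrt(2)) and d_t = 1 / u_t."""
    sigmas = rolling_sigma(prices, window)
    sequence = []
    # sigmas[k] only uses returns up to prices[window + k], so it is known
    # before the move to prices[window + k + 1] is observed.
    for k, sigma_t in enumerate(sigmas[:-1]):
        start, price = prices[window + k], prices[window + k + 1]
        u_t = math.exp(sigma_t * math.sqrt(2.0))   # dt = 1 step
        d_t = 1.0 / u_t
        if price >= u_t * start:
            sequence.append(1)
        elif price <= d_t * start:
            sequence.append(-1)
        else:
            sequence.append(0)
    return sequence

# Made-up prices with one large jump from 101 to 110.
prices = [100.0, 101.0, 100.0, 102.0, 101.0, 110.0, 109.0, 108.5]
tv_encoded = encode_time_varying(prices, window=3)
print(tv_encoded)
```

The first `window` moves are used only to seed the volatility estimate, so the encoded sequence is shorter than the price series by `window + 1` entries.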
Note that we are also able to “flatten” these tertiary sequences into binary sequences, but this costs us the alignment of our sequences across the time dimension. For example, the tertiary sequence (1,-1,-1,0,0) encodes a path over 5 time steps, where t = the length of the array. If we flatten this to binary by removing the 0s and replacing the -1s with 0s, we get (1,0,0). Both arrays encode the same net move (one up move and two down moves), but we lose the time dimension, as t is no longer the length of our array. In some cases this is fine, and such a simplification can be useful for applications where we only care about price movement.
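The flattening step can be sketched in a few lines, using the mapping that produces the (1,0,0) result in the example above: zeros are dropped, an up move stays 1 and a down move becomes 0. The function name `flatten_to_binary` is hypothetical.

```python
def flatten_to_binary(sequence):
    """Drop the 0s, keep up moves as 1 and map down moves (-1) to 0."""
    return [1 if move == 1 else 0 for move in sequence if move != 0]

tertiary = [1, -1, -1, 0, 0]
binary = flatten_to_binary(tertiary)
print(binary)  # [1, 0, 0]

# The net move (ups minus downs) is recoverable from either encoding.
net_from_tertiary = sum(tertiary)
net_from_binary = 2 * sum(binary) - len(binary)
print(net_from_tertiary, net_from_binary)  # -1 -1
```

The binary array has length 3 rather than 5, which is the loss of time alignment described above.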
So for given volatilities at each node, which we only need for the calculation of the next bar and can then dispose of, we have a tertiary sequence that approximates the asset's path. Great, but other than being a nice way to back out observed probabilities, what use is this?
How about stacking several of these sequences on top of each other, so that each row represents a different asset and each column represents a time interval? This becomes a matrix, call it A(t), whose number of columns grows by one with each new time interval. We could then use an LLM-type model to predict the next columns of this tertiary matrix, which encodes all the asset paths we give it. Ideas, comments and links to LLM-type models and APIs that can predict matrix entries are welcome!
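A minimal sketch of building A(t) is shown below, using plain nested lists with one row per asset and one column per time interval. The function names, the asset labels and the sequences are made up for illustration, and the prediction step itself is not shown, since no particular model or API is specified.

```python
def stack_sequences(sequences_by_asset):
    """Return (asset names, matrix) with one row per asset and one column per step."""
    names = sorted(sequences_by_asset)
    lengths = {len(sequences_by_asset[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("all sequences must cover the same time steps")
    return names, [list(sequences_by_asset[name]) for name in names]

def append_column(matrix, new_moves):
    """Grow the matrix by one column when a new time interval closes."""
    if len(new_moves) != len(matrix):
        raise ValueError("need exactly one new move per asset")
    for row, move in zip(matrix, new_moves):
        row.append(move)

# Hypothetical assets and tertiary sequences.
names, A = stack_sequences({
    "ASSET_A": [1, 0, -1, 0],
    "ASSET_B": [0, 0, 1, 1],
    "ASSET_C": [-1, 1, 0, -1],
})
append_column(A, [1, -1, 0])
print(names)
print(A)
```

Stacking only works with the time-aligned tertiary encoding; flattened binary sequences generally have different lengths per asset and would not line up column by column.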
Link to Python code:
Head of Quantitative Research
1 yr: Matt Dancho, this conversion works for any time series; instead of using implied volatility, use GARCH or historical.
Blockchain | Digital Assets | Building the future of capital markets and helping the convergence between TradFi and Web3 finance
1 yr: Very elegant