On Signal, Alpha and Strategy
Jakub Polec
20+ yrs in Tech & Finance & Quant | ex-Microsoft/Oracle/CERN | IT / Cloud Architecture Leader | AI/ML Data Scientist | SaaS & Fintech
In quantitative finance, it is critical to differentiate between "strategy," "signal," and "alpha," as they each represent distinct aspects of the investment process.
===> Full code of this example is at https://quantjourney.substack.com
Here's how the process might work in a simplified example:
Let's look at the code for a very simple signal:
import numpy as np
import pandas as pd

def generate_signals(data):
    # Example using a Simple Moving Average (SMA) crossover
    short_window = 50
    long_window = 200

    # Calculate the short and long moving averages
    short_ma = data['close'].rolling(window=short_window, min_periods=1).mean()
    long_ma = data['close'].rolling(window=long_window, min_periods=1).mean()

    # Generate signals: 1 for buy, -1 for sell, 0 for hold
    signals = np.where(short_ma > long_ma, 1, 0)
    signals = np.where(short_ma < long_ma, -1, signals)

    return pd.Series(signals, index=data.index)
And here is code for alpha, which can be calculated using a variety of methods. In the approach below, we perform a regression analysis in which the strategy's returns are regressed against the returns of a benchmark index: we fit R_strategy = alpha + beta * R_benchmark + epsilon, and the intercept of this regression is the alpha, which represents performance that is independent of the market's movements.
from sklearn.linear_model import LinearRegression

def calculate_alpha(returns, benchmark_returns):
    # Ensure both series are aligned by date and have the same length
    aligned_returns = returns.align(benchmark_returns, join='inner')

    # Prepare the data for regression
    X = aligned_returns[1].values.reshape(-1, 1)  # Benchmark returns
    y = aligned_returns[0].values                 # Strategy returns

    # Perform the regression
    model = LinearRegression().fit(X, y)

    # The intercept is the alpha
    alpha = model.intercept_
    return alpha

# Example usage:
# Assuming 'strategy_returns' and 'benchmark_returns' are pd.Series of daily returns
strategy_alpha = calculate_alpha(strategy_returns, benchmark_returns)
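To connect the two pieces, here is a minimal, hypothetical sketch of how the signals from generate_signals could be turned into strategy returns and then fed into calculate_alpha. The variable names ('prices', 'benchmark_prices'), the shift-by-one position convention, and the 'close' column are assumptions for illustration, not part of the original code.

# Hypothetical end-to-end usage: 'prices' and 'benchmark_prices' are assumed to be
# DataFrames with a 'close' column, indexed by date
signals = generate_signals(prices)

# Hold yesterday's signal as today's position to avoid look-ahead bias
positions = signals.shift(1).fillna(0)

# Daily strategy returns: position times the asset's daily return
asset_returns = prices['close'].pct_change().fillna(0)
strategy_returns = positions * asset_returns

# Benchmark daily returns
benchmark_returns = benchmark_prices['close'].pct_change().fillna(0)

# Alpha of the strategy relative to the benchmark
strategy_alpha = calculate_alpha(strategy_returns, benchmark_returns)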
I have also played a bit with a very simple daily-trading strategy based on Bollinger Bands, as in the code below:
def get_signals(df, min_volatility, max_buy_perc, min_sell_perc):
    """
    Calculate the signals for the model.

    :param df: DataFrame with the OHLCV data plus 'volatility' and 'close_percentage' columns
    :param min_volatility: minimum volatility required to buy
    :param max_buy_perc: maximum close_percentage at which to buy
    :param min_sell_perc: minimum close_percentage at which to sell
    """
    # Generate a copy of the OHLCV DataFrame to avoid modifying the original
    #df = df.copy().reset_index(drop=True)

    # Buy signal: volatility is high enough and close_percentage is lower than max_buy_perc (1 = buy, 0 = do nothing)
    df['signal'] = np.where((df['volatility'] > min_volatility) & (df['close_percentage'] < max_buy_perc), 1, 0)

    # Sell signal: close_percentage is higher than min_sell_perc (-1 = sell, otherwise keep the previous signal)
    df['signal'] = np.where((df['close_percentage'] > min_sell_perc), -1, df['signal'])

    # Keep only the rows with signals
    result = df[df['signal'] != 0].copy()

    # Remove consecutive duplicates of the same signal
    result = result[result['signal'] != result['signal'].shift()]

    # Drop the first entry if it's a sell and the last if it's a buy
    if (len(result) > 0) and (result['signal'].iat[0] == -1): result = result.iloc[1:]
    if (len(result) > 0) and (result['signal'].iat[-1] == 1): result = result.iloc[:-1]

    # Adjusting PnL for commission and transaction costs
    # (CASH, COMMISSION_COST and TRANSACTION_COST_PERCENTAGE are module-level constants)
    result['trade_amount'] = result['close'].shift() * (CASH // result['close'].shift())
    result['transaction_cost'] = result['trade_amount'] * TRANSACTION_COST_PERCENTAGE
    result['total_cost'] = COMMISSION_COST + result['transaction_cost']
    # result['pnl'] = np.where(result['signal'] == -1,
    #                          (result['close'] - result['close'].shift()) * (CASH // result['close'].shift()) - result['total_cost'],
    #                          0)

    # Calculate the returns and the wins and losses
    result['pnl'] = np.where(result['signal'] == -1,
                             (result['close'] - result['close'].shift()) * (CASH // result['close'].shift()),
                             0)
    result['wins'] = np.where(result['pnl'] > 0, 1, 0)
    result['losses'] = np.where(result['pnl'] < 0, 1, 0)

    # Return only the sell rows (signal == -1)
    result = result[result['signal'] == -1]

    #return result.drop(['signal', 'trade_amount', 'transaction_cost', 'total_cost'], axis=1)
    return result.drop('signal', axis=1)
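The function above relies on 'volatility' and 'close_percentage' columns that are not defined in this excerpt. The sketch below shows one plausible way to derive them from Bollinger Bands; the window length, band multiplier, and exact definitions are my assumptions, not necessarily what the full post uses.

import pandas as pd

def add_bollinger_features(df, window=20, num_std=2):
    # Rolling mean and standard deviation of the close price
    mid = df['close'].rolling(window=window, min_periods=window).mean()
    std = df['close'].rolling(window=window, min_periods=window).std()

    # Upper and lower Bollinger Bands
    upper = mid + num_std * std
    lower = mid - num_std * std

    # One possible 'volatility' definition: band width relative to the middle band
    df['volatility'] = (upper - lower) / mid

    # One possible 'close_percentage': where the close sits within the band
    # (0 = at the lower band, 1 = at the upper band)
    df['close_percentage'] = (df['close'] - lower) / (upper - lower)

    return df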
I then used a genetic algorithm to search over different percentage ranges and volatility levels that shape the signals:
import pygad
from tqdm import tqdm

def run_generations():
    with tqdm(total=GENERATIONS) as pbar:
        # Create the genetic algorithm instance
        ga_instance = pygad.GA(
            # The total number of generations (iterations) the genetic algorithm will evolve
            num_generations=GENERATIONS,
            # Number of solutions to be selected as parents in the mating pool
            num_parents_mating=5,
            # The fitness function used to evaluate each solution
            fitness_func=fitness_func,
            # Number of solutions in the population
            sol_per_pop=SOLUTIONS,
            # Number of genes in each solution
            num_genes=3,
            # The space of each gene. Here, each gene ranges from 0 to 1 with a step of 0.0001
            gene_space=[
                {'low': 0, 'high': 1, 'step': 0.0001},
                {'low': 0, 'high': 1, 'step': 0.0001},
                {'low': 0, 'high': 1, 'step': 0.0001}],
            # The type of parent selection. 'sss' stands for steady-state selection
            parent_selection_type='sss',
            # The type of crossover (mating). 'single_point' stands for single-point crossover
            crossover_type='single_point',
            # The type of mutation. 'random' stands for random mutation
            mutation_type='random',
            # The number of genes to mutate
            mutation_num_genes=1,
            # The number of parents to keep in the next population. -1 means all parents are kept
            keep_parents=-1,
            # A random seed for reproducing the results
            random_seed=42,
            # A callback called after each generation; here it updates the progress bar
            on_generation=lambda _: pbar.update(1),
        )

        # Run the genetic algorithm
        ga_instance.run()

        # Return the best solution (the genes of the fittest individual)
        return ga_instance.best_solution()[0]
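For completeness, here is a hypothetical usage sketch. The values of GENERATIONS and SOLUTIONS, and the mapping of the three genes onto the strategy parameters, are assumptions inferred from the code above, not confirmed details from the post.

# Hypothetical configuration and usage
GENERATIONS = 50   # assumed value for illustration
SOLUTIONS = 20     # assumed value for illustration

best_genes = run_generations()
min_volatility, max_buy_perc, min_sell_perc = best_genes  # assumed gene-to-parameter mapping

# Evaluate the optimized parameters on the prepared DataFrame
trades = get_signals(df.copy(), min_volatility, max_buy_perc, min_sell_perc)
print(trades[['pnl', 'wins', 'losses']].sum())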
Running this for AAPL from 2022-12-01 to 2023-12-23 gave the results shown in the full post.
These figures don't include transaction costs, commissions, and other related expenses.
Without considering these costs, the strategy appears effective, but it might not be as successful when these expenses are factored in.
Since I used the PyGAD library (pygad.readthedocs.io), it's worth adding that it can be applied to a wide variety of optimization problems, such as finding the best parameters for a machine learning model or optimizing a business process. You define what you want to optimize (your fitness function), and PyGAD takes care of the rest of the optimization process.
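As a concrete, hypothetical example of what the fitness_func referenced in run_generations might look like in this setting (the gene-to-parameter mapping and the use of total PnL as the fitness value are my assumptions, not the author's confirmed implementation):

def fitness_func(ga_instance, solution, solution_idx):
    # PyGAD >= 2.20 passes the GA instance as the first argument; older versions omit it
    # Map the three genes onto the strategy parameters (assumed mapping)
    min_volatility, max_buy_perc, min_sell_perc = solution

    # Evaluate the parameters on the prepared DataFrame ('df' is assumed to be
    # available with 'close', 'volatility' and 'close_percentage' columns)
    trades = get_signals(df.copy(), min_volatility, max_buy_perc, min_sell_perc)

    # Use total PnL as the fitness value; penalize parameter sets that produce no trades
    if len(trades) == 0:
        return -1e9
    return float(trades['pnl'].sum())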
The rest is in the blog https://quantjourney.substack.com/ with full code.