Day-Trading with AI: When to Hold, When to Fold, and When to Not Play!
Andrew (Andy) Carl
Senior Machine Learning Engineer / Senior Data Scientist at Boeing
Market Clustering with Transdimensional Machine Learning
TLDR:
Characterizing day-trading markets with TML makes it possible to tailor strategies to specific market “personalities”. These “personalities” suggest it is effectively impossible to trade profitably on approximately 40 percent of trading days!
Background Post: The New AI Gold Rush — Transdimensional Machine Learning (Pan Provided!)
A Jupyter Notebook with the accompanying code is provided in the GitHub repository HERE.
An Example:
- For a given underlying instrument, 10-minute OHLC candles were built from tick data covering an approximately 10-year period (a minimal candle-building sketch is shown after this list).
- Backtesting was performed on the candle data using a baseline “Buy/Sell” indicator, for each day, across (9) cases covering fixed stop-loss, ATR-based stop-loss, and no stop-loss instances.
- Each case was optimized over the baseline indicator hyperparameter, asymmetrically for both LONG and SHORT positions and for both BEST and WORST day P/L, recording the max and min day-trading P/L, the number of trades, and the associated optimal hyperparameter value.
- For each day, the results across the (9) cases were averaged to form that day’s “raw-data” vector.
- Preliminary data exploration was performed using TML, establishing the probable number of underlying clusters to be approximately (9).
- The “Fitness Function” was modified to help focus attention on (9) cluster solutions during the Hybrid-NEAT evolutionary process.
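As a concrete illustration of the candle-building step above, here is a minimal sketch assuming the tick data sits in a pandas DataFrame with a DatetimeIndex and a 'price' column; the notebook’s actual tick schema is not shown in the post.

# Minimal sketch (assumption): building 10-minute OHLC candles from tick data
# with pandas; the real notebook's tick format and column names may differ.
import pandas as pd

def ticks_to_candles(ticks: pd.DataFrame, rule: str = "10min") -> pd.DataFrame:
    # resample().ohlc() yields open/high/low/close columns per 10-minute bar
    candles = ticks["price"].resample(rule).ohlc()
    # drop intervals that contained no ticks (e.g., outside trading hours)
    return candles.dropna()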
Resulting “Typical” Day-Trading Market “Personalities” by relative size:
“BEFORE” Applying TML (12-D to 2-D via tSNE):
import time
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.manifold import TSNE

time_start = time.time()
tsne = TSNE(n_components=2, verbose=1, perplexity=100, n_iter=1000)
tsne_results_orig = tsne.fit_transform(data_orig)
print('t-SNE done! Time elapsed: {} seconds'.format(time.time() - time_start))

# attach the 2-D embedding of the original 12-D raw data for plotting
df_subset['tsne-2d-one'] = tsne_results_orig[:, 0]
df_subset['tsne-2d-two'] = tsne_results_orig[:, 1]

plt.figure(figsize=(16, 10))
sns.scatterplot(
    x="tsne-2d-one", y="tsne-2d-two",
    hue="y",
    palette=['purple', 'red', 'darkcyan', 'brown', 'blue',
             'dodgerblue', 'green', 'lightgreen', 'black'],
    data=df_subset,
    legend="full",
    alpha=0.3
)
Applying TML (12-D to 1000-D via TML):
...
metric = "jaccard"
n_neighbors_max = 100
n_neighbors_min = 2
min_dist_max = 0.99
min_dist_min = 0.0
n_components_max = 1000
n_components_min = 1
min_samples_max = 1000
min_samples_min = 2
min_cluster_size_max = 2
min_cluster_size_min = 2
...
# fitness rewards tight solutions and penalizes deviation from (9) clusters
if num_clusters_found == 9:
    genome.fitness = 10000.0 / abs(clustered_COMB_sum_SE + 1)
elif num_clusters_found == 0:
    genome.fitness = -99999.0
else:
    genome.fitness = (10000.0 / abs(clustered_COMB_sum_SE + 1)
                      - abs(num_clusters_found - 9) * 1000.0)
...

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
New best_fitness_so_far = -2984.7710672587614 1
New best: metric = jaccard
New best: n_neighbors = 98
New best: min_dist = 0.06658783809866256
New best: n_components = 1000
New best: min_samples = 3
New best: min_cluster_size = 2
New best: cluster_selection_epsilon = 0.6658783809866257
OUT: num_clusters_found = 12
OUT: ratio_clustered = 1.0
OUT: clusterer_probabilities_sum = 0.9558447965277097
OUT: clusterer_probabilities_sum_SE = 184.0931575208609
OUT: clusterer_outlier_scores_sum = 0.13366803680011266
OUT: clusterer_outlier_scores_sum_SE = 471.55167569985406
OUT: clustered_COMB_sum_SE = 655.644833220715
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
…
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
New best_fitness_so_far = 182.75239956493343 104
New best: metric = jaccard
New best: n_neighbors = 100
New best: min_dist = 0.9899882983797373
New best: n_components = 1000
New best: min_samples = 2
New best: min_cluster_size = 2
New best: cluster_selection_epsilon = 9.899882983797372
OUT: num_clusters_found = 9
OUT: ratio_clustered = 1.0
OUT: clusterer_probabilities_sum = 0.9978606463926271
OUT: clusterer_probabilities_sum_SE = 1.649803561162849
OUT: clusterer_outlier_scores_sum = 0.03316588079964379
OUT: clusterer_outlier_scores_sum_SE = 52.31724504377773
OUT: clustered_COMB_sum_SE = 53.96704860494058
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
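For context, the following is a minimal sketch of how one candidate hyperparameter set from the evolutionary search might be evaluated using the standard umap-learn and hdbscan APIs. It is an assumption, not the author’s exact Hybrid-NEAT evaluation code; the variable data_orig is carried over from the earlier snippet, and evaluate_candidate is a hypothetical helper name.

# Minimal sketch (assumption): evaluating one evolved hyperparameter set with
# umap-learn and hdbscan; not the exact Hybrid-NEAT evaluation loop.
import umap
import hdbscan

def evaluate_candidate(data_orig, n_neighbors, min_dist, n_components,
                       min_samples, min_cluster_size,
                       cluster_selection_epsilon, metric="jaccard"):
    # map the 12-D raw data up to the candidate embedding dimension (e.g. 1000-D)
    mapper = umap.UMAP(n_neighbors=n_neighbors, min_dist=min_dist,
                       n_components=n_components, metric=metric)
    embedding = mapper.fit_transform(data_orig)

    # cluster the high-dimensional embedding
    clusterer = hdbscan.HDBSCAN(min_samples=min_samples,
                                min_cluster_size=min_cluster_size,
                                cluster_selection_epsilon=cluster_selection_epsilon)
    clusterer.fit(embedding)

    # quantities consumed by the fitness function above
    num_clusters_found = int(clusterer.labels_.max()) + 1
    ratio_clustered = float((clusterer.labels_ >= 0).mean())
    return clusterer, num_clusters_found, ratio_clustered

# example call using the final "New best" values reported in the log above
fit_HDBSCAN, num_clusters_found, ratio_clustered = evaluate_candidate(
    data_orig, n_neighbors=100, min_dist=0.9899882983797373, n_components=1000,
    min_samples=2, min_cluster_size=2,
    cluster_selection_epsilon=9.899882983797372)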
“AFTER” Applying TML (1000-D to 2-D via tSNE):
time_start = time.time()
# use the raw data stored on the fitted HDBSCAN object (the 1000-D embedding)
raw_data = fit_HDBSCAN._raw_data
tsne = TSNE(n_components=2, verbose=1, perplexity=100, n_iter=1000)
tsne_results_1 = tsne.fit_transform(raw_data)
print('t-SNE done! Time elapsed: {} seconds'.format(time.time() - time_start))

df_subset['tsne-2d-one'] = tsne_results_1[:, 0]
df_subset['tsne-2d-two'] = tsne_results_1[:, 1]

plt.figure(figsize=(16, 10))
sns.scatterplot(
    x="tsne-2d-one", y="tsne-2d-two",
    hue="y",
    palette=['purple', 'red', 'darkcyan', 'brown', 'blue',
             'dodgerblue', 'green', 'lightgreen', 'black'],
    data=df_subset,
    legend="full",
    alpha=0.3
)
...
unique_elements, counts_elements = np.unique(fit_HDBSCAN.labels_, return_counts=True)
print("Frequency of unique values of the said array:")
print(np.asarray((unique_elements, counts_elements)))
# Frequency of unique values of the said array:
# [[  0   1   2   3   4   5   6   7   8]
#  [638 893 269  19  41  23 486 159 225]]

threshold = pd.Series(fit_HDBSCAN.outlier_scores_).quantile(0.9)
# threshold = 0.09822259079456185
outliers = np.where(fit_HDBSCAN.outlier_scores_ > threshold)[0]
sns.distplot(fit_HDBSCAN.outlier_scores_[np.isfinite(fit_HDBSCAN.outlier_scores_)], rug=True)
...
LONG trade characteristics: avoid LONG trades on the 38.5 percent of trading days associated with Cluster IDs 6, 7, 8 and 9.
SHORT trade characteristics: avoid SHORT trades on the 40.3 percent of trading days associated with Cluster IDs 0, 2, 5 and 8.
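As a minimal sketch (not from the notebook), the fraction of trading days falling into these “avoid” clusters can be computed directly from the fitted HDBSCAN labels; the cluster ID lists below are taken verbatim from the statements above and are illustrative only.

# Minimal sketch (assumption): share of trading days in the "avoid" clusters,
# computed from the HDBSCAN labels of the fit above.
import numpy as np

labels = fit_HDBSCAN.labels_            # one cluster label per trading day
avoid_long_ids = [6, 7, 8, 9]           # clusters flagged for skipping LONG trades
avoid_short_ids = [0, 2, 5, 8]          # clusters flagged for skipping SHORT trades

frac_avoid_long = np.isin(labels, avoid_long_ids).mean()
frac_avoid_short = np.isin(labels, avoid_short_ids).mean()
print("Avoid LONG on {:.1%} of trading days".format(frac_avoid_long))
print("Avoid SHORT on {:.1%} of trading days".format(frac_avoid_short))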
Summary:
- Avoid trading LONG on the ~38.5 percent of trading days associated with Cluster IDs 6, 7, 8 and 9.
- Avoid trading SHORT on the ~40.3 percent of trading days associated with Cluster IDs 0, 2, 5 and 8.
- The market has multiple “personalities”, rendering a single “one-size-fits-all” strategy inadequate!
- The ability to identify individual market “personalities” enables tailoring strategies to specific “persona” in pursuit of profitability.
- Risk management dictates knowing “How-to-play”, but more importantly, knowing “When-not-to-play”!
- The 12-dimensional raw-data vectors were transformed into 1000-dimensional vectors using TML to achieve cluster separation, then projected down to 2-D with tSNE for visualization purposes.
- Transdimensional Machine Learning (TML) can be defined as a holistic application perspective in which the data, metric selection/creation, manifold mapping, AI/ML/DL tool selection, and fitness function determination are driven solely by the specifics of the intended use-case and, more importantly, are independent of any concern about the dimensionality of the underlying raw data or of the manifold mapping.
Inspiration:
- UMAP, Leland McInnes
- HDBSCAN, Leland McInnes, John Healy, Steve Astels
- NEAT, Kenneth Stanley
- How to Tune Hyperparameters of tSNE, Nikolay Oskolkov
About Andrew (Andy) Carl:
The enthusiastic developer of the “GitHub AI Brain-of-Brains” and “GITHUB2VEC” NLP productivity tools. A passionate multi-discipline Aerospace Mechanical Engineer with extensive experience integrating Artificial Intelligence, Hybrid Reinforcement Machine Learning (Hybrid-NEAT), data science, and multi-discipline simulation into Hybrid Reinforcement Learning based Optimization, as well as in the design and analysis of complex air, space, and ground-based systems and in engineering tool development.
- Andy’s “GitHub AI Brain-of-Brains”
- Andy’s Online Brain
- Andy on Linkedin
- Andy on GitHub
- Andy on Computer-Controlled Baseball Pitching Machines :)
Original Post: https://medium.com/@andycarl_40001/day-trading-with-ai-when-to-hold-when-to-fold-and-when-to-not-play-42743fddcdc