Day-Trading with AI: When to Hold, When to Fold, and When to Not Play!
Andrew (Andy) Carl
Senior Machine Learning Engineer / Senior Data Scientist at Boeing
Market Clustering with Transdimensional Machine Learning
TLDR:
Characterizing day-trading markets with TML makes it possible to tailor strategies to specific market “personalities”. These “personalities” suggest it is effectively impossible to trade profitably on approximately 40 percent of trading days!
Background Post: The New AI Gold Rush — Transdimensional Machine Learning (Pan Provided!)
A Jupyter Notebook with the accompanying code is provided in the GitHub repository HERE.
An Example:
- For a given underlying instrument, 10-minute OHLC candles were built from tick data covering an approximately 10-year period (a minimal candle-building sketch is shown after this list).
- Backtesting was performed on the candle data using a baseline “Buy/Sell” indicator, for each day, across (9) cases covering fixed stop-loss, ATR-based stop-loss, and no stop-loss instances.
- Each case was optimized over the baseline indicator hyperparameter, asymmetrically for both LONG and SHORT positions and for both BEST and WORST day P/L, recording the max and min day-trading P/L, the number of trades, and the associated optimal hyperparameter value.
- For each day, the results across the (9) cases were averaged to form that day’s “raw-data” vector.
- Preliminary data exploration was performed using TML, establishing the probable number of underlying clusters to be approximately (9).
- The “Fitness Function” was modified to help focus attention on (9) cluster solutions during the Hybrid-NEAT evolutionary process.
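As a concrete illustration of the candle-building step above, here is a minimal sketch assuming the tick data sits in a pandas DataFrame with a DatetimeIndex and a 'price' column; the notebook’s actual tick schema is not shown in the post.

# Minimal sketch (assumption): building 10-minute OHLC candles from tick data
# with pandas; the real notebook's tick format and column names may differ.
import pandas as pd

def ticks_to_candles(ticks: pd.DataFrame, rule: str = "10min") -> pd.DataFrame:
    # resample().ohlc() yields open/high/low/close columns per 10-minute bar
    candles = ticks["price"].resample(rule).ohlc()
    # drop intervals that contained no ticks (e.g., outside trading hours)
    return candles.dropna()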
Resulting “Typical” Day-Trading Market “Personalities” by relative size:
“BEFORE” Applying TML (12-D to 2-D via tSNE):
import time
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.manifold import TSNE

time_start = time.time()
tsne = TSNE(n_components=2, verbose=1, perplexity=100, n_iter=1000)
tsne_results_orig = tsne.fit_transform(data_orig)
print('t-SNE done! Time elapsed: {} seconds'.format(time.time() - time_start))

# attach the 2-D embedding of the original 12-D raw data for plotting
df_subset['tsne-2d-one'] = tsne_results_orig[:, 0]
df_subset['tsne-2d-two'] = tsne_results_orig[:, 1]

plt.figure(figsize=(16, 10))
sns.scatterplot(
    x="tsne-2d-one", y="tsne-2d-two",
    hue="y",
    palette=['purple', 'red', 'darkcyan', 'brown', 'blue',
             'dodgerblue', 'green', 'lightgreen', 'black'],
    data=df_subset,
    legend="full",
    alpha=0.3
)
Applying TML (12-D to 1000-D via TML):
...
metric = "jaccard"
n_neighbors_max = 100
n_neighbors_min = 2
min_dist_max = 0.99
min_dist_min = 0.0
n_components_max = 1000
n_components_min = 1
min_samples_max = 1000
min_samples_min = 2
min_cluster_size_max = 2
min_cluster_size_min = 2
...
# fitness rewards tight solutions and penalizes deviation from (9) clusters
if num_clusters_found == 9:
    genome.fitness = 10000.0 / abs(clustered_COMB_sum_SE + 1)
elif num_clusters_found == 0:
    genome.fitness = -99999.0
else:
    genome.fitness = (10000.0 / abs(clustered_COMB_sum_SE + 1)
                      - abs(num_clusters_found - 9) * 1000.0)
...

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
New best_fitness_so_far = -2984.7710672587614 1
New best: metric = jaccard
New best: n_neighbors = 98
New best: min_dist = 0.06658783809866256
New best: n_components = 1000
New best: min_samples = 3
New best: min_cluster_size = 2
New best: cluster_selection_epsilon = 0.6658783809866257
OUT: num_clusters_found = 12
OUT: ratio_clustered = 1.0
OUT: clusterer_probabilities_sum = 0.9558447965277097
OUT: clusterer_probabilities_sum_SE = 184.0931575208609
OUT: clusterer_outlier_scores_sum = 0.13366803680011266
OUT: clusterer_outlier_scores_sum_SE = 471.55167569985406
OUT: clustered_COMB_sum_SE = 655.644833220715
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
…
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
New best_fitness_so_far = 182.75239956493343 104
New best: metric = jaccard
New best: n_neighbors = 100
New best: min_dist = 0.9899882983797373
New best: n_components = 1000
New best: min_samples = 2
New best: min_cluster_size = 2
New best: cluster_selection_epsilon = 9.899882983797372
OUT: num_clusters_found = 9
OUT: ratio_clustered = 1.0
OUT: clusterer_probabilities_sum = 0.9978606463926271
OUT: clusterer_probabilities_sum_SE = 1.649803561162849
OUT: clusterer_outlier_scores_sum = 0.03316588079964379
OUT: clusterer_outlier_scores_sum_SE = 52.31724504377773
OUT: clustered_COMB_sum_SE = 53.96704860494058
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
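For context, the following is a minimal sketch of how one candidate hyperparameter set from the evolutionary search might be evaluated using the standard umap-learn and hdbscan APIs. It is an assumption, not the author’s exact Hybrid-NEAT evaluation code; the variable data_orig is carried over from the earlier snippet, and evaluate_candidate is a hypothetical helper name.

# Minimal sketch (assumption): evaluating one evolved hyperparameter set with
# umap-learn and hdbscan; not the exact Hybrid-NEAT evaluation loop.
import umap
import hdbscan

def evaluate_candidate(data_orig, n_neighbors, min_dist, n_components,
                       min_samples, min_cluster_size,
                       cluster_selection_epsilon, metric="jaccard"):
    # map the 12-D raw data up to the candidate embedding dimension (e.g. 1000-D)
    mapper = umap.UMAP(n_neighbors=n_neighbors, min_dist=min_dist,
                       n_components=n_components, metric=metric)
    embedding = mapper.fit_transform(data_orig)

    # cluster the high-dimensional embedding
    clusterer = hdbscan.HDBSCAN(min_samples=min_samples,
                                min_cluster_size=min_cluster_size,
                                cluster_selection_epsilon=cluster_selection_epsilon)
    clusterer.fit(embedding)

    # quantities consumed by the fitness function above
    num_clusters_found = int(clusterer.labels_.max()) + 1
    ratio_clustered = float((clusterer.labels_ >= 0).mean())
    return clusterer, num_clusters_found, ratio_clustered

# example call using the final "New best" values reported in the log above
fit_HDBSCAN, num_clusters_found, ratio_clustered = evaluate_candidate(
    data_orig, n_neighbors=100, min_dist=0.9899882983797373, n_components=1000,
    min_samples=2, min_cluster_size=2,
    cluster_selection_epsilon=9.899882983797372)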
“AFTER” Applying TML (1000-D to 2-D via tSNE):
time_start = time.time()
# use the raw data stored on the fitted HDBSCAN object (the 1000-D embedding)
raw_data = fit_HDBSCAN._raw_data
tsne = TSNE(n_components=2, verbose=1, perplexity=100, n_iter=1000)
tsne_results_1 = tsne.fit_transform(raw_data)
print('t-SNE done! Time elapsed: {} seconds'.format(time.time() - time_start))

df_subset['tsne-2d-one'] = tsne_results_1[:, 0]
df_subset['tsne-2d-two'] = tsne_results_1[:, 1]

plt.figure(figsize=(16, 10))
sns.scatterplot(
    x="tsne-2d-one", y="tsne-2d-two",
    hue="y",
    palette=['purple', 'red', 'darkcyan', 'brown', 'blue',
             'dodgerblue', 'green', 'lightgreen', 'black'],
    data=df_subset,
    legend="full",
    alpha=0.3
)
...
unique_elements, counts_elements = np.unique(fit_HDBSCAN.labels_, return_counts=True)
print("Frequency of unique values of the said array:")
print(np.asarray((unique_elements, counts_elements)))
# Frequency of unique values of the said array:
# [[  0   1   2   3   4   5   6   7   8]
#  [638 893 269  19  41  23 486 159 225]]

threshold = pd.Series(fit_HDBSCAN.outlier_scores_).quantile(0.9)
# threshold = 0.09822259079456185
outliers = np.where(fit_HDBSCAN.outlier_scores_ > threshold)[0]
sns.distplot(fit_HDBSCAN.outlier_scores_[np.isfinite(fit_HDBSCAN.outlier_scores_)], rug=True)
...
LONG trade characteristics: avoid LONG trades on the 38.5 percent of trading days associated with Cluster IDs 6, 7, 8 and 9.
SHORT trade characteristics: avoid SHORT trades on the 40.3 percent of trading days associated with Cluster IDs 0, 2, 5 and 8.
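As a minimal sketch (not from the notebook), the fraction of trading days falling into these “avoid” clusters can be computed directly from the fitted HDBSCAN labels; the cluster ID lists below are taken verbatim from the statements above and are illustrative only.

# Minimal sketch (assumption): share of trading days in the "avoid" clusters,
# computed from the HDBSCAN labels of the fit above.
import numpy as np

labels = fit_HDBSCAN.labels_            # one cluster label per trading day
avoid_long_ids = [6, 7, 8, 9]           # clusters flagged for skipping LONG trades
avoid_short_ids = [0, 2, 5, 8]          # clusters flagged for skipping SHORT trades

frac_avoid_long = np.isin(labels, avoid_long_ids).mean()
frac_avoid_short = np.isin(labels, avoid_short_ids).mean()
print("Avoid LONG on {:.1%} of trading days".format(frac_avoid_long))
print("Avoid SHORT on {:.1%} of trading days".format(frac_avoid_short))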
Summary:
- Avoid trading LONG on the ~38.5 percent of trading days associated with Cluster IDs 6, 7, 8 and 9.
- Avoid trading SHORT on the ~40.3 percent of trading days associated with Cluster IDs 0, 2, 5 and 8.
- The market has multiple “personalities”, rendering a single “one-size-fits-all” strategy inadequate!
- The ability to identify individual market “personalities” enables tailoring strategies to specific “persona” in pursuit of profitability.
- Risk management dictates knowing “How-to-play”, but more importantly, knowing “When-not-to-play”!
- The 12-dimensional raw-data vectors were transformed into 1000-dimensional vectors using TML to achieve cluster separation, then projected down to 2-D with tSNE for visualization purposes.
- Transdimensional Machine Learning (TML) can be defined as a holistic application perspective in which the data, metric selection/creation, manifold mapping, AI/ML/DL tool selection, and fitness function determination are driven solely by the specifics of the intended use-case and, more importantly, are independent of any concern about the dimensionality of the underlying raw data or of the manifold mapping.
Inspiration:
- UMAP, Leland McInnes
- HDBSCAN, Leland McInnes, John Healy, Steve Astels
- NEAT, Kenneth Stanley
- How to Tune Hyperparameters of tSNE, Nikolay Oskolkov
About Andrew (Andy) Carl:
The enthusiastic developer of the “GitHub AI Brain-of-Brains” and “GITHUB2VEC” NLP productivity tools. A passionate multi-discipline Aerospace Mechanical Engineer with extensive experience integrating Artificial Intelligence, Hybrid Reinforcement Machine Learning (Hybrid-NEAT), data science, and multi-discipline simulation into Hybrid Reinforcement Learning based Optimization, as well as in the design and analysis of complex air, space, and ground-based systems and in engineering tool development.
- Andy’s “GitHub AI Brain-of-Brains”
- Andy’s Online Brain
- Andy on Linkedin
- Andy on GitHub
- Andy on Computer-Controlled Baseball Pitching Machines :)
Original Post: https://medium.com/@andycarl_40001/day-trading-with-ai-when-to-hold-when-to-fold-and-when-to-not-play-42743fddcdc