Steepest Descent vs Adam

The Steepest Descent method is an optimization technique used to find the minimum of a function. Imagine you're hiking downhill towards the lowest point of a valley. In this method, you always move in the direction of the steepest descent, or the negative gradient, to reach the minimum point.

Key Equations

  1. Gradient (∇f(x)): the vector pointing in the direction of the greatest rate of increase of the function f(x).
  2. Update Rule: x_{k+1} = x_k − α∇f(x_k), where:
     • x_k: current position
     • α: step size or learning rate, a small positive number determining the size of the step
     • ∇f(x_k): gradient at the current position

Simple Example

If you're trying to minimize f(x) = x^2, the gradient is ∇f(x) = 2x. Starting from an initial guess x_0, you move in the direction of the negative gradient, −2x_0, to find the minimum at x = 0.
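A minimal Python sketch of this iteration (the starting point x_0 = 5 and step size α = 0.1 are arbitrary choices for illustration):

```python
# Steepest descent on f(x) = x^2, whose gradient is 2x.
def grad(x):
    return 2 * x

x = 5.0        # arbitrary starting guess x_0
alpha = 0.1    # step size (learning rate)

for k in range(50):
    x = x - alpha * grad(x)   # x_{k+1} = x_k - alpha * grad f(x_k)

print(x)  # converges toward the minimizer x = 0
```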

This method is fundamental in machine learning and finance for minimizing cost functions or optimizing portfolios.


This is the key idea behind Steepest Descent: the negative gradient points in the direction of steepest descent, so following it iteratively minimizes the function. The same machinery applies directly to financial prediction problems:


1. Prediction Vector in Finance:

In finance, the prediction vector y could represent forecasted asset prices or returns, while y_actual represents the observed values. The goal is to minimize the error between these predictions and the actual values.

2. Objective Function:

The objective function f(x) could be the sum of squared errors (least squares):

f(x) = Σ_i (y_i(x) − y_actual,i)²

where y_i(x) is the predicted value and y_actual,i is the corresponding actual value.

3. Gradient Calculation:

The gradient ∇f(x_k) is calculated at the current parameter vector x_k (which could represent model parameters). For least squares, the gradient is:

∇f(x_k) = 2 Σ_i (y_i(x_k) − y_actual,i) ∇y_i(x_k)
This vector indicates the direction of the steepest increase in error.

4. Steepest Descent Update Rule:

To minimize f(x), we move in the direction opposite to the gradient:

x_{k+1} = x_k − η∇f(x_k)

where η is the learning rate. This update rule iteratively adjusts x_k to reduce the prediction error.

Example:

Suppose our model predicts stock prices and we use parameters x_k to adjust our model. If the actual price of a stock is $100 and our prediction is $90, the error is $10. The gradient calculated with respect to our model parameters indicates how much each parameter contributed to this error. Using steepest descent, we adjust these parameters to minimize the difference between predicted and actual prices.

This combination allows for iterative improvement of financial predictions, reducing the error in forecasts and thereby improving decision-making based on these predictions.
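As a rough sketch of how the pieces fit together, assuming a simple linear prediction model y = A·x and made-up price data (neither is specified above), the least-squares steepest descent loop might look like this:

```python
import numpy as np

# Illustrative data: feature matrix A (e.g. factor exposures) and observed prices.
A = np.array([[1.0, 0.5],
              [1.0, 1.5],
              [1.0, 2.5]])
y_actual = np.array([100.0, 105.0, 112.0])

x = np.zeros(2)   # model parameters x_k, starting from zero
eta = 0.05        # learning rate

for k in range(500):
    y_pred = A @ x                        # predicted prices y_i(x_k)
    g = 2 * A.T @ (y_pred - y_actual)     # gradient of the sum of squared errors
    x = x - eta * g                       # steepest descent update

print(x, A @ x)  # fitted parameters and predictions close to the observed prices
```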

The steepest descent algorithm can fail to converge or converge very slowly under several conditions:

  1. Poor Choice of Learning Rate (η): too large a rate makes the iterates overshoot and diverge; too small a rate makes progress painfully slow (see the sketch after this list).
  2. Non-convex Cost Function: the method can settle into a local minimum instead of the global one.
  3. Ill-conditioned Problem: on long, narrow valleys the iterates zig-zag and converge very slowly.
  4. Incorrect Gradient Calculation: an analytical or numerical error sends each update in the wrong direction.
  5. High Dimensionality: a single global step size is hard to tune across many parameters.
  6. Plateaus or Saddle Points: near-zero gradients stall progress even though the point is not a minimum.
  7. Noise in the Data: noisy gradient estimates produce erratic updates.
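Condition 1 is easy to reproduce. On f(x) = x², the update x_{k+1} = (1 − 2η)x_k diverges whenever η > 1; a minimal sketch (step sizes chosen purely for illustration):

```python
def grad(x):
    return 2 * x   # gradient of f(x) = x^2

x_good, x_bad = 5.0, 5.0
for k in range(20):
    x_good -= 0.1 * grad(x_good)   # eta = 0.1: contracts toward 0
    x_bad  -= 1.5 * grad(x_bad)    # eta = 1.5: magnitude doubles every step

print(x_good)  # about 0.06, approaching the minimum
print(x_bad)   # about 5.2 million, the iteration has diverged
```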

For many modern applications, Adam (Adaptive Moment Estimation) is a more suitable optimization algorithm than steepest descent. Adam combines the best features of the AdaGrad and RMSProp algorithms to handle sparse gradients on noisy data. It maintains per-parameter learning rates that are adapted based on the first and second moments of the gradients. This method typically converges faster and more reliably than steepest descent, especially on complex, non-convex problems, making it popular in deep learning frameworks.

Adam Update Rule

Gradient estimation: compute the gradient of the objective function at the current parameters θ_{t−1}:

g_t = ∇f(θ_{t−1})

First moment (mean) estimate:

m_t = β_1 m_{t−1} + (1 − β_1) g_t

Second moment (uncentered variance) estimate:

v_t = β_2 v_{t−1} + (1 − β_2) g_t²

Bias correction:

m̂_t = m_t / (1 − β_1^t),  v̂_t = v_t / (1 − β_2^t)

Parameter update:

θ_t = θ_{t−1} − η m̂_t / (√v̂_t + ε)

Where:

  • g_t is the gradient at time step t.
  • m_t and v_t are the estimates of the first and second moments; m̂_t and v̂_t are their bias-corrected versions.
  • β_1 and β_2 are decay rates for the moment estimates.
  • η is the learning rate.
  • ε is a small constant to prevent division by zero.
  • θ_t are the parameters being optimized at step t.
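A compact Python sketch of these updates (the quadratic objective, starting point, and hyperparameter values are illustrative defaults, not prescribed here):

```python
import numpy as np

def grad(theta):
    # Gradient of an illustrative objective f(theta) = sum(theta**2)
    return 2 * theta

theta = np.array([5.0, -3.0])   # parameters theta_0
m = np.zeros_like(theta)        # first moment estimate m_0
v = np.zeros_like(theta)        # second moment estimate v_0
beta1, beta2 = 0.9, 0.999       # decay rates (common defaults)
eta, eps = 0.01, 1e-8           # learning rate and stability constant

for t in range(1, 2001):
    g = grad(theta)                               # gradient estimation
    m = beta1 * m + (1 - beta1) * g               # first moment update
    v = beta2 * v + (1 - beta2) * g**2            # second moment update
    m_hat = m / (1 - beta1**t)                    # bias correction
    v_hat = v / (1 - beta2**t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)  # parameter update

print(theta)  # close to the minimizer [0, 0]
```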

Adam's adaptive learning rates and momentum components make it highly effective for large-scale machine learning and deep learning tasks.
