MLP (Keras) Optimizers for Discrete Problems
Optimization problems occur all the time in our daily lives. In engineering, optimization is widely used in many settings, and it is very impactful because it is directly tied to business decision making. There are many standard optimization techniques used across industries: we may have heard about genetic algorithms, simulated annealing, Newton, Quasi-Newton, and so forth. However, I think we may have been missing something...
Neural Network Optimizers
We already know that training a neural network is an optimization problem, and we know that without optimizers a neural network cannot learn. The optimizer is one of the essential components we have to control. It works by iteratively changing the weights of every neuron after calculating the gradient of the loss function with respect to those weights. Just a little note: these optimizers are modern and powerful!
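As a quick illustration of that update loop, here is a toy sketch (unrelated to the case study below) showing how a Keras optimizer can minimize any differentiable function of a tf.Variable:

import tensorflow as tf
from tensorflow.keras.optimizers import Adam

# Toy example: minimize (x - 3)^2 with a Keras optimizer
x = tf.Variable(0.0)
optimizer = Adam(learning_rate=0.1)

for step in range(200):
    with tf.GradientTape() as tape:
        loss = (x - 3.0) ** 2
    grads = tape.gradient(loss, [x])            # gradient of the loss w.r.t. x
    optimizer.apply_gradients(zip(grads, [x]))  # update x using the Adam rule

print(x.numpy())  # converges towards 3.0

The same pattern, gradient tape plus apply_gradients, is exactly what we will reuse below on the fleet cost function.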
I wondered how MLP optimizers could be used to solve discrete optimization problems. Keras, a popular deep learning library for Python, provides a variety of optimizers such as Adam, RMSProp, Adagrad, and so forth. So, I set out to figure out how to do it. My experiment seemed to work and gave quite satisfactory results.
Case Study
The case study used for this work was provided by Shell in the 2024 edition of its hackathon on fleet-management optimization. Participants were given access to data about fleet costs. The main idea of the challenge is to minimize the fleet cost (the cost of buying new vehicles, selling old ones, insurance, and maintenance) and the CO2 emission footprint while satisfying the annual fleet travel demand. The optimization seeks the composition of the fleet bought and sold each year.
I will not go into detail, since the purpose of this article is to show how MLP optimizers work on an optimization problem; you can access my notebook for free here.
1. Without any constraint
The first experiment is a simple problem: we do not put any constraint on it and purely minimize the costs. We start by defining the inputs for the cost functions. Since we work within the Keras/TensorFlow framework, we need to convert all inputs into TensorFlow tensors (tf.Variable for the decision variables, tf.constant for fixed data). Learn more about it here.
def inputCost(input_df, vehicles_merge_fuel_df, demand_df):
    # Decision variables are tf.Variable so the optimizer can update them
    num_vehicle_buy = tf.Variable(df_null['Buy'].values, dtype=tf.float32)
    ....
    # Fixed inputs are tf.constant tensors
    unit_fuel_consumption = tf.constant(merge_with_buy_df['Consumption (unit_fuel/km)'].values, dtype=tf.float32)
    ....
    return id, year, num_vehicle_buy, num_vehicle_use, ......
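Since the function above is abbreviated, here is a minimal, self-contained sketch of the same pattern with made-up data (the column names and values are purely illustrative): decision variables become tf.Variable so the optimizer can update them, while fixed data become tf.constant.

import pandas as pd
import tensorflow as tf

# Dummy data standing in for the real fleet tables
df = pd.DataFrame({'Buy': [1.0, 5.0, 0.0],
                   'Consumption (unit_fuel/km)': [0.08, 0.12, 0.10]})

# Decision variable: the optimizer is allowed to change this
num_vehicle_buy = tf.Variable(df['Buy'].values, dtype=tf.float32)

# Fixed input: stays constant during optimization
unit_fuel_consumption = tf.constant(df['Consumption (unit_fuel/km)'].values, dtype=tf.float32)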
After that, we build the cost functions one by one.
def costFuel(num_vehicle_use, unit_fuel_consumption, ....):
    return ...

def costPurchase(num_vehicle_buy, purchase_cost):
    return ...

def costInsurance(num_vehicle_use, num_vehicle_buy, ...):
    return ....

def costMaintenance(num_vehicle_use, num_vehicle_buy, ...):
    return ...

def costResale(num_vehicle_sell, purchase_cost, ...):
    return ...

def totalCost(num_vehicle_use, num_vehicle_buy, ...):
    total_cost = fuel_cost + purchased_cost + insurance_cost + maintenance_cost - resale_cost
    return total_cost
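The bodies above are elided in the snippet; purely as an illustration, the simpler terms could look roughly like this (a sketch with assumed parameter names, not the exact hackathon formulas):

def costPurchase(num_vehicle_buy, purchase_cost):
    # Purchase cost: price x quantity, summed over all vehicle/year rows
    return tf.reduce_sum(num_vehicle_buy * purchase_cost)

def costFuel(num_vehicle_use, unit_fuel_consumption, yearly_distance, fuel_price):
    # Fuel cost: vehicles in use x km driven x consumption per km x fuel price
    return tf.reduce_sum(num_vehicle_use * yearly_distance * unit_fuel_consumption * fuel_price)

The important point is that every term is a differentiable TensorFlow expression of the decision variables, so gradients can flow back to them.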
We can test the initial value by running the cost function. Here is a typical result.
<tf.Tensor: shape=(), dtype=float32, numpy=273293470.0>
We run the optimization for 100 steps, repeating the run for different Keras optimizers (Adam, RMSProp, Adagrad, Adadelta, Adamax, Nadam, and Ftrl) with the learning rate set to 0.01. In each step, the loop computes the loss inside tf.GradientTape, computes gradients with respect to the decision variables (e.g., num_vehicle_use, num_vehicle_buy), and updates these variables using the optimizer. Non-negativity is enforced by applying ReLU to the variables, since the number of vehicles must be a non-negative integer (integrality itself is handled later by rounding). Every 10 steps, the current loss and objective value are printed.
# Initialize variables
id, year, num_vehicle_buy, ... = inputCost(submission_df, ...)

# Choose an optimizer
optimizer = RMSprop(learning_rate=0.01)

# Optimization loop
coords_history = []
value_history = []

# Iteration for optimization
for step in range(100):  # Number of optimization steps
    with tf.GradientTape() as tape:
        loss = totalCost(num_vehicle_use, num_vehicle_buy, ...)
    # Gradients of the total cost with respect to the decision variables
    grads = tape.gradient(loss, [num_vehicle_use, num_vehicle_buy, ...])
    optimizer.apply_gradients(zip(grads, [num_vehicle_use, num_vehicle_buy, ...]))
    # Apply projection to ensure non-negativity
    num_vehicle_use.assign(tf.nn.relu(num_vehicle_use))
    num_vehicle_buy.assign(tf.nn.relu(num_vehicle_buy))
    ...
    current_loss = loss.numpy()
    current_value = current_loss
    if step % 10 == 0:
        print(f'Step {step}, Loss (objective value): {current_loss}, Objective Value: {current_value}')
If we plot the loss curve for every optimizer, it looks as shown below. We can see that RMSProp performs best among the optimizers, as it converges the fastest. It reduces the cost from $273M to $50M.
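The plot itself is not reproduced here; a comparison like that could be generated along these lines, assuming the loop above is wrapped in a helper run_optimization(optimizer, steps) that re-initializes the variables and returns the per-step loss history (this helper is an assumption, not part of the original notebook):

import matplotlib.pyplot as plt
from tensorflow.keras.optimizers import Adam, RMSprop, Adagrad, Adadelta, Adamax, Nadam, Ftrl

optimizers = {'Adam': Adam, 'RMSProp': RMSprop, 'Adagrad': Adagrad, 'Adadelta': Adadelta,
              'Adamax': Adamax, 'Nadam': Nadam, 'Ftrl': Ftrl}

for name, opt_class in optimizers.items():
    # run_optimization is assumed to rebuild the variables, run the 100-step loop above,
    # and return the list of total-cost values per step
    history = run_optimization(opt_class(learning_rate=0.01), steps=100)
    plt.plot(history, label=name)

plt.xlabel('Step')
plt.ylabel('Total cost')
plt.legend()
plt.show()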
We can print the fleet numbers that lead to the optimum cost.
# Final optimized values: round to the nearest integer to enforce integrality
optimized_num_vehicle_use = tf.round(num_vehicle_use).numpy()
optimized_num_vehicle_buy = tf.round(num_vehicle_buy).numpy()
optimized_num_vehicle_sell = tf.round(num_vehicle_sell).numpy()
We get the following fleet numbers:
Optimized number of vehicles for use: [0. 0. 0. ... 2. 0. 0.]
Optimized number of vehicles for buy: [ 2. 35. 33. ... 0. 0. 0.]
Optimized number of vehicles for sell: [1. 1. 1. ... 1. 1. 1.]
2. With a constraint: sell each year < 20% of vehicle fleet
In the second experiment, we apply a constraint such that the number of vehicles sold each year must be less than 20% of the total existing fleet. We can write the constraint function as follows.
def apply_constraints(num_vehicle_use, num_vehicle_buy, num_vehicle_sell, id, year):
    ...
    max_sell = 0.2 * total_use_buy
    if row['Sell'] > max_sell:
        df.at[index, 'Sell'] = max_sell
    df['Sell'] = tf.round(df['Sell'])
    return tf.constant(df['Sell'].values, dtype=tf.float32)
This function modifies the values being optimized according to the constraint above. The output must be a TensorFlow tensor so it can be assigned back to the variable. We just need to call it inside the loop and apply it to the number of vehicles sold.
# Initialize variables
id, year, num_vehicle_buy, ... = inputCost(submission_df, ...)

# Choose an optimizer
optimizer = RMSprop(learning_rate=0.01)

# Optimization loop
coords_history = []
value_history = []

# Iteration for optimization
for step in range(100):  # Number of optimization steps
    with tf.GradientTape() as tape:
        loss = totalCost(num_vehicle_use, num_vehicle_buy, ...)
    grads = tape.gradient(loss, [num_vehicle_use, num_vehicle_buy, ...])
    optimizer.apply_gradients(zip(grads, [num_vehicle_use, num_vehicle_buy, ...]))
    # Apply constraints
    num_vehicle_sell.assign(apply_constraints(num_vehicle_use, num_vehicle_buy, num_vehicle_sell, id, year))
    # Apply projection to ensure non-negativity
    num_vehicle_use.assign(tf.nn.relu(num_vehicle_use))
    num_vehicle_buy.assign(tf.nn.relu(num_vehicle_buy))
    ...
    current_loss = loss.numpy()
    current_value = current_loss
    if step % 10 == 0:
        print(f'Step {step}, Loss (objective value): {current_loss}, Objective Value: {current_value}')
We can see the loss curve as follows.
It converges more slowly because the constraint is applied, but it better resembles a real case. After 100 iterations, we have reduced the cost to almost $100M. We can run as many iterations as needed to continue the optimization.
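As a side note, the 20% selling cap could also be expressed with vectorized TensorFlow operations instead of iterating over a pandas DataFrame, which keeps the whole step on the GPU. Here is a minimal sketch, assuming the per-row fleet total (vehicles in use plus newly bought) is available as a tensor; the helper name apply_sell_cap is hypothetical:

def apply_sell_cap(num_vehicle_sell, total_use_buy):
    # Cap vehicles sold at 20% of the existing fleet, element-wise
    max_sell = 0.2 * total_use_buy
    return tf.round(tf.minimum(num_vehicle_sell, max_sell))

# Inside the optimization loop:
# num_vehicle_sell.assign(apply_sell_cap(num_vehicle_sell, num_vehicle_use + num_vehicle_buy))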
Conclusion
This experiment shows an interesting result: Keras optimizers can be used to solve a discrete optimization problem. As this approach is currently underexplored, it may be possible to apply MLP optimizers at a wider scale, and a mathematical investigation still needs to be carried out to verify the methodology. However, optimization using Keras optimizers can be even more useful and powerful on a GPU, since GPUs accelerate MLP training from hours to just a few minutes. And what is more interesting about this solver: it is open source!
Associate Professor at the Federal University of Ceará / Leader of the OPL Operations Research Laboratory / Founder ARVRA Artificial Intelligence
2 months ago: How do you enforce integrality of variables? By rounding? In addition, besides non-negativity constraints, you seem to add only one constraint in the problem. This is unrealistic; real problems solved by math programming solvers may have up to a million constraints (or more). Anyway, your application is interesting. It makes it clear that training a neural network corresponds to solving an optimization problem. Many naive ML users do not know (or remember) this fact.

Vector Compute @ Superlinked | xYouTube
2 months ago: Were there any particular challenges you faced with training or convergence when applying neural network optimizers to discrete problems?

Data Science, Optimization | Manufacturing, Retail, Logistics | Helped customers dramatically improve KPIs (profit, revenue, ...) as well as quality, speed and confidence of decision making
2 months ago: If you have constraints you may want to use Lagrangian information, e.g. https://arxiv.org/abs/2001.09394

Assistant Professor | Geomechanics & Geophysical Engineer | Innovator | STEM Leader
2 months ago: Very helpful!