MLP (Keras) Optimizers for Discrete Problems
Optimization problems occur all the time in our daily lives. In engineering, optimization is widely used in many settings, and it is very impactful because it is directly tied to business decision making. There are many standard optimization techniques used across industries: we may have heard about genetic algorithms, simulated annealing, Newton, Quasi-Newton, and so forth. However, I think we may have been missing something...
Neural Network Optimizers
We already know that training a neural network is an optimization problem, and we know that without optimizers a neural network cannot learn. The optimizer is one of the essential components we have to control. It works by iteratively changing the weights of every neuron after calculating the gradient of the loss function with respect to those weights. Just a little note: these optimizers are modern and powerful!
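As a quick illustration of that update loop, here is a toy sketch (unrelated to the case study below) showing how a Keras optimizer can minimize any differentiable function of a tf.Variable:

import tensorflow as tf
from tensorflow.keras.optimizers import Adam

# Toy example: minimize (x - 3)^2 with a Keras optimizer
x = tf.Variable(0.0)
optimizer = Adam(learning_rate=0.1)

for step in range(200):
    with tf.GradientTape() as tape:
        loss = (x - 3.0) ** 2
    grads = tape.gradient(loss, [x])            # gradient of the loss w.r.t. x
    optimizer.apply_gradients(zip(grads, [x]))  # update x using the Adam rule

print(x.numpy())  # converges towards 3.0

The same pattern, gradient tape plus apply_gradients, is exactly what we will reuse below on the fleet cost function.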
I wondered how MLP optimizers could be used to solve discrete optimization problems. Keras, a popular deep learning library for Python, provides a variety of optimizers such as Adam, RMSProp, Adagrad, and so forth. So, I set out to figure out how to do it. My experiment seemed to work and gave quite satisfactory results.
Case Study
The case study used for this work was provided by Shell in the 2024 edition of its hackathon on fleet-management optimization. Participants were given access to data about fleet costs. The main idea of the challenge is to minimize the fleet cost (the cost of buying new vehicles, selling old ones, insurance, and maintenance) and the CO2 emission footprint while satisfying the annual fleet travel demand. The optimization seeks the composition of the fleet bought and sold each year.
I will not go into detail, since the purpose of this article is to show how MLP optimizers work on an optimization problem; you can access my notebook for free here.
1. Without any constraint
The first experiment is a simple problem: we do not put any constraint on it and purely minimize the costs. We start by defining the inputs for the cost functions. Since we work within the Keras/TensorFlow framework, we need to convert all inputs into TensorFlow tensors (tf.Variable for the decision variables, tf.constant for fixed data). Learn more about it here.
def inputCost(input_df, vehicles_merge_fuel_df, demand_df):
    # Decision variables are tf.Variable so the optimizer can update them
    num_vehicle_buy = tf.Variable(df_null['Buy'].values, dtype=tf.float32)
    ....
    # Fixed inputs are tf.constant tensors
    unit_fuel_consumption = tf.constant(merge_with_buy_df['Consumption (unit_fuel/km)'].values, dtype=tf.float32)
    ....
    return id, year, num_vehicle_buy, num_vehicle_use, ......
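Since the function above is abbreviated, here is a minimal, self-contained sketch of the same pattern with made-up data (the column names and values are purely illustrative): decision variables become tf.Variable so the optimizer can update them, while fixed data become tf.constant.

import pandas as pd
import tensorflow as tf

# Dummy data standing in for the real fleet tables
df = pd.DataFrame({'Buy': [1.0, 5.0, 0.0],
                   'Consumption (unit_fuel/km)': [0.08, 0.12, 0.10]})

# Decision variable: the optimizer is allowed to change this
num_vehicle_buy = tf.Variable(df['Buy'].values, dtype=tf.float32)

# Fixed input: stays constant during optimization
unit_fuel_consumption = tf.constant(df['Consumption (unit_fuel/km)'].values, dtype=tf.float32)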
After that, we build the cost functions one by one.
def costFuel(num_vehicle_use, unit_fuel_consumption, ....):
    return ...

def costPurchase(num_vehicle_buy, purchase_cost):
    return ...

def costInsurance(num_vehicle_use, num_vehicle_buy, ...):
    return ....

def costMaintenance(num_vehicle_use, num_vehicle_buy, ...):
    return ...

def costResale(num_vehicle_sell, purchase_cost, ...):
    return ...

def totalCost(num_vehicle_use, num_vehicle_buy, ...):
    total_cost = fuel_cost + purchased_cost + insurance_cost + maintenance_cost - resale_cost
    return total_cost
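The bodies above are elided in the snippet; purely as an illustration, the simpler terms could look roughly like this (a sketch with assumed parameter names, not the exact hackathon formulas):

def costPurchase(num_vehicle_buy, purchase_cost):
    # Purchase cost: price x quantity, summed over all vehicle/year rows
    return tf.reduce_sum(num_vehicle_buy * purchase_cost)

def costFuel(num_vehicle_use, unit_fuel_consumption, yearly_distance, fuel_price):
    # Fuel cost: vehicles in use x km driven x consumption per km x fuel price
    return tf.reduce_sum(num_vehicle_use * yearly_distance * unit_fuel_consumption * fuel_price)

The important point is that every term is a differentiable TensorFlow expression of the decision variables, so gradients can flow back to them.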
We can test the initial value by running the cost function. Here is a typical result.
<tf.Tensor: shape=(), dtype=float32, numpy=273293470.0>
We run the optimization for 100 steps, repeating the run for different Keras optimizers (Adam, RMSProp, Adagrad, Adadelta, Adamax, Nadam, and Ftrl) with the learning rate set to 0.01. In each step, the loop computes the loss inside tf.GradientTape, computes gradients with respect to the decision variables (e.g., num_vehicle_use, num_vehicle_buy), and updates these variables using the optimizer. Non-negativity is enforced by applying ReLU to the variables, since the number of vehicles must be a non-negative integer (integrality itself is handled later by rounding). Every 10 steps, the current loss and objective value are printed.
# Initialize variables
id, year, num_vehicle_buy, ... = inputCost(submission_df, ...)

# Choose an optimizer
optimizer = RMSprop(learning_rate=0.01)

# Optimization loop
coords_history = []
value_history = []

# Iteration for optimization
for step in range(100):  # Number of optimization steps
    with tf.GradientTape() as tape:
        loss = totalCost(num_vehicle_use, num_vehicle_buy, ...)
    # Gradients of the total cost with respect to the decision variables
    grads = tape.gradient(loss, [num_vehicle_use, num_vehicle_buy, ...])
    optimizer.apply_gradients(zip(grads, [num_vehicle_use, num_vehicle_buy, ...]))
    # Apply projection to ensure non-negativity
    num_vehicle_use.assign(tf.nn.relu(num_vehicle_use))
    num_vehicle_buy.assign(tf.nn.relu(num_vehicle_buy))
    ...
    current_loss = loss.numpy()
    current_value = current_loss
    if step % 10 == 0:
        print(f'Step {step}, Loss (objective value): {current_loss}, Objective Value: {current_value}')
If we plot the loss curve for every optimizer, it looks as shown below. We can see that RMSProp performs best among the optimizers, as it converges the fastest. It reduces the cost from $273M to $50M.
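The plot itself is not reproduced here; a comparison like that could be generated along these lines, assuming the loop above is wrapped in a helper run_optimization(optimizer, steps) that re-initializes the variables and returns the per-step loss history (this helper is an assumption, not part of the original notebook):

import matplotlib.pyplot as plt
from tensorflow.keras.optimizers import Adam, RMSprop, Adagrad, Adadelta, Adamax, Nadam, Ftrl

optimizers = {'Adam': Adam, 'RMSProp': RMSprop, 'Adagrad': Adagrad, 'Adadelta': Adadelta,
              'Adamax': Adamax, 'Nadam': Nadam, 'Ftrl': Ftrl}

for name, opt_class in optimizers.items():
    # run_optimization is assumed to rebuild the variables, run the 100-step loop above,
    # and return the list of total-cost values per step
    history = run_optimization(opt_class(learning_rate=0.01), steps=100)
    plt.plot(history, label=name)

plt.xlabel('Step')
plt.ylabel('Total cost')
plt.legend()
plt.show()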
We can print the fleet numbers that lead to the optimum cost.
# Final optimized values: round to the nearest integer to enforce integrality
optimized_num_vehicle_use = tf.round(num_vehicle_use).numpy()
optimized_num_vehicle_buy = tf.round(num_vehicle_buy).numpy()
optimized_num_vehicle_sell = tf.round(num_vehicle_sell).numpy()
We get the following fleet numbers:
Optimized number of vehicles for use: [0. 0. 0. ... 2. 0. 0.]
Optimized number of vehicles for buy: [ 2. 35. 33. ... 0. 0. 0.]
Optimized number of vehicles for sell: [1. 1. 1. ... 1. 1. 1.]
2. With a constraint: sell each year < 20% of vehicle fleet
In the second experiment, we apply a constraint such that the number of vehicles sold each year must be less than 20% of the total existing fleet. We can write the constraint function as follows.
def apply_constraints(num_vehicle_use, num_vehicle_buy, num_vehicle_sell, id, year):
    ...
    max_sell = 0.2 * total_use_buy
    if row['Sell'] > max_sell:
        df.at[index, 'Sell'] = max_sell
    df['Sell'] = tf.round(df['Sell'])
    return tf.constant(df['Sell'].values, dtype=tf.float32)
This function modifies the values being optimized according to the constraint above. The output must be a TensorFlow tensor so it can be assigned back to the variable. We just need to call it inside the loop and apply it to the number of vehicles sold.
# Initialize variables
id, year, num_vehicle_buy, ... = inputCost(submission_df, ...)

# Choose an optimizer
optimizer = RMSprop(learning_rate=0.01)

# Optimization loop
coords_history = []
value_history = []

# Iteration for optimization
for step in range(100):  # Number of optimization steps
    with tf.GradientTape() as tape:
        loss = totalCost(num_vehicle_use, num_vehicle_buy, ...)
    grads = tape.gradient(loss, [num_vehicle_use, num_vehicle_buy, ...])
    optimizer.apply_gradients(zip(grads, [num_vehicle_use, num_vehicle_buy, ...]))
    # Apply constraints
    num_vehicle_sell.assign(apply_constraints(num_vehicle_use, num_vehicle_buy, num_vehicle_sell, id, year))
    # Apply projection to ensure non-negativity
    num_vehicle_use.assign(tf.nn.relu(num_vehicle_use))
    num_vehicle_buy.assign(tf.nn.relu(num_vehicle_buy))
    ...
    current_loss = loss.numpy()
    current_value = current_loss
    if step % 10 == 0:
        print(f'Step {step}, Loss (objective value): {current_loss}, Objective Value: {current_value}')
We can see the loss curve as follows.
It converges more slowly because the constraint is applied, but it better resembles a real case. After 100 iterations, we have reduced the cost to almost $100M. We can run as many iterations as needed to continue the optimization.
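As a side note, the 20% selling cap could also be expressed with vectorized TensorFlow operations instead of iterating over a pandas DataFrame, which keeps the whole step on the GPU. Here is a minimal sketch, assuming the per-row fleet total (vehicles in use plus newly bought) is available as a tensor; the helper name apply_sell_cap is hypothetical:

def apply_sell_cap(num_vehicle_sell, total_use_buy):
    # Cap vehicles sold at 20% of the existing fleet, element-wise
    max_sell = 0.2 * total_use_buy
    return tf.round(tf.minimum(num_vehicle_sell, max_sell))

# Inside the optimization loop:
# num_vehicle_sell.assign(apply_sell_cap(num_vehicle_sell, num_vehicle_use + num_vehicle_buy))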
Conclusion
This experiment shows an interesting result: Keras optimizers can be used to solve a discrete optimization problem. As this approach is currently underexplored, it may be possible to apply MLP optimizers at a wider scale, and a mathematical investigation still needs to be carried out to verify the methodology. However, optimization using Keras optimizers can be even more useful and powerful on a GPU, since GPUs accelerate MLP training from hours to just a few minutes. And what is more interesting about this solver: it is open source!
Associate Professor at the Federal University of Ceará / Leader of the OPL Operations Research Laboratory / Founder ARVRA Artificial Intelligence
2 months ago: How do you enforce integrality of variables? By rounding? In addition, besides non-negativity constraints, you seem to add only one constraint in the problem. This is unrealistic; real problems solved by math programming solvers may have up to a million constraints (or more). Anyway, your application is interesting. It makes it clear that training a neural network corresponds to solving an optimization problem. Many naive ML users do not know (or remember) this fact.

Vector Compute @ Superlinked | xYouTube
2 months ago: Were there any particular challenges you faced with training or convergence when applying neural network optimizers to discrete problems?

Data Science, Optimization | Manufacturing, Retail, Logistics | Helped customers dramatically improve KPIs (profit, revenue, ...) as well as quality, speed and confidence of decision making
2 months ago: If you have constraints you may want to use Lagrangian information, e.g. https://arxiv.org/abs/2001.09394

Assistant Professor | Geomechanics & Geophysical Engineer | Innovator | STEM Leader
2 months ago: Very helpful!