MLP (Keras) Optimizers for Discrete Problems

Optimization problems occur all the time in our daily lives. In engineering, optimization is widely used in many settings, and it is very impactful because it relates directly to business decision making. There are many standard optimization techniques used across industries: genetic algorithms, simulated annealing, Newton and quasi-Newton methods, and so forth. Recently, however, I think we may have missed something...

A few airlines tried to understand their optimum cost per trip vs. cost per seat in the 90s (Awad, 2009)

Neural Network Optimizers

We already know that training a neural network is an optimization problem, and that without an optimizer the network cannot learn. The optimizer is one of the essential components we have to control. It works by iteratively changing the weights of every neuron after calculating the gradient of the loss function with respect to those weights. Just a little note: these optimizers are modern and powerful!
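As a minimal sketch of that mechanism, stripped down to a single made-up variable and a toy quadratic loss rather than a real network, a Keras optimizer only needs the gradient of the loss with respect to a tf.Variable to perform an update:

import tensorflow as tf
from tensorflow.keras.optimizers import Adam

# One trainable "weight" and a toy quadratic loss (hypothetical example)
w = tf.Variable(5.0)
optimizer = Adam(learning_rate=0.1)

for step in range(100):
    with tf.GradientTape() as tape:
        loss = (w - 2.0) ** 2                     # loss as a function of the weight
    grads = tape.gradient(loss, [w])              # gradient of the loss w.r.t. the weight
    optimizer.apply_gradients(zip(grads, [w]))    # iterative weight update

print(w.numpy())  # close to 2.0, the minimizer of the toy loss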

I wondered how MLP optimizers could be used to solve discrete optimization problems. Keras, a popular deep learning library for Python, provides a variety of optimizers such as Adam, RMSProp, Adagrad, and so forth. So I started to figure out how to do it. My experiment seemed to work and gave quite satisfactory results.
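The optimizers compared later in this article can be instantiated directly from Keras. A small sketch, using the same learning rate as the experiments below:

from tensorflow.keras.optimizers import (
    Adam, RMSprop, Adagrad, Adadelta, Adamax, Nadam, Ftrl
)

# One instance per optimizer, all with the learning rate used in this article
optimizers = {
    'Adam': Adam(learning_rate=0.01),
    'RMSProp': RMSprop(learning_rate=0.01),
    'Adagrad': Adagrad(learning_rate=0.01),
    'Adadelta': Adadelta(learning_rate=0.01),
    'Adamax': Adamax(learning_rate=0.01),
    'Nadam': Nadam(learning_rate=0.01),
    'Ftrl': Ftrl(learning_rate=0.01),
}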

Case Study

The case study used for this work was provided by Shell in the 2024 edition of its hackathon on fleet-management optimization. Participants were given access to data about fleet costs. The main idea of the challenge is to minimize the fleet cost (cost of buying new vehicles, cost of selling old vehicles, insurance cost, and maintenance cost) and the CO2 emission footprint while satisfying the annual fleet travel demand. The optimization seeks the composition of the fleet bought and sold each year.

I will not go into detail, since the purpose of this article is to show how MLP optimizers work for an optimization problem; you can access my notebook for free here.

1. Without any constraint

The first experiment is a simple problem: we do not impose any constraints and purely minimize the costs. We start by defining the inputs for the cost functions. Since we are working within the Keras framework, we need to convert all inputs into TensorFlow variables. Learn more about it here.

import tensorflow as tf

def inputCost(input_df, vehicles_merge_fuel_df, demand_df):
    # Decision variables (to be optimized) are created as trainable tf.Variable
    num_vehicle_buy = tf.Variable(df_null['Buy'].values, dtype=tf.float32)
....
    # Fixed problem data are created as tf.constant
    unit_fuel_consumption = tf.constant(merge_with_buy_df['Consumption (unit_fuel/km)'].values, dtype=tf.float32)
....
    return id, year, num_vehicle_buy, num_vehicle_use, ......

After that, we build the cost functions one by one.

def costFuel(num_vehicle_use, unit_fuel_consumption, ....):
    return ...

def costPurchase(num_vehicle_buy, purchase_cost):
    return ...

def costInsurance(num_vehicle_use, num_vehicle_buy, ...):
    return ....

def costMaintenance(num_vehicle_use, num_vehicle_buy, ...):
    return ...

def costResale(num_vehicle_sell, purchase_cost, ...):
    return ...

def totalCost(num_vehicle_use, num_vehicle_buy, ...):
    total_cost = fuel_cost + purchased_cost + insurance_cost + maintenance_cost - resale_cost
    return total_cost        
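The function bodies are elided in the excerpt above (the full versions are in the notebook). Purely as an illustration of the intended shape, one of the simpler terms, costPurchase, would reduce to an element-wise multiplication followed by a sum over all entries; the other terms follow the same pattern with different tensors:

def costPurchase(num_vehicle_buy, purchase_cost):
    # Illustrative sketch, not the notebook's exact code:
    # vehicles bought per (vehicle type, year) times the unit purchase price, summed up
    return tf.reduce_sum(num_vehicle_buy * purchase_cost)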

We can test the initial value by running the cost function. Here is a typical result.

<tf.Tensor: shape=(), dtype=float32, numpy=273293470.0>        

We run the optimization for 100 steps. We run it multiple times for different Keras optimizers (Adam, RMSProp, Adagrad, Adadelta, Adamax, Nadam, and Ftrl), with the learning rate set to 0.01. In each step, the loss function is computed inside tf.GradientTape, gradients are taken with respect to the decision variables (e.g., num_vehicle_use, num_vehicle_buy), and these variables are updated by the optimizer. Non-negativity is enforced by applying ReLU to the variables, because the number of vehicles must be a non-negative integer (integrality is recovered later by rounding). Every 10 steps, the current loss and objective value are printed.

from tensorflow.keras.optimizers import RMSprop

# Initialize variables
id, year, num_vehicle_buy, ... = inputCost(submission_df, ...)

# Choose an optimizer
optimizer = RMSprop(learning_rate=0.01)

# Optimization histories (value_history records the objective for the loss curve)
coords_history = []
value_history = []

# Iteration for optimization
for step in range(100):  # Number of optimization steps
    with tf.GradientTape() as tape:
        loss = totalCost(num_vehicle_use, num_vehicle_buy, ...)

    grads = tape.gradient(loss, [num_vehicle_use, num_vehicle_buy, ...])
    optimizer.apply_gradients(zip(grads, [num_vehicle_use, num_vehicle_buy, ...]))

    # Apply projection to ensure non-negativity
    num_vehicle_use.assign(tf.nn.relu(num_vehicle_use))
    num_vehicle_buy.assign(tf.nn.relu(num_vehicle_buy))
...

    current_loss = loss.numpy()
    current_value = current_loss
    value_history.append(current_value)

    if step % 10 == 0:
        print(f'Step {step}, Loss (objective value): {current_loss}, Objective Value: {current_value}')

If we plot the loss curve for every optimizer, it appears as below. We can see that RMSProp performs best among the optimizers tested, as it converges very quickly. It reduces the cost from about $273M to about $50M.

Loss curve using different optimizers without any constraint
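A minimal sketch of how this comparison can be plotted, assuming the loop above is re-run once per optimizer and each run's value_history is kept (the per-run history names below are hypothetical):

import matplotlib.pyplot as plt

# Hypothetical names for the value_history recorded by each optimizer's run
histories = {
    'Adam': adam_history,
    'RMSProp': rmsprop_history,
    'Adagrad': adagrad_history,
    'Adadelta': adadelta_history,
    'Adamax': adamax_history,
    'Nadam': nadam_history,
    'Ftrl': ftrl_history,
}

for name, history in histories.items():
    plt.plot(history, label=name)
plt.xlabel('Optimization step')
plt.ylabel('Total cost (objective value)')
plt.title('Loss curves for different Keras optimizers (no constraint)')
plt.legend()
plt.show()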

We can print the numbers of vehicles that lead to the optimum cost, rounding the continuous solutions to integers.

# Final optimized values: round the continuous solutions to enforce integrality
optimized_num_vehicle_use = tf.round(num_vehicle_use).numpy()
optimized_num_vehicle_buy = tf.round(num_vehicle_buy).numpy()
optimized_num_vehicle_sell = tf.round(num_vehicle_sell).numpy()

We get the following fleet numbers:

Optimized number of vehicles for use: [0. 0. 0. ... 2. 0. 0.]
Optimized number of vehicles for buy: [ 2. 35. 33. ...  0.  0.  0.]
Optimized number of vehicles for sell: [1. 1. 1. ... 1. 1. 1.]        
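Because the optimizer works on a continuous relaxation and integrality is only recovered by rounding, it is worth re-evaluating the objective at the rounded solution to see how much the cost shifts. A sketch, assuming totalCost is called with the same remaining arguments as in the optimization loop:

# Re-evaluate the objective at the rounded (integer) solution
rounded_cost = totalCost(
    tf.constant(optimized_num_vehicle_use, dtype=tf.float32),
    tf.constant(optimized_num_vehicle_buy, dtype=tf.float32),
    # plus the other arguments passed to totalCost in the optimization loop
)
print(f'Objective at rounded solution: {rounded_cost.numpy():,.0f}')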

2. With a constraint: vehicles sold each year < 20% of the fleet

In the second experiment, we apply a constraint such that the number of vehicles sold each year must be less than 20% of the total existing fleet. We can write the constraint function as follows.

def apply_constraints(num_vehicle_use, num_vehicle_buy, num_vehicle_sell, id, year):
...

        # Cap the vehicles sold in a year at 20% of the existing fleet
        max_sell = 0.2 * total_use_buy
        if row['Sell'] > max_sell:
            df.at[index, 'Sell'] = max_sell
    df['Sell'] = df['Sell'].round()
    return tf.constant(df['Sell'].values, dtype=tf.float32)

This function modifies the values being optimized according to the constraint described above. The output must be a TensorFlow tensor so it can be assigned back to the decision variable. We just need to call it inside the iteration and apply it to the number of vehicles sold.

# Initialize variables
id, year, num_vehicle_buy, ... = inputCost(submission_df, ...)

# Choose an optimizer
optimizer = RMSprop(learning_rate=0.01)

# Optimization loop
coords_history = []
value_history = []

# Iteration for optimization
for step in range(100):  # Number of optimization steps
    with tf.GradientTape() as tape:
        loss = totalCost(num_vehicle_use, num_vehicle_buy, ...)

    grads = tape.gradient(loss, [num_vehicle_use, num_vehicle_buy, ...])
    optimizer.apply_gradients(zip(grads, [num_vehicle_use, num_vehicle_buy, ...]))

    # Apply constraints
    num_vehicle_sell.assign(apply_constraints(num_vehicle_use, num_vehicle_buy, num_vehicle_sell, id, year))

    # Apply projection to ensure non-negativity
    num_vehicle_use.assign(tf.nn.relu(num_vehicle_use))
    num_vehicle_buy.assign(tf.nn.relu(num_vehicle_buy))
...

    current_loss = loss.numpy()
    current_value = current_loss
    value_history.append(current_value)

    if step % 10 == 0:
        print(f'Step {step}, Loss (objective value): {current_loss}, Objective Value: {current_value}')        

We can see the loss curve as follows.

Loss curve using RMSProp with a constraint

It converges more slowly because the constraint is applied, but it resembles a more realistic case. After 100 iterations, the cost has been reduced to almost $100M. We can run more iterations to continue the optimization, as sketched below.
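One simple way to do that without hand-picking the number of steps is to iterate until the relative improvement in the objective stalls. A sketch under the same assumptions as the loop above (the tolerance value is my own choice, not from the notebook):

prev_loss = float('inf')
tolerance = 1e-4          # stop when the relative improvement drops below this

for step in range(10_000):  # generous cap instead of a fixed 100 steps
    with tf.GradientTape() as tape:
        loss = totalCost(num_vehicle_use, num_vehicle_buy, num_vehicle_sell)  # plus the other arguments

    grads = tape.gradient(loss, [num_vehicle_use, num_vehicle_buy, num_vehicle_sell])
    optimizer.apply_gradients(zip(grads, [num_vehicle_use, num_vehicle_buy, num_vehicle_sell]))

    # Same constraint and non-negativity projection as before
    num_vehicle_sell.assign(apply_constraints(num_vehicle_use, num_vehicle_buy,
                                              num_vehicle_sell, id, year))
    num_vehicle_use.assign(tf.nn.relu(num_vehicle_use))
    num_vehicle_buy.assign(tf.nn.relu(num_vehicle_buy))

    current_loss = loss.numpy()
    if abs(prev_loss - current_loss) < tolerance * abs(prev_loss):
        print(f'Stopping at step {step}: relative improvement below {tolerance}')
        break
    prev_loss = current_loss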

Conclusion

This experiment shows an interesting result of using Keras optimizers to solve a discrete optimization problem. As this approach is currently underexplored, it may be possible to apply MLP optimizers at a wider scale, and a mathematical investigation still needs to be carried out to verify the methodology. However, optimization with Keras optimizers can be far more useful and powerful on a GPU, since we know GPUs accelerate MLP training from hours to just a few minutes. And what makes this solver even more interesting: it is open source!
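As a quick check that the tensor operations above will actually run on a GPU, TensorFlow can list the devices it sees; the loop is placed on the GPU automatically when one is available:

import tensorflow as tf

# Lists the GPUs visible to TensorFlow; an empty list means the loop runs on CPU
print(tf.config.list_physical_devices('GPU'))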

Anselmo R. Pitombeira-Neto

Associate Professor at the Federal University of Ceará / Leader of the OPL Operations Research Laboratory/ Founder ARVRA Artificial Intelligence

2 months ago

How do you enforce integrality of variables? By rounding? In addition, besides nonnegativity constraints, you seem to add only one constraint in the problem. This is unrealistic; real problems solved by math prog solvers may have up to a million constraints (or more). Anyway, your application is interesting. It makes it clear that training a neural network corresponds to solving an optimization problem. Many naive ML users do not know (or remember) this fact.

Daniel Svonava

Vector Compute @ Superlinked | xYouTube

2 months ago

Were there any particular challenges you faced with training or convergence when applying neural network optimizers to discrete problems?

Andrea Taverna, PhD, CAP

Data Science, Optimization | Manufacturing, Retail, Logistics | Helped customers dramatically improve KPIs (profit, revenue, ...) as well as quality, speed and confidence of decision making

2 months ago

If you have constraints you may want use Lagrangian information, e.g. https://arxiv.org/abs/2001.09394

Partha Pratim Mandal, Ph.D

Assistant Professor | Geomechanics & Geophysical Engineer | Innovator | STEM Leader

2 months ago

Very helpful!
