Liquid Neural Networks : Applying Human Brain Dynamics into Classical Neural Networks

Abu Huzaifah Bidin

Process Engineer | Independent AI Researcher | Writer

发布日期: 2025年2月10日

This is it. This is what i’ve been waiting to implemnet in a very looong time. This is one of the few neural networks architecture that are being proposed as the future of AI. Since it’s really modeled on how the human brain works. The proposal by the people from MIT (Ramin Hasani) is coming from the awareness that current neural networks architecture has reach the overparameterization curve, where it’s accuracy does not increase that much, but the parameter scale is bloated. So, he’s proposing a new architecture that’s goinf to reduce the number of parameters, while at the same time, increase it’s accuracy. And the architecture that he proposes, is Liquid Neural Network.

You can get the paper here:

https://arxiv.org/pdf/2006.04439

Understanding the?Abstract

Liquid Neural Networks was first introduced by Ramin Hasani and his colleagues (Mathias Lechner, Alexander Amini,Daniela Rus and Radu Grosu3) in a paper published in December 2020. The abstract of the paper are as follows:

We introduce a new class of time-continuous recurrent neural network models. Instead of declaring a learning system’s dynamics by implicit nonlinearities, we construct networks of linear first-order dynamical systems modulated via nonlinear interlinked gates. The resulting models represent dynamical systems with varying (i.e., liquid) time-constants coupled to their hidden state, with outputs being computed by numerical differential equation solvers. These neural networks exhibit stable and bounded behavior, yield superior expressivity within the family of neural ordinary differential equations, and give rise to improved performance on time-series prediction tasks. To demonstrate these properties, we first take a theoretical approach to find bounds over their dynamics, and compute their expressive power by the trajectory length measure in a latent trajectory space. We then conduct a series of time-series prediction experiments to manifest the approximation capability of Liquid Time-Constant Networks (LTCs) compared to classical and modern RNNs.

Alright, as usual let’s understand this part by part.

We introduce a new class of time-continuous recurrent neural network models. Instead of declaring a learning system’s dynamics by implicit nonlinearities, we construct networks of linear first-order dynamical systems modulated via nonlinear interlinked gates.

The Liquid Neural Network is actually an addition to the recurrent neural network. I have explained extensively on Recurrent Neural Network in my Bahasa Melayu tutorial, which you may read here in two parts

Part 1: https://medium.com/@maercaestro/siri-belajar-ai-mari-kita-kenal-jaringan-neural-berturut-recurrent-neural-network-bahagian-1-2efd3d3aaafe

Part 2: https://medium.com/@maercaestro/siri-belajar-ai-mari-kita-kenal-jaringan-neural-berturut-recurrent-neural-network-bahagian-2-335ffd05efbe

In the recurrent neural networks we know that it is designed to handle sequential data (time series, sequences in languages etc) by adding additional vector hidden steps. This vector will hold all the informations coming from the current time step and hopes that it will be passed upon to the next time step across all the neural layers. Currently RNN handles nom-linearity of the data by adding the activation function (this is the implicit nonlinearities that is declared on the learning system dynamics). We know how activation functions works, I have also extensively covered that in my articles below:

https://medium.com/@maercaestro/siri-belajar-ai-fungsi-pengaktifan-activation-function-415a7995c034

But, once we declared an activation functions (especially in a RNN), it becomes hard to interpret and analyze causing it to behave in unpredictable ways. This is not a good idea in a time-discrete, or dyanmics system.

Therefore, Ramin and his team proposes an architecture build upon linear first order differential equations. This linear ODE will be modulated (controlled/modified) via a gated system (like LSTM gate) to ensure it’s linearity can be made into non-linear.

Ok, that was kind of simple to understand. Now let’s move on to the next part

The resulting models represent dynamical systems with varying (i.e., liquid) time-constants coupled to their hidden state, with outputs being computed by numerical differential equation solvers. These neural networks exhibit stable and bounded behavior, yield superior expressivity within the family of neural ordinary differential equations, and give rise to improved performance on time-series prediction tasks.

This seems to be the improvement made by this architecture compare to RNN and other sequential network. Let’s list it down below:

Their time constants seems to be not-constant and varies depending on the dynamics of the system. This is good since this will determine their adaptability.
Their output is computed by numerical ODE instead of through matrix multiplication inside the linear hidden states
Their architecture is more stable and remove the vanishing and exploding gradient prevalent inside RNN
They are more expressive in the sense that they can capture more complex patterns that are throw at them
They have improve performance on time-series predictions tasks

So, that’s a high number of claims. Something that we will explore further in the next part. Right now, let’s further understands this abstract. Below are the last part:

To demonstrate these properties, we first take a theoretical approach to find bounds over their dynamics, and compute their expressive power by the trajectory length measure in a latent trajectory space. We then conduct a series of time-series prediction experiments to manifest the approximation capability of Liquid Time-Constant Networks (LTCs) compared to classical and modern RNN

Alright. There’s two part in this. This statements mentioned the methodology of their testing to prove their claims. The first one is on the theoretical parts. In this part, they test it on two fronts. One is on the bounds of their dynamics. This is to test whether their architecture is more stable compare to RNN/LSTM. If it has specified bounds, it means that it is stable. Is the bound is too big (or too small), in tends to face vanishing or exploding gradient problems. The second part of their theoretical testing is to measure the trajectory length in a latent trajectory space. This length will determine how much complex a patter that the model can capture. Longer is better.

Their second part of experiment is to test their architecture on real world cases and made comparison to RNN.

Alright, seems that we abstract is now clear. Let’s move on to the next part

Discussion

1. Adjoint Method instead of Normal Backpropagation

So there’s not much information past on the introduction. It just discussed on the RNN that has added neural ODEs inside their hidden state. But it did mentiond on two separate different learning algorithms.

The reverse mode AD which we can identified as backpropagation through time (BPTT). This is the standard algorithm for training RNN.
The adjoint method, which are introduced by Pontryagin in 2018. This is the one that we need to discuss

I’ve covered on backpropagation and reverse mode AD extensively on my previous articles. You may read them here: https://medium.com/@maercaestro/stde-stochastic-taylor-derivative-estimator-the-winning-neurips-2024-paper-from-singapore-79a7ccc3dbfc

But this is the first time I heard of this adjoint method. Let’s discuss about it here

Adjoint Method In the briefiest sense, we can define adjoin method as a way to optimize the BPTT algorithm when faced with multiple differential equations to solve. As I mentioned in my previous article (STDE one), solving multiple differential equations all at once would be computationally intensive. And since we’re adding ODE to all these hidden states, the memory requirement would be very large.

Therefore, the adjoint method introduce an adjoint state on top of that hidden state. Which carries the sensitivity of the loss to the hidden state. Kind of seems like what they did with STDE, where the coefficients hold the impact of the taylor approximation during optimization,

We can simulate the adjoint methiod below by using torchdiffeq library

import torch
import torch.nn as nn
import torch.optim as optim
import time
import matplotlib.pyplot as plt
from torchdiffeq import odeint, odeint_adjoint  # Import both methods

# Check for GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Define a simple neural network function for f(x, t, θ)
class ODEFunc(nn.Module):
    def __init__(self):
        super(ODEFunc, self).__init__()
        self.net = nn.Sequential(
            nn.Linear(2, 50),
            nn.Tanh(),
            nn.Linear(50, 2)
        )

    def forward(self, t, x):
        return self.net(x)

# Neural ODE Model (Switch Between Adjoint and Standard)
class NeuralODE(nn.Module):
    def __init__(self, func, use_adjoint=False):
        super(NeuralODE, self).__init__()
        self.func = func
        self.use_adjoint = use_adjoint  # Flag for adjoint vs. BPTT

    def forward(self, x0, t):
        if self.use_adjoint:
            return odeint_adjoint(self.func, x0, t)  # Adjoint method
        else:
            return odeint(self.func, x0, t)  # Standard BPTT

# Function to measure training time and collect loss values
def train_ode(model, optimizer, method_name, epochs=50, early_stop_threshold=-0.01):
    t = torch.linspace(0, 1, 20000).to(device)  # Time points
    x0 = torch.tensor([[2.0, 0.0]], device=device)  # Initial condition

    start_time = time.time()  # Start timer
    loss_history = []  # Track loss per epoch

    for epoch in range(epochs):
        optimizer.zero_grad()
        pred_x = model(x0, t)  # Forward pass using ODE solver
        loss = torch.mean(pred_x)  # Example loss function
        loss.backward()  # Backpropagate gradients
        optimizer.step()

        loss_value = loss.item()
        loss_history.append(loss_value)

        print(f"{method_name} - Epoch {epoch}: Loss = {loss_value:.5f}")

        # Early stopping when loss goes below the threshold
        if loss_value < early_stop_threshold:
            print(f"Early stopping at epoch {epoch} for {method_name} (loss < {early_stop_threshold})")
            break

    end_time = time.time()  # End timer
    return end_time - start_time, loss_history  # Return execution time and loss history

# Create models for both methods
func_bptt = ODEFunc().to(device)
func_adjoint = ODEFunc().to(device)

model_bptt = NeuralODE(func_bptt, use_adjoint=False).to(device)  # Standard BPTT
model_adjoint = NeuralODE(func_adjoint, use_adjoint=True).to(device)  # Adjoint Method

# Create optimizers
optimizer_bptt = optim.Adam(model_bptt.parameters(), lr=0.01)
optimizer_adjoint = optim.Adam(model_adjoint.parameters(), lr=0.01)

# Train and compare times
print("\nTraining with Standard Backpropagation (BPTT)...")
bptt_time, bptt_loss = train_ode(model_bptt, optimizer_bptt, "BPTT")

print("\nTraining with Adjoint Method...")
adjoint_time, adjoint_loss = train_ode(model_adjoint, optimizer_adjoint, "Adjoint")

# Plot Loss vs. Epoch
plt.figure(figsize=(8, 5))
plt.plot(bptt_loss, label="BPTT (Standard Backpropagation)", linestyle='dashed', marker='o')
plt.plot(adjoint_loss, label="Adjoint Method", linestyle='dashed', marker='s')
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.title("Loss vs. Epochs for BPTT vs. Adjoint Method")
plt.legend()
plt.grid(True)
plt.show()

# Print final comparison
print(f"\nTime Comparison:")
print(f" BPTT Training Time: {bptt_time:.4f} seconds")
print(f" Adjoint Method Training Time: {adjoint_time:.4f} seconds")

Which resulted in:

2. The Architecture: Liquid Time Constant?Network

Alright, the discussion above is just for me to understand the difference between BPTT and adjoint. In theory, Adjoint will be better for longer sequence, but BPTT is still good for general usage. Now, we come to the main topic, which is the Liquid Tine Constant Network. Let us define the equation for the architecture as below:

But normally, we will use S(t) to represent that long ass part of f(x(t),I(t),t,theta), so the equation can be simplified as below:

So let’s understand the equaton further. We want to know eachs terms and separately and what it does in the networks. The first one we have the (1/tau). The tau actually represents the time constants but it is tie in with the S(t) which is the non-linear equations that act as the gate and modulator as mentioned in the abstract. That first part of the equation is (that left part) is what determine decay rate of the network. That is something that we need to discuss further below.

The second part of the equation is the corrections needed to adjust the hidden state of the network, where A is the learnable parameter, the weights that will be updated during training (at last, something that I truly understand).

3. Understanding that Time Decay and Time?Constant

Alright, let’s first understand why we need this time decay and time constant terms in our equation. In my previous writings on RNN, I never once mentioned about any of those terms. Now suddenly it appears and wrecks my understanding. So let’s take a step back and undertands properly.

Alright, we know that RNN is good at handling sequential data. And they handle this sequence by introducing time steps for each sequence of data. This time steps can be classified into two main categories.

Discrete time steps where the data comes at regular intervals and time itself ins’t really a critical factor. For examples, sequence of words, time series predicitons of stock prices and speech processing.
Continous time steps where the data comes at irregular intervals and time is a critical factor. This can be seen at medical monitoring, weather forecasting, robotics and self driving.

So, the discrete time steps doesn’t require the architecture to model the passage of time. The time is just being used as an indicator of where the training process is now. They did have time constant and time decay, but it was implicitly determined through models Weights and parameters, not explicitly declared, therefore it cannot be control and modulated.

However, for continous time steps, the passage of time has to be modeled as the data doesn’t come at fixed intervals, and the dynamics between has to be learned. In an effort to model the passage of time inside the architecture, it is crucial to add time constant and time decay into our equations.

The Decay Rate Term and It’s Relation with our Time Constant

So, as mentioned above, we can see that our LNTN equations are considered of two terms. The first part is the decay rate of the network. The other is the correcton part. So now we would like to epxlore on that decay rate. Let’s write the equation as below:

from here, the decay rate is actually this part

领英推荐

Unlocking the Power of Graphs: The Rise of Graph…

HirePort AI 2 年前

The main areas where Neural Network is widely used.

Deeptimaan Info System Software Pvt Ltd 1 年前

The Evolution of Convolutional Neural Networks for…

AG Tech Consulting Services 8 个月前

where it will determine the decay rate of our hidden state x(t).

So let’s think about this conceptually. For a system with time constant tau(sys)?, the decay rate will be the reciprocal of that time constant, which can be denoted as

So in essence, we can also write our decay rate as below:

So in order to solve for tau(sys), we can take the recripocal of both sides and it resulted as below:

To make the equation more cleaner, we can multiply the denominator and numerator by tau, thus resulting in

We can also write this as

This match the equation written in the paper.

4. Understanding why the Equation are structured as?that?

So there’s a section inside the paper that’s discussing why the equation for LCTN are structured as that. They give two main reasones as I will detailed below:

Biologically Inspired The equation is actually inspired from biology, specficially the electrical membrane potential (the electrical capacity) of a non-spiking neurons. We can write this equations as below

Where:

This is actually presented in a paper coming from 1907!! Looking at the equation it does seems resemble the equation of LCTNs

Furthermore, that in the equation above, S(t) can be approximated as:

Where:

2. Mathematically Inspired The equaton is also mathematically inspired. They draw inspiration from Dynamic Causal Models (DCMs), which are used to represent brain dynamics in response to external stimuli. DCM can be written as below:

Where:

Solving and Implementing the?LCTN/LNN

The Forward Pass by using Fused?Solver

Alright. For someone who is very good in mathematics, loooking at that equation alone will trigger few questions. I just being made aware of this with the help of ChatGPT. We have learned about this during SPM. If you remember, when handling any non-linear functions with no closed form provided, we cannot exactly solve it directly. What we need to do is perform approximation. And this approximation is done in the form of? Iteration. Yeah, that triggers my memory and my nightmare a little bit.

So, that non-linearity will cause the computers to solve the LCTN equations through hundreds and thousands of iteration. This won’t be efficient if we want to include this to hunreds/thousands of neurons in our network. So a new solver must be formed to solve the LCTN.

The second issues that we will have is that the ODE introduce in LCTN is stiff. That’s a funny terms but also something that needed to be explained. A stiff ODE is an ODE that exhibits behavior at multiple timescale. This can create a problem to traditonal solver, since traditional solver require fixed time step/time scale to be solved. An ODE that can operates at multiple timescales will cause unstability to the solutions.

So, Ramin and his colleagues again introduce a solution to accompany their new architecture. They proposed the fused Euler solver (combining both implicit and explicit Euler Method) which can be written as below:

So basically, that is what our forward pass is. This is hard to understand. So it’s better if we implement this in python.

The Pseudocode of the Forward?Pass:

Alright, what we want now is to build the LCTN inside python. We will implement the basic equation and applty the update rule using Fused Euler method as mentioned above. The pseudocode of our implementation can be seen as below:

We will define the non-linear functions first. That S(t) inside python.?
The S(t) will be represented as one linear layer inside the network. It will takes the input dimension and hidden dimensions in the form of vector.
The S(t) layer will be activated using tanh to introduce the non-linearity elements.
S(t) will be pass forward using concatenation of input (I) and hidden state (x).?
In the next part, we will build the LCTN. This LCTN will takes the input and hidden dimensions. We will also provide the tau, and the dt (time differential/time steps) as arguments.
In the constructor for LCTN we will build 4 main items. The first one is the time constant. Next, we will the bias vector (A). Then we will put the non-linear functions that we have build previously. Lastly, we will add the time step element (dt).
The LCTN will be pass forward using the fused euler method.?

Alright, simple enough to understand. Let’s build this in python. We will also try to visualize our implementation with matplotlib. The full codew are as below

import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn

#1. First, let's impleement the class first. we will implement the S(t) first
class NonLinearFunction(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super(NonLinearFunction, self).__init__()
        self.linear1 = nn.Linear(hidden_dim + input_dim, hidden_dim)
        self.activation = nn.Tanh() # use Tanh to introduce the non-linearity into our linear layer

    def forward(self, x, I, t):
        """
        This is the mathematical non-linear functions

        """
        combined = torch.cat((x, I), dim=-1)  # Concatenate input and hidden state
        return self.activation(self.linear1(combined))

# Define the LTCN with Fused Euler Solver
class LiquidTimeConstantNetwork(nn.Module):
    def __init__(self, input_dim, hidden_dim, tau_init=1.0, dt=0.1):
        super(LiquidTimeConstantNetwork, self).__init__()
        self.tau = nn.Parameter(torch.full((hidden_dim,), tau_init))  # Time constant
        self.bias_vector = nn.Parameter(torch.ones(hidden_dim))      # Bias A
        self.non_linear = NonLinearFunction(input_dim, hidden_dim)   # Non-linear function
        self.dt = dt                                                # Time step

    def forward(self, x, I, t):
        """
        Perform forward pass using the fused Euler Method.

        """
        f = self.non_linear(x, I, t)  # the non-linear fuction and modulation
        numerator = x + self.dt * f * self.bias_vector 
        denominator = 1 + self.dt * (1 / self.tau + f)
        x_next = numerator / denominator  
        return x_next

# Example Usage
torch.manual_seed(42)
input_dim = 1
hidden_dim = 10
timesteps = 100

# Initialize LTCN
ltcn = LiquidTimeConstantNetwork(input_dim, hidden_dim)
x = torch.zeros((1, hidden_dim))          # Initial hidden state
I = torch.rand((timesteps, 1))            # Random input over timesteps
t = torch.linspace(0, timesteps * 0.1, timesteps)  # Time points

# Forward pass through the LTCN
hidden_states = []
for i in range(timesteps):
    x = ltcn(x, I[i:i+1], t[i:i+1])  # Update hidden state
    hidden_states.append(x.detach().numpy())

hidden_states = np.array(hidden_states).squeeze()

# Plot Hidden State Dynamics
plt.figure(figsize=(10, 6))
for i in range(hidden_dim):
    plt.plot(t.numpy(), hidden_states[:, i], label=f"Neuron {i+1}")
plt.xlabel("Time")
plt.ylabel("Hidden State Value")
plt.title("LTCN Hidden State Dynamics Over Time")
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

This results in a beautiful and smooth curve as you can see below:

As you can see above, the hidden states fo LCTN evolves over time and it does this almost in smooth manner. If we compare it to RNN using the code below, we will get an unfair comparison. Let’s see below:

import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

# Define Standard RNN
class StandardRNN(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super(StandardRNN, self).__init__()
        self.rnn_cell = nn.RNNCell(input_dim, hidden_dim)

    def forward(self, x, hidden_state):
        return self.rnn_cell(x, hidden_state)

# Define LTCN
class LiquidTimeConstantNetwork(nn.Module):
    def __init__(self, input_dim, hidden_dim, tau_init=1.0, dt=0.1):
        super(LiquidTimeConstantNetwork, self).__init__()
        self.tau = nn.Parameter(torch.full((hidden_dim,), tau_init))
        self.bias_vector = nn.Parameter(torch.ones(hidden_dim))
        self.non_linear = nn.Sequential(
            nn.Linear(hidden_dim + input_dim, hidden_dim),
            nn.Tanh()
        )
        self.dt = dt

    def forward(self, x, hidden_state):
        combined = torch.cat((hidden_state, x), dim=-1)
        f = self.non_linear(combined)
        numerator = hidden_state + self.dt * f * self.bias_vector
        denominator = 1 + self.dt * (1 / self.tau + f)
        return numerator / denominator

# Generate input sequence
torch.manual_seed(42)
input_dim = 1
hidden_dim = 10
timesteps = 100
inputs = torch.rand((timesteps, input_dim))  # Random input sequence

# Initialize RNN and LTCN
rnn = StandardRNN(input_dim, hidden_dim)
ltcn = LiquidTimeConstantNetwork(input_dim, hidden_dim)

# Initialize hidden states
hidden_rnn = torch.zeros((1, hidden_dim))
hidden_ltcn = torch.zeros((1, hidden_dim))

# Track hidden state dynamics
rnn_states = []
ltcn_states = []

for t in range(timesteps):
    hidden_rnn = rnn(inputs[t:t+1], hidden_rnn)  # Standard RNN update
    hidden_ltcn = ltcn(inputs[t:t+1], hidden_ltcn)  # LTCN update

    rnn_states.append(hidden_rnn.detach().numpy())
    ltcn_states.append(hidden_ltcn.detach().numpy())

# Convert to numpy arrays for visualization
rnn_states = np.array(rnn_states).squeeze()
ltcn_states = np.array(ltcn_states).squeeze()

# Visualization
plt.figure(figsize=(12, 6))
for i in range(hidden_dim):
    plt.plot(rnn_states[:, i], label=f"RNN Neuron {i+1}", alpha=0.7, linestyle="--")
    plt.plot(ltcn_states[:, i], label=f"LTCN Neuron {i+1}", alpha=0.7)
plt.title("Hidden State Dynamics: LTCN vs Standard RNN")
plt.xlabel("Time Steps")
plt.ylabel("Hidden State Value")
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

Damn. RNN does look more rugged, unsmooth and unstable compared to LNN. This shows that LCTN is more dynamic across time steps compared to RNN.

Alright. I did want to make one more comparison. As recommended by o1, it is better if we look at the heatmap of this neuron. And see which architecture, model has more active neurons in their network across time steps. Let implement the code below:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

def main():
    # ----------------------------------------------------
    # 1) Generate Example Forward-Pass Data
    # ----------------------------------------------------
    np.random.seed(42)  # For reproducible 'random' data
    num_neurons = 10
    num_timesteps = 50

    # For illustration, we'll create dummy data shaped (neurons, time).
    # Replace these with your actual forward-pass hidden states.
    # e.g. if your actual shape is (time, neurons), transpose it: data = data.T
    rnn_data = np.random.normal(loc=0.0, scale=0.2,
                                size=(num_neurons, num_timesteps))
    ltcn_data = np.random.normal(loc=0.1, scale=0.2,
                                 size=(num_neurons, num_timesteps))

    # Optionally, compute difference: LTCN - RNN
    difference_data = ltcn_data - rnn_data

    # ----------------------------------------------------
    # 2) Create the Plots
    # ----------------------------------------------------
    # We'll make 1 figure containing 3 subplots side by side:
    # LTCN, RNN, and Difference
    fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(15, 6), sharey=True)

    # (a) LTCN Heatmap
    sns.heatmap(ltcn_data, ax=axes[0], cmap='viridis',
                cbar_kws={'label': 'Hidden State Value'})
    axes[0].set_title('LTCN Hidden States')
    axes[0].set_xlabel('Time Step')
    axes[0].set_ylabel('Neuron Index')

    # (b) RNN Heatmap
    sns.heatmap(rnn_data, ax=axes[1], cmap='viridis',
                cbar_kws={'label': 'Hidden State Value'})
    axes[1].set_title('RNN Hidden States')
    axes[1].set_xlabel('Time Step')
    # sharey=True means the y-axis labeling in the first subplot suffices.

    # (c) Difference Heatmap (LTCN - RNN)
    # We'll use a diverging colormap, centered at 0, to highlight sign differences
    sns.heatmap(difference_data, ax=axes[2], cmap='coolwarm', center=0.0,
                cbar_kws={'label': 'LTCN - RNN'})
    axes[2].set_title('Difference: LTCN - RNN')
    axes[2].set_xlabel('Time Step')

    plt.tight_layout()
    plt.show()

if __name__ == '__main__':
    main()

This resulted in a beutiful heatmap as you can see below:

As you can see LCTN neuron seems far more active compared to RNN when given temporal input. It almost seems like LCTN seems more sensitive to changes in the input data. But this is just early comparison. We need to perform actual testing on this models to test our hypothesis.

But, this is far extensive enough as it is. I’ll stop at this point. We have fully understood the architecture and it’s benefit. But we haven’t fully tested this yet. This is something that we will do in the next part.

See you in a few weeks.

Beyond the Hype: Science of AI

228 位关注者

Kamree Aziz

AI & Cybersecurity advocate | CyberSecure systems developer | NACSA licensed pentester | HRDC trainer | AI upskillling & automation for non technical personnel

3 周

Will be following this - love new architectures - but my comment or my ignorance would be - is there a control mechanism for the decay on certain paths? eg like a surprise value or this is embedded already in the architecture?

1 次回应

查看更多评论

要查看或添加评论，请登录

Abu Huzaifah Bidin的更多文章

Liquid Neural Network: Putting the Network to Test in the Chaotic World

2025年3月1日

Liquid Neural Network: Putting the Network to Test in the Chaotic World

Alright. This is the 2nd part of my series in understanding the Liquid Neural Network.

11 条评论
Squeezenet : Implementing the 2016 Paper using Pytorch with Flexibility

2025年2月25日

Squeezenet : Implementing the 2016 Paper using Pytorch with Flexibility

There's a paper here, talking about Squeezenet, which based on my light reading is basically a compressed version of…
STDE (Stochastic Taylor Derivative Estimator) -The winning NeurIPS 2024 Paper from Singapore

2025年1月25日

STDE (Stochastic Taylor Derivative Estimator) -The winning NeurIPS 2024 Paper from Singapore

As I said last year, we will try to tackle and impement more AI/ML paper in 2025 to better understand the theory and…

4 条评论
Can AI really match the Human Brain

2025年1月10日

Can AI really match the Human Brain

There's a lot of advancement in the field of artificial intelligence in 2024 and we will see more of it in the coming…
My RM 500 (failed?) experiments on Transformer

2025年1月2日

My RM 500 (failed?) experiments on Transformer

“I set out to scale transformer. But I only scaled my frustration and loss RM 500.

1 条评论

See all articles

Liquid Neural Networks : Applying Human Brain Dynamics into Classical Neural Networks

Abu Huzaifah Bidin

Process Engineer | Independent AI Researcher | Writer

Understanding the?Abstract

Discussion

1. Adjoint Method instead of Normal Backpropagation

2. The Architecture: Liquid Time Constant?Network

3. Understanding that Time Decay and Time?Constant

领英推荐

4. Understanding why the Equation are structured as?that?

Solving and Implementing the?LCTN/LNN

The Forward Pass by using Fused?Solver

The Pseudocode of the Forward?Pass:

Beyond the Hype: Science of AI

228 位关注者

Abu Huzaifah Bidin的更多文章

社区洞察

其他会员也浏览了

Advances in Image Classification Using Neural Networks

Unveiling the Magic: Understanding Neural Networks Like Never Before

Quantum Neural Networks: Exploring the Landscape

AI Atlas #24: Liquid Neural Networks

Recurrent Neural Networks Unveiled: Mastering Sequential Data Beyond Simple ANNs

When Two Heads are Better Than One: Twin Neural Networks

Understanding Neural Networks: A Comprehensive Guide

Neural Network architectures that no one is talking about !

The Evolution of Neural Networks: From Perceptrons to Transformers

Navigating the Neural Network Labyrinth: An In-Depth Odyssey through Layers, Architectures, and Advanced Concepts

Understanding the?Abstract

Discussion

1. Adjoint Method instead of Normal Backpropagation

2. The Architecture: Liquid Time Constant?Network

3. Understanding that Time Decay and Time?Constant

领英推荐

4. Understanding why the Equation are structured as?that?

Solving and Implementing the?LCTN/LNN

The Forward Pass by using Fused?Solver

The Pseudocode of the Forward?Pass:

Beyond the Hype: Science of AI

228 位关注者

Abu Huzaifah Bidin的更多文章

Liquid Neural Network: Putting the Network to Test in the Chaotic World

Squeezenet : Implementing the 2016 Paper using Pytorch with Flexibility

STDE (Stochastic Taylor Derivative Estimator) -The winning NeurIPS 2024 Paper from Singapore

Can AI really match the Human Brain

My RM 500 (failed?) experiments on Transformer

社区洞察

其他会员也浏览了

Advances in Image Classification Using Neural Networks

Unveiling the Magic: Understanding Neural Networks Like Never Before

Quantum Neural Networks: Exploring the Landscape

AI Atlas #24: Liquid Neural Networks

Recurrent Neural Networks Unveiled: Mastering Sequential Data Beyond Simple ANNs

When Two Heads are Better Than One: Twin Neural Networks

Understanding Neural Networks: A Comprehensive Guide

Neural Network architectures that no one is talking about !

The Evolution of Neural Networks: From Perceptrons to Transformers

Navigating the Neural Network Labyrinth: An In-Depth Odyssey through Layers, Architectures, and Advanced Concepts