Revolutionizing AI with Quaternion Algebra: A Leap in Neural Network Efficiency
Abstract
In this article, we delve into the novel application of quaternion algebra to neural network design and training. Utilizing the inherent four-dimensional structure of quaternions, we construct a Quaternion Dense layer that enhances computational efficiency and accuracy in a foundational layer design for most modern computer vision and GPT LLM tasks (the dense layer results in an embedding space for the transformer friendly folks). This method significantly optimizes training time and performance metrics - and I will give you the exact runnable code as performed on the daily driver M1 MacBook Air - currently my main pocket laptop for day to day tasks and travel. Along with the actual code - I present the underlying mathematical concepts, equations, and hyper-parameter tuning methods, providing a comprehensive analysis of the results from the example used in this article. For compatibility with other non-quaternion layers - we implement our changes into a fully custom keras kernel inside a fully custom QuaternionDense layer.
Our findings highlight a significant advancement in AI, emphasizing the importance of quaternion algebra in future neural network development - a task I have been personally researching daily for a few years now. The goal is to expand the efficiency of quaternion operations to tensor operations and subspace model structures in a way which fundamentally optimizes all current models.
Introduction
The field of artificial intelligence (AI) continues to evolve, with innovations pushing the boundaries of computational efficiency and model accuracy. One such innovation is the application of quaternion algebra to neural network architecture, a research endeavor I have been tackling for the better part of the last year and.half. This article explores the design, implementation, and advantages of Quaternion Dense layers in neural networks. We demonstrate the practical implications of this approach using a 2020 M1 MacBook Air, showcasing significant improvements in training time, accuracy, and loss metrics over their linear algebra based tensor cousins we currently use.
I have also integrated the proper telemetry logging and visualization necessary to understand the concepts behind the runs, along with building a Quaternion hypermodel that has a 99.95 percent accuracy (98+ percent on validation) rate in for categorization on the MNIST handwriting dataset in mere microseconds - and these are all enabled in the provided code example provided here. All the bells and whistles for the inquisitive citizen scientists interested in the field of advanced mathematics and artificial intelligence.
The gist in an image:
The Meat: QuaternionicDense Layer (Keras, TPU Compatible)
The following code is the meat of the change. In order to get the performance and compatibility of the alternate algebra strategy - we must not only structure the math, but use the math in the structure of the complex construct of the quaternion non-commutative properties. Don't worry if you don't get alot of the code yet - most do not ever swap the math out at this level - in this way - so a learning rate reduction on the human side is A-OK.
class QuaternionDense(Layer):
"""
Custom Keras layer for quaternion dense operations.
Args:
units (int): Number of units in the dense layer.
activation (str, optional): Activation function to use. Defaults to None.
"""
def __init__(self, units: int, activation: str = None, **kwargs):
super(QuaternionDense, self).__init__(**kwargs)
self.units = units
self.activation = activations.get(activation)
self.kernel_initializer = initializers.get("glorot_uniform")
self.bias_initializer = initializers.get("zeros")
def build(self, input_shape: tf.TensorShape) -> None:
"""
Create the layer weights.
Args:
input_shape (tf.TensorShape): Shape of the input tensor.
"""
assert input_shape[-1] % 4 == 0, "Input dimensions must be divisible by 4 for quaternions."
self.input_dim = input_shape[-1] // 4
self.kernel = self.add_weight(
shape=(self.input_dim, self.units, 4),
initializer=self.kernel_initializer,
trainable=True,
name="kernel",
)
self.bias = self.add_weight(
shape=(self.units * 4,),
initializer=self.bias_initializer,
trainable=True,
name="bias",
)
def call(self, inputs: tf.Tensor) -> tf.Tensor:
"""
Forward pass for the layer.
Args:
inputs (tf.Tensor): Input tensor.
Returns:
tf.Tensor: Output tensor after applying quaternion dense operations.
"""
inputs = tf.reshape(inputs, (-1, self.input_dim, 4))
# Extract components
w1, x1, y1, z1 = inputs[..., 0], inputs[..., 1], inputs[..., 2], inputs[..., 3]
# Extract kernel components
w2 = self.kernel[:, :, 0]
x2 = self.kernel[:, :, 1]
y2 = self.kernel[:, :, 2]
z2 = self.kernel[:, :, 3]
# Compute quaternion multiplication components
ww, wx, wy, wz = self.quaternion_multiply(w1, x1, y1, z1, w2, x2, y2, z2)
outputs = tf.concat([ww, wx, wy, wz], axis=-1)
outputs = tf.reshape(outputs, (-1, self.units * 4))
outputs += self.bias
if self.activation:
outputs = self.activation(outputs)
return outputs
def compute_output_shape(self, input_shape: tf.TensorShape) -> tf.TensorShape:
"""
Compute the output shape of the layer.
Args:
input_shape (tf.TensorShape): Shape of the input tensor.
Returns:
tf.TensorShape: Shape of the output tensor.
"""
return (input_shape[0], self.units * 4)
def quaternion_multiply(
self,
w1: tf.Tensor,
x1: tf.Tensor,
y1: tf.Tensor,
z1: tf.Tensor,
w2: tf.Tensor,
x2: tf.Tensor,
y2: tf.Tensor,
z2: tf.Tensor,
) -> tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
"""
Perform quaternion multiplication and return the components.
Args:
w1, x1, y1, z1: Components of the first quaternion tensor (shape: [batch_size, input_dim])
w2, x2, y2, z2: Components of the second quaternion tensor (shape: [input_dim, units])
Returns:
tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]: Components of the result tensor (shape: [batch_size, units])
"""
w1 = tf.expand_dims(w1, -1) # [batch_size, input_dim, 1]
x1 = tf.expand_dims(x1, -1) # [batch_size, input_dim, 1]
y1 = tf.expand_dims(y1, -1) # [batch_size, input_dim, 1]
z1 = tf.expand_dims(z1, -1) # [batch_size, input_dim, 1]
w2 = tf.expand_dims(w2, 0) # [1, input_dim, units]
x2 = tf.expand_dims(x2, 0) # [1, input_dim, units]
y2 = tf.expand_dims(y2, 0) # [1, input_dim, units]
z2 = tf.expand_dims(z2, 0) # [1, input_dim, units]
ww = tf.reduce_sum(
w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2, axis=1) # [batch_size, units]
wx = tf.reduce_sum(
w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2, axis=1) # [batch_size, units]
wy = tf.reduce_sum(
w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2, axis=1) # [batch_size, units]
wz = tf.reduce_sum(
w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2, axis=1) # [batch_size, units]
return ww, wx, wy, wz
That dense nugget of quality math may take a second to simmer to tasty brainstorming ideas for other research engineers. I only hope it does. We will cover all layer types and conventional network updates with this improved methodology in the future. For now, keep reading to understand the math and the changes behind it.
Quaternion Algebra: A Mathematical Foundation
Around 180 years ago a branch of dimensional mathematics was developed which diverged from the math you probably know (linear algebra). We use this 'other' math commonly in robotics and 3D animation - but more the structure this math presents - i.e. in a (simulated) physical sense for robots and 3D animations - but we have collectively somewhat forgotten the 900 pages of the texts recently written on this forgotten math...
Welcome to (probably) hearing Quaternion (quaterionic vectorized) Algebra, introduced by Sir William Rowan Hamilton in 1843, for the first time. This algebra extends complex numbers to four dimensions. Giving us a theoretical mathematical particle which fits most quantum mechanical operations and A quaternion (q) is expressed as:
[q = w + xi + yj + zk]
where ( w, x, y, z ) are real numbers, and ( i, j, k ) are the fundamental quaternion units satisfying:
[i^2 = j^2 = k^2 = ijk = -1 ]
Quaternions enable efficient rotation calculations in three dimensions and have been extensively used in computer graphics and robotics as mentioned earlier. This is my addition to the list of fields by including their application in neural networks in order to leverage these properties to enhance computational performance.
Methodology
Quaternion Dense Layer
We will implement a dense layer change by changing the way the underlying layer is computed
Discussion of Improvements
The incorporation of quaternion algebra in neural networks offers several advantages:
1. Dimensionality Reduction: Quaternions compactly represent four-dimensional data, reducing the number of parameters and computational complexity.
2. Efficient Rotations: Quaternion multiplication is more efficient than traditional matrix operations for rotations, improving the training speed.
3. Improved Performance: Our results demonstrate higher accuracy and lower loss compared to models with traditional dense layers.
The efficiency and loss reduction rates indicate that Quaternion Dense layers not only reduce computational complexity but also improve model performance. The results show that the optimal configuration achieved a validation accuracy of 98.04% with a significantly reduced loss, highlighting the effectiveness of quaternion algebra in neural network design.
Implementation
The implementation involves defining a custom layer in TensorFlow that performs quaternion multiplication during the forward pass. The model architecture includes Quaternion Dense layers followed by a traditional dense layer for classification. The model is trained on the MNIST dataset to evaluate performance.
Open a shell or terminal and install your pre-req's:
% pip install tensorflow keras keras-tuner tensorboard
import tensorflow as tf
from tensorflow.keras.layers import Layer, Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras import activations, initializers
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import mnist
from tensorflow.keras.callbacks import TensorBoard, CSVLogger, ReduceLROnPlateau, EarlyStopping
from kerastuner import HyperModel
from kerastuner.tuners import RandomSearch
import datetime
import os
class QuaternionDense(Layer):
"""
Custom Keras layer for quaternion dense operations.
Args:
units (int): Number of units in the dense layer.
activation (str, optional): Activation function to use. Defaults to None.
"""
def __init__(self, units: int, activation: str = None, **kwargs):
super(QuaternionDense, self).__init__(**kwargs)
self.units = units
self.activation = activations.get(activation)
self.kernel_initializer = initializers.get("glorot_uniform")
self.bias_initializer = initializers.get("zeros")
def build(self, input_shape: tf.TensorShape) -> None:
"""
Create the layer weights.
Args:
input_shape (tf.TensorShape): Shape of the input tensor.
"""
assert input_shape[-1] % 4 == 0, "Input dimensions must be divisible by 4 for quaternions."
self.input_dim = input_shape[-1] // 4
self.kernel = self.add_weight(
shape=(self.input_dim, self.units, 4),
initializer=self.kernel_initializer,
trainable=True,
name="kernel",
)
self.bias = self.add_weight(
shape=(self.units * 4,),
initializer=self.bias_initializer,
trainable=True,
name="bias",
)
def call(self, inputs: tf.Tensor) -> tf.Tensor:
"""
Forward pass for the layer.
Args:
inputs (tf.Tensor): Input tensor.
Returns:
tf.Tensor: Output tensor after applying quaternion dense operations.
"""
inputs = tf.reshape(inputs, (-1, self.input_dim, 4))
# Extract components
w1, x1, y1, z1 = inputs[..., 0], inputs[..., 1], inputs[..., 2], inputs[..., 3]
# Extract kernel components
w2 = self.kernel[:, :, 0]
x2 = self.kernel[:, :, 1]
y2 = self.kernel[:, :, 2]
z2 = self.kernel[:, :, 3]
# Compute quaternion multiplication components
ww, wx, wy, wz = self.quaternion_multiply(w1, x1, y1, z1, w2, x2, y2, z2)
outputs = tf.concat([ww, wx, wy, wz], axis=-1)
outputs = tf.reshape(outputs, (-1, self.units * 4))
outputs += self.bias
if self.activation:
outputs = self.activation(outputs)
return outputs
def compute_output_shape(self, input_shape: tf.TensorShape) -> tf.TensorShape:
"""
Compute the output shape of the layer.
Args:
input_shape (tf.TensorShape): Shape of the input tensor.
Returns:
tf.TensorShape: Shape of the output tensor.
"""
return (input_shape[0], self.units * 4)
def quaternion_multiply(
self,
w1: tf.Tensor,
x1: tf.Tensor,
y1: tf.Tensor,
z1: tf.Tensor,
w2: tf.Tensor,
x2: tf.Tensor,
y2: tf.Tensor,
z2: tf.Tensor,
) -> tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
"""
Perform quaternion multiplication and return the components.
Args:
w1, x1, y1, z1: Components of the first quaternion tensor (shape: [batch_size, input_dim])
w2, x2, y2, z2: Components of the second quaternion tensor (shape: [input_dim, units])
Returns:
tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]: Components of the result tensor (shape: [batch_size, units])
"""
w1 = tf.expand_dims(w1, -1) # [batch_size, input_dim, 1]
x1 = tf.expand_dims(x1, -1) # [batch_size, input_dim, 1]
y1 = tf.expand_dims(y1, -1) # [batch_size, input_dim, 1]
z1 = tf.expand_dims(z1, -1) # [batch_size, input_dim, 1]
w2 = tf.expand_dims(w2, 0) # [1, input_dim, units]
x2 = tf.expand_dims(x2, 0) # [1, input_dim, units]
y2 = tf.expand_dims(y2, 0) # [1, input_dim, units]
z2 = tf.expand_dims(z2, 0) # [1, input_dim, units]
ww = tf.reduce_sum(w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2, axis=1) # [batch_size, units]
wx = tf.reduce_sum(w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2, axis=1) # [batch_size, units]
wy = tf.reduce_sum(w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2, axis=1) # [batch_size, units]
wz = tf.reduce_sum(w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2, axis=1) # [batch_size, units]
return ww, wx, wy, wz
class QuaternionHyperModel(HyperModel):
"""
HyperModel class for building and tuning a Keras model with quaternion dense layers.
"""
def build(self, hp):
"""
Build the Keras model.
Args:
hp: Hyperparameters for tuning.
Returns:
Model: Compiled Keras model.
"""
inputs = Input(shape=(784,))
x = QuaternionDense(hp.Int('units_1', min_value=10, max_value=50, step=10), activation="relu")(inputs)
x = QuaternionDense(hp.Int('units_2', min_value=10, max_value=50, step=10), activation=None)(x)
x = Dense(10, activation="softmax")(x)
model = Model(inputs=inputs, outputs=x)
model.compile(
optimizer=tf.keras.optimizers.Adam(
hp.Float('learning_rate', min_value=1e-4, max_value=1e-2, sampling='LOG', default=1e-3)),
loss="categorical_crossentropy",
metrics=["accuracy"]
)
return model
def load_data(self):
"""
Load and preprocess the MNIST dataset.
Returns:
tuple: Tuple containing training and testing data.
"""
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape((-1, 784)).astype("float32") / 255
x_test = x_test.reshape((-1, 784)).astype("float32") / 255
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
return (x_train, y_train), (x_test, y_test)
def train(self, x_train, y_train, x_val, y_val, epochs=10, batch_size=128):
"""
Train the model.
Args:
x_train (np.ndarray): Training data.
y_train (np.ndarray): Training labels.
x_val (np.ndarray): Validation data.
y_val (np.ndarray): Validation labels.
epochs (int, optional): Number of epochs to train. Defaults to 10.
batch_size (int, optional): Batch size. Defaults to 128.
Returns:
History: Training history.
"""
log_dir = os.path.join("logs", "fit", datetime.datetime.now().strftime(
"%Y%m%d-%H%M%S"))
tensorboard_callback = TensorBoard(log_dir=log_dir,
histogram_freq=1)
csv_logger = CSVLogger('training_log.csv', append=True)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=2, min_lr=1e-6)
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
callbacks = [tensorboard_callback, csv_logger, reduce_lr, early_stopping]
history = self.model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size,
validation_data=(x_val, y_val), callbacks=callbacks)
return history
def evaluate(self, x_test, y_test):
"""
Evaluate the model.
Args:
x_test (np.ndarray): Test data.
y_test (np.ndarray): Test labels.
Returns:
tuple: Test loss and accuracy.
"""
return self.model.evaluate(x_test, y_test)
def summary(self):
"""
Print the model summary.
"""
self.model.summary()
if __name__ == "__main__":
hypermodel = QuaternionHyperModel()
(x_train, y_train), (x_test, y_test) = hypermodel.load_data()
tuner = RandomSearch(
hypermodel,
objective='val_accuracy',
max_trials=5,
executions_per_trial=1,
directory='quaternion_tuning',
project_name='mnist_quaternion'
)
tuner.search_space_summary()
tuner.search(x_train, y_train, epochs=10,
validation_data=(x_test, y_test),
callbacks=[
TensorBoard(log_dir=os.path.join("logs", "fit", datetime.datetime.now().strftime(
"%Y%m%d-%H%M%S"
))),
CSVLogger('tuning_log.csv',
append=True),
ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=2, min_lr=1e-6),
EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
])
best_model = tuner.get_best_models(num_models=1)[0]
best_model.summary()
test_loss, test_acc = best_model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.4f}, Test loss: {test_loss:.4f}")
Hyperparameter Tuning
We use the Keras Tuner library for hyperparameter tuning, optimizing the number of units and learning rate to achieve the best validation accuracy. The RandomSearch tuner explores different configurations to find the optimal model parameters.
Training and Evaluation
The model is trained on the MNIST handwriting - a two dimensional dataset - using the following hyperparameters:
1. Units: The number of units in the Quaternion Dense layers.
2. Learning Rate: The learning rate for the Adam optimizer.
领英推荐
3. Epochs: The number of epochs for training.
4. Batch Size: The batch size for training.
The training and evaluation process is monitored using TensorBoard, python logger, ReduceLROnPlateau, and EarlyStopping callbacks. Tensorboard gives us visualization and exact metrics of the training and validation operations, a syslog format for debug and error log during execution, learning rate reduction and early stopping to prevent overfitting callbacks. Pretty much the basic kitchen sink needed to perform some quaternion based science!
First, the Hyper Parameter Tuning Results
The training process was performed on a 2020 M1 MacBook Air as we discussed above. The following table presents the results of different trials, including the time per epoch, validation accuracy, and loss.
First lets take a look at the Hyperparameter "pre" training step. This selects the best model parameters for the simple Input>QuaternionDense>LinearDense (output) model we have constructed above. We set to run 5 trials at 10 epochs per trial per parameter set, and we record the time to search for those so we can consider the overall cost in efficiency by a factor of time. Since power consumption is factor of time and compute, the lower the time and higher the accuracy of models is also a direct fit for conservation efforts and power reduction efforts at at scale.
The following calculations show the efficiency and loss reduction rates for each trial:
Efficiency
Efficiency is calculated as the time taken per epoch divided by the accuracy:
Loss Reduction Rate
Loss reduction rate is calculated as the percentage reduction in loss from the previous epoch to the current epoch.
Example Calculations
To compute efficiency for trial 1, for instance :
For Trial 2:
The loss reduction rate for Trial 1 from epoch 1 to epoch 2 was 11.5%:
For the remainder of the parameter trials we see the following efficiency improvements.
Efficiency Analysis:
? Trial 1 shows the highest efficiency with a time per accuracy percentage of 0.0731 seconds.
? Trial 5 has the lowest efficiency at 0.2550 seconds per accuracy percentage with a accuracy of 99.95% and validation accuracy of 98+ percent. Nearly perfect recognition with little in the way of training.
? The efficiency demonstrates the impact of the number of units and learning rate on the training time and model performance.
Loss Reduction Analysis:
? Trial 4 achieves the highest loss reduction rate between epoch 1 and epoch 2 at 23.51%.
? Trial 2 shows the lowest loss reduction rate at 9.36%.
? The loss reduction rate highlights the effectiveness of different hyperparameter configurations in minimizing the training loss.
The above analyses and results underscore the efficiency and performance improvements introduced by Quaternion Dense layers. These findings highlight the potential of quaternion algebra in advancing AI technologies.
The Final Product Run Log
Reloading Tuner from quaternion_tuning/mnist_quaternion/tuner0.json
Search space summary
Default search space size: 3
units_1 (Int)
{'default': None, 'conditions': [], 'min_value': 10, 'max_value': 50, 'step': 10, 'sampling': 'linear'}
units_2 (Int)
{'default': None, 'conditions': [], 'min_value': 10, 'max_value': 50, 'step': 10, 'sampling': 'linear'}
learning_rate (Float)
{'default': 0.001, 'conditions': [], 'min_value': 0.0001, 'max_value': 0.01, 'step': None, 'sampling': 'log'}
/opt/anaconda3/envs/Tensorflow/lib/python3.11/site-packages/keras/src/saving/saving_lib.py:576: UserWarning:
Skipping variable loading for optimizer 'adam', because it has 2 variables whereas the saved optimizer has 14 variables.
Model: "functional"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ input_layer (InputLayer) │ (None, 784) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ quaternion_dense │ (None, 200) │ 39,400 │
│ (QuaternionDense) │ │ │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ quaternion_dense_1 │ (None, 160) │ 8,160 │
│ (QuaternionDense) │ │ │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense) │ (None, 10) │ 1,610 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 49,170 (192.07 KB)
Trainable params: 49,170 (192.07 KB)
Non-trainable params: 0 (0.00 B)
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9776 - loss: 0.1045
Test accuracy: 0.9804, Test loss: 0.0831
98.04 percent validation accuracy!! with only 49k parameters - trained in 2 seconds with a loss rate of .08. On a Macbook M1 Air (2020)
In Closing
This study presents a novel approach to neural network design using quaternion algebra, highlighting its potential to revolutionize AI by enhancing computational efficiency and model performance. The successful implementation and evaluation on a 2020 M1 MacBook Air underscore the practical benefits of this technique. Future research I will post here will explore further applications of quaternion algebra in AI and other computational fields.
References
1. Hamilton, W. R. (1844). On Quaternions. Proceedings of the Royal Irish Academy.
2. Voight, J. (2024). Quaternion Algebras. Springer. [Link](https://www.springer.com/us/book/9783030566920)
Cyber Security Leader & Advisor (Also technical infosec Janitor) @ SecurityScorecard
7 个月Nice work Derek! This looks very promising. Thank you for sharing.