Anomaly Detection with VAE
Anomaly detection is a machine learning technique used to identify patterns that are considered unusual or out of the ordinary. Think of it as the machine learning equivalent of that one friend in your group who always notices when something is off.
In machine learning, we train an algorithm to recognize what normal behavior looks like, and when it detects something that doesn't fit the norm, it raises a flag. It's like having a bouncer at a party who kicks out anyone who's behaving in a suspicious or unusual way.
And just like that bouncer, sometimes the machine learning algorithm can be a bit overzealous and kick out someone who was just having a bit too much fun. So, it's important to tweak the settings to make sure it's not flagging too many false positives.
But hey, it's better to have an overzealous bouncer than no bouncer at all, right?
There are many machine learning models that can play the bouncer we are looking for, but autoencoders, and specifically variational autoencoders (VAEs), stand out because they automatically learn the general structure of the training data and isolate only its discriminative features, i.e., the latent vector. The latent vector acts as an information bottleneck that forces the model to be very selective about what to encode.
During training, an encoder produces the latent vector, and a decoder reconstructs the original data from that latent vector as faithfully as possible. Since the model only learns to reconstruct data that resembles its training set, samples it reconstructs poorly are likely outliers; by measuring the reconstruction error, we can tell which samples those are.
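As a minimal sketch of that idea (the helper name anomaly_scores and the choice of mean squared error are illustrative, not from a specific library), scoring boils down to comparing each sample with its reconstruction:

import numpy as np

def anomaly_scores(model, x, threshold):
    # reconstruct the inputs with any trained autoencoder-style
    # Keras model whose output has the same shape as its input
    x_hat = model.predict(x)
    # per-sample mean squared reconstruction error
    errors = np.mean(np.square(x - x_hat), axis=-1)
    # samples whose error exceeds the threshold are flagged
    return errors, errors > threshold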
A plain autoencoder (AE) learns to generate a latent vector that the decoder can reproduce. A VAE, however, learns to generate two vectors that represent the parameters (mean and log-variance) of a distribution from which the latent vector is sampled. In other words, the VAE's learning task is to learn a function that outputs the parameters of a distribution from which a latent vector the decoder can reproduce can be sampled.
Below is example code for setting up a VAE model with the Keras functional API:
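One piece the snippets below rely on but never define is the sampling function used by the encoder's Lambda layer. A standard reparameterization-trick implementation, along the lines of the well-known Keras VAE example (an assumption here, since the original snippet omits it), looks like this:

from tensorflow.keras import backend as K

def sampling(args):
    # reparameterization trick: z = mean + sigma * epsilon,
    # with epsilon drawn from a standard normal distribution
    latent_mean, latent_log_var = args
    batch = K.shape(latent_mean)[0]
    dim = K.int_shape(latent_mean)[1]
    epsilon = K.random_normal(shape=(batch, dim))
    return latent_mean + K.exp(0.5 * latent_log_var) * epsilon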
Encoder -
from tensorflow.keras.layers import Input, Dense, Lambda
from tensorflow.keras.models import Model

# hyperparameters (example values)
original_dim = 784
intermediate_dim = 64
latent_dim = 2

# encoder model
inputs = Input(shape=(original_dim,), name='encoder_input')
hidden_encode = Dense(intermediate_dim, activation='relu')(inputs)
latent_mean = Dense(latent_dim, name='z_mean')(hidden_encode)
latent_log_var = Dense(latent_dim, name='z_log_var')(hidden_encode)

# sampling
latent = Lambda(sampling, output_shape=(latent_dim,), name='z')([latent_mean, latent_log_var])

# instantiate encoder model
encoder = Model(inputs, [latent_mean, latent_log_var, latent], name='encoder')
encoder.summary()
Decoder -
# decoder model
latent_inputs = Input(shape=(latent_dim,), name='z_sampling')
hidden_decode = Dense(intermediate_dim, activation='relu')(latent_inputs)
outputs = Dense(original_dim, activation='sigmoid')(hidden_decode)

# instantiate decoder model
decoder = Model(latent_inputs, outputs, name='decoder')
decoder.summary()
VAE Loss -
To achieve such a latent vector, VAEs use a loss function with two components:
Reconstruction loss component - penalizes the difference between the input and its reconstruction, forcing the encoder to generate latent features from which the decoder can faithfully rebuild the input.
KL loss component - forces the distribution generated by the encoder to stay close to the prior over the latent space, typically a standard normal distribution.
This results in a heavily regularized encoder, which in turn produces a more continuous and smoother latent space.
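Written out, the objective the code below implements is the standard VAE loss (the negative evidence lower bound), where the closed form of the KL term holds for a Gaussian encoder and a standard normal prior:

$$
\mathcal{L} = \mathbb{E}_{q(z \mid x)}\big[-\log p(x \mid z)\big] + D_{\mathrm{KL}}\big(q(z \mid x)\,\|\,\mathcal{N}(0, I)\big)
$$

$$
D_{\mathrm{KL}} = -\tfrac{1}{2} \sum_{j=1}^{d} \big(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\big)
$$

Here $\mu$ and $\log\sigma^2$ are the latent_mean and latent_log_var vectors produced by the encoder.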
# VAE loss
from tensorflow.keras.losses import binary_crossentropy
from tensorflow.keras import backend as K

# wire encoder and decoder end to end; index 2 of the encoder's
# outputs is the sampled latent vector z
outputs = decoder(encoder(inputs)[2])

reconstruction_loss = binary_crossentropy(inputs, outputs)
reconstruction_loss *= original_dim
kl_loss = 1 + latent_log_var - K.square(latent_mean) - K.exp(latent_log_var)
kl_loss = K.sum(kl_loss, axis=-1)
kl_loss *= -0.5  # closed-form KL divergence carries a factor of -1/2
vae_loss = K.mean(reconstruction_loss + kl_loss)
VAE Model -
# instantiate VAE model from the end-to-end tensors wired up above
vae = Model(inputs, outputs, name='vae')
vae.add_loss(vae_loss)
vae.compile(optimizer='adam')
vae.summary()
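Finally, putting the bouncer to work: a usage sketch (the dataset names x_train/x_test and the 95th-percentile cutoff are illustrative assumptions, not from the original post) that trains on mostly-normal data and flags poorly reconstructed samples:

import numpy as np

# train on data that is assumed to be mostly normal;
# no targets are needed since the loss was attached via add_loss
vae.fit(x_train, epochs=50, batch_size=128)

# pick a threshold from the reconstruction errors on the training set
train_errors = np.mean(np.square(x_train - vae.predict(x_train)), axis=-1)
threshold = np.percentile(train_errors, 95)

# score new samples: anything reconstructed badly is flagged
test_errors = np.mean(np.square(x_test - vae.predict(x_test)), axis=-1)
anomalies = x_test[test_errors > threshold]

The percentile cutoff is exactly the "bouncer setting" mentioned earlier: raise it and fewer guests get thrown out, lower it and the bouncer gets more aggressive.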
Happy reading!