Quick read: Generative AI & Large Language Models (LLM) #4

Part 4: Latent States: Roles and Applications

Former articles in the series:

Part 1: Generative & Discriminative Models: Describe versus Decide

Part 2: Contemporary examples of Generative AI Models and their usage

Part 3: Embeddings: Roles and Applications


Figure 1


Latent states: the hidden representation of the accumulated knowledge


Figure 2: Schematic of the input sentence (X), the generated embedding (H), and the weight matrix (W).

Let's revisit Figure 2 from the previous article. By repeating the process used to produce our embedding layer, the H vector enables us to create further encodings, namely latent layers. These layers, combined with normalization and additional mathematical operations, yield highly non-linear "mixtures." With proper training, the latent layers capture the model's inner learning of the data's essential nature, stripping away noise and unnecessary information. They often have a lower dimensionality than the embedding input, acting as a bottleneck layer. In most cases, this latent-space representation is the true encapsulation of the knowledge accumulated within the model.


Semantic proximity and embeddings powered by the classic Skip-Gram model


The Skip-Gram model predicts context words (words that appear nearby in a sentence) given a target word. It learns to represent words in a high-dimensional vector space based on their semantic context and is used in tasks such as sentence completion or sentence translation. Skip-Gram is a classic algorithm and part of the Word2Vec framework developed by Mikolov et al. Word2Vec is the most popular word-embedding algorithm; it brings out the semantic similarity of words and captures different facets of a word's meaning within a sentence.

During training, the model processes each word of the sentence in turn as the target word and repeatedly generates context words for training.

For example, in the sentence "The cat sat on the mat," if the target word is "cat" and the context window covers one word on each side, the input is "cat" and the outputs are ["The", "sat"]. We prepare the training samples in the form (center word, candidate word, label), with label 1 for a true context word and 0 otherwise; see the example in Figure 3. We then train the appropriate matrices to minimize the overall error.


(Cat, sat, 1), (Cat, The, 1) ← wanted (true context)
vs.
(Cat, on, 0), (Cat, the, 0), (Cat, mat, 0) ← "false" (negative) content

Figure 3: Preparing word pairs with proper labels to train the model to predict the context of the word "cat".
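
For a concrete illustration (a minimal sketch in plain Python, not the author's code), the pair-preparation step of Figure 3 with negative sampling could look like this; the window size and the number of negative samples per center word are illustrative assumptions:

```python
import random

def make_training_pairs(tokens, window=1, num_negatives=3, seed=0):
    """Build (center, candidate, label) pairs: label 1 for true context words,
    label 0 for randomly sampled non-context ("negative") words."""
    rng = random.Random(seed)
    vocab = list(set(tokens))
    pairs = []
    for i, center in enumerate(tokens):
        context = {tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i}
        for word in context:
            pairs.append((center, word, 1))          # wanted (true context) pair
        negatives = [w for w in vocab if w not in context and w != center]
        for word in rng.sample(negatives, min(num_negatives, len(negatives))):
            pairs.append((center, word, 0))          # "false" (negative) pair
    return pairs

sentence = "The cat sat on the mat".split()
for pair in make_training_pairs(sentence):
    print(pair)   # for center "cat" this reproduces the pairs of Figure 3
```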


Key Characteristics of Generative Models:


  1. Multiple Outputs per Input: During training, the model generates several outputs for each input (center word).
  2. Output Structure: Each output is a combination of the center word, a context word, and the associated probability.
  3. Dynamic Pair Generation: For a vocabulary of size N, the model produces roughly N×N input-output pairs, so its dynamic nature ensures training on every possible word pair.
  4. Density Function: A density function built from the trained model performs effectively, since the model already accounts for the probability distribution.
  5. Sentence Completion: For tasks like sentence completion, additional steps are required to discriminate between potential outputs, typically selecting the word with the highest probability (see Part 1: Generative & Discriminative Models: Describe versus Decide).
  6. Computational Complexity: Maintaining approximately N×N probabilities reflects both the model's power and its main weakness: excessive computational demands. This challenge is particularly evident in transformer models.


In reality, multiple matrices must be learned to propagate and generate various outputs (see Figure 4). These matrices are refined iteratively: repeatedly feeding inputs, performing calculations, and adjusting coefficients until the error is minimized.
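
As a minimal sketch (with assumed, illustrative shapes and learning rate, not the article's actual implementation), the two weight matrices below stand in for the matrices of Figure 4; each (center, candidate, label) pair from Figure 3 is fed through a forward pass, and the resulting error is used to adjust both matrices iteratively:

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 7, 16                                  # vocabulary size, embedding dimension (illustrative)
W_in = rng.normal(scale=0.1, size=(V, D))     # center-word (input) embedding matrix
W_out = rng.normal(scale=0.1, size=(V, D))    # context-word (output) matrix
lr = 0.05

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pair(center_id, candidate_id, label):
    """One gradient step for a (center, candidate, label) pair (negative-sampling loss)."""
    h = W_in[center_id]                        # hidden/latent representation of the center word
    v = W_out[candidate_id]
    score = sigmoid(h @ v)                     # predicted probability that candidate is context
    error = score - label                      # gradient of the log loss w.r.t. the score
    W_out[candidate_id] -= lr * error * h      # adjust the output matrix
    W_in[center_id] -= lr * error * v          # adjust the input matrix
    return error

# repeatedly feed the labeled pairs until the overall error is minimized
err = train_pair(center_id=1, candidate_id=2, label=1)
```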


Figure 4




Path to Data Authenticity


Figure 5: Path to genuine and effective newly generated data; see text.


The generation of new, genuine data requires meeting several important criteria, as depicted in Figure 5. This data should span all possible states, avoiding gaps (who said sparse?). Achieving this allows us to construct distributions that produce new data truly reflecting the core meaning of the original dataset.

Effective Novelty

Naturally, intrinsic order exists in data. We often observe modes of states and varying degrees of clustering. To evaluate the usefulness of the generated latent space, Effective Novelty provides a straightforward metric.

Definition: Effective Novelty is a quantitative measurement ranging from 0 to 1. It represents the proportion of novel and unique entities generated out of the total existing and generated entities.
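
As a small illustration (my own sketch based on the definition above, with hypothetical entity names), Effective Novelty can be computed directly from the two collections of entities:

```python
def effective_novelty(existing, generated):
    """Effective Novelty: the proportion of novel and unique generated entities
    out of the total existing and generated entities (a value between 0 and 1)."""
    existing_set = set(existing)
    novel_unique = {g for g in generated if g not in existing_set}
    total = len(existing) + len(generated)
    return len(novel_unique) / total if total else 0.0

# Toy example: two of the four generated entities are both new and unique
existing = ["mol_A", "mol_B", "mol_C"]
generated = ["mol_A", "mol_D", "mol_D", "mol_E"]
print(effective_novelty(existing, generated))   # 2 / 7 ≈ 0.29
```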


Posterior Collapse

The Posterior Collapse phenomenon occurs when latent variables become uninformative. In such cases, the generative model disregards the latent space and relies solely on input data to reconstruct the output. This is a common issue in Variational Autoencoders (VAEs).
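
One practical way to detect the problem (a sketch under my own assumptions, not taken from the article) is to monitor the KL term of the VAE objective per latent dimension during training: dimensions whose KL divergence stays close to zero are being ignored by the decoder, which is the signature of posterior collapse.

```python
import numpy as np

def kl_per_dimension(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ) for each latent dimension, averaged over the batch.
    Dimensions whose KL stays near zero carry no information: posterior collapse."""
    kl = 0.5 * (np.exp(log_var) + mu**2 - 1.0 - log_var)   # shape: (batch, latent_dim)
    return kl.mean(axis=0)

# Toy encoder outputs: the first two latent dimensions are informative, the last two collapsed
mu = np.array([[1.2, -0.8, 0.01, 0.0],
               [0.9,  1.1, -0.02, 0.0]])
log_var = np.array([[-1.0, -0.5, 0.0, 0.0],
                    [-0.8, -0.7, 0.0, 0.0]])
print(kl_per_dimension(mu, log_var))   # large values = informative, ~0 = collapsed
```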




Figure 6: An effective generative model innovates by adding new latent states to the input data (green circles) based on the learned density function (the blue line), which reflects an accurate representation of the "real world."

Effective learning introduces evenly spaced (second row, right) or tightly spaced (second row, left) latent states (yellow plus signs) and removes redundant states when necessary. In contrast, a non-innovative model fails by introducing poorly spaced latent states or by neglecting to add meaningful ones (third row).


Two Conflicting Phenomena:

  1. High Clustering + Effective Novelty: Imagine a library where books are grouped by subject (clusters). Within each subject, there is a wide variety of unique titles, and each book serves a specific purpose.
  2. Mode Collapse: Imagine a library with mostly empty shelves. Of the few available books, many are duplicates of the same group, representing a lack of variety.


Figure 6 illustrates the conflicting modes. Table 1 summarizes the differences between these behaviors across various characteristics.

Clearly, High Clustering + Effective Novelty is preferable to Mode Collapse for achieving meaningful and diverse data generation.


Table 1: Adequate generative latent space


Usage Example 1: Leveraging a Self-Organizing Map (SOM) combined with a Convolutional Autoencoder to characterize current and future users by persona and behavioral traits.


Figure 7(a): A trained Self-Organizing Map (SOM) identified clusters, including one representing patients at high risk of quitting (low retention). The density of states among existing patients (the numerical data) enabled us to generate synthetic patient profiles on demand and to predict in advance how future users might behave.


Figure 7(b): An example of the reward system we implemented, based on patient personas as well as their past and predicted future activities. This system was designed to reward and motivate users toward desired health indicators, ultimately achieving sustainable patient reconditioning and health benefits.


We aimed to characterize different user groups, identifying individuals at medium to high risk of developing diabetes in the coming years. Since these users carried their mobile phones, we could measure their physical activity levels. This data served as the input for a convolutional autoencoder.

We utilized transfer learning by initializing convolutional filters using the classic high-pass, low-pass, and band-pass filter families, rather than starting with random initial values. The bottleneck layers consisted of several deep latent states. These latent states were subsequently mapped to a secondary network, a semi-continuous neural network known as a Self-Organizing Map (SOM). After further training using appropriate metrics (such as the U-matrix) and information propagation techniques, we successfully identified clusters (Figure 7(a)). These clusters were interpreted using phenotypic data, including user age, gender, fasting blood glucose levels, and Glycosylated Hemoglobin (Hemoglobin A1c) test results.
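
For illustration only (not the original system), here is a minimal sketch of the second stage using the open-source MiniSom library; the latent vectors, map size, and training parameters are assumptions standing in for the real pipeline:

```python
import numpy as np
from minisom import MiniSom   # pip install minisom

# latent_vectors: bottleneck activations from the convolutional autoencoder
# (random stand-ins here, with an assumed latent dimension of 16)
rng = np.random.default_rng(0)
latent_vectors = rng.normal(size=(500, 16))

som = MiniSom(x=10, y=10, input_len=16, sigma=1.5, learning_rate=0.5, random_seed=0)
som.random_weights_init(latent_vectors)
som.train_random(latent_vectors, num_iteration=5000)

u_matrix = som.distance_map()                              # U-matrix: high values mark cluster boundaries
winning_nodes = [som.winner(v) for v in latent_vectors]    # map each patient's latent vector to a SOM cell
```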

Our work achieved the following:

  1. Tracking patient behavior over time: Monitoring responsiveness, consistency, and engagement.
  2. Classifying patients by persona type: Grouping users based on shared traits and characteristics.
  3. Predicting patient retention risk: Identifying individuals at risk of disengagement.
  4. Implementing preventive measures: Designing interventions for patients at risk of low retention.
  5. Personalizing responses and rewards: Using reinforcement learning to motivate patients toward improved health outcomes, achieving sustainable patient reconditioning (Figure 7(b)).
  6. Tracking behavioral changes over time: Enabling personal alerts or physician intervention when necessary.
  7. Generating plausible profiles of future users: Simulating new personas with varying traits for future analysis.


It is important to note that all of the above was achieved before the advent of the LLM (Large Language Model) revolution. Although generative models existed, they were far less mature and recognized than today. Overcoming various constraints required creativity, leveraging numerous techniques and combinations of generative models, deep learning, machine learning, reinforcement learning, and classic signal processing methods.


Usage Example 2: Using a generative model to create virtual molecules


Figure 8: MolMIM latent-space perturbations around a seed molecule; see text.



Enter MolMIM, a novel probabilistic autoencoder model designed for generating small molecules with desired pharmacokinetic (PK) and pharmacodynamic (PD) properties. MolMIM leverages Mutual Information Machine (MIM) learning to create a clustered latent space, enabling efficient sampling of valid, unique, and novel molecules. The model outperforms existing methods in both single- and multi-objective property optimization tasks, such as balancing solubility, permeability, and receptor binding affinity, using a simple evolutionary search algorithm. Its success is attributed to the inherent structure of its learned latent space, which naturally clusters molecules with similar PK/PD profiles.

Figure 8 illustrates this process: small perturbations over the initial molecule (8a & 8b) result in minor modifications to molecular structure, while larger perturbations sample from more distant regions of the latent distribution, yielding significantly altered molecules (8c & 8d). Structural differences are highlighted in red, while similarities are marked in green (see 8e).
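
To make the latent-perturbation idea concrete, here is a schematic sketch (my own illustration; the encode/decode calls are hypothetical placeholders, not the actual MolMIM API): adding small Gaussian noise to a molecule's latent vector produces close analogues, while larger noise samples more distant regions of the latent distribution and yields more heavily modified molecules.

```python
import numpy as np

def perturb_latent(z, scale, n_samples=5, seed=0):
    """Sample new latent points around z; a larger `scale` explores more distant
    regions of the latent space and yields more heavily modified molecules."""
    rng = np.random.default_rng(seed)
    return z + rng.normal(scale=scale, size=(n_samples, z.shape[-1]))

# Hypothetical placeholders for the model's encoder/decoder (not the real MolMIM API):
# z_seed = encode(seed_smiles)                              # latent vector of the starting molecule
# analogues = decode(perturb_latent(z_seed, scale=0.1))     # minor modifications (Fig. 8a & 8b)
# variants  = decode(perturb_latent(z_seed, scale=1.0))     # significantly altered molecules (Fig. 8c & 8d)
z_seed = np.zeros(64)                                       # illustrative 64-dimensional latent vector
print(perturb_latent(z_seed, scale=0.1).shape)              # (5, 64)
```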

Having a generative tool like MolMIM provides an expedited path to experimental assays, facilitating the design of lead compounds with optimized pharmacokinetic (PK) and pharmacodynamic (PD) profiles, thereby accelerating drug discovery and development.

Word of caution: Are we certain that we have achieved only the desired qualities? This, of course, requires thorough verification—an aspect I will address in a future article.


This article explores generative AI models, focusing on latent states as crucial components for capturing accumulated knowledge. It explains how these models, like Skip-Gram, learn semantic relationships and generate new data by creating a latent-space representation. Key concepts such as effective novelty and posterior collapse are discussed, and practical applications are illustrated through examples of predicting user behavior with Self-Organizing Maps and generating novel molecules with MolMIM, emphasizing the potential and limitations of generative models in various fields.


Stay tuned for the next article: Transformer - High Level View

