How to create x, y (input, output) pairs for text generation models:
### 1. One-Step Ahead Character Prediction
- x: Sequence of characters.
- y: Next character in the sequence.
Advantages:
- Simple and easy to implement.
- Suitable for fine-grained character-level text generation.
- Can maintain coherence and grammar in generated text.
Disadvantages:
- Limited context: The model only considers local context, which may limit the capture of long-range dependencies.
- Slower training: for the same amount of text, character sequences are much longer than word sequences, so many more prediction steps are needed.
Example: Predicting the next character in a sentence like "The quick brown fox jumps over the lazy dog."
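Below is a minimal sketch (plain Python, no ML framework) of how such (x, y) pairs can be built with a sliding window; the window size of 10 is an arbitrary choice for illustration.

```python
# Build (x, y) pairs for one-step-ahead character prediction.
text = "The quick brown fox jumps over the lazy dog."
window = 10  # number of context characters fed to the model (illustrative choice)

pairs = []
for i in range(len(text) - window):
    x = text[i:i + window]   # sequence of characters (the context)
    y = text[i + window]     # the single next character (the target)
    pairs.append((x, y))

# First few training pairs:
for x, y in pairs[:3]:
    print(repr(x), "->", repr(y))
# 'The quick ' -> 'b'
# 'he quick b' -> 'r'
# 'e quick br' -> 'o'
```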
### 2. Sequence-to-Sequence Character Prediction
- x: Sequence of characters.
- y: Sequence of characters of the same length as x, where each position holds the character that follows the corresponding position in x (i.e., y is x shifted one character to the left).
Advantages:
- Similar to one-step-ahead prediction, but the model receives a target at every position, so each training sequence is used more efficiently and longer outputs can be generated.
- Can still maintain coherence and grammar in generated text.
Disadvantages:
- Same limitations as one-step ahead prediction regarding limited context and training speed.
Example: Given "Hello, how are you" as input, predict "ello, how are you?" as output (the target is the input shifted one character to the left).
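A minimal sketch of how these shifted targets can be built in Python follows; the character-to-integer encoding is an assumption for illustration, since most frameworks expect integer ids rather than raw characters.

```python
# Sequence-to-sequence character targets: y is x shifted one character left,
# so x and y have the same length and every position has a target.
text = "Hello, how are you?"

x = text[:-1]   # "Hello, how are you"
y = text[1:]    # "ello, how are you?"
assert len(x) == len(y)

# Optional: map characters to integer ids (illustrative vocabulary built from this text only).
vocab = sorted(set(text))
char_to_id = {c: i for i, c in enumerate(vocab)}
x_ids = [char_to_id[c] for c in x]
y_ids = [char_to_id[c] for c in y]

print(x)   # Hello, how are you
print(y)   # ello, how are you?
```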
### 3. Word-Level Text Generation
- x: Sequence of words.
- y: Next word in the sequence.
Advantages:
- Higher-level granularity: Generates text at the word level, which may capture more meaningful semantic units.
- Can handle larger context and potentially capture longer-range dependencies.
Disadvantages:
- Requires a word-level vocabulary, which can be challenging to build for specialized domains.
- May require more complex models to handle variable-length sequences.
Example: Predicting the next word in a sentence like "The quick brown fox."
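A minimal sketch of building (x, y) pairs for next-word prediction follows; the whitespace tokenizer and context length of 3 are simplifying assumptions, since real pipelines use proper tokenizers and much longer contexts.

```python
# Build (x, y) pairs for next-word prediction.
sentence = "The quick brown fox jumps over the lazy dog"
words = sentence.split()   # naive whitespace tokenization (illustrative)
context = 3                # number of words the model sees (illustrative)

pairs = []
for i in range(len(words) - context):
    x = words[i:i + context]   # sequence of words (the context)
    y = words[i + context]     # the single next word (the target)
    pairs.append((x, y))

for x, y in pairs[:3]:
    print(x, "->", y)
# ['The', 'quick', 'brown'] -> fox
# ['quick', 'brown', 'fox'] -> jumps
# ['brown', 'fox', 'jumps'] -> over
```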
### 4. Sequence-to-Sequence Word Prediction
- x: Sequence of words.
- y: Sequence of words of the same length as x, where each position holds the word that follows the corresponding position in x (i.e., y is x shifted one word to the left).
Advantages:
- Generates coherent sentences and paragraphs.
- Captures semantic relationships between words.
Disadvantages:
- Requires a word-level vocabulary.
- More complex than character-level models.
Example: Given "I like to play" as input, predict "like to play guitar" as output (the target is the input shifted one word to the left).
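A minimal sketch of these shifted word-level targets follows; the whitespace split is a simplification, since real pipelines use word or subword tokenizers with a fixed vocabulary.

```python
# Sequence-to-sequence word targets: y is x shifted one word to the left,
# giving the model a target word at every position.
words = "I like to play guitar".split()

x = words[:-1]   # ['I', 'like', 'to', 'play']
y = words[1:]    # ['like', 'to', 'play', 'guitar']
assert len(x) == len(y)

print(x)
print(y)
```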
### 5. Sentence-Level Text Generation
- x: Single sentence.
- y: Next sentence.
Advantages:
- Generates complete and coherent sentences, suitable for dialogue or story generation.
- Can capture high-level context and coherence.
Disadvantages:
- May require more complex models to maintain context across sentences.
- Training data must be split into sentences, which requires reliable sentence segmentation.
Example: Predicting the next sentence after "Once upon a time, there was a princess."
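A minimal sketch of pairing each sentence with the one that follows it is shown below; the naive split on ". " and the short example story are assumptions for illustration, and a real pipeline would use a proper sentence segmenter (e.g., from nltk or spaCy).

```python
# Build (x, y) pairs where x is a sentence and y is the sentence that follows it.
story = ("Once upon a time, there was a princess. "
         "She lived in a castle by the sea. "
         "One day she found a hidden door.")

# Naive sentence segmentation (illustrative only).
sentences = [s.strip() for s in story.split(". ") if s.strip()]

pairs = []
for current, nxt in zip(sentences, sentences[1:]):
    pairs.append((current, nxt))   # (x = current sentence, y = next sentence)

for x, y in pairs:
    print(x, "->", y)
```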