• GenAI – Byte Pair Encoding

    GenAI – Byte Pair Encoding Tokenization. Table Of Contents: What Is Byte Pair Encoding. Intuition Behind Byte Pair Encoding. Why Use BPE Tokenization. Example Of BPE Tokenization. Benefits Of BPE Tokenization. Tools Used For BPE Tokenization. (1) What Is Byte Pair Encoding? (2) Simple Example Of Byte Pair Encoding. (3) Intuition Behind Byte Pair Encoding. (3) Why Use Byte Pair Encoding? (4) How Byte Pair Encoding Works? Step-by-Step Example of BPE: (5) How To Decide The Merge Rule In Byte Pair Encoding? (6) Limitation Of BPE.

  • GenAI – Character Level Tokenization

    GenAI – Character Level Tokenization. Table Of Contents: What Is Character Level Tokenization? Why Use Character Level Tokenization? How It Works Step By Step? Advantages Of Character Level Tokenization. Limitation Of Character Level Tokenization. Model Used For Character Level Tokenization. (1) What Is Character Level Tokenization? (2) Advantages Of Character Level Tokenization. (3) Disadvantages Of Character Level Tokenization. (4) How does a character-level tokenizer, which breaks text into individual characters, still manage to form meaningful words or understand language? (5) How Character Level Tokenization Works?

  • GenAI – Word Level Tokenization

    GenAI – Word Level Tokenization. Table Of Contents: What Is Word Level Tokenization? How words are separated (spaces, punctuation). Handling punctuation and special characters. Case sensitivity (e.g., “Apple” vs. “apple”). Handling contractions (e.g., “don’t” → “do” + “n’t”). Handling hyphenated words (e.g., “state-of-the-art”). Handling numbers, URLs, email addresses, emojis. Applications Of Word Level Tokenization. Limitations Of Word Level Tokenization. Tools for Word Tokenization. (1) What Is Word Level Tokenization? (2) How Words Are Separated? (3) Handling Punctuation and Special Characters. (4) Handling Case Sensitivity (e.g., “Apple” vs. “apple”). (5) Handling contractions (e.g., “don’t” → “do” + “n’t”). (6)

  • GenAI – What Are Tokens In LLM?

    GenAI – Tokens In LLM. Table Of Contents: What Are Tokens In LLM? Definition Of Tokens. Examples Of Tokens. How Tokenization Works? Why Tokens Matter? Tokens Vs Characters Vs Words. Tokenization Techniques. Tokenization Tools. (1) What Is Tokenization? (2) Definition Of Tokens. (3) Examples Of Tokens. (4) Why Tokens Matter? (5) Tokens Vs Characters Vs Words. (6) What Is A Token ID & How Is The LLM Going To Use It? (7) For The Same Word, Will I Get The Same Token ID? (7) Different Tokenization Techniques. Word Level Tokenization. Character Level Tokenization. Sub-word Tokenization (Most Popular). Byte Pair

  • GenAI – Recursive Chunking

    GenAI – Recursive Chunking. Table Of Contents: What Is Recursive Chunking? How Recursive Chunking Works? When To Use Recursive Chunking? When Not To Use Recursive Chunking? Advantages Of Recursive Chunking. Disadvantages Of Recursive Chunking. Examples Of Recursive Chunking. (1) What Is Recursive Chunking? (2) How Recursive Chunking Works? (3) When To Use Recursive Chunking? (4) When Not To Use Recursive Chunking? (5) Advantages Of Recursive Chunking. (6) Disadvantages Of Recursive Chunking. (7) Examples Of Recursive Chunking.

  • GenAI – Delimiter Based Chunking

    GenAI – Delimiter Based Chunking. Table Of Contents: What Is Delimiter Based Chunking? Examples Of Delimiters. When To Use Delimiter Based Chunking? When Not To Use Delimiter Based Chunking? Advantages Of Delimiter Based Chunking. Disadvantages Of Delimiter Based Chunking. Examples Of Delimiter Based Chunking. (1) What Is Delimiter Based Chunking? (2) Examples Of Delimiters. (3) When To Use Delimiter Based Chunking? (4) When Not To Use Delimiter Based Chunking? (5) Advantages Of Delimiter Based Chunking. (6) Disadvantages Of Delimiter Based Chunking. (7) Examples Of Delimiter Based Chunking. Example 1: Chunking Markdown

  • GenAI – Sentence Based Chunking

    GenAI – Sentence Based Chunking. Table Of Contents: What Is Sentence Based Chunking? When To Use Sentence Based Chunking? Advantages Of Sentence Based Chunking. Disadvantages Of Sentence Based Chunking. Examples Of Sentence Based Chunking. (1) What Is Sentence Based Chunking? (2) When To Use Sentence Based Chunking? (3) When Not To Use Sentence Based Chunking? (4) Advantages Of Sentence Based Chunking. (5) Disadvantages Of Sentence Based Chunking. (6) Examples Of Sentence Based Chunking. Example 1: Using Sentence Chunks with Transformers: from transformers import GPT2Tokenizer; tokenizer = GPT2Tokenizer.from_pretrained("gpt2"); sentences = [ "This is

  • GenAI – Semantic Based Chunking

    GenAI – Semantic Chunking. Table Of Contents: What Is Semantic Chunking? When To Use Semantic Chunking? Advantages Of Semantic Chunking. Disadvantages Of Semantic Chunking. Examples Of Semantic Chunking. Techniques Available For Semantic Chunking. (1) What Is Semantic Chunking? (2) When To Use Semantic Chunking? (3) How Semantic Chunking Works? (4) Tools & Libraries for Semantic Chunking. (5) Advantages Of Semantic Chunking. (6) Disadvantages Of Semantic Chunking. (6) Techniques Available For Semantic Chunking. (7) Examples Of Semantic Chunking. Example 1: Using LangChain’s RecursiveCharacterTextSplitter. This splits the text based on semantic boundaries like paragraphs, sentences, etc. from

  • GenAI – Sliding Window Chunking

    GenAI – Sliding Window Chunking. Table Of Contents: What Is Sliding Window Chunking? When To Use Sliding Window Chunking? Advantages Of Sliding Window Chunking. Disadvantages Of Sliding Window Chunking. Examples Of Sliding Window Chunking. (1) What Is Sliding Window Chunking? (2) When To Use Sliding Window Chunking? (3) Advantages Of Sliding Window Chunking. (4) Disadvantages Of Sliding Window Chunking. (5) When Not To Use Sliding Window Chunking? (6) Examples Of Sliding Window Chunking.

  • GenAI – Fixed Size Chunking

    GenAI – Fixed Size Chunking. Table Of Contents: What Is Fixed Size Chunking? When To Use Fixed Size Chunking? Advantages Of Fixed Size Chunking. Disadvantages Of Fixed Size Chunking. Examples Of Fixed Size Chunking. (1) What Is Fixed Size Chunking? (2) When To Use Fixed Size Chunking? (3) Advantages Of Fixed Size Chunking. (4) Disadvantages Of Fixed Size Chunking. (5) Examples Of Fixed Size Chunking. Example 1: Document Embedding for Semantic Search. Use case: Splitting a long article into 512-token chunks for embedding in a vector database. from transformers import GPT2Tokenizer; tokenizer = GPT2Tokenizer.from_pretrained("gpt2"); text
