----------------------
Semantic Embedding and Semantic Chunking are two related but distinct concepts in LlamaIndex.
Semantic Embedding refers to the process of converting text into numerical vectors that capture its semantic meaning. This is typically done with machine learning models trained on large text corpora. In LlamaIndex, the `OpenAIEmbedding` class is used for semantic embedding (source).
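For illustration, here is a minimal sketch of embedding a single string with `OpenAIEmbedding` (assuming an OpenAI API key is configured in the environment; the sample text is just a placeholder):

```python
from llama_index.embeddings.openai import OpenAIEmbedding

# Embedding model; reads the OpenAI API key from the environment.
embed_model = OpenAIEmbedding()

# Convert text into a vector of floats that captures its semantic meaning.
vector = embed_model.get_text_embedding("LlamaIndex connects LLMs to your data.")
print(len(vector))  # dimensionality of the embedding vector
```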
Semantic Chunking, on the other hand, is a technique for splitting text into chunks in a way that preserves semantic coherence within each chunk. Instead of splitting text into fixed-size chunks, the Semantic Chunker in LlamaIndex adaptively picks breakpoints between sentences using embedding similarity. This ensures that a "chunk" contains sentences that are semantically related to each other (source).
Here is an example of how to use the Semantic Chunker in LlamaIndex:
```python
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding

# Embedding model used to compare adjacent groups of sentences
embed_model = OpenAIEmbedding()

# Split wherever the embedding dissimilarity exceeds the 95th percentile
splitter = SemanticSplitterNodeParser(
    buffer_size=1, breakpoint_percentile_threshold=95, embed_model=embed_model
)
```
(source)
In this code, `SemanticSplitterNodeParser` is the class that implements semantic chunking, and it uses `OpenAIEmbedding` for semantic embedding to determine where to split the text.
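To actually produce chunks, the configured splitter can be run over documents. A minimal sketch (the sample `Document` text below is just a placeholder):

```python
from llama_index.core import Document

# Placeholder input; in practice documents would come from a loader such as SimpleDirectoryReader.
documents = [
    Document(text="LlamaIndex is a data framework. It connects LLMs to your data.")
]

# Each resulting node is a semantically coherent chunk of the input text.
nodes = splitter.get_nodes_from_documents(documents)
for node in nodes:
    print(node.get_content())
```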
----------------------