Updated 4 months ago

I have a text file and want embeddings for each line. Which document loader and splitter should I use?

I have a text file, and I want embeddings for each line. Which document loader and splitter should I use?
Please stand by @FairlyAverage as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
To load a text file and generate embeddings with LlamaIndex, you can use SimpleDirectoryReader to load the file and SentenceSplitter to split the text into chunks. The embedding model is then called once per chunk.

Here is an example of how you can do it:

from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

# Load the text file
documents = SimpleDirectoryReader(input_files=["your_text_file.txt"]).load_data()

# Define the sentence splitter
splitter = SentenceSplitter(chunk_size=512)

# Define the embedding model
embed_model = OpenAIEmbedding()

# Generate an embedding for each chunk (node)
for document in documents:
    # get_nodes_from_documents expects a list of documents
    nodes = splitter.get_nodes_from_documents([document])
    for node in nodes:
        node_embedding = embed_model.get_text_embedding(node.get_content(metadata_mode="all"))
        node.embedding = node_embedding


Please replace "your_text_file.txt" with the path to your text file. Also, make sure to set your OpenAI API key before running the code.

import os
os.environ["OPENAI_API_KEY"] = "your_openai_api_key"


Replace "your_openai_api_key" with your actual OpenAI API key.

Please note that the SentenceSplitter splits the text into chunks of a specified size (512 in this case), not necessarily by lines. If you want to split the text strictly by lines, you might need to implement a custom node parser.
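If strict one-embedding-per-line behavior is what you need, a rough sketch of the idea is shown here. It is not a LlamaIndex node parser: split_into_lines, embed_lines, and toy_embedding are hypothetical names made up for illustration, and toy_embedding is only a stand-in so the snippet runs without an API key.

```python
from typing import Callable, List, Tuple


def split_into_lines(text: str) -> List[str]:
    """Return the non-empty lines of `text`, with surrounding whitespace removed."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def embed_lines(
    lines: List[str],
    embed_fn: Callable[[str], List[float]],
) -> List[Tuple[str, List[float]]]:
    """Pair each line with the vector returned by `embed_fn` for that line."""
    return [(line, embed_fn(line)) for line in lines]


def toy_embedding(text: str) -> List[float]:
    """Hypothetical stand-in for a real embedding model (NOT a real embedding)."""
    return [float(len(text)), float(text.count(" "))]


# Blank and whitespace-only lines are skipped, so each remaining line gets one vector.
sample_text = "first line\n\nsecond line here\n   \nthird\n"
lines = split_into_lines(sample_text)
pairs = embed_lines(lines, toy_embedding)

for line, vector in pairs:
    print(line, vector)
```

With the code from the reply above, you could presumably pass embed_model.get_text_embedding in place of toy_embedding, reading the file contents yourself or from each loaded document's text. Note that this makes one embedding call per line, which may be slow for large files.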

Sources:
  • Semantic Chunking
  • Relative Score Fusion and Distribution-Based Score Fusion
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
and if I have a target sentence to search for, how would I query and see the similarity?