Hi everyone, I'm new to LlamaIndex. I'm trying to run the IngestionPipeline code here: https://docs.llamaindex.ai/en/stable/module_guides/loading/ingestion_pipeline/. The only thing I changed is swapping the OpenAIEmbedding model for the HuggingFaceEmbedding model. However, the index instantiation failed (empty index). I checked the nodes in the vector store with vector_store.get_nodes() and the nodes' embeddings are all None. What could be the issue? I tried searching for this on GitHub and Google but no luck so far, any help is appreciated!
What's the actual code that you ended up running?
This is the code:

Plain Text
from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.extractors import TitleExtractor
from llama_index.core.ingestion import IngestionPipeline
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama

import qdrant_client

model_name = 'BAAI/bge-base-en-v1.5'
embed_model = HuggingFaceEmbedding(
    model_name=model_name, trust_remote_code=True)

documents = SimpleDirectoryReader("data").load_data()

Settings.embed_model = embed_model
Settings.llm = Ollama(model="llama3.1", request_timeout=360.0)

client = qdrant_client.QdrantClient(location=":memory:")
vector_store = QdrantVectorStore(client=client, collection_name="test_store")

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=25, chunk_overlap=0),
        # TitleExtractor(),
        embed_model,
    ],
    vector_store=vector_store,
)

# Ingest directly into a vector db
pipeline.run(documents=documents)

nodes = vector_store.get_nodes()

# Create your index (VectorStoreIndex is already imported above)
index = VectorStoreIndex.from_vector_store(vector_store)
Seems to work fine for me
https://colab.research.google.com/drive/1eftieUGM2jiL7If13HtGoaBCOze5D14w?usp=sharing

The index is definitely not empty, and retrieval retrieves nodes just fine
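For example, a quick sanity check is to run a retrieval against the index (a minimal sketch, assuming the index, Settings, and vector_store from the snippet above; the query string is just a placeholder):

Plain Text
# Sketch: verify the index actually returns nodes for a query
retriever = index.as_retriever(similarity_top_k=3)
results = retriever.retrieve("what is this document about?")  # placeholder query
for r in results:
    # each result is a NodeWithScore: similarity score plus the retrieved node
    print(r.score, r.node.get_content()[:80])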
Thanks @Logan M! Is there a way I can retrieve all the document embeddings from the index?
I think you'd have to use the underlying qdrant client
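For instance, something along these lines should work with the raw client (a sketch only, assuming the in-memory client and the "test_store" collection from the snippet above; the page size of 100 is arbitrary):

Plain Text
# Sketch: read stored points, including their embeddings, via the qdrant client
points, next_offset = client.scroll(
    collection_name="test_store",
    limit=100,           # page size; keep scrolling with next_offset for more points
    with_payload=True,   # node text/metadata stored by LlamaIndex
    with_vectors=True,   # include the embedding vector for each point
)
for point in points:
    # point.vector holds the embedding (a dict of named vectors in hybrid setups)
    print(point.id, len(point.vector))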
Thank you! I'll look into it!