Updated 3 months ago

I'm trying to run an IngestionPipeline, but I seem to be missing something, as the data does not show up in the vector store:

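```python
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=100, chunk_overlap=10),
        # TitleExtractor(),  # needs an LLM
    ],
    vector_store=vector_store,
)

pipeline.run(documents=docs)
```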
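To make the `chunk_size=100, chunk_overlap=10` settings in the snippet above concrete, here is a simplified sketch of fixed-size windowing with overlap. It is only an approximation under stated assumptions: the hypothetical `chunk_words` helper counts whitespace-separated words, whereas LlamaIndex's `SentenceSplitter` counts tokens and tries to keep sentences intact, so real chunk boundaries will differ.

```python
def chunk_words(text: str, chunk_size: int = 100, chunk_overlap: int = 10) -> list[str]:
    """Toy word-window chunker (hypothetical, NOT SentenceSplitter's logic):
    each chunk holds up to chunk_size words and repeats the last
    chunk_overlap words of the previous chunk."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    text = " ".join(f"w{i}" for i in range(250))
    chunks = chunk_words(text)
    print(len(chunks))           # 3
    print(chunks[1].split()[0])  # w90
```

With 250 words the window advances by 90 words each time, so the second chunk starts at `w90` and repeats the last 10 words of the first chunk.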
The only thing missing here is the embedding model. Have you defined it with Settings?
Are you getting any error?
I added the embedding model to the vector_store and thought that was it.
No error; only at the very end does it tell me the server closed the connection, but when I set a breakpoint just before exit, there is no error.
Oh, you're right, I was wrong: the embedding model really is missing 😄
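Why no error? As far as I understand LlamaIndex's `IngestionPipeline`, when a `vector_store` is attached it only writes nodes that actually carry an embedding, so a pipeline whose transformations contain no embedding step produces nodes but silently stores none. The toy model below illustrates that assumption in plain Python; `Node`, `split_sentences`, `fake_embed` and `run_pipeline` are hypothetical stand-ins, not LlamaIndex's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    # hypothetical stand-in for a LlamaIndex node
    text: str
    embedding: Optional[List[float]] = None


Transform = Callable[[List[Node]], List[Node]]


def split_sentences(nodes: List[Node]) -> List[Node]:
    # hypothetical splitter: one new node per non-empty sentence
    out: List[Node] = []
    for n in nodes:
        out.extend(Node(s.strip()) for s in n.text.split(".") if s.strip())
    return out


def fake_embed(nodes: List[Node]) -> List[Node]:
    # hypothetical embedding step: a 2-d "vector" of character and word counts
    for n in nodes:
        n.embedding = [float(len(n.text)), float(len(n.text.split()))]
    return nodes


def run_pipeline(docs: List[Node], transforms: List[Transform], store: List[Node]) -> List[Node]:
    # apply the transformations in order, like a pipeline's transformations list
    nodes = docs
    for t in transforms:
        nodes = t(nodes)
    # toy assumption: only nodes that carry an embedding are written to the store;
    # the others are dropped without raising an error
    store.extend(n for n in nodes if n.embedding is not None)
    return nodes


if __name__ == "__main__":
    store: List[Node] = []
    nodes = run_pipeline([Node("Hello world. Second sentence.")], [split_sentences], store)
    print(len(nodes), len(store))  # 2 0

    store = []
    run_pipeline([Node("Hello world. Second sentence.")], [split_sentences, fake_embed], store)
    print(len(store))  # 2
```

Putting the embedding step last mirrors the fix in this thread: the splitter produces the nodes, and the embedding model attaches vectors to them before they reach the store.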
OK, I added the embedding model. The process now takes much longer, but there is still no collection in the store at the end. When I create an index with the code below, the index also looks empty when I inspect it. Something is odd.
```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

reader = SimpleDirectoryReader(input_dir=subdir, recursive=True)

# exclude file_path from the metadata seen by the LLM and the embedding model; it's not relevant for this use case
# also skip documents with empty text, since trying to embed them would result in an error
docs = []
for file_docs in reader.iter_data():  # iter_data() yields one list of Documents per file
    for doc in file_docs:
        doc.excluded_llm_metadata_keys.append("file_path")
        doc.excluded_embed_metadata_keys.append("file_path")
        if doc.text != "":
            docs.append(doc)

# create the embedding model once so the pipeline and the index use the same one
embed_model = HuggingFaceEmbedding(model_name=embedding_models[model]["path"])

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=100, chunk_overlap=10),
        # TitleExtractor(),  # needs an LLM
        embed_model,
    ],
    vector_store=vector_store,
)

pipeline.run(documents=docs)

index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store, embed_model=embed_model, show_progress=True
)
```
When I then look at what's inside the index, I don't find any data, which makes me think I'm still missing something 🙂
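The document-preparation loop in the message above can be modelled in plain Python to show two points. First, `iter_data()` yields one list of documents per file (a PDF typically gives one document per page), so looping over every document in each list, rather than taking only `doc[0]`, keeps all pages. Second, adding `file_path` to `excluded_embed_metadata_keys` removes it from the text the embedding model sees. `Doc`, `embed_text` and `prepare_docs` are hypothetical names for this sketch, and the metadata formatting only approximates LlamaIndex's default.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Doc:
    # hypothetical stand-in for a LlamaIndex Document
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)
    excluded_embed_metadata_keys: List[str] = field(default_factory=list)

    def embed_text(self) -> str:
        # approximation of what an embedding model sees: non-excluded metadata, then the text
        meta = "\n".join(
            f"{k}: {v}" for k, v in self.metadata.items()
            if k not in self.excluded_embed_metadata_keys
        )
        return f"{meta}\n\n{self.text}" if meta else self.text


def prepare_docs(per_file: Iterable[List[Doc]]) -> List[Doc]:
    # per_file mimics iter_data(): one list of docs per file (e.g. one per PDF page)
    docs: List[Doc] = []
    for file_docs in per_file:
        for doc in file_docs:
            doc.excluded_embed_metadata_keys.append("file_path")
            if doc.text.strip():  # skip empty (and, in this sketch, whitespace-only) pages
                docs.append(doc)
    return docs
```

Keeping every page of every file is what makes the document count match the number of pages actually read.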
Did you take a look at the vector store?
Yes, the collection does not show up, although the pipeline should create a new collection.
OK, now the collection gets created. I made the following change, which as far as I can tell should not change the result at all, but it does:

```python
result = pipeline.run(documents=docs)
```
I think it was even worse... the Qdrant dashboard does not update in real time; you have to reload the page to see the full list of collections. So it was all my fault, starting with forgetting to add the embedding model to the pipeline.
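Since the Qdrant dashboard only shows new collections after a page reload, it can be more reliable to ask the server directly. Below is a minimal sketch using only the standard library against Qdrant's REST endpoint `GET /collections`; the base URL `http://localhost:6333` (Qdrant's default HTTP port) is an assumption about this setup, and the parsing is split into its own function so it can be checked without a running server.

```python
import json
from typing import List
from urllib.request import urlopen

QDRANT_URL = "http://localhost:6333"  # assumption: local Qdrant on the default HTTP port


def parse_collection_names(payload: str) -> List[str]:
    # GET /collections answers with {"result": {"collections": [{"name": ...}, ...]}, ...}
    data = json.loads(payload)
    return [c["name"] for c in data["result"]["collections"]]


def list_collections(base_url: str = QDRANT_URL) -> List[str]:
    # query the server directly instead of relying on the dashboard's cached view
    with urlopen(f"{base_url}/collections", timeout=5) as resp:
        return parse_collection_names(resp.read().decode("utf-8"))
```

Calling `list_collections()` right after `pipeline.run(...)` should then include the new collection's name without any page reload.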
No issue, always a learning moment 💪