Find answers from the community

kristian
Joined September 25, 2024
I'm using LlamaCloudIndex with similarity_top_k=3. I noticed that when no match is found, source_nodes contains only one chunk, and that chunk is the whole document. Is that expected behaviour? I'm surprised, as I thought there would always be source nodes since we're looking at top similarities, i.e. it would return the most similar nodes even if they aren't similar at all.
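For context, a minimal sketch of my setup (index name, project name, and the query are placeholders):

Python
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

# "my-index" and "Default" stand in for my actual index/project
index = LlamaCloudIndex("my-index", project_name="Default")
retriever = index.as_retriever(similarity_top_k=3)

nodes = retriever.retrieve("a query with no good match")
for node in nodes:
    print(node.score, len(node.text))  # sometimes just one whole-document chunk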
5 comments
Hello! Is there a way to log the actual prompt that was sent to the LLM, i.e. including the chunks retrieved from the vector store, etc.?
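For example, a minimal sketch of the kind of logging I'm after, assuming the "simple" global handler prints full LLM inputs and outputs (query_engine stands in for whatever engine is in use):

Python
from llama_index.core import set_global_handler

# The "simple" handler prints each LLM prompt/completion to stdout,
# which should include the retrieved chunks stuffed into the prompt
set_global_handler("simple")

# query_engine is assumed to be built elsewhere; every call is now logged
response = query_engine.query("What does the document say about X?")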
2 comments
Hello! I'm using LlamaParse to parse a big PDF file, and I have page numbers added to every page. The resulting file is a txt/md file. If I load this file with SimpleDirectoryReader, I lose the information about which page each piece of text came from (it's obviously no longer part of the metadata, since the file doesn't have pages the way a PDF does); if I'm lucky the page number shows up in the source node, but that's not reliable enough.
How should I handle this?
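For context, a rough sketch of the workaround I'm considering, assuming a marker like "--- Page N ---" was injected into the markdown during parsing (the marker format is just an example):

Python
import re
from llama_index.core import Document

with open("parsed.md") as f:
    text = f.read()

# re.split with one capture group yields [preamble, "1", page1_body, "2", page2_body, ...]
parts = re.split(r"^--- Page (\d+) ---$", text, flags=re.MULTILINE)
docs = [
    Document(text=body.strip(), metadata={"page_label": num})
    for num, body in zip(parts[1::2], parts[2::2])
]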
4 comments
Hello! I am playing around with llama-index and Pinecone as a vector DB. It seems like there's no control over the ID when ingesting data into Pinecone:

Python
from llama_index.core import Document, VectorStoreIndex

# storage_context wraps the PineconeVectorStore (set up earlier)
llama_doc = Document(id_="f1", text="My text")
index = VectorStoreIndex.from_documents([llama_doc], storage_context=storage_context)


Pinecone ends up with a different internal ID. The reason I'm asking is that there seems to be no way to delete docs in Pinecone with metadata filtering; you have to use the ID. And I don't seem to be able to get the Pinecone-assigned ID after ingestion either.
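One workaround I'm sketching, assuming (I haven't confirmed this) that the node IDs llama-index generates are what PineconeVectorStore uses as vector IDs: chunk the document into nodes myself before building the index, so I hold on to the IDs. storage_context is the same Pinecone-backed one as above.

Python
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

llama_doc = Document(id_="f1", text="My text")
nodes = SentenceSplitter().get_nodes_from_documents([llama_doc])
node_ids = [n.node_id for n in nodes]  # keep these around for later deletes

index = VectorStoreIndex(nodes, storage_context=storage_context)
# later, if the assumption holds: pinecone_index.delete(ids=node_ids)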
18 comments