
Updated 2 years ago

Can anyone explain what the id_to_text_map is?

At a glance

The post asks about the id_to_text_map and the query_vector in the context of using the PineconeReader from the llama-index library. Community members suggest that it's generally better to skip the reader and let llama-index create the index in Pinecone directly. They also mention a "sub-question query engine" as a way to handle complex queries.

In the comments, a community member was exploring the reader but was stuck on how to ingest data without creating document objects. Another community member clarified that if the vectors were inserted into Pinecone using langchain instead of llama-index, they may not be in the correct format, and document objects would still need to be created. The community members then provided a code snippet demonstrating how to connect to an existing Pinecone vector store and create a vector store index using llama-index.

Can anyone explain what the id_to_text_map is? Also, I suppose the query vector is the embedding of the prompt given to the index. I'm also thinking this will present a challenge in many cases, if the query happens to be complex and needs further breaking down before it reaches the index.

Plain Text
from llama_index import download_loader
import os

PineconeReader = download_loader('PineconeReader')

# id_to_text_map maps the vector IDs stored in Pinecone to their original text
id_to_text_map = {
    "id1": "text blob 1",
    "id2": "text blob 2",
}
# ...
# query_vector is the embedding of the query string (placeholder values shown)
query_vector = [n1, n2, n3, ...]

# read the API key from the environment rather than hard-coding it
api_key = os.environ["PINECONE_API_KEY"]

reader = PineconeReader(api_key=api_key, environment="us-west1-gcp")

# retrieve the top-k matches for query_vector and wrap them as Document objects
documents = reader.load_data(
    index_name='quickstart',
    id_to_text_map=id_to_text_map,
    top_k=3,
    vector=query_vector,
    separate_documents=True,
)


Can anyone shed some light here? Would be super useful.
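
For what it's worth, the query vector in the snippet above is typically just the embedding of the query string. A minimal sketch, assuming the OpenAI embedding model that llama-index commonly uses (the question text is illustrative):

Plain Text
from llama_index.embeddings.openai import OpenAIEmbedding

# embed the query text to obtain the vector passed as query_vector above
embed_model = OpenAIEmbedding()
query_vector = embed_model.get_query_embedding("What does the report say about revenue?")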
10 comments
Yea the vectordb readers aren't the most useful.

Normally, you'd skip the reader and let llama-index create the index in Pinecone, then send queries to it to retrieve the top k.
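
For reference, a minimal sketch of that flow, assuming a Pinecone index named "quickstart", the "us-west1-gcp" environment from the snippet above, and documents loaded from a local ./data folder:

Plain Text
import os

import pinecone
from llama_index import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores import PineconeVectorStore

pinecone.init(api_key=os.environ["PINECONE_API_KEY"], environment="us-west1-gcp")

# point llama-index at the Pinecone index so it stores embeddings there
vector_store = PineconeVectorStore(pinecone.Index("quickstart"))
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# ingest documents; llama-index embeds them and writes the vectors to Pinecone
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# querying embeds the question and retrieves the top-k matches from Pinecone
query_engine = index.as_query_engine(similarity_top_k=3)
print(query_engine.query("What are these documents about?"))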

If questions are complex, we have a few abstractions for this, the main one being the sub question query engine (which generates sub-questions from an initial complex question)

Video + notebook are here: https://gpt-index.readthedocs.io/en/latest/guides/tutorials/discover_llamaindex.html#subquestionqueryengine-10k-analysis
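
For illustration, a minimal sketch of the sub question query engine, assuming an index already built over some local documents and exposed as a single query-engine tool (the ./data path, tool name, and question are assumptions):

Plain Text
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.tools import QueryEngineTool, ToolMetadata

# build a basic index over local documents
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# expose the index's query engine as a tool the sub-question engine can call
query_engine_tools = [
    QueryEngineTool(
        query_engine=index.as_query_engine(),
        metadata=ToolMetadata(
            name="docs",
            description="Information contained in the ingested documents",
        ),
    )
]

# the engine breaks a complex question into sub-questions, answers each one
# against the tools, then synthesizes a final answer
sub_question_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools
)
response = sub_question_engine.query(
    "Compare the two products described in the documents on price and features"
)
print(response)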
I was just exploring the reader thinking it could solve my problem. I'm still stuck on this.

How do I ingest data without creating the documents to put into the index? I already have vectors stored in Pinecone, and now I want to connect to them. Passing an empty list in doesn't seem to give me anything:

GPTVectorStoreIndex([], storage_context=storage_context)
Were the vectors in Pinecone inserted using llama-index? If not, they won't be in the correct format, and you'll still need to create document objects.
Yes. They were inserted through Llama Index
Oh wait. Actually, I confused LangChain and Pinecone here. I used LangChain to make that update.
ooo yea that might be an issue then
If anyone comes across this problem, it's clearly written in the docs. I missed it.
https://gpt-index.readthedocs.io/en/latest/how_to/index/vector_store_guide.html#connect-to-external-vector-stores-with-existing-embeddings



Plain Text
# Connect to Pinecone and build a query index on top of the existing embeddings
import os

import pinecone
from llama_index.vector_stores import PineconeVectorStore
from llama_index.indices.vector_store import VectorStoreIndex

PINECONE_API_KEY = os.environ["PINECONE_API_KEY"]
PINECONE_API_ENV = os.environ["PINECONE_API_ENV"]

pinecone.init(api_key=PINECONE_API_KEY, environment=PINECONE_API_ENV)

# wrap the existing Pinecone index; the namespace holds the previously inserted vectors
vector_store = PineconeVectorStore(pinecone.Index("ds-websources"), namespace=appID)

# ... Create the service context
# ...

# build a llama-index index directly from the existing vector store (no documents needed)
ds_websources_index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store, service_context=service_context
)
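
From there, the restored index can be queried like any other llama-index index; a minimal sketch (the question text is illustrative):

Plain Text
query_engine = ds_websources_index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What do the stored web sources say about pricing?")
print(response)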