It's doable, but I'm not sure how much help gpt-index provides there. Basically you just want to take all the nodes of the simple index and put those (vector, extra data) tuples into another db, roughly as sketched below
So basically it would just be easier to construct a new index
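If you did want to pull the tuples out yourself, a rough sketch of what that could look like, assuming a GPTSimpleVectorIndex whose index struct exposes an embedding dict (the exact attribute names vary by gpt-index release, so treat these as placeholders to verify against your version):

from llama_index import GPTSimpleVectorIndex

# hypothetical attribute names; inspect your index object or its json dump
# to find the real ones for your gpt-index version
simple_index = GPTSimpleVectorIndex.load_from_disk("index.json")

# node_id -> embedding vector
embedding_dict = simple_index.index_struct.embedding_dict

# pair each vector with its text / extra data from the docstore
tuples = [
    (node_id, vector, simple_index.docstore.get_document(node_id))
    for node_id, vector in embedding_dict.items()
]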
@Mikko How different will my index be if I use a vector database instead of the simple vector index? Right now I am creating a simple vector index with chunk_size_limit=512; is this something I can do with a vector db as well?
And once I have my index, which is more efficient (I will always use this one index): creating the index once when the project deploys, or creating it each time a user sends a message? If it's the former, then I don't understand how storing the index in a vector db helps me save memory; isn't that the same as what I am doing now with a json file?
The simple vector index and the vector databases have the same features! Just the storage and retrieval methods differ.
Databases will optimize their memory usage by trading it for disk usage
And they may offer faster vector retrieval (sometimes approximate)
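Concretely, the construction arguments should carry over between the two; a minimal sketch, assuming the early-2023 llama-index API used elsewhere in this thread (index name, API key, and environment are placeholders):

import pinecone
from llama_index import GPTSimpleVectorIndex, GPTPineconeIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./data").load_data()

# in-memory index, persisted to a json file
simple_index = GPTSimpleVectorIndex(documents, chunk_size_limit=512)

# the same documents chunked the same way, but stored in Pinecone
pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENV")
pinecone_index = pinecone.Index("quickstart")
db_index = GPTPineconeIndex(documents, pinecone_index=pinecone_index, chunk_size_limit=512)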
@Mikko Thank you! Does that mean I should create one global index when I deploy the project and use it whenever queries are made?
There are use cases for that and use cases for smaller indices
A bigger single index keeps things simple for you, initially
@Mikko Thank you so much!
To give more context, this is how I am trying to do it now, after creating the first Pinecone index with GPTPineconeIndex:
import pinecone
from langchain.agents import Tool
from langchain.llms import OpenAI
from llama_index import LLMPredictor

# NUM_OUTPUT and QA_PROMPT are defined elsewhere in my code
INDEX = pinecone.Index("quickstart")
LLM_PREDICTOR = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=NUM_OUTPUT))
TOOLS = [
    Tool(
        name="GPT Index",
        func=lambda q: str(INDEX.query(q, llm_predictor=LLM_PREDICTOR, text_qa_template=QA_PROMPT, similarity_top_k=5, response_mode="compact")),
        description="useful for when you need to answer questions about weddings or marriage.",
        return_direct=True,
    ),
]
And I am getting this error: pinecone.core.client.exceptions.ApiValueError: Unable to prepare type LLMPredictor for serialization
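The error message itself points at the cause: INDEX here is the raw Pinecone client, whose query() JSON-serializes every keyword argument into an API request, and an LLMPredictor cannot be serialized. The llama-index kwargs belong on a llama-index object instead, roughly:

# raw Pinecone REST client: only understands Pinecone query parameters
raw_client = pinecone.Index("quickstart")

# llama-index wrapper around that client: this is the object that accepts
# llm_predictor, text_qa_template, similarity_top_k, etc. (see the fix below)
wrapped_index = GPTPineconeIndex([], pinecone_index=raw_client)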
I don't think there's a direct way to export simple vector indices as Pinecone indices. You can get the embeddings using index.docstore.docs.embeddings_dict and see if you can put them in other indices. Might need a feature request, since I'm sure many people want this
instead of wasting $$ recalculating embeddings
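If the embeddings do come out as a mapping like the one sketched earlier, pushing them into Pinecone could look roughly like this (untested; note that llama-index expects its own metadata layout, so vectors upserted by hand may not be directly queryable through GPTPineconeIndex, which is part of why rebuilding the index is often the path of least resistance):

import pinecone

pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENV")
pinecone_index = pinecone.Index("quickstart")

# hypothetical: saved_embeddings maps node_id -> (embedding, text),
# extracted from the simple vector index as described above
pinecone_index.upsert(
    vectors=[
        (node_id, embedding, {"text": text})
        for node_id, (embedding, text) in saved_embeddings.items()
    ]
)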
@davidds2020
I have already recalculated the embeddings using GPTPineconeIndex now. What I am trying to figure out is how to use llama-index to query the database now that it has the embeddings in it. From the docs and examples, all I can see is how to create a new index with new data, like this:
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
index = GPTPineconeIndex(documents, pinecone_index=index)
But now that I have all the data in my database, how do I create an index from the Pinecone database without adding documents?
I have a working application with the simple vector index. As suggested, to make it more efficient I should save my embeddings to a vector database instead of a json file, which I did (correctly, hopefully). What I don't understand is how to use it now with llama-index.
I have not used Pinecone, but I think you just create the index with an empty document list
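Spelled out, assuming the same API as the snippets above (question text and config values are placeholders): pass an empty document list so nothing gets re-embedded, and hand in the existing Pinecone index that already holds the vectors:

import pinecone
from langchain.llms import OpenAI
from llama_index import GPTPineconeIndex, LLMPredictor

pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENV")
pinecone_index = pinecone.Index("quickstart")

# empty document list: reconnect to the existing vectors instead of
# re-reading and re-embedding any documents
index = GPTPineconeIndex([], pinecone_index=pinecone_index)

llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003"))
response = index.query(
    "What flowers are traditional at weddings?",  # placeholder question
    llm_predictor=llm_predictor,
    similarity_top_k=5,
    response_mode="compact",
)
print(response)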
Oh wow, that worked. Thanks again, you're the best @Mikko