
Updated 2 years ago

Hi, how can I set chunk_size and overlap size when using `GPTPineconeIndex`, and how can I store the built index for future use?

At a glance

A community member asked how to set chunk_size and overlap_size when using GPTPineconeIndex, and how to store the built index for future use. The kapa.ai bot provided an example that sets chunk_size and overlap_size through a custom PromptHelper and ServiceContext. Another community member then noted that Pinecone indexes are stored on the Pinecone server, so there is no need to save or load the index locally. Instead, you can reconnect to the existing index with `GPTPineconeIndex([], pinecone_index=pinecone_index)`.

@kapa.ai Hi, how can I set chunk_size and overlap size when using GPTPineconeIndex, and how can I store the built index for future use?
----------------------
To set the chunk_size and overlap_size when using GPTPineconeIndex, create a custom PromptHelper and pass it, together with a chunk_size_limit, to a ServiceContext. Here's an example based on the extracted information:

Plain Text
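# llama_index 0.5.x-era API (the package was previously published as gpt_index)
import pinecone
from langchain import OpenAI
from llama_index import GPTPineconeIndex, SimpleDirectoryReader, LLMPredictor, PromptHelper, ServiceContext

# Connect to an existing Pinecone index (API key, environment and index name are placeholders)
pinecone.init(api_key="YOUR_PINECONE_API_KEY", environment="YOUR_ENVIRONMENT")
pinecone_index = pinecone.Index("your-index-name")

# Load documents
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()

# Define LLM
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003"))

# Custom chunk size and overlap (in tokens)
chunk_size = 256
overlap_size = 20

# PromptHelper's first argument is the model's context window (max_input_size), not the chunk size;
# num_output reserves room for the answer, so it must be well below max_input_size.
prompt_helper = PromptHelper(max_input_size=4096, num_output=256, max_chunk_overlap=overlap_size, chunk_size_limit=chunk_size)

# chunk_size_limit on the ServiceContext controls how documents are split when the index is built
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper, chunk_size_limit=chunk_size)

# Build the GPTPineconeIndex, passing the Pinecone index object created above
index = GPTPineconeIndex.from_documents(documents, pinecone_index=pinecone_index, service_context=service_context)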


To save the built index for future use, you can use the save_to_disk and load_from_disk methods. Make sure to pass the same ServiceContext when loading the index:

Plain Text
# Save the index to disk
index.save_to_disk('index.json')

# Load the index from disk
index = GPTPineconeIndex.load_from_disk('index.json', service_context=service_context)


Keep in mind that the code snippets provided are based on the information extracted from the sources. You may need to adapt them to your specific use case and import the necessary libraries.
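To picture what chunk_size and overlap_size do, here is a stdlib-only sketch of a sliding-window splitter. It is an illustration only, not the splitter llama_index actually uses: it counts whitespace-separated words instead of model tokens, and the function name `chunk_tokens` is hypothetical.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of at most chunk_size items.

    Consecutive windows share `overlap` items (illustrative sketch, words not model tokens).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


if __name__ == "__main__":
    words = "one two three four five six seven eight nine ten".split()
    for chunk in chunk_tokens(words, chunk_size=4, overlap=1):
        print(chunk)
```

With chunk_size=4 and overlap=1 each window advances by 3 words, so the last word of one chunk reappears as the first word of the next. A larger overlap keeps more context across chunk boundaries at the cost of storing more partly duplicated text.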
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@JW btw you don't save/load Pinecone indexes. Everything is stored on the Pinecone server, so you only need to connect to it again.

To connect to an existing index: `index = GPTPineconeIndex([], pinecone_index=pinecone_index)`
I see. Thanks
But I need to save the GPTPineconeIndex created in this step, right?
Plain Text
index = GPTPineconeIndex(documents, pinecone_index=pinecone_index, metadata_filters=metadata_filters)

I don't think so. All the documents get inserted into Pinecone,
which lives on its own server.
Ok, thanks.
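The point made in this thread is that, with Pinecone, the Python index object is essentially a client handle while the vectors live on the Pinecone server. The toy model below mimics that arrangement using hypothetical classes `FakeVectorServer` and `IndexHandle` (not part of llama_index or the Pinecone client): building with documents writes them to the "server", and a new handle built from an empty list, analogous to `GPTPineconeIndex([], pinecone_index=pinecone_index)`, sees the same data without any save/load step.

```python
from typing import Dict, List


class FakeVectorServer:
    """Hypothetical stand-in for a hosted vector store such as Pinecone.

    The stored data lives here, independently of any client-side index object.
    """

    def __init__(self) -> None:
        self.records: Dict[str, str] = {}  # record id -> stored document text


class IndexHandle:
    """Hypothetical client-side index: a thin handle that writes to the server."""

    def __init__(self, documents: List[str], server: FakeVectorServer) -> None:
        self.server = server
        for doc in documents:
            # Each document is written to the server; the handle keeps no local copy.
            record_id = f"doc-{len(self.server.records)}"
            self.server.records[record_id] = doc

    def count(self) -> int:
        # Reads go to the server, so any handle sees everything stored there.
        return len(self.server.records)


if __name__ == "__main__":
    server = FakeVectorServer()
    # First run: build the index, which inserts the documents into the server.
    IndexHandle(["essay part 1", "essay part 2"], server)
    # Later run: reconnect with an empty document list; nothing is loaded from disk.
    reconnected = IndexHandle([], server)
    print(reconnected.count())  # 2
```

In this toy the "server" is just a shared Python object. Assuming the 2.x pinecone-client, the real equivalent in a new session is to call `pinecone.Index(...)` again and pass the result as `pinecone_index`.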