Updated 2 years ago

@kapa.ai Hi, How can I set chunk_size and overlap size when using GPTPineconeIndex and how can I store the built index for future use?
Please stand by @JW as I search the knowledge base. We are experiencing especially high traffic right now and might take longer than usual. :wave:
----------------------
To set the chunk_size and overlap_size when using GPTPineconeIndex, you need to create a custom PromptHelper and pass it to a ServiceContext. Here's an example based on the extracted information:

Plain Text
from gpt_index import GPTPineconeIndex, SimpleDirectoryReader, LLMPredictor, PromptHelper, ServiceContext
from langchain import OpenAI

# Load documents
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()

# Define LLM
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003"))

# Define PromptHelper with custom chunk_size and overlap_size
chunk_size = 256
overlap_size = 20
prompt_helper = PromptHelper(chunk_size, num_output=chunk_size, max_chunk_overlap=overlap_size)

# Create ServiceContext with custom PromptHelper
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

# Build the GPTPineconeIndex
# (`pinecone_index` is a pinecone.Index you have already created and connected to)
index = GPTPineconeIndex.from_documents(documents, pinecone_index=pinecone_index, service_context=service_context)

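As a rough illustration of what chunk_size and overlap control: this is not gpt_index's actual splitter, just a sketch of the sliding-window idea, with a hypothetical `split_with_overlap` helper operating on a plain token list.

```python
# Illustration only: not gpt_index's internal chunker, just a sketch of
# how chunk_size and overlap interact when splitting a token sequence.
def split_with_overlap(tokens, chunk_size, overlap):
    """Split `tokens` into chunks of at most `chunk_size`, where each
    chunk shares its first `overlap` tokens with the end of the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last chunk already covers the tail
    return chunks

# With chunk_size=4 and overlap=1, adjacent chunks share one token:
print(split_with_overlap(list(range(10)), chunk_size=4, overlap=1))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

A larger overlap reduces the chance that a sentence is cut in half at a chunk boundary with no chunk containing it whole, at the cost of indexing some tokens more than once.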

To save the built index for future use, you can use the save_to_disk and load_from_disk methods. Make sure to pass the same ServiceContext when loading the index:

Plain Text
# Save the index to disk
index.save_to_disk('index.json')

# Load the index from disk
index = GPTPineconeIndex.load_from_disk('index.json', service_context=service_context)


Keep in mind that the code snippets provided are based on the information extracted from the sources. You may need to adapt them to your specific use case and import the necessary libraries.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@JW btw you don't save/load pinecone indexes. Everything is saved on the pinecone server, so you only need to connect to it again.

To connect to an existing index: `index = GPTPineconeIndex([], pinecone_index=pinecone_index)`
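Putting that together, reconnecting in a fresh session might look like the following sketch. The index name, API key, and environment are placeholders, and this assumes the same pinecone-client and gpt_index versions as the code above.

```python
import pinecone
from gpt_index import GPTPineconeIndex

# Placeholders: substitute your own credentials and index name
pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")
pinecone_index = pinecone.Index("my-index")  # hypothetical index name

# Pass an empty document list: the vectors already live on Pinecone's servers,
# so nothing needs to be re-ingested or loaded from disk.
index = GPTPineconeIndex([], pinecone_index=pinecone_index)
```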
I see. Thanks
But I need to save the GPTPineconeIndex created in this step
Plain Text
index = GPTPineconeIndex(documents, pinecone_index=pinecone_index, metadata_filters=metadata_filters)

, right?
I don't think so, all the documents get inserted into Pinecone
Which lives on its own server somewhere
Ok, thanks.