Updated 2 years ago

how to create Pinecone Index from a set of pdfs using GPTPineconeIndex

At a glance

A community member asked how to create a Pinecone Index from a set of PDFs using GPTPineconeIndex. The responder said it could not give a confident answer yet, because the extracts available to it do not cover extracting text from PDFs. It did provide an example of building the index once the text has been extracted: initialize Pinecone, create a Pinecone Index, and create a GPTPineconeIndex from the extracted text and that Pinecone Index. The responder also noted that more information on GPTPineconeIndex can be found in the documentation.

@kapa.ai how to create Pinecone Index from a set of pdfs using GPTPineconeIndex
Please stand by @LawSnap as I search the knowledge base. We are experiencing especially high traffic right now and might take longer than usual. :wave:
----------------------
Hmm, I don't know enough to give you a confident answer yet about creating a Pinecone Index from a set of PDFs using GPTPineconeIndex. The provided extracts do not cover the process of extracting text from PDFs. However, once you have extracted the text from the PDFs, you can follow the steps in the extracts to create a Pinecone Index using GPTPineconeIndex.

Here's an example of how to create a Pinecone Index using GPTPineconeIndex with extracted text from documents:

  1. Import the necessary libraries:
     from gpt_index import Document, GPTPineconeIndex
     import pinecone

  2. Initialize Pinecone with your API key and environment:
     api_key = "api_key"  # placeholder: replace with your own Pinecone API key
     pinecone.init(api_key=api_key, environment="us-east1-gcp")

  3. Create a Pinecone Index:
     # dimension=1536 matches OpenAI's text-embedding-ada-002 embeddings
     pinecone.create_index("quickstart", dimension=1536, metric="euclidean", pod_type="p1")
     index = pinecone.Index("quickstart")

  4. Wrap the text extracted from each PDF in a Document object, since GPTPineconeIndex expects a list of Document objects rather than raw strings or dictionaries:
     # extracted_texts is a hypothetical name for your own list of strings, one per PDF
     documents = [Document(text) for text in extracted_texts]

  5. Create a GPTPineconeIndex using the documents and the Pinecone Index:
     gpt_index = GPTPineconeIndex(documents, pinecone_index=index)


For more information on GPTPineconeIndex, you can refer to the documentation (https://gpt-index.readthedocs.io/en/latest/reference/indices/vector_store.html).
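
The answer above leaves out the PDF text extraction step, so here is a minimal sketch of one way to fill it in. It is not taken from the GPT Index docs: it assumes the third-party `pypdf` package as the extractor, and the helper names `extract_with_pypdf` and `load_pdf_texts` are made up for illustration. The extractor is passed in as a parameter so you can swap in another PDF library.

```python
from pathlib import Path
from typing import Callable, Dict


def extract_with_pypdf(pdf_path: Path) -> str:
    """Return the text of every page in one PDF, joined by newlines.

    Assumption: the third-party `pypdf` package is installed. It is
    imported lazily so the rest of this sketch works without it.
    """
    from pypdf import PdfReader  # assumption: pypdf is the chosen extractor

    reader = PdfReader(str(pdf_path))
    # Image-only pages may yield no text, so fall back to an empty string.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def load_pdf_texts(
    directory: str,
    extract: Callable[[Path], str] = extract_with_pypdf,
) -> Dict[str, str]:
    """Map each PDF file name in `directory` to its extracted text.

    Files are visited in sorted order. PDFs that yield only whitespace
    (for example scans with no text layer) are skipped.
    """
    texts: Dict[str, str] = {}
    for pdf_path in sorted(Path(directory).glob("*.pdf")):
        text = extract(pdf_path).strip()
        if text:
            texts[pdf_path.name] = text
    return texts
```

With the texts loaded, something like `[Document(text) for text in load_pdf_texts("pdfs").values()]` (where `"pdfs"` is a hypothetical folder name and `Document` is assumed to be the class exported by `gpt_index`) should give you the `documents` list that GPTPineconeIndex takes. Scanned PDFs without a text layer would need OCR first, which this sketch does not attempt.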
----------------------