
Updated last year

Confluence

I've got the Confluence loader successfully pulling documents down, but when I attempt to create a vector store index I get the following error:
```
ERROR:root:error: 'tuple' object has no attribute 'get_doc_id'
```


Here is the code that runs before it:
```python
import logging
import os

import pinecone
# import paths assume a 2023-era llama_index release
from llama_index import download_loader
from llama_index.vector_stores import PineconeVectorStore

# `r` (Confluence settings) and `workspace_id` are defined elsewhere
c = download_loader('ConfluenceReader')  # fetch the ConfluenceReader loader from LlamaHub
reader = c(base_url=r["base_url"])
documents = reader.load_data(space_key=r["space_key"], include_attachments=False, page_status="current")
logging.info("downloading documents")
for documents in documents:
    # TODO: fix this with actual values
    logging.info("adding confluence link to document")

logging.info("storing in pinecone")
logging.info("pinecone index name: " + os.environ['PINECONE_INDEX_NAME'])
logging.info("pinecone environment: " + os.environ['PINECONE_ENVIRONMENT'])
pinecone.init(api_key=os.environ['PINECONE_API_KEY'], environment=os.environ['PINECONE_ENVIRONMENT'])
# clear out this workspace's namespace before re-indexing
pinecone.Index("astoria").delete(delete_all=True, namespace=workspace_id + "-confluence")

vector_store = PineconeVectorStore(
    index_name=os.environ['PINECONE_INDEX_NAME'],
    environment=os.environ['PINECONE_ENVIRONMENT'],
    namespace=workspace_id + "-confluence",
)
```


Any ideas what's breaking?
Can you check whether reader.load_data() is giving you a list of Document objects or something else?
The logs show:
```
INFO:root:documents: Doc ID: 1277958
Text: Test data
```

That's what I get when I output the contents of the documents variable.
Did it return a single document or a list of documents?
Also, where in your code do you actually add data to your index?
It's returning a single instance of the Document class. Here is where the data gets added to the index:
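```python
from llama_index import GPTVectorStoreIndex, StorageContext

storage_context = StorageContext.from_defaults(
    vector_store=vector_store,
)
logging.info(type(documents))

# from_documents() iterates over `documents` and indexes each item
GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context)
logging.info("stored in pinecone")
```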
It's bizarre. The list has 4 documents, but when I create the vector store it's only using one instance of Document.
Yeah, not sure what to tell you 😅 The only requirement here is that from_documents(documents) takes a flat (1D) list of Document objects.
Something must be wrong with the input list. Try:
```python
print([type(x) for x in documents])
```
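Building on that type check, a small pre-flight helper can fail fast with a clearer message than the deep `'tuple' object has no attribute 'get_doc_id'` error. This is a sketch under stated assumptions: `check_document_list` and `StubDocument` are hypothetical names (not part of LlamaIndex), and the check duck-types on the `get_doc_id` method named in the error instead of importing LlamaIndex's `Document` class.

```python
from typing import Any, List


def check_document_list(documents: Any) -> List[Any]:
    """Hypothetical pre-flight check to run before from_documents().

    Verifies that `documents` is a flat list whose items look like
    Document objects, i.e. they expose a callable get_doc_id().
    """
    if not isinstance(documents, list):
        raise TypeError(
            f"expected a list of documents, got {type(documents).__name__}"
        )
    for i, item in enumerate(documents):
        if not callable(getattr(item, "get_doc_id", None)):
            raise TypeError(
                f"item {i} is a {type(item).__name__}, not a Document-like object"
            )
    return documents


class StubDocument:
    """Hypothetical stand-in for a Document, used only to exercise the check."""

    def __init__(self, doc_id: str) -> None:
        self.doc_id = doc_id

    def get_doc_id(self) -> str:
        return self.doc_id


if __name__ == "__main__":
    docs = [StubDocument("1"), StubDocument("2")]
    print(len(check_document_list(docs)))
    try:
        check_document_list(docs[-1])  # a single document, not a list
    except TypeError as exc:
        print(exc)
```

Calling a check like this right before `from_documents()` would have flagged the single `Document` immediately, since it is not a list.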
I can now inform you that I am an idiot
See this block and spot the error:
```python
for documents in documents:
    # TODO: fix this with actual values
    logging.info("adding confluence link to document")
```
I was overwriting the documents variable with a single one of its own elements on every pass through the loop...
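For anyone hitting the same error: after `for documents in documents:` finishes, the name `documents` is bound to the last `Document` rather than the list. The `'tuple'` in the message likely comes from the fact that, in LlamaIndex releases where `Document` is a pydantic model, iterating over a model yields `(field_name, value)` tuples, so `from_documents()` ends up looping over tuples. The sketch below reproduces both effects with a hypothetical `ModelLikeDocument` stand-in that only mimics that iteration behavior (it is not the real class) and a hypothetical `collect_doc_ids` in place of the indexer, then shows the fix: give the loop variable its own name.

```python
from typing import Any, Iterable, Iterator, List, Tuple


class ModelLikeDocument:
    """Hypothetical stand-in for a pydantic-based Document.

    Like a pydantic model, iterating over it yields (field_name, value) tuples.
    """

    def __init__(self, doc_id: str, text: str) -> None:
        self.doc_id = doc_id
        self.text = text

    def get_doc_id(self) -> str:
        return self.doc_id

    def __iter__(self) -> Iterator[Tuple[str, Any]]:
        return iter(vars(self).items())


def collect_doc_ids(documents: Iterable[Any]) -> List[str]:
    # Mimics an indexer that loops over its input and calls get_doc_id() on each item.
    return [doc.get_doc_id() for doc in documents]


def load_documents() -> List[ModelLikeDocument]:
    return [ModelLikeDocument(str(i), f"page {i}") for i in range(4)]


# Buggy: the loop variable shadows the list it iterates over.
documents = load_documents()
for documents in documents:
    pass
print(type(documents).__name__)  # ModelLikeDocument, not list
try:
    collect_doc_ids(documents)
except AttributeError as exc:
    print(exc)  # 'tuple' object has no attribute 'get_doc_id'

# Fixed: use a distinct name for the loop variable.
documents = load_documents()
for doc in documents:
    pass  # per-document work (e.g. the TODO above) would use `doc` here
print(collect_doc_ids(documents))  # ['0', '1', '2', '3']
```

The loop itself still visits all four items, because Python creates the iterator from the original list once; only the name is rebound, which is why the list length looked right while the index received a single object.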