Confluence

I've got the Confluence loader successfully pulling documents down, but when I attempt to create a vector store index I get the following error:

ERROR:root:error: 'tuple' object has no attribute 'get_doc_id'

Here is the code before that:

c = download_loader('ConfluenceReader')
reader = c(base_url=r["base_url"])
documents = reader.load_data(space_key=r["space_key"], include_attachments=False, page_status="current")
logging.info("downloading documents")
for documents in documents:
    # TODO: fix this with actual values
    logging.info("adding confluence link to document")

logging.info("storing in pinecone")
logging.info("pinecone index name: " + os.environ['PINECONE_INDEX_NAME'])
logging.info("pinecone environment: " + os.environ['PINECONE_ENVIRONMENT'])
pinecone.init(api_key=os.environ['PINECONE_API_KEY'], environment=os.environ['PINECONE_ENVIRONMENT'])
pinecone.Index("astoria").delete(delete_all=True, namespace=workspace_id + "-confluence")

vector_store = PineconeVectorStore(
    index_name=os.environ['PINECONE_INDEX_NAME'],
    environment=os.environ['PINECONE_ENVIRONMENT'],
    namespace=workspace_id + "-confluence",
)

Any ideas what's breaking?

13 comments

WhiteFang_Jr

Can you check whether reader.load_data() is giving you a list of Document objects or something else?

HABBYMAN

The logs return:

INFO:root:documents: Doc ID: 1277958
Text: Test data

That is what I get when I output the contents of the documents variable.

WhiteFang_Jr

Did it return a single document or a list of documents?

Logan M

Also, where in your code do you actually add data to your index?

HABBYMAN

It's returning a single instance of the Document class

HABBYMAN

        storage_context = StorageContext.from_defaults(
            vector_store=vector_store,
        )
        logging.info(type(documents))

        GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context)
        logging.info("stored in pinecone")

HABBYMAN

It's bizarre. The list has 4 documents, but when I create the vector store it's only using one instance of Document.

Logan M

Yeah, not sure what to tell you 😅 The only requirement here is that from_documents(documents) takes a 1D list of Document objects.

Logan M

Something must not be right in the input list

Logan M

print([type(x) for x in documents])
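
A slightly fuller version of that check, as a minimal sketch. The helper name `describe_documents` is made up for illustration and is not part of llama_index; it uses only the standard library and reports whether the value is a list at all before looking at element types.

```python
def describe_documents(documents):
    """Return a short summary of what `documents` actually is.

    Hypothetical helper for debugging; not a llama_index API.
    """
    if not isinstance(documents, list):
        # A single object where a list is expected is worth flagging first.
        return f"not a list: {type(documents).__name__}"
    # Collect the distinct element type names so mixed lists stand out.
    type_names = sorted({type(d).__name__ for d in documents})
    return f"list of {len(documents)} item(s), element types: {type_names}"


print(describe_documents(["a", "b"]))
```

Calling it right before from_documents(documents) would show whether the value being passed is still the list the reader returned.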

HABBYMAN

I can now inform you that I am an idiot

HABBYMAN

See this block and spot the error:

for documents in documents:
    # TODO: fix this with actual values
    logging.info("adding confluence link to document")

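HABBYMAN

I was reusing the name documents as the loop variable, so every iteration rebound it to a single element of the list. After the loop it held one Document instead of the list...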
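
The effect can be reproduced without Confluence or Pinecone. This is a minimal sketch: `FakeDocument` and `load_fake_documents` are stand-ins invented for illustration, not the loader's real classes, and the list of 4 items just mirrors the count mentioned earlier in the thread.

```python
from dataclasses import dataclass


@dataclass
class FakeDocument:
    """Stand-in for the loader's Document class (hypothetical)."""
    doc_id: str
    text: str


def load_fake_documents():
    """Pretend loader that returns a list of 4 documents (hypothetical)."""
    return [FakeDocument(str(i), f"page {i}") for i in range(4)]


# Buggy version: the loop variable has the same name as the list,
# so each iteration rebinds `documents` to one element.
documents = load_fake_documents()
for documents in documents:
    pass
buggy_type = type(documents).__name__  # the last element, not the list

# Fixed version: use a distinct name for the loop variable.
documents = load_fake_documents()
for document in documents:
    pass
fixed_type = type(documents).__name__  # still the list

print(buggy_type, fixed_type)
```

Renaming the loop variable (here `document`) leaves `documents` as the original list, which is what from_documents expects. Why the real failure surfaced as a 'tuple' error rather than something clearer presumably depends on how the Document class behaves when iterated, which the thread does not show.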
