
HABBYMAN
Offline, last seen 2 months ago
Joined September 25, 2024
If I have multiple indexes to query against, is it advisable to compose a graph or to merge them into one index?
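For the graph route, a minimal sketch using the legacy llama_index composability API (the two sub-indexes and their summaries below are placeholders):
Python
from llama_index import ListIndex
from llama_index.indices.composability import ComposableGraph

# Compose two existing indexes under a root list index; the summaries
# tell the graph what each sub-index covers so queries get routed.
graph = ComposableGraph.from_indices(
    ListIndex,
    [sales_index, support_index],
    index_summaries=[
        "Questions about sales documents",
        "Questions about support tickets",
    ],
)
response = graph.as_query_engine().query("your question here")

Merging into one index is simpler when the documents are homogeneous; a graph tends to help when the sub-indexes cover clearly distinct corpora.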
6 comments
Hey guys - I'm looking to leverage the GoogleDriveReader. The application I'm building is multi-tenanted, and I have a service that indexes each organisation's selected Drive folder. I've just noticed that this loader requires a credentials.json file to index. I already have the user's access_token; is there a way to pass it directly without creating the credentials file every time? I figure if two people make a request at the same time, this file will just get overwritten?
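One possible workaround (a sketch, not a documented loader feature): build the Google credentials object in memory from the stored access token and call the Drive API directly, so no shared credentials.json is ever written. `user_access_token` and `folder_id` are placeholders:
Python
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

# Per-request, in-memory credentials built from the tenant's stored
# access token; nothing touches disk, so concurrent requests can't
# clobber each other's files.
creds = Credentials(token=user_access_token)
drive = build("drive", "v3", credentials=creds)
files = (
    drive.files()
    .list(q=f"'{folder_id}' in parents", fields="files(id, name)")
    .execute()
)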
5 comments
Does anyone have any ideas around best practices to ingest documents with Slack?

I want to retain as much metadata as possible for individual messages, but I want document retrieval to maintain the context of the entire conversation. I've tried this in two ways:

  1. Each message is its own document. I can successfully store the permalink, timestamp, and user in the metadata, but conversational context is lost.
  2. Multiple messages are stored per document. I can't link to individual messages, and the user/timestamp metadata isn't stored because there are multiple messages in the doc.
Any ideas how to solve this problem?
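One approach worth sketching, assuming the messages have already been fetched with slack_sdk (field names are illustrative): keep one Document per message so the per-message metadata survives, and also stamp each message with its channel and thread so the surrounding conversation can be regrouped at retrieval time:
Python
from llama_index import Document

# One Document per message keeps permalink/user/timestamp metadata
# intact; channel and thread ids let a retrieval step pull the
# neighbouring messages back in for conversational context.
documents = [
    Document(
        text=msg["text"],
        extra_info={  # called `metadata` in newer llama_index versions
            "permalink": msg["permalink"],
            "user": msg["user"],
            "timestamp": msg["ts"],
            "channel": msg["channel"],
            "thread_ts": msg.get("thread_ts", msg["ts"]),
        },
    )
    for msg in messages
]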
3 comments
Looks like it can't find an id for the metadata?
4 comments
Hi all, I’m attempting to build a tool that allows users to upload various documents to an S3 bucket, and then an API and front end that can allow a user to query those documents after they have been stored and processed.

My understanding of AI / LlamaIndex is limited; I'm coming from a backend Golang discipline and trying to learn the ropes. My proposed architecture is this:

API Upload
  • Upload documents to a backend server, which forwards them to S3.
Upload Processing
  • An AWS S3 event triggers a Python script (example below) to process the documents and store the nodes + indexes. This is where my lack of knowledge comes in: how can I make this storage step work so that users don't need to re-index these documents, and so everything is faster? (See the sketch below.)
Process Completed Notifications
  • Alert users that their documents are now queryable
Front-end
  • Query documents
Firstly, is my understanding of the LlamaIndex project accurate?
Secondly, is my application of these technologies correct?
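The read of LlamaIndex here is broadly right. On the re-indexing concern, the usual pattern is to embed and persist once at ingest time, then only load at query time. A minimal sketch of the S3-triggered step, assuming the legacy llama_index API (bucket, key, and paths are placeholders):
Python
import os

import boto3
from llama_index import SimpleDirectoryReader, VectorStoreIndex

def handle_s3_event(bucket: str, key: str) -> None:
    """Triggered by the S3 upload event: index the new object once, then persist."""
    os.makedirs("/tmp/docs", exist_ok=True)
    s3 = boto3.client("s3")
    s3.download_file(bucket, key, os.path.join("/tmp/docs", os.path.basename(key)))

    # Embed and persist here so query-time code only loads, never re-embeds.
    documents = SimpleDirectoryReader("/tmp/docs").load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir="/tmp/index")
    # From here, sync /tmp/index back to S3, or point a StorageContext
    # at a hosted vector store (e.g. Pinecone) instead of local files.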
1 comment
Hi all, I have a question about the SlackReader. Currently it ingests multiple messages per document. Is there a built-in method to get one document per message? I need to add metadata to each document, such as the permalink, author, etc.
2 comments
I've got the Confluence loader successfully pulling documents down, but when I attempt to create a vector store index I get the following error:
Plain Text
ERROR:root:error: 'tuple' object has no attribute 'get_doc_id'


here is the code before that:
Python
import logging
import os

import pinecone
from llama_index import download_loader
from llama_index.vector_stores import PineconeVectorStore

c = download_loader('ConfluenceReader')
reader = c(base_url=r["base_url"])
documents = reader.load_data(space_key=r["space_key"], include_attachments=False, page_status="current")
logging.info("downloading documents")
for document in documents:  # was `for documents in documents:`, which rebinds the list
    # TODO: fix this with actual values
    logging.info("adding confluence link to document")

logging.info("storing in pinecone")
logging.info("pinecone index name: " + os.environ['PINECONE_INDEX_NAME'])
logging.info("pinecone environment: " + os.environ['PINECONE_ENVIRONMENT'])
pinecone.init(api_key=os.environ['PINECONE_API_KEY'], environment=os.environ['PINECONE_ENVIRONMENT'])
pinecone.Index("astoria").delete(delete_all=True, namespace=workspace_id + "-confluence")

vector_store = PineconeVectorStore(
    index_name=os.environ['PINECONE_INDEX_NAME'],
    environment=os.environ['PINECONE_ENVIRONMENT'],
    namespace=workspace_id + "-confluence",
)


Any ideas what's breaking?
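The snippet stops before the index is actually built, and get_doc_id is called on each item when an index is constructed from a documents list, so the error suggests something other than a Document (e.g. a tuple) ended up in that list. A sketch of that missing step with a guard, assuming the legacy llama_index API:
Python
from llama_index import Document, StorageContext, VectorStoreIndex

# Filter out anything that is not a Document before indexing; a stray
# tuple in `documents` produces exactly the get_doc_id error above.
docs = [d for d in documents if isinstance(d, Document)]

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)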
13 comments
👋 I'm trying to get the Confluence loader to work with an OAuth 2.0 (3LO) Confluence app.

I've set up the application with the correct callback URL and scopes, but when I use the access token with the loader, I get the following error:
Plain Text
{"message":"Current user not permitted to use Confluence","statusCode":403}


I'm setting up the loader as follows:
Python
import logging
import os

from llama_index import download_loader

token = {
    "access_token": result["access_token"],
    "token_type": "Bearer",
}
oauth2_dict = {
    "client_id": os.environ['CONFLUENCE_CLIENT_ID'],
    "token": token,
}

logging.info("oauth2 dict: " + str(oauth2_dict))  # note: this logs the raw access token
c = download_loader('ConfluenceReader')
reader = c(base_url=r["base_url"], oauth2=oauth2_dict)

Does anyone have any pointers as to why this doesn't work?
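One thing worth checking (an assumption, not confirmed by the post): with OAuth 2.0 (3LO), Atlassian routes API calls through api.atlassian.com with the site's cloud ID rather than the site's own URL, and using the site base_url with a 3LO token can return exactly this 403. A sketch of resolving that URL:
Python
import requests

# Look up the cloud id for the site this token was authorised against.
resp = requests.get(
    "https://api.atlassian.com/oauth/token/accessible-resources",
    headers={"Authorization": f"Bearer {result['access_token']}"},
)
cloud_id = resp.json()[0]["id"]

# 3LO requests are routed through api.atlassian.com, not the site URL.
base_url = f"https://api.atlassian.com/ex/confluence/{cloud_id}"
reader = c(base_url=base_url, oauth2=oauth2_dict)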
6 comments

Top_K

When I build my index, I'm adding a URL to each document's extra_info.

When a response is returned, it contains multiple source_nodes, one of which is the node I need to pull the link from. Is there a way I can select only this node, or have only it returned as the source node?
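Two sketches that might fit, assuming the legacy llama_index API ("url" stands in for whatever key was stored in extra_info):
Python
# Option 1: only retrieve the single best node in the first place.
query_engine = index.as_query_engine(similarity_top_k=1)
response = query_engine.query("your question here")

# Option 2: keep top_k > 1 for answer quality, but pull the link
# from the highest-scoring source node only.
best = max(response.source_nodes, key=lambda n: n.score or 0.0)
url = best.node.extra_info.get("url")  # extra_info is `metadata` in newer versions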
10 comments
Is anyone using the google drive reader in an API?

My workflow is:
  • User auths with google and I store their access_token and refresh token
  • User sends some folder_ids over to the API
  • I refresh credentials using the refresh token if necessary
  • GoogleDriveReader attempts to download and index the documents
My problem with this process is that the GoogleDriveReader launches its own callback process and waits for OAuth callbacks... This means my API is launching web pages rather than just processing the data.
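For the refresh step, a sketch using google-auth directly (client id/secret and stored tokens are placeholders); as long as the refresh token is valid, this happens server-side with no callback page:
Python
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials

# Rebuild credentials from the tokens stored at auth time; no
# interactive browser flow is launched.
creds = Credentials(
    token=stored_access_token,
    refresh_token=stored_refresh_token,
    token_uri="https://oauth2.googleapis.com/token",
    client_id=GOOGLE_CLIENT_ID,
    client_secret=GOOGLE_CLIENT_SECRET,
)
if creds.expired and creds.refresh_token:
    creds.refresh(Request())  # server-side refresh, no web page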
4 comments
Would love some insight from the community here:

I've got a use case in which a user can send multiple chat messages to a chatbot; its end goal is to solve their problem, and if it can't, it will raise an issue in my platform.
What's the best approach to mark the start and end of conversations with the bot? The conversations will live in Slack, so there is no defined start and end to them. Are there any tools within LlamaIndex that I can leverage that I'm unaware of?
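One common heuristic worth sketching (an arbitrary threshold, assuming time-ordered messages with datetime timestamps; not a LlamaIndex built-in) is to treat a long silence as a conversation boundary:
Python
from datetime import timedelta

GAP = timedelta(minutes=30)  # arbitrary boundary threshold

def split_conversations(messages):
    """Group time-ordered messages; a gap longer than GAP starts a new conversation."""
    conversations, current, last_ts = [], [], None
    for msg in messages:
        if last_ts is not None and msg["ts"] - last_ts > GAP:
            conversations.append(current)
            current = []
        current.append(msg)
        last_ts = msg["ts"]
    if current:
        conversations.append(current)
    return conversations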
2 comments
This is useful, thank you, but it doesn't quite solve what I'm trying to do. I have a vector index saved in Redis, but I can't figure out how to load that index in another function call: when I connect to Redis in the storage context, it says there are no indexes, yet I can see them in Redis.
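One common gotcha worth sketching (an assumption about the setup): an index living in an external vector store like Redis is usually reopened through the vector store itself rather than load_index_from_storage, and the RedisVectorStore must be constructed with the same index_name and prefix used at write time. Names below are placeholders:
Python
from llama_index import VectorStoreIndex
from llama_index.vector_stores import RedisVectorStore

# Reconnect with the exact index_name/index_prefix used when writing.
vector_store = RedisVectorStore(
    index_name="my_index",
    index_prefix="llama",
    redis_url="redis://localhost:6379",
)
index = VectorStoreIndex.from_vector_store(vector_store)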
8 comments
I'm querying 2 documents that are indexed and stored in MongoDB. When I use a ListIndex, I get a solid response; when I use a vector index over the same docs, I get nothing in the response. Why does this happen?
1 comment
Hey guys, struggling to get the MongoDocumentStore/MongoIndexStore working as expected. Here's some code if anyone has any tips: 🧵
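The thread's code isn't shown here; for reference, a typical setup with the legacy llama_index Mongo stores looks roughly like this (URI and database name are placeholders):
Python
from llama_index import StorageContext, VectorStoreIndex
from llama_index.storage.docstore import MongoDocumentStore
from llama_index.storage.index_store import MongoIndexStore

# Both stores must point at the same URI and database on every call,
# or a second process will see an empty docstore/index store.
storage_context = StorageContext.from_defaults(
    docstore=MongoDocumentStore.from_uri(uri="mongodb://localhost:27017", db_name="llama"),
    index_store=MongoIndexStore.from_uri(uri="mongodb://localhost:27017", db_name="llama"),
)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)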
6 comments