Hi everyone, I'm trying to store the Slack conversation history for my Slack workspace and then use gpt-index to query the data. I'm not sure what's the best way to model Slack chat data. Let's say I were to use MongoDB to store the chat data: how should I store it? Do I concatenate Slack messages and store the entire history for a single channel in a single document, or do I store every message separately? Any pointers are much appreciated.
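
For concreteness, here is a minimal sketch of the two layouts being weighed, using pymongo. Everything here is illustrative: the database, collection, and field names (slack_archive, slack_messages, slack_channels) are assumptions, not anything gpt-index or Slack requires.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["slack_archive"]

# Option A: one document per message, metadata kept as queryable fields.
db.slack_messages.insert_one({
    "channel": "general",
    "user": "U024BE7LH",
    "ts": "1675100000.000200",  # Slack timestamps are strings
    "text": "Has anyone tried gpt-index for search?",
})

# Option B: one document per channel, messages concatenated into one blob.
db.slack_channels.insert_one({
    "channel": "general",
    "history": "alice: Has anyone tried gpt-index?\nbob: Yes, works well.",
})
```

Option A keeps per-message metadata intact for later filtering; Option B matches the "concatenate and index sequential text" approach discussed below.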
I think concatenating and storing it as sequential text makes sense for now
I will have more updates on this by tomorrow morning! Stay tuned.
That'd be great! Thank you. Also, how can I handle metadata around messages, e.g. user, timestamp, channel, etc.?
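
One approach (an assumption here, not something gpt-index mandates) is to fold the metadata into the concatenated text itself, so the index can see user, timestamp, and channel when answering queries. A minimal sketch with a hypothetical format_message helper and made-up sample messages:

```python
from datetime import datetime, timezone

def format_message(msg: dict) -> str:
    """Render one Slack message as a line of text with its metadata inline."""
    when = datetime.fromtimestamp(float(msg["ts"]), tz=timezone.utc)
    return f"[{when:%Y-%m-%d %H:%M}] #{msg['channel']} {msg['user']}: {msg['text']}"

messages = [
    {"channel": "general", "user": "alice", "ts": "1675100000.0", "text": "hi"},
    {"channel": "general", "user": "bob", "ts": "1675100060.0", "text": "hey"},
]
history = "\n".join(format_message(m) for m in messages)
# -> "[2023-01-30 17:33] #general alice: hi\n[2023-01-30 17:34] #general bob: hey"
```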
👀 Interested in this use case as well 🙂
I'm doing something similar with Discord. I ran into some weirdness because mentions use the user's unique ID, not their friendly name. I had the bot responding to mentions, but it became a bit erratic, so I tabled it.
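
For anyone hitting the same mention issue: a minimal sketch of resolving raw mention tokens before indexing. The USER_NAMES table and the ID in it are made up; in practice you would populate the mapping from the guild's member list.

```python
import re

# Hypothetical mapping from Discord user IDs to friendly names.
USER_NAMES = {"123456789012345678": "alice"}

def resolve_mentions(text: str) -> str:
    """Replace raw <@id> / <@!id> mention tokens with @friendly-name,
    leaving unknown IDs untouched."""
    def repl(m):
        name = USER_NAMES.get(m.group(1))
        return f"@{name}" if name else m.group(0)
    return re.sub(r"<@!?(\d+)>", repl, text)

print(resolve_mentions("hey <@123456789012345678>, ping me later"))
# -> "hey @alice, ping me later"
```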
Here's an initial beta notebook on how to use gpt-index as memory / a tool for a conversational agent: https://github.com/jerryjliu/gpt_index/blob/main/examples/langchain_demo/LangchainDemo.ipynb
I haven't fully announced it yet, though!
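
A rough sketch of the pattern that notebook demonstrates: wrap the index's query method as a LangChain agent tool. The directory name, tool name, and description are placeholders, and the imports reflect the early-2023 gpt_index / langchain APIs in use at the time of this thread (both libraries have since been renamed and restructured):

```python
from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader
from langchain.agents import Tool, initialize_agent
from langchain.llms import OpenAI

# Build an in-memory vector index over exported chat logs
# (requires OPENAI_API_KEY for embeddings).
documents = SimpleDirectoryReader("slack_exports").load_data()
index = GPTSimpleVectorIndex(documents)

tools = [
    Tool(
        name="Slack History",
        func=lambda q: str(index.query(q)),
        description="Answers questions about archived Slack conversations.",
    )
]
agent = initialize_agent(tools, OpenAI(temperature=0),
                         agent="zero-shot-react-description")
agent.run("What did the team decide about the launch date?")
```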
Hey @jerryjliu0, this is awesome, thank you! How do you think this would work in a production environment? I'm trying to feed it the entire conversation history from all public channels in a workspace. Intuitively, the way I'm currently going about it is to store the entire conversation history for a single channel in one document in MongoDB. At index-creation time, would I load all documents and feed them to the index, and if so, would the index fit in memory?
And then an extension of that question (it could be a dumb one): once I create that index, do I ever have to create it again? Can I just store it somewhere and then update it as new messages are sent in the Slack workspace?
Yeah @metahash, it depends on whether you're using an in-memory index (e.g. GPTSimpleVectorIndex, GPTListIndex) or an index backed by a backend DB (e.g. GPTPineconeIndex).

For the former, you can serialize the index to a JSON file on disk and load it again (so you don't have to rebuild it).
For the latter, the index storage is already backed by a DB like Pinecone / Weaviate, so the next time you build the index you can feed in blank indices, as shown here: https://discord.com/channels/1059199217496772688/1059201661417037995/1067531191680503890

I have a TODO to add better serialization capabilities this week (e.g. serializing to MongoDB, etc.)
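
A minimal sketch of that save/reload cycle for an in-memory index, using the gpt_index 0.x method names current at the time of this thread (save_to_disk, load_from_disk, insert); the file name and message texts are placeholders:

```python
from gpt_index import Document, GPTSimpleVectorIndex

# Build once and persist to disk (requires OPENAI_API_KEY for embeddings).
index = GPTSimpleVectorIndex([Document("alice: shipping friday\nbob: sounds good")])
index.save_to_disk("slack_index.json")

# Later: reload instead of rebuilding, insert new messages incrementally,
# and re-save.
index = GPTSimpleVectorIndex.load_from_disk("slack_index.json")
index.insert(Document("carol: launch slipped to monday"))
index.save_to_disk("slack_index.json")
```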
Got it. Maybe another dumb question, but if I save the index, do I still need the underlying data, or can I discard it?
The index should contain the underlying data.