Hi everyone, I'm trying to store the Slack conversation history for my Slack workspace and then use gpt-index to query the data. I'm not sure what's the best way to model Slack chat data. Let's say I were to use MongoDB to store the chat data: how should I store it? Do I concatenate Slack messages and store the entire history for a single channel in a single document, or do I store every message separately? Any pointers are much appreciated.
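
For concreteness, here is a minimal sketch of the two layouts being weighed, using pymongo. Everything here is illustrative: the database, collection, and field names (slack_archive, slack_messages, slack_channels) are assumptions, not anything gpt-index or Slack requires.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["slack_archive"]

# Option A: one document per message, metadata kept as queryable fields.
db.slack_messages.insert_one({
    "channel": "general",
    "user": "U024BE7LH",
    "ts": "1675100000.000200",  # Slack timestamps are strings
    "text": "Has anyone tried gpt-index for search?",
})

# Option B: one document per channel, messages concatenated into one blob.
db.slack_channels.insert_one({
    "channel": "general",
    "history": "alice: Has anyone tried gpt-index?\nbob: Yes, works well.",
})
```

Option A keeps per-message metadata intact for later filtering; Option B matches the "concatenate and index sequential text" approach discussed below.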
I think concatenating and storing it as sequential text makes sense for now
I will have more updates on this by tomorrow morning! Stay tuned.
That'd be great! Thank you. Also, how can I handle metadata around messages, e.g. user, timestamp, channel, etc.?
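
One approach (an assumption here, not something gpt-index mandates) is to fold the metadata into the concatenated text itself, so the index can see user, timestamp, and channel when answering queries. A minimal sketch with a hypothetical format_message helper and made-up sample messages:

```python
from datetime import datetime, timezone

def format_message(msg: dict) -> str:
    """Render one Slack message as a line of text with its metadata inline."""
    when = datetime.fromtimestamp(float(msg["ts"]), tz=timezone.utc)
    return f"[{when:%Y-%m-%d %H:%M}] #{msg['channel']} {msg['user']}: {msg['text']}"

messages = [
    {"channel": "general", "user": "alice", "ts": "1675100000.0", "text": "hi"},
    {"channel": "general", "user": "bob", "ts": "1675100060.0", "text": "hey"},
]
history = "\n".join(format_message(m) for m in messages)
# -> "[2023-01-30 17:33] #general alice: hi\n[2023-01-30 17:34] #general bob: hey"
```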
👀 Interested in this use case as well 🙂
I'm doing something similar with Discord. I ran into some weirdness because mentions use the user's unique ID, not their friendly name. I had the bot responding to mentions, but it became a bit erratic, so I tabled it.
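
For anyone hitting the same mention issue: a minimal sketch of resolving raw mention tokens before indexing. The USER_NAMES table and the ID in it are made up; in practice you would populate the mapping from the guild's member list.

```python
import re

# Hypothetical mapping from Discord user IDs to friendly names.
USER_NAMES = {"123456789012345678": "alice"}

def resolve_mentions(text: str) -> str:
    """Replace raw <@id> / <@!id> mention tokens with @friendly-name,
    leaving unknown IDs untouched."""
    def repl(m):
        name = USER_NAMES.get(m.group(1))
        return f"@{name}" if name else m.group(0)
    return re.sub(r"<@!?(\d+)>", repl, text)

print(resolve_mentions("hey <@123456789012345678>, ping me later"))
# -> "hey @alice, ping me later"
```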
Here's an initial beta notebook on how to use gpt-index as memory / a tool for a conversational agent: https://github.com/jerryjliu/gpt_index/blob/main/examples/langchain_demo/LangchainDemo.ipynb
I haven't fully announced it yet, though!
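
A rough sketch of the pattern that notebook demonstrates: wrap the index's query method as a LangChain agent tool. The directory name, tool name, and description are placeholders, and the imports reflect the early-2023 gpt_index / langchain APIs in use at the time of this thread (both libraries have since been renamed and restructured):

```python
from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader
from langchain.agents import Tool, initialize_agent
from langchain.llms import OpenAI

# Build an in-memory vector index over exported chat logs
# (requires OPENAI_API_KEY for embeddings).
documents = SimpleDirectoryReader("slack_exports").load_data()
index = GPTSimpleVectorIndex(documents)

tools = [
    Tool(
        name="Slack History",
        func=lambda q: str(index.query(q)),
        description="Answers questions about archived Slack conversations.",
    )
]
agent = initialize_agent(tools, OpenAI(temperature=0),
                         agent="zero-shot-react-description")
agent.run("What did the team decide about the launch date?")
```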
Hey @jerryjliu0, this is awesome, thank you! How do you think this would work in a production environment? I'm trying to feed it the entire conversation history from all public channels in a workspace. Intuitively, the way I'm currently going about it is to store the entire conversation history for a single channel in one document in MongoDB. At index-creation time, would I load all documents and feed them to the index, and if so, would the index fit in memory?
And then an extension of that question (it could be a dumb one): once I create that index, do I ever have to create it again? Can I just store it somewhere and then update it as new messages are sent in the Slack workspace?
Yeah @metahash, it depends on whether you're using an in-memory index (e.g. GPTSimpleVectorIndex, GPTListIndex) or an index backed by a backend DB (e.g. GPTPineconeIndex).

For the former, you can serialize the index to a JSON file on disk and load it again (so you don't have to rebuild it).
For the latter, the index storage is already backed by a DB like Pinecone / Weaviate, so the next time you build the index you can feed in blank indices, as shown here: https://discord.com/channels/1059199217496772688/1059201661417037995/1067531191680503890

I have a TODO to add better serialization capabilities this week (e.g. serializing to MongoDB, etc.)
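
A minimal sketch of that save/reload cycle for an in-memory index, using the gpt_index 0.x method names current at the time of this thread (save_to_disk, load_from_disk, insert); the file name and message texts are placeholders:

```python
from gpt_index import Document, GPTSimpleVectorIndex

# Build once and persist to disk (requires OPENAI_API_KEY for embeddings).
index = GPTSimpleVectorIndex([Document("alice: shipping friday\nbob: sounds good")])
index.save_to_disk("slack_index.json")

# Later: reload instead of rebuilding, insert new messages incrementally,
# and re-save.
index = GPTSimpleVectorIndex.load_from_disk("slack_index.json")
index.insert(Document("carol: launch slipped to monday"))
index.save_to_disk("slack_index.json")
```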
Got it. Maybe another dumb question, but if I save the index, do I still need the underlying data, or can I discard it?
The index should contain the underlying data.