Thanks for your inputs! 95% of my documents are PowerPoints, so I was planning on chunking slide by slide and generating one embedding per slide. Is that the same concept as using sentence transformers?
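A minimal sketch of the slide-by-slide idea. It assumes the slide text has already been pulled out of each .pptx (for example with python-pptx) into a list of text boxes per slide. Chunking and embedding are separate choices. Chunking decides which unit gets one vector (here, one slide). Sentence-transformers is one family of models you could use to compute that vector. `SlideChunk`, `toy_embed`, and `chunk_deck` are hypothetical names. `toy_embed` is a hashed bag-of-words stand-in so the example runs without downloading a model; in practice you would swap in something like `SentenceTransformer(...).encode(text)`.

```python
import hashlib
import math
from dataclasses import dataclass, field


@dataclass
class SlideChunk:
    """One chunk per slide, remembering which deck and slide it came from."""
    deck: str
    slide_number: int
    text: str
    embedding: list[float] = field(default_factory=list)


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hypothetical stand-in for a real embedding model (e.g. sentence-transformers).

    Hashes each lowercase token into one of `dim` buckets, counts hits,
    and L2-normalizes the resulting vector.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def chunk_deck(deck: str, slides: list[list[str]]) -> list[SlideChunk]:
    """Build one chunk per slide; each slide is given as a list of text-box strings."""
    chunks = []
    for number, text_boxes in enumerate(slides, start=1):
        text = "\n".join(box.strip() for box in text_boxes if box.strip())
        if not text:
            continue  # skip slides with no text (e.g. image-only slides)
        chunks.append(SlideChunk(deck, number, text, toy_embed(text)))
    return chunks


# Usage: a three-slide deck whose second slide has no text.
slides = [["Project goals", "Reduce onboarding time"], [], ["Roadmap", "EU launch next quarter"]]
chunks = chunk_deck("kickoff.pptx", slides)
for chunk in chunks:
    print(chunk.deck, chunk.slide_number, len(chunk.embedding))
```

Keeping the deck name and slide number on each chunk makes it easy to point back to the source slide when an answer is generated.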

My main question, though, is what the GPT-Index index structure should be. Because of the vast amount of data, would I need to go in a multi-level tree direction? Would this hinder performance?
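One way to picture a "multi-level" structure and its performance trade-off is a two-level lookup: score each deck first, then score only the slides inside the best decks. This is a hypothetical sketch, not GPT-Index's tree index. As far as I know, the tree index builds LLM-written summaries at each level; here each deck is instead represented by the centroid of its slide embeddings. `two_level_search`, `top_decks`, and `top_slides` are illustrative names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def centroid(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def two_level_search(decks: dict[str, list[list[float]]], query: list[float],
                     top_decks: int = 2, top_slides: int = 3) -> list[tuple[str, int, float]]:
    """Hypothetical two-level lookup: rank decks by centroid, then slides within the best decks."""
    # Level 1: one comparison per deck.
    best_decks = sorted(
        ((cosine(centroid(vecs), query), name) for name, vecs in decks.items() if vecs),
        reverse=True,
    )[:top_decks]
    # Level 2: compare against slides only in the selected decks.
    hits = []
    for _, name in best_decks:
        for i, vec in enumerate(decks[name]):
            hits.append((name, i, cosine(vec, query)))
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits[:top_slides]


# Usage with tiny 3-dimensional "embeddings".
decks = {
    "sales": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "hr": [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]],
    "eng": [[0.0, 0.0, 1.0]],
}
print(two_level_search(decks, [1.0, 0.0, 0.0], top_decks=1, top_slides=2))
```

The speed-up comes from skipping slides in low-ranked decks. The cost is that a relevant slide in one of those decks is never scored, so recall can drop. That is the main thing to measure against a flat index.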

failfast

We can continue discussing here so as not to overcrowd the main channel, if you'd like.

ravitheja

Ohh, sorry.

ravitheja

I already replied there.

failfast

No problem, thanks for the reply.

ravitheja

But maybe you could simply start with GPTSimpleVectorIndex and see what the results look like.
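A sketch of what a simple (flat) vector index does conceptually: keep every embedding, compare the query against all of them, and return the top k. This is plain Python for illustration, not GPT-Index's implementation; `flat_top_k` and the chunk IDs are hypothetical.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def flat_top_k(index: list[tuple[str, list[float]]], query: list[float],
               k: int = 3) -> list[tuple[float, str]]:
    """Exact search: score every stored chunk against the query and keep the k best."""
    scored = ((cosine(vec, query), chunk_id) for chunk_id, vec in index)
    return heapq.nlargest(k, scored)


# Usage: three slide embeddings, query closest to the first one.
index = [
    ("deck1/slide1", [1.0, 0.0]),
    ("deck1/slide2", [0.6, 0.8]),
    ("deck2/slide1", [0.0, 1.0]),
]
for score, chunk_id in flat_top_k(index, [1.0, 0.0], k=2):
    print(f"{score:.2f} {chunk_id}")
```

Each query costs one similarity computation per stored chunk, so with one chunk per slide the cost grows linearly with the slide count. The result is exact, which makes it a useful baseline before trying anything more elaborate.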

failfast

Yeah, I think that's a good plan. I'll start simple and move to more robust structures until the results are satisfactory.

Jeremy

I think ANN (approximate nearest neighbor) search is pretty good for these vector stores. You should try the "naive" approach first and then refine as necessary. Simple is always easier to maintain 🙂
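ANN (approximate nearest neighbor) search trades a little accuracy for not scanning every vector. Production vector stores typically use structures such as HNSW graphs (for example in FAISS or hnswlib). This sketch swaps in a simpler classic technique, random-hyperplane locality-sensitive hashing, purely to show the idea. `HyperplaneLSH` and its parameters are hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


class HyperplaneLSH:
    """Hypothetical ANN sketch using random-hyperplane hashing.

    Each vector gets a bit signature recording which side of each random
    hyperplane it falls on. A query is only compared with vectors that share
    its signature (its bucket), instead of with every stored vector.
    """

    def __init__(self, dim: int, n_planes: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
        self.buckets = defaultdict(list)  # signature -> [(chunk_id, vector), ...]

    def _signature(self, vec: list[float]) -> tuple[int, ...]:
        return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0) for plane in self.planes)

    def add(self, chunk_id: str, vec: list[float]) -> None:
        self.buckets[self._signature(vec)].append((chunk_id, vec))

    def query(self, vec: list[float], k: int = 3) -> list[str]:
        # Only the query's own bucket is scanned; neighbors hashed elsewhere are missed.
        candidates = self.buckets.get(self._signature(vec), [])
        ranked = sorted(candidates, key=lambda item: cosine(item[1], vec), reverse=True)
        return [chunk_id for chunk_id, _ in ranked[:k]]


# Usage with tiny 4-dimensional "embeddings".
lsh = HyperplaneLSH(dim=4, n_planes=6, seed=42)
lsh.add("slide-a", [1.0, 0.0, 0.0, 0.0])
lsh.add("slide-b", [0.0, 1.0, 0.0, 0.0])
lsh.add("slide-c", [0.0, 0.0, 1.0, 0.0])
lsh.add("slide-d", [0.9, 0.1, 0.0, 0.0])
print(lsh.query([1.0, 0.0, 0.0, 0.0], k=2))
```

With more hyperplanes the buckets get smaller and queries get faster, but near neighbors are more likely to land in a different bucket and be missed. Real ANN indexes use several hash tables or a graph to win that recall back.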