I am looking at a text extraction use case. Essentially there is a large corpus; I am doing semantic chunking and storing it in a vector DB. I want to retrieve the top_k matches from the index, re-rank them, and organize the insights into a document by topic. Of course I can raw dog all of this with LLMs, but I am trying to figure out if any of the llamaindex abstractions might be useful beyond chunking and indexing. For reference, top_k here might be on the order of ~1000. There is a summarization element to it, but the idea is not to dump everything into the context window at once and do some generation.
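In case it helps frame the question: the re-rank-then-group stage can be sketched in plain Python with a pluggable scorer standing in for a real cross-encoder (llamaindex wraps similar logic in its "node postprocessors", e.g. a sentence-transformer reranker). Everything here — the `Chunk` shape, the `overlap_scorer` — is illustrative, not llamaindex API.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    topic: str   # assigned by your topic-extraction step
    score: float # similarity score from the vector index

def rerank(chunks, query, top_n, scorer):
    # scorer is a pluggable cross-encoder stand-in: (query, text) -> float
    rescored = sorted(chunks, key=lambda c: scorer(query, c.text), reverse=True)
    return rescored[:top_n]

def group_by_topic(chunks):
    # Organize surviving chunks into document sections keyed by topic.
    sections = {}
    for c in chunks:
        sections.setdefault(c.topic, []).append(c.text)
    return sections

# Toy scorer: keyword overlap instead of a real cross-encoder.
def overlap_scorer(query, text):
    return len(set(query.lower().split()) & set(text.lower().split()))

hits = [
    Chunk("llamaindex supports reranking via node postprocessors", "retrieval", 0.71),
    Chunk("weather was nice in Toronto", "misc", 0.55),
    Chunk("rerank the retrieved nodes with a cross encoder", "retrieval", 0.64),
]
top = rerank(hits, "rerank retrieved nodes", top_n=2, scorer=overlap_scorer)
doc = group_by_topic(top)
```

With top_k ~1000 the rerank step is usually the bottleneck, so batching the scorer calls matters more than the grouping.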
I am looking to see if llamaindex is a good fit for my use case. Essentially, instead of a traditional RAG setup, I want to traverse all the chunks in my corpus and extract any information they mention about a specific topic. I want to append each extraction to a document and then further create a summary of summaries, or perhaps do topic extraction, etc.
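What you're describing is essentially a map-reduce over chunks, which is the same shape as llamaindex's `tree_summarize` response mode (and roughly what `DocumentSummaryIndex` automates). A minimal offline sketch, with a stub standing in for the LLM call — the prompts and the `batch_size` are assumptions, not llamaindex internals:

```python
def extract(chunk, topic, llm):
    # Map step: ask the LLM what this chunk says about the topic.
    # llm is a pluggable callable (prompt -> str); stubbed below.
    return llm(f"What does this passage say about {topic}?\n\n{chunk}")

def summarize_tree(notes, llm, batch_size=5):
    # Reduce step: repeatedly fold batches of notes into combined
    # summaries until a single summary remains.
    while len(notes) > 1:
        notes = [
            llm("Combine these notes:\n" + "\n".join(notes[i:i + batch_size]))
            for i in range(0, len(notes), batch_size)
        ]
    return notes[0]

# Stub "LLM" so the sketch runs offline: just truncates its input.
stub_llm = lambda prompt: prompt[:80]

chunks = [f"chunk {i} mentions the topic" for i in range(12)]
notes = [extract(c, "pricing", stub_llm) for c in chunks]
final = summarize_tree(notes, stub_llm, batch_size=4)
```

The point is that no single call ever sees the whole corpus: each reduce call only sees one batch of notes, which is exactly the "don't dump everything into the context window" constraint.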
I am just starting my journey with llamaindex. Can you folks recommend a vector store to go with it? I started with Weaviate and am running into some issues... not sure if the support level varies across the different alternatives.
Hopefully this is a quick question, but I could not find much info in the docs right away. I want to use a search engine API like SerpAPI or Serper and use those results as context for answering questions. Ideally, the links could be crawled through headless Chrome as well and pumped into the context window. Is there anything close to this out of the box in llamaindex? Latency is not an issue, as I can batch my requests.
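For the "search hits plus crawled pages into context" part, the plumbing is simple enough to sketch in plain Python with a pluggable fetcher (a headless-Chrome wrapper in production, a stub here so it runs offline). llamaindex does ship web page readers in its integrations packages, but the function below and its argument shapes are my own illustration, not its API:

```python
def build_context(results, fetch, max_chars=4000):
    # results: search-API hits as dicts with "url" and "snippet" keys
    # fetch:   pluggable page fetcher (url -> text); returns "" on failure
    parts = []
    for r in results:
        body = fetch(r["url"]) or r["snippet"]  # fall back to the snippet
        parts.append(f"Source: {r['url']}\n{body}")
    # Crude context-window guard: hard truncation at max_chars.
    return "\n\n---\n\n".join(parts)[:max_chars]

# Offline stub in place of a real headless-Chrome crawler.
pages = {"https://example.com/a": "full page text about the query"}
stub_fetch = lambda url: pages.get(url, "")

hits = [
    {"url": "https://example.com/a", "snippet": "short snippet A"},
    {"url": "https://example.com/b", "snippet": "short snippet B"},
]
context = build_context(hits, stub_fetch)
```

Falling back to the search snippet when a crawl fails keeps the pipeline from stalling on paywalled or JS-broken pages; since latency isn't a concern for you, the fetcher can be a batched queue of browser jobs.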