Questions

For SubQuestionQueryEngine, it's great when you need to generate questions, but what if I already have my questions beforehand?
In that case is it better to loop over the questions and just use index._as_query_engine and ask each question? I want it to run in parallel if possible though like SubQuestionQueryEngine. If there are any best practices for that I'd love to know! Thanks.
If you have the questions already, you can use aquery and run the async queries concurrently with asyncio.gather
Sequential is too slow
Do you have an example on how to do that?

Is it something like this ?

Plain Text
tasks = [
    async_query(base_retriever.query, query_1),
    async_query(base_retriever.query, query_2),
    async_query(base_retriever.query, query_3),
    async_query(query_engine.query, sub_question_query_4),
]

# Wait for all tasks to complete
responses = await asyncio.gather(*tasks)
I also don't know if for the questions I should use as_retriever vs as_query_engine
Pretty much

You can use query_engine.aquery(sub_question) for async queries

Depends on how you want to do it. Either get all query engine responses and do one llm call to combine them, or retrieve multiple nodes and filter them down before some final response synthesizer
aretrieve also exists for retrievers
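The aquery + asyncio.gather advice above can be sketched with the stdlib alone; `fake_aquery` here is a hypothetical stand-in for `query_engine.aquery` / `retriever.aretrieve`, just so the example runs without LlamaIndex installed:

```python
import asyncio

# Hypothetical stand-in for query_engine.aquery / retriever.aretrieve
# (the real async LlamaIndex methods); simulates an LLM/network call.
async def fake_aquery(question: str) -> str:
    await asyncio.sleep(0.01)
    return f"answer to: {question}"

async def run_queries(questions: list[str]) -> list[str]:
    # Launch every sub-query at once; gather preserves input order.
    return await asyncio.gather(*(fake_aquery(q) for q in questions))

responses = asyncio.run(run_queries(["q1", "q2", "q3"]))
```

With the real library you would swap `fake_aquery(q)` for `query_engine.aquery(q)` and await the gather inside your own async function.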
The reason I wanted to separate it out is because I seem to get clearer responses when I break it down. When there are too many questions, I feel the refine and compact final synthesis kind of blurs it too much (gut feeling)
ok I'll try to use aquery! thanks.
very stupid question but

Plain Text
query_engine = SubQuestionQueryEngine.from_defaults(
  query_engine_tools=query_engine_tools,
  service_context=service_context,
  use_async=True
)


Does response = query_engine.query(...) ask/answer the questions sequentially after generating the sub-questions?

vs

Will response = await query_engine.aquery(...) automatically ask/answer the questions in parallel after sub-question generation?
I assumed just passing in use_async=True would make it async πŸ˜…
use_async=True uses async under the hood to run sub-queries concurrently, but the top-level function API is still synchronous -- I hope that makes some sense πŸ˜…
use_async=True is the default I think, so both are equivalent here -- the only difference is whether the top level is async or not
ok so basically until all async code within subqueries resolve/finish, it won't move on to the next line of code on the top-level. got it thanks
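That blocking behavior can be shown with a minimal stdlib timing sketch (`sub_query` is a hypothetical stand-in for one sub-question call): four ~50 ms sub-queries resolve together in roughly 50 ms, yet the top-level call does not return until all of them finish.

```python
import asyncio
import time

# Hypothetical stand-in for one sub-question; pretends it takes ~50 ms.
async def sub_query(i: int) -> int:
    await asyncio.sleep(0.05)
    return i

async def query_all() -> list[int]:
    # use_async=True style: sub-queries run concurrently...
    return await asyncio.gather(*(sub_query(i) for i in range(4)))

start = time.perf_counter()
results = asyncio.run(query_all())  # ...but this line blocks until all resolve
elapsed = time.perf_counter() - start  # ~0.05 s total, not 4 x 0.05 s
```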
sorry, a completely diff question... I'm migrating some Postgres data over to a vector db, and I basically have 2 MB of data when exported to CSV.
  • Each row in Postgres is basically 1-5 sentences, chunked by 512.
I embed it with a 768-dimension embedding in Qdrant, and the disk usage on that is around 38 MB.

Is that typical in that Vector DB would just naturally take up way more space per record?

If a chunk is gonna be 768 dimensions regardless, technically stuffing way bigger chunks that are pages long would be more ideal from a disk-usage perspective, right?

But then I see LlamaIndex promoting advanced RAG techniques like Sentence Window retrieval, making me think it's better to chunk by sentence, but that would result in even way more disk usage?

Is that just the nature of vector dbs?
That's pretty much the nature of vector dbs

There's definitely some vector dbs that do a better job at compression though
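A back-of-envelope sketch of the storage math above, assuming float32 embeddings, no quantization, and reading "chunked by 512" as roughly 512 bytes of text per chunk:

```python
# Assumptions: float32 embeddings, no quantization, ~512 bytes of text per chunk.
DIM = 768
BYTES_PER_FLOAT32 = 4
vector_bytes = DIM * BYTES_PER_FLOAT32   # 3072 bytes per vector, regardless of chunk size

text_bytes = 2 * 1024 * 1024             # ~2 MB CSV export
chunk_bytes = 512
num_chunks = text_bytes // chunk_bytes   # 4096 chunks
raw_vectors = num_chunks * vector_bytes  # ~12 MB of raw vectors alone

# Each 512-byte chunk carries a 3 KB vector (~6x the text), before the
# index structures and stored payload that push on-disk size higher still.
ratio = vector_bytes / chunk_bytes
```

Under these assumptions the vectors alone are ~6x the source text, which is why a 2 MB export can plausibly become ~38 MB once index overhead and payload copies are added, and why bigger chunks cost less disk per byte of text.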