Questions

For SubQuestionQueryEngine, it's great when you need to generate questions, but what if I already have my questions beforehand?
In that case is it better to loop over the questions and just use index._as_query_engine and ask each question? I want it to run in parallel if possible though like SubQuestionQueryEngine. If there are any best practices for that I'd love to know! Thanks.
If you have the questions already, you can use aquery and run the async queries concurrently with asyncio.gather
Sequential is too slow
Do you have an example on how to do that?

Is it something like this ?

Plain Text
tasks = [
    async_query(base_retriever.query, query_1),
    async_query(base_retriever.query, query_2),
    async_query(base_retriever.query, query_3),
    async_query(query_engine.query, sub_question_query_4),
]

# Wait for all tasks to complete
responses = await asyncio.gather(*tasks)
I also don't know if for the questions I should use as_retriever vs as_query_engine
Pretty much

You can use query_engine.aquery(sub_question) for async queries

Depends on how you want to do it. Either get all query engine responses and do one llm call to combine them, or retrieve multiple nodes and filter them down before some final response synthesizer
aretrieve also exists for retrievers
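The aquery + asyncio.gather advice above can be sketched with the stdlib alone; `fake_aquery` here is a hypothetical stand-in for `query_engine.aquery` / `retriever.aretrieve`, just so the example runs without LlamaIndex installed:

```python
import asyncio

# Hypothetical stand-in for query_engine.aquery / retriever.aretrieve
# (the real async LlamaIndex methods); simulates an LLM/network call.
async def fake_aquery(question: str) -> str:
    await asyncio.sleep(0.01)
    return f"answer to: {question}"

async def run_queries(questions: list[str]) -> list[str]:
    # Launch every sub-query at once; gather preserves input order.
    return await asyncio.gather(*(fake_aquery(q) for q in questions))

responses = asyncio.run(run_queries(["q1", "q2", "q3"]))
```

With the real library you would swap `fake_aquery(q)` for `query_engine.aquery(q)` and await the gather inside your own async function.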
The reason I wanted to separate it out is because I seem to get clearer responses when I break it down. When there are too many questions, I feel the refine and compact final synthesis kind of blurs it too much (gut feeling)
ok I'll try to use aquery! thanks.
very stupid question but

Plain Text
query_engine = SubQuestionQueryEngine.from_defaults(
  query_engine_tools=query_engine_tools,
  service_context=service_context,
  use_async=True
)


Does response = query_engine.query(...) ask/answer the questions sequentially after generating the sub-questions?

vs

Will response = await query_engine.aquery(...) automatically ask/answer the questions in parallel after sub-question generation?
I assumed just passing in use_async=True would make it async πŸ˜…
use_async=True uses async under the hood to run sub-queries concurrently, but the top-level function API is still synchronous -- I hope that makes some sense πŸ˜…
use_async=True is the default I think, so both are equivalent here -- the only difference is whether the top level is async or not
ok so basically until all async code within subqueries resolve/finish, it won't move on to the next line of code on the top-level. got it thanks
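That blocking behavior can be shown with a minimal stdlib timing sketch (`sub_query` is a hypothetical stand-in for one sub-question call): four ~50 ms sub-queries resolve together in roughly 50 ms, yet the top-level call does not return until all of them finish.

```python
import asyncio
import time

# Hypothetical stand-in for one sub-question; pretends it takes ~50 ms.
async def sub_query(i: int) -> int:
    await asyncio.sleep(0.05)
    return i

async def query_all() -> list[int]:
    # use_async=True style: sub-queries run concurrently...
    return await asyncio.gather(*(sub_query(i) for i in range(4)))

start = time.perf_counter()
results = asyncio.run(query_all())  # ...but this line blocks until all resolve
elapsed = time.perf_counter() - start  # ~0.05 s total, not 4 x 0.05 s
```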
sorry, a completely diff question... I'm migrating some Postgres data over to a vector db, and I basically have 2 MB of data when exported to CSV.
  • Each row in Postgres is basically 1-5 sentences, chunked by 512.
I embed it with a 768-dimension embedding in Qdrant, and the disk usage on that is around 38 MB.

Is that typical in that Vector DB would just naturally take up way more space per record?

If a chunk is gonna be 768 dimensions regardless, technically stuffing way bigger chunks that are pages long would be more ideal from a disk-usage perspective, right?

But then I see LlamaIndex promoting advanced RAG techniques like Sentence Window retrieval, making me think it's better to chunk by sentence, but that would result in even way more disk usage?

Is that just the nature of vector dbs?
That's pretty much the nature of vector dbs

There's definitely some vector dbs that do a better job at compression though
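A back-of-envelope sketch of the storage math above, assuming float32 embeddings, no quantization, and reading "chunked by 512" as roughly 512 bytes of text per chunk:

```python
# Assumptions: float32 embeddings, no quantization, ~512 bytes of text per chunk.
DIM = 768
BYTES_PER_FLOAT32 = 4
vector_bytes = DIM * BYTES_PER_FLOAT32   # 3072 bytes per vector, regardless of chunk size

text_bytes = 2 * 1024 * 1024             # ~2 MB CSV export
chunk_bytes = 512
num_chunks = text_bytes // chunk_bytes   # 4096 chunks
raw_vectors = num_chunks * vector_bytes  # ~12 MB of raw vectors alone

# Each 512-byte chunk carries a 3 KB vector (~6x the text), before the
# index structures and stored payload that push on-disk size higher still.
ratio = vector_bytes / chunk_bytes
```

Under these assumptions the vectors alone are ~6x the source text, which is why a 2 MB export can plausibly become ~38 MB once index overhead and payload copies are added, and why bigger chunks cost less disk per byte of text.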