Hello, I came across the term "Act Order (desc_act)" at https://huggingface.co/TheBloke/vicuna-7B-v1.3-GPTQ in the context of quantisation. I couldn't find any information on what it is. Does anybody happen to know what it means?
Hello, I am having trouble porting my code to async. I have a chat engine initialized with streaming=True, on which I now call aquery. This still returns a StreamingResponse, whose response_gen attribute is a TokenGen, i.e. a synchronous generator. I noticed that types.py also defines a TokenAsyncGen, but I don't see a way to get one through the chat engine. Am I missing something in the library API, or is async token streaming not implemented yet, so that I have to use a thread to consume the tokens asynchronously?
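To make the thread-based fallback concrete, here is a rough sketch of what I mean. It uses only the standard library and does not touch the library API at all: iterate_in_thread and fake_token_gen are hypothetical names, and fake_token_gen is only a stand-in for the synchronous response_gen. It assumes Python 3.9+ for asyncio.to_thread.

```python
import asyncio
from typing import Any, AsyncGenerator, Iterable

# Sentinel returned by next() once the synchronous generator is exhausted.
_DONE = object()


async def iterate_in_thread(sync_gen: Iterable[Any]) -> AsyncGenerator[Any, None]:
    """Expose a blocking iterable as an async generator.

    Each blocking next() call runs in a worker thread, so the event
    loop stays free while waiting for the next token.
    """
    iterator = iter(sync_gen)
    while True:
        item = await asyncio.to_thread(next, iterator, _DONE)
        if item is _DONE:
            break
        yield item


def fake_token_gen():
    # Hypothetical stand-in for a synchronous token generator.
    for token in ["Hello", ", ", "world"]:
        yield token


async def collect():
    return [token async for token in iterate_in_thread(fake_token_gen())]


tokens = asyncio.run(collect())
print(tokens)
```

With a real response, fake_token_gen() would presumably be replaced by the response_gen of the StreamingResponse, but I have not verified that this is the intended way to do it.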