Runonthespot
Joined September 25, 2024

Runonthespot · Async

Does anyone know how to run async indexing within FastAPI? (I get a complaint that you can't start a new event loop inside an already-running event loop.)
16 comments
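A minimal sketch of the usual workaround, assuming the complaint comes from code that internally calls asyncio.run(): FastAPI already runs an event loop, so either await the library's async API directly inside the endpoint, or apply nest_asyncio so nested loops are tolerated. do_async_indexing below is a placeholder, not a real library call.

import asyncio

import nest_asyncio
from fastapi import FastAPI

nest_asyncio.apply()  # lets code that calls asyncio.run() nest inside FastAPI's running loop

app = FastAPI()

async def do_async_indexing(doc_text: str) -> int:
    # placeholder for whatever async indexing call is actually being made
    await asyncio.sleep(0)
    return len(doc_text)

@app.post("/index")
async def build_index(doc_text: str):
    # inside an async endpoint, await the async API directly rather than wrapping it in asyncio.run()
    n = await do_async_indexing(doc_text)
    return {"chars_indexed": n}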
I have a strange issue - trying to get llama-index working with Azure. I've managed to get embeddings working fine, and when I run the query I get the nodes back fine, but at the point where I read the streaming response I get an error from llms\openai.py (which is already odd, as I thought it should be using azure_openai.py?)

if (delta.role == MessageRole.ASSISTANT) and (delta.content is None):
AttributeError: 'dict' object has no attribute 'role'

I'm using
llm = AzureOpenAI(...)
embed_model = AzureOpenAIEmbeddings(...)

service_context = ServiceContext.from_defaults(embed_model=embed_model, llm=llm)
...
response = query_engine.query(query)

for text in response.response_gen:
....


I get back a response fine and response.source_nodes returns some nodes, but I can't seem to iterate response_gen.
21 comments
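A hedged sketch of a streaming Azure setup for roughly this era of llama-index (ServiceContext-based); import paths, class names (AzureOpenAIEmbedding vs AzureOpenAIEmbeddings) and the deployment/engine parameter names vary between releases, so treat them as assumptions to check against the installed version. The main point is that streaming has to be requested on the query engine for response.response_gen to yield tokens.

from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings import AzureOpenAIEmbedding
from llama_index.llms import AzureOpenAI

llm = AzureOpenAI(
    engine="my-chat-deployment",        # Azure deployment name (assumption)
    model="gpt-35-turbo",
    api_key="...",
    azure_endpoint="https://<resource>.openai.azure.com/",
    api_version="2023-07-01-preview",
)
embed_model = AzureOpenAIEmbedding(
    deployment_name="my-embedding-deployment",   # assumption
    model="text-embedding-ada-002",
    api_key="...",
    azure_endpoint="https://<resource>.openai.azure.com/",
    api_version="2023-07-01-preview",
)

service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# streaming=True is what makes response.response_gen a token generator
query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("What does the document say about X?")
for token in response.response_gen:
    print(token, end="")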
Hey folks, we're doing an update on LlamaIndex, trying to get onto the latest and greatest, and are having some trouble. Thanks to the fast pace here, we know we're pre the big set of changes (I get all that), but we definitely used load_from_string, which appears to have disappeared. As we're passing things back and forth and trying to avoid persisting to disk unnecessarily, load_from_string was great, but I don't know how to replicate it in the new setup. Maybe you can help.

BTW, we've now had 3400 corporate users use our app, with 600+ DAU, 25% Stickiness 🔥
63 comments
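A hedged sketch of one way to keep a save-to-string / load-from-string flow without touching disk, assuming the installed version exposes StorageContext.to_dict()/from_dict() (worth double-checking; the serialization helpers have moved between releases):

import json

from llama_index import Document, StorageContext, VectorStoreIndex, load_index_from_storage

index = VectorStoreIndex.from_documents([Document(text="hello world")])

# serialize: storage context -> plain dict -> JSON string you can pass around in memory
index_str = json.dumps(index.storage_context.to_dict())

# deserialize: JSON string -> dict -> StorageContext -> index, no persist_dir involved
storage_context = StorageContext.from_dict(json.loads(index_str))
restored_index = load_index_from_storage(storage_context)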
Hey, I'm trying to figure out the right combination of index types for our chatbot use case (we've already got a streaming chat client + basic Q&A with LlamaIndex working beautifully via GPTSimpleVectorIndex). I need to support two use cases basically:
1) Translate -> I need to iterate through a long doc and translate the whole thing without summarising - is this possible?
2) Summarise -> I want to figure out what index / combination of indexes I need to achieve this. I'm not clear whether I just need GPTSimpleVectorIndex with response_mode='tree_summarize' or whether I need to create both a GPTListIndex and a GPTSimpleVectorIndex. (Note this is for a single document for now.)
Any pointers welcome! I can see I can do all of these things, but it's not clear what combination is optimal for a workflow where I index (upload) and then query, ideally in either a summary mode or a Q&A mode.
6 comments
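A hedged sketch for the API era this post is using (GPT* index classes, index.query); constructor and keyword names shifted in later releases, so treat the exact signatures as assumptions. The idea: a list index walks every chunk in order (good for full-document passes like translation or a tree_summarize summary), while a vector index retrieves only the top-k chunks (good for Q&A).

from llama_index import GPTListIndex, GPTSimpleVectorIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()

# 1) Translate: the list index touches every chunk in order, so nothing gets summarised away
list_index = GPTListIndex(documents)
translation = list_index.query("Translate the following text into French. Do not summarise.")

# 2) Summarise: tree_summarize builds the answer bottom-up over all chunks
summary = list_index.query("Summarise this document.", response_mode="tree_summarize")

# 3) Q&A: the vector index only pulls the chunks most similar to the question
vector_index = GPTSimpleVectorIndex(documents)
answer = vector_index.query("What does the document say about pricing?")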
I'm using load_index_from_storage and index.storage_context.persist(...) to dynamically load and save the indexes as they're created within my API. I already do my own preprocessing steps on the documents and have created my own StorageService class that handles reading/writing files, folders, etc., so I can easily switch from local storage (local dev) to e.g. S3 or something else - but I'm not sure how this bit can work with that dependency injection approach - @Logan M any pointers 😄
4 comments
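A minimal sketch of one pattern that keeps the dependency-injected storage: persist the index into a local staging directory, then let the injected service move that folder wherever it lives (local disk for dev, S3 in prod). The upload_folder/download_folder methods are hypothetical stand-ins for the StorageService described above; newer llama-index releases also accept an fsspec filesystem on persist()/StorageContext.from_defaults(fs=...), which may remove the staging step, but check the installed version.

import tempfile

from llama_index import StorageContext, load_index_from_storage

def save_index(index, index_id: str, storage_service) -> None:
    with tempfile.TemporaryDirectory() as staging:
        index.storage_context.persist(persist_dir=staging)
        storage_service.upload_folder(staging, f"indexes/{index_id}")      # hypothetical method

def load_index(index_id: str, storage_service):
    with tempfile.TemporaryDirectory() as staging:
        storage_service.download_folder(f"indexes/{index_id}", staging)    # hypothetical method
        storage_context = StorageContext.from_defaults(persist_dir=staging)
        return load_index_from_storage(storage_context)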
Hi folks,
I'm using SimpleDirectoryReader + PDFs to chunk files up into some size, querying with GPTSimpleVectorIndex. One issue I have is that it seems quite arbitrary where the chunking happens - and that can create some very unpredictable results. If the split happens to fall in the middle of a paragraph, the embedding quality drops and it doesn't give the right answer. Adding top_k=2 (or more) doesn't help, as the paragraph is already broken.

I was wondering if there are any recommended ways of splitting PDFs into more logical chunks (pages, paragraphs), or at least of introducing a much bigger overlap between chunks. I've not been able to do this with max_chunk_overlap so far, and am considering writing my own pdf->json parser instead, but I'd love to hear if anyone else has encountered this.
47 comments
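A hedged sketch of the "pre-chunk it yourself" route: read the PDF page by page with pypdf and hand llama-index one Document per page, so chunk boundaries follow the document's own structure rather than an arbitrary token count. pypdf is an assumption (any PDF reader works), and whether the Document constructor wants extra_info or metadata depends on the llama-index version.

from pypdf import PdfReader

from llama_index import Document, GPTSimpleVectorIndex

def pdf_to_page_documents(path: str) -> list:
    reader = PdfReader(path)
    docs = []
    for page_number, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""
        if text.strip():
            # one Document per page keeps paragraph context together
            docs.append(Document(text=text, extra_info={"page": page_number}))
    return docs

documents = pdf_to_page_documents("report.pdf")
index = GPTSimpleVectorIndex(documents)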
Hi @jerryjliu0, we've had a few instances of "The model's maximum context length is 4097 tokens (3842 in your prompt, 256 for the completion), please reduce your prompt or completion length."

This is with a simple index JSON and a straightforward index.query. It doesn't appear to be possible for us to control it (we feed in the document and a one-line question) - is there something going wrong with the math that calculates the token budget? We are looking into the code ourselves to see if we can identify the issue, but would be grateful if you know anything that could help here or have an idea of what might be going wrong. I assume it's something in the question + refinement cycle that isn't counting properly.
5 comments
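A hedged sketch of manually tightening the token budget in this era of the library, assuming the PromptHelper(max_input_size, num_output, max_chunk_overlap) constructor and a prompt_helper argument on the index; those signatures moved around between releases, so verify them against the installed version. Leaving slack under the 4097-token window gives the question plus the refine template room to breathe.

from llama_index import GPTSimpleVectorIndex, PromptHelper, SimpleDirectoryReader

# reserve 256 tokens for the completion and leave headroom under the 4097-token window
prompt_helper = PromptHelper(max_input_size=3500, num_output=256, max_chunk_overlap=20)

documents = SimpleDirectoryReader("data").load_data()
index = GPTSimpleVectorIndex(documents, prompt_helper=prompt_helper)

# fewer retrieved chunks also shrinks the prompt
response = index.query("What is the termination clause?", similarity_top_k=1)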
Hi @jerryjliu0, we're trying to get gpt-index working from behind an API gateway - we noticed that it made a call to openaipublic.blob.core.windows.net:443 -> any idea what that is? Something about how the OpenAI API is implemented? (This is when indexing.)
4 comments
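For what it's worth (a fact about tiktoken rather than gpt-index itself): the tiktoken tokenizer downloads its BPE encoding files from openaipublic.blob.core.windows.net on first use, which is the usual source of that call during indexing. A hedged sketch of avoiding the outbound request from behind a gateway: pre-populate a cache directory on a machine that has access, bundle it with the app, and point tiktoken at it via TIKTOKEN_CACHE_DIR (the path below is hypothetical).

import os

# must be set before tiktoken is imported anywhere in the process
os.environ["TIKTOKEN_CACHE_DIR"] = "/app/tiktoken_cache"   # hypothetical bundled cache path

import tiktoken

enc = tiktoken.get_encoding("gpt2")   # resolved from the local cache instead of the blob store
print(len(enc.encode("hello world")))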
Do you reckon there's some hacky way I can get at the generated prompt for now? I have a big demo for our CTO and I'm super keen to use this, but it looks starkly different from the rest of the app, which is now streaming GPT-3.5-turbo and flies. If I could retrieve the prompt, I could do the rest.
16 comments
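A hedged sketch of the quickest hack for seeing the generated prompt: turn on DEBUG logging for the library's logger. In many gpt-index / llama-index versions the fully templated prompt (context + question) is written to the debug log just before the LLM call; whether a given version does is something to verify, so this is a thing to try rather than a guarantee.

import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger("gpt_index").setLevel(logging.DEBUG)     # older package name
logging.getLogger("llama_index").setLevel(logging.DEBUG)   # newer package name

# then run the usual index.query(...) and read the prompt off stdout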