Find answers from the community

donvito
Joined September 25, 2024
I tried to set this:

Plain Text
service_context = ServiceContext.from_defaults(llm='local', chunk_size_limit=3000)


but I am still getting this error with llama2-13B, the default local model:

Plain Text
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/llama_cpp/llama.py", line 900, in _create_completion
    raise ValueError(
ValueError: Requested tokens (3993) exceed context window of 3900


Any ideas what I am doing wrong?
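One possible fix, as a sketch rather than a confirmed solution (assuming a 0.6-era LlamaIndex API): tell the prompt helper the model's actual context window so retrieved chunks plus the reserved output fit inside it, and use a smaller chunk size for headroom. The 3900 figure comes from the llama.cpp error above; num_output=256 is an assumed value.

Python
from llama_index import PromptHelper, ServiceContext

# Match the context window llama.cpp reports (3900) and reserve
# room for the completion so prompt + output stay under the limit.
prompt_helper = PromptHelper(
    context_window=3900,
    num_output=256,
    chunk_overlap_ratio=0.1,
)
service_context = ServiceContext.from_defaults(
    llm="local",
    prompt_helper=prompt_helper,
    chunk_size_limit=1024,  # smaller chunks leave headroom in the prompt
)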
3 comments
Hi, I have a use case where my company wants questions answered using the exact text we feed into the LLM. Is this even possible? How can it be done? It is a document chat/query use case.
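One direction, sketched under the assumption of a 0.x LlamaIndex with VectorStoreIndex (documents stands in for your already-loaded documents): query the index, then return the retrieved source chunks verbatim alongside, or instead of, the generated answer.

Python
from llama_index import VectorStoreIndex

# documents: assumed loaded elsewhere, e.g. via SimpleDirectoryReader
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(similarity_top_k=2)
response = query_engine.query("your question here")

# The exact chunks the answer was grounded in:
for source in response.source_nodes:
    print(source.node.get_text())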
3 comments
Hi, we are trying to summarize very long text. The use case: we extract the entire chat conversation between our customer and our customer service agent, then generate a summary so it's easier to hand off to another CS agent. We tried pure OpenAI calls, but we hit the token limit even with gpt-3.5-turbo-16k. I was thinking we could use llama_index for this. Have you tried this before? Any patterns we can use?

Would really appreciate it if you could point me in the right direction. TIA!
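One pattern worth trying, as a sketch (assuming a 0.7-era LlamaIndex; long_transcript is a hypothetical string holding the full conversation): tree_summarize summarizes each chunk, then summarizes the summaries, so the whole transcript never has to fit into a single prompt.

Python
from llama_index import Document, ListIndex

doc = Document(text=long_transcript)
index = ListIndex.from_documents([doc])
# tree_summarize builds a bottom-up summary over all chunks
query_engine = index.as_query_engine(response_mode="tree_summarize")
summary = query_engine.query(
    "Summarize this conversation for handoff to another support agent."
)
print(summary)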
26 comments
I am getting this answer when using llm=ChatOpenAI. Even though I indexed my entire data set, it seems it is not added to the context. Any ideas how I can get it to answer more accurately?

Answer: The context provided is about ... Therefore, the original answer remains the same.
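That "original answer remains the same" phrasing typically comes from the refine response mode. One thing to check, sketched with the LangChain-era API used elsewhere in these threads (documents is a stand-in for your loaded data): make sure the chat model is wired into the service context, and try a compact response mode with more retrieved chunks so each LLM call actually sees the indexed context.

Python
from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor, ServiceContext, VectorStoreIndex

llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
# "compact" stuffs more retrieved text into each call than "refine"
query_engine = index.as_query_engine(similarity_top_k=3, response_mode="compact")
response = query_engine.query("your question here")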
6 comments
Hi, does GPTSimpleVectorIndex support changing the LLM predictor? I checked my usage and it is still falling back to text-davinci-003. Here's the gist of the code.

# define LLM
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="gpt-3.5-turbo", max_tokens=512))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
index = GPTSimpleVectorIndex.from_documents(
    documents, service_context=service_context
)
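A likely culprit, though only a guess from the snippet: gpt-3.5-turbo is a chat model, so it should be wrapped in LangChain's ChatOpenAI rather than the completion-style OpenAI class; otherwise the call can fall back to the default completion model, text-davinci-003. A minimal sketch of that change:

Python
from langchain.chat_models import ChatOpenAI
from llama_index import GPTSimpleVectorIndex, LLMPredictor, ServiceContext

# ChatOpenAI targets the chat-completions endpoint that
# gpt-3.5-turbo actually lives on
llm_predictor = LLMPredictor(
    llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", max_tokens=512)
)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)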
4 comments
Costs

Hi, is there a way to limit the OpenAI API tokens generated in LlamaIndex? I just want to control costs since I am exploring using my own funds. 😄
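Two levers worth trying, sketched under the assumption of a LlamaIndex version that ships the callbacks module: cap completion length with max_tokens on the LLM, and count tokens per run so spend stays visible.

Python
import tiktoken
from llama_index import ServiceContext
from llama_index.callbacks import CallbackManager, TokenCountingHandler
from llama_index.llms import OpenAI

token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo", max_tokens=256),  # hard cap on output length
    callback_manager=CallbackManager([token_counter]),
)
# after running queries against an index built with this service context:
print(token_counter.total_llm_token_count)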
2 comments