Rubenator
Joined September 25, 2024
So uh, what setting field should we use to let OpenAI know the max completion tokens we want? I just tried setting max_tokens in the LLM, but it just... stopped mid-sentence? 🤔 Is that normal? Or is there another setting?
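Here's roughly what I tried, as a sketch (the model name and token value are just placeholders, and I'm assuming max_tokens is even the right knob here):
Python
from llama_index.llms import OpenAI

# max_tokens caps the completion length; a low value seems to just cut the
# answer off mid-sentence rather than making the model wrap up early
llm = OpenAI(model="gpt-3.5-turbo", max_tokens=256)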
2 comments
Got a few questions:
1) Is there anything special I need to do to go from an as_query_engine call to an as_chat_engine call?
meaning:
Python
query_engine = index.as_query_engine(
    node_postprocessors=[
        SentenceEmbeddingOptimizer(
            threshold_cutoff=threshold_cutoff,
            percentile_cutoff=percentile_cutoff,
        )
    ],
    retriever_mode="embedding",
    service_context=service_context,
    similarity_top_k=similarity_top_k,
    streaming=True,
    text_qa_template=qa_template,
)
If I just change that to .as_chat_engine, will all those features work just fine?
2) If I'm setting streaming=True in the above (#1), then why do I need to call .stream_chat instead of .chat? 🤔 Shouldn't it already know that?
3) My coworker attempted to use the class directly:
Python
chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=query_engine,
    condense_question_prompt=custom_prompt,
    streaming=True
)
but it is unhappy about the return value from .stream_chat not being iterable (meaning it is not a streaming response), so... is that just not the proper way to do that?
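For reference, this is the rough streaming flow we're after for #2/#3 -- just a sketch, and the .response_gen part is a guess at the API on my end, not something I've confirmed:
Python
from llama_index.chat_engine import CondenseQuestionChatEngine

query_engine = index.as_query_engine(streaming=True)  # same index/settings as above
chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=query_engine,
    condense_question_prompt=custom_prompt,
)

# we expected the return value to be directly iterable, but maybe we're
# supposed to iterate .response_gen on the streaming response instead?
streaming_response = chat_engine.stream_chat("follow-up question here")
for token in streaming_response.response_gen:
    print(token, end="")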
100 comments
Uhm... v0.7.4 seems to be including extra text before the actual response from the LLM 🤔 -- does this have anything to do with your num_output changes, Logan? 🤔 (Or is this just... OpenAI leaking other people's data? XD)
84 comments
Bug
Also, another issue/question: I am getting this error, I'm not sure what is causing it, and I wasn't getting it before afaik:
Plain Text
replace() should be called on dataclass instances
#--my stuff here, and then my call into llama_index:
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, num_output=700)
File "/opt/python/llama_index/indices/service_context.py", line 140, in from_defaults
prompt_helper = prompt_helper or _get_default_prompt_helper(
File "/opt/python/llama_index/indices/service_context.py", line 44, in _get_default_prompt_helper
llm_metadata = dataclasses.replace(llm_metadata, num_output=num_output)
File "/var/lang/lib/python3.10/dataclasses.py", line 1424, in replace
raise TypeError("replace() should be called on dataclass instances")
TypeError: replace() should be called on dataclass instances
Seems like a bug to me, but correct me if I'm wrong.
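Fwiw, a workaround sketch that occurred to me (not sure it's the right fix): since the replace() call only happens when ServiceContext builds its default PromptHelper, passing one explicitly seems to sidestep that code path. The context_window value is just a placeholder for our model:
Python
from llama_index import PromptHelper, ServiceContext

prompt_helper = PromptHelper(context_window=4096, num_output=700)
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    prompt_helper=prompt_helper,  # skips _get_default_prompt_helper and its dataclasses.replace()
)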
50 comments
Is there somewhere I'm supposed to be able to find out when stuff gets moved around and where it went? :x
2 comments
Another question...
We're noticing that some of our documents are getting split into multiple entries in the database...
For example... we've got a post that goes on for ~650 words in one entry, stopping 51 words from the end of the post, and then another entry that contains the last 67 words of the post.
Is there a reason for this? And is there a setting for this?
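In case it matters, here's roughly how we're building the index -- a sketch with guessed values, since I'm assuming it's the default node parser's chunk_size that decides where posts get cut (documents here is just our loaded posts):
Python
from llama_index import ServiceContext, VectorStoreIndex

# chunk_size is measured in tokens, not words; presumably the default value
# is what produced the ~650-word entries plus the small leftover chunk
service_context = ServiceContext.from_defaults(chunk_size=1024)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)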
17 comments
So... the documentation on the newly renamed MongoDBAtlasVectorSearch is... not terribly comprehensive.
Unless I'm missing something... I do not see anything about what settings should/could be used to optimize the index properly... or whether or not it will create an index called default on its own (because looking at the source code, that's what I'm seeing)... so... how am I supposed to use this feature properly? 🤔
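For context, this is the rough wiring I've pieced together from the source -- the db/collection names are made up, and I'm only guessing that the index_name default of "default" is what it expects to exist on the Atlas side:
Python
import pymongo
from llama_index import StorageContext, VectorStoreIndex
from llama_index.vector_stores import MongoDBAtlasVectorSearch

mongo_client = pymongo.MongoClient("mongodb+srv://<user>:<password>@<cluster-uri>")
vector_store = MongoDBAtlasVectorSearch(
    mongo_client,
    db_name="my_db",                  # made-up names
    collection_name="my_collection",
    index_name="default",             # the source appears to fall back to "default" if unset
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)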
18 comments
I'm having trouble finding the list of (I'm going to call them) "levers" LlamaIndex provides for chat_history. Like, how much history is used... which parts of history are used (such as... sentence similarity possibly? 🤷‍♂️)... how long ago and/or how many tokens ago do I start forgetting things... etc. -- just, what functions/features/etc. are provided that I can leverage (🤭) to reduce/limit/optimize token usage costs?
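To make it concrete, something like this is what I'm hoping exists -- ChatMemoryBuffer and token_limit are my guesses at the API, not something I've confirmed for the version we're on:
Python
from llama_index.memory import ChatMemoryBuffer

# cap how much chat history gets stuffed back into the prompt, in tokens
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)
chat_engine = index.as_chat_engine(memory=memory)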
26 comments