Rubenator
Joined September 25, 2024
So uh, what setting field should we use to let OpenAI know the max completion tokens we want? I just tried setting max_tokens in the LLM, but it just... stopped mid-sentence? 🤔 Is that normal? Or is there another setting?
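Here's roughly what I tried, as a sketch (the model name and token value are just placeholders, and I'm assuming max_tokens is even the right knob here):
Python
from llama_index.llms import OpenAI

# max_tokens caps the completion length; a low value seems to just cut the
# answer off mid-sentence rather than making the model wrap up early
llm = OpenAI(model="gpt-3.5-turbo", max_tokens=256)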
2 comments
Got a few questions:
1) Is there anything special I need to do to go from an as_query_engine call to an as_chat_engine call?
meaning:
Python
query_engine = index.as_query_engine(
    node_postprocessors=[
        SentenceEmbeddingOptimizer(
            threshold_cutoff=threshold_cutoff,
            percentile_cutoff=percentile_cutoff,
        )
    ],
    retriever_mode="embedding",
    service_context=service_context,
    similarity_top_k=similarity_top_k,
    streaming=True,
    text_qa_template=qa_template,
)
If I just change that to .as_chat_engine, will all those features work just fine?
2) If I'm setting streaming=True in the above (#1), then why do I need to call .stream_chat instead of .chat? 🤔 Shouldn't it already know that?
3) My coworker attempted to use the class directly:
Python
chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=query_engine,
    condense_question_prompt=custom_prompt,
    streaming=True
)
but it is unhappy about the return value from .stream_chat not being iterable (meaning it is not a streaming response), so... is that just not the proper way to do that?
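For reference, this is the rough streaming flow we're after for #2/#3 -- just a sketch, and the .response_gen part is a guess at the API on my end, not something I've confirmed:
Python
from llama_index.chat_engine import CondenseQuestionChatEngine

query_engine = index.as_query_engine(streaming=True)  # same index/settings as above
chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=query_engine,
    condense_question_prompt=custom_prompt,
)

# we expected the return value to be directly iterable, but maybe we're
# supposed to iterate .response_gen on the streaming response instead?
streaming_response = chat_engine.stream_chat("follow-up question here")
for token in streaming_response.response_gen:
    print(token, end="")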
100 comments
Uhm... v0.7.4 seems to be including extra text before the actual response from the LLM 🤔 -- does this have anything to do with your num_output changes, Logan? 🤔 (Or is this just... OpenAI leaking other people's data? XD)
84 comments
Bug
Also, another issue/question: I am getting this error, I'm not sure what is causing it, and I wasn't getting it before afaik:
Plain Text
replace() should be called on dataclass instances
#--my stuff here, and then my call into llama_index:
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, num_output=700)
File "/opt/python/llama_index/indices/service_context.py", line 140, in from_defaults
prompt_helper = prompt_helper or _get_default_prompt_helper(
File "/opt/python/llama_index/indices/service_context.py", line 44, in _get_default_prompt_helper
llm_metadata = dataclasses.replace(llm_metadata, num_output=num_output)
File "/var/lang/lib/python3.10/dataclasses.py", line 1424, in replace
raise TypeError("replace() should be called on dataclass instances")
TypeError: replace() should be called on dataclass instances
Seems like a bug to me, but correct me if I'm wrong.
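Fwiw, a workaround sketch that occurred to me (not sure it's the right fix): since the replace() call only happens when ServiceContext builds its default PromptHelper, passing one explicitly seems to sidestep that code path. The context_window value is just a placeholder for our model:
Python
from llama_index import PromptHelper, ServiceContext

prompt_helper = PromptHelper(context_window=4096, num_output=700)
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    prompt_helper=prompt_helper,  # skips _get_default_prompt_helper and its dataclasses.replace()
)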
50 comments
Is there somewhere I'm supposed to be able to find out when stuff gets moved around and where it went? :x
2 comments
Another question...
We're noticing that some of our documents are getting split into multiple entries in the database...
For example... we've got a post that goes on for ~650 words in one entry, stopping 51 words from the end of the post, and then another entry that contains the last 67 words of the post.
Is there a reason for this? And is there a setting for this?
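In case it matters, here's roughly how we're building the index -- a sketch with guessed values, since I'm assuming it's the default node parser's chunk_size that decides where posts get cut (documents here is just our loaded posts):
Python
from llama_index import ServiceContext, VectorStoreIndex

# chunk_size is measured in tokens, not words; presumably the default value
# is what produced the ~650-word entries plus the small leftover chunk
service_context = ServiceContext.from_defaults(chunk_size=1024)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)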
17 comments
So... the documentation on the newly renamed MongoDBAtlasVectorSearch is... not terribly comprehensive.
Unless I'm missing something... I do not see anything about what settings should/could be used to optimize the index properly... or whether or not it will create an index called default on its own (because looking at the source code, that's what I'm seeing)... so... how am I supposed to use this feature properly? 🤔
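For context, this is the rough wiring I've pieced together from the source -- the db/collection names are made up, and I'm only guessing that the index_name default of "default" is what it expects to exist on the Atlas side:
Python
import pymongo
from llama_index import StorageContext, VectorStoreIndex
from llama_index.vector_stores import MongoDBAtlasVectorSearch

mongo_client = pymongo.MongoClient("mongodb+srv://<user>:<password>@<cluster-uri>")
vector_store = MongoDBAtlasVectorSearch(
    mongo_client,
    db_name="my_db",                  # made-up names
    collection_name="my_collection",
    index_name="default",             # the source appears to fall back to "default" if unset
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)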
18 comments
I'm having trouble finding the list of (I'm going to call them) "levers" LlamaIndex provides for chat_history. Like, how much history is used... which parts of history are used (such as... sentence similarity possibly? 🤷‍♂️)... how long ago and/or how many tokens ago do I start forgetting things... etc. -- just, what functions/features/etc. are provided that I can leverage (🤭) to reduce/limit/optimize token usage costs?
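To make it concrete, something like this is what I'm hoping exists -- ChatMemoryBuffer and token_limit are my guesses at the API, not something I've confirmed for the version we're on:
Python
from llama_index.memory import ChatMemoryBuffer

# cap how much chat history gets stuffed back into the prompt, in tokens
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)
chat_engine = index.as_chat_engine(memory=memory)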
26 comments