Please stand by @openmind as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
To use GPT-4 as the LLM for a QA chatbot over your documents, you can follow these steps:
- Load your documents: Read your documents into a suitable data structure. For example:
from llama_index.readers import SimpleDirectoryReader
reader = SimpleDirectoryReader("./data/your_directory/")
documents = reader.load_data()
- Set up the LLM: Configure GPT-4 as your LLM and wrap it in a service context. For example:
from llama_index.llms import OpenAI
from llama_index import ServiceContext
llm = OpenAI(temperature=0, model="gpt-4")
service_context = ServiceContext.from_defaults(llm=llm)
- Create an index from your documents: Build an index over your documents using the service context you just set up. For example:
from llama_index import SummaryIndex
gpt4_index = SummaryIndex.from_documents(documents, service_context=service_context)
- Set up the query engine: Create a query engine from the index. For example:
query_engine = gpt4_index.as_query_engine(response_mode="tree_summarize")
- Query the engine: Finally, query the engine with your question. For example:
response = query_engine.query("Your question here")
Please replace "./data/your_directory/" and "Your question here" with your actual directory and question. A combined sketch of all the steps follows below.
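Putting the steps together, a minimal end-to-end sketch might look like this (the path and the question are placeholders, and the imports match the snippets above, i.e. a pre-0.10 version of llama-index):

from llama_index.readers import SimpleDirectoryReader
from llama_index.llms import OpenAI
from llama_index import ServiceContext, SummaryIndex

# Load documents, use GPT-4 as the LLM, build the index, and query it
documents = SimpleDirectoryReader("./data/your_directory/").load_data()
service_context = ServiceContext.from_defaults(llm=OpenAI(temperature=0, model="gpt-4"))
gpt4_index = SummaryIndex.from_documents(documents, service_context=service_context)
query_engine = gpt4_index.as_query_engine(response_mode="tree_summarize")
response = query_engine.query("Your question here")
print(response)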
----------------------
Note: you can ask me a follow-up question by @ mentioning me again :speech_balloon:
----------------------
Hello @WhiteFang_Jr, in the latest version of llama-index the output is getting cut off.
In the old version, it returned the full content without truncation.
Any advice?
Hey!
Is the response getting cut off when you call
response = query_engine.query("Your question here")
?
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0.7, model_name="gpt-4", streaming=True), system_prompt=companyPrompt)
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    chunk_size=1024
)
It was working fine without truncation in the old version, but in the latest version the response is getting cut off. @WhiteFang_Jr
I think ChatOpenAI is from LangChain, right?
Try the LlamaIndex OpenAI class once and see if that resolves it:
from llama_index.llms import OpenAI
from llama_index import ServiceContext
llm = OpenAI(temperature=0, model="gpt-4")
service_context = ServiceContext.from_defaults(llm=llm)
If the answer still gets truncated, try increasing the max_tokens value:
llm = OpenAI(temperature=0, model="gpt-4", max_tokens=512)
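For example, the whole setup with the LlamaIndex OpenAI class and a larger max_tokens, keeping your chunk_size=1024, might look something like this (a sketch, not the only way to wire it up):

from llama_index.llms import OpenAI
from llama_index import ServiceContext

# LlamaIndex's own OpenAI wrapper, with extra room reserved for the generated answer
llm = OpenAI(temperature=0, model="gpt-4", max_tokens=512)
service_context = ServiceContext.from_defaults(llm=llm, chunk_size=1024)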
Even with increased tokens?
Before, it was outputting much more content without any max_tokens configuration...
try talking with the llm directly
response = llm.complete("chat here")
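For example, something like this (the prompt is just a placeholder) shows whether the raw completion itself is truncated:

# Talk to the LLM directly, bypassing the index and query engine
response = llm.complete("Write a long, detailed answer to any question.")
print(response)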
If you still get a half-baked answer, then it's most probably OpenAI having a day off
Hello, @Logan M , do you have any advice?
So if you leave max_tokens out, it will output as much as it has room for.
However, by default, llamaindex leaves room for a minimum of 256 tokens.
So sometimes, it may only have room for 256 tokens.
You can set num_outputs=512 in the service context to adjust this
But before, it worked well without this kind of configuration; I was just setting the chunk size to 1024
It can depend on the data, index, and query settings. For example, if I set the top-k very large, or use a SummaryIndex, I could see this happening
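For instance, with a vector index, a large top-k sends more retrieved text to the LLM and leaves less room in the context window for the answer; a hypothetical illustration (index here stands for any vector index built earlier):

# A large similarity_top_k retrieves more chunks, squeezing the space left for the answer
query_engine = index.as_query_engine(similarity_top_k=10)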
let me try to set num_outputs and check
How do I set num_outputs in the service context? @Logan M
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    chunk_size=1024,
    num_outputs=512
)
ServiceContext.from_defaults() got an unexpected keyword argument 'num_outputs'
I got this error, @Logan M
whoops, typo, num_output=512
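So the corrected call, keeping the same llm_predictor and chunk size from above, would look roughly like this:

service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    chunk_size=1024,
    num_output=512  # note: num_output, not num_outputs
)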