Please stand by @openmind as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
To use GPT-4 as the LLM for a QA chatbot over your documents, you can follow these steps:
- Load your documents: Read your documents into a suitable data structure. For example:
from llama_index.readers import SimpleDirectoryReader
reader = SimpleDirectoryReader("./data/your_directory/")
documents = reader.load_data()
- Set up the LLM: Configure GPT-4 as your LLM and wrap it in a service context. For example:
from llama_index.llms import OpenAI
from llama_index import ServiceContext
llm = OpenAI(temperature=0, model="gpt-4")
service_context = ServiceContext.from_defaults(llm=llm)
- Create an index from your documents: Build an index over your documents using the service context you just set up. For example:
from llama_index import SummaryIndex
gpt4_index = SummaryIndex.from_documents(documents, service_context=service_context)
- Set up the query engine: Create a query engine from the index. For example:
query_engine = gpt4_index.as_query_engine(response_mode="tree_summarize")
- Query the engine: Finally, query the engine with your question. For example:
response = query_engine.query("Your question here")
Please replace "./data/your_directory/" and "Your question here" with your actual directory and question. A combined sketch of all the steps follows below.
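Putting the steps together, a minimal end-to-end sketch might look like this (the path and the question are placeholders, and the imports match the snippets above, i.e. a pre-0.10 version of llama-index):

from llama_index.readers import SimpleDirectoryReader
from llama_index.llms import OpenAI
from llama_index import ServiceContext, SummaryIndex

# Load documents, use GPT-4 as the LLM, build the index, and query it
documents = SimpleDirectoryReader("./data/your_directory/").load_data()
service_context = ServiceContext.from_defaults(llm=OpenAI(temperature=0, model="gpt-4"))
gpt4_index = SummaryIndex.from_documents(documents, service_context=service_context)
query_engine = gpt4_index.as_query_engine(response_mode="tree_summarize")
response = query_engine.query("Your question here")
print(response)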
----------------------
Note: you can ask me a follow-up question by @ mentioning me again :speech_balloon:
----------------------
Hello @WhiteFang_Jr, in the latest version of llama-index the output is getting cut off.
In the old version, it returned the full content without truncation.
Any advice?
Hey!
Is the response getting cut off when you call
response = query_engine.query("Your question here")
?
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0.7, model_name="gpt-4", streaming=True), system_prompt=companyPrompt)
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    chunk_size=1024
)
It was working fine without truncation in the old version, but in the latest version the response is getting cut off. @WhiteFang_Jr
I think ChatOpenAI is from LangChain, right?
Try the LlamaIndex OpenAI class once and see if that resolves it:
from llama_index.llms import OpenAI
from llama_index import ServiceContext
llm = OpenAI(temperature=0, model="gpt-4")
service_context = ServiceContext.from_defaults(llm=llm)
If the answer still gets truncated, try increasing the max_tokens value:
llm = OpenAI(temperature=0, model="gpt-4", max_tokens=512)
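For example, the whole setup with the LlamaIndex OpenAI class and a larger max_tokens, keeping your chunk_size=1024, might look something like this (a sketch, not the only way to wire it up):

from llama_index.llms import OpenAI
from llama_index import ServiceContext

# LlamaIndex's own OpenAI wrapper, with extra room reserved for the generated answer
llm = OpenAI(temperature=0, model="gpt-4", max_tokens=512)
service_context = ServiceContext.from_defaults(llm=llm, chunk_size=1024)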
Even with increased tokens?
Before, it was outputting much more content without any max_tokens configuration...
try talking with the llm directly
response = llm.complete("chat here")
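For example, something like this (the prompt is just a placeholder) shows whether the raw completion itself is truncated:

# Talk to the LLM directly, bypassing the index and query engine
response = llm.complete("Write a long, detailed answer to any question.")
print(response)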
If you still get a half-baked answer, then it's most probably OpenAI having a day off
Hello, @Logan M , do you have any advice?
So if you leave max_tokens out, it will output as much as it has room for.
However, by default, llamaindex leaves room for a minimum of 256 tokens.
So sometimes, it may only have room for 256 tokens.
You can set num_outputs=512 in the service context to adjust this
But before, it worked well without this kind of configuration; I was just setting the chunk size to 1024
It can depend on the data, index, and query settings. For example, if I set the top-k very large, or use a SummaryIndex, I could see this happening
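For instance, with a vector index, a large top-k sends more retrieved text to the LLM and leaves less room in the context window for the answer; a hypothetical illustration (index here stands for any vector index built earlier):

# A large similarity_top_k retrieves more chunks, squeezing the space left for the answer
query_engine = index.as_query_engine(similarity_top_k=10)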
let me try to set num_outputs and check
How do I set num_outputs in the service context? @Logan M
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    chunk_size=1024,
    num_outputs=512
)
ServiceContext.from_defaults() got an unexpected keyword argument 'num_outputs'
I got this error, @Logan M
whoops, typo, num_output=512
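So the corrected call, keeping the same llm_predictor and chunk size from above, would look roughly like this:

service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    chunk_size=1024,
    num_output=512  # note: num_output, not num_outputs
)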