```python
from llama_index.core import DocumentSummaryIndex, get_response_synthesizer
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI

# LLM (gpt-3.5-turbo)
chatgpt = OpenAI(temperature=0, model="gpt-3.5-turbo")
splitter = SentenceSplitter(chunk_size=1024)

# default mode of building the index
response_synthesizer = get_response_synthesizer(
    response_mode="tree_summarize", use_async=True
)

# city_docs: the list of Document objects loaded earlier
doc_summary_index = DocumentSummaryIndex.from_documents(
    city_docs,
    llm=chatgpt,
    transformations=[splitter],
    response_synthesizer=response_synthesizer,
    show_progress=True,
)
```
(Source: **GitHub - jerryjliu/llama_index** (<https://github.com/jerryjliu/llama_index/blob/main/docs/examples/chat_engine/chat_engine_best.ipynb>))
Remember to replace the `input_dir` with the directory containing your Python and Linux code files, and to replace the question in the `chat` method with your specific question.
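For reference, here is a minimal sketch of that pattern, assuming the `doc_summary_index` and `chatgpt` objects defined above (the `chat_mode` and the question are illustrative, not prescribed by the source):

```python
# Wrap the index in a chat engine and ask a question via its chat() method
chat_engine = doc_summary_index.as_chat_engine(chat_mode="best", llm=chatgpt)
response = chat_engine.chat("Your specific question here")
print(response)
```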
You can also use the `OpenAI` class provided by LlamaIndex. This class is a wrapper around the OpenAI API and can be used to generate responses to queries directly, without the need for an index.

```python
from llama_index.llms.openai import OpenAI

# Initialize the OpenAI class with the model of your choice
llm = OpenAI(model="gpt-3.5-turbo")

# Ask a query
response = llm.complete(
    "Write a Python function to calculate the factorial of a number."
)

# Print the response
print(response.text)
```
"Write a Python function to calculate the factorial of a number."
with your specific query.OpenAI
class:import os os.environ['OPENAI_API_KEY'] = "INSERT OPENAI KEY"
"INSERT OPENAI KEY"
with your actual OpenAI API key.Ollama
class provided by LlamaIndex. This class allows you to run a local model such as Llama2 locally. from llama_index.llms.ollama import Ollama from llama_index.core import Settings # Set the local model Settings.llm = Ollama(model="llama2", request_timeout=60.0) # Now you can use the local model to generate responses response = Settings.llm.complete("Write a Python function to calculate the factorial of a number.") # Print the response print(response.text)
"Write a Python function to calculate the factorial of a number."
with your specific query.LlamaCPP
for code generation. LlamaCPP
is a wrapper around the llama-cpp-python
library, which allows you to use the Llama2 model locally. from llama_index.llms.llama_cpp import LlamaCPP # Initialize the LlamaCPP class llm = LlamaCPP( model_url=None, # You can pass in the URL to a GGML model to download it automatically model_path=None, # Optionally, you can set the path to a pre-downloaded model instead of model_url temperature=0.1, max_new_tokens=256, context_window=3900, # Llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room generate_kwargs={}, # kwargs to pass to __call__() model_kwargs={"n_gpu_layers": 1}, # kwargs to pass to __init__(), set to at least 1 to use GPU verbose=True, ) # Ask a query response = llm.complete("Write a Python function to calculate the factorial of a number.") # Print the response print(response.text)
"Write a Python function to calculate the factorial of a number."
with your specific query.llama-cpp-python
library installed for this to work. You can install it using pip:!pip install llama-cpp-python
complete
method of the LlamaCPP
class is indeed used for text completion, not specifically for code generation. However, you can use it to generate code by providing a prompt that asks for code. The LLM will then generate a completion for the prompt that could be code if the prompt asks for it.response = llm.complete("Write a Python function to calculate the factorial of a number.") print(response.text)
The number of results returned for a query is controlled by the `similarity_top_k` parameter that is set when creating the query engine. This parameter determines the number of top similar results that the engine will return for a given query. If you want more results, you can increase `similarity_top_k`. However, please note that increasing this value will also increase the computational cost and may slow down the response time. Here is an example of how to set the `similarity_top_k` value:

```python
query_engine = base_index.as_query_engine(similarity_top_k=10)
response = query_engine.query("Your query here")
print(response)
```
"Your query here"
with your specific query.similarity_top_k
parameter determines the number of top similar results that the engine will return for a given query, but it does not support excluding previously returned results.n
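If you need that behavior, one possible workaround is to track node IDs yourself and filter out repeats at the retriever level. This is only a sketch, not a built-in LlamaIndex feature; `base_index` is the index from the example above and `retrieve_unseen` is a hypothetical helper:

```python
# Hypothetical helper: retrieve nodes while skipping any node already returned
retriever = base_index.as_retriever(similarity_top_k=10)
seen_ids = set()

def retrieve_unseen(question):
    nodes = retriever.retrieve(question)
    fresh = [n for n in nodes if n.node.node_id not in seen_ids]
    seen_ids.update(n.node.node_id for n in fresh)
    return fresh
```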
Separately, the `LLMRerank` post-processor lets you set the top `n` nodes that will be returned after re-ranking. Here is an example of how to use the `LLMRerank` class with a custom `top_n` value:

```python
from llama_index.core.postprocessor import LLMRerank
from llama_index.llms.gemini import Gemini

reranker = LLMRerank(
    top_n=5,  # number of nodes after re-ranking
    llm=Gemini(api_key=GOOGLE_API_KEY),  # your Google API key
)
```
In this example, the `top_n` parameter is set to 5, which means that the re-ranker will return the top 5 nodes after re-ranking. You can adjust this value to get more or fewer results. If the `top_n` value remains the same as the original number of nodes, the re-ranker will only re-order the nodes without changing their number; otherwise, it will re-rank the nodes and return the top `n` of them.
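To show how the re-ranker plugs in, here is a minimal sketch assuming a vector index named `index` (the `similarity_top_k` value is illustrative):

```python
# Retrieve a larger candidate set, then let LLMRerank keep the top_n=5 nodes
query_engine = index.as_query_engine(
    similarity_top_k=10,             # candidates handed to the re-ranker
    node_postprocessors=[reranker],  # the LLMRerank instance defined above
)
response = query_engine.query("Your query here")
print(response)
```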
To stream a response, you can use the `stream_chat` method of the `ChatEngine` class. This method returns a `StreamingResponse` object that allows you to control the length of the response by consuming the response tokens one by one.

```python
chat_engine = index.as_chat_engine()
streaming_response = chat_engine.stream_chat("Tell me a joke.")
for token in streaming_response.response_gen:
    print(token, end="")
```
The `stream_chat` method is used to start a chat and get a `StreamingResponse` object. The `response_gen` attribute of this object is a generator that yields the response tokens one by one. By controlling the number of tokens you consume from this generator, you can control the length of the response.
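As a minimal sketch of that idea (the 50-token cap is an arbitrary example), you can simply stop consuming the generator once you have enough output:

```python
# Stop reading from the generator after roughly 50 tokens to cap the response length
streaming_response = chat_engine.stream_chat("Tell me a long story.")
for i, token in enumerate(streaming_response.response_gen):
    if i >= 50:
        break
    print(token, end="")
```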
You can also adjust other parameters, such as `similarity_top_k`, to change the number of results that are considered, and combine them with a re-ranker:

```python
from llama_index.core.postprocessor import LLMRerank
from llama_index.llms.gemini import Gemini

reranker = LLMRerank(
    top_n=5,  # number of nodes after re-ranking
    llm=Gemini(api_key=GOOGLE_API_KEY),  # your Google API key
)
```

In this example, the `top_n` parameter is set to 5, which means that the re-ranker will return the top 5 nodes after re-ranking. You can adjust this value to get more or fewer results. The index itself is built from your documents:

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
```
"data"
with the path to your data.
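To close the loop, here is a minimal usage sketch for the index built above (the query text and `similarity_top_k` value are placeholders):

```python
# Query the freshly built index
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("Your query here")
print(response)
```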