```python
import torch
from llama_index.llms import HuggingFaceLLM
from llama_index.prompts.prompts import SimpleInputPrompt

# Define system prompt and query wrapper prompt
system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

query_wrapper_prompt = SimpleInputPrompt("<|USER|>{query_str}<|ASSISTANT|>")

# Initialize HuggingFaceLLM
llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    device_map="auto",
    stopping_ids=[50278, 50279, 50277, 1, 0],
    tokenizer_kwargs={"max_length": 4096},
    # uncomment this if using CUDA to reduce memory usage
    # model_kwargs={"torch_dtype": torch.float16}
)
```
You can replace `tokenizer_name` and `model_name` with any other model available on HuggingFace.
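For example, here is a minimal sketch of swapping in a different model. The model ID below is a placeholder, not a recommendation; you would also need to adjust the context window, prompts, and stopping IDs to match the model you choose, since the values above are specific to StableLM:

```python
from llama_index.llms import HuggingFaceLLM

# Hypothetical example -- "your-org/your-instruct-model" is a placeholder for any
# model ID on the HuggingFace Hub. Adjust context_window, prompts, and stopping_ids
# to match the model you choose; the StableLM-specific values above won't apply.
llm = HuggingFaceLLM(
    context_window=2048,   # match the chosen model's actual context length
    max_new_tokens=256,
    tokenizer_name="your-org/your-instruct-model",
    model_name="your-org/your-instruct-model",
    device_map="auto",
)
```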
Please note that response speed depends on the size of the model and the length of the input and output; larger models and longer inputs/outputs will take longer to process.
For more information, you can refer to the **LlamaIndex documentation** (<https://gpt-index.readthedocs.io/en/latest/api_reference/llms/huggingface.html>).
----------------------
To use a different model, simply change the `model_name` and `tokenizer_name` in the `HuggingFaceLLM` initialization code.

To generate a response, use the `HuggingFaceLLM` class from the LlamaIndex library, initialized as shown in the example above. Once the LLM is set up, you can query it directly:
```python
# Define a user query
query = "What is the capital of France?"

# Generate a response
response = llm.complete(query)

# Print the response
print(response)
```
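If you want the generated text as a plain string rather than the response object, the value returned by `complete()` exposes a `.text` attribute in recent llama_index versions (a small usage sketch, reusing the `llm` initialized above):

```python
# Reuses the `llm` object initialized earlier.
response = llm.complete("What is the capital of France?")
print(response.text)  # the generated text as a plain string
```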
To use the LLM over your own documents, you can load them and build an index:

```python
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.llms import HuggingFaceLLM

# Load documents
documents = SimpleDirectoryReader("../../data/paul_graham").load_data()

# Initialize HuggingFaceLLM as shown in the previous examples
# ...

# Create a service context
service_context = ServiceContext.from_defaults(chunk_size=512, llm=llm)

# Create an index from the documents
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
```
This example assumes your documents are in a `paul_graham` folder located in the `data` directory at the root of your project. You should replace `"../../data/paul_graham"` with the path to your documents.
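For instance, here is a small sketch of pointing the reader somewhere else; the paths are placeholders:

```python
from llama_index import SimpleDirectoryReader

# Hypothetical paths -- replace with wherever your files actually live.
documents = SimpleDirectoryReader("./my_docs").load_data()

# Or load specific files rather than a whole directory:
documents = SimpleDirectoryReader(input_files=["./my_docs/notes.txt"]).load_data()
```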
Then you can create a query engine and run queries against the index:

```python
# Create a query engine
query_engine = index.as_query_engine()

# Define a user query
query = "What did the author do growing up?"

# Generate a response
response = query_engine.query(query)

# Print the response
print(response)
```
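One caveat: in many llama_index versions, `ServiceContext.from_defaults` falls back to OpenAI embeddings for the vector index unless you specify an embedding model, which requires an OpenAI API key. Here is a minimal sketch of keeping everything local, assuming your llama_index version supports the `embed_model="local"` shorthand (it downloads a small HuggingFace embedding model and requires `sentence-transformers`):

```python
# Assumption: this llama_index version supports embed_model="local", which
# resolves to a small HuggingFace embedding model (requires sentence-transformers).
service_context = ServiceContext.from_defaults(
    chunk_size=512,
    llm=llm,
    embed_model="local",
)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
```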