I'm trying to use `SummaryIndex` via a TGIS server (rather than running the LLM locally), but llama_index seems to be ignoring the TGIS predictor. Maybe I'm using this wrong?
```python
from llama_index import ServiceContext, SimpleDirectoryReader, SummaryIndex

service_context = ServiceContext.from_defaults(
    chunk_size=512,
    llm=tgis_predictor,
    context_window=2048,
    prompt_helper=prompt_helper,
    embed_model=embed_model,
)

# Load data
documents = SimpleDirectoryReader('private-data').load_data()
index = SummaryIndex.from_documents(documents)
summary = index.as_query_engine(response_mode="tree_summarize").query(
    "Summarize the text, describing what it might be most useful for"
)
```
But when I run the query, it tries to download a Hugging Face model instead:
```
Downloading url https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin to path /tmp/llama_index/models/llama-2-13b-chat.ggmlv3.q4_0.bin
total size (MB): 7323.31
```
and it ultimately blows up my machine trying to run this model on CPU.
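Do I need to pass the `service_context` into the index explicitly? I'm guessing at the keyword here, something like:

```python
# Guess: hand the service_context to the index at build time so the
# TGIS predictor is actually used, instead of llama_index falling
# back to its default local LLM.
index = SummaryIndex.from_documents(documents, service_context=service_context)
```

Or is the service context supposed to be picked up globally somewhere?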