Please update the docs on ServiceContext

Please update the docs on ServiceContext. Thank you.
they are updated?
I am new to LlamaIndex. The customization tutorial still uses ServiceContext. It would help newbies if that section could be updated too.
ah that it does. Great call out (it technically still works, but yea, should be updated)
@Logan M Do you have an ETA on the tutorial documentation updates?
I have not had a chance lol, it's on my todo

Anything specific I can give an example of right now?
Maybe you can just add the revised statements for the first customization tutorial (i.e., the smaller chunks example)? If I can see the before and after examples, I should be able to follow. Thanks!
Already updated the docs (just working on fixing a bug with docs generation)

Here's the example for that section

Plain Text
# Global settings
from llama_index.core import Settings

Settings.chunk_size = 512

# Local settings
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# `documents` is a list of Document objects, e.g. from SimpleDirectoryReader
index = VectorStoreIndex.from_documents(
    documents, transformations=[SentenceSplitter(chunk_size=512)]
)
Nothing against OpenAI, but I wish LlamaIndex didn't assume that everyone is using OpenAI by default. Almost all the examples and docs assume we're using OpenAI with API keys, etc., and there are only sprinkles of mentions of how to do things with local and public LLMs
@Logan M , thank you. I understand the need for "transformations=[SentenceSplitter(chunk_size=512)]". But what does "Settings.chunk_size = 512" do? How do we know if it is used in the SentenceSplitter() or something else?
It's modifying the default (which is sentence splitter)

This is the same as the previous service context πŸ‘
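
Roughly speaking, the global setting just configures the default node parser. A minimal sketch of the equivalence (assuming the default parser is a SentenceSplitter, which may differ between versions):

Plain Text
from llama_index.core import Settings
from llama_index.core.node_parser import SentenceSplitter

# Setting the global chunk size...
Settings.chunk_size = 512

# ...is roughly the same as swapping in the default node parser yourself
Settings.node_parser = SentenceSplitter(chunk_size=512)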
Right. It is the same as the previous service context. But I thought the reason for the change was to make it less ambiguous? πŸ™‚
Maybe a section on how to use other LLMs would be useful....
Fair enough.

The simplest version I can think of

Download and start Ollama (once the server is running, be sure to do ollama pull to download the model you want to run)
https://ollama.com/

I might do ollama pull starling-lm for this example

Plain Text
pip install llama-index llama-index-llms-ollama llama-index-embeddings-huggingface


Plain Text
from llama_index.core import Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbeddings

Settings.llm = Ollama(model="starling-lm", request_timeout=3000.0)
Settings.embed_model = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

chat_engine = index.as_chat_engine(chat_mode="condense_plus_context")
response = chat_engine.chat("Hello!")
print(response)
Very good. Thanks @Logan M. Even I can understand it well. πŸ™‚
Thank you @Logan M
@Logan M just a small typo:
should be:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
not
from llama_index.embeddings.huggingface import HuggingFaceEmbeddings
yes! my bad haha
Just curious... not sure why but I keep running into an error that something is trying to reach out and gets a connection refusal when I call chat_engine.chat
...site-packages/httpx/_transports/default.py", line 83, in map_httpcore_exceptions
raise mapped_exc(message) from exc
httpx.ConnectError: [Errno 61] Connection refused
any idea why/what is reaching out?
ah... never mind... it's ollama... it's trying to make an http call to the ollama server
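
For reference, the Ollama class is just a client for a running ollama server, so ollama serve has to be up before you call it. A rough sketch of pointing the client at the server explicitly (the base_url here is just an example value; adjust it if your server runs elsewhere):

Plain Text
from llama_index.llms.ollama import Ollama

# Point the client at wherever `ollama serve` is listening
llm = Ollama(
    model="starling-lm",
    base_url="http://localhost:11434",  # example value
    request_timeout=3000.0,
)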
@Logan M How do I match the LLM model with the embedding model? In the above example, you have starling-lm as the LLM and BAAI/bge-small-en-v1.5 as the embedding model. Thank you.
There is no relation between an embedding model and an LLM. You can use any combination you want πŸ™‚
@Logan M These pre-trained LLMs have been trained with certain embedding models. When I use LlamaIndex for RAG applications, don't I need to use the same embedding model that the LLM was pre-trained with?
I think there is a misunderstanding

LLMs are decoders. Given a sequence of input tokens, they generate the next X tokens. For example, given the input "Hello!", an LLM trained on chat messages might respond with "How are you?"

Embedding models are trained to take input tokens, and project them into vectors that have some semantic meaning. I.e. Dog might have the vector [0, 0, 1], while Cat might have the vector [1, 0, 0] -- these vectors are different, because the words mean different things. So we can take this principle, embed documents, and then also embed queries and compare vectors to find the most relevant data to a query. And people will finetune embedding models specifically for this purpose, to better capture semantic meaning in vectors.

So in both of these processes, there is no connection between the LLM and the embedding model. The embedding model helps retrieve relevant chunks of text, and then we prompt the LLM with those retrieved chunks + the query, and the LLM responds with an answer.
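
A tiny sketch of the embedding side of that, using the HuggingFace embedding model from the earlier example (the sentences and the cosine helper are just made up for illustration):

Plain Text
import math

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Embed a query and two candidate chunks of text
query_vec = embed_model.get_query_embedding("What animal barks?")
dog_vec = embed_model.get_text_embedding("Dogs bark loudly.")
cat_vec = embed_model.get_text_embedding("Cats meow softly.")

# Compare with cosine similarity -- the dog sentence should score higher,
# which is how retrieval picks the most relevant chunks to hand to the LLM
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

print(cosine(query_vec, dog_vec))
print(cosine(query_vec, cat_vec))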