Hello, I'm running into an issue with a very simple example. If I load a single page into a vector store, everything works fine. However, if I split that page into sections (like the sections of a Wikipedia article) and load those into the vector store, the answers get much worse, as if the query engine were using a single section to answer every question. Any idea where my issue might be?
import chromadb
from llama_index import (
    ServiceContext,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    set_global_service_context,
)
from llama_index.embeddings import LangchainEmbedding
from llama_index.llms import OpenAI
from llama_index.vector_stores import ChromaVectorStore
from langchain.embeddings import HuggingFaceEmbeddings
# create client and a new collection
client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = client.get_or_create_collection("split_doc_collection")
docs = SimpleDirectoryReader("../data/split_doc_collection", recursive=True).load_data()
# set up ChromaVectorStore and load in data
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Create and dl embeddings instance
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
)
model = "gpt-3.5-turbo"
service_context = ServiceContext.from_defaults(
    chunk_size=1024,
    chunk_overlap=50,
    # system_prompt is defined elsewhere in my script
    llm=OpenAI(model=model, temperature=0.5, system_prompt=system_prompt),
    embed_model=embed_model,
)
# And set the service context
set_global_service_context(service_context)
index = VectorStoreIndex.from_documents(
    docs, storage_context=storage_context, service_context=service_context
)