
Updated 2 months ago

Hello everyone

Hello everyone,

I'm struggling to combine two data loaders into one index. How can I merge competitor_index with index so that I can query both at the same time? competitor_index uses the bs4 (BeautifulSoup) data connector, and index uses the YouTube data connector.

from llama_index import (
    LLMPredictor,
    PromptHelper,
    ServiceContext,
    GPTSimpleVectorIndex,
    download_loader
)
from langchain.chat_models import ChatOpenAI
import os

os.environ["OPENAI_API_KEY"] = 'xxx'

max_input_size = 4096
num_output = 512
max_chunk_overlap = 200
temperature = 0

# define prompt helper
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

# define LLM
llm_predictor = LLMPredictor(
    llm=ChatOpenAI(temperature=temperature, model_name="gpt-3.5-turbo", max_tokens=num_output))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

# load competitor web pages with the BeautifulSoup (bs4) connector
BeautifulSoupWebReader = download_loader("BeautifulSoupWebReader")
competitor_loader = BeautifulSoupWebReader()
competitor_documents = competitor_loader.load_data(
    urls=['https://url1.com', 'https://url2.com', 'https://url3.com'])
competitor_index = GPTSimpleVectorIndex.from_documents(competitor_documents, service_context=service_context)

# load YouTube transcripts with the YouTube connector
YoutubeTranscriptReader = download_loader("YoutubeTranscriptReader")
loader = YoutubeTranscriptReader()
documents = loader.load_data(ytlinks=['https://www.youtube.com/watch?v=xxx',
                                      'https://www.youtube.com/watch?v=xxx',
                                      'https://www.youtube.com/watch?v=xxx'])
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

# ??? how do I build one index that can be queried over both sources at once?
combined_competitor_index_and_index = ...
Hi! Maybe what you want is to first append all your data into one list and then create a single index?
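A minimal sketch of that first idea, written in plain Python so it runs without an API key. `Doc`, `ToyVectorIndex`, `embed`, and `cosine` are hypothetical stand-ins, not LlamaIndex classes, and bag-of-words cosine similarity is an assumption standing in for real embeddings. The point is only the order of operations: concatenate the document lists from both loaders, then build a single index. With the objects from the question, the equivalent call would be `GPTSimpleVectorIndex.from_documents(documents + competitor_documents, service_context=service_context)`.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Doc:
    # Hypothetical stand-in for a loaded document: its text plus which loader produced it.
    text: str
    source: str


def embed(text: str) -> Counter:
    # Toy "embedding" (assumption): bag-of-words counts instead of a real embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorIndex:
    """Hypothetical in-memory vector index; NOT the LlamaIndex API."""

    def __init__(self) -> None:
        self.entries: list[tuple[Doc, Counter]] = []

    @classmethod
    def from_documents(cls, docs: list[Doc]) -> ToyVectorIndex:
        # Build the index by inserting every document once.
        index = cls()
        for doc in docs:
            index.insert(doc)
        return index

    def insert(self, doc: Doc) -> None:
        self.entries.append((doc, embed(doc.text)))

    def query(self, question: str, top_k: int = 2) -> list[Doc]:
        # Rank all stored documents by similarity to the question and keep the top_k.
        q = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [doc for doc, _ in ranked[:top_k]]


competitor_documents = [
    Doc("competitor pricing page lists three plans", "web"),
    Doc("competitor blog post about product launch", "web"),
]
documents = [
    Doc("video transcript explaining our pricing plans", "youtube"),
    Doc("video transcript about onboarding new users", "youtube"),
]

# Merge the two lists first, then build ONE index over both sources.
combined_index = ToyVectorIndex.from_documents(documents + competitor_documents)
hits = combined_index.query("pricing plans")
print([d.source for d in hits])
```

Because both sources live in the same index, a single query can return hits from the YouTube transcripts and the competitor pages together.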
or perhaps this?

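# start from an empty index, then insert the documents from both loaders
index = GPTSimpleVectorIndex([], service_context=service_context)
for doc in documents:
    index.insert(doc)
for doc in competitor_documents:
    index.insert(doc)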
Yes, something like this. Not sure if you can call insert() directly on GPTSimpleVectorIndex, though.
Another approach is to use a graph. It is designed to be "an index on top of indices", so you can build separate indices (as you are doing now) and then combine them inside a graph index.
What are the advantages over the current solution?
With a graph, you can keep your indices separate, attach a summary to each index, and put a description on top of the graph itself. You then query the graph only when you need to search across indices, and query an individual index otherwise. I think this is a better structure than having all your data in one index.
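A rough sketch of the graph idea, again in plain Python with hypothetical names (`ToyIndex`, `ToyGraph`, `route`); this is not LlamaIndex's graph API. Each source keeps its own index, each child index gets a short summary, and the root picks which child to query by comparing the question with those summaries. Word-overlap similarity is an assumption standing in for whatever the real root index uses to make that choice.

```python
from __future__ import annotations

import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words vector (assumption; stands in for a real embedding model).
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyIndex:
    """Hypothetical per-source index: stores texts and returns the best match."""

    def __init__(self, texts: list[str]) -> None:
        self.texts = texts

    def query(self, question: str) -> str:
        q = embed(question)
        return max(self.texts, key=lambda t: cosine(q, embed(t)))


class ToyGraph:
    """Hypothetical 'index on top of indices' that routes a query by child summaries."""

    def __init__(self, children: dict[str, tuple[str, ToyIndex]]) -> None:
        # name -> (summary describing the child index, the child index itself)
        self.children = children

    def route(self, question: str) -> str:
        # Pick the child whose summary is most similar to the question.
        q = embed(question)
        return max(self.children, key=lambda name: cosine(q, embed(self.children[name][0])))

    def query(self, question: str) -> tuple[str, str]:
        # Route first, then query only the chosen child index.
        name = self.route(question)
        return name, self.children[name][1].query(question)


youtube_index = ToyIndex([
    "transcript: our onboarding tutorial walks through setup",
    "transcript: our roadmap talk covers upcoming features",
])
competitor_index = ToyIndex([
    "competitor pricing page lists three plans",
    "competitor blog announces a product launch",
])

graph = ToyGraph({
    "youtube": ("youtube video transcripts about our own product", youtube_index),
    "competitor": ("competitor websites with pricing and announcements", competitor_index),
})

print(graph.query("what pricing does the competitor offer"))
print(graph.query("what is in the onboarding video"))
```

When you already know which source a question is about, you can skip the graph and call the child directly (for example `competitor_index.query(...)`); the graph layer only earns its keep for questions that could belong to either source.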
Good point, thanks!