I don't want my LLM to see any of the metadata. I am aware that we can use excluded_llm_metadata_keys for this purpose. Instead of naming every metadata key separately, like document.excluded_llm_metadata_keys = ["metadata key1", "metadata key2", ..., "metadata keyN"], is there something I can use so that the LLM excludes all metadata keys? I have different file types in my use case, and they have different loaders and hence different metadata keys. I'd like a one-liner that excludes metadata keys from the LLM instead of writing conditions for each file type.
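Is something along these lines the intended way? Just looping over whatever keys each document happens to have (a rough sketch; documents is whatever my readers return):

# Sketch: hide every metadata key from the LLM, whatever the loader added.
for doc in documents:
    doc.excluded_llm_metadata_keys = list(doc.metadata.keys())
    # same idea for embeddings, if needed:
    # doc.excluded_embed_metadata_keys = list(doc.metadata.keys())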
1) I updated SimpleNodeParser to SentenceSplitter. Is that okay? 2) I'm getting "Reader/Tool not found: SimpleCSVReader" - what do I need to do to fix this? 3) I ran without installing these libraries and got errors, so I guess I need to install them separately?
Hi, I have a working Q&A RAG pipeline that can answer questions based on documents in a folder. What if I want to extend it to answer questions like "How many documents are there in this folder?" Can I do this using LlamaIndex? Is there a tutorial for it?
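Here is roughly the direction I was imagining, in case it helps to react to. This is only a sketch: count_documents is a hypothetical helper, and query_engine / llm are my existing objects:

import os
from llama_index.agent import ReActAgent
from llama_index.tools import FunctionTool, QueryEngineTool

def count_documents(folder: str = "./data") -> int:
    """Count the files in the folder the index was built from."""
    return sum(os.path.isfile(os.path.join(folder, f)) for f in os.listdir(folder))

count_tool = FunctionTool.from_defaults(fn=count_documents)
qa_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,  # my existing RAG query engine
    name="doc_qa",
    description="Answers questions about the content of the documents.",
)
agent = ReActAgent.from_tools([count_tool, qa_tool], llm=llm, verbose=True)
print(agent.chat("How many documents are there in this folder?"))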
I am using local models for a Q&A RAG pipeline. I am trying to use multiprocessing to make the best use of my resources. I can make it work, but I have to load the models separately for each process, which slows things down. I tried using a multiprocessing queue to share the service_context among processes but got this error: cannot pickle 'builtins.CoreBPE' object
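For context, the per-process fallback I have working looks roughly like this (build_service_context and run_query are stand-ins for my own loading and querying code):

import multiprocessing as mp

_SERVICE_CONTEXT = None  # per-process global, filled in by the initializer

def init_worker():
    # runs once per worker process, so each process loads the models exactly once
    global _SERVICE_CONTEXT
    _SERVICE_CONTEXT = build_service_context()  # stand-in for my model-loading code

def answer(question):
    return run_query(_SERVICE_CONTEXT, question)  # stand-in for my query code

if __name__ == "__main__":
    with mp.Pool(processes=2, initializer=init_worker) as pool:
        print(pool.map(answer, ["question 1", "question 2"]))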
Can I just do a retrieve on a postgres vector store based on metadata filters? I am trying to store these results somewhere else so that I don't have to filter at query time. Does that make sense?
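To make it concrete, this is the kind of thing I mean (a sketch; the filter key/value and query string are placeholders, and I realize the vector retriever still wants a query string):

from llama_index import VectorStoreIndex
from llama_index.vector_stores import ExactMatchFilter, MetadataFilters

index = VectorStoreIndex.from_vector_store(vector_store)  # my existing PGVectorStore
retriever = index.as_retriever(
    similarity_top_k=10,
    filters=MetadataFilters(filters=[ExactMatchFilter(key="file_name", value="report.pdf")]),
)
nodes = retriever.retrieve("placeholder query")  # no synthesis / LLM call, just NodeWithScore objects
for n in nodes:
    print(n.node.node_id, n.score)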
Where does download_loader save the loader by default? I thought it was under readers/llamahub_modules/, but in a recent instance I downloaded the s3Loader and don't see it in that directory.
I am using a postgres DB for my vector store. Can I interact with my vector_store table using another structured table in the same postgres DB? This is what I want to do: I want to test whether joining my structured table with the vector_store table to filter records is faster than metadata filtering.
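Roughly, I want to benchmark something like the following raw join against plain metadata filtering (a sketch; data_my_vectors, the metadata_ column, my_structured_table, and doc_id are all assumptions about my schema):

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:pass@localhost:5432/mydb")  # placeholder DSN

join_sql = text("""
    SELECT e.node_id, e.text
    FROM data_my_vectors AS e              -- table created by PGVectorStore (assumed name)
    JOIN my_structured_table AS s          -- my own structured table (hypothetical)
      ON s.doc_id = e.metadata_ ->> 'doc_id'
    WHERE s.category = :category
""")

with engine.connect() as conn:
    rows = conn.execute(join_sql, {"category": "finance"}).fetchall()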
# Trying the huggingface Llama-2 chat model instead to see if inference speeds up!
import torch
from llama_index.llms import HuggingFaceLLM
from llama_index.prompts import PromptTemplate
from llama_index.prompts.prompt_type import PromptType

DEFAULT_TEXT_QA_PROMPT_TMPL = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)
query_wrapper_prompt = PromptTemplate(
    DEFAULT_TEXT_QA_PROMPT_TMPL, prompt_type=PromptType.QUESTION_ANSWER
)

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.1, "do_sample": False},
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    model_name="meta-llama/Llama-2-7b-chat-hf",
    device_map="auto",
    tokenizer_kwargs={"max_length": 512},
    # use float16 on CUDA to reduce memory usage
    model_kwargs={"torch_dtype": torch.float16},
)
I am building a RAG pipeline over a set of documents. For now, I am only allowing pdf, docx, and txt files. I am using SimpleDirectoryReader to load the files. By default, pdfs have the file name and page label as metadata, docx files have just the file name, and txts have no metadata. I want all 3 file types to have consistent metadata.
After some research, I realized it's not easy (or possible) to get page labels for .txt and .docx files. I still want docx and txt files to have file names in their metadata; pdfs can keep the default file name and page label. What's the best way to achieve this without having to make changes in readers/file/base.py or readers/file/docs_reader.py?
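Would passing a file_metadata callable be the right approach? Something like this sketch (assuming the per-file readers still add their own extra keys, e.g. page_label for pdfs):

import os
from llama_index import SimpleDirectoryReader

def base_metadata(file_path: str) -> dict:
    # applied to every file regardless of type
    return {"file_name": os.path.basename(file_path)}

documents = SimpleDirectoryReader(
    input_dir="./data",                       # placeholder path
    required_exts=[".pdf", ".docx", ".txt"],
    file_metadata=base_metadata,
).load_data()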
I am building a hybrid Q&A RAG pipeline (using semantic and keyword search) over a set of documents. Currently, it takes too long to answer a question. I want to store StorageContext in advance to improve processing time. Is that a good practice? What are some things I need to keep in mind for this purpose? Some other questions I have:
1) I understand that StorageContext has 4 components: index_store, vector_store, graph_store, and docstore. For my use case, there's no graph_store. Where can I store the remaining 3 stores? Is it a best practice to store all of them in a vector database?
2) I am using SimpleKeywordTableIndex for keyword search. Where can I store this index if I want to do it in advance? Can this also be stored in a vector database?
I would really appreciate it if you could point me to documentation around this use case. (I've sketched below what I currently have in mind.) Thanks!
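For reference, the persistence flow I have in mind looks roughly like this (a sketch; nodes is my parsed node list, and I'd swap in PGVectorStore for the default vector store). My understanding is that the keyword table would live in the index_store/docstore rather than the vector database:

from llama_index import (
    StorageContext,
    VectorStoreIndex,
    SimpleKeywordTableIndex,
    load_index_from_storage,
)

# Build once and persist.
storage_context = StorageContext.from_defaults()  # or vector_store=PGVectorStore(...) to keep embeddings in postgres
vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
keyword_index = SimpleKeywordTableIndex(nodes, storage_context=storage_context)
vector_index.set_index_id("vector")
keyword_index.set_index_id("keywords")
storage_context.persist(persist_dir="./storage")  # docstore + index_store (+ local vector store) as JSON

# Later: reload instead of re-ingesting.
storage_context = StorageContext.from_defaults(persist_dir="./storage")
vector_index = load_index_from_storage(storage_context, index_id="vector")
keyword_index = load_index_from_storage(storage_context, index_id="keywords")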
@Logan M Question on hybrid (embedding + keyword) based retrieval vs document summary index based retrieval:
I built a Q&A system using hybrid retrieval. My next task is to get summaries of the same documents that I built this Q&A system on. I will not be using these summaries for retrieval; my task is simply to present users with a summary of the documents. 1) Is this something I can do with LlamaIndex? 2) If yes, I want to do it with minimal code changes. I am currently using an IngestionPipeline to persist things into the docstore and vector_store. Is there a way for me to include document summaries in the same pipeline?
Let me know if you need anything additional. Thanks!
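One idea I had, sketched below, is to add a SummaryExtractor as one more transformation in the existing pipeline so each node carries a section_summary in its metadata; I'm not sure whether that's close enough to a real per-document summary (vs. a DocumentSummaryIndex), so treat the exact arguments as assumptions:

from llama_index.extractors import SummaryExtractor
from llama_index.ingestion import IngestionPipeline
from llama_index.text_splitter import SentenceSplitter

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=50),
        SummaryExtractor(summaries=["self"], llm=llm),  # stores metadata["section_summary"] per node
        embed_model,        # my existing embedding model
    ],
    docstore=docstore,      # my existing docstore
    vector_store=vector_store,
)
nodes = pipeline.run(documents=documents)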
I have built a Q&A RAG application using LlamaIndex that answers questions based on documents present in a folder. Currently, it always answers, even if the question is not related to any of the documents, and in most of these cases it makes up an answer / hallucinates. Any tips to control this behavior? For example, a user simply typed hello and the app returned a random answer.
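Would something like a similarity cutoff plus a manual fallback be the right direction? A sketch of what I mean (the cutoff value is a guess I'd have to tune):

from llama_index.postprocessor import SimilarityPostprocessor

query_engine = index.as_query_engine(
    similarity_top_k=4,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.75)],  # drop weak matches
)
response = query_engine.query("hello")
if not response.source_nodes:
    # nothing relevant retrieved, so refuse instead of letting the LLM improvise
    print("Sorry, I can only answer questions about the documents in this folder.")
else:
    print(response)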
@Logan M, I am using postgres for my vector store. I think the code creates a table for storing the embeddings. Is it possible to disable that? I want to create the table in advance and not let the code create the table
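I vaguely remember seeing a perform_setup flag on PGVectorStore; is something like the sketch below the intended way, or am I misremembering the flag name? (Treating it as an unverified assumption.)

from llama_index.vector_stores import PGVectorStore

vector_store = PGVectorStore.from_params(
    host="localhost", port="5432", database="mydb",
    user="user", password="pass",       # placeholder credentials
    table_name="my_vectors",
    embed_dim=384,                       # must match my embedding model
    perform_setup=False,                 # assumption: skip CREATE EXTENSION / CREATE TABLE; table exists already
)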
I am running llama-2 locally for a RAG pipeline via llama-cpp-python. I don't want to use the default system_prompt. How do I change it? I tried the system_prompt argument in LlamaCPP() but it didn't work:
import torch
from llama_index import ServiceContext, set_global_service_context
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt

device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5", device=device, cache_folder=models_dir
)
n_gpu_layers = 0 if device.type == "cpu" else -1

llm = LlamaCPP(
    model_url=None,
    model_path=f"{models_dir}/llama-2-7b-chat.Q4_K_M.gguf",
    temperature=0.1,
    max_new_tokens=256,
    # llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room
    context_window=3900,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": n_gpu_layers, "offload_kqv": True},
    # transform inputs into Llama2 format
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=False,
    system_prompt="",
)

service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
set_global_service_context(service_context)
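Would binding my own system prompt into the prompt-formatting helpers be the right workaround? A sketch of what I mean, assuming the llama_utils helpers accept an optional system_prompt argument:

from functools import partial
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt

MY_SYSTEM_PROMPT = "You are a concise assistant that only answers from the given context."

llm = LlamaCPP(
    model_path=f"{models_dir}/llama-2-7b-chat.Q4_K_M.gguf",
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    model_kwargs={"n_gpu_layers": n_gpu_layers, "offload_kqv": True},
    # assumption: both helpers take an optional system_prompt keyword
    messages_to_prompt=partial(messages_to_prompt, system_prompt=MY_SYSTEM_PROMPT),
    completion_to_prompt=partial(completion_to_prompt, system_prompt=MY_SYSTEM_PROMPT),
    verbose=False,
)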
Is anyone aware of an SLM good enough for a Q&A RAG use case? I have been using llama-2 7B (4-bit quantized) so far, but it still consumes a lot of computing resources. The goal is to find something that can run completely on CPU. I recently discovered TinyLlama/TinyLlama-1.1B-Chat-v1.0 but would like to know my other options. Thank you!
from llama_index.node_parser import SimpleNodeParser
from llama_index.ingestion import IngestionPipeline
from llama_index.vector_stores import PGVectorStore
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index import SimpleDirectoryReader
loader = SimpleDirectoryReader(
    input_files=[""],  # Any file that can be broken into at least 2 nodes
    filename_as_id=True,
)
from llama_index import VectorStoreIndex
from llama_index.node_parser import SimpleNodeParser
from llama_index.ingestion import IngestionPipeline, IngestionCache
from llama_index.ingestion.cache import RedisCache
from llama_index.vector_stores import PGVectorStore
Is this example from the documentation complete? I tried it with my postgres DB and local Redis, but I don't see the data in my vector DB:
from llama_index import Document
from llama_index.embeddings import OpenAIEmbedding
from llama_index.text_splitter import SentenceSplitter
from llama_index.extractors import TitleExtractor
from llama_index.ingestion import IngestionPipeline, IngestionCache
from llama_index.ingestion.cache import RedisCache
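For comparison, here is what I think the working version would look like, with the vector store actually attached to the pipeline (connection details are placeholders). Is it correct that without vector_store=... the run only returns nodes and caches transformations in Redis, without writing anything to postgres?

from llama_index.vector_stores import PGVectorStore

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=50),
        TitleExtractor(),
        OpenAIEmbedding(),
    ],
    cache=IngestionCache(
        cache=RedisCache.from_host_and_port(host="127.0.0.1", port=6379),
        collection="my_cache",
    ),
    vector_store=PGVectorStore.from_params(      # without this, nodes are only returned, not stored
        host="localhost", port="5432", database="mydb",
        user="user", password="pass",            # placeholder credentials
        table_name="my_vectors",
        embed_dim=1536,                          # OpenAI ada-002 embedding size
    ),
)
nodes = pipeline.run(documents=[Document.example()])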
I tried but couldn't get it to work. Most likely because I need to use the 'spawn' method for GPU, and llama_index might be setting it to 'fork' somewhere in the code? I'm getting this error: RuntimeError: context has already been set
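For reference, the spawn-based variant I was attempting looks roughly like this (run_query is a stand-in for my own per-process model loading and querying):

import multiprocessing as mp

def worker(question):
    # build models inside the worker, after spawn, so nothing CUDA-related is inherited via fork
    return run_query(question)  # stand-in for my per-process pipeline

if __name__ == "__main__":
    ctx = mp.get_context("spawn")        # per-pool context, avoids "context has already been set"
    with ctx.Pool(processes=2) as pool:
        print(pool.map(worker, ["question 1", "question 2"]))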