I don't want my LLM to see any of the metadata. I am aware that we can use excluded_llm_metadata_keys for this purpose. Instead of naming every metadata key separately, like document.excluded_llm_metadata_keys = ["metadata key1", "metadata key2", ..., "metadata keyN"], is there something I can use so that the LLM excludes all metadata keys? I have different file types in my use case, and they have different loaders and hence different metadata keys. I'd like a one-liner that excludes metadata keys from the LLM instead of writing conditions for each file type.
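Is something along these lines the intended way? Just looping over whatever keys each document happens to have (a rough sketch; documents is whatever my readers return):

# Sketch: hide every metadata key from the LLM, whatever the loader added.
for doc in documents:
    doc.excluded_llm_metadata_keys = list(doc.metadata.keys())
    # same idea for embeddings, if needed:
    # doc.excluded_embed_metadata_keys = list(doc.metadata.keys())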
1) I updated SimpleNodeParser to SentenceSplitter. Is that okay? 2) I'm getting "Reader/Tool not found: SimpleCSVReader" - what do I need to do to fix this? 3) I ran without installing these libraries and got errors, so I guess I need to install them separately?
Hi, I have a working Q&A RAG pipeline that can answer questions based on documents in a folder. What if I want to extend it to answer questions like "How many documents are there in this folder?" Can I do this using LlamaIndex? Is there a tutorial for it?
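Here is roughly the direction I was imagining, in case it helps to react to. This is only a sketch: count_documents is a hypothetical helper, and query_engine / llm are my existing objects:

import os
from llama_index.agent import ReActAgent
from llama_index.tools import FunctionTool, QueryEngineTool

def count_documents(folder: str = "./data") -> int:
    """Count the files in the folder the index was built from."""
    return sum(os.path.isfile(os.path.join(folder, f)) for f in os.listdir(folder))

count_tool = FunctionTool.from_defaults(fn=count_documents)
qa_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,  # my existing RAG query engine
    name="doc_qa",
    description="Answers questions about the content of the documents.",
)
agent = ReActAgent.from_tools([count_tool, qa_tool], llm=llm, verbose=True)
print(agent.chat("How many documents are there in this folder?"))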
I am using local models for a Q&A RAG pipeline. I am trying to use multiprocessing to make the best use of my resources. I can make it work, but I have to load the models separately for each process, which slows things down. I tried using a multiprocessing queue to share the service_context among processes but got this error: cannot pickle 'builtins.CoreBPE' object
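For context, the per-process fallback I have working looks roughly like this (build_service_context and run_query are stand-ins for my own loading and querying code):

import multiprocessing as mp

_SERVICE_CONTEXT = None  # per-process global, filled in by the initializer

def init_worker():
    # runs once per worker process, so each process loads the models exactly once
    global _SERVICE_CONTEXT
    _SERVICE_CONTEXT = build_service_context()  # stand-in for my model-loading code

def answer(question):
    return run_query(_SERVICE_CONTEXT, question)  # stand-in for my query code

if __name__ == "__main__":
    with mp.Pool(processes=2, initializer=init_worker) as pool:
        print(pool.map(answer, ["question 1", "question 2"]))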
Can I just do a retrieve on a postgres vector store based on metadata filters? I am trying to store these results somewhere else so that I don't have to filter at query time. Does that make sense?
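To make it concrete, this is the kind of thing I mean (a sketch; the filter key/value and query string are placeholders, and I realize the vector retriever still wants a query string):

from llama_index import VectorStoreIndex
from llama_index.vector_stores import ExactMatchFilter, MetadataFilters

index = VectorStoreIndex.from_vector_store(vector_store)  # my existing PGVectorStore
retriever = index.as_retriever(
    similarity_top_k=10,
    filters=MetadataFilters(filters=[ExactMatchFilter(key="file_name", value="report.pdf")]),
)
nodes = retriever.retrieve("placeholder query")  # no synthesis / LLM call, just NodeWithScore objects
for n in nodes:
    print(n.node.node_id, n.score)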
Where does download_loader save the loader by default? I thought it was under readers/llamahub_modules/, but in a recent instance I downloaded the s3Loader and don't see it in that directory.
I am using a postgres DB for my vector store. Can I interact with my vector_store table using another structured table in the same postgres DB? This is what I want to do: I want to test whether joining my structured table with the vector_store table to filter records is faster than metadata filtering.
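Roughly, I want to benchmark something like the following raw join against plain metadata filtering (a sketch; data_my_vectors, the metadata_ column, my_structured_table, and doc_id are all assumptions about my schema):

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:pass@localhost:5432/mydb")  # placeholder DSN

join_sql = text("""
    SELECT e.node_id, e.text
    FROM data_my_vectors AS e              -- table created by PGVectorStore (assumed name)
    JOIN my_structured_table AS s          -- my own structured table (hypothetical)
      ON s.doc_id = e.metadata_ ->> 'doc_id'
    WHERE s.category = :category
""")

with engine.connect() as conn:
    rows = conn.execute(join_sql, {"category": "finance"}).fetchall()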
# Trying the huggingface Llama-2 chat model instead to see if inference speeds up!
import torch
from llama_index.llms import HuggingFaceLLM
from llama_index.prompts import PromptTemplate
from llama_index.prompts.prompt_type import PromptType

DEFAULT_TEXT_QA_PROMPT_TMPL = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)
query_wrapper_prompt = PromptTemplate(
    DEFAULT_TEXT_QA_PROMPT_TMPL, prompt_type=PromptType.QUESTION_ANSWER
)

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.1, "do_sample": False},
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    model_name="meta-llama/Llama-2-7b-chat-hf",
    device_map="auto",
    tokenizer_kwargs={"max_length": 512},
    # use float16 on CUDA to reduce memory usage
    model_kwargs={"torch_dtype": torch.float16},
)
I am building a RAG pipeline over a set of documents. For now, I am only allowing pdf, docx, and txt files. I am using SimpleDirectoryReader to load the files. By default, pdfs have the file name and page label as metadata, docx files have just the file name, and txts have no metadata. I want all 3 file types to have consistent metadata.
After some research, I realized it's not easy (or possible) to get page labels for .txt and .docx files. I still want docx and txt files to have file names in their metadata; pdfs can keep the default file name and page label. What's the best way to achieve this without having to make changes in readers/file/base.py or readers/file/docs_reader.py?
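Would passing a file_metadata callable be the right approach? Something like this sketch (assuming the per-file readers still add their own extra keys, e.g. page_label for pdfs):

import os
from llama_index import SimpleDirectoryReader

def base_metadata(file_path: str) -> dict:
    # applied to every file regardless of type
    return {"file_name": os.path.basename(file_path)}

documents = SimpleDirectoryReader(
    input_dir="./data",                       # placeholder path
    required_exts=[".pdf", ".docx", ".txt"],
    file_metadata=base_metadata,
).load_data()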
I am building a hybrid Q&A RAG pipeline (using semantic and keyword search) over a set of documents. Currently, it takes too long to answer a question. I want to store StorageContext in advance to improve processing time. Is that a good practice? What are some things I need to keep in mind for this purpose? Some other questions I have:
1) I understand that StorageContext has 4 components: index_store, vector_store, graph_store, and docstore. For my use case, there's no graph_store. Where can I store the remaining 3 stores? Is it a best practice to store all of them in a vector database?
2) I am using SimpleKeywordTableIndex for keyword search. Where can I store this index if I want to do it in advance? Can this also be stored in a vector database?
I would really appreciate it if you could point me to documentation around this use case. (I've sketched below what I currently have in mind.) Thanks!
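For reference, the persistence flow I have in mind looks roughly like this (a sketch; nodes is my parsed node list, and I'd swap in PGVectorStore for the default vector store). My understanding is that the keyword table would live in the index_store/docstore rather than the vector database:

from llama_index import (
    StorageContext,
    VectorStoreIndex,
    SimpleKeywordTableIndex,
    load_index_from_storage,
)

# Build once and persist.
storage_context = StorageContext.from_defaults()  # or vector_store=PGVectorStore(...) to keep embeddings in postgres
vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
keyword_index = SimpleKeywordTableIndex(nodes, storage_context=storage_context)
vector_index.set_index_id("vector")
keyword_index.set_index_id("keywords")
storage_context.persist(persist_dir="./storage")  # docstore + index_store (+ local vector store) as JSON

# Later: reload instead of re-ingesting.
storage_context = StorageContext.from_defaults(persist_dir="./storage")
vector_index = load_index_from_storage(storage_context, index_id="vector")
keyword_index = load_index_from_storage(storage_context, index_id="keywords")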
@Logan M Question on hybrid (embedding + keyword) based retrieval vs document summary index based retrieval:
I built a Q&A system using hybrid retrieval. My next task is to get summaries of the same documents that I built this Q&A system on. I will not be using these summaries for retrieval; my task is simply to present users with a summary of the documents. 1) Is this something I can do with LlamaIndex? 2) If yes, I want to do it with minimal code changes. I am currently using an IngestionPipeline to persist things into the docstore and vector_store. Is there a way for me to include document summaries in the same pipeline?
Let me know if you need anything additional. Thanks!
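One idea I had, sketched below, is to add a SummaryExtractor as one more transformation in the existing pipeline so each node carries a section_summary in its metadata; I'm not sure whether that's close enough to a real per-document summary (vs. a DocumentSummaryIndex), so treat the exact arguments as assumptions:

from llama_index.extractors import SummaryExtractor
from llama_index.ingestion import IngestionPipeline
from llama_index.text_splitter import SentenceSplitter

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=50),
        SummaryExtractor(summaries=["self"], llm=llm),  # stores metadata["section_summary"] per node
        embed_model,        # my existing embedding model
    ],
    docstore=docstore,      # my existing docstore
    vector_store=vector_store,
)
nodes = pipeline.run(documents=documents)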
I have built a Q&A RAG application using LlamaIndex that answers questions based on documents present in a folder. Currently, it always answers, even if the question is not related to any of the documents, and in most of these cases it makes up an answer / hallucinates. Any tips to control this behavior? For example, a user simply typed hello and the app returned a random answer.
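Would something like a similarity cutoff plus a manual fallback be the right direction? A sketch of what I mean (the cutoff value is a guess I'd have to tune):

from llama_index.postprocessor import SimilarityPostprocessor

query_engine = index.as_query_engine(
    similarity_top_k=4,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.75)],  # drop weak matches
)
response = query_engine.query("hello")
if not response.source_nodes:
    # nothing relevant retrieved, so refuse instead of letting the LLM improvise
    print("Sorry, I can only answer questions about the documents in this folder.")
else:
    print(response)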
@Logan M, I am using postgres for my vector store. I think the code creates a table for storing the embeddings. Is it possible to disable that? I want to create the table in advance and not let the code create the table
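I vaguely remember seeing a perform_setup flag on PGVectorStore; is something like the sketch below the intended way, or am I misremembering the flag name? (Treating it as an unverified assumption.)

from llama_index.vector_stores import PGVectorStore

vector_store = PGVectorStore.from_params(
    host="localhost", port="5432", database="mydb",
    user="user", password="pass",       # placeholder credentials
    table_name="my_vectors",
    embed_dim=384,                       # must match my embedding model
    perform_setup=False,                 # assumption: skip CREATE EXTENSION / CREATE TABLE; table exists already
)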
I am running llama-2 locally for a RAG pipeline via llama-cpp-python. I don't want to use the default system_prompt. How do I change it? I tried the system_prompt argument in LlamaCPP() but it didn't work:
import torch
from llama_index import ServiceContext, set_global_service_context
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt

device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5", device=device, cache_folder=models_dir
)
n_gpu_layers = 0 if device.type == "cpu" else -1

llm = LlamaCPP(
    model_url=None,
    model_path=f"{models_dir}/llama-2-7b-chat.Q4_K_M.gguf",
    temperature=0.1,
    max_new_tokens=256,
    # llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room
    context_window=3900,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": n_gpu_layers, "offload_kqv": True},
    # transform inputs into Llama2 format
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=False,
    system_prompt="",
)

service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
set_global_service_context(service_context)
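Would binding my own system prompt into the prompt-formatting helpers be the right workaround? A sketch of what I mean, assuming the llama_utils helpers accept an optional system_prompt argument:

from functools import partial
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt

MY_SYSTEM_PROMPT = "You are a concise assistant that only answers from the given context."

llm = LlamaCPP(
    model_path=f"{models_dir}/llama-2-7b-chat.Q4_K_M.gguf",
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    model_kwargs={"n_gpu_layers": n_gpu_layers, "offload_kqv": True},
    # assumption: both helpers take an optional system_prompt keyword
    messages_to_prompt=partial(messages_to_prompt, system_prompt=MY_SYSTEM_PROMPT),
    completion_to_prompt=partial(completion_to_prompt, system_prompt=MY_SYSTEM_PROMPT),
    verbose=False,
)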
Is anyone aware of an SLM good enough for a Q&A RAG use case? I have been using llama-2 7B (4-bit quantized) so far, but it still consumes a lot of computing resources. The goal is to find something that can run completely on CPU. I recently discovered TinyLlama/TinyLlama-1.1B-Chat-v1.0 but would like to know my other options. Thank you!
from llama_index.node_parser import SimpleNodeParser
from llama_index.ingestion import IngestionPipeline
from llama_index.vector_stores import PGVectorStore
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index import SimpleDirectoryReader
loader = SimpleDirectoryReader(
    input_files=[""],  # Any file that can be broken into at least 2 nodes
    filename_as_id=True,
)
from llama_index import VectorStoreIndex
from llama_index.node_parser import SimpleNodeParser
from llama_index.ingestion import IngestionPipeline, IngestionCache
from llama_index.ingestion.cache import RedisCache
from llama_index.vector_stores import PGVectorStore
Is this example from the documentation complete? I tried it with my postgres DB and local Redis, but I don't see the data in my vector DB:
from llama_index import Document
from llama_index.embeddings import OpenAIEmbedding
from llama_index.text_splitter import SentenceSplitter
from llama_index.extractors import TitleExtractor
from llama_index.ingestion import IngestionPipeline, IngestionCache
from llama_index.ingestion.cache import RedisCache
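For comparison, here is what I think the working version would look like, with the vector store actually attached to the pipeline (connection details are placeholders). Is it correct that without vector_store=... the run only returns nodes and caches transformations in Redis, without writing anything to postgres?

from llama_index.vector_stores import PGVectorStore

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=50),
        TitleExtractor(),
        OpenAIEmbedding(),
    ],
    cache=IngestionCache(
        cache=RedisCache.from_host_and_port(host="127.0.0.1", port=6379),
        collection="my_cache",
    ),
    vector_store=PGVectorStore.from_params(      # without this, nodes are only returned, not stored
        host="localhost", port="5432", database="mydb",
        user="user", password="pass",            # placeholder credentials
        table_name="my_vectors",
        embed_dim=1536,                          # OpenAI ada-002 embedding size
    ),
)
nodes = pipeline.run(documents=[Document.example()])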
I tried but couldn't get it to work. Most likely because I need to use the 'spawn' method for GPU, and llama_index might be setting it to 'fork' somewhere in the code? I'm getting this error: RuntimeError: context has already been set
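For reference, the spawn-based variant I was attempting looks roughly like this (run_query is a stand-in for my own per-process model loading and querying):

import multiprocessing as mp

def worker(question):
    # build models inside the worker, after spawn, so nothing CUDA-related is inherited via fork
    return run_query(question)  # stand-in for my per-process pipeline

if __name__ == "__main__":
    ctx = mp.get_context("spawn")        # per-pool context, avoids "context has already been set"
    with ctx.Pool(processes=2) as pool:
        print(pool.map(worker, ["question 1", "question 2"]))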