I have a hard time using some different GGUF models with llama-cpp in llama-index. Some work just fine and some only answer garbage. My first guess was that it's the prompting structure, and I played with that, but with little to no success... Now I found that when initializing the LLM with LlamaCPP there is this:
# transform inputs into Llama2 format
messages_to_prompt=messages_to_prompt,
completion_to_prompt=completion_to_prompt,
OK, so it is kind of "hard coded" to Llama2 prompting, even though it still does not work with llama-2-chat...
Are there any other things I could pass here to point to different types of models?
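For illustration, this is roughly what I imagine passing in: my own formatting functions instead of the Llama2 ones. Just a sketch, assuming a ChatML-style model (the exact tags depend on the model card) and the llama-index-llms-llama-cpp package; the model path is made up:

```python
# Sketch: pass custom prompt-formatting functions instead of the Llama2 ones.
# Assumes a ChatML-style model; check your model card for the real format.
from llama_index.llms.llama_cpp import LlamaCPP


def messages_to_prompt(messages):
    # messages is a list of ChatMessage objects with .role and .content
    prompt = ""
    for msg in messages:
        prompt += f"<|im_start|>{msg.role.value}\n{msg.content}<|im_end|>\n"
    prompt += "<|im_start|>assistant\n"
    return prompt


def completion_to_prompt(completion):
    return (
        "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
        f"<|im_start|>user\n{completion}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


llm = LlamaCPP(
    model_path="./models/my-model.Q4_K_M.gguf",  # hypothetical path
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
)
```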
Hello, I'm using a query engine to fetch data from Qdrant and then generate a response. I found that the user input is sometimes very poor, so I try to enhance it by asking the LLM for a list of semantic keywords for the user's input. That as such works well... Now I wonder how I can use the semantic keywords as search parameters while still using the user's input as the query. Or is there a way where I prepare the context myself and then do the LLM response in a second step?
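Something like this two-step sketch is what I have in mind: retrieve with the keyword-enriched text but answer against the original input. `index`, `llm`, and `user_input` are assumed to already exist (e.g. a VectorStoreIndex over the Qdrant store):

```python
# Sketch: retrieve with a keyword-enriched query, answer with the original input.
retriever = index.as_retriever(similarity_top_k=5)

keywords = llm.complete(
    f"Give a short list of semantic keywords for this input: {user_input}"
).text
nodes = retriever.retrieve(f"{user_input}\n{keywords}")

context = "\n\n".join(n.node.get_content() for n in nodes)
answer = llm.complete(
    f"Context:\n{context}\n\n"
    "Answer the question using only the context above.\n"
    f"Question: {user_input}"
)
print(answer.text)
```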
Is it possible to import a CSV file and use every single line as a single document, so that after reading the file I would have an object with as many documents as there were lines in the CSV file?
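What I picture is roughly this sketch, building one Document per row by hand (I believe there is also a paged CSV reader among the file readers that does something similar, but I'm not sure of the exact name):

```python
# Sketch: one Document per CSV row, assuming a file called data.csv.
import csv

from llama_index.core import Document

documents = []
with open("data.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        # turn the row into "column: value" text; adjust to taste
        text = ", ".join(f"{k}: {v}" for k, v in row.items())
        documents.append(Document(text=text, metadata={"row": reader.line_num}))

print(len(documents), "documents")
```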
Hello, I'd like to be able to add custom metadata to any of the documents I embed. The examples I find look like, instead of the simple data loader, I just create documents manually, add my custom metadata, and then run them through the embedding. Is that the right way, or did I just not find the more sophisticated way?
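For reference, this is the manual way I meant, plus the variant I suspect exists where the reader attaches metadata per file via a callback (the callback part is an assumption on my side):

```python
# Sketch: custom metadata on manually created documents, and (assumed) via a
# per-file metadata callback on SimpleDirectoryReader.
from llama_index.core import Document, SimpleDirectoryReader, VectorStoreIndex

# Option 1: build documents by hand with whatever metadata you like
docs = [
    Document(
        text="Some content to embed.",
        metadata={"source": "wiki", "department": "sales"},
    )
]

# Option 2 (assumption): let the reader attach metadata per file
reader = SimpleDirectoryReader(
    "./data",
    file_metadata=lambda path: {"file_path": path, "project": "demo"},
)
docs += reader.load_data()

index = VectorStoreIndex.from_documents(docs)
```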
Is there a built-in way to handle conversation history? I have kind of built my own thing, but I guess that's not how it is supposed to be done 🙂 On top of that, is there a way to count tokens so that the history will not grow too big?
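This sketch is roughly what I'd hope exists: a chat memory with a token limit, plugged into a chat engine (assuming `index` already exists and the 0.10+ package layout):

```python
# Sketch: chat memory with a token limit, used via a context chat engine.
from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
chat_engine = index.as_chat_engine(chat_mode="context", memory=memory)

print(chat_engine.chat("What does the document say about X?"))
# History is carried along and trimmed to the token limit.
print(chat_engine.chat("And what about Y?"))
```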
Hi there, I learned that PDF parsing seems to be a very complex task. How about Word parsing? Is that the same story in different clothes, or is it less complex? What would be the easiest format to parse to get the best results besides pure text?
Hi there, for a company project I'm trying to ingest data from Confluence and then create a RAG assistant that helps find information in the vast amount of data we have. The first tests show that we only have very crappy data to ingest into RAG: it's tables, many images, and little text if any, and it's mostly just single words. I'm learning that people are very, very lazy about writing useful documentation... that's it for the ranting part 😉 Is there any strategy for dealing with such messy data? Are there maybe tutorials or tips around? I would guess it's not just my company that has lazy people writing stuff that only they themselves understand on the day they write it.
But here is another question: for an unusual use case I'd like to run a query against the vector store and only want results that are farther away than a given distance. Yes, you read that right, I'm looking for results far away from my query 🙂 Is it somehow possible to define what the minimum distance should be when doing a query?
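The workaround I can picture is something like this sketch: over-fetch and then keep only low-similarity results. The cutoff value is made up, and of course this only looks at the top-k nearest candidates, so it's not a true minimum-distance query against Qdrant:

```python
# Sketch: keep only results whose similarity score is BELOW a threshold,
# assuming the vector store returns similarities where higher = closer.
retriever = index.as_retriever(similarity_top_k=50)
results = retriever.retrieve("my query text")

max_similarity = 0.3  # hypothetical cutoff; tune for your embedding model
far_nodes = [r for r in results if r.score is not None and r.score < max_similarity]

for r in far_nodes:
    print(r.score, r.node.get_content()[:80])
```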
I'm running into errors with some of the embedding models
With sentence-transformers/all-distilroberta-v1, which states it has 768 dimensions, I get this error while generating the embeddings: ../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [59,0,0], thread: [64,0,0] Assertion srcIndex < srcSelectDimSize failed. It happens with all the 768-dimension embedding models I have tested so far; 384 and 512 dimensions were no issue and the embedding worked fine.
I'm running tests to find out which embedding model would be best for our data, and that's how I came across this issue. Am I doing something wrong, or do 768 dimensions not work at all?
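In case the dimension count is a red herring: my understanding is that this CUDA assertion usually means a token index goes out of range, for example when the input is longer than the model's maximum sequence length. So this is what I'd try; that max_length is the right knob here is an assumption on my side:

```python
# Sketch: cap the sequence length on the HuggingFace embedding model.
# Assumes the llama-index-embeddings-huggingface package is installed.
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-distilroberta-v1",
    max_length=512,  # assumption: this model accepts up to 512 tokens
)
```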
I just found that Hugging Face offers hosting for embeddings. That would be a good place for someone like me who would like to share the vector data. Is there an integration yet, or are there any plans to add one?
We would like to create a system that can generate test questions for students based on documents we upload, think of it as an automated school test generator. To help it a little, it should be possible to specify topics that are important to ask about, as well as the number of questions to create. Would something like this be possible with LlamaIndex more or less out of the box, or would you expect a lot of custom code? Ignoring the UI and the other pieces that would be needed anyway.
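A rough sketch of the naive version I have in mind, just a query engine with an instruction prompt; topics and count are placeholders, `index` is assumed to exist over the uploaded documents, and I vaguely recall LlamaIndex also has a dedicated dataset/question generator, but I'm not sure of the name:

```python
# Sketch: generate N test questions about given topics via a query engine.
topics = ["photosynthesis", "cell respiration"]  # hypothetical
num_questions = 5

query_engine = index.as_query_engine(similarity_top_k=10)
response = query_engine.query(
    f"Create {num_questions} exam questions for students based only on the "
    f"provided context. Focus on these topics: {', '.join(topics)}. "
    "Return one question per line."
)
print(response)
```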
OK, another one, and this one is odd, because the same import works in my custom script but not here:
File "C:\Users\user\miniconda3\envs\llama_index_new\lib\site-packages\llama_index\core\prompts\base.py", line 500, in get_template
    from llama_index.core.llms.langchain import LangChainLLM
ModuleNotFoundError: No module named 'llama_index.core.llms.langchain'
I guess I found a way: I check if doc.text != '' and then add it to a separate docs array, which I then pass to the embedding process. So far that gets me around the error.
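In code, the workaround looks roughly like this (assuming `documents` is the loaded list and the index is built afterwards):

```python
# Sketch of the workaround: drop documents with empty text before embedding.
from llama_index.core import VectorStoreIndex

docs = [d for d in documents if d.text and d.text.strip()]
index = VectorStoreIndex.from_documents(docs)
```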
Thanks to Logan M I found some good examples to get RAG working with Hugging Face models. Now I'd like to get this use case working: I'm building a system that should help create text2image prompts based on millions of prompts I have stored in the vector store. So I'd like to put in a simple text2image prompt, have it search for matching prompts in the vector store, and then have the LLM write a nice prompt from that context. My guess is I need to play with the prompting in some way, because with the sample from Logan M it comes back and tells me my query would not result in any possible answer based on the context. That's true, since I had no chance to say what to do with the context; there was just my query, which in my example is something like "sail to italy", so mainly a search term. How do I make it create a prompt from the context?
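My guess at what's needed is overriding the answer prompt so the model writes a prompt from the retrieved examples instead of trying to "answer". A sketch, assuming `index` exists over the stored prompts and that text_qa_template is the right knob:

```python
# Sketch: custom answer template that turns retrieved prompts into a new prompt.
from llama_index.core import PromptTemplate

template = PromptTemplate(
    "Here are example text2image prompts:\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Using the style of the examples above, write one detailed text2image "
    "prompt for this idea: {query_str}\n"
)

query_engine = index.as_query_engine(text_qa_template=template, similarity_top_k=5)
print(query_engine.query("sail to italy"))
```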
Maybe an off-topic question, but I'm breaking my fingers over this, so I'll just ask 😉 I'm trying to get llama-cpp running on my GPU instead of the CPU. I'm on Windows 11 and I followed the instructions to get it working, but maybe I had the wrong instructions. Has anyone here gotten it working on Windows 11 with an NVIDIA GPU?
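If it helps, this is the setting I expect to matter once llama-cpp-python is actually built with CUDA support (which, if I remember right, is the part the install instructions cover); the model path is made up and n_gpu_layers here is just a sketch:

```python
# Sketch: offload layers to the GPU via llama.cpp's n_gpu_layers option,
# assuming llama-cpp-python was installed with CUDA support.
from llama_index.llms.llama_cpp import LlamaCPP

llm = LlamaCPP(
    model_path="./models/my-model.Q4_K_M.gguf",  # hypothetical path
    model_kwargs={"n_gpu_layers": -1},  # -1 = offload all layers (llama.cpp convention)
)
```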