Models

It seems like no matter what I do it wants an OpenAI API key... I don't see the issue
There are two models, right: the LLM and the embedding model.

You are only setting the LLM, so the embedding model is falling back to the defaults (that's what's asking for the OpenAI key)
Oh ok... I think a light bulb just came on lol
💡 :dotsCATJAM:
So does it have to be an embedding model?
Yes 😅, or at least it has to be a model supported by our embeddings abstractions

In the large majority of the best RAG systems, the embedding model is specifically fine-tuned for embedding text for retrieval (e.g. text-embedding-ada-002, BAAI/bge-base-en, etc.)

Then, the LLM is responsible for writing text and following instructions
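As a rough sketch, setting both in the service context might look like this (the model names are just placeholders, assuming a 0.8/0.9-era llama_index API with the HuggingFaceLLM wrapper and the "local:" embedding shortcut):

Plain Text
from llama_index import ServiceContext
from llama_index.llms import HuggingFaceLLM

# placeholder model -- swap in whichever HuggingFace LLM you actually want
llm = HuggingFaceLLM(
    model_name="Writer/camel-5b-hf",
    tokenizer_name="Writer/camel-5b-hf",
)

# "local:..." loads a HuggingFace embedding model locally, so nothing falls
# back to the default OpenAI embeddings
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model="local:BAAI/bge-base-en",
)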
The embedding model is like the reader for the vector store?
Maybe that's where I'm getting lost, because I thought llama_index reads the PDF files.
How complicated would it be to have the embedder write to a SQL db and the LLM pull from the db? Or would that require the embedding model to read the db for the LLM?
The embedding model is not a reader; it generates vectors, though.

So it generates all the vectors for your vector db

Then at query time, it generates a vector for your query, and that's used to fetch similar nodes from the vector db
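To make that concrete, here's a minimal sketch of what the embedding model actually does (assuming a llama_index version that ships the native HuggingFaceEmbedding class; the model name is a placeholder):

Plain Text
from llama_index.embeddings import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en")

# indexing time: every document chunk is turned into a vector and stored in the vector db
doc_vector = embed_model.get_text_embedding("some text pulled out of a PDF chunk")

# query time: the question is embedded the same way, and the vector db returns
# the stored chunks whose vectors are closest to the query vector
query_vector = embed_model.get_query_embedding("what does the PDF say about X?")

print(len(doc_vector), len(query_vector))  # plain lists of floats, same dimension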
I'm getting a weird error: "ImportError: cannot import name 'BaseCache' from 'langchain'". Any idea what causes that?
Or could it be anything? lol
seems like a langchain version issue? They like to break things lol
File "/home/ng/.local/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 541, in _run_script
exec(code, module.__dict__)
File "/home/ng/llama-index/bob.py", line 3, in <module>
from llama_index import ServiceContext
I might be wet behind the ears with Python, but that looks to me like it's the ServiceContext?
try pip install langchain==0.0.315
importing service context imports a lot of other things under the hood, including langchain
I'm guessing langchain updated and broke an import somewhere
But my ServiceContext is imported from llama_index
So should I change it to langchain?
I don't understand... I can get it to work with langchain, but you have to upload the PDF and then wait for it to be embedded... I really want it to pull from a dir and maybe a db...
But as soon as I try to use llama_index instead of langchain I mess it up lol
212.178.231.251:8502
That is the langchain version that I'm trying to remodel into llama_index lol
I know. But with Python, when you import a file, that file also runs all its imports. So the import tree eventually gets to a file that imports langchain
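A tiny made-up illustration of that import chain (both file names are hypothetical):

Plain Text
# helpers.py (hypothetical)
import langchain   # runs the moment anything imports this file

# bob.py (hypothetical)
import helpers     # importing helpers also executes its `import langchain` line,
                   # so a broken langchain install surfaces here even though
                   # bob.py never imports langchain directly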
I don't think this is a langchain version issue? 😅
Did you try running that pip command above? Or can you share what you are working on, I can help correct the code if needed
The version running at that IP is this
I ran the pip command, but I already have langchain installed
The llama_index version I am trying to make pulls from a 'data' dir instead of the upload. I started with the Streamlit version from their GitHub and was trying to modify it to use models other than OpenAI lol
What I have on that is this
Plain Text
pip uninstall langchain
pip install langchain==0.0.315


That should solve the langchain errors. The rest of the code looks fine at first glance
Nah, the error is still there
That's the error
I'm trying to learn lol
Hmm, it works on my end. But maybe try to use a venv for your packages; it's the ideal approach with Python dependencies

Plain Text
python -m venv venv
source venv/bin/activate
pip install llama-index streamlit ...
that didn't change anything for me
🤷‍♂️ Idk man, I'm lost now. Your project is cursed
Start over I guess lol
Do you know of a good tutorial that shows how to change llama_index to use a HuggingFace LLM and embedding model?
I'm fixing to wipe and reinstall Ubuntu on my lab machine... just in case it is a requirements issue lol
In the service_context, can we pass both an LLM and an embedding model?
Ok so I started over... Lol, is there a way to get it to print to logs? It tries to work, then it says a warning about huggingface_hub_api being deprecated, and then just "killed"...
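For the logging part of the question, the usual pattern is to turn on Python's standard logging near the top of the script so llama_index's internal messages show up (a generic sketch, not specific to this setup):

Plain Text
import logging
import sys

# send llama_index's internal log messages to stdout;
# switch to logging.DEBUG for even more detail
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))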
Sounds like it's loading the LLM and running out of memory 👀
Is 16gb not enough?
It's an 8 core with 16 GB ram
Probably not enough. Like, it would work on Google Colab, but since your laptop is already using some of the RAM, it's probably running out
If you don't care to take a quick look, I can share my .py and requirements.txt
I probably have something wrong. Lol
I think the issue is not with the requirements.txt, mostly with your hardware 😅 Running LLMs locally is hard.
"You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/home/ng/.local/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py:127: FutureWarning: '__init__' (from 'huggingface_hub.inference_api') is deprecated and will be removed from version '0.19.0'. InferenceApi client is deprecated in favor of the more feature-complete InferenceClient. Check out this guide to learn how to convert your script to use it: https://huggingface.co/docs/huggingface_hub/guides/inference#legacy-inferenceapi-client.
warnings.warn(warning_message, FutureWarning)"
Is that error normal?
So HuggingFace is local?
Is there a way to see how much memory it's using?
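Quick ways to check from outside the script are `htop` or `free -h`; from inside Python, a rough sketch with psutil (an extra package, not part of llama_index):

Plain Text
import psutil

# resident memory of the current Python process, in GB
proc = psutil.Process()
print(f"process RSS: {proc.memory_info().rss / 1024**3:.2f} GB")

# overall system memory
mem = psutil.virtual_memory()
print(f"system: {mem.used / 1024**3:.2f} / {mem.total / 1024**3:.2f} GB used")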
Does it look like I did the Redis vector store correctly?
@M00nshine not quite

Plain Text
from llama_index import StorageContext, VectorStoreIndex

...
# wrap whatever vector store you're using (Redis, etc.) in a storage context
vector_store = ...
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# from_documents embeds the documents and writes the vectors into the vector store
index = VectorStoreIndex.from_documents(documents, service_context=service_context, storage_context=storage_context)
And then to "load" it later, after doing from_documents

Plain Text
# connects to the existing vector store and reads from it, without re-embedding anything
index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)
Ok, so the first index loads the documents into the vector store, then the 2nd index reads them from the vector store?
If so, that would be where I would want to split it into 2 separate files, and I could actually run sister systems so one indexes the PDFs and the other system uses them for a query... If I could figure out the Redis part I could host it on the embedding system, write to it from the embedding system, and read it from the query system... maybe.
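A sketch of what that two-script split might look like with Redis (the index name, URL, paths, and model names are all placeholders; this assumes llama_index's RedisVectorStore and the same 0.8/0.9-era API as above):

Plain Text
# ingest.py -- runs on the machine doing the embedding
from llama_index import ServiceContext, SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores import RedisVectorStore

# local HuggingFace embeddings; no LLM is needed just to index
service_context = ServiceContext.from_defaults(llm=None, embed_model="local:BAAI/bge-base-en")

# both scripts point at the same Redis instance
vector_store = RedisVectorStore(index_name="pdfs", redis_url="redis://localhost:6379")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("./data").load_data()
VectorStoreIndex.from_documents(documents, service_context=service_context, storage_context=storage_context)

Plain Text
# query.py -- runs on the machine serving queries
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.llms import HuggingFaceLLM
from llama_index.vector_stores import RedisVectorStore

# must use the same embedding model as ingest.py so query vectors match the stored ones,
# plus an LLM to actually write the answers
llm = HuggingFaceLLM(model_name="Writer/camel-5b-hf", tokenizer_name="Writer/camel-5b-hf")
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local:BAAI/bge-base-en")

vector_store = RedisVectorStore(index_name="pdfs", redis_url="redis://localhost:6379")
index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)

print(index.as_query_engine().query("What do the PDFs say about X?"))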
would Milvus be a better choice than redis?
Hmm, I think so lol
what about chromadb? any thoughts?
Chromadb is alright, but I think others like milvus and qdrant are more optimized. But tbh they are all mostly the same lol
I think I finally got it lol
Well, kinda. I need to figure out the page formatting to allow longer answers from the "assistant", and some Streamlit stuff to add a sidebar for uploading and saving files to disk.
Runs pretty fast on my little local dev machine (Orange Pi 5 Plus, 8-core CPU, 16 GB RAM, and an M.2 NVMe SSD), Ubuntu 22.04 server.
So the response is being cut off in the response window. Would you say that is a Streamlit, LLM, or index limitation?
You'll want to set max_new_tokens in the model kwargs instead of max_length, I think? Maybe?

You'll also need to change num_output in the service context to match
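Something along these lines, assuming the HuggingFaceLLM wrapper (the model name and the 512 are just example values; other loaders may spell the kwarg differently):

Plain Text
from llama_index import ServiceContext
from llama_index.llms import HuggingFaceLLM

llm = HuggingFaceLLM(
    model_name="Writer/camel-5b-hf",     # placeholder model
    tokenizer_name="Writer/camel-5b-hf",
    max_new_tokens=512,                  # how many tokens the model is allowed to generate
)

# num_output should match, so prompts leave room for that many output tokens
service_context = ServiceContext.from_defaults(llm=llm, num_output=512)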