Models

it seems like no matter what I do it wants an OpenAI API... I don't see the issue
There are two models, right: the LLM and the embedding model

You are only setting the LLM, so the embedding model is falling back to the defaults (OpenAI)
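For reference, a rough sketch of setting the embedding model explicitly so nothing falls back to OpenAI (llama_index 0.8/0.9-era API; the "local" shortcut and the placeholder llm variable are illustrative, not from this thread):

Plain Text
from llama_index import ServiceContext

# llm = your existing non-OpenAI LLM (placeholder here)
# the key part is ALSO setting embed_model, otherwise llama_index
# falls back to OpenAI embeddings and asks for an API key
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model="local",  # downloads and runs a small HuggingFace embedding model
)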
oh ok... I think a light bulb just came on lol
💡 :dotsCATJAM:
so does it have to be an embedding model?
Yes 😅 or at least it has to be a model supported by our embeddings abstractions

In a large majority of the best RAG systems, the embedding model is specifically fine-tuned for embedding text for retrieval (e.g. text-embedding-ada-002, BAAI/bge-base-en, etc.)

Then, the LLM is responsible for writing text and following instructions
so the embedding model is like the reader for the vector store?
maybe that's where I'm getting lost, because I thought llama_index reads the PDF files
how complicated would it be to have the embedder write to a SQL db and the LLM take from the db? or would that require the embedding model to read the db for the LLM?
The embedding model is not a reader; it generates vectors, though.

So it generates all the vectors for your vector db

Then at query time, it generates a vector for your query, and that's used to fetch similar nodes from the vector db
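To make the vector idea concrete, a small sketch using llama_index's embedding abstraction (the model name is just an example):

Plain Text
from llama_index.embeddings import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en")

# ingestion time: each chunk/node of your PDFs becomes a vector in the store
doc_vector = embed_model.get_text_embedding("some chunk of a PDF")

# query time: the question gets its own vector, used to find similar nodes
query_vector = embed_model.get_query_embedding("what does the PDF say about X?")

print(len(doc_vector), len(query_vector))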
I'm getting a weird error: "ImportError: cannot import name 'BaseCache' from 'langchain'" any idea what causes that?
or could it be anything? lol
seems like a langchain version issue? They like to break things lol
File "/home/ng/.local/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 541, in _run_script
exec(code, module.dict)
File "/home/ng/llama-index/bob.py", line 3, in <module>
from llama_index import ServiceContext
I might be wet behind the ears with Python, but that looks to me like it's the ServiceContext?
try pip install langchain==0.0.315
importing service context imports a lot of other things under the hood, including langchain
I'm guessing langchain updated and broke an import somewhere
but my ServiceContext is imported from llama_index
so should I change it to langchain?
I don't understand... I can get it to work with langchain but you have to upload the PDF then wait for it to be embedded... I really want it to pull from a dir and maybe a db...
but as soon as I try to use llama_index instead of langchain I mess it up lol
212.178.231.251:8502
that is the langchain version that I'm trying to remodel into llama_index lol
I know. But with Python, when you import a file, that file also runs all its imports. So the import tree eventually gets to a file that imports langchain
I don't think this is a langchain version? 😅
Did you try running that pip command above? Or can you share what you are working on, I can help correct the code if needed
the version running at that ip is this
I ran the pip command but I already have langchain installed
the llama_index version I am trying to make pulls from a 'data' dir instead of the 'upload'. I started with the Streamlit version from their GitHub and was trying to modify it to use models other than OpenAI lol
what I have on that is this
Plain Text
pip uninstall langchain
pip install langchain==0.0.315


That should solve the langchain errors. The rest of the code looks fine at first glance
nah, the error is still there
that's the error
I'm trying to learn lol
Hmm, it works on my end. But maybe try to use a venv for your packages, it's the ideal approach with python dependencies

Plain Text
python -m venv venv
source venv/bin/activate
pip install llama-index streamlit ...
that didn't change anything for me
🤷‍♂️ Idk man, I'm lost now. Your project is cursed
start over I guess lol
do you know of a good tutorial that shows how to change llama_index to use a HuggingFace LLM and embedding model?
I'm fixing to wipe and reinstall Ubuntu on my lab machine... just in case it is a requirements issue lol
in the service_context can we pass both an llm and an embed model?
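For reference, yes: a hedged sketch of a service context taking both a HuggingFace LLM and a HuggingFace embedding model (model names and kwargs are illustrative and may vary by llama_index version):

Plain Text
from llama_index import ServiceContext
from llama_index.llms import HuggingFaceLLM
from llama_index.embeddings import HuggingFaceEmbedding

# the LLM writes the answers...
llm = HuggingFaceLLM(
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    context_window=2048,
    max_new_tokens=256,
    device_map="auto",
)

# ...and the embedding model builds/searches the vectors
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en")

service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)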
Ok so I started over... Lol is there a way to get it to print to logs? It tries to work, then it says a warning about huggingface_hub_api being deprecated, then just "killed"...
Sounds like it's loading the LLM and running out of memory 👀
Is 16GB not enough?
It's an 8-core with 16 GB RAM
Probably not enough. Like, it would work on Google Colab, but since your laptop is already using some of the RAM, it's probably running out
If you don't care to take a quick look I can share my .py and requirements.txt
I probably have something wrong. Lol
I think the issue is not with the requirements.txt, mostly with your hardware 😅 Running LLMs locally is hard.
"You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/home/ng/.local/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py:127: FutureWarning: 'init' (from 'huggingface_hub.inference_api') is deprecated and will be removed from version '0.19.0'. InferenceApi client is deprecated in favor of the more feature-complete InferenceClient. Check out this guide to learn how to convert your script to use it: https://huggingface.co/docs/huggingface_hub/guides/inference#legacy-inferenceapi-client.
warnings.warn(warning_message, FutureWarning)"
is that error normal?
so huggingface is local?
is there a way to see how much memory it's using?
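One quick way to check from inside the script (psutil is a separate pip install, not part of llama_index):

Plain Text
import psutil

# resident memory of the current Python process, in GB
rss_gb = psutil.Process().memory_info().rss / 1024**3
print(f"resident memory: {rss_gb:.2f} GB")

(or just watch free -h / htop while the model loads)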
does it look like I did the redis vectorstore correctly?
@M00nshine not quite

Plain Text
from llama_index import StorageContext, VectorStoreIndex

...
vector_store = ...
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, service_context=service_context, storage_context=storage_context)
And then to "load" it later, after doing from_documents

Plain Text
index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)
ok so like the first index loads the documents into the vectorstore, then the 2nd index reads them from the vectorstore?
if so, that would be where I would want to split it into 2 separate files, and I could actually run sister systems so one indexes the PDFs and the other system uses them for a query... If I could figure out the redis part I could host it on the embedding system, write to it from the embedding system, and read it from the query system... maybe.
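Roughly, yes. A sketch of that split, assuming llama_index's RedisVectorStore and a Redis instance both machines can reach (names, URLs, and kwargs are illustrative; service_context is the same one built earlier, with the same embedding model on both sides):

Plain Text
# ingest.py -- runs on the "embedding" machine
from llama_index import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores import RedisVectorStore

vector_store = RedisVectorStore(
    index_name="pdf_docs",
    redis_url="redis://localhost:6379",
    overwrite=True,
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
documents = SimpleDirectoryReader("./data").load_data()
# must use the same embedding model in service_context as the query side
VectorStoreIndex.from_documents(
    documents, service_context=service_context, storage_context=storage_context
)

Plain Text
# query.py -- runs on the "query" machine
from llama_index import VectorStoreIndex
from llama_index.vector_stores import RedisVectorStore

vector_store = RedisVectorStore(
    index_name="pdf_docs",
    redis_url="redis://embedding-host:6379",  # same Redis the ingest side wrote to
)
index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)
print(index.as_query_engine().query("What do the PDFs say about X?"))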
would Milvus be a better choice than redis?
Hmm, I think so lol
what about chromadb? any thoughts?
Chromadb is alright, but I think others like milvus and qdrant are more optimized. But tbh they are all mostly the same lol
I think I finally got it lol
well kinda, I need to figure out the page formatting to allow longer answers from the "assistant" and some streamlit stuff to add a sidebar for uploading and saving files to the disk.
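For the sidebar upload part, a minimal sketch with Streamlit's standard file_uploader, saving into the same ./data dir the index is built from (paths are illustrative):

Plain Text
import os
import streamlit as st

uploaded = st.sidebar.file_uploader("Upload a PDF", type="pdf")
if uploaded is not None:
    os.makedirs("data", exist_ok=True)
    with open(os.path.join("data", uploaded.name), "wb") as f:
        f.write(uploaded.getbuffer())
    st.sidebar.success(f"Saved {uploaded.name} to ./data")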
runs pretty fast on my little local dev machine (Orange Pi 5 Plus, 8-core CPU, 16GB RAM, M.2 NVMe SSD), Ubuntu 22.04 server
So the response is being cut off in the response window. Would you say that is a Streamlit, LLM, or index limitation?
You'll want to set max_new_tokens in the model kwargs instead of max_length I think? Maybe?

You'll also need to change num_output in the service context to match
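A sketch of keeping the two settings in sync (kwargs are illustrative and may differ by llama_index version):

Plain Text
from llama_index import ServiceContext
from llama_index.llms import HuggingFaceLLM

llm = HuggingFaceLLM(
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    context_window=2048,
    max_new_tokens=512,   # how many tokens the model may generate per answer
)
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model="local",
    num_output=512,       # reserve matching room for the output in the prompt budget
)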