Looking closer, what I would do is break your code into 2 files.
- Parse & build vector db
- run the queries
It looks like every time you want to ask a question, you have to do step 1
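Something like this, as a rough sketch (the file names, ./data folder, and ./storage dir are just placeholders; assumes a recent llama-index with the core imports):
```python
# build_index.py -- run once (or whenever the source docs change)
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
# persist so the query script doesn't have to re-parse / re-embed every time
index.storage_context.persist(persist_dir="./storage")
```
and then the second file only loads and queries:
```python
# query.py -- load the persisted index and just run queries
from llama_index.core import StorageContext, load_index_from_storage

storage = StorageContext.from_defaults(persist_dir="./storage")
query_engine = load_index_from_storage(storage).as_query_engine()
print(query_engine.query("your question here"))
```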
You make a good point! I came across a post on Langfuse earlier that seemed really interesting.
Perfect, thanks for your recommendation btw!
Have you tried Flowise AI before? I hope they will integrate langfuse soon
I'm just getting back into AI, so I'm pretty new to all this stuff around LLMs
It lets you build end-to-end LLM applications with a simple drag-and-drop interface
my vim motions are faster than DnD 🤣
The main problem with that class of solutions is customization; you always have to drop into real code
I just got my RAG + Chat working, so now I have to deploy qdrant & langfuse (devops guys always be self-hosting 🤣 )
@Nam Tran I think this code will create the index on every user message? You probably only want to create it once right?
that's correct, I am reworking it!
I tried to isolate the index creation but it still did not seem to work. Could you please have a look and tell me what I was doing wrong?
I would have used st.session_state to store the query engine
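Something along these lines (rough sketch, assuming the index is already persisted to ./storage as above):
```python
import streamlit as st
from llama_index.core import StorageContext, load_index_from_storage

# build the query engine once per session, not on every rerun / user message
if "query_engine" not in st.session_state:
    storage = StorageContext.from_defaults(persist_dir="./storage")
    index = load_index_from_storage(storage)
    st.session_state.query_engine = index.as_query_engine()

question = st.text_input("Ask a question")
if question:
    response = st.session_state.query_engine.query(question)
    st.write(str(response))
```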
Thanks, I will give it a try
@Logan M It worked! Thank you for your help!
@verdverm I keep getting an issue when integrating langfuse to my app. I was wondering if you have ever encountered this
ERROR:langfuse:An error occurred in _handle_LLM_events: 'NoneType' object has no attribute 'generation'
Traceback (most recent call last):
  File "/home/adminuser/venv/lib/python3.10/site-packages/langfuse/utils/error_logging.py", line 14, in wrapper
    return func(*args, **kwargs)
  File "/home/adminuser/venv/lib/python3.10/site-packages/langfuse/llama_index/llama_index.py", line 352, in _handle_LLM_events
    generation = parent.generation(
Can you rename that file to have a .py ending? (should code highlight)
I haven't seen this exact error, but it seems like parent is None
I saw something like this in VS Code when I didn't use my direct calls to langfuse correctly
If you aren't making any calls yourself, this probably gets filed under the bug department
I tried to reach out to the langfuse team for help but never got any response
One thing I did do was to add callback_manager = Settings.callback_manager in a number of places, like when you create the llm, as an extra keyword arg
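Roughly like this (a sketch, not my exact code; OpenAI here is just a stand-in for whichever LLM class you use, e.g. Vertex):
```python
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from llama_index.llms.openai import OpenAI
from langfuse.llama_index import LlamaIndexCallbackHandler

# global handler; reads LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST from env
langfuse_handler = LlamaIndexCallbackHandler()
Settings.callback_manager = CallbackManager([langfuse_handler])

# pass the same manager explicitly where components are constructed,
# e.g. on the LLM, to improve trace coverage
llm = OpenAI(model="gpt-3.5-turbo", callback_manager=Settings.callback_manager)
```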
also, I don't think it is tracing the cost correctly
I suspect this is more likely a llama-index bug, Langfuse support is lacking...
Right, so adding the callback_manager keyword arg in more places definitely improves the coverage
I'm still not seeing costs when I call VertexAI, was planning to file a bug at some point
Alright, thank you for your help! Will keep you updated if I hear back from them!
Here are a few places I put the extra kwarg; there are more
Pow! just found another from this convo, so I owe you a bit for the indirect help :]
my llm line was also missing it
does the bge_large_en perform much better than the small one?
I'm not sure, but jinaai was terrible
I saw some benchmarks that it was supposed to be SoTA with reranking, but the baseline was trash compared to the published numbers
not sure if I'm holding it wrong, I also saw some people saying they were getting different results from local vs cloud and the Jina team was working to correct that
I went with small because I'm deploying it to the cloud for the first time and...
- anecdotally I didn't see a meaningful difference
- err on the small side
How long does it take for your model to return the answer? mine would take at least 15-20 seconds
I'm planning to get an evaluation pipeline in place so I can answer "is A better than B?" for all the components
Which model? The embedding is typically very fast, Vertex / OpenAI take some time
The model and input/output size both impact timings
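A crude way to see where the time goes is to wrap each stage in a timer, e.g.:
```python
import time

t0 = time.perf_counter()
response = query_engine.query("your question here")  # query_engine from the earlier sketch
print(f"retrieval + generation took {time.perf_counter() - t0:.1f}s")
```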
When I send a query to my rag model to get an answer, it will take at least 15 seconds before generating anything
I am now able to see most of it in Langfuse! So I can start to get a better answer to this
So I think there may be more than one call to the LLMs, depending on your setup (I haven't looked close enough at yours)
Getting that callback_manager in all the places will help uncover this sort of thing
not sure if you have seen this, but I am planning to apply truera on my models for comparison purposes
seems like a good one to me
I have not, but will probably check it out!
there are two sides to this, which makes it different from normal programming
- DevOps / LogMon type visibility, which is how I am holding Langfuse (and maybe UpTrain)
- Quality control, unique to LLM systems because there is fuzziness to it that SQL DBs don't experience
@Nam Tran I figured out more about langfuse, cleaned up my calling methods, got pricing working for the whole thing
And a second message that skips the RAG step
@verdverm awesome! I will look closer into the set up later. Glad to know that it worked!
I may pull apart the embedding / lookup phase, as I expect to hit similar issues with cost calculation (note, you have to set total_cost manually; the comment in the langfuse docs is wrong)
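For reference, roughly what I mean by setting it manually (a sketch against the v2 Python SDK; the usage field names are my reading of its ModelUsage type, so double-check them against your SDK version):
```python
from langfuse import Langfuse

langfuse = Langfuse()  # reads LANGFUSE_* env vars
trace = langfuse.trace(name="rag-query")
trace.generation(
    name="embedding-lookup",
    model="text-embedding-ada-002",  # placeholder model name
    usage={
        "input": 512,
        "output": 0,
        "unit": "TOKENS",
        "total_cost": 0.0001,  # set manually; not inferred for custom models
    },
)
langfuse.flush()
```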
@verdverm To follow up on the issue I mentioned earlier, this is what Langfuse relied me
"Llamaindex has issues with thwir concurrency model. If you execute multiple API requests at the same time, you run into this issue. I would advise you to add some sort of locking so that only one API request can be executed at a time"
Do you have a link to what they relayed* to you?
(*relayed, not relied, is the word I think you were after)
I sent a text message to them on their web page
are you calling .flush() yourself?
this is for the code that I showed you at the beginning