Hi, I'd like to expose the chat I currently use in a REPL with chat_engine.chat_repl() to the Internet. The idea is to have a React app as the frontend and a Node.js app in the middle that would make the requests to the llamaindex Python code. What's the best way to do it? I'd guess some kind of socket is needed to support the interaction with the chatbot, but I'm lost on how to serve the Python code to Node.js, or directly over the Internet.
I think building a FastAPI API and just waiting for input to a call like response = chat_engine.chat("What did Paul Graham do in the summer of 1995?") could do the trick. Then it's just a matter of making the request from the React frontend, with a Node.js server as middleware.
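Something like this minimal sketch, I imagine. It assumes a recent llama-index (imports under llama_index.core), documents sitting in a ./data folder, and an OPENAI_API_KEY in the environment; the /chat route name is just a placeholder:

```python
from fastapi import FastAPI
from pydantic import BaseModel
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Build the index and chat engine once at startup
# (assumption: documents live in ./data)
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
chat_engine = index.as_chat_engine()

app = FastAPI()

class ChatRequest(BaseModel):
    message: str

@app.post("/chat")
def chat(req: ChatRequest) -> dict:
    # Same call as in the REPL, just behind an HTTP endpoint
    response = chat_engine.chat(req.message)
    return {"response": str(response)}
```

Run it with uvicorn main:app --host 0.0.0.0 --port 8000, and the Node.js middleware (or React directly) can POST {"message": "..."} to /chat.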
I already have a Skaffold project to deploy Node.js subprojects into a k3s cluster, and authentication is already handled in Node.js. So I could run FastAPI on another, more powerful machine to do the inference for llamaindex. But in principle, yes, I could just query FastAPI directly from React. In fact, there's not much gain in putting Node.js in K8s for scalability, since the bottleneck will be FastAPI on a single machine.
Depending on your end goals, I would consolidate onto one server. Either keep the Node server and use the llama-index TS library (you can still build your index in Python ad hoc), or go the other way: kill the Node server and replace it with a Python server (sketched below). Get to the point of needing to scale, rather than scaling from the start. As a nice side effect, you'll save some $
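To make the "replace it with a Python server" option concrete, here's a rough sketch, not a definitive setup: one FastAPI process serving both the chat endpoint and the compiled React bundle. The build directory and the /api/chat route are assumptions on my part:

```python
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Same startup as the earlier sketch (assumption: documents in ./data)
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
chat_engine = index.as_chat_engine()

app = FastAPI()

class ChatRequest(BaseModel):
    message: str

@app.post("/api/chat")
def chat(req: ChatRequest) -> dict:
    response = chat_engine.chat(req.message)
    return {"response": str(response)}

# Mounted last so /api/chat takes precedence; "build" is create-react-app's
# default output dir (assumption -- adjust for your bundler)
app.mount("/", StaticFiles(directory="build", html=True), name="frontend")
```

Serving the frontend from the same origin also sidesteps CORS, though the auth you currently have in Node would need to be ported over (e.g. as a FastAPI dependency).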
OK, so you mean one option is to just use FastAPI as the server, and the other is to use Node.js and put the llamaindex calls in the same codebase using the TS llamaindex library?