Hey guys,
I'm using Google Cloud Run to deploy my RAG app. I'm finding the app quite slow; it sometimes takes 10 seconds to execute the code.
It seems to be related to index storage. Locally, I save the index storage in a folder, along with my .txt files.
What are people doing out there in a real production app?
Here is my Dockerfile. I define my VOLUMEs, but I don't think this is the best approach.
# Use the official Python 3.11 image as the base image
FROM --platform=linux/amd64 python:3.11
# Set the working directory in the container
WORKDIR /code
VOLUME /code/data
VOLUME /code/storage
# Copy the requirements.txt file into the container at /code
COPY requirements.txt .
# Install any needed dependencies specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of the application code into the container at /code
COPY . .
# Specify the command to run your application
# CMD [ "python", "app.py" ]
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "9090"]
# CMD ["uvicorn", "app.main:app", "--reload"]
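To check whether index loading is really the slow part, here is a minimal sketch of what could be tried: time the load, and cache the result per process so that only the first request in a container pays for reading from disk. This is not my actual app code. The names `load_index` and `timed_load` are hypothetical, and the sketch assumes the storage folder holds plain JSON files, which may not match what the index library actually writes.

```python
import json
import time
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=1)
def load_index(storage_dir: str) -> dict:
    """Read every JSON file in storage_dir once and keep the result in memory.

    Assumption: the persisted index is a set of *.json files (hypothetical layout).
    """
    index = {}
    for path in sorted(Path(storage_dir).glob("*.json")):
        index[path.name] = json.loads(path.read_text(encoding="utf-8"))
    return index


def timed_load(storage_dir: str) -> tuple[dict, float]:
    """Return the index and the number of seconds the call took."""
    start = time.perf_counter()
    index = load_index(storage_dir)
    return index, time.perf_counter() - start
```

Calling `timed_load("/code/storage")` on the first and on a later request and logging the elapsed time should show whether the delay comes from reading the index or from somewhere else, such as a cold container start.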