Find answers from the community

idontneedonetho
Offline, last seen 3 months ago
Joined September 25, 2024

Sql

Anyone have any idea why my tools would be failing? I'm using a locally hosted Postgres database running via Docker, connecting to it via the 0.0.0.0 IP address, and I'm getting SQL errors like:
Plain Text
=== Calling Function ===
Calling function: Wiki_Tool with args: {"input":"smoother braking"}
Got output: Error: (sqlalchemy.dialects.postgresql.asyncpg.InterfaceError) <class 'asyncpg.exceptions._base.InterfaceError'>: cannot perform operation: another operation is in progress
[SQL: SELECT public.data_wiki_docs.id, public.data_wiki_docs.node_id, public.data_wiki_docs.text, public.data_wiki_docs.metadata_, public.data_wiki_docs.embedding <=> $1 AS distance
FROM public.data_wiki_docs ORDER BY distance asc
LIMIT $2::INTEGER]
[parameters: ('[-0.04321499168872833,-0.008070970885455608,0.038734838366508484,0.034068379551172256,-0.03698354959487915,0.08398095518350601,-0.007821937091648579, ... (7816 characters truncated) ... -0.027220504358410835,-0.04467884078621864,0.007395491935312748,-0.04819626361131668,0.009278454817831516,0.012993157841265202,-0.007883192971348763]', 5)]
(Background on this error at: https://sqlalche.me/e/20/rvf5)
========================
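
This error usually means a single asyncpg connection is being asked to run two operations at once, which can happen when several async tool calls end up sharing one connection. Below is a minimal standalone sketch of the pattern that avoids it (plain SQLAlchemy, placeholder DSN and queries); it illustrates the likely cause rather than a drop-in fix for the vector store's own connection handling.
Plain Text
# Sketch: give each concurrent task its own pooled session so no two coroutines
# share one asyncpg connection. The DSN below is a placeholder.
import asyncio
from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine, async_sessionmaker

engine = create_async_engine(
    "postgresql+asyncpg://user:password@0.0.0.0:5432/vectordb",
    pool_size=5,
)
SessionMaker = async_sessionmaker(engine, expire_on_commit=False)

async def run_query(query: str):
    # A fresh session per task checks out its own connection from the pool.
    async with SessionMaker() as session:
        result = await session.execute(text(query))
        return result.fetchall()

async def main():
    # Concurrent queries are safe because they never share a connection.
    await asyncio.gather(run_query("SELECT 1"), run_query("SELECT 2"))

asyncio.run(main())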
29 comments
Anyone else getting lots of 500 server errors when using an OpenAIAgent?
7 comments
How would I speed up the part between the "Generating embeddings" sections? Right now it can take up to 15 min before the next set of embeddings is generated, which is making the whole process take up to 48 hours. This is using a hybrid Qdrant vector store setup. I'm on an SSD, btw.
Plain Text
# Imports assumed for the llama-index 0.10+ package layout this snippet appears to use
import torch
import qdrant_client
from llama_index.core import Settings, SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.qdrant import QdrantVectorStore

device = "cuda" if torch.cuda.is_available() else "cpu"
print("GPU available:", torch.cuda.is_available())
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5", device=device)
#Settings.chunk_size = 512
qdrantclient = qdrant_client.QdrantClient(path="./qdrant_db")

'''DISCORD DATA'''
print("Loading local files...")
dir_path = 'DiscordDocs'
reader = SimpleDirectoryReader(input_dir=dir_path, required_exts=[".txt"])
discord_docs = reader.load_data()

print("Local files loaded successfully. Setting up vector store for Discord data...")
discord_vector_store = QdrantVectorStore(client=qdrantclient, enable_hybrid=True, batch_size=20, collection_name="discord-data")
discord_storage_context = StorageContext.from_defaults(vector_store=discord_vector_store)

discord_index = VectorStoreIndex.from_documents(discord_docs, storage_context=discord_storage_context, show_progress=True)
print("Discord data setup complete.")

Plain Text
GPU available: True
Loading local files...
Local files loaded successfully. Setting up vector store for Discord data...
Fetching 5 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<?, ?it/s]
Fetching 5 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<?, ?it/s]
Parsing nodes: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 111/111 [03:59<00:00,  2.16s/it]
Generating embeddings: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2048/2048 [00:15<00:00, 131.86it/s]
Generating embeddings: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2048/2048 [00:13<00:00, 151.84it/s]
(I'm still generating embeddings right now)
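
A rough sketch of two knobs that may help, reusing the names and imports from the snippet above. The dense passes themselves look fast (about 15 s per 2048 nodes in the log), so the long gaps are more likely the per-batch upserts plus the CPU-side sparse encoding that enable_hybrid adds; the batch sizes below are guesses to experiment with, not recommendations.
Plain Text
# embed_batch_size controls how many nodes go to the GPU per dense-embedding call;
# the default is small, so raising it usually shortens the embedding passes.
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    device=device,
    embed_batch_size=64,
)

# Larger upsert batches mean fewer round trips into the on-disk Qdrant store.
discord_vector_store = QdrantVectorStore(
    client=qdrantclient,
    enable_hybrid=True,
    batch_size=64,
    collection_name="discord-data",
)
discord_storage_context = StorageContext.from_defaults(vector_store=discord_vector_store)

# insert_batch_size (default 2048, which matches the progress bars above) sets how
# many nodes are embedded and inserted per pass.
discord_index = VectorStoreIndex.from_documents(
    discord_docs,
    storage_context=discord_storage_context,
    insert_batch_size=8192,
    show_progress=True,
)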
41 comments
Trying to use Gemini with my reply chain function, which works with GPT, but Gemini keeps spitting out "An error occurred: <MessageRole.MODEL: 'model'>".
Plain Text
async def fetch_reply_chain(message, max_tokens=4096):
    context = []
    tokens_used = 0
    current_prompt_tokens = len(message.content) // 4
    max_tokens -= current_prompt_tokens
    while message.reference is not None and tokens_used < max_tokens:
        try:
            message = await message.channel.fetch_message(message.reference.message_id)
            role = Role.MODEL if message.author.bot else Role.USER
            message_content = f"{message.content}\n"
            message_tokens = len(message_content) // 4
            if tokens_used + message_tokens <= max_tokens:
                context.append(HistoryChatMessage(message_content, role))
                tokens_used += message_tokens
            else:
                break
        except Exception as e:
            print(f"Error fetching reply chain message: {e}")
            break
    return context[::-1]

I am trying to set custom chat history via:
Plain Text
memory = ChatMemoryBuffer.from_defaults(token_limit=8192)
                context = await fetch_reply_chain(message)
                memory.set(context + [HistoryChatMessage(f"{content}", Role.USER)])
                chat_engine = index.as_chat_engine(
                    chat_mode="condense_plus_context",
                    similarity_top_k=2,
                    sparse_top_k=12,
                    vector_store_query_mode="hybrid",
                    memory=memory,
                    -
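
Not a confirmed fix, but one guess: the Gemini chat path may only know how to translate USER and ASSISTANT messages into Gemini's "user"/"model" roles, so a history entry tagged MessageRole.MODEL trips it up. The minimal sketch below tags bot replies as ASSISTANT instead; the import path assumes the 0.10+ layout, and to_history_message is a hypothetical helper standing in for however HistoryChatMessage/Role are aliased in the original code.
Plain Text
from llama_index.core.llms import ChatMessage, MessageRole

def to_history_message(content: str, is_bot: bool) -> ChatMessage:
    # Assumption: the Gemini integration maps ASSISTANT -> "model" itself, so the
    # history should use ASSISTANT rather than MessageRole.MODEL for bot replies.
    role = MessageRole.ASSISTANT if is_bot else MessageRole.USER
    return ChatMessage(content=content, role=role)

# e.g. inside fetch_reply_chain:
#   context.append(to_history_message(message_content, message.author.bot))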
20 comments
I'm playing around with the WholeSiteReader and I was wondering, since I can't find anything in the code, if anyone knows a way to filter out parts of a site. The code doesn't seem to show anything for filters, but I'm hoping someone knows a way to add a filter through other means.
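
One workaround is to filter after loading, since the reader just returns Document objects. A rough sketch, with several assumptions flagged inline: the example.com URLs and exclude pattern are placeholders, and whether each Document exposes the crawled URL via metadata or its id should be checked against what the reader actually returns.
Plain Text
import re
from llama_index.readers.web import WholeSiteReader

reader = WholeSiteReader(prefix="https://example.com/docs", max_depth=2)
documents = reader.load_data(base_url="https://example.com/docs")

EXCLUDE = re.compile(r"/(changelog|blog)/")  # placeholder patterns to drop

filtered = []
for doc in documents:
    # Assumption: the page URL is available on the document; adjust the lookup
    # to whatever your documents actually carry.
    url = str(doc.metadata.get("URL") or doc.id_)
    if EXCLUDE.search(url):
        continue  # skip whole pages you don't want indexed
    # Optionally strip repeated boilerplate text before indexing.
    doc.text = re.sub(r"Skip to main content", "", doc.text)
    filtered.append(doc)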
27 comments
For some reason, I cannot get low-level chat engines to work anymore. I tried CondensePlusContextChatEngine and CondenseQuestionChatEngine; neither one retrieves info. I made sure to try setting the retriever and query_engine for both. I know it's getting the prompt and memory, but it's not searching the info.
Plain Text
client = QdrantClient(os.getenv('QDRANT_URL'), api_key=os.getenv('QDRANT_API'))
vector_store = QdrantVectorStore(client=client, collection_name="openpilot-data")
Settings.llm = OpenAI(model="gpt-4-turbo-preview", max_tokens=1000)
embed_model = OpenAIEmbedding(model="text-embedding-3-small")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)

async def process_message_with_llm(message, client):
    content = message.content.replace(client.user.mention, '').strip()
    if content:
        try:
            async with message.channel.typing():
                memory = ChatMemoryBuffer.from_defaults(token_limit=8192)
                context = await fetch_context_and_content(message, client, content)
                memory.set(context + [HistoryChatMessage(f"{content}", Role.USER)])
                chat_engine = CondensePlusContextChatEngine.from_defaults(
                    retriever=index.as_retriever(),
                    memory=memory,
                    context_prompt=(
                        "prompt"
                    )
                )
                chat_response = await asyncio.to_thread(chat_engine.chat, content)
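
A small debug sketch to narrow this down, reusing the index defined above: call the retriever directly, outside any chat engine. If this comes back empty, the problem is on the index/embedding side (for example the collection name, or the query embed model not matching what the data was indexed with) rather than the chat engine itself; the query string here is just an example.
Plain Text
retriever = index.as_retriever(similarity_top_k=5)
nodes = retriever.retrieve("example query expected to hit the data")
for n in nodes:
    # NodeWithScore wraps the retrieved node together with its similarity score
    print(round(n.score or 0.0, 3), n.node.get_content()[:120])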
1 comment
Getting this error when trying to use the GithubRepositoryReader:
Plain Text
GithubClient.get_branch() got an unexpected keyword argument 'timeout'

I just checked the bug reports for GithubRepositoryReader and saw FilterType was added back, so I updated and now I'm getting this error.
7 comments

Weird one, more a help request than an issue. I've noticed that while running LlamaIndex, RAM usage is pretty noticeable; I understand there are reasons for this. But I'm wondering if there's a way to use LlamaIndex to just query data without holding it all in RAM, maybe via SQL? I looked at the SQL docs and it seems possible, but I wanted to ask opinions on whether that's the best route or if there's something better. I'm trying to host this on a server with 1 GB of free RAM, so I have a hard limit.
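
A rough sketch of the "keep the data in Postgres" route. Connection details and the table name are placeholders, the import paths assume the 0.10+ packages, and note that whatever embedding model and LLM you configure still have to run somewhere, which matters on a 1 GB box.
Plain Text
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.postgres import PGVectorStore

vector_store = PGVectorStore.from_params(
    database="vectordb",
    host="localhost",
    port="5432",
    user="postgres",
    password="password",
    table_name="docs",
    embed_dim=384,  # must match the embedding model used to build the table
)

# from_vector_store builds the index on top of the existing table, so the vectors
# stay in Postgres instead of being loaded into the process's memory.
index = VectorStoreIndex.from_vector_store(vector_store)
query_engine = index.as_query_engine(similarity_top_k=3)
print(query_engine.query("example question"))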
24 comments
Hello, I'm trying to use the latest gpt-4-turbo-preview, but it's not showing as an option; there is also no option for gpt-4-0125-preview. Is there a way around this, or are we stuck with gpt-4-0613-preview?
11 comments
Back again, still going down the local model only path using llama-cpp-python. Getting this same error:
Plain Text
ValueError:
******
Could not load OpenAI model. If you intended to use OpenAI, please check your OPENAI_API_KEY.
Original error:
No API key found for OpenAI.
Please set either the OPENAI_API_KEY environment variable or openai.api_key prior to initialization.
API keys can be found or created at https://platform.openai.com/account/api-keys

To disable the LLM entirely, set llm=None.
******

This time, though, I'm trying to introduce a multi-step query:
Plain Text
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)

# Index setup
PERSIST_DIR = "storage-data"
if not os.path.exists(PERSIST_DIR):
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents, service_context=service_context)
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context, service_context=service_context)

query_engine = index.as_query_engine(response_mode="compact_accumulate")

# Multi-step query engine setup
step_decompose_transform = StepDecomposeQueryTransform(llm=llm, verbose=True)
multi_step_query_engine = MultiStepQueryEngine(
    query_engine=query_engine,
    query_transform=step_decompose_transform,
    index_summary="Index summary for context"
)

@app.get("/", response_class=HTMLResponse)
async def get_form(request: Request):
    return templates.TemplateResponse("index.html", {"request": request})

@app.post("/query")
async def query(user_input: str = Form(...)):
    response = multi_step_query_engine.query(user_input)
    response_text = str(response)
    return {"response": response_text}

I tried doing step_decompose_transform = StepDecomposeQueryTransform(service_context=service_context), but that gave me an error about not expecting that argument.
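
One guess at the cause: a sub-component (for example the multi-step engine's internal response synthesizer) is built without your service_context and falls back to the default OpenAI LLM, which then fails on the missing API key. A minimal sketch of setting the service context globally with the legacy 0.9.x API, reusing the llm and embed_model already defined above, so stray components pick up the local models:
Plain Text
from llama_index import ServiceContext, set_global_service_context

service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
set_global_service_context(service_context)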
3 comments
I tried asking KapaGPT (https://discord.com/channels/1059199217496772688/1194708617564270704) for help, and it told me to reach out to the maintainers. So: I'm getting this error when trying to use local LLMs to load an index. The index's persist directory has been deleted and then remade using the local models, but when I try to run the exact same code again to access the index, I get this:
Plain Text
Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\llms\utils.py", line 29, in resolve_llm
    validate_openai_api_key(llm.api_key)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\llms\openai_utils.py", line 379, in validate_openai_api_key
    raise ValueError(MISSING_API_KEY_ERROR_MESSAGE)
ValueError: No API key found for OpenAI.
Please set either the OPENAI_API_KEY environment variable or openai.api_key prior to initialization.
API keys can be found or created at https://platform.openai.com/account/api-keys

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "s:\local-indexer\flask_server.py", line 53, in <module>
    index = load_index_from_storage(storage_context)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\indices\loading.py", line 33, in load_index_from_storage
    indices = load_indices_from_storage(storage_context, index_ids=index_ids, **kwargs)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\indices\loading.py", line 78, in load_indices_from_storage
    index = index_cls(
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\indices\vector_store\base.py", line 52, in __init__
    super().__init__(
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\indices\base.py", line 62, in __init__
    self._service_context = service_context or ServiceContext.from_defaults()
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\service_context.py", line 178, in from_defaults
    llm_predictor = llm_predictor or LLMPredictor(
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\llm_predictor\base.py", line 109, in __init__
    self._llm = resolve_llm(llm)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\llms\utils.py", line 31, in resolve_llm
    raise ValueError(
ValueError:
******
Could not load OpenAI model. If you intended to use OpenAI, please check your OPENAI_API_KEY.
Original error:
No API key found for OpenAI.
Please set either the OPENAI_API_KEY environment variable or openai.api_key prior to initialization.
API keys can be found or created at https://platform.openai.com/account/api-keys

To disable the LLM entirely, set llm=None.
******

I followed the tutorials on the docs for all of this:
Plain Text
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
set_global_tokenizer(
    AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf").encode
)
model_url = "{url}"
llm = LlamaCPP(
    model_url=model_url,
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": 41},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
)
PERSIST_DIR = "storage-data"
if not os.path.exists(PERSIST_DIR):
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents, service_context=service_context)
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)
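
Based on the traceback above, load_index_from_storage() builds a default ServiceContext (and therefore the default OpenAI LLM) when none is passed, which is exactly where the missing-key error is raised. Passing the local service_context through in the else branch should avoid that fallback:
Plain Text
index = load_index_from_storage(storage_context, service_context=service_context)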
7 comments
Hello, I've been trying to find an answer in the docs, but I'm not very well versed in this stuff yet. Would I be able to skip OpenAI altogether and use Google's gemini-pro and embedding models for everything?
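
A sketch of what an all-Google setup could look like, using legacy 0.9.x-style imports to match the surrounding code; the environment variable and default model choices are assumptions to check against the Gemini docs.
Plain Text
import os
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms import Gemini
from llama_index.embeddings import GeminiEmbedding

os.environ["GOOGLE_API_KEY"] = "your-key-here"  # placeholder

llm = Gemini()                   # defaults to a gemini-pro model
embed_model = GeminiEmbedding()  # defaults to Google's embedding model

service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
print(index.as_query_engine().query("example question"))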
2 comments
Been searching for fast DB methods and found HyperDB. I searched around on Discord and the web to see if it was compatible with llama-index; the only things I found were Twitter posts from a while ago. Any update on compatibility being added, or should we be able to build it with the current tool sets we're given?
2 comments
I'm trying to figure out Docker, but when I'm setting up a new copy of my stuff (through a venv, not Docker yet), it throws this error:
Plain Text
D:\Documents\GitHub\DockerTest\Scripts\python.exe D:\Documents\GitHub\DockerTest\core.py 
D:\Documents\GitHub\DockerTest\Lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Traceback (most recent call last):
  File "D:\Documents\GitHub\DockerTest\core.py", line 4, in <module>
    from modules.utils.GPT import process_message_with_llm
  File "D:\Documents\GitHub\DockerTest\modules\utils\GPT.py", line 26, in <module>
    Settings.embed_model = HuggingFaceEmbedding(model_name="avsolatorio/NoInstruct-small-Embedding-v0")
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Documents\GitHub\DockerTest\Lib\site-packages\llama_index\embeddings\huggingface\base.py", line 86, in __init__
    self._model = SentenceTransformer(
                  ^^^^^^^^^^^^^^^^^^^^
  File "D:\Documents\GitHub\DockerTest\Lib\site-packages\sentence_transformers\SentenceTransformer.py", line 197, in __init__
    modules = self._load_sbert_model(
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Documents\GitHub\DockerTest\Lib\site-packages\sentence_transformers\SentenceTransformer.py", line 1309, in _load_sbert_model
    module = module_class.load(module_path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Documents\GitHub\DockerTest\Lib\site-packages\sentence_transformers\models\Pooling.py", line 230, in load
    return Pooling(**config)
           ^^^^^^^^^^^^^^^^^
TypeError: Pooling.__init__() got an unexpected keyword argument 'output_key'

Process finished with exit code 1

I'm assuming I may need to specify which version of SentenceTransformer I install?
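
That's a reasonable guess. The traceback suggests the installed sentence-transformers doesn't recognize a key ("output_key") in the model's saved pooling config, which usually points to a mismatch between the library version and the model files; pinning or upgrading sentence-transformers in the new venv is the first thing to try. A quick check of what the venv actually has:
Plain Text
import sentence_transformers
print(sentence_transformers.__version__)
# If it differs from the working environment, pin the same version there, e.g.:
#   pip install "sentence-transformers==<version from the working env>"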
6 comments
Updated to 0.10.1, uninstalled and reinstalled llama-index and llama-index-core, but when I do from llama_index.core import VectorStoreIndex it won't import VectorStoreIndex.
43 comments
Is there a way to turn off Gemini's safety filter when using it as the LLM for a chat engine?
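
One possible knob, sketched with several assumptions: the google-generativeai SDK accepts a safety_settings mapping, and the llama-index Gemini wrapper appears to accept one and pass it through, but treat the parameter name (and the 0.10-style import path) as something to verify against your installed version.
Plain Text
from google.generativeai.types import HarmBlockThreshold, HarmCategory
from llama_index.llms.gemini import Gemini

safety_settings = {
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
}

llm = Gemini(safety_settings=safety_settings)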
63 comments
Was following the https://docs.llamaindex.ai/en/stable/examples/query_engine/sub_question_query_engine.html tutorial and received this error output:
Plain Text
**********
Trace: query
    |_query ->  6.062075 seconds
      |_templating ->  0.0 seconds
      |_llm ->  6.062075 seconds
**********
Traceback (most recent call last):
  File "S:\Gemini-Coder\local-indexer\cmd_local_index_chat.py", line 83, in <module>
    respnose = query_engine.query(
  File "C:\Users\thecr\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\core\base_query_engine.py", line 40, in query
    return self._query(str_or_query_bundle)
  File "C:\Users\thecr\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\query_engine\sub_question_query_engine.py", line 129, in _query
    sub_questions = self._question_gen.generate(self._metadatas, query_bundle)
  File "C:\Users\thecr\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\question_gen\llm_generators.py", line 78, in generate
    parse = self._prompt.output_parser.parse(prediction)
  File "C:\Users\thecr\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\question_gen\output_parser.py", line 13, in parse
    raise ValueError(f"No valid JSON found in output: {output}")
ValueError: No valid JSON found in output:   Understood! I'll do my best to help you with your questions and provide relevant sub-questions based on the tools provided. Please go ahead and ask your user question, and I'll generate the list of sub-questions accordingly.

I am using a local embedding model and a local language model, but I kept everything else the same. I didn't read anything about linking a JSON file in that doc.
7 comments
Is it just me, or is the chat_engine weaker than the query_engine?
8 comments