Byron
Joined September 25, 2024
Hi - I'm having trouble getting refresh() to work; it creates a new document every time. Is there an error in here? Thank you!
Plain Text
db_documents = db.load_data(query=query)
for document in db_documents:
    document.doc_id = VERSION_NUMBER + "_" + "string"

vector_store = PGVectorStore.from_params(
    database="postgres",
    host=HOSTNAME,
    password=PASS,
    port=5432,
    user=USER,
    table_name=TABLE,
    embed_dim=1536,
    hybrid_search=True,
)

index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

refreshed_docs = index.refresh(db_documents)
index.storage_context.persist()

(there's only one row loaded from the database currently)
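One thing worth noting in the snippet above: `VERSION_NUMBER + "_" + "string"` assigns the *same* literal doc_id to every document, and refresh() deduplicates by doc_id, so ids need to be both stable across runs and unique per row. A minimal sketch of one way to build such ids (a hypothetical helper using stdlib hashing, not part of llama-index):

```python
import hashlib

def stable_doc_id(version: str, row_text: str) -> str:
    """Build a deterministic doc_id from a version prefix and row content.

    The same row text always yields the same id, so a dedup/refresh step
    can recognize the row across runs, while different rows get distinct ids.
    (Hypothetical helper for illustration only.)
    """
    digest = hashlib.sha256(row_text.encode("utf-8")).hexdigest()[:16]
    return f"{version}_{digest}"
```

With this, `stable_doc_id("v2", row)` is identical on every load of the same row, which is the property refresh-style deduplication relies on.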
13 comments
Sorry for all the questions, hopefully useful for others!

I also tried using Supabase instead of Postgres, but when I run
Plain Text
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
index.refresh_ref_docs(documents)

I get the error:
Plain Text
  File "/...../supabase.py", line 117, in delete
    raise NotImplementedError("Delete not yet implemented for vecs.")


Is it possible that vecs cannot support index-refreshing because the library doesn't support "delete" yet?
https://supabase.github.io/vecs/api/
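That reading of the traceback seems plausible: a refresh step typically has to delete the stale nodes for a changed doc_id before re-inserting, so a store whose delete raises NotImplementedError can't complete it. A toy sketch (all names hypothetical, not the real vecs or llama-index APIs) of why the failure surfaces only on changed documents:

```python
class VecsLikeStore:
    """Toy stand-in for a vector store whose delete is not implemented."""

    def __init__(self):
        self.rows = {}

    def insert(self, doc_id, text):
        self.rows[doc_id] = text

    def delete(self, doc_id):
        raise NotImplementedError("Delete not yet implemented for vecs.")


def refresh(store, doc_id, text):
    """Sketch of a refresh step: drop stale rows for a changed doc_id, then re-insert."""
    if doc_id in store.rows and store.rows[doc_id] != text:
        store.delete(doc_id)  # raises on stores without delete support
    store.insert(doc_id, text)
```

New and unchanged documents go through fine; only an updated document hits the delete path and blows up, which matches a refresh that works until something changes.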
2 comments
Hi - I'm getting this error when using Postgres as the vector_store (it works as DatabaseReader):
"sqlalchemy.exc.InvalidRequestError: Attribute name 'metadata' is reserved when using the Declarative API."

Plain Text
vector_store = PGVectorStore.from_params(
            database="postgres",
            host="",
            password="",
            port=5432,
            user="",
            table_name="mytable"
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

any ideas? thank you!
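For context on the error itself: SQLAlchemy's Declarative API reserves `metadata` as a class-level attribute (it holds the table's `MetaData` collection), so any mapped attribute or column with that name raises exactly this `InvalidRequestError`. A small pre-flight check along these lines can fail fast with a clearer message (the helper and the reserved-name set are illustrative assumptions, not a SQLAlchemy API):

```python
# Assumption: "metadata" is the reserved Declarative attribute we care about here.
RESERVED_DECLARATIVE_NAMES = {"metadata"}

def check_column_names(columns):
    """Raise early if a column would collide with a reserved Declarative attribute."""
    clashes = RESERVED_DECLARATIVE_NAMES.intersection(columns)
    if clashes:
        raise ValueError(
            f"Rename columns that clash with Declarative attributes: {sorted(clashes)}"
        )
    return columns
```

So if the table (or the document fields being mapped) includes a `metadata` column, renaming it on the way in is the usual workaround.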
9 comments
Hi - I'm trying to set a custom "doc_id" for each record from my database load, so that I can easily identify rows and hopefully not duplicate them later.

Any idea why the "doc_id"s are still coming out as the long default IDs?
Plain Text
documents = db.load_data(query=query)

for document in documents:
    # split the text by comma and take the first value
    first_value = document.get_text().split(',')[0]
    print(first_value) #this does work
    # assign the first_value as the doc_id of the document
    document.id_ = first_value

print(documents)
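A side note on the extraction step: a bare `split(',')[0]` breaks if the first field itself contains a quoted comma. If the rows are CSV-shaped, the stdlib `csv` module handles quoting correctly; a minimal sketch (hypothetical helper, assuming comma-delimited row text):

```python
import csv
import io

def first_field(row_text: str) -> str:
    """Extract the first CSV field from one row of text, honoring quoting
    (unlike a bare split(','))."""
    return next(csv.reader(io.StringIO(row_text)))[0]
```

For example, `first_field('"Smith, Jane",42')` yields `Smith, Jane`, where `split(',')[0]` would return the mangled `"Smith`.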
4 comments
Hi - I'm having trouble reading from my SupabaseVectorStore. When I try to do this:
Plain Text
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
query_engine = index.as_query_engine()
response = query_engine.query("my query is here")

I get this error:
Plain Text
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedFunction) operator does not exist: extensions.vector <=> unknown
LINE 1: SELECT vecs.test1.id, vecs.test1.vec <=> '[-0.01009003072977...
                                             ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

[SQL: SELECT vecs.test1.id, vecs.test1.vec <=> %(vec_1)s AS anon_1, vecs.test1.metadata 
FROM vecs.test1 ORDER BY vecs.test1.vec <=> %(vec_1)s 
 LIMIT %(param_1)s]

Somehow it's not seeing the passed-in vector as a Postgres vector data type. It works if I manually paste in a vector and run this query, though.

any ideas? thank you!
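The HINT line points the same way as the manual test: a pasted literal like `'[...]'::vector` gets coerced, while a bound parameter arrives as `unknown`, so Postgres can't resolve `<=>` against it. One workaround (an assumption, not the library's fix) is adding an explicit `::vector` cast on the parameter; a sketch that rebuilds the SELECT from the traceback with that cast, keeping the `vecs.test1` / `vec` names from the error message:

```python
def knn_query(schema: str, table: str, vec_col: str) -> str:
    """Build the same ordered-by-distance SELECT as in the traceback,
    but with an explicit ::vector cast on the bound parameter so
    Postgres can resolve the <=> operator. (Illustrative sketch only.)"""
    target = f"{schema}.{table}"
    return (
        f"SELECT {target}.id, {target}.{vec_col} <=> %(vec_1)s::vector AS dist "
        f"FROM {target} "
        f"ORDER BY {target}.{vec_col} <=> %(vec_1)s::vector "
        f"LIMIT %(param_1)s"
    )
```

The query string keeps the psycopg2-style named parameters, so the embedding can still be passed as a bound value rather than interpolated.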
9 comments
Any best practice for scraping web pages? I assume we'd want to convert it to plain text, stripping out the HTML. Even so, a big portion of the text ends up being for the navbar, header, footer, etc.

Is there any more automated method vs. writing a custom parser (i.e. using BeautifulSoup) for each type of page?

🙏
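Short of a per-site parser, one common middle ground is skipping known boilerplate containers (`nav`, `header`, `footer`, and so on) while flattening everything else to text. A minimal stdlib sketch of that idea (the tag list is an assumption; real pages vary, and dedicated extraction libraries do this far more robustly):

```python
from html.parser import HTMLParser

# Assumption: these elements usually hold boilerplate rather than content.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}

class MainTextExtractor(HTMLParser):
    """Collect text that appears outside of boilerplate elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many skip-tags we are currently nested inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```

This obviously won't match a hand-tuned parser per site, but as a first pass it drops most of the navbar/footer noise before the text hits the index.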
4 comments