I am using pgvector as my vector database. When I load data into the vector store, it is not adding an index in pgvector. Below is my code. The data is there in the table, but there is no index created for this data in the table.
Plain Text
from sqlalchemy import make_url

# llama_index imports needed for this snippet (legacy single-package layout,
# matching the source file linked further down in this thread)
from llama_index import StorageContext, VectorStoreIndex
from llama_index.vector_stores import PGVectorStore

url = make_url(connection_string)
vector_store = PGVectorStore.from_params(
    database=db_name,
    host=url.host,
    password=url.password,
    port=url.port,
    user=url.username,
    table_name="testtable",
    embed_dim=1536,  # openai embedding dimension
)

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, show_progress=True
)
Sounds weird. Are you able to query the data?

Plain Text
nodes = index.as_retriever().retrieve("test")
print(len(nodes))
Yes Logan, I am able to query.
Plain Text
vector_db=# \d data_testtable
                                   Table "public.data_testtable”
  Column   |       Type        | Collation | Nullable |                    Default
-----------+-------------------+-----------+----------+------------------------------------------------
 id        | bigint            |           | not null | nextval('data_testtable_id_seq'::regclass)
 text      | character varying |           | not null |
 metadata_ | json              |           |          |
 node_id   | character varying |           |          |
 embedding | vector(1536)      |           |          |
Indexes:
    "data_testtable_pkey" PRIMARY KEY, btree (id)
Above is the table it created; as you can see, there is no index on the embedding column.
So it is not creating an IVFFlat index or an HNSW index; there is no index at all.
Does the code sample I gave above work? If so, then it must be created somehow, retrieving won't work if there's no index
You can check the source code yourself if you want. I've used this vector store before without issue though
https://github.com/run-llama/llama_index/blob/1dc5cd595717925b82a15a99622f3d260f4d2bb2/llama_index/vector_stores/postgres.py#L284
Plain Text
root@ip-172-31-2-6:/home/ubuntu/pdftest# python3 querydocument.py
2 
The code you gave is working fine; it shows 2 as the output for the number of nodes.
But when I check the table definition in pgvector, it is not showing an index on the embedding column for some reason.
You can test it using
Plain Text
\d tablename
from your side
ngl I know barely anything about pgvector lol but I do know I've used it in the past and it seems to work fine.

I recommend exploring the source code I linked above and seeing if you find anything odd about the way tables are being set up.
Plain Text
select indexname from pg_indexes where tablename='<tablename>';
I also tried the above; it is only creating an index on the id column.
Probably a bug that I discovered. Will do some research and get back to you.
This will adversely impact query performance if we are not using pgvector's HNSW index.
recent versions of pgvector support HNSW
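In the meantime, one possible workaround (not from this thread) is to create the index by hand. The sketch below assumes pgvector >= 0.5.0, cosine distance, and the data_testtable table name shown above, and runs the pgvector CREATE INDEX statement over the same SQLAlchemy connection string used earlier.
Plain Text
# Workaround sketch (assumptions: pgvector >= 0.5.0, cosine distance,
# and the "data_testtable" table created earlier in this thread).
from sqlalchemy import create_engine, text

engine = create_engine(connection_string)  # same SQLAlchemy URL as above
with engine.begin() as conn:
    conn.execute(text(
        "CREATE INDEX IF NOT EXISTS data_testtable_embedding_idx "
        "ON data_testtable USING hnsw (embedding vector_cosine_ops)"
    ))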
I remember reading about that. If you can fix/improve the table creation then, that would be awesome! ❤️

Happy to review/merge any PRs you make
Sure Logan. Will do some research and get back to you. Thanks for the quick response.
Logan, while I have your attention: how can we add a PDF URL that routes to a specific page in the metadata, for a PDF chatbot?
Currently we have the below metadata by default:
Plain Text
{"page_label": "6", "file_name": "testone.pdf
We want to have the exact PDF section URL in the retrieved result, so that the end user can click that URL if they need to confirm the source: a citation with a specific page URL.
If you want the URL, you can manually attach it to the document/node

document.metadata["url"] = "..."

Then you can get the nodes back from the response object and check the metadata

response.source_nodes[0].node.metadata
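Something like the sketch below, where the reader usage, directory path, and URL value are placeholders rather than anything from this thread:
Plain Text
# Sketch: attach a URL to each document's metadata before building the index,
# then read it back off the retrieved nodes. Directory and URL are placeholders.
from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader("./pdfs").load_data()
for document in documents:
    document.metadata["url"] = "https://example.com/docs/testone.pdf"

# ... build the index as above, then query ...
response = index.as_query_engine().query("test")
print(response.source_nodes[0].node.metadata["url"])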
OK, but if a PDF has 100 pages, how can we associate the specific page URL with the retrieved chunk automatically?
For example, if the retrieved chunk belongs to the 52nd page of the PDF, we need to give a URL pointing to the 52nd page.
Is there a way to do this automatically (using llamaindex)?
I mean, you have the page number and the file name/overall URL, isn't that enough? I might be missing the point here though.
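For example, a page-specific link can be built from that metadata after retrieval. This sketch assumes the PDFs are hosted under some base URL (a placeholder here) and uses the standard #page= PDF open parameter that most browsers honor:
Plain Text
# Sketch: turn the default "file_name" and "page_label" metadata into a link
# that opens the PDF at the retrieved page. base_url is a placeholder.
base_url = "https://example.com/docs"

response = index.as_query_engine().query("test")
for source in response.source_nodes:
    meta = source.node.metadata
    page_url = f"{base_url}/{meta['file_name']}#page={meta['page_label']}"
    print(page_url)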
Yeah, I think that should be enough. Thanks Logan.