I wonder if someone could help me with the following issue. I am using the GithubRepositoryReader to read Markdown files from my repository. When I run the code multiple times, each run creates another set of embeddings in my pgvector database. How can I get it to replace the existing embeddings for a particular file instead?
Here is the code:
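# Imports assume the llama-index >= 0.10 package layout.
# get_vector_store is a project helper defined elsewhere (not shown).
from llama_index.core.extractors import (
    KeywordExtractor,
    QuestionsAnsweredExtractor,
    SummaryExtractor,
    TitleExtractor,
)
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.core.schema import MetadataMode
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.readers.github import GithubClient, GithubRepositoryReader


def index_repository(org, repo, branch, use_wiki):
    # create a GH client
    gh_client = GithubClient()
    # create a GH reader that only picks up Markdown files
    reader = GithubRepositoryReader(
        github_client=gh_client,
        owner=org,
        repo=repo,
        verbose=False,
        retries=3,
        filter_file_extensions=(
            [".md"],
            GithubRepositoryReader.FilterType.INCLUDE,
        ),
    )
    # load the documents
    docs = reader.load_data(branch=branch)
    embed_model = OpenAIEmbedding()
    vector_store = get_vector_store(table_name="wiki_docs")
    # split into nodes, enrich metadata, then embed
    extractors = [
        SemanticSplitterNodeParser(
            buffer_size=1, breakpoint_percentile_threshold=90, embed_model=embed_model
        ),
        TitleExtractor(nodes=5),
        SummaryExtractor(summaries=["prev", "self", "next"]),
        QuestionsAnsweredExtractor(questions=15, metadata_mode=MetadataMode.EMBED),
        KeywordExtractor(keywords=10),
        embed_model,
    ]
    # run the pipeline; the resulting nodes are written to the vector store
    pipeline = IngestionPipeline(transformations=extractors, vector_store=vector_store)
    nodes = pipeline.run(documents=docs)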
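To make the behaviour I am after on re-runs concrete, here is a rough, library-independent sketch. It is not LlamaIndex code: `upsert_documents` and the `store` dict are hypothetical stand-ins for the pipeline and the pgvector table, and using the file path as the document ID is my own assumption. The idea is that a file whose content changed replaces its old entry, an unchanged file is skipped, and nothing is ever duplicated.

```python
import hashlib


def upsert_documents(store, docs):
    """Insert new docs, replace changed ones, skip unchanged ones.

    store: dict mapping doc_id -> (content_hash, text); a hypothetical
           stand-in for the vector table.
    docs:  iterable of (doc_id, text) pairs; doc_id is assumed to be
           the file path inside the repository.
    """
    counts = {"added": 0, "replaced": 0, "skipped": 0}
    for doc_id, text in docs:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        existing = store.get(doc_id)
        if existing is None:
            counts["added"] += 1
        elif existing[0] == digest:
            # same content as last run: leave the stored entry alone
            counts["skipped"] += 1
            continue
        else:
            counts["replaced"] += 1
        # overwrite (or create) the single entry for this file
        store[doc_id] = (digest, text)
    return counts


store = {}
first = upsert_documents(store, [("README.md", "v1"), ("docs/setup.md", "v1")])
second = upsert_documents(store, [("README.md", "v2"), ("docs/setup.md", "v1")])
```

After the second run the store still holds exactly two entries, one per file, which is what I would like to see in the `wiki_docs` table instead of a new set of rows per run.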
Part 2 of the question: how do I use the same reader to process the wiki documents attached to the repo?