is there anyone doing a similar thing?
I've seen a few people try to index/query a code base.

It seems to take a lot of work. You have to be very careful how you chunk documents (you don't want to cut off functions; see the sketch below) and design custom prompts.

It can work, but I hope llama index has better support for this in the future.
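For illustration, here is one way to chunk Python source at function boundaries so a definition is never cut in half. This is not from the thread: chunk_python_source is a hypothetical helper built only on the standard library ast module.

from typing import List
import ast

def chunk_python_source(source: str) -> List[str]:
    """Return one chunk per top-level def/class so no definition is cut mid-way."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks: List[str] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno/end_lineno are 1-based (end_lineno needs Python 3.8+);
            # decorators above the def are skipped in this simple version
            chunks.append("\n".join(lines[node.lineno - 1 : node.end_lineno]))
    return chunks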
Thanks @Logan M, do you have any references for it?
I tried to split the code base into code snippets and then ask OpenAI. It almost works the way I want, but maintaining the "code base to code snippets" step doesn't seem easy.
Mmm, not really any reference for this yet lol

I would try using a small test set of data.

Split it into snippets like you're already doing, create a Document object for each snippet, and insert them into a vector or list index.

Then you can try querying and see if anything else needs adjusting.
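For reference, a minimal sketch of those steps against the same generation of the llama_index API used later in this thread. The snippet texts and doc ids are placeholders, and names have changed in newer releases (GPTVectorStoreIndex, index.as_query_engine().query(...)), so treat this as an illustration rather than the exact API.

from llama_index import Document, GPTSimpleVectorIndex

# placeholder snippets; in practice these come from your own splitter
snippets = ["def foo():\n    return 1", "def bar():\n    return foo() + 1"]

# one Document per snippet; an explicit doc_id makes later updates easier
documents = [Document(s, doc_id=f"snippet-{i}") for i, s in enumerate(snippets)]

index = GPTSimpleVectorIndex.from_documents(documents)
response = index.query("Where is foo defined?")   # newer versions: index.as_query_engine().query(...)
print(response)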
I'm not quite sure how to do something similar with llama-index.
Try checking out the llama index docs and notebooks, lots of good examples

Good starting point in the docs:
https://gpt-index.readthedocs.io/en/latest/guides/primer/usage_pattern.html

A good starting notebook (there are like 100 in the repo LOL)
https://github.com/jerryjliu/llama_index/blob/main/examples/vector_indices/SimpleIndexDemo-ChatGPT.ipynb
Thanks a lot
Hello @Logan M, I tried the guide you shared, but I got raise ValueError("Reference doc id is None."). Any idea what causes this? Here's my code; codeSnippet.source is the code snippet that I split myself.
Plain Text
def indexCodeSnippets(self, codeSnippets: List[CodeSnippet]):
    nextDocId = 1
    previousNode: Node = None
    nodes: List[Node] = []

    # build one Node per snippet and link neighbours via next/previous relationships
    for codeSnippet in codeSnippets:
        node = Node(text=codeSnippet.source, doc_id=str(nextDocId))
        nodes.append(node)

        if previousNode is not None:
            previousNode.relationships[DocumentRelationship.NEXT] = node.get_doc_id()
            node.relationships[DocumentRelationship.PREVIOUS] = previousNode.get_doc_id()

        previousNode = node
        nextDocId += 1

    # define LLM
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003"))

    # define prompt helper
    # set maximum input size
    max_input_size = 4096
    # set number of output tokens
    num_output = 10000
    # set maximum chunk overlap
    max_chunk_overlap = 20
    prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

    service_context = ServiceContext.from_defaults(
        llm_predictor=llm_predictor, prompt_helper=prompt_helper
    )
    self.__gptSimpleVectorIndex = GPTSimpleVectorIndex(nodes=nodes, service_context=service_context)
What's the full stack trace?
@Logan M here's the full stacktrace
Try setting ref_doc_id="my id" or similar when creating the node objects
This is used to track which document the node came from.
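For illustration, a sketch of that fix against the Node API from the snippet above (same imports assumed). Depending on the llama_index version, ref_doc_id may be a read-only property derived from the SOURCE relationship rather than a constructor argument, in which case setting the relationship has the same effect; "my-codebase" is just a placeholder id.

node = Node(
    text=codeSnippet.source,
    doc_id=str(nextDocId),
    # backs node.ref_doc_id, which the vector store checks on insert
    relationships={DocumentRelationship.SOURCE: "my-codebase"},
)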