is there anyone doing a similar thing?
I've seen a few people try to index/query a code base.

It seems to take a lot of work. You have to be very careful how you chunk documents (you don't want to cut off functions; see the sketch below) and design custom prompts.

It can work, but I hope llama index has better support for this in the future.
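For illustration, here is one way to chunk Python source at function boundaries so a definition is never cut in half. This is not from the thread: chunk_python_source is a hypothetical helper built only on the standard library ast module.

from typing import List
import ast

def chunk_python_source(source: str) -> List[str]:
    """Return one chunk per top-level def/class so no definition is cut mid-way."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks: List[str] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno/end_lineno are 1-based (end_lineno needs Python 3.8+);
            # decorators above the def are skipped in this simple version
            chunks.append("\n".join(lines[node.lineno - 1 : node.end_lineno]))
    return chunks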
Thanks @Logan M, do you have any references for it?
I tried to split the code base into code snippets and then ask OpenAI. It almost works the way I want, but maintaining the "code base to code snippets" step doesn't seem easy.
Mmm, not really any reference for this yet lol

I would try using a small test set of data.

Split it into snippets like you're already doing, create a Document object for each snippet, and insert them into a vector or list index.

Then you can try querying and see if anything else needs adjusting.
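For reference, a minimal sketch of those steps against the same generation of the llama_index API used later in this thread. The snippet texts and doc ids are placeholders, and names have changed in newer releases (GPTVectorStoreIndex, index.as_query_engine().query(...)), so treat this as an illustration rather than the exact API.

from llama_index import Document, GPTSimpleVectorIndex

# placeholder snippets; in practice these come from your own splitter
snippets = ["def foo():\n    return 1", "def bar():\n    return foo() + 1"]

# one Document per snippet; an explicit doc_id makes later updates easier
documents = [Document(s, doc_id=f"snippet-{i}") for i, s in enumerate(snippets)]

index = GPTSimpleVectorIndex.from_documents(documents)
response = index.query("Where is foo defined?")   # newer versions: index.as_query_engine().query(...)
print(response)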
I'm not quite sure how to do something similar with llama-index.
Try checking out the llama index docs and notebooks, lots of good examples

Good starting point in the docs:
https://gpt-index.readthedocs.io/en/latest/guides/primer/usage_pattern.html

A good starting notebook (there are like 100 in the repo LOL)
https://github.com/jerryjliu/llama_index/blob/main/examples/vector_indices/SimpleIndexDemo-ChatGPT.ipynb
Thanks a lot
Hello @Logan M, I tried the guide you shared, but I got raise ValueError("Reference doc id is None."). Any idea what causes this? Here's my code; codeSnippet.source is the code snippet that I split myself.
Plain Text
def indexCodeSnippets(self, codeSnippets: List[CodeSnippet]):
    nextDocId = 1
    previousNode: Node = None
    nodes: List[Node] = []

    # build one Node per snippet and link neighbours via next/previous relationships
    for codeSnippet in codeSnippets:
        node = Node(text=codeSnippet.source, doc_id=str(nextDocId))
        nodes.append(node)

        if previousNode is not None:
            previousNode.relationships[DocumentRelationship.NEXT] = node.get_doc_id()
            node.relationships[DocumentRelationship.PREVIOUS] = previousNode.get_doc_id()

        previousNode = node
        nextDocId += 1

    # define LLM
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003"))

    # define prompt helper
    # set maximum input size
    max_input_size = 4096
    # set number of output tokens
    num_output = 10000
    # set maximum chunk overlap
    max_chunk_overlap = 20
    prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

    service_context = ServiceContext.from_defaults(
        llm_predictor=llm_predictor, prompt_helper=prompt_helper
    )
    self.__gptSimpleVectorIndex = GPTSimpleVectorIndex(nodes=nodes, service_context=service_context)
What's the full stack trace?
@Logan M here's the full stacktrace
Try setting ref_doc_id="my id" or similar when creating the node objects
This is used to track which document the node came from.
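For illustration, a sketch of that fix against the Node API from the snippet above (same imports assumed). Depending on the llama_index version, ref_doc_id may be a read-only property derived from the SOURCE relationship rather than a constructor argument, in which case setting the relationship has the same effect; "my-codebase" is just a placeholder id.

node = Node(
    text=codeSnippet.source,
    doc_id=str(nextDocId),
    # backs node.ref_doc_id, which the vector store checks on insert
    relationships={DocumentRelationship.SOURCE: "my-codebase"},
)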