I am using a code-completion model for a project (it will be open-sourced very soon).
Qwen2.5-Coder 1.5B, though, tends to repeat what has already been written, or to change it slightly (see the video).
Is this intentional? I am passing the prefix and suffix to Ollama correctly, so it knows where the cursor currently is. I'm also trimming the number of lines it can see so the time-to-first-token isn't too long.
Do you have a recommendation for a code model better suited to this?
I hope someone else chimes in and can offer some advice. You could have a look at the Ollama log / debug output and check whether the `<|fim_prefix|>`, `<|fim_suffix|>`, and `<|fim_middle|>` tokens are in the correct spots when fed into the LLM (as per https://github.com/QwenLM/Qwen2.5-Coder?tab=readme-ov-file#3-file-level-code-completion-fill-in-the-middle). Other than that, I don't have a clue. You could also try a different model, but I suspect something is wrong somewhere. I mean, coding sometimes is repetitive, but it shouldn't repeat like that.