GitAra: How We Trained a 3B Function-Calling Git Agent for Local Use
GitAra = git + ara: your local stochastic parrot for git commands (with a knack for music).
We fine-tuned a small, tool-calling language model to turn plain English into git commands with the accuracy of a cloud LLM. Because it’s small, you can run it locally on your own machine — no API keys, no cloud dependencies, full privacy.
Results
| Model | Parameters | Accuracy | Model link |
|---|---|---|---|
| GPT-OSS 120B (teacher) | 120B | 0.92 +/- 0.02 | |
| Llama 3.2 3B Instruct (tuned) | 3B | 0.92 +/- 0.01 | HuggingFace |
| Llama 3.2 1B Instruct (tuned) | 1B | 0.90 +/- 0.01 | HuggingFace |
| Llama 3.2 3B Instruct (base) | 3B | 0.12 +/- 0.05 | |
| Llama 3.2 1B Instruct (base) | 1B | 0.0 +/- 0.01 | |
The tuned 3B model matches the 120B teacher while being 40x smaller. The 1B model is within one standard deviation while being 120x smaller.
All models are available in the HuggingFace collection.
The Task
A practical Git assistant that interprets natural language requests and outputs appropriate Git commands:
- “what’s in the latest stash, show diff” → `git stash show --patch`
- “push feature-x to origin, override any changes there and track it” → `git push origin feature-x --force --set-upstream`
We support 13 core Git commands: status, add, commit, push, pull, branch, switch, restore, merge, stash, rebase, reset, and log. We deliberately exclude the older `checkout` in favor of the more modern `switch` and `restore`.
Tool Calling Overview
The implementation uses JSON schemas following OpenAI’s function-calling format:
```json
{
  "type": "function",
  "function": {
    "name": "git_add",
    "description": "Stage files for commit",
    "parameters": {
      "type": "object",
      "properties": {
        "files": {
          "type": "array",
          "description": "List of file paths to stage (use ['.'] for all files)",
          "items": { "type": "string" },
          "minItems": 1
        }
      },
      "required": ["files"],
      "additionalProperties": false
    }
  }
}
```
The model returns responses like:
```json
{"name": "git_add", "parameters": {"files": ["README.md"]}}
```
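A tool call like this still has to be rendered into an actual shell command before it can run. A minimal sketch of that dispatch step, assuming a `render_tool_call` helper that is illustrative rather than part of GitAra:

```python
import json
import shlex

def render_tool_call(raw: str) -> str:
    """Turn a model tool-call response into a git command string.
    Hypothetical sketch; GitAra's actual dispatch layer may differ."""
    call = json.loads(raw)
    name, params = call["name"], call["parameters"]
    if name == "git_add":
        # Quote each path so filenames with spaces survive the shell.
        return "git add " + " ".join(shlex.quote(f) for f in params["files"])
    if name == "git_stash":
        args = ["git", "stash", params["action"]]
        if "stash_ref" in params:
            args.append(params["stash_ref"])
        return " ".join(args)
    raise ValueError(f"unknown tool: {name}")

print(render_tool_call('{"name": "git_add", "parameters": {"files": ["README.md"]}}'))
# → git add README.md
```

Keeping command construction in a small, explicit mapping like this means the model never emits raw shell text, which limits the blast radius of a bad generation.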
A crucial feature is a `do_nothing` tool that allows the model to decline unreasonable requests instead of generating arbitrary commands.
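A schema for such an escape-hatch tool can be very small; the exact description text below is illustrative, not taken from GitAra:

```json
{
  "type": "function",
  "function": {
    "name": "do_nothing",
    "description": "Decline the request when it does not map to a supported git command",
    "parameters": {
      "type": "object",
      "properties": {},
      "additionalProperties": false
    }
  }
}
```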
Creating the Seed Dataset
We created approximately 100 examples showing requests paired with expected tool calls:
| Input | Output |
|---|---|
| apply stash@{5} | `{"name": "git_stash", "parameters": {"action": "apply", "stash_ref": "stash@{5}"}}` |
| merge vendor branch preferring ours | `{"name": "git_merge", "parameters": {"branch": "vendor", "strategy": "ours"}}` |
| show 8 commits for current branch with graph | `{"name": "git_log", "parameters": {"limit": 8, "graph": true}}` |
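Before fine-tuning, each seed pair has to be serialized into a chat-format training record. One common shape is the OpenAI-style `tool_calls` message; the `to_training_record` helper below is a sketch under that assumption, not the platform's actual format:

```python
import json

def to_training_record(user_request: str, tool_call: dict) -> dict:
    """Wrap one seed pair in an OpenAI-style chat record.
    Sketch only; the record layout used for training may differ."""
    return {
        "messages": [
            {"role": "user", "content": user_request},
            {
                "role": "assistant",
                "content": None,
                "tool_calls": [{
                    "type": "function",
                    "function": {
                        "name": tool_call["name"],
                        # Arguments are conventionally a JSON-encoded string.
                        "arguments": json.dumps(tool_call["parameters"]),
                    },
                }],
            },
        ]
    }

record = to_training_record(
    "apply stash@{5}",
    {"name": "git_stash", "parameters": {"action": "apply", "stash_ref": "stash@{5}"}},
)
```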
Training the Student
We expanded the 100 seed examples into 10,000 training pairs using the distil labs platform’s generation capabilities. This allowed fine-tuning Llama 3.2 3B Instruct, a model with 40x fewer parameters than the teacher, to match the teacher’s performance.
Most queries take less than 2 seconds to return a response on an M4 MacBook Pro once the model is loaded.
Future Improvements
- Constrained decoding to guarantee syntactically valid JSON output
- Multi-turn workflows for handling complex, iterative tasks
- Quantization to reduce model size without significant performance loss
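Until constrained decoding is in place, a simple guard is to validate the model’s raw output and fall back to `do_nothing` on failure. A minimal sketch, where the `KNOWN_TOOLS` set and the fallback policy are assumptions rather than GitAra’s documented behavior:

```python
import json

# Assumed tool names: the 13 supported commands plus the do_nothing escape hatch.
KNOWN_TOOLS = {"git_status", "git_add", "git_commit", "git_push", "git_pull",
               "git_branch", "git_switch", "git_restore", "git_merge",
               "git_stash", "git_rebase", "git_reset", "git_log", "do_nothing"}

def parse_tool_call(raw: str) -> dict:
    """Parse model output defensively: on malformed JSON or an unknown
    tool name, decline rather than execute anything. Sketch only."""
    try:
        call = json.loads(raw)
        if call.get("name") in KNOWN_TOOLS and isinstance(call.get("parameters"), dict):
            return call
    except json.JSONDecodeError:
        pass
    return {"name": "do_nothing", "parameters": {}}
```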
Conclusion
GitAra demonstrates a generalizable workflow for tool-calling scenarios applicable beyond Git assistance. While manual implementation requires substantial effort, the distil labs platform abstracts away most of the difficult parts.