🐱Ministral 3: How to Run Guide
Guide for Mistral's Ministral 3 models: how to run or fine-tune them locally on your device.
Mistral releases Ministral 3, their new multimodal models, in Base, Instruct, and Reasoning variants, available in 3B, 8B, and 14B sizes. They offer best-in-class performance for their size and are fine-tuned for instruction and chat use cases. The multimodal models support 256K context windows, multiple languages, native function calling, and JSON output.
The full unquantized 14B Ministral-3-Instruct-2512 model fits in 24GB RAM/VRAM. You can now run, fine-tune, and train with RL on all Ministral 3 models using Unsloth.
We've also uploaded Mistral Large 3 GGUFs here. For all Ministral 3 uploads (BnB, FP8), see here.
⚙️ Usage Guide
To achieve optimal performance for Instruct, Mistral recommends using lower temperatures such as temperature = 0.15 or 0.1.
For Reasoning, Mistral recommends temperature = 0.7 and top_p = 0.95.
|             | Instruct    | Reasoning |
| ----------- | ----------- | --------- |
| Temperature | 0.15 or 0.1 | 0.7       |
| Top_P       | default     | 0.95      |
Adequate Output Length: Use an output length of 32,768 tokens for most queries for the reasoning variant, and 16,384 for the instruct variant. You can increase the max output size for the reasoning model if necessary.
The maximum context length Ministral 3 can reach is 262,144 tokens.
You can inspect the chat template format by running the snippet below (the tokenizer setup and repo id are illustrative; load whichever Ministral 3 checkpoint you are using):

```python
from transformers import AutoTokenizer

# Illustrative repo id - swap in the Ministral 3 variant you downloaded
tokenizer = AutoTokenizer.from_pretrained("mistralai/Ministral-3-14B-Instruct-2512")

# tokenize = False returns the formatted prompt string so you can see the template
tokenizer.apply_chat_template([
    {"role" : "user",      "content" : "What is 1+1?"},
    {"role" : "assistant", "content" : "2"},
    {"role" : "user",      "content" : "What is 2+2?"},
], tokenize = False, add_generation_prompt = True)
```

Ministral Reasoning chat template:
Ministral Instruct chat template:
📖 Run Ministral 3 Tutorials
Below are guides for the Reasoning and Instruct variants of the model.
Instruct: Ministral-3-Instruct-2512
To achieve optimal performance for Instruct, Mistral recommends using lower temperatures such as temperature = 0.15 or 0.1.
✨ Llama.cpp: Run Ministral-3-14B-Instruct Tutorial
Obtain the latest llama.cpp on GitHub here. You can follow the build instructions below as well. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or just want CPU inference.
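A typical build sequence looks like this (a sketch assuming a Linux machine with the CUDA toolkit installed; adjust packages and flags for your system):

```bash
apt-get update
apt-get install -y build-essential cmake curl libcurl4-openssl-dev
git clone https://github.com/ggml-org/llama.cpp
# Set -DGGML_CUDA=OFF for CPU-only inference
cmake llama.cpp -B llama.cpp/build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build llama.cpp/build --config Release -j --target llama-cli llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp/
```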
You can directly pull from Hugging Face via:
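(The command below is a sketch: the Unsloth GGUF repo name and quant tag are assumptions, so check the Ministral 3 collection on Hugging Face for the exact names.)

```bash
./llama.cpp/llama-cli \
    -hf unsloth/Ministral-3-14B-Instruct-2512-GGUF:UD-Q4_K_XL \
    --jinja \
    --n-gpu-layers 99 \
    --temp 0.15 \
    --ctx-size 32768 \
    -n 16384
```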
Download the model via the command below (after installing huggingface_hub and hf_transfer with pip install huggingface_hub hf_transfer). You can choose UD_Q4_K_XL or other quantized versions.
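A sketch of the download step (the repo name is an assumption; adjust --include to the quant you picked):

```bash
pip install huggingface_hub hf_transfer

# hf_transfer speeds up large downloads; the repo name shown is illustrative
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download \
    unsloth/Ministral-3-14B-Instruct-2512-GGUF \
    --include "*UD_Q4_K_XL*" \
    --local-dir Ministral-3-14B-Instruct-2512-GGUF
```

Then point llama-cli's --model flag at the downloaded .gguf file and use the same sampling settings as above.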
Reasoning: Ministral-3-Reasoning-2512
To achieve optimal performance for Reasoning, Mistral recommends using temperature = 0.7 and top_p = 0.95.
✨ Llama.cpp: Run Ministral-3-14B-Reasoning Tutorial
Obtain the latest llama.cpp on GitHub. You can follow the same build instructions shown in the Instruct section above. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or just want CPU inference.
You can directly pull from Hugging Face via:
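(Same pattern as the Instruct run, but with the reasoning sampling settings; the repo name and quant tag are again assumptions.)

```bash
./llama.cpp/llama-cli \
    -hf unsloth/Ministral-3-14B-Reasoning-2512-GGUF:UD-Q4_K_XL \
    --jinja \
    --n-gpu-layers 99 \
    --temp 0.7 \
    --top-p 0.95 \
    --ctx-size 32768 \
    -n 32768
```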
Download the model via the command below (after installing huggingface_hub and hf_transfer with pip install huggingface_hub hf_transfer). You can choose UD_Q4_K_XL or other quantized versions.
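As before, a sketch with an illustrative repo name:

```bash
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download \
    unsloth/Ministral-3-14B-Reasoning-2512-GGUF \
    --include "*UD_Q4_K_XL*" \
    --local-dir Ministral-3-14B-Reasoning-2512-GGUF
```

Then run llama-cli with --model pointing at the downloaded .gguf and the reasoning sampling settings shown above.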
🛠️ Fine-tuning Ministral 3
Unsloth now supports fine-tuning of all Ministral 3 models, including vision support. To train, you must use the latest 🤗Hugging Face transformers v5 and the latest Unsloth, which includes our recent ultra-long-context support. The larger 14B Ministral 3 model should still fit on a free Colab GPU.
We made free Unsloth notebooks to fine-tune Ministral 3. Change the model name in the notebook to use your desired model.
Ministral-3B-Instruct Vision notebook (vision)
Ministral-3B-Instruct GRPO notebook
Ministral Vision finetuning notebook
Ministral Sudoku GRPO RL notebook
✨Reinforcement Learning (GRPO)
Unsloth now supports RL and GRPO for the Mistral models as well. As usual, they benefit from all of Unsloth's enhancements, and we also made a notebook specifically for autonomously solving Sudoku puzzles.
Ministral-3B-Instruct GRPO notebook
To use the latest version of Unsloth and transformers v5, update via:
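(A minimal sketch; check the Unsloth release notes for any pinned versions.)

```bash
pip install --upgrade unsloth unsloth_zoo
pip install --upgrade transformers   # Ministral 3 requires transformers v5, as noted above
```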
The goal is to automatically generate strategies to complete Sudoku puzzles!


The reward plots for Ministral are shown below, and we can see that training works well!