🐱 Ministral 3: How to Run Guide

A guide to running and fine-tuning Mistral's Ministral 3 models locally on your device.

Mistral releases Ministral 3, their new multimodal models in Base, Instruct, and Reasoning variants, available in 3B, 8B, and 14B sizes. They offer best-in-class performance for their size and are fine-tuned for instruction and chat use cases. The multimodal models support 256K context windows, multiple languages, native function calling, and JSON output.

The full unquantized 14B Ministral-3-Instruct-2512 model fits in 24GB of RAM/VRAM. You can now run, fine-tune, and do RL on all Ministral 3 models with Unsloth:

Run Ministral 3 Tutorials | Fine-tuning Ministral 3

We've also uploaded Mistral Large 3 GGUFs here. For all Ministral 3 uploads (BnB, FP8), see here.

Ministral-3-Instruct GGUFs: 3B, 8B, 14B
Ministral-3-Reasoning GGUFs: 3B, 8B, 14B

⚙️ Usage Guide

To achieve optimal performance for Instruct, Mistral recommends using lower temperatures such as temperature = 0.15 or 0.1.

For Reasoning, Mistral recommends temperature = 0.7 and top_p = 0.95.

Instruct: Temperature = 0.15 or 0.1, Top_P = default
Reasoning: Temperature = 0.7, Top_P = 0.95

Adequate output length: use an output length of 32,768 tokens for most queries with the Reasoning variant, and 16,384 tokens with the Instruct variant. You can increase the maximum output length for the Reasoning model if necessary.

The maximum context length Ministral 3 supports is 262,144 tokens.
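If you serve the model with llama.cpp (covered in the tutorials below), these limits and the sampling settings above map directly onto command-line flags. A minimal sketch, with an illustrative model filename:

```bash
# Illustrative llama.cpp invocation (the model filename is a placeholder):
#   --ctx-size 262144 -> up to the full 262,144-token context window
#   -n 32768          -> the suggested output budget for the Reasoning variant
#   --temp / --top-p  -> Mistral's recommended sampling settings for Reasoning
./llama-cli -m Ministral-3-14B-Reasoning-2512-UD-Q4_K_XL.gguf \
    --ctx-size 262144 -n 32768 --temp 0.7 --top-p 0.95
```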

The chat template format can be inspected by applying the template to a short conversation (the model id in the snippet is illustrative):

from transformers import AutoTokenizer

# The model id below is illustrative - point it at the Ministral 3 checkpoint you are using
tokenizer = AutoTokenizer.from_pretrained("mistralai/Ministral-3-14B-Instruct-2512")

print(tokenizer.apply_chat_template([
    {"role" : "user", "content" : "What is 1+1?"},
    {"role" : "assistant", "content" : "2"},
    {"role" : "user", "content" : "What is 2+2?"}
    ], add_generation_prompt = True, tokenize = False,
))

Ministral Reasoning chat template:

Ministral Instruct chat template:

📖 Run Ministral 3 Tutorials

Below are guides for the Reasoning and Instruct variants of the model.

Instruct: Ministral-3-Instruct-2512

To achieve optimal performance for Instruct, Mistral recommends using lower temperatures such as temperature = 0.15 or 0.1.

Llama.cpp: Run Ministral-3-14B-Instruct Tutorial

1. Obtain the latest llama.cpp on GitHub here. You can follow the build instructions below as well. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or just want CPU inference.
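A minimal sketch of the standard llama.cpp CMake build (the flags are the usual llama.cpp ones; adjust them to your setup):

```bash
# Clone and build llama.cpp with CUDA (set -DGGML_CUDA=OFF for CPU-only inference)
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build llama.cpp/build --config Release -j --clean-first \
    --target llama-cli llama-server
cp llama.cpp/build/bin/llama-* llama.cpp/
```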

2. You can directly pull the model from Hugging Face and run it via:
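For example, llama-cli can stream a GGUF straight from the Hub with the -hf flag. The repo name and quant tag below are assumptions based on Unsloth's usual naming; verify them on the upload page:

```bash
# Pull the Instruct GGUF from Hugging Face and chat with it
# (repo id and quant tag are illustrative - check the actual Unsloth upload)
./llama.cpp/llama-cli \
    -hf unsloth/Ministral-3-14B-Instruct-2512-GGUF:UD-Q4_K_XL \
    --jinja --temp 0.15
```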

3. Download the model via the commands below (after installing pip install huggingface_hub hf_transfer). You can choose UD_Q4_K_XL or other quantized versions.
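A sketch using the huggingface_hub CLI (the repo id and file pattern are assumptions; adjust them to the quant you want):

```bash
# Install Hugging Face tooling, then fetch only the UD_Q4_K_XL files
pip install huggingface_hub hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
huggingface-cli download unsloth/Ministral-3-14B-Instruct-2512-GGUF \
    --include "*UD-Q4_K_XL*" \
    --local-dir Ministral-3-14B-Instruct-2512-GGUF
```

You can then point llama-cli or llama-server at the downloaded .gguf file with -m.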

Reasoning: Ministral-3-Reasoning-2512

To achieve optimal performance for Reasoning, Mistral recommends using temperature = 0.7 and top_p = 0.95.

Llama.cpp: Run Ministral-3-14B-Reasoning Tutorial

1. Obtain the latest llama.cpp on GitHub and build it with the instructions from the Instruct tutorial above. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or just want CPU inference.

2. You can directly pull the model from Hugging Face and run it via:
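As with the Instruct variant, a sketch with an assumed repo name and quant tag, using Mistral's recommended Reasoning sampling settings:

```bash
# Pull the Reasoning GGUF and run with the recommended sampling settings
# (repo id and quant tag are illustrative - check the actual Unsloth upload)
./llama.cpp/llama-cli \
    -hf unsloth/Ministral-3-14B-Reasoning-2512-GGUF:UD-Q4_K_XL \
    --jinja --temp 0.7 --top-p 0.95
```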

3. Download the model via the commands below (after installing pip install huggingface_hub hf_transfer). You can choose UD_Q4_K_XL or other quantized versions.
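Same pattern as the Instruct download, just with the Reasoning repo (id and file pattern are assumptions):

```bash
# After pip install huggingface_hub hf_transfer (see the Instruct steps above)
export HF_HUB_ENABLE_HF_TRANSFER=1
huggingface-cli download unsloth/Ministral-3-14B-Reasoning-2512-GGUF \
    --include "*UD-Q4_K_XL*" \
    --local-dir Ministral-3-14B-Reasoning-2512-GGUF
```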

🛠️ Fine-tuning Ministral 3

Unsloth now supports fine-tuning of all Ministral 3 models, including vision support. To train, you must use the latest 🤗Hugging Face transformers v5 and unsloth, which include our recent ultra-long-context support. The large 14B Ministral 3 model should fit on a free Colab GPU.

We made free Unsloth notebooks to fine-tune Ministral 3. Change the model name in the notebook to use the desired model.

Ministral Vision finetuning notebook

Ministral Sudoku GRPO RL notebook

Reinforcement Learning (GRPO)

Unsloth now supports RL and GRPO for the Mistral models as well. As usual, they benefit from all of Unsloth's enhancements, and we made a notebook specifically for autonomously solving Sudoku puzzles (linked above).

To use the latest version of Unsloth and transformers v5, update via:
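A typical upgrade looks like this (the exact packages pinned are an assumption; the requirement is simply the latest unsloth plus transformers v5):

```bash
# Upgrade Unsloth and Transformers to the latest releases
pip install --upgrade --no-cache-dir unsloth unsloth_zoo
pip install --upgrade transformers
```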

The goal is to auto-generate strategies to complete Sudoku!

The reward plots we obtained for Ministral show that this works well!
