• 📖 Introduction • 🎉 News • ✨ Pipeline • ⚡️ Evaluation
• 📈 Decoding Trajectory • 💻 Algorithm • 📧 Contact
Uɴᴄᴏᴅᴇ is a novel decoding strategy for Masked Diffusion Models (MDMs) that unifies global trajectory planning with content-aware informativeness maximization. It addresses the key limitations of traditional uncertainty-based samplers when applied to MDMs: a rigid boundary bias and a bias toward "trivial tokens." By using a position-aware weighting mechanism and a calibrated confidence score, Uɴᴄᴏᴅᴇ guides the decoding path and prevents the premature selection of unimportant tokens, significantly improving generation quality.
- 20250912: This release provides enhanced support for decoding with LLaDA, integrating a variety of recent semi- and non-autoregressive sampling strategies, including ReMDM, Fast-dLLM, Semi-AR, and the margin-based, entropy-based, and confidence-based samplers.
- 20250819: Released our Paper on arXiv. Released our Code on GitHub.
Uɴᴄᴏᴅᴇ is a novel decoding strategy designed for advanced Masked Diffusion Models (MDMs) such as LLaDA and Dream. These models are powerful non-autoregressive alternatives for sequence generation, enabling flexible decoding through the iterative denoising of masked tokens.
```bash
git clone
conda create --name Uɴᴄᴏᴅᴇ python=3.10
conda activate Uɴᴄᴏᴅᴇ
cd Uɴᴄᴏᴅᴇ
pip install -r requirements.txt
```

Our method, along with all baseline methods, can be applied for prediction across mathematical reasoning, code generation, and question-answering datasets.
This is an example of evaluation on the HumanEval dataset using Uɴᴄᴏᴅᴇ. You can change the `--task` and `--mode` arguments to evaluate other datasets and decoding methods (an illustrative variation appears after the table below).
```bash
cd scripts
python eval.py \
  --task 'humaneval' \
  --model_name 'GSAI-ML/LLaDA-8B-Instruct' \
  --device 'cuda:5' \
  --gen_length 256 \
  --steps 256 \
  --block_length 256 \
  --mode pc_sampler \
  --lambd 0.25 \
  --alpha 10 \
  --data_path ../data/humaneval.jsonl \
  --result_path results/humaneval_pc_sampler
```

Following are the evaluation bash scripts for all decoding methods.
| Decoding Method | Evaluation Command | Decoding Method | Evaluation Command |
|---|---|---|---|
| Semi-Autoregressive | `cd scripts` | Entropy | `cd scripts` |
| EB-Sampler | `cd scripts` | Fast-dLLM | `cd scripts` |
| Margin | `cd scripts` | PC-sampler | `cd scripts` |
| ReMDM | `cd scripts` | Linear_Position | `cd scripts` |
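As an illustration of switching datasets and samplers, here is a variation of the example above that targets MBPP with the entropy-based sampler. The exact `--task` and `--mode` strings accepted by `eval.py` are assumptions here; check `scripts/eval.py` for the values it actually supports (sampler-specific flags such as `--lambd` and `--alpha` apply to Uɴᴄᴏᴅᴇ only):

```bash
cd scripts
# Hypothetical invocation: '--task mbpp' and '--mode entropy' are assumed
# names; consult scripts/eval.py for the strings it actually accepts.
python eval.py \
  --task 'mbpp' \
  --model_name 'GSAI-ML/LLaDA-8B-Instruct' \
  --device 'cuda:5' \
  --gen_length 256 \
  --steps 256 \
  --block_length 256 \
  --mode entropy \
  --data_path ../data/mbpp.jsonl \
  --result_path results/mbpp_entropy
```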
All decoding methods are evaluated on the same set of datasets: HumanEval, MBPP, GSM8K, MATH-500, GPQA, Countdown, and Sudoku. Evaluation results are saved in the `results` folder.
- For the GSM8K and GPQA datasets, we use `lm-eval` for evaluation.
- For the remaining datasets, please refer to `scripts/eval.py` for more details.
All methods are evaluated using the same set of evaluation scripts (including both lm-eval and our custom script) to ensure consistent assessment.
We provide a script to generate heatmaps of the decoding trajectories for the different decoding methods. The script is located at `scripts/heatmap.sh`.

```bash
cd scripts
bash heatmap.sh
```

The heatmap results are saved in the `heatmap_results` folder.
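To get a sense of what the script produces without running it, below is a minimal matplotlib sketch of a decoding-trajectory heatmap. Everything here is toy data under an assumed logging format (one reveal step per token position); the repo's actual pipeline in `scripts/heatmap.sh` may differ:

```python
# Toy decoding-trajectory heatmap (illustrative only, not the repo's script).
# Cell (step, pos) turns on once the token at `pos` is revealed at or before `step`.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
steps, length = 64, 256
reveal_step = rng.permutation(length) % steps          # assumed reveal order
grid = (np.arange(steps)[:, None] >= reveal_step[None, :]).astype(float)

plt.imshow(grid, aspect="auto", origin="lower", cmap="viridis")
plt.xlabel("token position")
plt.ylabel("decoding step")
plt.title("Decoding trajectory (toy data)")
plt.savefig("trajectory_heatmap.png", dpi=150)
```

In such a plot, a U-shaped sampler lights up both ends of the position axis first, while Uɴᴄᴏᴅᴇ's trajectory fills in largely left to right.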
The choice of decoding strategy significantly impacts the generation order of Masked Diffusion Models (MDMs). A critical limitation of existing uncertainty-based methods is their tendency to exhibit a "U-shaped" trajectory (the rigid boundary bias), where tokens at the sequence boundaries are prioritized early in decoding, followed by convergence toward the center. This bias stems from the premature unmasking of boundary tokens (BOS and EOS): the attention mechanism's local positional bias leads to elevated confidence for tokens near the sequence boundaries.
In contrast, our Uɴᴄᴏᴅᴇ introduces explicit trajectory control through position-aware weighting, enabling adaptive generation order tailored to task requirements. Below, we visualize the decoding trajectories on the GSM8K dataset for four representative sampling strategies:
| Sampling Strategy | Decoding Trajectory Heatmap | Sampling Strategy | Decoding Trajectory Heatmap |
|---|---|---|---|
| Confidence-based | ![]() | Entropy-based | ![]() |
| Margin-based | ![]() | Uɴᴄᴏᴅᴇ | ![]() |
- **Rigid Boundary Bias in Uncertainty-based Methods**: Confidence-, entropy-, and margin-based samplers consistently exhibit the characteristic U-shaped pattern, decoding tokens at both sequence boundaries early. This behavior limits their ability to capture the global dependencies required for complex reasoning tasks such as mathematical problem solving.
- **Trivial Token Bias**: Uncertainty-based samplers tend to prioritize semantically trivial, high-frequency tokens (e.g., newline characters, spaces, common words like "the", and punctuation marks such as "." and "!") during decoding, leading to suboptimal reasoning paths.
- **Debiasing with Uɴᴄᴏᴅᴇ**: Our method eliminates the U-shaped bias by regulating the decoding path through exponential positional weighting. This enables a more natural progression that aligns with the logical flow of reasoning tasks, as demonstrated by the sequential trajectory on the GSM8K dataset.
The adaptive trajectory control of Uɴᴄᴏᴅᴇ directly contributes to its superior performance on GSM8K (82.2% accuracy) compared to uncertainty-based alternatives, highlighting the importance of aligning decoding order with task-specific structural demands.
Uɴᴄᴏᴅᴇ is a novel decoding strategy for Masked Diffusion Models (MDMs) that addresses key limitations of existing uncertainty-based sampling methods. It unifies global trajectory planning with content-aware informativeness maximization through two core components:
- **Position-Aware Weighting Mechanism**: Regulates the decoding path using an exponential decay function, enabling flexible control over the generation order that adapts to task-specific structural demands.
- **Calibrated Confidence Score**: Suppresses the premature selection of trivial tokens (e.g., punctuation, filler words) by incorporating a frequency-based adjustment derived from a reference corpus, promoting semantically rich content generation (see the toy computation below).
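To make the calibration concrete, here is a toy computation (the tokens and corpus frequencies are invented for illustration). A trivial high-frequency token can carry higher raw model confidence yet receive a lower calibrated score than a rarer, more informative token:

```python
import math

# Invented background corpus probabilities; real values come from p_D'
p_ref = {"the": 0.05, "729": 1e-6}
alpha = 10.0  # clipping threshold, as recommended below

for tok, p_hat in [("the", 0.98), ("729", 0.60)]:
    salience = -math.log(p_ref[tok])       # self-information under p_D'
    score = min(p_hat * salience, alpha)   # calibrated, clipped confidence
    print(f"{tok!r}: raw {p_hat:.2f} -> calibrated {score:.2f}")
# 'the': raw 0.98 -> calibrated 2.94
# '729': raw 0.60 -> calibrated 8.29
```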
Extensive experiments across seven benchmarks demonstrate that Uɴᴄᴏᴅᴇ consistently outperforms existing MDM decoding strategies by more than 10% on average, narrowing the performance gap with state-of-the-art autoregressive models.
The complete workflow of Uɴᴄᴏᴅᴇ is summarized in the following algorithm:
**Require:** predictor $p_{\theta}$, prompt $p_0$, generation length $L$, total steps $T$, reference corpus $\mathcal{D}'$, decay rate $\lambda$, clipping threshold $\alpha$

- $p_{\mathcal{D}'} \gets \text{FreqDist}(\mathcal{D}')$
- $x \gets \text{Concat}(p_0, \text{[MASK]} \times L)$
- **for** $t = 1$ **to** $T$ **do**
  - $\mathcal{M}_t \gets \{ i \mid x^i = \text{[MASK]} \}$ // Get mask indices
  - **if** $\mathcal{M}_t = \emptyset$ **then break**
  - $\hat{x}_0, \hat{p} \gets p_{\theta}(\cdot \mid x)$
  - **for each** position $i \in \mathcal{M}_t$ **do**
    - $\mathcal{C}^{(i)} \gets \hat{p}^i \cdot \left( -\log p_{\mathcal{D}'}(\hat{x}_0^i) \right)$
    - $\mathcal{C}^{(i)} \gets \min(\mathcal{C}^{(i)}, \alpha)$ // Clip salience score
    - $w^{(i)} \gets e^{-\lambda \cdot (i - |p_0|)}$
    - $\text{score}^{(i)} \gets w^{(i)} \cdot \mathcal{C}^{(i)}$
  - $n_t \gets \text{NumToReveal}(t, T, |\mathcal{M}_t|)$
  - $\mathcal{S}_t \gets \text{TopK}(\text{score}, n_t)$ // Select best tokens
  - **for each** index $j \in \mathcal{S}_t$ **do** $x^j \gets \hat{x}_0^j$ // Reveal selected token
- **return** $x$
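For readers who prefer code, below is a minimal, self-contained Python sketch of the loop above. The predictor is stubbed with random logits, and `NumToReveal` is replaced by a simple uniform schedule, since the exact schedule is not reproduced here; treat every name in this block as illustrative rather than the repo's implementation:

```python
# Minimal runnable sketch of the decoding loop above (not the repo's code).
# In practice, predictor() would be the MDM (e.g., LLaDA) forward pass,
# and ref_logp would hold log p_D'(.) from the background corpus.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, MASK = 50, -1  # toy vocabulary size and [MASK] id (illustrative)

def predictor(x):
    """Stub for p_theta(. | x): per-position probabilities over the vocab."""
    logits = rng.normal(size=(len(x), VOCAB))
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def pc_sample(prompt, L=16, T=8, lambd=0.25, alpha=10.0, ref_logp=None):
    if ref_logp is None:                    # uniform background corpus p_D'
        ref_logp = np.full(VOCAB, -np.log(VOCAB))
    x = np.concatenate([prompt, np.full(L, MASK)])   # x <- Concat(p0, [MASK] x L)
    for t in range(T):
        masked = np.flatnonzero(x == MASK)           # M_t: mask indices
        if masked.size == 0:
            break
        probs = predictor(x)
        x0_hat = probs.argmax(axis=-1)               # predicted tokens \hat{x}_0
        p_hat = probs.max(axis=-1)                   # confidences \hat{p}
        # Calibrated confidence, clipped at alpha
        conf = np.minimum(p_hat[masked] * -ref_logp[x0_hat[masked]], alpha)
        w = np.exp(-lambd * (masked - len(prompt)))  # position-aware weight
        score = w * conf
        n_t = int(np.ceil(masked.size / (T - t)))    # stand-in for NumToReveal
        reveal = masked[np.argsort(score)[-n_t:]]    # TopK(score, n_t)
        x[reveal] = x0_hat[reveal]                   # reveal selected tokens
    return x

print(pc_sample(prompt=np.array([1, 2, 3])))
```

Setting `lambd=0` removes the positional term and recovers a purely confidence-calibrated sampler, matching the hyperparameter guidance below.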
- $\lambda$ (`lambda_val`, passed as `--lambd` in the evaluation script): Controls the strength of the positional bias. Typical values range from 0 (no positional bias) to 1.0 (strong left-to-right bias). Recommended: 0 for Sudoku, 0.25 for most tasks, and 0.5 for Countdown.
- $\alpha$: Clipping threshold for the confidence scores. Recommended value: 10, which provides stable results across tasks.
- Background frequency distribution $p_{\mathcal{D}'}$: Constructed from a comprehensive corpus combining general text, mathematical reasoning problems, and evaluation datasets (see `/data/baseline`); a hedged construction sketch follows this list.
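Below is a minimal sketch of how such a background distribution could be built. The corpus paths and the whitespace tokenization are assumptions for illustration; the repo presumably tokenizes with the model's own tokenizer:

```python
# Hedged sketch: build log p_D'(token) from plain-text corpus files.
# Paths and tokenization are assumed, not taken from the repo.
import math
from collections import Counter
from pathlib import Path

def build_log_freq_dist(corpus_files, smoothing=1.0):
    counts = Counter()
    for path in corpus_files:
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            counts.update(line.split())            # naive whitespace tokens
    # Add-one smoothing, with one extra bucket reserved for unseen tokens
    total = sum(counts.values()) + smoothing * (len(counts) + 1)
    return {tok: math.log((c + smoothing) / total) for tok, c in counts.items()}

# log_p = build_log_freq_dist(["../data/baseline/general.txt"])  # hypothetical path
```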
If you have questions, suggestions, or bug reports, please email:
pengcheng.neu@outlook.com



