Diffusivity uncertainty estimates are invalid or systematically underestimated

## Summary

The `DiffusionAnalyzer` class returns uncertainty estimates (`diffusivity_std_dev`, etc.) that may significantly underestimate the true statistical uncertainty in the estimated diffusion coefficient. This affects both smoothed (default) and non-smoothed analyses, though for different reasons.

## Issue 1: Smoothed analysis (default)

The default analysis (`smoothed="max"`) returns uncertainty values that the docstring notes "make sense only for non-smoothed analyses." However, since these values are returned without warning, users may use them without reading the docstring carefully.

## Issue 2: Non-smoothed analysis

For non-smoothed analysis (`smoothed=False`), the uncertainty is calculated using the standard OLS formula for variance in slope (lines 969-979), which assumes independent, identically distributed residuals. However, MSD data points are serially correlated, as the displacement at time $t_2$ is not independent of the displacement at $t_1$ < $t_2$. This correlation means the OLS standard error formula systematically underestimates the true uncertainty. See e.g., McCluskey et al., *J. Chem. Theory Comput.* 2024, 21, 1, 79–87 (https://doi.org/10.1021/acs.jctc.4c01249) for discussion of this issue.

## Issue 3: WLS weighting

The smoothed analysis uses weighted least squares with w = 1/t (implemented by multiplying both sides of the regression by 1/\sqrt{t}). This accounts for the approximate reduction in the number of time origins at longer lag times, but does not account for the intrinsic scaling of MSD variance, which for an ideal random walk is \propto t^2. This means the current weighting is not formal WLS (which would require w \propto 1 / Var(MSD)). The code comment on line 953 describes this as "weight by variance", which may be misleading.

## Impact

Users relying on the reported uncertainties may underestimate the true statistical uncertainty in their diffusion coefficients, potentially leading to over-confident conclusions.

## Possible remediation

One relatively simple-to-implement option would be to replace the uncertainty attributes (`diffusivity_std_dev`, etc.) with `@property` methods that raise informative warnings or errors when accessed, explaining the limitation and directing users to alternative approaches, such as:

- Estimating uncertainties from multiple independent simulations
- Using packages designed for accurate estimation of uncertainties from MSD regression, such as [kinisi](https://kinisi.readthedocs.io/)

This would be a breaking change, but would prevent users from unknowingly using unreliable uncertainty estimates.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Diffusivity uncertainty estimates are invalid or systematically underestimated #449

Summary

Issue 1: Smoothed analysis (default)

Issue 2: Non-smoothed analysis

Issue 3: WLS weighting

Impact

Possible remediation

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Diffusivity uncertainty estimates are invalid or systematically underestimated #449

Description

Summary

Issue 1: Smoothed analysis (default)

Issue 2: Non-smoothed analysis

Issue 3: WLS weighting

Impact

Possible remediation

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions