ml forecaster fails when not using default sensor #726

@philwild2

mlforecaster fails with "Unable to obtain: 96 lags_opt values" despite sufficient statistics data

EMHASS Version

v0.16.1

Environment

  • Home Assistant OS (Proxmox)
  • EMHASS installed as HA addon
  • use_websocket: true
  • optimization_time_step: 30

Description

When using load_forecast_method: mlforecaster with a trigger-based template sensor (firing every 30 minutes) as sensor_power_load_no_var_loads and var_model, the naive-mpc-optim action consistently fails with:

ERROR in forecast: Unable to obtain: 96 lags_opt values from sensor: power load no var loads,
check optimization_time_step/freq and historic_days_to_retrieve/days_to_retrieve parameters

This occurs despite:

  • The sensor having 3.5+ days of long-term statistics (well exceeding the required 96 × 30-min slots = 48 hours)
  • The sensor appearing correctly in Developer Tools → Statistics with continuous data
  • historic_days_to_retrieve: 2 being set in both config and runtime params
  • var_model and sensor_power_load_no_var_loads both pointing to the same correct sensor
  • The ML model having been successfully trained against the same sensor (R² ~0.29 with 2 days data)
  • Switching to load_forecast_method: naive working perfectly with identical config
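For reference, the arithmetic behind the 48-hour figure in the first bullet can be checked directly (not part of the original report, just a sanity check):

```python
# 96 lag values at a 30-minute optimization_time_step span exactly
# two days of history -- well within the 3.5+ days of statistics.
lags = 96
step_minutes = 30
required_hours = lags * step_minutes / 60
print(required_hours)  # 48.0
```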

Sensor Setup

The load sensor is a trigger-based template sensor firing every 30 minutes:

template:
  - trigger:
      - trigger: time_pattern
        minutes: "/30"
    sensor:
      - name: "House Load 30min Average"
        unit_of_measurement: "W"
        device_class: power
        state_class: measurement
        state: >
          {{ states('sensor.alphaess_current_house_load') | float(0) }}

This sensor:

  • Has state_class: measurement
  • Appears in Developer Tools → Statistics
  • Has continuous data from creation date
  • Returns sensible wattage values

Relevant Config

{
  "load_forecast_method": "mlforecaster",
  "sensor_power_load_no_var_loads": "sensor.house_load_30min_average",
  "sensor_power_photovoltaics": "sensor.pv_30min_average",
  "var_model": "sensor.house_load_30min_average",
  "historic_days_to_retrieve": 2,
  "num_lags": 48,
  "optimization_time_step": 30,
  "sklearn_model": "KNeighborsRegressor",
  "use_websocket": true,
  "sensor_replace_zero": [
    "sensor_power_photovoltaics",
    "sensor_power_load_no_var_loads"
  ],
  "sensor_linear_interp": [
    "sensor.pv_30min_average",
    "sensor.house_load_30min_average"
  ]
}

Runtime Parameters Passed to naive-mpc-optim

{
  "pv_power_forecast": [...],
  "load_cost_forecast": [...],
  "prod_price_forecast": [...],
  "extra_var_model": [...],
  "soc_init": 0.36,
  "prediction_horizon": 96,
  "historic_days_to_retrieve": 2,
  "days_to_retrieve": 2
}
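As a side note (not from the original report), a minimal pre-flight check with placeholder values can confirm that every forecast list actually covers the prediction horizon before posting; the forecast values below are hypothetical stand-ins for the elided lists:

```python
# Hypothetical pre-flight check before posting to naive-mpc-optim:
# each forecast list must be at least prediction_horizon entries long.
payload = {
    "pv_power_forecast": [0.0] * 96,    # placeholder values
    "load_cost_forecast": [0.2] * 96,   # placeholder values
    "prod_price_forecast": [0.1] * 96,  # placeholder values
    "soc_init": 0.36,
    "prediction_horizon": 96,
    "historic_days_to_retrieve": 2,
}
horizon = payload["prediction_horizon"]
for key in ("pv_power_forecast", "load_cost_forecast", "prod_price_forecast"):
    assert len(payload[key]) >= horizon, f"{key} shorter than horizon"
print("payload lengths OK")
```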

ML Training Call (succeeds)

{
  "var_load": "sensor.house_load_30min_average",
  "historic_days_to_retrieve": 2,
  "num_lags": 48,
  "sklearn_model": "KNeighborsRegressor",
  "extra_var_model": ["sensor.current_apparent_temperature"]
}

Training completes successfully in under 1 second via the statistics API (no REST fallback).

Logs

Successful ML training (for reference)

INFO in retrieve_hass: Statistics data retrieval took 0.76 seconds
INFO in machine_learning_forecaster: Training a KNeighborsRegressor model
INFO in machine_learning_forecaster: Elapsed time for model fit: 0.01
INFO in machine_learning_forecaster: Prediction R2 score of fitted model on test data: 0.289

Failed naive-mpc-optim with mlforecaster

INFO in retrieve_hass: Statistics data retrieval took 0.08 seconds
WARNING in retrieve_hass: Unable to find all the sensors in sensor_replace_zero parameter
WARNING in retrieve_hass: Confirm sure all sensors in sensor_replace_zero are sensor_power_photovoltaics and/or sensor_power_load_no_var_loads
INFO in forecast: Retrieving data from hass for load forecast using method = mlforecaster
INFO in retrieve_hass: Statistics data retrieval took 0.07 seconds
WARNING in retrieve_hass: Unable to find all the sensors in sensor_replace_zero parameter
WARNING in retrieve_hass: Confirm sure all sensors in sensor_replace_zero are sensor_power_photovoltaics and/or sensor_power_load_no_var_loads
ERROR in forecast: Unable to obtain: 96 lags_opt values from sensor: power load no var loads,
check optimization_time_step/freq and historic_days_to_retrieve/days_to_retrieve parameters

Successful naive-mpc-optim with naive forecaster (identical config, same run)

INFO in retrieve_hass: Statistics data retrieval took 0.08 seconds
INFO in forecast: Retrieving data from hass for load forecast using method = naive
INFO in retrieve_hass: Statistics data retrieval took 0.07 seconds
INFO in web_server:  >> Performing naive-mpc-optim...
INFO in optimization: Total value of the Cost function = 2.27
INFO in retrieve_hass: Successfully posted to sensor.p_batt_forecast = -5000.0
INFO in retrieve_hass: Successfully posted to sensor.soc_batt_forecast = 44.32
INFO in retrieve_hass: Successfully posted to sensor.optim_status = Optimal

Persistent sensor_replace_zero Warning

Throughout all testing, the sensor_replace_zero warning fires on every run despite the config containing the correct key names (sensor_power_photovoltaics and sensor_power_load_no_var_loads). This warning persists even when set_zero_min: false is passed at runtime. It is unclear whether this warning is related to the lags failure or is a separate issue.

What Was Ruled Out

  • Insufficient data: Sensor has 3.5+ days of statistics, confirmed in Developer Tools → Statistics
  • Wrong sensor: Both var_model and sensor_power_load_no_var_loads correctly point to sensor.house_load_30min_average
  • High-frequency sensor timeout: Previous attempts using the raw AlphaESS source sensor (sensor.alphaess_current_house_load, updating every ~5 seconds) caused REST API hangs. The 30-min template sensor was created specifically to avoid this.
  • JSON syntax errors: Payload parses correctly, confirmed by runtime params appearing in logs
  • Model/sensor name mismatch: Model was retrained after each sensor change
  • historic_days_to_retrieve too high: Tested with values of 2 and 3, both fail identically

Hypothesis

The mlforecaster lags check may use a different code path than the training call to retrieve the last-window data. It possibly looks up the sensor via the sensor_power_load_no_var_loads config key rather than var_model, and either retrieves a different sensor or fails to obtain enough resampled rows even though the statistics are present. Since the naive forecaster succeeds with an identical config, the issue appears isolated to the mlforecaster lags-retrieval logic.
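One purely illustrative sketch of how such a mismatch could arise — assuming the last-window retrieval samples on a strict 30-min grid while long-term statistics points are hourly; this is a hypothesis about the failure mode, not confirmed EMHASS behaviour:

```python
from datetime import datetime, timedelta

# Hypothetical: statistics points exist only on the hour, but the lags
# check needs a value on every 30-min slot of the 48-hour window.
start = datetime(2024, 1, 1)
stats_times = {start + timedelta(hours=h) for h in range(48)}  # 2 days, hourly
grid = [start + timedelta(minutes=30 * i) for i in range(96)]  # 96 x 30-min slots
matched = sum(t in stats_times for t in grid)
print(matched)  # 48 -- only the on-the-hour slots have data, not 96
```

If something like this happens without interpolation of the half-hour slots, only half of the required lags_opt values would be obtainable, which would match the observed error.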

Workaround

Running with load_forecast_method: naive works correctly and produces valid optimisation results.
