
Activity Regularizer Not Normalized by Batch Size #22018

@fwesel

Description


The documentation in keras/src/regularizers/regularizers.py lines 29-53 states:

"The value returned by the activity_regularizer is divided by the input batch size"

This does not match the actual behavior. The docstring example claims that ops.sum(layer.losses) equals 5.25, but it actually returns 25.25.

Minimal reproduction, following the docstring example in keras/src/regularizers/regularizers.py:

import os
os.environ["KERAS_BACKEND"] = "jax"
from keras.layers import Dense
from keras.regularizers import L1, L2
from keras import ops

layer = Dense(5, input_dim=5, kernel_initializer='ones',
              kernel_regularizer=L1(0.01), activity_regularizer=L2(0.01))
tensor = ops.ones(shape=(5, 5)) * 2.0
out = layer(tensor)

print(f"Total: {float(ops.sum(layer.losses))}")
# Expected: 5.25 (with normalization)
# Actual: 25.25 (without normalization)

The activity regularizer returns 25.0 instead of 5.0 (not divided by batch_size=5). This means activity regularization strength scales with batch size, making hyperparameter tuning inconsistent across different batch sizes.
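For completeness, the two loss terms break down as follows. This is plain-Python arithmetic mirroring the shapes and values in the snippet above, so it needs no backend:

```python
# Breakdown of the two loss terms in the reproduction
# (5x5 weight matrix of ones, input of all 2.0s, shape (5, 5)).
units = 5
batch_size = 5

# Kernel L1: 25 weights, each |1.0|, scaled by 0.01 -> 0.25
kernel_l1 = 0.01 * units * units * abs(1.0)

# Each output entry sums 2.0 over 5 input features -> 10.0
out_entry = 2.0 * units

# Activity L2: 25 output entries, each 10.0**2, scaled by 0.01 -> 25.0
activity_l2 = 0.01 * batch_size * units * out_entry**2

print(kernel_l1 + activity_l2)               # actual total (no normalization)
print(kernel_l1 + activity_l2 / batch_size)  # total the docs describe
```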

Environment: Keras 3.x, JAX backend
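Until the code and docs agree, one possible workaround is an activity regularizer that performs the batch-size division itself. The sketch below uses NumPy in place of keras.ops so it runs standalone; in a real model the inner callable could be passed as activity_regularizer, since Keras accepts any callable that maps the activation tensor to a scalar penalty (the function name normalized_l2 is my own, not a Keras API):

```python
import numpy as np

def normalized_l2(rate):
    """L2 activity penalty divided by the batch size (axis 0)."""
    def reg(activations):
        a = np.asarray(activations)
        # Sum of squares over the whole batch, then normalize by batch size.
        return rate * float(np.square(a).sum()) / a.shape[0]
    return reg

# Output of the layer in the reproduction: a (5, 5) array of 10.0s.
out = np.full((5, 5), 10.0)
penalty = normalized_l2(0.01)(out)
print(penalty)  # matches the documented 5.0 for this example
```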
