The documentation in `keras/src/regularizers/regularizers.py` (lines 29-53) states:

> "The value returned by the activity_regularizer is divided by the input batch size"

This is not true. The example in the docs claims `ops.sum(layer.losses)` equals `5.25`, but it actually returns `25.25`.
Minimal reproduction, following the example in the docs in `keras/src/regularizers/regularizers.py`:

```python
import os
os.environ["KERAS_BACKEND"] = "jax"

from keras.layers import Dense
from keras.regularizers import L1, L2
from keras import ops

layer = Dense(
    5,
    input_dim=5,
    kernel_initializer="ones",
    kernel_regularizer=L1(0.01),
    activity_regularizer=L2(0.01),
)
tensor = ops.ones(shape=(5, 5)) * 2.0
out = layer(tensor)
print(f"Total: {float(ops.sum(layer.losses))}")
# Expected: 5.25  (with batch-size normalization)
# Actual:   25.25 (without normalization)
```
The activity regularizer contributes `25.0` instead of `5.0` (it is not divided by `batch_size=5`). As a result, the effective activity regularization strength scales linearly with batch size, making hyperparameter tuning inconsistent across different batch sizes.
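For reference, both totals can be verified by hand with plain arithmetic (assuming the setup above: a 5-unit Dense layer with an all-ones 5x5 kernel and an input of all 2s):

```python
batch_size, units = 5, 5

# Each output element is a dot product of 5 inputs (2.0) with 5 weights (1.0).
output_value = 2.0 * units  # 10.0

# Kernel L1: 0.01 * sum(|w|) over the 5x5 all-ones kernel.
kernel_l1 = 0.01 * units * units  # 0.25

# Activity L2: 0.01 * sum(y**2) over the 5x5 output of 10s.
activity_l2 = 0.01 * batch_size * units * output_value**2  # 25.0

print(kernel_l1 + activity_l2)                  # 25.25 -- what Keras returns
print(kernel_l1 + activity_l2 / batch_size)     # 5.25  -- what the docs claim
```

The two print statements reproduce the "actual" and "expected" totals from the repro above, so the entire 20.0 discrepancy is the missing division of the activity term by the batch size.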
Environment: Keras 3.x, JAX backend