whatr does the reward mode score mean?

The score of the reward model is obtained directly from the final linear layer without undergoing normalization, which feels somewhat peculiar. What is the value range of the reward model's score, and what does it represent?