Since utterances serve as training data for the embedding model, how many utterances should I provide? How much does the number of utterances affect accuracy?