README.md (10 changes: 5 additions & 5 deletions)
@@ -62,13 +62,13 @@ Details about the other types of OLMo checkpoints (including OLMo HF Transformer
 
 ## Inference
 
-You can utilize our Hugging Face integration to run inference on the olmo checkpoints:
+You can utilize our Hugging Face integration to run inference on the OLMo Transformers checkpoints:
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
-olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B-hf")
-tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B-hf")
+olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1.7-7B-hf")
+tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1.7-7B-hf")
 
 message = ["Language modeling is "]
 inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
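The hunk above stops at the tokenizer call; the generation and decoding steps that follow in the README are collapsed context here. A minimal sketch of how the example presumably continues, assuming standard `transformers` sampling arguments:

```python
# Continuation of the snippet above (assumed, not shown in this hunk):
# generate a completion from the tokenized prompt and decode it to text.
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```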
@@ -80,7 +80,7 @@ Alternatively, with the Hugging Face pipeline abstraction:
 
 ```python
 from transformers import pipeline
-olmo_pipe = pipeline("text-generation", model="allenai/OLMo-7B-hf")
+olmo_pipe = pipeline("text-generation", model="allenai/OLMo-1.7-7B-hf")
 print(olmo_pipe("Language modeling is"))
 ```
 
@@ -95,7 +95,7 @@ python scripts/convert_olmo_to_hf_new.py --input_dir /path/to/olmo/checkpoint --
 ### Quantization
 
 ```python
-olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B-hf", torch_dtype=torch.float16, load_in_8bit=True) # requires bitsandbytes
+olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1.7-7B-hf", torch_dtype=torch.float16, load_in_8bit=True) # requires bitsandbytes
 ```
 
 The quantized model is more sensitive to input data types and CUDA handling; to avoid potential issues, pass the inputs as inputs.input_ids.to('cuda').
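Putting that together, a short sketch of 8-bit inference with only the token ids moved to CUDA as recommended; the checkpoint name and generation settings are carried over from the examples above, and a CUDA device plus `bitsandbytes` are assumed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in 8-bit; bitsandbytes places the quantized weights on the GPU.
olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-1.7-7B-hf", torch_dtype=torch.float16, load_in_8bit=True
)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1.7-7B-hf")

inputs = tokenizer(["Language modeling is "], return_tensors='pt', return_token_type_ids=False)
# Move only the token ids to CUDA, rather than the whole BatchEncoding,
# to sidestep the dtype/device issues noted above.
response = olmo.generate(inputs.input_ids.to('cuda'), max_new_tokens=100)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```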