Description
Your current environment
My env is fine, so I did not put anything here.
How would you like to use vllm
I want to run inference with meta-llama/Meta-Llama-3.1-8B-Instruct (Llama 3.1, which has a 128k context length). I use the following code for chat:
```python
from vllm import LLM, SamplingParams

sampling_params = SamplingParams(temperature=0.8,
                                 top_p=0.95,
                                 max_tokens=512)
# Create an LLM.
llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct",
          quantization="fp8",
          task="generate",
          tensor_parallel_size=1,
          enforce_eager=True,
          enable_expert_parallel=False)
# rank_prompts holds the chat messages to run.
outputs = llm.chat(rank_prompts, sampling_params, use_tqdm=True)
```
Sometimes my prompt is too long, which causes an error. Is it possible to configure vLLM to keep only the first k tokens of the prompt?
As far as I can tell, the current vLLM options for limiting the number of input tokens keep only the last k tokens, not the first k. I found the following settings:
1. The `--max-model-len` parameter in the vLLM engine:
Model context length. If unspecified, will be automatically derived from the model config. Supports k/m/g/K/M/G in human-readable format. Examples: `1k` → 1000, `1K` → 1024.
I'm not entirely sure how this method performs truncation.
2. The `truncate_prompt_tokens` parameter in Sampling Parameters:
If set to an integer k, will use only the last k tokens from the prompt (i.e., left truncation). Defaults to None (i.e., no truncation).
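To illustrate the documented behavior, here is a toy sketch of left truncation, with integer token ids standing in for a real tokenized prompt (this is not vLLM's actual implementation, just the slicing it describes):

```python
def left_truncate(token_ids, k):
    """Mimic truncate_prompt_tokens=k: keep only the LAST k tokens."""
    if k is None or len(token_ids) <= k:
        return token_ids
    return token_ids[-k:]

# A 10-token "prompt": left truncation drops the beginning,
# which is exactly where my task description lives.
prompt_ids = list(range(10))
print(left_truncate(prompt_ids, 4))  # → [6, 7, 8, 9]
```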
Neither of these options seems to keep the first k tokens. I understand that left truncation (keeping the last k tokens) is the more common convention for LLMs. However, since my task description is at the beginning of the prompt, I would like to know whether there is a setting for right truncation, i.e., keeping the first k tokens. Thank you very much!
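In the meantime, a possible workaround is to right-truncate the prompt manually before calling `llm.chat()`. The sketch below uses a toy whitespace tokenizer as a stand-in for the model's real tokenizer; in practice one would encode and decode with the Llama 3.1 tokenizer (e.g. loaded via `transformers.AutoTokenizer`) so that k counts real model tokens:

```python
def right_truncate(text, k, tokenize=str.split, detokenize=" ".join):
    """Keep only the FIRST k tokens of the prompt (right truncation).

    tokenize/detokenize default to a whitespace toy; swap in the real
    tokenizer's encode/decode for actual use.
    """
    tokens = tokenize(text)
    if len(tokens) <= k:
        return text
    return detokenize(tokens[:k])

prompt = "Summarize the following report in detail please now"
print(right_truncate(prompt, 3))  # → "Summarize the following"
```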
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.