@@ -56,12 +56,23 @@ before_run:
   - cargo build
   - npm install
 
-# Required: Command to execute for each test case
+# Required: Command(s) to execute for each test case
+# Single command
+run: ../../target/debug/forge -p '{{prompt}}'
+
+# Or multiple commands (executed sequentially)
 run:
-  command: ../../target/debug/forge -p '{{prompt}}'
-  parallelism: 10  # Number of tasks to run in parallel (default: 1)
-  timeout: 60  # Timeout in seconds (optional)
-  cwd: /path/to/working/dir  # Working directory (optional)
+  - echo "Step 1: {{task}}"
+  - ../../target/debug/forge -p '{{prompt}}'
+  - echo "Step 2: Complete"
+
+# Execution configuration
+parallelism: 10  # Number of tasks to run in parallel (default: 1)
+timeout: 60  # Timeout in seconds (optional)
+early_exit: true  # Stop execution when validations pass (optional)
+
+# Optional: Working directory for command execution
+cwd: /path/to/working/dir  # Defaults to parent directory of eval
 
 # Optional: Validations to run on output
 validations:
@@ -77,14 +88,26 @@ sources:
 #### Task File Schema
 
 **`before_run`** (optional): Array of shell commands to execute before running tasks
-- Runs sequentially in the parent directory of the eval
+- Runs sequentially before the main command execution
+- Uses the same working directory as specified in `cwd` (defaults to parent directory of eval)
 - Useful for building binaries or setting up dependencies
 
-**`run`** (required): Configuration for task execution
-- `command`: Command template with placeholders (e.g., `{{variable}}`)
-- `parallelism`: Number of tasks to run concurrently (default: 1)
-- `timeout`: Maximum execution time in seconds per task (optional)
-- `cwd`: Working directory for command execution (optional, defaults to parent directory of eval)
+**`run`** (required): Command(s) to execute for each test case
+- Can be a single string or an array of strings
+- Commands support template placeholders (e.g., `{{variable}}`)
+- Multiple commands are executed sequentially
+- If any command fails, subsequent commands are skipped
+
+**`parallelism`** (optional): Number of tasks to run concurrently (default: 1)
+
+**`timeout`** (optional): Maximum execution time in seconds per task
+
+**`early_exit`** (optional): Stop command execution when all validations pass
+
+**`cwd`** (optional): Working directory for command execution
+- Defaults to parent directory of eval
+- Applies to both `before_run` commands and the main `run` command
+- All commands within the task will run in this directory
 
 **`validations`** (optional): Array of validation rules
 - `name`: Human-readable description
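
The `run` semantics in the schema hunk above (a string or an array of strings, `{{variable}}` placeholders, sequential execution, later commands skipped after a failure) can be illustrated with a minimal Python sketch. This is not the eval runner's actual implementation: the helper names `normalize_run`, `render`, and `run_task` are hypothetical, "fails" is assumed to mean a non-zero exit code, and `timeout` and `early_exit` are left out for brevity.

```python
import subprocess
from typing import List, Mapping, Optional, Sequence, Union

RunSpec = Union[str, Sequence[str]]


def normalize_run(run: RunSpec) -> List[str]:
    """Accept either a single command string or an array of command strings."""
    if isinstance(run, str):
        return [run]
    return list(run)


def render(template: str, row: Mapping[str, str]) -> str:
    """Replace each {{key}} placeholder with the value from one source row."""
    for key, value in row.items():
        template = template.replace("{{" + key + "}}", value)
    return template


def run_task(run: RunSpec, row: Mapping[str, str], cwd: Optional[str] = None) -> List[int]:
    """Run the commands in order and stop after the first failure.

    Assumption: a command "fails" when its exit code is non-zero.
    Returns the exit codes of the commands that actually ran.
    """
    exit_codes: List[int] = []
    for template in normalize_run(run):
        result = subprocess.run(
            render(template, row),
            shell=True,           # commands are shell strings, as in task.yml
            cwd=cwd,              # None means the current working directory
            capture_output=True,
            text=True,
        )
        exit_codes.append(result.returncode)
        if result.returncode != 0:
            break  # subsequent commands are skipped
    return exit_codes
```

For instance, `run_task(["exit 0", "exit 3", "exit 0"], {})` returns `[0, 3]`: the third command never runs because the second one exits with a non-zero code.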
@@ -181,8 +204,7 @@ LOG_LEVEL=debug npm run eval ./evals/my_eval/task.yml
 ### Example 1: Simple Sequential Execution
 
 ```yaml
-run:
-  command: echo "Processing {{name}}"
+run: echo "Processing {{name}}"
 sources:
   - csv: names.csv
 ```
@@ -197,20 +219,31 @@ Charlie
 ### Example 2: Parallel Execution with Timeout
 
 ```yaml
-run:
-  command: ./slow_task --id {{task_id}}
-  parallelism: 5
-  timeout: 30
+run: ./slow_task --id {{task_id}}
+parallelism: 5
+timeout: 30
 sources:
   - csv: tasks.csv
 ```
 
-### Example 3: Shell Command Validation
+### Example 3: Multiple Commands
 
 ```yaml
 run:
-  command: echo "{{message}}"
-  parallelism: 3
+  - echo "Starting task {{id}}"
+  - ./process --input {{file}}
+  - echo "Task {{id}} complete"
+parallelism: 3
+timeout: 120
+sources:
+  - csv: tasks.csv
+```
+
+### Example 4: Shell Command Validation
+
+```yaml
+run: echo "{{message}}"
+parallelism: 3
 validations:
   # Using grep to check if output contains specific text
   - name: "Contains 'test' word"
@@ -232,11 +265,10 @@ sources:
   - csv: messages.csv
 ```
 
-### Example 4: Regex Validation
+### Example 5: Regex Validation
 
 ```yaml
-run:
-  command: cargo test {{test_name}}
+run: cargo test {{test_name}}
 validations:
   - name: "All tests passed"
     type: regex
@@ -260,14 +292,12 @@ sources:
 
 3. **Start with low parallelism**: Test with `parallelism: 1` first, then increase:
    ```yaml
-   run:
-     parallelism: 1  # Start here
+   parallelism: 1  # Start here
    ```
 
 4. **Set appropriate timeouts**: Add timeouts to prevent hanging:
    ```yaml
-   run:
-     timeout: 60  # seconds
+   timeout: 60  # seconds
    ```
 
 5. **Check debug logs**: When tasks fail, check the debug directory for full output: