Commit 2ae0d2f

Authored by: laststylebender14, autofix-ci[bot], tusharmath, amitksingh1490, forge-code-agent
refactor: allow multiple queries in sem search tool (#2001)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Tushar Mathur <tusharmath@gmail.com>
Co-authored-by: Amit Singh <amitksingh1490@gmail.com>
Co-authored-by: ForgeCode <noreply@forgecode.dev>
1 parent 32cafd5 commit 2ae0d2f

26 files changed

Lines changed: 487 additions & 308 deletions

Cargo.lock

Lines changed: 10 additions & 9 deletions

benchmarks/README.md

Lines changed: 57 additions & 27 deletions
@@ -56,12 +56,23 @@ before_run:
   - cargo build
   - npm install

-# Required: Command to execute for each test case
+# Required: Command(s) to execute for each test case
+# Single command
+run: ../../target/debug/forge -p '{{prompt}}'
+
+# Or multiple commands (executed sequentially)
 run:
-  command: ../../target/debug/forge -p '{{prompt}}'
-  parallelism: 10 # Number of tasks to run in parallel (default: 1)
-  timeout: 60 # Timeout in seconds (optional)
-  cwd: /path/to/working/dir # Working directory (optional)
+  - echo "Step 1: {{task}}"
+  - ../../target/debug/forge -p '{{prompt}}'
+  - echo "Step 2: Complete"
+
+# Execution configuration
+parallelism: 10 # Number of tasks to run in parallel (default: 1)
+timeout: 60 # Timeout in seconds (optional)
+early_exit: true # Stop execution when validations pass (optional)
+
+# Optional: Working directory for command execution
+cwd: /path/to/working/dir # Defaults to parent directory of eval

 # Optional: Validations to run on output
 validations:
@@ -77,14 +88,26 @@ sources:
 #### Task File Schema

 **`before_run`** (optional): Array of shell commands to execute before running tasks
-- Runs sequentially in the parent directory of the eval
+- Runs sequentially before the main command execution
+- Uses the same working directory as specified in `cwd` (defaults to parent directory of eval)
 - Useful for building binaries or setting up dependencies

-**`run`** (required): Configuration for task execution
-- `command`: Command template with placeholders (e.g., `{{variable}}`)
-- `parallelism`: Number of tasks to run concurrently (default: 1)
-- `timeout`: Maximum execution time in seconds per task (optional)
-- `cwd`: Working directory for command execution (optional, defaults to parent directory of eval)
+**`run`** (required): Command(s) to execute for each test case
+- Can be a single string or an array of strings
+- Commands support template placeholders (e.g., `{{variable}}`)
+- Multiple commands are executed sequentially
+- If any command fails, subsequent commands are skipped
+
+**`parallelism`** (optional): Number of tasks to run concurrently (default: 1)
+
+**`timeout`** (optional): Maximum execution time in seconds per task
+
+**`early_exit`** (optional): Stop command execution when all validations pass
+
+**`cwd`** (optional): Working directory for command execution
+- Defaults to parent directory of eval
+- Applies to both `before_run` commands and the main `run` command
+- All commands within the task will run in this directory

 **`validations`** (optional): Array of validation rules
 - `name`: Human-readable description
@@ -181,8 +204,7 @@ LOG_LEVEL=debug npm run eval ./evals/my_eval/task.yml
 ### Example 1: Simple Sequential Execution

 ```yaml
-run:
-  command: echo "Processing {{name}}"
+run: echo "Processing {{name}}"
 sources:
   - csv: names.csv
 ```
@@ -197,20 +219,31 @@ Charlie
 ### Example 2: Parallel Execution with Timeout

 ```yaml
-run:
-  command: ./slow_task --id {{task_id}}
-  parallelism: 5
-  timeout: 30
+run: ./slow_task --id {{task_id}}
+parallelism: 5
+timeout: 30
 sources:
   - csv: tasks.csv
 ```

-### Example 3: Shell Command Validation
+### Example 3: Multiple Commands

 ```yaml
 run:
-  command: echo "{{message}}"
-  parallelism: 3
+  - echo "Starting task {{id}}"
+  - ./process --input {{file}}
+  - echo "Task {{id}} complete"
+parallelism: 3
+timeout: 120
+sources:
+  - csv: tasks.csv
+```
+
+### Example 4: Shell Command Validation
+
+```yaml
+run: echo "{{message}}"
+parallelism: 3
 validations:
   # Using grep to check if output contains specific text
   - name: "Contains 'test' word"
@@ -232,11 +265,10 @@ sources:
   - csv: messages.csv
 ```

-### Example 4: Regex Validation
+### Example 5: Regex Validation

 ```yaml
-run:
-  command: cargo test {{test_name}}
+run: cargo test {{test_name}}
 validations:
   - name: "All tests passed"
     type: regex
@@ -260,14 +292,12 @@ sources:

 3. **Start with low parallelism**: Test with `parallelism: 1` first, then increase:
    ```yaml
-   run:
-     parallelism: 1 # Start here
+   parallelism: 1 # Start here
    ```

 4. **Set appropriate timeouts**: Add timeouts to prevent hanging:
    ```yaml
-   run:
-     timeout: 60 # seconds
+   timeout: 60 # seconds
    ```

 5. **Check debug logs**: When tasks fail, check the debug directory for full output:
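
Reading note on the new `run` semantics: the README change above says `run` may be a single string or an array of strings, that commands support `{{variable}}` placeholders, that multiple commands run sequentially, and that later commands are skipped once one fails. The sketch below is one possible way to model that behavior, not the harness's actual implementation. The names `RunConfig`, `Exec`, `normalizeRun`, `renderTemplate`, and `runSequentially` are hypothetical, the executor is injected rather than spawning real processes, and leaving unknown placeholders untouched is an assumption.

```typescript
// Hypothetical types and helpers for illustration; not taken from the commit.
type RunConfig = string | string[];

interface StepResult {
  command: string;
  exitCode: number;
}

// Injected executor so the sketch stays free of process-spawning details.
type Exec = (command: string) => number;

/** Accept either a single command string or an array of commands. */
function normalizeRun(run: RunConfig): string[] {
  return typeof run === "string" ? [run] : [...run];
}

/** Substitute {{name}} placeholders; unknown names are left as-is (assumption). */
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) => {
    const known = Object.prototype.hasOwnProperty.call(vars, key);
    const value = known ? vars[key] : undefined;
    return value !== undefined ? value : match;
  });
}

/** Run commands in order and stop after the first non-zero exit code. */
function runSequentially(
  run: RunConfig,
  vars: Record<string, string>,
  exec: Exec
): StepResult[] {
  const results: StepResult[] = [];
  for (const template of normalizeRun(run)) {
    const command = renderTemplate(template, vars);
    const exitCode = exec(command);
    results.push({ command, exitCode });
    if (exitCode !== 0) break; // later commands are skipped
  }
  return results;
}
```

Given the three commands from Example 3 and an executor that reports failure for `./process`, `runSequentially` returns two results and the final `echo` is never passed to the executor.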
