
Conversation

@mohammedahmed18 (Contributor) commented Dec 10, 2025

PR Type

Enhancement


Description

  • Rank refinements by runtime and diff

  • Add normalization and weighting utilities

  • Change refinement request payload to ints

  • Cap refinements to the top 45% of candidates


Diagram Walkthrough

flowchart LR
  A["Valid optimizations"] -- "compute diff, runtime" --> B["Normalize metrics"]
  B["Normalize metrics"] -- "apply weights (runtime,diff)" --> C["Score candidates"]
  C["Score candidates"] -- "select top 45% or <=2 all" --> D["Submit refinements"]
  D["Submit refinements"] -- "parallel futures" --> E["Queue refined candidates"]
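
A minimal, self-contained sketch of this flow, assuming min-max normalization and the 45% / two-candidate thresholds described above; the function name, default weights, and parameter names here are illustrative, not the PR's actual identifiers:

# Illustrative sketch only; helper names, default weights, and thresholds are assumptions.
def rank_refinement_candidates(
    runtimes: list[float],
    diff_lens: list[int],
    runtime_weight: float = 0.5,
    diff_weight: float = 0.5,
    top_ratio: float = 0.45,
    refine_all_threshold: int = 2,
) -> list[int]:
    """Return indices of candidates to refine, best (lowest combined score) first."""
    n = len(runtimes)
    if n <= refine_all_threshold:
        return list(range(n))  # small pools: refine everything

    def minmax(values: list[float]) -> list[float]:
        lo, hi = min(values), max(values)
        span = hi - lo
        return [(v - lo) / span if span else 0.0 for v in values]

    runtime_norm = minmax(runtimes)
    diff_norm = minmax([float(d) for d in diff_lens])
    # Lower is better for both metrics, so a lower weighted sum means a better candidate.
    scores = [runtime_weight * r + diff_weight * d for r, d in zip(runtime_norm, diff_norm)]
    top_n = max(1, int(top_ratio * n + 0.5))
    return sorted(range(n), key=lambda i: scores[i])[:top_n]

For example, rank_refinement_candidates([12.0, 9.5, 20.1], [40, 120, 15]) returns [0] under these defaults: the first candidate has neither the best runtime nor the smallest diff, but the best combined score.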

File Walkthrough

Relevant files
Enhancement
aiservice.py
Humanize runtime fields; trim logs                                             

codeflash/api/aiservice.py

  • Send runtimes as humanized strings
  • Remove extra debug/console logging
+2/-6     
code_utils.py
Utilities for weighting and normalization                               

codeflash/code_utils/code_utils.py

  • Add choose_weights helper
  • Add normalize utility (a sketch of these helpers follows this walkthrough)
  • Add weighted score dictionary builder
+57/-0   
models.py
Refiner request runtime type to int                                           

codeflash/models/models.py

  • Change refiner request runtimes to int
+2/-2     
function_optimizer.py
Weighted, selective, parallel refinement flow                       

codeflash/optimization/function_optimizer.py

  • Queue-based refinement selection by weighted score
  • Build refiner requests locally with int runtimes
  • Submit refinements selectively and in parallel
  • Inject AIS client/executor into processor
+66/-50 
Configuration changes
config_consts.py
Config for weighted refinement capping                                     

codeflash/code_utils/config_consts.py

  • Add refinement thresholds and weights
  • Introduce top-N refinement ratio
+5/-0     
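
For orientation, a hedged sketch of what the new config_consts entries and code_utils helpers listed above might look like; the names match those used in the review excerpts below, but the bodies and the weight/threshold values are assumptions, not the PR's actual code:

# Not the PR's actual implementation; a plausible shape inferred from how these names are used below.

# config_consts.py (values assumed; the description only states a 45% cap)
REFINED_CANDIDATE_RANKING_WEIGHTS = (0.5, 0.5)  # (runtime weight, diff weight)
TOP_N_REFINEMENTS = 0.45                        # keep the top 45% of scored candidates
REFINE_ALL_THRESHOLD = 2                        # refine every candidate at or below this count

# code_utils.py
def choose_weights(**named_weights: float) -> list[float]:
    """Return weights in call order so they line up with the metric lists passed alongside them."""
    return list(named_weights.values())

def normalize(values: list[float]) -> list[float]:
    """Min-max normalize to [0, 1]; a constant list collapses to all zeros (see the edge case noted below)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]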

@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

Variable name typo: 'top_indecies' appears to be a misspelling of 'top_indices'. While functional, this reduces readability and may cause confusion; consider renaming.

top_indecies = sorted(score_dict, key=score_dict.get)[:top_n_candidates]

for idx in top_indecies:
    data = self.all_refinements_data[idx]
Type Mismatch

Function 'create_score_dictionary_from_metrics' declares return type dict[int, int] but builds a dict of floats; adjust the return type to dict[int, float] or cast appropriately.

def create_score_dictionary_from_metrics(weights: list[float], *metrics: list[float]) -> dict[int, int]:
    """Combine multiple metrics into a single weighted score dictionary.

    Each metric is a list of values (smaller = better).
    The total score for each index is the weighted sum of its values
    across all metrics:

        score[index] = Σ (value * weight)

    Args:
        weights: A list of weights, one per metric. Larger weight = more influence.
        *metrics: Lists of values (one list per metric, aligned by index).

    Returns:
        A dictionary mapping each index to its combined weighted score.

    """
    if len(weights) != len(metrics):
        raise ValueError("Number of weights must match number of metrics")

    combined: dict[int, float] = {}

    for weight, metric in zip(weights, metrics):
        for idx, value in enumerate(metric):
            combined[idx] = combined.get(idx, 0) + value * weight

    return combined
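
For illustration (hypothetical values), the returned scores are floats even though the annotation says int, which is why dict[int, float] is the accurate return type:

weights = [0.5, 0.5]
runtime_norm = [0.0, 1.0, 0.25]
diff_norm = [1.0, 0.0, 0.5]
scores = create_score_dictionary_from_metrics(weights, runtime_norm, diff_norm)
# scores == {0: 0.5, 1: 0.5, 2: 0.375}  <- float values, not ints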
Edge Case Handling

'normalize' returns all zeros when values are constant; the ranking then ties and slicing the top N may be arbitrary. Consider deterministic tie-breaking, and ensure at least one candidate is selected when TOP_N_REFINEMENTS rounds to 0 for small lists whose length still exceeds REFINE_ALL_THRESHOLD.

runtime_w, diff_w = REFINED_CANDIDATE_RANKING_WEIGHTS
weights = choose_weights(runtime=runtime_w, diff=diff_w)

runtime_norm = normalize(runtimes_list)
diffs_norm = normalize(diff_lens_list)
# the lower the better
score_dict = create_score_dictionary_from_metrics(weights, runtime_norm, diffs_norm)
top_n_candidates = int((TOP_N_REFINEMENTS * len(runtimes_list)) + 0.5)
top_indecies = sorted(score_dict, key=score_dict.get)[:top_n_candidates]

for idx in top_indecies:
    data = self.all_refinements_data[idx]
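
One way to address both points, as a sketch against the same variables (not a verified patch): break ties on the candidate index so the slice is deterministic, and clamp the count to at least one.

top_n_candidates = max(1, int((TOP_N_REFINEMENTS * len(runtimes_list)) + 0.5))
# Sort by (score, index) so equal scores resolve to the earliest candidate deterministically.
top_indices = sorted(score_dict, key=lambda i: (score_dict[i], i))[:top_n_candidates]

for idx in top_indices:
    data = self.all_refinements_data[idx]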

@github-actions

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Fix score dict typing and validation

The return type is declared as dict[int, int] but the values are floats from
weighted sums, which can cause type misuse. Also validate that all metric lists have
equal length to prevent silent index skew. Update the annotation to dict[int, float]
and add a uniform-length check.

codeflash/code_utils/code_utils.py [72-98]

-def create_score_dictionary_from_metrics(weights: list[float], *metrics: list[float]) -> dict[int, int]:
-    """Combine multiple metrics into a single weighted score dictionary.
-
-    Each metric is a list of values (smaller = better).
-    The total score for each index is the weighted sum of its values
-    across all metrics:
-
-        score[index] = Σ (value * weight)
-
-    Args:
-        weights: A list of weights, one per metric. Larger weight = more influence.
-        *metrics: Lists of values (one list per metric, aligned by index).
-
-    Returns:
-        A dictionary mapping each index to its combined weighted score.
-
-    """
+def create_score_dictionary_from_metrics(weights: list[float], *metrics: list[float]) -> dict[int, float]:
+    """Combine multiple metrics into a single weighted score dictionary."""
     if len(weights) != len(metrics):
         raise ValueError("Number of weights must match number of metrics")
+    lengths = {len(m) for m in metrics}
+    if len(lengths) > 1:
+        raise ValueError("All metric lists must have the same length")
 
     combined: dict[int, float] = {}
-
     for weight, metric in zip(weights, metrics):
         for idx, value in enumerate(metric):
-            combined[idx] = combined.get(idx, 0) + value * weight
-
+            combined[idx] = combined.get(idx, 0.0) + value * weight
     return combined
Suggestion importance[1-10]: 7


Why: Correctly identifies a typing mismatch (dict[int, int] vs float values) and adds a sensible length validation; both improve correctness and maintainability without altering logic.

Medium
Ensure at least one refinement

When TOP_N_REFINEMENTS yields 0 (e.g., small lists or low fraction), no candidates
are refined, stalling refinement. Ensure at least one candidate is selected when
there are items, and correct the typo in top_indecies to avoid confusion. Clamp the
count to [1, len(...)].

codeflash/optimization/function_optimizer.py [205-211]

 top_n_candidates = int((TOP_N_REFINEMENTS * len(runtimes_list)) + 0.5)
-top_indecies = sorted(score_dict, key=score_dict.get)[:top_n_candidates]
+if len(runtimes_list) > 0:
+    top_n_candidates = max(1, min(top_n_candidates, len(runtimes_list)))
+top_indices = sorted(score_dict, key=score_dict.get)[:top_n_candidates]
 
-for idx in top_indecies:
+for idx in top_indices:
     data = self.all_refinements_data[idx]
     future_refinements.append(self.refine_optimizations([data]))
Suggestion importance[1-10]: 6


Why: Clamping the number of selected candidates avoids a potential no-op when rounding yields 0 and fixes a typo; useful but not critical since a low TOP_N_REFINEMENTS might be intentional.

Low
General
Guard empty metric lists

Lower normalized values are better, since higher runtime/diff indicate worse candidates, so the scoring line is consistent with that. However, the normalization semantics should be documented explicitly and guarded against negative values or empty lists. Add an early return when the lists are empty to avoid downstream errors.

codeflash/optimization/function_optimizer.py [201-206]

-runtime_norm = normalize(runtimes_list)
-diffs_norm = normalize(diff_lens_list)
-# the lower the better
+if not runtimes_list or not diff_lens_list:
+    self.refinement_done = True
+    return self.get_next_candidate()
+
+runtime_norm = normalize(runtimes_list)  # lower is better
+diffs_norm = normalize(diff_lens_list)   # lower is better
 score_dict = create_score_dictionary_from_metrics(weights, runtime_norm, diffs_norm)
Suggestion importance[1-10]: 5


Why: Adding an explicit empty-list guard improves robustness and aligns with later use; moderate impact as upstream logic likely ensures non-empty inputs, but the check is harmless and clarifies intent.

Low

codeflash-ai bot (Contributor) commented Dec 10, 2025

⚡️ Codeflash found optimizations for this PR

📄 115% (1.15x) speedup for AiServiceClient.optimize_python_code_refinement in codeflash/api/aiservice.py

⏱️ Runtime : 32.8 milliseconds → 15.3 milliseconds (best of 65 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch limit-refined-candidates).


@KRRT7 (Contributor) left a comment

I really like this; just implement the changes that the PR review bot gave you.

@KRRT7 (Contributor) commented Dec 12, 2025

I suspect this will help with the long runtimes of our tracer-replay as well, now that I think of it.

