Fix/cli init providers and argo url #199

Open: hshang315 wants to merge 333 commits into main from fix/cli-init-providers-and-argo-url

@hshang315
Collaborator

Pull Request: fix(cli): load providers from ProviderRegistry and restore code generators

Branch: fix/cli-init-providers-and-argo-url -> next
Repository: als-apg/osprey
PR Link: https://github.com/als-apg/osprey/pull/new/fix/cli-init-providers-and-argo-url


Summary

  • Fix the interactive osprey init wizard, which failed at the provider and code generator selection steps
  • get_provider_metadata() now reads from ProviderRegistry (the actual source of truth) instead of the empty config.providers list
  • get_code_generator_metadata() restored with basic and claude_code entries
  • Updated Argo provider default_base_url to https://apps.inside.anl.gov/argoapi/v1

Problem

Running osprey (interactive project creation) fails with two errors:

! No providers could be loaded from osprey registry
✗ No code generators available
! Osprey could not load any code generators.
Check that osprey is properly installed: uv sync --all-extras

The init wizard aborts at Step 5 (Code Generator), preventing any project from being created.

Root Cause

1. Providers not loading

The next branch moved provider definitions from the registry config to a standalone ProviderRegistry in osprey.models.provider_registry. The CLI function get_provider_metadata() in interactive_menu.py still read from the old location (config.providers), which is now an empty list [].

2. Code generators not loading

The function get_code_generator_metadata() in interactive_menu.py had its implementation replaced with a hardcoded empty dict:

# Code generators were removed from the registry
generators = {}

However, the scaffolding code in scaffolding.py still expects "basic" and "claude_code" as valid generator values.

3. Argo base URL outdated

The Argo provider's default base URL (https://argo-bridge.cels.anl.gov) is no longer the active endpoint.

Changes

src/osprey/cli/interactive_menu.py

get_provider_metadata() (~line 326):

Before:

from osprey.registry.builtins import FrameworkRegistryProvider
framework_registry = FrameworkRegistryProvider()
config = framework_registry.get_registry_config()
for provider_reg in config.providers:  # empty list
    module = importlib.import_module(provider_reg.module_path)
    provider_class = getattr(module, provider_reg.class_name)
    ...

After:

from osprey.models.provider_registry import get_provider_registry
pr = get_provider_registry()
for provider_name in pr.list_providers():
    provider_class = pr.get_provider(provider_name)
    ...

get_code_generator_metadata() (~line 423):

Replaced generators = {} with entries for the two generators the scaffolding code expects:

  • basic -- always available (built-in single-pass LLM generator)
  • claude_code -- available when claude-agent-sdk is installed (checked via importlib)
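
A minimal sketch of the availability check described above, assuming the claude-agent-sdk package imports as claude_agent_sdk (an assumption, not confirmed by this PR). The helper names and the "description" field are hypothetical; only the "available" key appears in the verification script.

```python
import importlib.util


def module_available(module_name: str) -> bool:
    """Return True if the module can be located, without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises for a missing parent package or a broken __spec__
        return False


def build_generator_metadata(sdk_module: str = "claude_agent_sdk") -> dict:
    """Return metadata for the two generators the scaffolding code expects."""
    return {
        "basic": {
            "description": "Built-in single-pass LLM generator",
            "available": True,  # no optional dependency needed
        },
        "claude_code": {
            "description": "Generator backed by the Claude agent SDK",
            "available": module_available(sdk_module),
        },
    }


for name, meta in sorted(build_generator_metadata().items()):
    print(f"{name}: available={meta['available']}")
```

Using find_spec instead of a bare import keeps the check cheap and avoids side effects from importing the SDK just to render a menu.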

src/osprey/models/providers/argo.py

Updated the base URL in two places:

| Location | Old | New |
| --- | --- | --- |
| Line 73 (fallback in _execute_argo_structured_output()) | https://argo-bridge.cels.anl.gov | https://apps.inside.anl.gov/argoapi/v1 |
| Line 145 (default_base_url class attribute) | https://argo-bridge.cels.anl.gov | https://apps.inside.anl.gov/argoapi/v1 |
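
Because the same URL is written in two places, one possible follow-up is to route both through a single constant. The sketch below is illustrative only: ARGO_DEFAULT_BASE_URL and resolve_base_url() are hypothetical names, and argo.py may resolve its fallback differently.

```python
from __future__ import annotations

# Hypothetical shared constant; the URL itself is the one introduced by this PR.
ARGO_DEFAULT_BASE_URL = "https://apps.inside.anl.gov/argoapi/v1"


def resolve_base_url(configured: str | None, default: str = ARGO_DEFAULT_BASE_URL) -> str:
    """Prefer an explicitly configured URL; otherwise fall back to the default."""
    url = (configured or "").strip() or default
    return url.rstrip("/")  # normalize so callers can append paths consistently


print(resolve_base_url(None))
print(resolve_base_url("http://localhost:8000/"))
```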

Verification

source /home/oxygen/SHANG/next_osprey/.venv/bin/activate
python3 -c "
from osprey.cli.interactive_menu import get_provider_metadata, get_code_generator_metadata

providers = get_provider_metadata()
print(f'Providers loaded: {len(providers)}')
for name in sorted(providers):
    print(f'  {name}')

generators = get_code_generator_metadata()
print(f'Code generators loaded: {len(generators)}')
for name, meta in sorted(generators.items()):
    print(f'  {name}: available={meta[\"available\"]}')
"

Expected output:

Providers loaded: 11
  als-apg
  amsc
  anthropic
  argo
  asksage
  cborg
  google
  ollama
  openai
  stanford
  vllm
Code generators loaded: 2
  basic: available=True
  claude_code: available=True

Test Plan

  • Run osprey and verify all 11 providers appear at Step 6 (Provider Selection)
  • Verify both code generators (basic, claude_code) appear at Step 5 (Code Generator)
  • Complete a full osprey init project creation flow end-to-end
  • Verify Argo provider health check uses the new base URL
  • Run unit tests: pytest tests/ --ignore=tests/e2e -v
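
The first two checks in the plan could also be automated. A minimal sketch, assuming the metadata functions return dicts keyed by name as in the Verification script; find_init_problems() and EXPECTED_GENERATORS are hypothetical helpers, not part of the OSPREY test suite.

```python
# Generator names the scaffolding code expects, per the Root Cause section.
EXPECTED_GENERATORS = {"basic", "claude_code"}


def find_init_problems(providers: dict, generators: dict) -> list:
    """Return human-readable problems that would block the init wizard."""
    problems = []
    if not providers:
        problems.append("no providers loaded")
    missing = EXPECTED_GENERATORS - set(generators)
    if missing:
        problems.append(f"missing code generators: {sorted(missing)}")
    return problems


# The pre-fix state described under Problem: empty providers, empty generators.
print(find_init_problems({}, {}))
```

In a real test, the two arguments would come from get_provider_metadata() and get_code_generator_metadata(), and the assertion would be that the returned list is empty.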

thellert added 30 commits March 22, 2026 09:06
When re-creating a project at the same path, Claude Code remembers
the previous trust decision from ~/.claude/projects/<path>/.
Remove that cached state on --force so the trust prompt appears
again on first launch.
The trust decision ("hasTrustDialogAccepted") lives in
~/.claude.json → projects.<path>, not in ~/.claude/projects/.
Now --force removes the entry from both locations so the trust
prompt reappears on next Claude Code launch.
Users often rm -rf the project before re-running osprey init, so the
--force cleanup path is never reached. Move trust/session state
cleanup to run unconditionally on every osprey init.
…r chips

Replace the single-select sector dropdown with inline toggleable chip
controls for both sectors and devices. Chips are AND-combined across
dimensions (sectors union, devices union, then intersected). Includes
all/none actions per row and preserves field group collapse state
across filter toggles.
Demote INFO→DEBUG so internal yaml mutation details don't leak
into the otherwise clean Rich console output.
thumbnailHtml() was simplified to only show images and bare icons,
leaving existing iframe/placeholder/error CSS orphaned. Restore iframe
previews for HTML artifacts and notebooks, summary text for data
artifacts, styled placeholders, and image error handling.
regenerate_claude_code() used raw yaml.safe_load() without expanding
${VAR:-default} patterns, so timezone.md rendered the literal string
instead of the resolved value. Extract resolve_env_vars() as a public
function in config.py and apply it after loading config in templates.py.
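
The ${VAR:-default} expansion described in this commit can be sketched as follows. This is an illustration under assumptions, not the body of resolve_env_vars(): expand_env_vars() is a hypothetical name, the OSPREY_TZ variable is made up, and the shell-like rule (unset or empty falls back to the default) is assumed.

```python
import os
import re
from typing import Any

# Matches ${NAME} and ${NAME:-default}
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_env_vars(value: Any, env: Any = None) -> Any:
    """Recursively expand ${VAR} and ${VAR:-default} in strings, dicts, and lists."""
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {key: expand_env_vars(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env_vars(item, env) for item in value]
    if not isinstance(value, str):
        return value

    def _substitute(match):
        name, default = match.group(1), match.group(2)
        current = env.get(name)
        if current:
            return current
        if default is not None:
            return default  # unset or empty variable falls back to the default
        # No default given: keep an empty value, leave unset references untouched.
        return current if current is not None else match.group(0)

    return _ENV_PATTERN.sub(_substitute, value)


print(expand_env_vars({"timezone": "${OSPREY_TZ:-UTC}"}, env={}))
```

Applying the expansion to the parsed structure, after yaml.safe_load(), keeps the YAML parser out of the substitution logic.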
Claude Code's own directory structure already isolates sessions per
project, making the .sessions.json whitelist redundant. Removing it
lets CLI-created sessions appear immediately and eliminates registration
race conditions.
… resolution

All 7 hooks now use hook_input["cwd"] for project directory (replacing
inconsistent env var fallback chains). New osprey_hook_log.py provides
get_hook_input(), get_project_dir(), and OSPREY_HOOK_DEBUG-gated log_hook().
log_hook() now prints to stderr instead of writing to
data/hooks/activity.log — simpler, no file management needed.
The env var was never reaching hooks because (1) OSPREY_CONFIG was not
set before lifespan config reads, causing CWD-dependent cache misses,
and (2) four layers of silent `except: pass` hid every failure.

- Set OSPREY_CONFIG early in lifespan + reset stale config cache
- Replace silent error swallowing with logged warnings in app.py and
  operator_session.py
- Add config.yml fallback in osprey_hook_log._is_debug_enabled() so
  hooks work even if env var propagation breaks
- Default hooks.debug to true in template config
- Add 16 tests covering the full propagation chain
Add agentsview (Go binary) as a new "SESSION ANALYTICS" tab in the web
terminal, following the same iframe panel pattern as Artifacts, ARIEL,
and Monitoring. Auto-launches on `osprey web` if installed, degrades
gracefully if not.
MCP servers called initialize_registry() on startup, loading LangGraph-era
components (capabilities, services, approval manager, prompt providers) that
no MCP tool uses at runtime. Remove the call and add startup timing
instrumentation. Saves ~3.3s per server process (~23s cumulative across 7
servers).
Send theme:set alongside osprey-theme-change so agentsview can
switch themes instantly via postMessage instead of iframe reload.
Remove session-analytics from CROSS_ORIGIN_PANELS reload list.
Keep Claude processes alive in the background when switching sessions,
enabling near-instant reattach for warm sessions. Adds LRU pool
semantics to PtyRegistry with configurable max_background_sessions
(default 5), and a switch_session WebSocket message so the frontend
never closes/reopens the connection on switch.

- PtyRegistry: OrderedDict + attached set, get_or_create_session,
  attach/detach/rekey_session, LRU eviction
- routes.py: extract _run_output_loop, parameterized _discover_and_notify,
  switch_session handler, finally detaches instead of terminating
- api.js: setUrl() on createWebSocket for reconnect URL updates
- terminal.js: switchSession() export, session_switched/error handlers
- sessions.js: fast path via switchSession with cold fallback
- 13 new unit tests for pool behavior
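
The pool semantics listed above (OrderedDict plus an attached set, with LRU eviction of background sessions) can be sketched roughly as below. SessionPool and its method names are simplified stand-ins, not the PtyRegistry API; eviction here only records the victim, whereas the real registry presumably terminates the process.

```python
from collections import OrderedDict


class SessionPool:
    """Keeps detached sessions warm, evicting the least recently used."""

    def __init__(self, max_background: int = 5) -> None:
        self._sessions = OrderedDict()  # session_id -> session object
        self._attached = set()          # ids currently bound to a client
        self.max_background = max_background
        self.evicted = []               # ids evicted so far, oldest first

    def get_or_create(self, session_id, factory):
        if session_id in self._sessions:
            self._sessions.move_to_end(session_id)  # mark as most recently used
        else:
            self._sessions[session_id] = factory()
        return self._sessions[session_id]

    def attach(self, session_id) -> None:
        self._sessions.move_to_end(session_id)  # KeyError if the id is unknown
        self._attached.add(session_id)

    def detach(self, session_id) -> None:
        self._attached.discard(session_id)
        self._evict_excess()

    def ids(self):
        return list(self._sessions)

    def _evict_excess(self) -> None:
        # Attached sessions are never evicted; only background ones count.
        background = [sid for sid in self._sessions if sid not in self._attached]
        while len(background) > self.max_background:
            victim = background.pop(0)  # front of the OrderedDict is the oldest
            del self._sessions[victim]
            self.evicted.append(victim)


pool = SessionPool(max_background=2)
for sid in ("a", "b", "c"):
    pool.get_or_create(sid, object)
    pool.attach(sid)
    pool.detach(sid)
print(pool.ids(), pool.evicted)
```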
…ration

Add PROVIDER_API_KEYS canonical dict to provider_registry.py and replace
4 drifted inline lists across CLI modules. Extract register_builtin_connectors()
in connectors/factory.py so MCP registry delegates instead of duplicating.
zhe-slac and others added 27 commits March 26, 2026 14:56
Restructure around safety chain as the central concept. Add hook chain
diagram, build/deploy section, expanded layers table (5 → 7 layers),
and updated data flow showing PreToolUse hooks in the sequence diagram.
Remove the per-tool hook matrix table.
…kflow

Add Claude Code CLI and Node.js prerequisites, PyPI install option,
updated templates list, agent launch instructions (direct/managed/web),
MCP server overview dropdown, and revised troubleshooting. Reframe
container runtime as optional and remove placeholder section.
…ata directory

Replace all references to osprey-workspace with _agent_data across the
entire codebase. Config key changed from workspace.base_dir to
agent_data.base_dir. resolve_workspace_root() renamed to
resolve_agent_data_root() with backward-compatible alias.
Replace the deleted hello_world_weather template with a new hello_world
template that introduces Claude Code + MCP with a mock control system.
Rewrite the tutorial from scratch, removing 4 stale LangGraph PLACEHOLDERs
and documenting the current architecture. Update all cross-references in
docs, CLI help text, and CI workflow.
Missed from 9c180b9 — update README quick-start to use current CLI
commands, remove stale PLACEHOLDERs and deleted servers (AccelPapers,
MATLAB, graph tools) from MCP servers docs, add entry_publish tool.
Remove references to non-existent CLI commands (osprey tasks,
osprey claude install) and directories (.ai-tasks/) from both
the Sphinx contributing page and CONTRIBUTING.md.
The ARIEL how-to docs described a single ariel_search tool inside the
control_system MCP server, but the actual implementation is a dedicated
MCP server (osprey.mcp_server.ariel) with 11 specialized tools. Also
contained stale LangGraph/LangChain references and wrong threshold values.

- Remove 2 PLACEHOLDER: CONCEPTUAL-MAPPING admonitions
- Rewrite osprey-integration.rst with correct MCP architecture and tool table
- Delete fabricated error classification section (classify_error doesn't exist)
- Fix similarity threshold 0.7 → 0.5 to match DEFAULT_SIMILARITY_THRESHOLD
- Replace LangGraph/LangChain refs with actual implementation (custom async ReAct loop)
- Fix citation parsing description ([#id] regex → entry_id substring matching)
- Add --mode auto CLI documentation to search-modes.rst
- Add all 12 osprey ariel CLI subcommands to index.rst
- Fix logbook_search capability reference in web-interface.rst
…ural sections

- Replace unrealistic "set beam current to 500 mA" scenario with corrector
  magnet bump workflow using realistic PV names
- Add Channel Finder sub-agent step showing address resolution before read
- Remove Build & Deploy section (operational, not architectural)
- Remove Layers table (stale directory listing)
- Remove Runtime API section (better suited for API reference)
…sions

- Fix str/bytes TypeError in lifecycle timeout handler: Python's
  subprocess.TimeoutExpired stores raw bytes in stdout/stderr even
  with text=True, causing a crash when concatenating with fallback str
- Add bundled extension support to duckdb_import: prefer local FTS
  extension file at data/duckdb_extensions/ before attempting download,
  with HTTP_PROXY fallback for proxy-restricted environments
- build_cmd: add --skip-lifecycle to skip pre_build, post_build, and
  validate phases (needed for CI where no container runtime exists)
- channel_finder: register query_channels tool, add duckdb_path config
  property for DuckDB-backed channel search
Skips venv creation and dependency installation when building in CI
where OSPREY and deps are pre-installed in the container image.
POST /api/chat with SSE streaming (default) or buffered JSON response.
Creates an ephemeral OperatorSession per request, reusing the existing
operator mode infrastructure.
McpServerDef now accepts an optional `url` field for HTTP/SSE MCP
servers. Profile YAML can specify either `command` (stdio) or `url`
(HTTP) per server — validated as mutually exclusive at parse time.
_inject_mcp_servers() emits {"type": "sse", "url": "..."} entries
in .mcp.json for URL-based servers.
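
A minimal sketch of the mutually exclusive command/url validation, assuming a dataclass-style definition. ServerDef and to_mcp_entry() are hypothetical names; the {"type": "sse", "url": ...} shape is taken from the commit message, while the stdio entry shape is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ServerDef:
    """Illustrative server definition: exactly one of command or url must be set."""

    name: str
    command: str | None = None
    args: list[str] = field(default_factory=list)
    url: str | None = None

    def __post_init__(self) -> None:
        # Reject both "neither set" and "both set" at construction time.
        if (self.command is None) == (self.url is None):
            raise ValueError(
                f"server {self.name!r}: specify exactly one of 'command' or 'url'"
            )

    def to_mcp_entry(self) -> dict:
        if self.url is not None:
            return {"type": "sse", "url": self.url}
        # Assumed stdio shape; the real .mcp.json entry may carry more fields.
        return {"command": self.command, "args": list(self.args)}


print(ServerDef("remote", url="http://localhost:9000/sse").to_mcp_entry())
```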
…n-use check

Karma (claude-code-karma) was a speculative external dependency that was
never published to PyPI. Its auto-launch on every `osprey web` start
produced a noisy ERROR traceback followed by misleading "launched" and
"available" log messages. Remove all references across registry, launcher,
web terminal (app, routes, JS), config templates, pyproject.toml, docs,
and tests.

Also add a port pre-flight check in `osprey web` foreground mode so
stale processes on port 8087 produce an actionable error message instead
of a raw uvicorn crash.
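
A port pre-flight check of this kind can be sketched with a trial bind. The function names and the message wording are hypothetical; only the port number 8087 comes from the commit message.

```python
from __future__ import annotations

import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if host:port can be bound right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False  # typically EADDRINUSE: something already holds the port
    return True


def preflight_message(port: int, host: str = "127.0.0.1") -> str | None:
    """Return an actionable error message, or None if the port is usable."""
    if port_is_free(port, host):
        return None
    return (
        f"Port {port} is already in use. "
        "Stop the process holding it or choose a different port."
    )


if __name__ == "__main__":
    print(preflight_message(8087) or "port 8087 is free")
```

The check is inherently racy (another process can grab the port between the probe and the real bind), so it improves the error message rather than guaranteeing startup.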
… __init_subclass__

Add automatic writes_enabled pre-check to all connector subclasses through
ControlSystemConnector.__init_subclass__ wrapping. This ensures the safety
invariant holds even when the PreToolUse hook chain is bypassed (e.g.,
approved readwrite Python subprocesses reaching the connector directly).

- Add _writes_enabled property to base class (reads from global config,
  defaults to False/fail-safe)
- Add __init_subclass__ that wraps any subclass write_channel() with a
  guard returning ChannelWriteResult(success=False) when writes disabled
- Remove MockConnector's redundant _enable_writes logic (now handled by
  base class)
- Fix MockDynamicConnector to return proper ChannelWriteResult
- Update sibling tests to mock get_config_value instead of config dict
- Add 13 new tests across 4 test classes in test_writes_enabled.py
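
The __init_subclass__ wrapping described above can be sketched as follows. This is a simplified, synchronous illustration: BaseConnector, WriteResult, and the class-level writes_enabled flag are stand-ins for ControlSystemConnector, ChannelWriteResult, and the config-backed _writes_enabled property, whose real signatures are not shown in this PR.

```python
import functools
from dataclasses import dataclass


@dataclass
class WriteResult:
    success: bool
    message: str = ""


class BaseConnector:
    # Fail-safe default; the real property is described as reading global config.
    writes_enabled = False

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        original = cls.__dict__.get("write_channel")
        if original is None:
            return  # subclass did not override; it inherits a guarded method

        @functools.wraps(original)
        def guarded(self, channel, value):
            if not self.writes_enabled:
                return WriteResult(False, "writes are disabled")
            return original(self, channel, value)

        cls.write_channel = guarded

    def write_channel(self, channel, value):
        raise NotImplementedError


class DemoConnector(BaseConnector):
    def __init__(self):
        self.values = {}

    def write_channel(self, channel, value):
        self.values[channel] = value
        return WriteResult(True)


connector = DemoConnector()
print(connector.write_channel("DEMO:PV", 1.0))  # blocked by the guard
connector.writes_enabled = True
print(connector.write_channel("DEMO:PV", 1.0))  # reaches the subclass method
```

Because the guard is installed when the subclass is defined, a connector author cannot forget to add it, which is the property the commit relies on when the hook chain is bypassed.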
The web terminal loaded xterm.js, highlight.js, marked.js, and Google
Fonts from cdn.jsdelivr.net, which is blocked by restrictive proxies
(e.g., ALS squid returns 403). Bundle all vendor JS/CSS/fonts into
static/vendor/ so the terminal works without external CDN access.
…ators

The interactive init wizard failed with "No providers could be loaded" and
"No code generators available" because:

1. get_provider_metadata() read from config.providers (empty after providers
   moved to provider_registry.py). Now uses ProviderRegistry directly.

2. get_code_generator_metadata() had generators hardcoded to {} after being
   removed from the registry. Restored with "basic" (always available) and
   "claude_code" (available when claude-agent-sdk is installed).

Also updates the Argo provider default_base_url to the new endpoint
https://apps.inside.anl.gov/argoapi/v1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cr-xu
Collaborator

cr-xu commented Apr 9, 2026

@hshang315 I think this PR should target the next branch, not the main. Main branch is still the old backend.
