feat: import OpenClaw data (memory, history, settings, credentials) #388
jaswinder6991 wants to merge 1 commit into nearai:staging
Conversation
Summary of Changes

Hello @jaswinder6991, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances IronClaw's capabilities by introducing a robust data migration path from OpenClaw, broadening its compatibility with various LLM providers, and modularizing its core agent architecture. It also expands the ecosystem with new WASM channels and tools, while bolstering security and observability features. These changes aim to make IronClaw more extensible, easier to deploy, and more resilient against runaway costs.
…ion (nearai#58)

Implements a CLI command and onboarding wizard step to import data from existing OpenClaw installations into IronClaw.

Five-phase import pipeline:
1. Settings — maps OpenClaw config keys (llm_backend, model, embeddings)
2. Identity files — AGENTS.md, SOUL.md, USER.md, etc.
3. Memory documents — workspace/memory/*.md files
4. Conversations — JSONL session transcripts with idempotent dedup
5. Credentials — API keys and OAuth tokens (encrypted at rest)

Features:
- Auto-discovers ~/.openclaw (and legacy ~/.clawdbot, ~/.moldbot, ~/.moltbot)
- Parses JSON5 config files (OpenClaw's format)
- Dry-run mode (--dry-run) to preview without writing
- Idempotent: safe to run multiple times (skips existing data)
- Wizard integration: prompts during onboarding if OpenClaw detected
- 18 unit tests covering discovery, config parsing, JSONL parsing

Usage: ironclaw import openclaw [--path ~/.openclaw] [--dry-run]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed d3a75ab to fefa717
Code Review
This is a massive and impressive pull request that introduces significant refactorings and new features across the entire codebase, greatly improving maintainability, robustness, and functionality. The agent loop has been skillfully broken down into more manageable modules, and new features like the skills system, cost guardrails, and parallel tool execution are excellent additions. The code quality is high, with notable improvements in error handling, concurrency, and API design. While some functions in dispatcher.rs and thread_ops.rs are quite large, suggestions for further structural refactoring are being deferred to follow-up tasks, aligning with the primary goal of this PR (as indicated by its title/description) to achieve correctness and feature parity. Please note that the PR title and description seem to be for a different change (importing OpenClaw data), which was a bit confusing during the review.
src/agent/dispatcher.rs (37-671)
This function is indeed very long and complex, spanning over 600 lines. While the logic for handling the agentic loop with tool calls, approvals, and hooks is quite sophisticated, its size makes it difficult to follow and maintain. However, given that the primary goal of this pull request appears to be focused on achieving correctness and feature parity (as indicated by the PR title/description), large structural refactorings like breaking down this function should be deferred to a follow-up task to keep the current PR focused.
References
- Defer large structural refactorings, like breaking down a large file, when the primary goal of a pull request is to achieve correctness and feature parity with an existing implementation. The refactoring should be tracked as a follow-up task.
src/agent/thread_ops.rs (692-1091)
Similar to run_agentic_loop, this process_approval function is quite large (around 400 lines) and contains a lot of complex logic. While breaking this down would improve maintainability, given that the primary goal of this pull request appears to be focused on achieving correctness and feature parity (as indicated by the PR title/description), large structural refactorings should be deferred to a follow-up task to keep the current PR focused.
References
- Defer large structural refactorings, like breaking down a large file, when the primary goal of a pull request is to achieve correctness and feature parity with an existing implementation. The refactoring should be tracked as a follow-up task.
with:
  configuration-path: .github/labeler.yml
- sync-labels: false # additive only — never remove scope labels
+ sync-labels: true # remove stale scope labels on force-push / new commits
This should probably be in a separate PR, since it relabels everything on your PR.
Force-pushed 56e77ce to fefa717
zmanian left a comment
Review: Import OpenClaw Data (Memory, History, Settings, Credentials)
Well-structured PR with a clean 5-phase import pipeline (settings, identity files, memory documents, conversations, credentials). Discovery is separated from parsing, the importer is stateless, progress reporting is abstracted via a trait, and idempotency is well-handled (checks for existing items before writing, safe to run multiple times). Good unit test coverage for parsing and discovery. However, there are security issues that need fixing.
Blocker: CredentialEntry Derives Debug with Plaintext Values
CredentialEntry has #[derive(Debug)] with a pub value: String field that holds raw API keys and OAuth tokens. Any tracing::debug!("{:?}", config) or accidental dbg!() will dump plaintext secrets to logs. OpenClawConfig also derives Debug and holds Vec<CredentialEntry>, so logging the config leaks all credentials.
The rest of the codebase uses secrecy::SecretString for this purpose. Either:
- Use `SecretString` instead of `String` for the `value` field, or
- Implement `Debug` manually to redact the value.
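The manual-`Debug` option can be sketched in plain std Rust. This is an illustrative sketch: the `provider` field and the exact shape of the PR's `CredentialEntry` are assumptions; only `value` is confirmed by the review above.

```rust
use std::fmt;

// Assumed shape of the PR's CredentialEntry; only `value` is confirmed.
pub struct CredentialEntry {
    pub provider: String,
    pub value: String, // raw API key / OAuth token
}

// Hand-written Debug so `{:?}` and `dbg!()` can never print the secret.
impl fmt::Debug for CredentialEntry {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("CredentialEntry")
            .field("provider", &self.provider)
            .field("value", &"[REDACTED]")
            .finish()
    }
}

fn main() {
    let entry = CredentialEntry {
        provider: "anthropic".into(),
        value: "sk-secret-123".into(),
    };
    let dump = format!("{:?}", entry);
    assert!(!dump.contains("sk-secret-123")); // secret never reaches logs
    assert!(dump.contains("[REDACTED]"));
    println!("{dump}");
}
```

Since `OpenClawConfig` holds `Vec<CredentialEntry>`, redacting at the entry level also redacts any derived `Debug` output of the containing config.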
Blocker: build_secrets_store Uses NoTls for PostgreSQL
The build_secrets_store function in main.rs creates a NoTls pool, bypassing the TLS support added in commit 1f2e8c3. Users with sslmode=require will silently get an unencrypted connection for the secrets store during import.
Fix: Use connect_from_config or replicate the TLS logic from Store::new().
High: json5 Adds 5 Transitive Dependencies (pest Ecosystem)
The json5 crate pulls in pest, pest_derive, pest_generator, pest_meta, and ucd-trie — a full parser generator framework. This is a heavy dependency chain. Questions:
- Does OpenClaw actually use JSON5 features (comments, trailing commas) in config files? If most users have standard JSON, `serde_json` suffices.
- If JSON5 is needed, consider `serde_jsonrc` (lighter, no pest) or document the justification.
Medium
- Credentials held as `String` instead of `SecretString` during parsing: the `OpenClawConfig` struct holds all credentials as plaintext for the entire import duration. Read them into `SecretString` at parse time.
- Wizard `get_db_handle` unexpanded tilde path: the fallback path `"~/.ironclaw/ironclaw.db"` contains a tilde which `std::path::Path` does NOT expand. Use `default_libsql_path()` instead.
- No integration test for full pipeline: per-phase logic is only tested via unit tests. Consider adding at least one integration test with an in-memory database.
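The tilde pitfall above can be demonstrated with std alone. The `expand_tilde` helper is an illustrative stand-in, not the PR's `default_libsql_path()`; a real caller would pass `std::env::var("HOME")` as the home argument.

```rust
use std::path::PathBuf;

// Illustrative helper: resolve a leading "~/" against an explicit home dir.
fn expand_tilde(path: &str, home: &str) -> PathBuf {
    match path.strip_prefix("~/") {
        Some(rest) => PathBuf::from(home).join(rest),
        None => PathBuf::from(path),
    }
}

fn main() {
    // std::path::Path keeps "~" as a literal first component; nothing expands it.
    let raw = PathBuf::from("~/.ironclaw/ironclaw.db");
    assert_eq!(raw.iter().next().unwrap(), "~");

    // Resolving against a home directory (placeholder here) yields a real path.
    let fixed = expand_tilde("~/.ironclaw/ironclaw.db", "/home/demo");
    assert_eq!(fixed, PathBuf::from("/home/demo/.ironclaw/ironclaw.db"));
    println!("{}", fixed.display());
}
```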
What's Good
- Idempotent: settings check `get_setting` before write, identity files check `ws.exists()`, conversations use deterministic UUIDs via blake3 hash, credentials check `store.exists()`
- Partial failure handling: each phase continues on per-item errors, collecting them in `report.errors`
- Safe discovery: only looks in `$HOME` for well-known paths, no path traversal risk
- 10K message safety limit per conversation session — prevents absurdly large imports
- Both backends: uses `Arc<dyn Database>` + trait abstractions, handles both postgres and libsql feature flags
- Good test coverage: JSON5 parsing, OAuth parsing, provider mapping, discovery with mock dirs, JSONL parsing, deterministic UUID stability
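The deterministic-ID idempotency idea in the first item can be illustrated with std's hasher standing in for blake3 (the key format is an assumption): the same OpenClaw session key always maps to the same IronClaw ID, so a re-run finds the existing row and skips it. Note `DefaultHasher::new()` is only stable for a given std version, which is one reason a content hash like blake3 is the better production choice.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Map a source key to a stable ID: same input, same output, every run.
fn deterministic_id(source_key: &str) -> u64 {
    let mut h = DefaultHasher::new();
    source_key.hash(&mut h);
    h.finish()
}

fn main() {
    // Hypothetical key format for a conversation session.
    let a = deterministic_id("openclaw:session-2024-01-01.jsonl");
    let b = deterministic_id("openclaw:session-2024-01-01.jsonl");
    assert_eq!(a, b); // re-import derives the same ID, so the row is skipped
    println!("{a}");
}
```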
Low Priority
- Deterministic UUID manually constructs v4 bits on blake3 — could use `Uuid::new_v5(NAMESPACE_OID, key)` since the `v5` feature is already enabled
- Memory file discovery is only one level deep — recursive scanning might be needed if OpenClaw stores nested files
- OAuth import has no token format validation
Summary
Fix the CredentialEntry debug leak and the NoTls TLS regression before merge. Justify or replace the json5 dependency. Core import architecture is solid.
Comprehensive analysis of review feedback on the OpenClaw data import PR, covering zmanian's security review (2 blockers, 1 high, 3 medium), ilblackdragon's scope feedback, and Gemini's automated review. https://claude.ai/code/session_01631kLAGRENE8zjw2Fi5Bfq
Closing as superseded by merged #903. The OpenClaw import path is already in staging/mainline, so this older draft is no longer the active merge path.
Summary
Implements an `ironclaw import openclaw` CLI command and onboarding wizard integration for migrating data from OpenClaw installations into IronClaw. Addresses #58.

What it does
Five-phase import pipeline:
1. Settings — maps OpenClaw config keys (llm_backend, model, embeddings)
2. Identity files — AGENTS.md, SOUL.md, USER.md, etc.
3. Memory documents — `workspace/memory/*.md` → IronClaw workspace
4. Conversations — JSONL session transcripts with idempotent dedup
5. Credentials — `auth.profiles` + OAuth tokens from `credentials/oauth.json` → encrypted secrets store

Key features
- Auto-discovers `~/.openclaw` and legacy dirs (`~/.clawdbot`, `~/.moldbot`, `~/.moltbot`)
- `--dry-run` mode to preview without writing

Usage

`ironclaw import openclaw [--path ~/.openclaw] [--dry-run]`
Files changed (13 files, +1855 lines)
- `src/import/` — new module: `mod.rs`, `discovery.rs`, `config_parser.rs`, `openclaw.rs`, `progress.rs`
- `src/cli/import.rs` — CLI subcommand handler
- `src/cli/mod.rs`, `src/lib.rs`, `src/main.rs` — wiring
- `src/db/postgres.rs` — added `PgBackend::from_pool()` for wizard reuse
- `src/setup/wizard.rs` — OpenClaw detection step in onboarding
- `Cargo.toml` — added `json5 = "0.4"`

Questions for code owners
General direction — Does this approach look right? Any architectural concerns with the import module structure or the 5-phase pipeline? Would you prefer a different abstraction (e.g., a generic `Importer` trait for future sources)?

Tools/skills import — This PR covers memory, history, settings, and credentials. Should we also import OpenClaw's installed tools and skills (WASM extensions, MCP server configs, SKILL.md files)? OpenClaw has similar concepts — would be straightforward to add as Phase 6/7 if desired.
Wizard placement — The OpenClaw detection runs after the database setup step in the onboarding wizard. Is this the right spot, or should it be a separate post-wizard flow?
Looking for a general direction before polishing further.
Test plan
- `cargo fmt` — clean
- `cargo clippy --all --all-features` — zero warnings
- `cargo test --lib -- import` — 18/18 passing
- `cargo check --no-default-features --features libsql` — compiles
- `cargo check --no-default-features --features postgres` — compiles

🤖 Generated with Claude Code