Add automatic Yuanrong startup to interface.py by dpj135 · Pull Request #60 · Ascend/TransferQueue

dpj135 · 2026-03-26T07:40:44Z

Description

Add Yuanrong startup to init()
Add Yuanrong shutdown to close()

ascend-robot · 2026-03-26T07:40:57Z

CLA Signature Pass

dpj135, thanks for your pull request. All authors of the commits have signed the CLA. 👍

ascend-robot · 2026-03-28T08:02:41Z

CLA Signature Pass

dpj135, thanks for your pull request. All authors of the commits have signed the CLA. 👍

ascend-robot · 2026-03-28T08:59:57Z

CLA Signature Pass

dpj135, thanks for your pull request. All authors of the commits have signed the CLA. 👍

ascend-robot · 2026-03-28T09:59:13Z

CLA Signature Pass

dpj135, thanks for your pull request. All authors of the commits have signed the CLA. 👍

ascend-robot · 2026-03-28T10:13:21Z

CLA Signature Pass

dpj135, thanks for your pull request. All authors of the commits have signed the CLA. 👍

Signed-off-by: dpj135 <958208521@qq.com>

ascend-robot · 2026-03-30T03:14:14Z

CLA Signature Pass

dpj135, thanks for your pull request. All authors of the commits have signed the CLA. 👍

Copilot

Pull request overview

Adds automatic lifecycle management for the Yuanrong backend by starting required local services during init() and stopping/cleaning them up in close(), aligning Yuanrong with the existing “auto_init” behavior used for MooncakeStore.

Changes:

Add Yuanrong auto_init support to start local etcd and a datasystem worker (dscli start) during initialization.
Add Yuanrong shutdown logic to terminate etcd, remove its temporary data dir, and stop the worker via dscli stop.
Extend default config.yaml with Yuanrong auto_init, etcd_address, and client_name.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File	Description
`transfer_queue/interface.py`	Starts/stops Yuanrong dependencies (`etcd` + `dscli`) as part of TransferQueue init/close lifecycle.
`transfer_queue/config.yaml`	Adds Yuanrong auto-init configuration knobs and adjusts default Yuanrong config fields.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-30T03:36:55Z

transfer_queue/config.yaml

-    # If enable npu transport
-    enable_yr_npu_transport: false
+    # Client name (optional, default is YuanrongStorageClient)
+    client_name: "YuanrongStorageClient"


enable_yr_npu_transport was removed from the default Yuanrong config. In transfer_queue/storage/clients/yuanrong_client.py, the NPU adapter defaults enable_yr_npu_transport to True when the key is missing, so this change flips the default behavior from disabled (previously false) to enabled. Consider re-adding enable_yr_npu_transport with an explicit default (or change the code default) to avoid an unintended behavior change for existing users.

Suggested change

client_name: "YuanrongStorageClient"

client_name: "YuanrongStorageClient"

# Whether to enable Yuanrong NPU transport (default: False for backward compatibility)

enable_yr_npu_transport: False

Copilot · 2026-03-30T03:36:55Z

transfer_queue/interface.py

+                            try:
+                                etcd_process.wait(timeout=5)
+                            except subprocess.TimeoutExpired:
+                                etcd_process.kill()


If etcd_process.kill() is used after a timeout, the code should still call wait() to reap the child process; otherwise a zombie process can be left behind until interpreter exit. After kill(), call etcd_process.wait() (possibly in a nested try/except) to ensure the process is fully collected.

Suggested change

etcd_process.kill()

etcd_process.kill()

try:

# Ensure the killed process is fully reaped to avoid zombies

etcd_process.wait(timeout=5)

except subprocess.TimeoutExpired:

logger.warning("Timed out while waiting for etcd process to exit after kill.")

except Exception as e:

logger.warning(f"Error while waiting for etcd process to exit after kill: {e}")

Add etcd_process.wait()

Copilot · 2026-03-30T03:36:56Z

transfer_queue/interface.py

+                                subprocess.run(
+                                    ["dscli", "stop", "--worker_address", worker_address],
+                                    timeout=90,
+                                    capture_output=True,
+                                )
+                                logger.info(f"Stopped datasystem worker at {worker_address} via dscli stop")


dscli stop is run but its result is ignored, and an info log is emitted unconditionally. This can report a successful shutdown even if dscli fails (non-zero return code) or prints an error. Capture the CompletedProcess, check returncode (or use check=True), and log/raise accordingly (and include stdout/stderr in the error path).

Suggested change

subprocess.run(

["dscli", "stop", "--worker_address", worker_address],

timeout=90,

capture_output=True,

)

logger.info(f"Stopped datasystem worker at {worker_address} via dscli stop")

result = subprocess.run(

["dscli", "stop", "--worker_address", worker_address],

timeout=90,

capture_output=True,

text=True,

)

if result.returncode == 0:

logger.info(f"Stopped datasystem worker at {worker_address} via dscli stop")

else:

logger.error(

"Failed to stop datasystem worker at %s via dscli stop "

"(return code %s). stdout: %r stderr: %r",

worker_address,

result.returncode,

result.stdout,

result.stderr,

)

except subprocess.TimeoutExpired as e:

logger.error(

"Timed out while stopping datasystem worker at %s via dscli stop: %s",

worker_address,

e,

)

Copilot · 2026-03-30T03:36:56Z

transfer_queue/interface.py

+                except Exception as e:
+                    # Clean up on failure
+                    if etcd_process is not None and etcd_process.poll() is None:
+                        etcd_process.terminate()


On Yuanrong startup failure, the cleanup path terminates the etcd process but doesn't wait() (or kill() on timeout) to ensure the process is actually stopped/reaped. This can leave a stray/zombie etcd process around after init() raises. Consider mirroring the shutdown logic used in close() (terminate -> wait with timeout -> kill -> wait).

Suggested change

etcd_process.terminate()

try:

etcd_process.terminate()

try:

etcd_process.wait(timeout=5)

except subprocess.TimeoutExpired:

etcd_process.kill()

etcd_process.wait()

except Exception:

# Best-effort cleanup; ignore failures here

pass

Copilot · 2026-03-30T03:36:56Z

transfer_queue/interface.py

+        if conf.backend.storage_backend == "Yuanrong":
+            if conf.backend.Yuanrong.auto_init:
+                etcd_process = None
+                etcd_data_dir = None
+                worker_address = None
+                if not shutil.which("etcd"):
+                    raise RuntimeError(
+                        "etcd executable not found in PATH. Please install etcd and make sure it's in the PATH."
+                    )
+                if not shutil.which("dscli"):
+                    raise RuntimeError(
+                        "dscli executable not found in PATH. Please run `pip install openyuanrong-datasystem`."
+                    )
+                try:


The new Yuanrong auto-init path (etcd + dscli) introduces substantial side effects during init() but currently has no targeted test coverage. Consider adding a unit test that monkeypatches shutil.which/subprocess.Popen/subprocess.run to validate the happy path and failure cleanup behavior (without requiring real etcd/dscli binaries).

0oshowero0 · 2026-03-30T03:37:41Z

transfer_queue/config.yaml

+    # Client name (optional, default is YuanrongStorageClient)
+    client_name: "YuanrongStorageClient"


We can eliminate these config since it must be YuanrongStorageClient

0oshowero0 · 2026-03-30T03:40:31Z

transfer_queue/interface.py

                            pass
+                elif key == "Yuanrong":
+                    # Stop etcd process and clean up data directory, stop datasystem worker via dscli
+                    if isinstance(value, dict):


Only close etcd when enable auto_init?

Yes. If users start etcd manually, they should close etcd manually.

ascend-robot · 2026-03-30T08:01:19Z

CLA Signature Pass

dpj135, thanks for your pull request. All authors of the commits have signed the CLA. 👍

Signed-off-by: dpj135 <958208521@qq.com>

ascend-robot · 2026-03-30T08:44:08Z

CLA Signature Pass

dpj135, thanks for your pull request. All authors of the commits have signed the CLA. 👍

ascend-robot added the ascend-cla/yes label Mar 26, 2026

dpj135 force-pushed the yr_init branch from 917578f to 7102611 Compare March 28, 2026 08:59

dpj135 force-pushed the yr_init branch from 7102611 to de072d3 Compare March 28, 2026 09:59

dpj135 force-pushed the yr_init branch from de072d3 to 6ae4f24 Compare March 28, 2026 10:13

dpj135 added 3 commits March 30, 2026 11:12

Added yr init

89a8989

Signed-off-by: dpj135 <958208521@qq.com>

Reduced logics

5e2e534

Signed-off-by: dpj135 <958208521@qq.com>

Simplified logics about Yuanrong startup

e5c4588

Signed-off-by: dpj135 <958208521@qq.com>

dpj135 force-pushed the yr_init branch from 6ae4f24 to e5c4588 Compare March 30, 2026 03:14

dpj135 marked this pull request as ready for review March 30, 2026 03:22

0oshowero0 requested a review from Copilot March 30, 2026 03:33

Copilot started reviewing on behalf of 0oshowero0 March 30, 2026 03:34 View session

Copilot AI reviewed Mar 30, 2026

View reviewed changes

0oshowero0 reviewed Mar 30, 2026

View reviewed changes

Fixed comments

6bc4c61

Signed-off-by: dpj135 <958208521@qq.com>

dpj135 force-pushed the yr_init branch from 0546216 to 6bc4c61 Compare March 30, 2026 08:43

0oshowero0 mentioned this pull request Mar 31, 2026

[Roadmap] TransferQueue Q2 roadmap #49

Open

21 tasks

0oshowero0 approved these changes Apr 1, 2026

View reviewed changes

0oshowero0 merged commit 2f24c68 into Ascend:main Apr 1, 2026
8 checks passed

-                                subprocess.run(
-                                    ["dscli", "stop", "--worker_address", worker_address],
-                                    timeout=90,
-                                    capture_output=True,
-                                )
-                                logger.info(f"Stopped datasystem worker at {worker_address} via dscli stop")
+                                result = subprocess.run(
+                                    ["dscli", "stop", "--worker_address", worker_address],
+                                    timeout=90,
+                                    capture_output=True,
+                                    text=True,
+                                )
+                                if result.returncode == 0:
+                                    logger.info(f"Stopped datasystem worker at {worker_address} via dscli stop")
+                                else:
+                                    logger.error(
+                                        "Failed to stop datasystem worker at %s via dscli stop "
+                                        "(return code %s). stdout: %r stderr: %r",
+                                        worker_address,
+                                        result.returncode,
+                                        result.stdout,
+                                        result.stderr,
+                                    )
+                            except subprocess.TimeoutExpired as e:
+                                logger.error(
+                                    "Timed out while stopping datasystem worker at %s via dscli stop: %s",
+                                    worker_address,
+                                    e,
+                                )

		# Client name (optional, default is YuanrongStorageClient)
		client_name: "YuanrongStorageClient"

Conversation

dpj135 commented Mar 26, 2026

Description

Uh oh!

ascend-robot commented Mar 26, 2026

CLA Signature Pass

Uh oh!

ascend-robot commented Mar 28, 2026

CLA Signature Pass

Uh oh!

ascend-robot commented Mar 28, 2026

CLA Signature Pass

Uh oh!

ascend-robot commented Mar 28, 2026

CLA Signature Pass

Uh oh!

ascend-robot commented Mar 28, 2026

CLA Signature Pass

Uh oh!

ascend-robot commented Mar 30, 2026

CLA Signature Pass

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ascend-robot commented Mar 30, 2026

CLA Signature Pass

Uh oh!

ascend-robot commented Mar 30, 2026

CLA Signature Pass

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants