-
Notifications
You must be signed in to change notification settings - Fork 184
Virtual disk: add migration case with vhostuser disk #6690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
WalkthroughAdds a new libvirt migration test configuration file libvirt/tests/cfg/migration/migration_with_disk/migration_with_vhostuser.cfg and a new Python test module libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py. The config declares migration parameters, storage/disk attributes for a vhost-user disk, VM/network attributes, and a variant. The Python module implements end-to-end test logic: create backend images, start qemu-storage-daemon vhost-user services on source and remote, set SELinux/mmap settings, modify VM XML to attach a vhost-user disk, start and verify the VM, perform migration and post-migration validation, and cleanup services, sockets, and images. The config contains a line "no ppc64le" that appears to be invalid syntax. Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (3)
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py (3)
51-54: Consider using consistent image creation method.The source host uses
libvirt.create_local_disk("file", ...)while the remote host usesdd if=/dev/zero. While both work, using consistent methods improves maintainability.
113-115: Strengthen vhostuser disk validation.The current check only verifies the string "vhostuser" exists in the XML dump. Consider validating the specific disk configuration (target device, socket path) for more robust verification.
- # Check if vhostuser disk is accessible in VM - if "vhostuser" not in virsh.dumpxml(vm_name).stdout_text: - test.fail("Check vhostuser disk in VM failed") + # Check if vhostuser disk is present in VM configuration + vmxml_check = vm_xml.VMXML.new_from_dumpxml(vm_name) + disk_list = vmxml_check.get_disk_all() + vhostuser_found = any(d.get('type') == 'vhostuser' for d in disk_list.values()) + if not vhostuser_found: + test.fail("Check vhostuser disk in VM failed")
88-94: Consider more robust service readiness check.The
sleep 3at the end is a timing-based approach to wait for the daemon to be ready. For more reliable tests, consider checking for socket file existence or service status instead of a fixed sleep.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
libvirt/tests/cfg/migration/migration_with_disk/migration_with_vhostuser.cfg(1 hunks)libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.6)
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
31-31: Function call with shell=True parameter identified, security issue
(S604)
33-33: Function call with shell=True parameter identified, security issue
(S604)
36-36: Function call with shell=True parameter identified, security issue
(S604)
38-38: Function call with shell=True parameter identified, security issue
(S604)
81-81: Probable insecure usage of temporary file or directory: "/tmp/vhost.sock"
(S108)
84-84: Local variable device_target is assigned to but never used
Remove assignment to unused variable device_target
(F841)
85-85: Use of possibly insecure function; consider using ast.literal_eval
(S307)
175-175: Function call with shell=True parameter identified, security issue
(S604)
180-180: Function call with shell=True parameter identified, security issue
(S604)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Python 3.11
- GitHub Check: Python 3.12
- GitHub Check: Python 3.8
- GitHub Check: Python 3.9
🔇 Additional comments (5)
libvirt/tests/cfg/migration/migration_with_disk/migration_with_vhostuser.cfg (2)
34-35: Empty variant placeholder.The
with_precopyvariant is defined but contains no variant-specific parameters. If precopy is the only migration mode being tested, this is acceptable as a placeholder for future postcopy or other variants.
29-29: Based on my investigation, I found that:
libvirt's memory backing XML element: The correct XML element is
<memoryBacking>with subelements like<source type="memfd"/>and<access mode="shared"/>— matching the structure in line 29.Limitation: I cannot access the repository to verify the exact key mapping layer (how "mb" is converted to XML), as the repository clone failed.
The structure of the vm_attrs appears consistent with libvirt's memoryBacking configuration, but without access to the framework's code that processes vm_attrs, I cannot definitively confirm whether "mb" is the correct key convention used by the testing framework.
Verify the key mapping in vm_xml processing code.
The
vm_attrsuses"mb"for memory backing configuration. While the internal structure (source_type,access_mode) matches libvirt's<memoryBacking>subelements, direct codebase verification is needed to confirm that"mb"is the correct key recognized by the testing framework's XML generator.libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py (3)
20-39: LGTM with minor observation.The function correctly sets up the vhost socket service on the source host. The use of
shell=Trueis acceptable in this test framework context where command strings contain complex shell constructs.
156-187: LGTM!The cleanup function properly handles all resources: VM destruction, XML restoration, connection cleanup, and file/service cleanup on both local and remote hosts. The use of
ignore_status=Trueensures cleanup continues even if individual steps fail.
189-211: LGTM!The main execution flow follows good practices: version check, variable initialization, XML backup, and a try/finally block ensuring cleanup runs regardless of test outcome.
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
Outdated
Show resolved
Hide resolved
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
Outdated
Show resolved
Hide resolved
d3ec636 to
8ca0104
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py (1)
80-80: Consider using a scoped, non‑/tmp socket pathUsing a fixed
/tmp/vhost.sockrisks collisions or stale leftovers because/tmpis world‑writable and shared across processes/tests. To keep things isolated, consider deriving a unique path (e.g. includevm_name, PID, or a UUID) under a test‑specific directory, such as something based ondata_dir.get_data_dir()or another per‑test temp dir.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
libvirt/tests/cfg/migration/migration_with_disk/migration_with_vhostuser.cfg(1 hunks)libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- libvirt/tests/cfg/migration/migration_with_disk/migration_with_vhostuser.cfg
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-24T08:01:27.899Z
Learnt from: hholoubk
Repo: autotest/tp-libvirt PR: 6579
File: libvirt/tests/src/sriov/vIOMMU/iommu_device_lifecycle.py:95-97
Timestamp: 2025-09-24T08:01:27.899Z
Learning: In the libvirt test framework used in tp-libvirt, VM cleanup including destroying running VMs is handled by the teardown method (test_obj.teardown_iommu_test()) called in the finally block, so explicit VM destroy calls on timeout are not necessary according to the maintainers.
Applied to files:
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
🪛 Ruff (0.14.6)
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
30-30: Function call with shell=True parameter identified, security issue
(S604)
32-32: Function call with shell=True parameter identified, security issue
(S604)
35-35: Function call with shell=True parameter identified, security issue
(S604)
37-37: Function call with shell=True parameter identified, security issue
(S604)
80-80: Probable insecure usage of temporary file or directory: "/tmp/vhost.sock"
(S108)
173-173: Function call with shell=True parameter identified, security issue
(S604)
178-178: Function call with shell=True parameter identified, security issue
(S604)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Python 3.8
- GitHub Check: Python 3.9
- GitHub Check: Python 3.11
- GitHub Check: Python 3.12
🔇 Additional comments (3)
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py (3)
72-83: Setup flow and parameter parsing look consistent and safeUsing
ast.literal_evalfordisk_dictandvm_attrsis much safer thaneval, and the VM XML manipulation (setup_attrs,create_vm_device_by_type,add_device,sync) matches existing tp‑libvirt patterns. The initial boot/login plus vhostuser presence check viavirsh.dumpxmlprovides a reasonable pre‑migration sanity gate.Also applies to: 100-110
124-148: Post‑migration in‑guest verification is solidSwitching
vm.connect_urito the destination, recreating the serial console, checkinglsblkfordevice_target, and then doing add_data_to_vm_diskwrite are good end‑to‑end checks that the vhostuser disk survives migration and remains usable. Thefinallyblock closingvm_sessionand restoringvm.connect_urikeeps the connection handling tidy.
154-185: Overall test orchestration and cleanup are well‑structuredThe
runfunction’s structure—setup_test()→run_migration()→verify_test()withcleanup_test()infinally—aligns with other migration tests. Backing up the inactive XML and restoring it in cleanup, plus guardingvm.destroy()withvm.is_alive(), should leave the VM definition in a clean state after the test.Once the service lifecycle points above are addressed, the test looks ready to go.
Also applies to: 187-208
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
Outdated
Show resolved
Hide resolved
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
Show resolved
Hide resolved
8ca0104 to
822da95
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py (2)
27-37: SELinux booleandomain_can_mmap_filesis changed persistently on both hosts without restoreBoth locally (Line 36–37) and remotely (Line 57–58) you run
setsebool domain_can_mmap_files 1 -P, which permanently flips this SELinux boolean for the entire system and survives reboots. The test never restores the previous value, so it leaves a host‑wide configuration change behind and may affect other tests or users.Consider one of the following instead:
- Treat
domain_can_mmap_files=onas an environmental precondition and document it in the test/config, removing thesetseboolcalls from the test, or- Capture the original value via
getsebool domain_can_mmap_fileson source/remote and restore it incleanup_test(or a higher‑level teardown) so the host returns to its prior state.As written this remains a major environment side‑effect.
Also applies to: 49-58
164-174:pkill -f qemu-storage-daemonkills all daemons on both hostsIn
cleanup_test, the following commands:
- Line 165:
remote.run_remote_cmd("pkill -f qemu-storage-daemon", ...)- Line 169:
process.run("pkill -f qemu-storage-daemon", ...)will terminate every
qemu-storage-daemonprocess on the remote and local hosts, not just the instance(s) started by this test. This can interfere with parallel tests or any other services using qemu‑storage‑daemon, which is the same concern previously raised forkillall.Since you already start the daemon via
systemd-run, a more targeted approach would be to:
- Give the transient unit an explicit, unique name via
--unit=<name>instart_sock_service_cmd, and- In
cleanup_test, stop only that unit on each host, e.g.systemctl stop <name>(withignore_status=True), then drop thepkillcalls entirely.That keeps the cleanup scoped strictly to what this test started instead of indiscriminately killing all daemons.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
libvirt/tests/cfg/migration/migration_with_disk/migration_with_vhostuser.cfg(1 hunks)libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- libvirt/tests/cfg/migration/migration_with_disk/migration_with_vhostuser.cfg
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-24T08:01:27.899Z
Learnt from: hholoubk
Repo: autotest/tp-libvirt PR: 6579
File: libvirt/tests/src/sriov/vIOMMU/iommu_device_lifecycle.py:95-97
Timestamp: 2025-09-24T08:01:27.899Z
Learning: In the libvirt test framework used in tp-libvirt, VM cleanup including destroying running VMs is handled by the teardown method (test_obj.teardown_iommu_test()) called in the finally block, so explicit VM destroy calls on timeout are not necessary according to the maintainers.
Applied to files:
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
🧬 Code graph analysis (1)
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py (1)
libvirt/tests/src/migration/migrate_vm.py (1)
run_remote_cmd(759-776)
🪛 Ruff (0.14.6)
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
30-30: Function call with shell=True parameter identified, security issue
(S604)
32-32: Local variable cmd_output is assigned to but never used
Remove assignment to unused variable cmd_output
(F841)
32-32: Function call with shell=True parameter identified, security issue
(S604)
35-35: Function call with shell=True parameter identified, security issue
(S604)
37-37: Function call with shell=True parameter identified, security issue
(S604)
55-55: Local variable remote_vsock_service_id is assigned to but never used
Remove assignment to unused variable remote_vsock_service_id
(F841)
78-78: Probable insecure usage of temporary file or directory: "/tmp/vhost.sock"
(S108)
169-169: Function call with shell=True parameter identified, security issue
(S604)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Python 3.12
- GitHub Check: Python 3.8
- GitHub Check: Python 3.11
- GitHub Check: Python 3.9
🔇 Additional comments (1)
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py (1)
61-151: Overall migration flow and VM‑side verification look solidThe high‑level orchestration in
run()—setup_testto configure the vhost-user backend and disk,migration_obj.run_migration(),verify_testto switch to the destination and exercise the disk vialsblk+dd_data_to_vm_disk, andmigration_obj.verify_default()—is coherent and matches existing migration patterns in tp-libvirt. The use ofast.literal_evalfordisk_dictandvm_attrs, plus serial console login on the destination, is a nice touch for safety and robustness.Aside from the SELinux/daemon‑lifecycle concerns noted elsewhere, the core test logic reads correctly.
Also applies to: 178-197
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
Outdated
Show resolved
Hide resolved
822da95 to
805abd7
Compare
|
@cliping Can you help me to review this migration PR? |
cliping
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Others LGTM.
| vm_session = vm.wait_for_serial_login(timeout=120) | ||
|
|
||
| try: | ||
| # Verify vhostuser disk is still accessible after migration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can use 'check_disk_after_mig = yes' in cfg to verify vhostuser disk.
migration_obj.verify_default() -> self.migration_test.post_migration_check()
post_migration_check() in avocado-vt/virttest/migration.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change makes the script more clear. I've added check_disk_after_mig and moved the duplicated function now. Thanks.
| test.log.info("Cleanup steps for vhostuser disk migration.") | ||
| if vm.is_alive(): | ||
| vm.destroy(gracefully=False) | ||
| vmxml_backup.sync() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In migration_obj.cleanup_connection(), vm xml will be recovered and the vm will be cleaned up on the source and target hosts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, these steps have already been included in function. So removed them now.
Automate case: RHEL-200524: Migrate VM with a vhostuser disk Signed-off-by: Meina Li <[email protected]>
805abd7 to
15ad416
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py (1)
83-89: Unresolved: Tighten qemu‑storage‑daemon lifecycle to avoid killing unrelated processes.As noted in the previous review,
pkill -f qemu-storage-daemon(lines 126, 130) terminates all qemu-storage-daemon instances system-wide, not just the ones started by this test. This breaks test isolation and can interfere with other tests or legitimate services on the same host.The root cause is that the
systemd-runcommand (lines 83-89) doesn't specify a unit name, making it impossible to stop only this test's service.Recommended solutions:
Option 1: Use explicit systemd unit names (preferred)
# In setup_test, around line 83: import time unit_name = f"vhost-test-{vm_name}-{int(time.time())}" remote_unit_name = f"vhost-test-remote-{vm_name}-{int(time.time())}" start_sock_service_cmd = ( f'systemd-run --unit={unit_name} --uid qemu --gid qemu /usr/bin/qemu-storage-daemon' ' --blockdev \'{"driver":"file","filename":"%s","node-name":"libvirt-1-storage",...}\'' ... ) % (image_path, sock_path) # Pass unit names to cleanup or store them for later use # In cleanup_test: process.run(f"systemctl stop {unit_name}", ignore_status=True, shell=True) remote.run_remote_cmd(f"systemctl stop {remote_unit_name}", params, ignore_status=True)Option 2: Track daemon PIDs
Capture the PID from the process and kill only that specific PID during cleanup.Based on learnings, VM cleanup is handled by the framework, but this comment is about qemu-storage-daemon processes which are separate from VMs and require explicit cleanup.
Also applies to: 126-127, 130-130
🧹 Nitpick comments (2)
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py (2)
31-31: Remove leftover method chain.The
.stdout_text.strip()call serves no purpose since the result isn't assigned. This appears to be leftover from removing the unusedcmd_outputvariable.🔎 Suggested fix
- process.run(start_sock_service_cmd, ignore_status=False, shell=True).stdout_text.strip() + process.run(start_sock_service_cmd, ignore_status=False, shell=True)
109-110: Consider a more robust vhost-user disk presence check.The string search
"vhostuser" not in virsh.dumpxml(vm_name).stdout_textis fragile and could produce false positives/negatives if XML formatting changes or if "vhostuser" appears in comments or other attributes.Consider parsing the XML and checking for the specific disk device type or source attributes:
vmxml_check = vm_xml.VMXML.new_from_dumpxml(vm_name) disk_devices = vmxml_check.get_devices("disk") vhostuser_disk_found = any( hasattr(disk.source, "type") and disk.source.type == "unix" and "vhost" in str(disk) for disk in disk_devices ) if not vhostuser_disk_found: test.fail("Check vhostuser disk in VM failed")This approach explicitly checks disk device properties rather than relying on substring matching.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
libvirt/tests/cfg/migration/migration_with_disk/migration_with_vhostuser.cfglibvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
🚧 Files skipped from review as they are similar to previous changes (1)
- libvirt/tests/cfg/migration/migration_with_disk/migration_with_vhostuser.cfg
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: yanglei-rh
Repo: autotest/tp-libvirt PR: 6675
File: libvirt/tests/cfg/virtual_network/qemu/nic_bonding.cfg:16-16
Timestamp: 2025-12-12T10:00:16.768Z
Learning: In tp-libvirt test cfg files, the cartesian config parser supports variable substitution using `${variable_name}` syntax. Variables defined in the cfg file (e.g., `filesize = 4000`) are automatically substituted into strings that reference them (e.g., `count=${filesize}`) before the Python test code receives the parameters.
📚 Learning: 2025-09-24T08:01:27.899Z
Learnt from: hholoubk
Repo: autotest/tp-libvirt PR: 6579
File: libvirt/tests/src/sriov/vIOMMU/iommu_device_lifecycle.py:95-97
Timestamp: 2025-09-24T08:01:27.899Z
Learning: In the libvirt test framework used in tp-libvirt, VM cleanup including destroying running VMs is handled by the teardown method (test_obj.teardown_iommu_test()) called in the finally block, so explicit VM destroy calls on timeout are not necessary according to the maintainers.
Applied to files:
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
📚 Learning: 2025-11-18T01:20:50.891Z
Learnt from: qiankehan
Repo: autotest/tp-libvirt PR: 6654
File: libvirt/tests/src/virtual_network/qemu/boot_with_multiqueue.py:56-58
Timestamp: 2025-11-18T01:20:50.891Z
Learning: In the tp-libvirt project, when calling session command methods and the status code is not needed, prefer using `session.cmd_output()` over `session.cmd_status_output()` to avoid unused variables.
Applied to files:
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
📚 Learning: 2025-11-03T13:30:16.458Z
Learnt from: smitterl
Repo: autotest/tp-libvirt PR: 6531
File: libvirt/tests/src/virtual_network/qemu/pktgen_burst_mode_test.py:64-67
Timestamp: 2025-11-03T13:30:16.458Z
Learning: In tp-libvirt tests, prefer using a list of cleanup lambda functions that are appended when actions requiring cleanup are performed, then executed in a finally block with individual error handling. Example: `cleanup_actions.append(lambda: undo_action())` followed by iterating through the list in reverse order in the finally block.
Applied to files:
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
🪛 Ruff (0.14.10)
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
29-29: Function call with shell=True parameter identified, security issue
(S604)
31-31: Function call with shell=True parameter identified, security issue
(S604)
34-34: Function call with shell=True parameter identified, security issue
(S604)
36-36: Function call with shell=True parameter identified, security issue
(S604)
77-77: Probable insecure usage of temporary file or directory: "/tmp/vhost.sock"
(S108)
130-130: Function call with shell=True parameter identified, security issue
(S604)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Python 3.9
- GitHub Check: Python 3.11
- GitHub Check: Python 3.8
- GitHub Check: Python 3.12
Automate case:
RHEL-200524: Migrate VM with a vhostuser disk
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.