We had some bad line continuations of longer string literals using a
pattern like:
'foo"
."bar'
I.e., 'line1"."line2' instead of the actually working 'line1'.'line2'
concatenation.
This still resulted in valid Perl by luck, making it go unnoticed, but
the resulting string was rather broken, as it included the newline and
the ." code part of the line continuation.
I noticed them due to the bad indentation still using tabs, which
perltidy left alone since it does not touch string literals.
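For illustration, a minimal sketch of the broken pattern next to the
working concatenation (hypothetical variable names):

    my $broken = 'foo"
        ."bar';   # one literal containing: foo" + newline + spaces + ."bar
    my $fixed = 'foo'
        . 'bar';  # actual concatenation, yields: foobar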
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Clarify that the target node is meant: while we printed the node name
itself, a user might miss that it refers to the target node when
reading this error, especially with bigger clusters and rather
similar node names like e.g. pve1, ..., pve11, pve12, ..., pve21.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
In Proxmox VE 9, the default behavior for VirtIO network devices is to
inherit the MTU from the bridge. This means that most migrations are
potentially problematic when the nets-host-mtu parameter is not set,
see commit 20c91f7f ("migration: preserve host_mtu for virtio-net
devices"). While setting the parameter could be avoided in some cases,
the information about which MTU the target node's bridges have is not
readily available. Upgrading is already required to avoid the actually
problematic cases, so just tell people to upgrade in all cases when
the target does not yet support preserving the VirtIO-net MTU.
Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250909091918.32254-2-f.ebner@proxmox.com
Until QEMU warns about this itself, inform the users here. Commit
message below copied from [1].
If a virtual machine is set up with an intel-iommu device, QEMU
allocates and maps the (virtual) I/O address space (IOAS) for a VFIO
passthrough device with iommufd.
In case of a mismatch between the address width of the host CPU and
the host IOMMU, the guest physical address space (GPAS) and
memory-type range registers (MTRRs) are set up to the host CPU's
address width, which causes the IOAS to be allocated and mapped
outside of the IOMMU's maximum
guest address width (MGAW) and causes the following error from QEMU
(the error message is copied from the user forum [0]):
kvm: vfio_container_dma_map(0x5c9222494280, 0x380000000000, 0x10000, 0x78075ee70000) = -22 (Invalid argument)
[0]: https://forum.proxmox.com/threads/169586/page-3#post-795717
[1]: https://lore.proxmox.com/pve-devel/20250902112307.124706-5-d.kral@proxmox.com/
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Since QEMU 9.2 [0], the default I/O address space bit width was raised
from 39 bits to 48 bits for the Intel vIOMMU driver, which makes the
aw-bits check introduced in [1] trip for host CPUs with less than 48
bits of physical address width from QEMU 9.2 onwards:
vfio 0000:XX:YY.Z: Failed to set vIOMMU: aw-bits 48 > host aw-bits 39
For VFIO devices where a vIOMMU is in-use, QEMU fetches the IOVA ranges
with the iommufd ioctl IOMMU_IOAS_IOVA_RANGES or the vfio_iommu_type1's
VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE info, so 'phys-bits' doesn't change
the behavior of the check.
Therefore, expose the 'aw-bits' option of the intel-iommu and
virtio-iommu QEMU drivers to allow users to set the value.
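For illustration, on the QEMU level this corresponds to the device
property, along the lines of (sketch only; the Proxmox VE
configuration syntax for exposing it may differ):
> -device intel-iommu,aw-bits=39
which lets the check pass on a host with a 39-bit physical address
width.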
[0] qemu ddd84fd0c1 ("intel_iommu: Set default aw_bits to 48 starting from QEMU 9.2")
[1] qemu 77f6efc0ab ("intel_iommu: Check compatibility with host IOMMU capabilities")
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Link: https://lore.proxmox.com/20250905141529.215689-1-d.kral@proxmox.com
For VirtIO network devices, it is necessary to preserve the values and
presence of the host_mtu setting when restoring a snapshot. See commit
"migration: preserve host_mtu for virtio-net devices" for details.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Link: https://lore.proxmox.com/20250904124113.81772-7-f.ebner@proxmox.com
The get_current_qemu_machine() call already depends on the virtual
machine running, so not being able to obtain the PID is very
unexpected. Quietly not including the running CPU in the snapshot can
lead to not being able to restore the snapshot later, so die early
instead.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Link: https://lore.proxmox.com/20250904124113.81772-6-f.ebner@proxmox.com
The virtual hardware is generated differently (at least for i440fx
machines) when host_mtu is set or not set on the netdev command line
[0]. When the MTU is the same value as the default 1500, Proxmox VE
did not add a host_mtu parameter. This is problematic for migration
where host_mtu is present on one end of the migration, but not on the
other [1]. Moreover, the effective setting in the guest (state) will
still be the host_mtu from the source side, even if a different value
is used for host_mtu on the target instance's command line. This will
not lead to an error loading the migration stream in QEMU, but having
a larger host_mtu than the bridge MTU is still problematic for certain
network traffic like
> iperf3 -c 10.10.10.11 -u -l 2k
when host_mtu=9000 and bridge MTU=1500.
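For illustration, the difference boils down to whether the generated
device line carries the property at all, roughly (sketch, other
arguments elided):
> -device virtio-net-pci,netdev=net0,host_mtu=1500,...
versus the same line without any host_mtu=... argument.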
Pass the values from the source to the target during migration to be
able to preserve them.
[0]: https://bugzilla.redhat.com/show_bug.cgi?id=1449346
[1]: https://forum.proxmox.com/threads/live-vm-migration-fails.169537/post-796379
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Link: https://lore.proxmox.com/20250904124113.81772-4-f.ebner@proxmox.com
The virtual hardware is generated differently (at least for i440fx
machines) when host_mtu is set or not set on the netdev command line
[0]. When the MTU is the same value as the default 1500, Proxmox VE
did not add a host_mtu parameter. This is problematic for migration
where host_mtu is present on one end of the migration, but not on the
other [1]. Moreover, the effective setting in the guest (state) will
still be the host_mtu from the source side, even if a different value
is used for host_mtu on the target instance's command line. This will
not lead to an error loading the migration stream in QEMU, but having
a larger host_mtu than the bridge MTU is still problematic for certain
network traffic like
> iperf3 -c 10.10.10.11 -u -l 2k
when host_mtu=9000 and bridge MTU=1500. Starting a VM cold with such a
configuration is already prohibited, so also prevent it for migration.
Add the necessary parameter for VM start to allow preserving the
values going forward.
[0]: https://bugzilla.redhat.com/show_bug.cgi?id=1449346
[1]: https://forum.proxmox.com/threads/live-vm-migration-fails.169537/post-796379
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Link: https://lore.proxmox.com/20250904124113.81772-3-f.ebner@proxmox.com
The virtual hardware is generated differently (at least for i440fx
machines) when host_mtu is set or not set on the netdev command line
[0]. When the MTU is the same value as the default 1500, Proxmox VE
did not add a host_mtu parameter. This is problematic for migration
where host_mtu is present on one end of the migration, but not on the
other [1].
Always set the host_mtu parameter starting with machine version
10.0+pve1 to avoid this issue going forward. Handling migrations with
older machine versions is more involved and will be done in separate
patches. Thanks to Stefan Hanreich and Fabian Grünbichler for
discussing this with me!
Since print_netdevice_full() is also called for hotplug, it cannot
always use the $version_guard helper and needs to fall back to
min_version() in that case.
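A rough sketch of that fallback ($machine_version standing in for
however the hotplug path obtains the version; illustrative only, not
the exact code):

    my $use_host_mtu = defined($version_guard)
        ? $version_guard->(10, 0, 1)               # full command-line build
        : min_version($machine_version, 10, 0, 1); # hotplug path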
[0]: https://bugzilla.redhat.com/show_bug.cgi?id=1449346
[1]: https://forum.proxmox.com/threads/live-vm-migration-fails.169537/post-796379
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Link: https://lore.proxmox.com/20250904124113.81772-2-f.ebner@proxmox.com
To provide some more context for the users reading these, also drop
the entries that just reference refactoring/preparation commits.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
The throttle block driver does not support issuing ioctls. While the
driver could be patched in QEMU to pass along ioctls to the child,
that seems like a hack, because with scsi-block, throttle limits
already do not apply. This was already the case with '-drive' in
Proxmox VE 8.
Note that live mirroring would require special handling to place the
target below a throttle node with the correct node name, but it is
already not supported for such disks:
> # qm disk move 103 scsi1 lvm
> unable to parse volume ID '/dev/sdg'
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250814164529.694979-1-f.ebner@proxmox.com
The throttle node is generated later above the alloc-track block node,
so generating the alloc-track backing block node needs to happen with
'no-throttle' to avoid a duplicate node name and an additional
throttle node in the graph.
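Roughly (sketch): the intended graph is
> throttle -> alloc-track -> backing image node
while without 'no-throttle', generating the backing node would try to
insert a second throttle node with the same node name above it.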
Reported in the community forum:
https://forum.proxmox.com/threads/169766/
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
There's no need to have a separate Makefile and directory for these,
it's just files being copied. The missing handling of $PACKAGE in the
old Makefile resulted in the files being installed in the wrong place
when building the source package.
Reported-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Link: https://lore.proxmox.com/20250814084409.182322-1-f.gruenbichler@proxmox.com
Only scsi-cd and scsi-hd have a 'write-cache' option, scsi-block and
scsi-generic do not.
Setting the 'cache' setting on such a drive in the VM configuration is
still valid and should not be prohibited, because it affects the
blockdev settings.
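For illustration, QEMU decomposes the drive-level cache modes into a
device-level and a node-level part, roughly (sketch):
> cache=none         -> write-cache=on  (device) + cache.direct=on   (node)
> cache=writethrough -> write-cache=off (device) + cache.direct=off  (node)
> cache=unsafe       -> write-cache=on  (device) + cache.no-flush=on (node)
For scsi-block and scsi-generic, only the node-level part can be
applied.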
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/all/20250813085845.25516-2-f.ebner@proxmox.com
The drive_is_read_only() helper only applies to '-drive', but not
'-blockdev' and is only used in a single place. Inline it to avoid
accidental usages popping up in the future.
This also gets rid of a hidden dependency from Drive to QemuConfig.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/all/20250812143900.138723-5-f.ebner@proxmox.com
With ide-hd, the inserted block node needs to be marked as writable
too, but -blockdev will complain if it's marked as writable but the
actual backing device is read-only (e.g. read-only base LV).
IDE/SATA do not support being configured as read-only; the closest
alternative is using ide-cd instead of ide-hd, with most of the code and
configuration shared in QEMU.
Since a template is never actually started, the front-end device is
never accessed. The backup only accesses the inserted block node, so
it does not matter for the backup if the type is 'ide-cd' instead.
The same issue did not manifest for '-drive', because the '-snapshot'
option is used for template backups. The '-snapshot' option does not
affect '-blockdev', from 'man kvm':
> snapshot is incompatible with -blockdev
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/all/20250812143900.138723-4-f.ebner@proxmox.com
This is in preparation to remove the hidden dependency from the Drive
module to QemuConfig.
Note that the drive_is_read_only() check can be replaced with $is_template
for OVMF, because the helper only behaves differently for IDE and
SATA, but not for EFI disks.
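A minimal sketch of the replacement (assuming $is_template was
obtained via PVE::QemuConfig->is_template($conf); variable names are
illustrative):

    # before:
    my $read_only = drive_is_read_only($conf, $drive);
    # after, equivalent for EFI disks since only IDE/SATA differ:
    my $read_only = $is_template;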
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/all/20250812143900.138723-2-f.ebner@proxmox.com
Re-using the detach() helper has the side effect of avoiding logging
errors to syslog for automatically removed child nodes. This should be
the case for all file nodes here. None are explicitly added via
blockdev-add and thus QEMU already auto-removes them.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/all/20250812115652.79330-4-f.ebner@proxmox.com
Without passing 'noerr' to mon_cmd(), errors are logged to the system
journal. In attach() and detach(), there are two mon_cmd() calls that
are expected to fail in some scenarios for which the errors should not
be logged.
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/all/20250812115652.79330-3-f.ebner@proxmox.com
Require that snapshot-as-volume-chain qcow2 images are always used in
combination with '-blockdev', rather than '-drive'. With '-drive', the
'discard-no-unref' option is not set and the fragmentation can lead to
the same issue that, for '-blockdev', was solved by commit a3a9a2ab
("fix #6543: use qcow2 'discard-no-unref' option when using
snapshot-as-volume-chain").
While it would be possible to set the flag for '-drive' too, the
snapshot-as-volume-chain feature already only works with machine type
>= 10.0, see commit 6b2b45fd ("snapshot create/delete: die early for
snapshot-as-volume-chain for pre-10.0 machine version") and it's only
tested for those. Avoid accidents and other unknown issues by being
strict and prohibiting usage without '-blockdev'.
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/all/20250811135154.253817-1-f.ebner@proxmox.com
As reported in the community forum [0], a running VM with pre-10.0
machine version using a storage with snapshot-as-volume-chain will run
into issues when creating a snapshot. Similarly, deleting the snapshot
of such a VM would fail. Having '-blockdev' is a hard requirement for
the implementation of the snapshot-as-volume-chain feature for running
VMs, so die and suggest upgrading the machine version.
[0]: https://forum.proxmox.com/threads/lvm-thick-with-iscsi-pve-9-0-3.169319/
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Hannes Laimer <h.laimer@proxmox.com>
Link: https://lore.proxmox.com/20250807104832.51784-1-f.ebner@proxmox.com
At this point, the dbus-vmstate helper is not expected to be running
anymore.
Using $noerr here didn't really make sense, as it never should be
running anymore at this point, plus the VM should also be stopped;
thus the "happy" path here is for removing the dbus-vmstate helper to
fail.
It resulted in another spurious warning _after_ a migration on the
source node.
Fixes: 067a0f55 ("vmstate: improve cleaning up dbus-vmstate and avoid spurious warning")
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
Link: https://lore.proxmox.com/20250805095828.301188-1-c.heiss@proxmox.com
First, move this to vm_stop_cleanup(), which is a better fit. It gets
called by the cleanup API method in case of an unclean shutdown or a
shutdown from inside the guest.
In every case, the dbus-vmstate daemon should _never_ be running at this
point, as it is started only before migration and stopped directly after
migration, before vm_stop_cleanup() is even called. So it should only be
left running in case of a crash during migration.
Calling it anyway here ensures that the daemon is always (cleanly)
shut down. As dbus-vmstate is part of the VM scope unit, that scope
would tear it down too as a last resort.
Fixes the following spurious warning when a VM was shut down from
inside the guest:
`failed to retrieve org.qemu.VMState1 owners: org.freedesktop.DBus.Error.NameHasNoOwner: Could not get owners of name 'org.qemu.VMState1': no such name`
Reported-by: Hannes Duerr <h.duerr@proxmox.com>
Reported-by: Maximiliano Sandoval <m.sandoval@proxmox.com>
Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
Link: https://lore.proxmox.com/20250804133002.1625925-1-c.heiss@proxmox.com
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
We actually query whether there are any guests in a positive affinity
rule with the to-be-migrated VM. While that normally means they will
be migrated, it doesn't have to be the case (e.g., node constraints
might interfere here). Also, "comigrated" is not as widely used as
"dependencies", so the latter might be easier to understand for
non-native speakers or users (vs. devs, to whom these details tend to
leak).
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>