Occasionally, the mmap syscall will crash with:
CR0: 0x80050033 CR3: 0xA000
CR2: 0x0 CR4: 0x350E20
RAX: 0x21403000 RBX: 0x0 RCX: 0x1006338
RDX: 0x3 RSI: 0x3F000 RDI: 0x0
RIP: 0x21AA RBP: 0x1198A10 RSP: 0x11989E8
SS: 0x10 CS: 0x8 DS: 0x23 FS: 0x0 GS: 0x0
FS BASE: 0x0 GS BASE: 0x6050
Machine exception: page_at: pt entry not user writable Data: 0x21403000
terminate called after throwing an instance of 'tinykvm::MemoryException'
what(): page_at: pt entry not user writable
with a callstack pointing to:
auto* page = memory.get_writable_page(addr & ~PageMask(), memory.expectedUsermodeFlags(), true, false);
collect_state_guest = master_vm.mmap_allocate(0x1000, 0x7, false);
tinykvm::page_at(master_vm.main_memory(), collect_state_guest, [] (uint64_t addr, uint64_t& entry, size_t size) {
// Make the page executable by the user (There is probably a better way to do this?)
entry = (entry & ~PDE64_NX) | PDE64_DIRTY;
});
// Emulate the relevant mmap
auto new_page = master_vm.mmap_allocate(258048, 3);
master_vm.memzero(new_page, 258048);
The snippet above is a reduced reproducer. The original symptom came from a userspace program executing mmap(0x0, 258048, prot=3, flags=22, vfd=-1) = 0x21403000. The collect_state page is an executable memory page that I'm allocating from the VMM. The issue "goes away" if you don't set PDE64_DIRTY in page_at; however, removing all of the PDE64_DIRTY flags from the original program still causes a crash in a (slightly later) mmap call.
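As a sanity check on the numbers in the crash dump, here is a small sketch of the address arithmetic. It assumes the usual x86-64 sizes of 4 KiB base pages and 2 MiB huge pages; the helper names are made up for illustration and are not tinykvm APIs.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: standard x86-64 sizes (4 KiB base pages, 2 MiB huge pages).
constexpr uint64_t kPageSize     = 0x1000;
constexpr uint64_t kHugePageSize = 0x200000;

// Values taken from the report: mmap length and returned address.
constexpr uint64_t kMmapLen  = 258048;
constexpr uint64_t kMmapAddr = 0x21403000;

// Hypothetical helper: base address of the 2 MiB region containing addr.
constexpr uint64_t huge_page_base(uint64_t addr) {
    return addr & ~(kHugePageSize - 1);
}

// Hypothetical helper: true if [addr, addr + len) lies in a single 2 MiB region.
constexpr bool within_one_huge_page(uint64_t addr, uint64_t len) {
    return len != 0 && huge_page_base(addr) == huge_page_base(addr + len - 1);
}

// The length matches RSI (0x3F000) in the register dump.
static_assert(kMmapLen == 0x3F000, "length should equal the RSI value");

// Runtime version of the same checks, for use from a test or main().
inline void check_reported_values() {
    assert(kMmapLen % kPageSize == 0);
    assert(within_one_huge_page(kMmapAddr, kMmapLen));
}
```

Under those assumptions the mapping is 63 base pages long and sits entirely inside the 2 MiB region starting at 0x21400000, so a single hugepage entry could cover all of it.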
I believe the issue is that the page_at call above resolves to a hugepage that the newly mmap'd region is embedded within. The mmap path sees that the entry for collect_state_guest has the dirty bit set and so decides must_be_zeroed = true, but the later get_writable_page then gets the hugepage, which has PDE64_NX cleared, and fails the flag check against vMemory::expectedUsermodeFlags.
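To make the suspected mismatch concrete, here is a toy model of the flag check. It is not tinykvm's code: the bit positions are the architectural x86-64 page-table bits, while expected_usermode_flags and entry_matches are hypothetical stand-ins for whatever get_writable_page actually compares.

```cpp
#include <cstdint>

// Architectural x86-64 page-table entry bits.
constexpr uint64_t kPresent   = 1ull << 0;
constexpr uint64_t kWritable  = 1ull << 1;
constexpr uint64_t kUser      = 1ull << 2;
constexpr uint64_t kDirty     = 1ull << 6;
constexpr uint64_t kNoExecute = 1ull << 63;

// Hypothetical: flags expected on a user page, with NX set unless the
// heap is configured as executable.
constexpr uint64_t expected_usermode_flags(bool executable_heap) {
    return kPresent | kWritable | kUser | (executable_heap ? 0 : kNoExecute);
}

// Hypothetical strict check: the permission bits must match exactly.
constexpr bool entry_matches(uint64_t entry, uint64_t expected) {
    constexpr uint64_t mask = kPresent | kWritable | kUser | kNoExecute;
    return (entry & mask) == expected;
}

// Same transformation as the page_at callback in the reproducer.
constexpr uint64_t make_executable(uint64_t entry) {
    return (entry & ~kNoExecute) | kDirty;
}
```

In this model, an entry that started out with the non-executable usermode flags stops matching them once the callback clears NX, and only matches again if the expected flags are computed with an executable heap. If the entry is a hugepage entry, every 4 KiB page inside it inherits that mismatch.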
This is maybe a case of me holding tinykvm wrong, and executable pages should somehow be allocated separately from non-executable pages? But mmap_allocate throws away prot, so I'm not sure how else I'm supposed to allocate code pages from the VMM. It seems like the executable_heap MachineOption configures !NX everywhere via vMemory::expectedUsermodeFlags, and so would either cause or hide this issue depending on, e.g., whether you have a dynamic ELF and whether you turn the option off. But even in the non-executable case it seems like you could get unlucky and have the initial machine-mapped .text page for your ELF coalesce with the first user-serviced mmap and get sad.
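For illustration only, this is roughly what honoring prot could look like if it were carried through to the page-table entry. pte_flags_for_prot is a hypothetical helper, not something tinykvm provides; the prot constants use the Linux values of PROT_READ, PROT_WRITE and PROT_EXEC, and the entry bits are the architectural x86-64 ones.

```cpp
#include <cstdint>

// Linux values of the mmap protection bits.
constexpr int kProtRead  = 0x1;
constexpr int kProtWrite = 0x2;
constexpr int kProtExec  = 0x4;

// Architectural x86-64 page-table entry bits.
constexpr uint64_t kPtePresent   = 1ull << 0;
constexpr uint64_t kPteWritable  = 1ull << 1;
constexpr uint64_t kPteUser      = 1ull << 2;
constexpr uint64_t kPteNoExecute = 1ull << 63;

// Hypothetical helper: derive user page-table flags from an mmap prot value.
// x86-64 paging has no separate read bit, so kProtRead does not change the result.
constexpr uint64_t pte_flags_for_prot(int prot) {
    uint64_t flags = kPtePresent | kPteUser;
    if (prot & kProtWrite)
        flags |= kPteWritable;
    if (!(prot & kProtExec))
        flags |= kPteNoExecute;
    return flags;
}
```

With this mapping, the prot=3 request from the failing mmap would get NX set, while the 0x7 used for the collect_state page would get NX cleared, which is why the two cannot share one hugepage entry without one of them ending up with the wrong flags.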
(The get_writable_page line quoted in the callstack is line 33 of tinykvm/lib/tinykvm/machine_utils.cpp at commit fe757a7.)