This patch adds three separate per-process fault counters:
- Inode faults
  An inode fault happens when we've memory-mapped a file from disk
  and we end up having to load 1 page (4KB) of the file into memory.
- Zero faults
  Memory returned by mmap() is lazily zeroed out. Every time we have
  to zero out 1 page, we count a zero fault.
- CoW faults
  VM objects can be shared by multiple mappings that make their own
  unique copy iff they want to modify it. The typical reason here is
  memory shared between a parent and child process.
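A minimal sketch of the bookkeeping, with illustrative names
(PageFaultCounts, count_fault()); in the real kernel the counters live on
Process and get bumped from the page fault handler:

    // Sketch only: per-process fault counters bumped from the fault handler.
    #include <cstdio>

    struct PageFaultCounts {
        unsigned inode_faults { 0 }; // page loaded from a memory-mapped file
        unsigned zero_faults { 0 };  // page lazily zero-filled for mmap() memory
        unsigned cow_faults { 0 };   // private copy made of a shared page
    };

    enum class FaultKind { Inode, Zero, CopyOnWrite };

    struct Process {
        PageFaultCounts fault_counts;
    };

    void count_fault(Process& process, FaultKind kind)
    {
        switch (kind) {
        case FaultKind::Inode: ++process.fault_counts.inode_faults; break;
        case FaultKind::Zero: ++process.fault_counts.zero_faults; break;
        case FaultKind::CopyOnWrite: ++process.fault_counts.cow_faults; break;
        }
    }

    int main()
    {
        Process p;
        count_fault(p, FaultKind::Zero);
        count_fault(p, FaultKind::CopyOnWrite);
        printf("inode=%u zero=%u cow=%u\n", p.fault_counts.inode_faults,
            p.fault_counts.zero_faults, p.fault_counts.cow_faults);
    }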
Instead of allocating and populating a Copy-on-Write bitmap for each
Region up front, wait until we actually clone the Region for sharing
with another process.
In most cases, we never need any CoW bits and we save ourselves a lot
of kmalloc() memory and time.
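Roughly the idea, sketched with std::vector<bool> standing in for the
kernel's bitmap (Region and ensure_cow_map() here are illustrative, not the
real interfaces):

    // Sketch only: allocate the CoW bitmap lazily, on first clone of the Region.
    #include <cstddef>
    #include <memory>
    #include <vector>

    class Region {
    public:
        explicit Region(size_t page_count)
            : m_page_count(page_count)
        {
            // No CoW bitmap allocated here; most Regions never need one.
        }

        // Called when the Region is cloned for sharing (e.g. on fork()).
        void mark_all_pages_cow()
        {
            ensure_cow_map().assign(m_page_count, true);
        }

        bool should_cow(size_t page_index) const
        {
            if (!m_cow_map)
                return false; // never cloned, nothing to copy
            return (*m_cow_map)[page_index];
        }

    private:
        std::vector<bool>& ensure_cow_map()
        {
            if (!m_cow_map)
                m_cow_map = std::make_unique<std::vector<bool>>(m_page_count, false);
            return *m_cow_map;
        }

        size_t m_page_count { 0 };
        std::unique_ptr<std::vector<bool>> m_cow_map; // null until first clone
    };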
We were just blindly trusting that the bootloader would only give us
page-aligned memory regions. This is apparently not always the case,
so now we can try to repair those regions.
Fixes #601
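The repair amounts to rounding the region's base up and its end down to page
boundaries; a minimal standalone sketch:

    // Sketch only: clamp a physical memory region to page-aligned bounds.
    #include <cstdint>
    #include <cstdio>

    static constexpr uint64_t PAGE_SIZE = 4096;

    struct PhysicalRange {
        uint64_t base;
        uint64_t length;
    };

    // Round the base up and the end down so we only hand out whole pages.
    PhysicalRange page_align(PhysicalRange range)
    {
        uint64_t aligned_base = (range.base + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
        uint64_t end = range.base + range.length;
        uint64_t aligned_end = end & ~(PAGE_SIZE - 1);
        if (aligned_end <= aligned_base)
            return { aligned_base, 0 }; // too small to contain a full page
        return { aligned_base, aligned_end - aligned_base };
    }

    int main()
    {
        auto fixed = page_align({ 0x1234, 0x10000 });
        printf("base=%#llx length=%#llx\n",
            (unsigned long long)fixed.base, (unsigned long long)fixed.length);
    }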
We were always returning the full VM range of the partially-unmapped
Region to the range allocator. This caused us to re-use those addresses
for subsequent VM allocations.
This patch also skips creating a new VMObject in partial munmap().
Instead we just make split regions that point into the same VMObject.
This fixes the mysterious GCC ICE on large C++ programs.
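Conceptually, a partial munmap() now produces up to two remainder Regions
that point into the same VMObject at different offsets; a simplified sketch
with illustrative names:

    // Sketch only: split a Region around an unmapped hole without cloning
    // the underlying VMObject.
    #include <cstddef>
    #include <memory>
    #include <vector>

    struct VMObject {
        size_t size_in_pages { 0 };
    };

    struct Region {
        std::shared_ptr<VMObject> vmobject; // shared by every split piece
        size_t offset_in_vmobject { 0 };    // in pages
        size_t page_count { 0 };
    };

    // Remove [first_removed, first_removed + removed_count) from `region`
    // and return the surviving pieces (zero, one or two Regions).
    std::vector<Region> split_around_hole(Region const& region, size_t first_removed, size_t removed_count)
    {
        std::vector<Region> result;
        if (first_removed > 0)
            result.push_back({ region.vmobject, region.offset_in_vmobject, first_removed });
        size_t end_of_hole = first_removed + removed_count;
        if (end_of_hole < region.page_count)
            result.push_back({ region.vmobject,
                region.offset_in_vmobject + end_of_hole,
                region.page_count - end_of_hole });
        return result;
    }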
This simplifies the ownership model and makes Region easier to reason
about. Userspace Regions are now primarily kept by Process::m_regions.
Kernel Regions are kept in various OwnPtr<Region>s.
Regions now only ever get unmapped when they are destroyed.
This reverts commit 11896d0e26.
This caused a race where other processes using the same InodeVMObject
could end up accessing the newly-mapped physical page before we've
actually filled it with bytes from disk.
It would be nice to avoid these copies without breaking anything.
Remove the global hash tables and replace them with InlineLinkedLists.
This significantly reduces the kernel heap pressure from doing many
small mmap()'s.
Using a HashTable to track "all instances of Foo" is only useful if we
actually need to look up entries by some kind of index. And since they
are HashTable (not HashMap), the pointer *is* the index.
Since we have the pointer, we can just use it directly. Duh.
This increases sizeof(VMObject) by two pointers, but removes a global
table that had an entry for every VMObject, where the cost was higher.
It also avoids all the general hash tabling business when creating or
destroying VMObjects. Generally we should do more of this. :^)
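The underlying pattern is an intrusive list: each object carries its own
prev/next pointers, so tracking "all instances" costs two pointers per
object and no separate table. A self-contained sketch of the idea (the real
InlineLinkedList is a reusable template with more to it):

    // Sketch only: an intrusive "all instances" list.
    #include <cstdio>

    class VMObject {
    public:
        VMObject()
        {
            // Link ourselves into the global list on construction.
            m_next = s_all;
            if (s_all)
                s_all->m_prev = this;
            s_all = this;
        }

        ~VMObject()
        {
            // Unlink on destruction; no hash table involved.
            if (m_prev)
                m_prev->m_next = m_next;
            else
                s_all = m_next;
            if (m_next)
                m_next->m_prev = m_prev;
        }

        template<typename Callback>
        static void for_each(Callback callback)
        {
            for (auto* object = s_all; object; object = object->m_next)
                callback(*object);
        }

    private:
        static VMObject* s_all;
        VMObject* m_prev { nullptr };
        VMObject* m_next { nullptr };
    };

    VMObject* VMObject::s_all = nullptr;

    int main()
    {
        VMObject a, b, c;
        int count = 0;
        VMObject::for_each([&](VMObject&) { ++count; });
        printf("%d VMObjects\n", count); // prints 3
    }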
We were only doing this in Process::deallocate_region(), which meant
that kernel-only Regions never gave back their VM.
With this patch, we can start reusing freed-up address space! :^)
InodeVMObject is a VMObject with an underlying Inode in the filesystem.
AnonymousVMObject has no Inode.
I'm happy that InodeVMObject::inode() can now return Inode&, instead of
VMObject::inode() returning Inode*. :^)
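Roughly the shape of the split (heavily simplified, for illustration):

    // Sketch only: anonymous vs. inode-backed VMObjects.
    class Inode { };

    class VMObject {
    public:
        virtual ~VMObject() = default;
    };

    class AnonymousVMObject : public VMObject {
        // No inode; pages come from zero-fill and CoW.
    };

    class InodeVMObject : public VMObject {
    public:
        explicit InodeVMObject(Inode& inode)
            : m_inode(inode)
        {
        }

        // A reference, not a pointer: an InodeVMObject always has an Inode.
        Inode& inode() { return m_inode; }

    private:
        Inode& m_inode;
    };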
This wasn't really thought-through, I was just trying anything to see
if it would make WindowServer faster. This doesn't seem to make much of
a difference either way, so let's just not do it for now.
It's easy to bring back if we think we need it in the future.
KBuffers are now zero-filled on demand instead of up front. This means
that you can create a huge KBuffer and it will only take up VM, not
physical pages (until you access them.)
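The idea, sketched outside the kernel (LazyBuffer stands in for the
KBuffer's backing object; the real code allocates and maps physical pages
in the page fault handler):

    // Sketch only: reserve slots up front, commit zeroed pages on first touch.
    #include <cstddef>
    #include <memory>
    #include <vector>

    struct PhysicalPage {
        // Represents 4 KiB of zero-filled memory in the real kernel.
    };

    class LazyBuffer {
    public:
        explicit LazyBuffer(size_t page_count)
            : m_pages(page_count) // all slots empty: address space only, no RAM
        {
        }

        // Called from the fault handler when a page is first touched.
        void handle_zero_fault(size_t page_index)
        {
            if (!m_pages[page_index])
                m_pages[page_index] = std::make_unique<PhysicalPage>();
        }

        size_t committed_pages() const
        {
            size_t count = 0;
            for (auto const& page : m_pages) {
                if (page)
                    ++count;
            }
            return count;
        }

    private:
        std::vector<std::unique_ptr<PhysicalPage>> m_pages;
    };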
We were short-circuiting the page fault handler a little too eagerly
for page-not-present faults in kernel memory.
If the current page directory already has up-to-date mappings for kernel
memory, allow it to progress to checking for zero-fill conditions.
This will enable us to have lazily populated kernel regions.
This allows the page fault code to find the owning PageDirectory and
corresponding process for faulting regions.
The mapping is implemented as a global hash map right now, which is
definitely not optimal. We can come up with something better when it
becomes necessary.
Sometimes you're only interested in either user OR kernel regions but
not both. Let's break this into two functions so the caller can choose
what he's interested in.
If we were using a ProcessPagingScope to temporarily go into another
process's page tables, things would fall apart when hitting a kernel
NP fault, since we'd clone the kernel page directory entry into the
*currently active process's* page directory rather than cloning it
into the *currently active* page directory.
Generate a special page containing the "return from signal" trampoline code
on startup and then route signalled threads to it. This avoids a page
allocation in every process that ever receives a signal.
Region now has is_user_accessible(), which informs the memory manager how
to map these pages. Previously, we were just passing a "bool user_allowed"
to various functions and I'm not at all sure that any of that was correct.
All the Region constructors are now hidden, and you must go through one of
these helpers to construct a region:
- Region::create_user_accessible(...)
- Region::create_kernel_only(...)
That ensures that we don't accidentally create a Region without specifying
user accessibility. :^)
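Roughly what the helpers enforce (simplified sketch, not the real
signatures):

    // Sketch only: hide the constructor behind named factory functions so
    // user accessibility is always spelled out at the call site.
    #include <cstddef>
    #include <memory>

    class Region {
    public:
        static std::unique_ptr<Region> create_user_accessible(size_t size)
        {
            return std::unique_ptr<Region>(new Region(size, true));
        }

        static std::unique_ptr<Region> create_kernel_only(size_t size)
        {
            return std::unique_ptr<Region>(new Region(size, false));
        }

        bool is_user_accessible() const { return m_user_accessible; }

    private:
        Region(size_t size, bool user_accessible)
            : m_size(size)
            , m_user_accessible(user_accessible)
        {
        }

        size_t m_size { 0 };
        bool m_user_accessible { false };
    };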
This is obviously more readable. If we ever run into a situation where
ref count churn is actually causing trouble in the future, we can deal with
it then. For now, let's keep it simple. :^)
If we get an NP page fault in a process, and the fault address is in the
kernel address range (anywhere above 0xc0000000), we probably just need
to copy the page table info over from the kernel page directory.
The kernel doesn't allocate address space until it's needed, and when it
does allocate some, it only puts the info in the kernel page directory,
and any *new* page directories created from that point on. Existing page
directories need to be updated, and that's what this patch fixes.
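In other words, something along these lines (illustrative sketch; the entry
layout and constants are simplified):

    // Sketch only: fix a kernel-range NP fault by copying the kernel page
    // directory's entry into the faulting page directory.
    #include <cstddef>
    #include <cstdint>

    static constexpr size_t entries_per_directory = 1024;
    static constexpr uintptr_t kernel_base = 0xc0000000;

    struct PageDirectory {
        uint32_t entries[entries_per_directory] { };
    };

    // Returns true if the fault was resolved by copying the kernel mapping.
    bool fix_kernel_np_fault(PageDirectory const& kernel_directory,
        PageDirectory& current_directory, uintptr_t fault_address)
    {
        if (fault_address < kernel_base)
            return false; // not a kernel-range fault
        size_t index = (fault_address >> 22) & 0x3ff; // one PDE covers 4 MiB
        if (!kernel_directory.entries[index])
            return false; // kernel doesn't map it either; a genuine fault
        current_directory.entries[index] = kernel_directory.entries[index];
        return true;
    }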
Instead of PDE's and PTE's being weird wrappers around dword*, just have
MemoryManager::ensure_pte() return a PageDirectoryEntry&, which in turn has
a PageTableEntry* entries().
I've been trying to understand how things ended up this way, and I suspect
it was because I inadvertently invoked the PageDirectoryEntry copy ctor in
the original work on this, which must have made me very confused.
Anyways, now things are a bit saner and we can move forward towards a better
future, etc. :^)
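For illustration, the shape of the typed wrappers is roughly this
(simplified; the real classes carry more flags, and the real entries()
translates the physical address before using it):

    // Sketch only: typed entry classes instead of raw dword* wrappers.
    #include <cstdint>

    class PageTableEntry {
    public:
        bool is_present() const { return m_raw & 0x1; }
        uint32_t physical_page_base() const { return m_raw & 0xfffff000; }

    private:
        uint32_t m_raw { 0 };
    };

    class PageDirectoryEntry {
    public:
        bool is_present() const { return m_raw & 0x1; }

        // A present PDE points at a page table: an array of 1024 PTEs.
        PageTableEntry* entries() const
        {
            return reinterpret_cast<PageTableEntry*>(static_cast<uintptr_t>(m_raw & 0xfffff000));
        }

    private:
        uint32_t m_raw { 0 };
    };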
This makes no functional difference, but it makes it clear that
MemoryManager and PhysicalRegion take over the actual physical
page represented by this PhysicalPage instance.
This significantly reduces the pressure on the kernel heap when
allocating a lot of pages.
Previously at about 250MB allocated, the free page list would outgrow
the kernel's heap. Given that there is no longer a page list, this does
not happen.
The next barrier will be the kernel memory used by the page records for
in-use memory. This kicks in at about 1GB.
This makes it possible to run Serenity with more than 64 MB of RAM.
Because each physical page is represented by a PhysicalPage object, and such
objects are allocated using kmalloc_eternal(), more RAM means more pressure
on kmalloc_eternal(), so we're gonna need a better strategy for this.
But for now, let's just celebrate that we can use the 128 MB of RAM we've
been telling QEMU to run with. :^)
Since we transition to a new PageDirectory on exec(), we need a matching
RangeAllocator to go with the new directory. Instead of juggling this in
Process and MemoryManager, simply attach the RangeAllocator to the
PageDirectory instead.
Fixes#61.