Replace the AK::String used for Region::m_name with a KString.
This seems beneficial across the board, but as a specific data point,
it reduces time spent in sys$set_mmap_name() by ~50% on test-js. :^)
Problem:
- `static` variables consume memory and sometimes are less
optimizable.
- `static const` variables can be `constexpr`, usually.
- `static` function-local variables require an initialization check
every time the function is run.
Solution:
- If a global `static` variable is only used in a single function then
move it into the function and make it non-`static` and `constexpr`.
- Make all global `static` variables `constexpr` instead of `const`.
- Change function-local `static const[expr]` variables to be just
`constexpr`.
By constraining two implementations, the compiler will select the best
fitting one. All this will require is duplicating the implementation and
simplifying for the `void` case.
This constraining also informs both the caller and compiler by passing
the callback parameter types as part of the constraint
(e.g.: `IterationFunction<int>`).
Some `for_each` functions in LibELF only take functions which return
`void`. This is a minimal correctness check, as it removes one way for a
function to incompletely do something.
There seems to be a possible idiom where inside a lambda, a `return;` is
the same as `continue;` in a for-loop.
Currently, when passing buffers into VirtIOQueues, we use scatter-gather
lists, which contain an internal vector of buffers. This vector is
allocated, filled and the destroy whenever we try to provide buffers
into a virtqueue, which would happen a lot in performance cricital code
(the main transport mechanism for certain paravirtualized devices).
This commit moves it over to using VirtIOQueueChains and building the
chain in place in the VirtIOQueue. Also included are a bunch of fixups
for the VirtIO Console device, making it use an internal VM::RingBuffer
instead.
We want to move this out of the AHCI subsystem into the VM system,
since other parts of the kernel may need to perform scatter-gather IO.
We rename the current VM::ScatterGatherList impl that's used in the
virtio subsystem to VM::ScatterGatherRefList, since its distinguishing
feature from the AHCI scatter-gather list is that it doesn't own its
buffers.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
This allows converting a single virtual buffer into its non-physically
contiguous parts, this is especially useful for DMA-based devices that
support scatter/gather-like functionality, as it eliminates the need
to clone outgoing buffers into one physically contiguous buffer.
Alot of code is shared between i386/i686/x86 and x86_64
and a lot probably will be used for compatability modes.
So we start by moving the headers into one Directory.
We will probalby be able to move some cpp files aswell.
Increase type-safety moving the MemoryManager APIs which take a
Region::Access to actually use that type instead of a `u8`.
Eventually the actually m_access can be moved there as well, but
I hit some weird bug where it wasn't using the correct operators
in `set_access_bit(..)` even though it's declared (and tested).
Something to fix-up later.
(...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED)
Since all of these checks are done in release builds as well,
let's rename them to VERIFY to prevent confusion, as everyone is
used to assertions being compiled out in release.
We can introduce a new ASSERT macro that is specifically for debug
checks, but I'm doing this wholesale conversion first since we've
accumulated thousands of these already, and it's not immediately
obvious which ones are suitable for ASSERT.
You can now declare functions with UNMAP_AFTER_INIT and they'll get
segregated into a separate kernel section that gets completely
unmapped at the end of initialization.
This can be used for anything we don't need to call once we've booted
into userspace.
There are two nice things about this mechanism:
- It allows us to free up entire pages of memory for other use.
(Note that this patch does not actually make use of the freed
pages yet, but in the future we totally could!)
- It allows us to get rid of obviously dangerous gadgets like
write-to-CR0 and write-to-CR4 which are very useful for an attacker
trying to disable SMAP/SMEP/etc.
I've also made sure to include a helpful panic message in case you
hit a kernel crash because of this protection. :^)
You can now use the READONLY_AFTER_INIT macro when declaring a variable
and we will put it in a special ".ro_after_init" section in the kernel.
Data in that section remains writable during the boot and init process,
and is then marked read-only just before launching the SystemServer.
This is based on an idea from the Linux kernel. :^)
If we try to align a number above 0xfffff000 to the next multiple of
the page size (4 KiB), it would wrap around to 0. This is most likely
never what we want, so let's assert if that happens.
Now that we no longer need to support the signal trampolines being
user-accessible inside the kernel memory range, we can get rid of the
"kernel" and "user-accessible" flags on Region and simply use the
address of the region to determine whether it's kernel or user.
This also tightens the page table mapping code, since it can now set
user-accessibility based solely on the virtual address of a page.
This patch adds Space, a class representing a process's address space.
- Each Process has a Space.
- The Space owns the PageDirectory and all Regions in the Process.
This allows us to reorganize sys$execve() so that it constructs and
populates a new Space fully before committing to it.
Previously, we would construct the new address space while still
running in the old one, and encountering an error meant we had to do
tedious and error-prone rollback.
Those problems are now gone, replaced by what's hopefully a set of much
smaller problems and missing cleanups. :^)
If a TLB flush request is broadcast to other processors and the addresses
to flush are user mode addresses, we can ignore such a request on the
target processor if the page directory currently in use doesn't match
the addresses to be flushed. We still need to broadcast to all processors
in that case because the other processors may switch to that same page
directory at any time.
By designating a committed page pool we can guarantee to have physical
pages available for lazy allocation in mappings. However, when forking
we will overcommit. The assumption is that worst-case it's better for
the fork to die due to insufficient physical memory on COW access than
the parent that created the region. If a fork wants to ensure that all
memory is available (trigger a commit) then it can use madvise.
This also means that fork now can gracefully fail if we don't have
enough physical pages available.
This adds the ability for a Region to define volatile/nonvolatile
areas within mapped memory using madvise(). This also means that
memory purging takes into account all views of the PurgeableVMObject
and only purges memory that is not needed by all of them. When calling
madvise() to change an area to nonvolatile memory, return whether
memory from that area was purged. At that time also try to remap
all memory that is requested to be nonvolatile, and if insufficient
pages are available notify the caller of that fact.
Since the CPU already does almost all necessary validation steps
for us, we don't really need to attempt to do this. Doing it
ourselves doesn't really work very reliably, because we'd have to
account for other processors modifying virtual memory, and we'd
have to account for e.g. pages not being able to be allocated
due to insufficient resources.
So change the copy_to/from_user (and associated helper functions)
to use the new safe_memcpy, which will return whether it succeeded
or not. The only manual validation step needed (which the CPU
can't perform for us) is making sure the pointers provided by user
mode aren't pointing to kernel mappings.
To make it easier to read/write from/to either kernel or user mode
data add the UserOrKernelBuffer helper class, which will internally
either use copy_from/to_user or directly memcpy, or pass the data
through directly using a temporary buffer on the stack.
Last but not least we need to keep syscall params trivial as we
need to copy them from/to user mode using copy_from/to_user.
Sometimes a physical underlying page may be there, but we may be
unable to allocate a page table that may be needed to map it. Bubble
up such mapping errors so that they can be handled more appropriately.
If allocating a page table triggers purging memory, we need to call
quickmap_pd again to make sure the underlying physical page is
remapped to the correct one. This is needed because purging itself
may trigger calls to ensure_pte as well.
Fixes#3370
Add an ExpandableHeap and switch kmalloc to use it, which allows
for the kmalloc heap to grow as needed.
In order to make heap expansion to work, we keep around a 1 MiB backup
memory region, because creating a region would require space in the
same heap. This means, the heap will grow as soon as the reported
utilization is less than 1 MiB. It will also return memory if an entire
subheap is no longer needed, although that is rarely possible.