The function to protect ksyms after initialization, is only used during
boot of the system, so it can be UNMAP_AFTER_INIT as well.
This requires we switch the order of the init sequence, so we now call
`MM.protect_ksyms_after_init()` before `MM.unmap_text_after_init()`.
The Prekernel's memory is only accessed until MemoryManager has been
initialized. Keeping them around afterwards is both unnecessary and bad,
as it prevents the userland from using the 0x100000-0x155000 virtual
address range.
Co-authored-by: Idan Horowitz <idan.horowitz@gmail.com>
We can leave the .ksyms section mapped-but-read-only and then have the
symbols index simply point into it.
Note that we manually insert null-terminators into the symbols section
while parsing it.
This gets rid of ~950 KiB of kmalloc_eternal() at startup. :^)
It's not enough to just find the largest-address-not-above the argument,
we must also check that the found region actually contains the argument.
Regressed in a23edd42b8, thanks to Idan
for pointing this out.
We were already doing this for userspace memory regions (in the
Memory::AddressSpace class), so let's do it for kernel regions as well.
This gives a nice speed-up on test-js and probably basically everything
else as well. :^)
SIGSTKFLT is a signal that signifies a stack fault in a x87 coprocessor,
this signal is not POSIX and also unused by Linux and the BSDs, so let's
use SIGSEGV so programs that setup signal handlers for the common
signals could still handle them in serenity.
This is a handy helper that copies out the full contents of a physical
page into a caller-provided buffer. It uses quickmapping internally
(and takes the MM lock for the duration.)
We now use AK::Error and AK::ErrorOr<T> in both kernel and userspace!
This was a slightly tedious refactoring that took a long time, so it's
not unlikely that some bugs crept in.
Nevertheless, it does pass basic functionality testing, and it's just
real nice to finally see the same pattern in all contexts. :^)
This small change simplifies the function a bit but also fixes a problem
with it.
Let's take an example to see this:
Let's say we have a reserved range between 0xe0000 to 0xfffff (EBDA),
then we want to map from the memory device (/dev/mem) the entire
EBDA to a program. If a program tries to map more than 131072 bytes,
the current logic will work - the start address is 0xe0000, and ofcourse
it's below the limit, hence it passes the first two restrictions.
Then, the third if statement will fail if we try to mmap more than
the said allowed bytes.
However, let's take another scenario, where we try to mmap from
0xf0000 - but we try to mmap less than 131072 - but more than 65536.
In such case, we again pass the first two if statements, but the third
one is passed two, because it doesn't take into account the offseted
address from the start of the reserved range (0xe0000). In such case,
a user can easily mmap 65535 bytes above 0x100000. This might
seem negligible. However, it's still a severe bug that can theoretically
be exploited into a info leak or tampering with important kernel
structures.
When testing the RTL8168 driver, it seems we can't allocate super pages
anymore. Either we expand the super pages range, or find a solution to
dynamically expand the range (or let drivers utilize other ranges).
This expands the reach of error propagation greatly throughout the
kernel. Sadly, it also exposes the fact that we're allocating (and
doing other fallible things) in constructors all over the place.
This patch doesn't attempt to address that of course. That's work for
our future selves.
We have seen cases where the map fails, but we return the region
to the caller, causing them to page fault later on when they touch
the region.
The fix is to always observe the return code of map/remap.
The quickmap_page() and unquickmap_page() functions are used to map a
single physical page at a kernel virtual address for temporary access.
These use the per-CPU quickmap buffer in the page tables, and access to
this is guarded by the MM lock. To prevent bugs, quickmap_page() should
not *take* the MM lock, but rather verify that it is already held!
This exposed two situations where we were using quickmap without holding
the MM lock during page fault handling. This patch is forced to fix
these issues (which is great!) :^)
The VMObject class now manages its own instance list (it was previously
a member of MemoryManager.) Removal from the list is done safely on the
last unref(), closing a race window in the previous implementation.
Note that VMObject::all_instances() now has its own lock instead of
using the global MM lock.
This makes for nicer handling of errors compared to checking whether a
RefPtr is null. Additionally, this will give way to return different
types of errors in the future.
We don't want to be holding the MM lock if it's a user region and we
have to consult the page directory, since that can lead to a deadlock
if we don't already have the page directory lock.
Taking a reference or a pointer to a value that's not aligned properly
is undefined behavior. While `[[gnu::packed]]` ensures that reads from
and writes to fields of packed structs is a safe operation, the
information about the reduced alignment is lost when creating pointers
to these values.
Weirdly enough, GCC's undefined behavior sanitizer doesn't flag these,
even though the doc of `-Waddress-of-packed-member` says that it usually
leads to UB. In contrast, x86_64 Clang does flag these, which renders
the 64-bit kernel unable to boot.
For now, the `address-of-packed-member` warning will only be enabled in
the kernel, as it is absolutely crucial there because of KUBSAN, but
might get excessively noisy for the userland in the future.
Also note that we can't append to `CMAKE_CXX_FLAGS` like we do for other
flags in the kernel, because flags added via `add_compile_options` come
after these, so the `-Wno-address-of-packed-member` in the root would
cancel it out.
When booting AP's, we identity map a region at 0x8000 while doing the
initial bringup sequence. This is the only thing in the kernel that
requires an identity mapping, yet we had a bunch of generic API's and a
dedicated VirtualRangeAllocator in every PageDirectory for this purpose.
This patch simplifies the situation by moving the identity mapping logic
to the AP boot code and removing the generic API's.
...and also RangeAllocator => VirtualRangeAllocator.
This clarifies that the ranges we're dealing with are *virtual* memory
ranges and not anything else.