Commit graph

102 commits

Author SHA1 Message Date
Tom
5b38132e3c Kernel: Protect the PageDirectory from concurrent access 2020-11-11 12:27:25 +01:00
Tom
c8d9f1b9c9 Kernel: Make copy_to/from_user safe and remove unnecessary checks
Since the CPU already does almost all necessary validation steps
for us, we don't really need to attempt to do this. Doing it
ourselves doesn't really work very reliably, because we'd have to
account for other processors modifying virtual memory, and we'd
have to account for e.g. pages not being able to be allocated
due to insufficient resources.

So change the copy_to/from_user (and associated helper functions)
to use the new safe_memcpy, which will return whether it succeeded
or not. The only manual validation step needed (which the CPU
can't perform for us) is making sure the pointers provided by user
mode aren't pointing to kernel mappings.

To make it easier to read/write from/to either kernel or user mode
data add the UserOrKernelBuffer helper class, which will internally
either use copy_from/to_user or directly memcpy, or pass the data
through directly using a temporary buffer on the stack.

Last but not least we need to keep syscall params trivial as we
need to copy them from/to user mode using copy_from/to_user.
2020-09-13 21:19:15 +02:00
Tom
bf268a0185 Kernel: Handle committing pages in regions more gracefully
Sometimes a physical underlying page may be there, but we may be
unable to allocate a page table that may be needed to map it. Bubble
up such mapping errors so that they can be handled more appropriately.
2020-09-02 00:35:56 +02:00
Tom
30d36a3ad1 Kernel: Remove assertion from Region::commit
We should be able to gracefully fail a commit in low-memory situations.
2020-09-01 22:08:43 +02:00
Ben Wiederhake
081bb29626 Kernel: Unbreak building with extra debug macros, part 2 2020-08-30 09:43:49 +02:00
Tom
67dbb56444 Kernel: Release page tables when no longer needed
When unmapping regions, check if page tables can be freed.

This is a follow-up change for #3254.
2020-08-28 09:21:24 +02:00
Tom
06d50f64b0 Kernel: Aggregate TLB flush requests for Regions for SMP
Rather than sending one TLB flush request for each page,
aggregate them so that we're not spamming the other
processors with FlushTLB IPIs.
2020-07-06 22:39:06 +02:00
Tom
bc107d0b33 Kernel: Add SMP IPI support
We can now properly initialize all processors without
crashing by sending SMP IPI messages to synchronize memory
between processors.

We now initialize the APs once we have the scheduler running.
This is so that we can process IPI messages from the other
cores.

Also rework interrupt handling a bit so that it's more of a
1:1 mapping. We need to allocate non-sharable interrupts for
IPIs.

This also fixes the occasional hang/crash because all
CPUs now synchronize memory with each other.
2020-07-06 17:07:44 +02:00
Tom
9b4e6f6a23 Kernel: Consolidate features into CPUFeature enum
This allows us to consolidate printing out all the CPU features
into one log statement. Also expose them in /proc/cpuinfo
2020-07-03 19:32:34 +02:00
Tom
16783bd14d Kernel: Turn Thread::current and Process::current into functions
This allows us to query the current thread and process on a
per processor basis
2020-07-01 12:07:01 +02:00
Tom
841364b609 Kernel: Add mechanism to identity map the lowest 2MB 2020-06-04 18:15:23 +02:00
Andreas Kling
6fe83b0ac4 Kernel: Crash the current process on OOM (instead of panicking kernel)
This patch adds PageFaultResponse::OutOfMemory which informs the fault
handler that we were unable to allocate a necessary physical page and
cannot continue.

In response to this, the kernel will crash the current process. Because
we are OOM, we can't symbolicate the crash like we normally would
(since the ELF symbolication code needs to allocate), so we also
communicate to Process::crash() that we're out of memory.

Now we can survive "allocate 300 MB" (only the allocate process dies.)
This is definitely not perfect and can easily end up killing a random
innocent other process who happened to allocate one page at the wrong
time, but it's a *lot* better than panicking on OOM. :^)
2020-05-06 22:28:23 +02:00
Andreas Kling
c633c1c2ea Kernel: Assert on OOM in Region::commit()
This function has a lot of callers that don't bother checking if it
returns successfully or not. We'll need to handle failure in a bunch
of places and then we can remove this assertion.
2020-05-06 22:28:23 +02:00
Andreas Kling
4419685b7e Kernel: Leave VMObject alone on OOM during CoW fault
If we OOM during a CoW fault and fail to allocate a new page for the
writing process, just leave the original VMObject alone so everyone
else can keep using it.
2020-04-28 17:05:14 +02:00
Andreas Kling
9c856811b2 Kernel: Add Region helpers for accessing underlying physical pages
Since a Region is basically a view into a potentially larger VMObject,
it was always necessary to include the Region starting offset when
accessing its underlying physical pages.

Until now, you had to do that manually, but this patch adds a simple
Region::physical_page() for read-only access and a physical_page_slot()
when you want a mutable reference to the RefPtr<PhysicalPage> itself.

A lot of code is simplified by making use of this.
2020-04-28 17:05:14 +02:00
Andreas Kling
c19b56dc99 Kernel+LibC: Add minherit() and MAP_INHERIT_ZERO
This patch adds the minherit() syscall originally invented by OpenBSD.
Only the MAP_INHERIT_ZERO mode is supported for now. If set on an mmap
region, that region will be zeroed out on fork().
2020-04-12 20:22:26 +02:00
Andreas Kling
522d8c5d71 Kernel: Non-readable-but-writable regions should still be mapped
Fixes #1436.
2020-04-03 10:10:56 +02:00
Andreas Kling
7d862dd5fc AK: Reduce header dependency graph of String.h
String.h no longer pulls in StringView.h. We do this by moving a bunch
of String functions out-of-line.
2020-03-23 13:48:44 +01:00
Andreas Kling
fa9fba6901 Kernel: Add missing #includes now that <AK/StdLibExtras.h> is smaller 2020-03-08 13:06:51 +01:00
Andreas Kling
7a6c4a72d5 LibWeb: Move everything into the Web namespace 2020-03-07 10:27:02 +01:00
Andreas Kling
c6693f9b3a Kernel: Simplify a bunch of dbg() and klog() calls
LogStream can handle VirtualAddress and PhysicalAddress directly.
2020-03-06 15:00:44 +01:00
Andreas Kling
94f287b1c0 Kernel: Unmap non-readable pages
This was caught by running all crash tests with "crash -A".
Basically, non-readable pages need to not be mapped *at all* so that
a "page not present" exception is provoked on access.

Unfortunately x86 does not support write-only mappings, so this is
the best we can do.

Fixes #1336.
2020-03-06 09:58:59 +01:00
Liav A
0fc60e41dd Kernel: Use klog() instead of kprintf()
Also, duplicate data in dbg() and klog() calls were removed.
In addition, leakage of virtual address to kernel log is prevented.
This is done by replacing kprintf() calls to dbg() calls with the
leaked data instead.
Also, other kprintf() calls were replaced with klog().
2020-03-02 22:23:39 +01:00
Andreas Kling
fee20bd8de Kernel: Remove some more harmless InodeVMObject miscasts 2020-03-01 12:27:03 +01:00
Andreas Kling
48bbfe51fb Kernel: Add some InodeVMObject type assertions in Region::clone()
Let's make sure that we're never cloning shared inode-backed objects
as if they were private, and vice versa.
2020-03-01 11:23:10 +01:00
Andreas Kling
88b334135b Kernel: Remove some Region construction helpers
It's now up to the caller to provide a VMObject when constructing a new
Region object. This will make it easier to handle things going wrong,
like allocation failures, etc.
2020-03-01 11:23:10 +01:00
Andreas Kling
fddc3c957b Kernel: CoW-clone private inode-backed memory regions on fork()
When forking a process, we now turn all of the private inode-backed
mmap() regions into copy-on-write regions in both the parent and child.

This patch also removes an assertion that becomes irrelevant.
2020-03-01 11:23:10 +01:00
Andreas Kling
7cd1bdfd81 Kernel: Simplify some dbg() logging
We don't have to log the process name/PID/TID, dbg() automatically adds
that as a prefix to every line.

Also we don't have to do .characters() on Strings passed to dbg() :^)
2020-02-29 13:39:06 +01:00
Andreas Kling
aa1e209845 Kernel: Remove some unnecessary indirection in InodeFile::mmap()
InodeFile now directly calls Process::allocate_region_with_vmobject()
instead of taking an awkward detour via a special Region constructor.
2020-02-28 20:29:14 +01:00
Andreas Kling
651417a085 Kernel: Split InodeVMObject into two subclasses
We now have PrivateInodeVMObject and SharedInodeVMObject, corresponding
to MAP_PRIVATE and MAP_SHARED respectively.

Note that PrivateInodeVMObject is not used yet.
2020-02-28 20:20:35 +01:00
Andreas Kling
07a26aece3 Kernel: Rename InodeVMObject => SharedInodeVMObject 2020-02-28 20:07:51 +01:00
Liav A
24d2aeda8e Region: Use dbg() instead of dbgprintf() 2020-02-27 13:05:12 +01:00
Andreas Kling
0763f67043 AK: Make Bitmap use size_t for its size
Also rework its API's to return Optional<size_t> instead of int with -1
as the error value.
2020-02-24 09:56:07 +01:00
Andreas Kling
04e40da188 Kernel: Fix crash when reading /proc/PID/vmobjects
InodeVMObjects can have nulled-out physical page slots. That just means
we haven't cached that page from disk right now.
2020-02-21 16:03:56 +01:00
Andreas Kling
48f7c28a5c Kernel: Replace "current" with Thread::current and Process::current
Suggested by Sergey. The currently running Thread and Process are now
Thread::current and Process::current respectively. :^)
2020-02-17 15:04:27 +01:00
Andreas Kling
1d611e4a11 Kernel: Reduce header dependencies of MemoryManager and Region 2020-02-16 01:33:41 +01:00
Andreas Kling
a356e48150 Kernel: Move all code into the Kernel namespace 2020-02-16 01:27:42 +01:00
Andreas Kling
c624d3875e Kernel: Use a shared physical page for zero-filled pages until written
This patch adds a globally shared zero-filled PhysicalPage that will
be mapped into every slot of every zero-filled AnonymousVMObject until
that page is written to, achieving CoW-like zero-filled pages.

Initial testing show that this doesn't actually achieve any sharing yet
but it seems like a good design regardless, since it may reduce the
number of page faults taken by programs.

If you look at the refcount of MM.shared_zero_page() it will have quite
a high refcount, but that's just because everything maps it everywhere.
If you want to see the "real" refcount, you can build with the
MAP_SHARED_ZERO_PAGE_LAZILY flag, and we'll defer mapping of the shared
zero page until the first NP read fault.

I've left this behavior behind a flag for future testing of this code.
2020-02-15 13:17:40 +01:00
Andreas Kling
00d8ec3ead Kernel: The inode fault handler should grab the VMObject lock earlier
It doesn't look healthy to create raw references into an array before
a temporary unlock. In fact, that temporary unlock looks generally
unhealthy, but it's a different problem.
2020-02-08 12:55:21 +01:00
Andreas Kling
a9d7902bb7 x86: Simplify region unmapping a bit
Add PageTableEntry::clear() to zero out a whole PTE, and use that for
unmapping instead of clearing individual fields.
2020-02-08 12:49:38 +01:00
Andreas Kling
f91b3aab47 Kernel: Cloned shared regions should also be marked as shared 2020-02-08 02:39:46 +01:00
Andreas Kling
f38cfb3562 Kernel: Tidy up debug logging a little bit
When using dbg() in the kernel, the output is automatically prefixed
with [Process(PID:TID)]. This makes it a lot easier to understand which
thread is generating the output.

This patch also cleans up some common logging messages and removes the
now-unnecessary "dbg() << *current << ..." pattern.
2020-01-21 16:16:20 +01:00
Andreas Kling
4ebff10bde Kernel: Write-only regions should still be mapped as present
There is no real "read protection" on x86, so we have no choice but to
map write-only pages simply as "present & read/write".

If we get a read page fault in a non-readable region, that's still a
correctness issue, so we crash the process. It's by no means a complete
protection against invalid reads, since it's trivial to fool the kernel
by first causing a write fault in the same region.
2020-01-20 13:13:03 +01:00
Andreas Kling
f7b394e9a1 Kernel: Assert that copy_to/from_user() are called with user addresses
This will panic the kernel immediately if these functions are misused
so we can catch it and fix the misuse.

This patch fixes a couple of misuses:

    - create_signal_trampolines() writes to a user-accessible page
      above the 3GB address mark. We should really get rid of this
      page but that's a whole other thing.

    - CoW faults need to use copy_from_user rather than copy_to_user
      since it's the *source* pointer that points to user memory.

    - Inode faults need to use memcpy rather than copy_to_user since
      we're copying a kernel stack buffer into a quickmapped page.

This should make the copy_to/from_user() functions slightly less useful
for exploitation. Before this, they were essentially just glorified
memcpy() with SMAP disabled. :^)
2020-01-19 09:18:55 +01:00
Andreas Kling
94ca55cefd Meta: Add license header to source files
As suggested by Joshua, this commit adds the 2-clause BSD license as a
comment block to the top of every source file.

For the first pass, I've just added myself for simplicity. I encourage
everyone to add themselves as copyright holders of any file they've
added or modified in some significant way. If I've added myself in
error somewhere, feel free to replace it with the appropriate copyright
holder instead.

Going forward, all new source files should include a license header.
2020-01-18 09:45:54 +01:00
Andreas Kling
e362b56b4f Kernel: Move kernel above the 3GB virtual address mark
The kernel and its static data structures are no longer identity-mapped
in the bottom 8MB of the address space, but instead move above 3GB.

The first 8MB above 3GB are pseudo-identity-mapped to the bottom 8MB of
the physical address space. But things don't have to stay this way!

Thanks to Jesse who made an earlier attempt at this, it was really easy
to get device drivers working once the page tables were in place! :^)

Fixes #734.
2020-01-17 22:34:26 +01:00
Liav A
d2b41010c5 Kernel: Change Region allocation helpers
We now can create a cacheable Region, so when map() is called, if a
Region is cacheable then all the virtual memory space being allocated
to it will be marked as not cache disabled.

In addition to that, OS components can create a Region that will be
mapped to a specific physical address by using the appropriate helper
method.
2020-01-14 15:38:58 +01:00
Andreas Kling
5c3c2a9bac Kernel: Copy Region's "is_mmap" flag when cloning regions for fork()
Otherwise child processes will not be allowed to munmap(), madvise(),
etc. on the cloned regions!
2020-01-10 19:24:01 +01:00
Andreas Kling
197e73ee31 Kernel+LibELF: Enable SMAP protection during non-syscall exec()
When loading a new executable, we now map the ELF image in kernel-only
memory and parse it there. Then we use copy_to_user() when initializing
writable regions with data from the executable.

Note that the exec() syscall still disables SMAP protection and will
require additional work. This patch only affects kernel-originated
process spawns.
2020-01-10 10:57:06 +01:00
Andreas Kling
9eef39d68a Kernel: Start implementing x86 SMAP support
Supervisor Mode Access Prevention (SMAP) is an x86 CPU feature that
prevents the kernel from accessing userspace memory. With SMAP enabled,
trying to read/write a userspace memory address while in the kernel
will now generate a page fault.

Since it's sometimes necessary to read/write userspace memory, there
are two new instructions that quickly switch the protection on/off:
STAC (disables protection) and CLAC (enables protection.)
These are exposed in kernel code via the stac() and clac() helpers.

There's also a SmapDisabler RAII object that can be used to ensure
that you don't forget to re-enable protection before returning to
userspace code.

THis patch also adds copy_to_user(), copy_from_user() and memset_user()
which are the "correct" way of doing things. These functions allow us
to briefly disable protection for a specific purpose, and then turn it
back on immediately after it's done. Going forward all kernel code
should be moved to using these and all uses of SmapDisabler are to be
considered FIXME's.

Note that we're not realizing the full potential of this feature since
I've used SmapDisabler quite liberally in this initial bring-up patch.
2020-01-05 18:14:51 +01:00