Commit graph

88 commits

Author SHA1 Message Date
Andreas Kling
cbcf891040 Kernel: Move select Process members into protected memory
Process member variable like m_euid are very valuable targets for
kernel exploits and until now they have been writable at all times.

This patch moves m_euid along with a whole bunch of other members
into a new Process::ProtectedData struct. This struct is remapped
as read-only memory whenever we don't need to write to it.

This means that a kernel write primitive is no longer enough to
overwrite a process's effective UID, you must first unprotect the
protected data where the UID is stored. :^)
2021-03-10 22:30:02 +01:00
Andreas Kling
ac71775de5 Kernel: Make all syscall functions return KResultOr<T>
This makes it a lot easier to return errors since we no longer have to
worry about negating EFOO errors and can just return them flat.
2021-03-01 13:54:32 +01:00
Andreas Kling
5d180d1f99 Everywhere: Rename ASSERT => VERIFY
(...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED)

Since all of these checks are done in release builds as well,
let's rename them to VERIFY to prevent confusion, as everyone is
used to assertions being compiled out in release.

We can introduce a new ASSERT macro that is specifically for debug
checks, but I'm doing this wholesale conversion first since we've
accumulated thousands of these already, and it's not immediately
obvious which ones are suitable for ASSERT.
2021-02-23 20:56:54 +01:00
Brian Gianforcaro
26bba8e100 Kernel: Populate ELF::AuxilaryValue::Platform from Processor object.
Move this to the processor object so it can easily be implemented
when Serenity is compiled for a different architecture.
2021-02-21 17:06:24 +01:00
Brian Gianforcaro
a977cdd9ac Kernel: Remove unneeded Thread::set_default_signal_dispositions
The `default_signal_action(u8 signal)` function already has the
full mapping. The only caveat being that now we need to make
sure the thread constructor and clear_signals() method do the work
of resetting the m_signal_action_data array, instead or relying on
the previous logic in set_default_signal_dispositions.
2021-02-21 12:54:39 +01:00
Andreas Kling
6e83be67b8 Kernel: Release ptrace lock in exec before stopping due to PT_TRACE_ME
If we have a tracer process waiting for us to exec, we need to release
the ptrace lock before stopping ourselves, since otherwise the tracer
will block forever on the lock.

Fixes #5409.
2021-02-19 12:13:54 +01:00
Andreas Kling
55a9a4f57a Kernel: Use KResult a bit more in sys$execve() 2021-02-18 09:37:33 +01:00
Andreas Kling
e47bffdc8c Kernel: Add some bits of randomness to the userspace stack pointer
This patch adds a random offset between 0 and 4096 to the initial
stack pointer in new processes. Since the stack has to be 16-byte
aligned, the bottom bits can't be randomized.

Yet another thing to make things less predictable. :^)
2021-02-14 11:53:49 +01:00
Andreas Kling
09b1b09c19 Kernel: Assert if rounding-up-to-page-size would wrap around to 0
If we try to align a number above 0xfffff000 to the next multiple of
the page size (4 KiB), it would wrap around to 0. This is most likely
never what we want, so let's assert if that happens.
2021-02-14 10:01:50 +01:00
Andreas Kling
1593219a41 Kernel: Map signal trampoline into each process's address space
The signal trampoline was previously in kernelspace memory, but with
a special exception to make it user-accessible.

This patch moves it into each process's regular address space so we
can stop supporting user-allowed memory above 0xc0000000.
2021-02-14 01:33:17 +01:00
Andreas Kling
ffdfbf1dba Kernel: Fix wrong sizeof() type in sys$execve() argument overflow check 2021-02-14 00:15:01 +01:00
Andreas Kling
1ef43ec89a Kernel: Move get_interpreter_load_offset() out of Process class
This is only used inside the sys$execve() implementation so just make
it a execve.cpp local function.
2021-02-12 16:30:29 +01:00
Andreas Kling
4ff0f971f7 Kernel: Prevent execve/ptrace race
Add a per-process ptrace lock and use it to prevent ptrace access to a
process after it decides to commit to a new executable in sys$execve().

Fixes #5230.
2021-02-08 23:05:41 +01:00
Andreas Kling
4b7b92c201 Kernel: Remove two unused fields from sys$execve's LoadResult 2021-02-08 22:31:03 +01:00
Andreas Kling
0d7af498d7 Kernel: Move ShouldAllocateTls enum from Process to execve.cpp 2021-02-08 22:24:37 +01:00
Andreas Kling
45231051e6 Kernel: Set the dumpable flag before switching spaces in sys$execve() 2021-02-08 19:15:42 +01:00
Andreas Kling
d746639171 Kernel: Remove outdated code to dump memory layout after exec load 2021-02-08 19:07:29 +01:00
Andreas Kling
f1b5def8fd Kernel: Factor address space management out of the Process class
This patch adds Space, a class representing a process's address space.

- Each Process has a Space.
- The Space owns the PageDirectory and all Regions in the Process.

This allows us to reorganize sys$execve() so that it constructs and
populates a new Space fully before committing to it.

Previously, we would construct the new address space while still
running in the old one, and encountering an error meant we had to do
tedious and error-prone rollback.

Those problems are now gone, replaced by what's hopefully a set of much
smaller problems and missing cleanups. :^)
2021-02-08 18:27:28 +01:00
Andreas Kling
823186031d Kernel: Add a way to specify which memory regions can make syscalls
This patch adds sys$msyscall() which is loosely based on an OpenBSD
mechanism for preventing syscalls from non-blessed memory regions.

It works similarly to pledge and unveil, you can call it as many
times as you like, and when you're finished, you call it with a null
pointer and it will stop accepting new regions from then on.

If a syscall later happens and doesn't originate from one of the
previously blessed regions, the kernel will simply crash the process.
2021-02-02 20:13:44 +01:00
Andreas Kling
e67402c702 Kernel: Remove Range "valid" state and use Optional<Range> instead
It's easier to understand VM ranges if they are always valid. We can
simply use an empty Optional<Range> to encode absence when needed.
2021-01-27 21:14:42 +01:00
asynts
7cf0c7cc0d Meta: Split debug defines into multiple headers.
The following script was used to make these changes:

    #!/bin/bash
    set -e

    tmp=$(mktemp -d)

    echo "tmp=$tmp"

    find Kernel \( -name '*.cpp' -o -name '*.h' \) | sort > $tmp/Kernel.files
    find . \( -path ./Toolchain -prune -o -path ./Build -prune -o -path ./Kernel -prune \) -o \( -name '*.cpp' -o -name '*.h' \) -print | sort > $tmp/EverythingExceptKernel.files

    cat $tmp/Kernel.files | xargs grep -Eho '[A-Z0-9_]+_DEBUG' | sort | uniq > $tmp/Kernel.macros
    cat $tmp/EverythingExceptKernel.files | xargs grep -Eho '[A-Z0-9_]+_DEBUG' | sort | uniq > $tmp/EverythingExceptKernel.macros

    comm -23 $tmp/Kernel.macros $tmp/EverythingExceptKernel.macros > $tmp/Kernel.unique
    comm -1 $tmp/Kernel.macros $tmp/EverythingExceptKernel.macros > $tmp/EverythingExceptKernel.unique

    cat $tmp/Kernel.unique | awk '{ print "#cmakedefine01 "$1 }' > $tmp/Kernel.header
    cat $tmp/EverythingExceptKernel.unique | awk '{ print "#cmakedefine01 "$1 }' > $tmp/EverythingExceptKernel.header

    for macro in $(cat $tmp/Kernel.unique)
    do
        cat $tmp/Kernel.files | xargs grep -l $macro >> $tmp/Kernel.new-includes ||:
    done
    cat $tmp/Kernel.new-includes | sort > $tmp/Kernel.new-includes.sorted

    for macro in $(cat $tmp/EverythingExceptKernel.unique)
    do
        cat $tmp/Kernel.files | xargs grep -l $macro >> $tmp/Kernel.old-includes ||:
    done
    cat $tmp/Kernel.old-includes | sort > $tmp/Kernel.old-includes.sorted

    comm -23 $tmp/Kernel.new-includes.sorted $tmp/Kernel.old-includes.sorted > $tmp/Kernel.includes.new
    comm -13 $tmp/Kernel.new-includes.sorted $tmp/Kernel.old-includes.sorted > $tmp/Kernel.includes.old
    comm -12 $tmp/Kernel.new-includes.sorted $tmp/Kernel.old-includes.sorted > $tmp/Kernel.includes.mixed

    for file in $(cat $tmp/Kernel.includes.new)
    do
        sed -i -E 's/#include <AK\/Debug\.h>/#include <Kernel\/Debug\.h>/' $file
    done

    for file in $(cat $tmp/Kernel.includes.mixed)
    do
        echo "mixed include in $file, requires manual editing."
    done
2021-01-26 21:20:00 +01:00
Andreas Kling
c7858622ec Kernel: Update process promise states on execve() and fork()
We now move the execpromises state into the regular promises, and clear
the execpromises state.

Also make sure to duplicate the promise state on fork.

This fixes an issue where "su" would launch a shell which immediately
crashed due to not having pledged "stdio".
2021-01-26 15:26:37 +01:00
Andreas Kling
1e25d2b734 Kernel: Remove allocate_region() functions that don't take a Range
Let's force callers to provide a VM range when allocating a region.
This makes ENOMEM error handling more visible and removes implicit
VM allocation which felt a bit magical.
2021-01-26 14:13:57 +01:00
asynts
eea72b9b5c Everywhere: Hook up remaining debug macros to Debug.h. 2021-01-25 09:47:36 +01:00
asynts
acdcf59a33 Everywhere: Remove unnecessary debug comments.
It would be tempting to uncomment these statements, but that won't work
with the new changes.

This was done with the following commands:

    find . \( -name '*.cpp' -o -name '*.h' -o -name '*.in' \) -not -path './Toolchain/*' -not -path './Build/*' -exec awk -i inplace '$0 !~ /\/\/#define/ { if (!toggle) { print; } else { toggle = !toggle } } ; $0 ~/\/\/#define/ { toggle = 1 }' {} \;

    find . \( -name '*.cpp' -o -name '*.h' -o -name '*.in' \) -not -path './Toolchain/*' -not -path './Build/*' -exec awk -i inplace '$0 !~ /\/\/ #define/ { if (!toggle) { print; } else { toggle = !toggle } } ; $0 ~/\/\/ #define/ { toggle = 1 }' {} \;
2021-01-25 09:47:36 +01:00
Luke
50a2cb38e5 Kernel: Fix two error codes being returned as positive in Process::exec
This made the assertion on line 921 think it was a successful exec, when it wasn't.

Fixes #5084
2021-01-24 01:06:24 +01:00
Andreas Kling
54f421e170 Kernel: Clear coredump metadata on exec()
If for some reason a process wants to exec after saving some coredump
metadata, we should just throw away the data.
2021-01-23 09:41:11 +01:00
Andreas Kling
19d3f8cab7 Kernel+LibC: Turn errno codes into a strongly typed enum
..and allow implicit creation of KResult and KResultOr from ErrnoCode.
This means that kernel functions that return those types can finally
do "return EINVAL;" and it will just work.

There's a handful of functions that still deal with signed integers
that should be converted to return KResults.
2021-01-20 23:20:02 +01:00
Tom
1d621ab172 Kernel: Some futex improvements
This adds support for FUTEX_WAKE_OP, FUTEX_WAIT_BITSET, FUTEX_WAKE_BITSET,
FUTEX_REQUEUE, and FUTEX_CMP_REQUEUE, as well well as global and private
futex and absolute/relative timeouts against the appropriate clock. This
also changes the implementation so that kernel resources are only used when
a thread is blocked on a futex.

Global futexes are implemented as offsets in VMObjects, so that different
processes can share a futex against the same VMObject despite potentially
being mapped at different virtual addresses.
2021-01-17 20:30:31 +01:00
Andreas Kling
992f513ad2 Kernel: Limit exec arguments and environment to 1/8th of stack each
This sort-of matches what some other systems do and seems like a
generally sane thing to do instead of allowing programs to spawn a
child with a nearly full stack.
2021-01-17 18:29:56 +01:00
Andreas Kling
bf0719092f Kernel+Userland: Remove shared buffers (shbufs)
All users of this mechanism have been switched to anonymous files and
passing file descriptors with sendfd()/recvfd().

Shbufs got us where we are today, but it's time we say good-bye to them
and welcome a much more idiomatic replacement. :^)
2021-01-17 09:07:32 +01:00
Brendan Coles
1fa9d9dd68 Kernel: execve: find_elf_interpreter_for_executable: Fix dbgln 2021-01-16 22:36:46 +01:00
Linus Groh
1ccc2e6482 Kernel: Store process arguments and environment in coredumps
Currently they're only pushed onto the stack but not easily accessible
from the Process class, so this adds a Vector<String> for both.
2021-01-15 23:26:47 +01:00
Andreas Kling
64b0d89335 Kernel: Make Process::allocate_region*() return KResultOr<Region*>
This allows region allocation to return specific errors and we don't
have to assume every failure is an ENOMEM.
2021-01-15 19:10:30 +01:00
Lenny Maiorani
e6f907a155 AK: Simplify constructors and conversions from nullptr_t
Problem:
- Many constructors are defined as `{}` rather than using the ` =
  default` compiler-provided constructor.
- Some types provide an implicit conversion operator from `nullptr_t`
  instead of requiring the caller to default construct. This violates
  the C++ Core Guidelines suggestion to declare single-argument
  constructors explicit
  (https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#c46-by-default-declare-single-argument-constructors-explicit).

Solution:
- Change default constructors to use the compiler-provided default
  constructor.
- Remove implicit conversion operators from `nullptr_t` and change
  usage to enforce type consistency without conversion.
2021-01-12 09:11:45 +01:00
Andreas Kling
5dafb72370 Kernel+Profiler: Make profiling per-process and without core dumps
This patch merges the profiling functionality in the kernel with the
performance events mechanism. A profiler sample is now just another
perf event, rather than a dedicated thing.

Since perf events were already per-process, this now makes profiling
per-process as well.

Processes with perf events would already write out a perfcore.PID file
to the current directory on death, but since we may want to profile
a process and then let it continue running, recorded perf events can
now be accessed at any time via /proc/PID/perf_events.

This patch also adds information about process memory regions to the
perfcore JSON format. This removes the need to supply a core dump to
the Profiler app for symbolication, and so the "profiler coredump"
mechanism is removed entirely.

There's still a hard limit of 4MB worth of perf events per process,
so this is by no means a perfect final design, but it's a nice step
forward for both simplicity and stability.

Fixes #4848
Fixes #4849
2021-01-11 11:36:00 +01:00
Itamar
f259d96871 Kernel: Avoid collision between dynamic loader and main program
When loading non position-independent programs, we now take care not to
load the dynamic loader at an address that collides with the location
the main program wants to load at.

Fixes #4847.
2021-01-10 22:04:43 +01:00
Itamar
40a8159c62 Kernel: Plumb the elf header of the main program down to Process::load
This will enable us to take the desired load address of non-position
independent programs into account when randomizing the load address
of the dynamic loader.
2021-01-10 22:04:43 +01:00
William Marlow
747e8de96a Kernel+Loader.so: Allow dynamic executables without an interpreter
Commit a3a9016701 removed the PT_INTERP header
from Loader.so which cleaned up some kernel code in execve. Unfortunately
it prevents Loader.so from being run as an executable
2021-01-03 19:45:16 +01:00
Andreas Kling
5dae85afe7 Kernel: Pass "shared" flag to Region constructor
Before this change, we would sometimes map a region into the address
space with !is_shared(), and then moments later call set_shared(true).

I found this very confusing while debugging, so this patch makes us pass
the initial shared flag to the Region constructor, ensuring that it's in
the correct state by the time we first map the region.
2021-01-02 16:57:31 +01:00
Tom
e87eaf5df0 Kernel: Fix memory corruption when rolling back regions in execve
We need to free the regions before reverting the paging scope to the
original one when rolling back changes due to an error. This fixes
silent memory corruption.
2021-01-01 23:43:44 +01:00
Tom
bf9be3ec01 Kernel: More gracefully handle out-of-memory when creating PageDirectory 2021-01-01 23:43:44 +01:00
Tom
476f17b3f1 Kernel: Merge PurgeableVMObject into AnonymousVMObject
This implements memory commitments and lazy-allocation of committed
memory.
2021-01-01 23:43:44 +01:00
Andrew Kaster
a3a9016701 DynamicLoader: Tell the linker to not add a PT_INTERP header
Use the GNU LD option --no-dynamic-linker. This allows uncommenting some
code in the Kernel that gets upset if your ELF interpreter has its own
interpreter.
2021-01-01 02:12:28 +01:00
Andreas Kling
1cfdaf96c4 Kernel: Reset the process dumpable flag on successful non-setid exec
Once we've committed to a new memory layout and non-setid credentials,
we can reset the dumpable flag.
2020-12-26 01:31:24 +01:00
Andreas Kling
82f86e35d6 Kernel+LibC: Introduce a "dumpable" flag for processes
This new flag controls two things:
- Whether the kernel will generate core dumps for the process
- Whether the EUID:EGID should own the process's files in /proc

Processes are automatically made non-dumpable when their EUID or EGID is
changed, either via syscalls that specifically modify those ID's, or via
sys$execve(), when a set-uid or set-gid program is executed.

A process can change its own dumpable flag at any time by calling the
new sys$prctl(PR_SET_DUMPABLE) syscall.

Fixes #4504.
2020-12-25 19:35:55 +01:00
Andreas Kling
89d3b09638 Kernel: Allocate new main thread stack before committing to exec
If the allocation fails (e.g ENOMEM) we want to simply return an error
from sys$execve() and continue executing the current executable.

This patch also moves make_userspace_stack_for_main_thread() out of the
Thread class since it had nothing in particular to do with Thread.
2020-12-25 16:22:01 +01:00
Andreas Kling
2f1712cc29 Kernel: Move ELF auxiliary vector building out of Process class
Process had a couple of members whose only purpose was holding on to
some temporary data while building the auxiliary vector. Remove those
members and move the vector building to a free function in execve.cpp
2020-12-25 15:23:35 +01:00
Andreas Kling
40e9edd798 LibELF: Move AuxiliaryValue into the ELF namespace 2020-12-25 14:48:30 +01:00
Andreas Kling
6c9a6bea1e Kernel+LibELF: Abort ELF executable load sooner when something fails
Make it possible to bail out of ELF::Image::for_each_program_header()
and then do exactly that if something goes wrong during executable
loading in the kernel.

Also make the errors we return slightly more nuanced than just ENOEXEC.
2020-12-25 14:42:42 +01:00