834 commits

Author SHA1 Message Date
Andreas Kling
7c916b9fe9 Kernel: Make realpath() take path+length, get rid of SmapDisabler 2020-01-06 11:32:25 +01:00
Andreas Kling
d6b06fd5a3 Kernel: Make watch_file() syscall take path length as a size_t
We don't care to handle negative path lengths anyway.
2020-01-06 11:15:49 +01:00
Andreas Kling
cf7df95ffe Kernel: Use get_syscall_path_argument() for syscalls that take paths 2020-01-06 11:15:49 +01:00
Andreas Kling
0df72d4712 Kernel: Pass path+length to mkdir(), rmdir() and chmod() 2020-01-06 11:15:49 +01:00
Andreas Kling
642137f014 Kernel: Make access() take path+length
Also, let's return EFAULT for nullptr at the LibC layer. We can't do
all bad addresses this way, but we can at least do null. :^)
2020-01-06 11:15:48 +01:00
Andreas Kling
2c3a6c37ac Kernel: Paper over SMAP violations in clock_{gettime,nanosleep}()
Just put some SmapDisablers here to unbreak the nesalizer port.
2020-01-05 23:20:33 +01:00
Andreas Kling
c5890afc8b Kernel: Make chdir() take path+length 2020-01-05 22:06:25 +01:00
Andreas Kling
f231e9ea76 Kernel: Pass path+length to the stat() and lstat() syscalls
It's not pleasant having to deal with null-terminated strings as input
to syscalls, so let's get rid of them one by one.
2020-01-05 22:02:54 +01:00
Andreas Kling
152a83fac5 Kernel: Remove SmapDisabler in watch_file() 2020-01-05 21:55:20 +01:00
Andreas Kling
80cbb72f2f Kernel: Remove SmapDisablers in open(), openat() and set_thread_name()
This patch introduces a helpful copy_string_from_user() function
that takes a bounded null-terminated string from userspace memory
and copies it into a String object.
2020-01-05 21:51:06 +01:00
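As a rough illustration of the bounded copy described in this entry, here is a minimal userspace sketch. The name copy_bounded_string(), the use of std::optional, and the choice to fail when no terminator is found within the bound are all assumptions made for this example. The kernel's copy_string_from_user() returns a String and may behave differently.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical stand-in for a bounded string copy: read at most max_length
// bytes looking for a null terminator. Returns the string without the
// terminator, or nullopt if the source is null or no terminator was found
// within the bound (an assumption of this sketch).
std::optional<std::string> copy_bounded_string(const char* source, std::size_t max_length)
{
    if (!source)
        return std::nullopt;
    for (std::size_t i = 0; i < max_length; ++i) {
        if (source[i] == '\0')
            return std::string(source, i);
    }
    return std::nullopt;
}
```

The point of the bound is that the copy never reads more than max_length bytes, no matter what the caller passed in.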
Andreas Kling
c4a1ea34c2 Kernel: Fix SMAP violation in writev() syscall 2020-01-05 19:20:08 +01:00
Andreas Kling
9eef39d68a Kernel: Start implementing x86 SMAP support
Supervisor Mode Access Prevention (SMAP) is an x86 CPU feature that
prevents the kernel from accessing userspace memory. With SMAP enabled,
trying to read/write a userspace memory address while in the kernel
will now generate a page fault.

Since it's sometimes necessary to read/write userspace memory, there
are two new instructions that quickly switch the protection on/off:
STAC (disables protection) and CLAC (enables protection.)
These are exposed in kernel code via the stac() and clac() helpers.

There's also a SmapDisabler RAII object that can be used to ensure
that you don't forget to re-enable protection before returning to
userspace code.

This patch also adds copy_to_user(), copy_from_user() and memset_user()
which are the "correct" way of doing things. These functions allow us
to briefly disable protection for a specific purpose, and then turn it
back on immediately after it's done. Going forward all kernel code
should be moved to using these and all uses of SmapDisabler are to be
considered FIXME's.

Note that we're not realizing the full potential of this feature since
I've used SmapDisabler quite liberally in this initial bring-up patch.
2020-01-05 18:14:51 +01:00
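The relationship between stac()/clac(), the SmapDisabler RAII object and copy_to_user() can be sketched in userspace as follows. This is a simplified model, not the kernel code: a global flag stands in for the CPU's protection state, the helper bodies are assumptions, and the guard here does not handle nesting.

```cpp
#include <cstddef>
#include <cstring>

// Stand-in for the CPU state: true while "userspace access" is allowed.
static bool g_user_access_allowed = false;

// Simulated versions of the helpers named in the commit message.
inline void stac() { g_user_access_allowed = true; }  // disables protection
inline void clac() { g_user_access_allowed = false; } // enables protection

// RAII guard: protection is re-enabled when the object goes out of scope,
// even on an early return. Simplified: nesting is not supported.
class SmapDisabler {
public:
    SmapDisabler() { stac(); }
    ~SmapDisabler() { clac(); }
    SmapDisabler(const SmapDisabler&) = delete;
    SmapDisabler& operator=(const SmapDisabler&) = delete;
};

// The narrower approach: open the window only around the copy itself.
inline void copy_to_user(void* dest, const void* src, std::size_t size)
{
    stac();
    std::memcpy(dest, src, size);
    clac();
}
```

The guard keeps protection off for its whole scope, while copy_to_user() keeps it off only for the duration of one copy, which is why the message treats remaining SmapDisabler uses as things to fix.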
Andreas Kling
1525c11928 Kernel: Add missing iovec base validation for writev() syscall
We were forgetting to validate the base pointers of iovecs passed into
the writev() syscall.

Thanks to braindead for finding this bug! :^)
2020-01-05 10:38:02 +01:00
Andreas Kling
c89fe8a6a3 Kernel: Fix bad TOCTOU pattern in syscalls that take a parameter struct
Our syscall calling convention only allows passing up to 3 arguments in
registers. For syscalls that take more arguments, we bake them into a
struct and pass a pointer to that struct instead.

When doing pointer validation, this is what we would do:

    1) Validate the "params" struct
    2) Validate "params->some_pointer"
    3) ... other stuff ...
    4) Use "params->some_pointer"

Since the parameter struct is stored in userspace, it can be modified
by userspace after validation has completed.

This was a recurring pattern in many syscalls that was further hidden
by me using structured binding declarations to give convenient local
names to things in the parameter struct:

    auto& [some_pointer, ...] = *params;
    memcpy(some_pointer, ...);

This devilishly makes "some_pointer" look like a local variable but
it's actually more like an alias for "params->some_pointer" and will
expand to a dereference when accessed!

This patch fixes the issues by explicitly copying out each member from
the parameter structs before validating them, and then never using
the "param" pointers beyond that.

Thanks to braindead for finding this bug! :^)
2020-01-05 10:37:57 +01:00
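The fix described in this entry (copy first, validate the copy, use only the copy) can be sketched like this. Everything here is hypothetical and runs in userspace: FillParams, sys_fill(), is_valid_user_range() and the g_user_memory array that plays the role of process-owned memory are invented for illustration, and validation of the params pointer itself is left out.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical parameter struct that "userspace" passes by pointer.
struct FillParams {
    char* buffer;
    std::size_t size;
    char value;
};

// Stand-in for process-owned memory; only addresses inside it are valid.
static char g_user_memory[64];

static bool is_valid_user_range(const void* address, std::size_t size)
{
    auto base = reinterpret_cast<std::uintptr_t>(g_user_memory);
    auto start = reinterpret_cast<std::uintptr_t>(address);
    if (size > sizeof(g_user_memory) || start < base)
        return false;
    return start - base <= sizeof(g_user_memory) - size;
}

int sys_fill(const FillParams* user_params)
{
    // 1) Snapshot the struct once; later changes by userspace cannot affect it.
    FillParams params;
    std::memcpy(&params, user_params, sizeof(params));

    // 2) Validate the snapshot, not the userspace original.
    if (!is_valid_user_range(params.buffer, params.size))
        return -EFAULT;

    // 3) Use only the snapshot from here on.
    std::memset(params.buffer, params.value, params.size);
    return 0;
}
```

Because user_params is never read again after step 1, there is no window between the check and the use in which the pointer could be swapped.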
Andreas Kling
3a27790fa7 Kernel: Use Thread::from_tid() in more places 2020-01-04 18:56:04 +01:00
Andreas Kling
95ba0d5a02 Kernel: Remove unused "putch" syscall 2020-01-04 16:00:25 +01:00
Andreas Kling
5abc30e057 Kernel: Allow setgroups() to drop all groups with nullptr
Previously we'd EFAULT for setgroups(0, nullptr), but we can just as
well tolerate it if someone wants to drop groups without a pointer.
2020-01-04 13:47:54 +01:00
Andreas Kling
d84299c7be Kernel: Allow fchmod() and fchown() on pre-bind() local sockets
In order to ensure a specific owner and mode when the local socket
filesystem endpoint is instantiated, we need to be able to call
fchmod() and fchown() on a socket fd between socket() and bind().

This is because until we call bind(), there is no filesystem inode
for the socket yet.
2020-01-03 20:14:56 +01:00
Andreas Kling
1dc64ec064 Kernel: Remove unnecessary logic in kill() and killpg() syscalls
As Sergey pointed out, do_killpg() already interprets PID 0 as the
PGID of the calling process.
2020-01-03 12:58:59 +01:00
Andreas Kling
9026598999 Kernel: Add a more expressive API for getting random bytes
We now have these APIs in <Kernel/Random.h>:

    - get_fast_random_bytes(u8* buffer, size_t buffer_size)
    - get_good_random_bytes(u8* buffer, size_t buffer_size)
    - get_fast_random<T>()
    - get_good_random<T>()

Internally they both use x86 RDRAND if available, otherwise they fall
back to the same LCG we had in RandomDevice all along.

The main purpose of this patch is to give kernel code a way to better
express its needs for random data.

Randomness is something that will require a lot more work, but this is
hopefully a step in the right direction.
2020-01-03 12:43:07 +01:00
Andreas Kling
24cc67d199 Kernel: Remove read_tsc() syscall
Since nothing is using this, let's just remove it. That's one less
thing to worry about.
2020-01-03 09:27:09 +01:00
Andreas Kling
8cc5fa5598 Kernel: Unbreak module loading (broke with NX bit changes)
Modules are now mapped fully RWX. This can definitely be improved,
but at least it unbreaks the feature for now.
2020-01-03 03:44:55 +01:00
Andreas Kling
0a1865ebc6 Kernel: read() and write() should fail with EBADF for wrong mode fd's
It was previously possible to write to read-only file descriptors,
and read from write-only file descriptors.

All FileDescription objects now start out non-readable + non-writable,
and whoever is creating them has to "manually" enable reading/writing
by calling set_readable() and/or set_writable() on them.
2020-01-03 03:29:59 +01:00
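A minimal sketch of the "start closed, opt in explicitly" idea from this entry. The class below is a toy model with only the two flags. The try_read() and try_write() helpers are hypothetical names for the mode check a syscall would perform before doing any I/O.

```cpp
#include <cerrno>

// Toy model: descriptions start out neither readable nor writable.
class FileDescription {
public:
    void set_readable(bool readable) { m_readable = readable; }
    void set_writable(bool writable) { m_writable = writable; }
    bool is_readable() const { return m_readable; }
    bool is_writable() const { return m_writable; }

private:
    bool m_readable { false };
    bool m_writable { false };
};

// Hypothetical checks: fail with EBADF when the mode does not allow the operation.
int try_read(const FileDescription& description)
{
    return description.is_readable() ? 0 : -EBADF;
}

int try_write(const FileDescription& description)
{
    return description.is_writable() ? 0 : -EBADF;
}
```

With both flags defaulting to false, forgetting to call a setter results in an unusable descriptor rather than an overly permissive one.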
Andreas Kling
15f3abc849 Kernel: Handle O_DIRECTORY in VFS::open() instead of in each syscall
Just taking care of some FIXMEs.
2020-01-03 03:16:29 +01:00
Andreas Kling
05653a9189 Kernel: killpg() with pgrp=0 should signal every process in the group
In the same group as the calling process, that is.
2020-01-03 03:16:29 +01:00
Andreas Kling
005313df82 Kernel: kill() with signal 0 should not actually send anything
Also kill() with pid 0 should send to everyone in the same process
group as the calling process.
2020-01-03 03:16:29 +01:00
Andreas Kling
8345f51a24 Kernel: Remove unnecessary wraparound check in Process::validate_read()
This will be checked moments later by MM.validate_user_read().
2020-01-03 03:16:29 +01:00
Andreas Kling
fdde5cdf26 Kernel: Don't include the process GID in the "extra GIDs" table
Process::m_extra_gids is for supplementary GIDs only.
2020-01-02 23:45:52 +01:00
Andreas Kling
9fe316c2d8 Kernel: Add some missing error checks to the setpgid() syscall 2020-01-02 19:40:04 +01:00
Andreas Kling
285130cc55 Kernel: Remove debug spam about marking threads for death 2020-01-02 13:45:22 +01:00
Andreas Kling
7f843ef3b2 Kernel: Make the purge() syscall superuser-only
I don't think we need to give unprivileged users access to what is
essentially a kernel testing mechanism.
2020-01-02 13:39:49 +01:00
Andreas Kling
c01f766fb2 Kernel: writev() should fail with EINVAL if total length > INT32_MAX 2020-01-02 13:01:41 +01:00
Andreas Kling
7f04334664 Kernel: Remove broken implementation of Unix SHM
This code never worked, and was never used for anything. We can build
a much better SHM implementation on top of TmpFS or similar when we
get to the point when we need one.
2020-01-02 12:44:21 +01:00
Andrew Kaster
bc50a10cc9 Kernel: sys$mprotect protects sub-regions as well as whole ones
Split a region into two/three if the desired mprotect range is a strict
subset of an existing region. We can then set the access bits on a new
region that is just our desired range and add both the new
desired subregion and the leftovers back to our page tables.
2020-01-02 12:27:13 +01:00
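The two/three-way split can be illustrated with plain address arithmetic. This sketch only computes the resulting ranges. Range and split_around() are names invented here, and the real change also has to deal with the backing memory objects and page table updates, which are not modeled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical address range: [base, base + size).
struct Range {
    std::uintptr_t base;
    std::size_t size;
    std::uintptr_t end() const { return base + size; }
};

// Split `region` around `wanted`. Returns the pieces in address order:
// an optional leading leftover, the wanted range, an optional trailing
// leftover. Returns an empty vector if `wanted` is empty or not fully
// inside `region`.
std::vector<Range> split_around(const Range& region, const Range& wanted)
{
    std::vector<Range> pieces;
    if (wanted.size == 0 || wanted.base < region.base || wanted.end() > region.end())
        return pieces;
    if (wanted.base > region.base)
        pieces.push_back({ region.base, wanted.base - region.base });
    pieces.push_back(wanted);
    if (wanted.end() < region.end())
        pieces.push_back({ wanted.end(), region.end() - wanted.end() });
    return pieces;
}
```

A range in the middle of a region yields three pieces, while a range touching either edge yields two, matching the "two/three" wording above.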
Andreas Kling
3f7de2713e Kernel: Make mknod() respect the process umask
Otherwise the /bin/mknod command would create world-writable inodes
by default (when run by superuser) which you probably don't want.
2020-01-02 02:40:43 +01:00
Andreas Kling
c7eb3ff1b3 Kernel: mknod() should not allow unprivileged users to create devices
In fact, unless you are superuser, you may only create a regular file,
a named pipe, or a local domain socket. Anything else should EPERM.
2020-01-02 02:36:12 +01:00
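The two mknod() entries above amount to a permission check on the file type plus a mask on the mode bits. A small sketch, assuming a POSIX <sys/stat.h>. The function names check_mknod_permission() and apply_umask() are made up for this example and are not taken from the kernel source.

```cpp
#include <cerrno>
#include <sys/stat.h>
#include <sys/types.h>

// Hypothetical check: unprivileged callers may only create regular files,
// named pipes and local domain sockets. Anything else is EPERM.
int check_mknod_permission(bool is_superuser, mode_t mode)
{
    if (is_superuser)
        return 0;
    if (S_ISREG(mode) || S_ISFIFO(mode) || S_ISSOCK(mode))
        return 0;
    return -EPERM;
}

// Hypothetical helper: clear every permission bit that is set in the umask.
mode_t apply_umask(mode_t mode, mode_t umask_value)
{
    return mode & ~umask_value;
}
```

For example, a requested mode of 0666 with a umask of 022 ends up as 0644, so the new inode is no longer world-writable.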
Andreas Kling
3dcec260ed Kernel: Validate the full range of user memory passed to syscalls
We now validate the full range of userspace memory passed into syscalls
instead of just checking that the first and last byte of the memory are
in process-owned regions.

This fixes an issue where it was possible to avoid rejection of invalid
addresses that sat between two valid ones, simply by passing a valid
address and a size large enough to put the end of the range at another
valid address.

I added a little test utility that tries to provoke EFAULT in various
ways to help verify this. I'm sure we can think of more ways to test
this but it's at least a start. :^)

Thanks to mozjag for pointing out that this code was still lacking!

Incidentally this also makes backtraces work again.

Fixes #989.
2020-01-02 02:17:12 +01:00
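To make the "hole between two valid addresses" problem concrete, here is a sketch that walks the whole range instead of checking only its endpoints. Region and is_range_fully_mapped() are hypothetical, the region list is a plain vector, and regions are assumed not to wrap around the address space. The kernel's actual validation code is structured differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical process-owned region: [base, base + size).
struct Region {
    std::uintptr_t base;
    std::size_t size;
};

// Returns true only if every byte of [address, address + size) lies in some
// region. A zero-length range is accepted in this sketch.
bool is_range_fully_mapped(const std::vector<Region>& regions, std::uintptr_t address, std::size_t size)
{
    std::uintptr_t end = address + size;
    if (end < address)
        return false; // the range wraps around the address space

    std::uintptr_t cursor = address;
    while (cursor < end) {
        bool advanced = false;
        for (const auto& region : regions) {
            if (cursor >= region.base && cursor - region.base < region.size) {
                cursor = region.base + region.size; // skip to the end of this region
                advanced = true;
                break;
            }
        }
        if (!advanced)
            return false; // cursor sits in a hole
    }
    return true;
}
```

With regions at 0x1000-0x2000 and 0x3000-0x4000, a range starting at 0x1800 with size 0x2000 has a valid first and last byte but is rejected here because of the hole at 0x2000-0x3000.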
Andreas Kling
38f93ef13b Kernel: Disable x86 RDTSC instruction in userspace
It's still possible to read the TSC via the read_tsc() syscall, but we
will now clear some of the bottom bits for unprivileged users.
2020-01-01 18:22:20 +01:00
Andreas Kling
f598bbbb1d Kernel: Prevent executing I/O instructions in userspace
All threads were running with iomapbase=0 in their TSS, which the CPU
interprets as "there's an I/O permission bitmap starting at offset 0
into my TSS".

Because of that, any bits that were 1 inside the TSS would allow the
thread to execute I/O instructions on the port with that bit index.

Fix this by always setting the iomapbase to sizeof(TSS32), and also
setting the TSS descriptor's limit to sizeof(TSS32), effectively making
the I/O permissions bitmap zero-length.

This should make it no longer possible to do I/O from userspace. :^)
2020-01-01 17:31:41 +01:00
Andreas Kling
14cdd3fdc1 Kernel: Make module_load() and module_unload() be superuser-only
These should just fail with EPERM if you're not the superuser.
2020-01-01 00:46:08 +01:00
Tibor Nagy
624116a8b1 Kernel: Implement AltGr key support 2019-12-31 19:31:42 +01:00
Andreas Kling
36f1de3c89 Kernel: Pointer range validation should fail on wraparound
Let's reject address ranges that wrap around the 2^32 mark.
2019-12-31 18:23:17 +01:00
Andreas Kling
903b159856 Kernel: Write address validation was only checking end of write range
Thanks to yyyyyyy for finding the bug! :^)
2019-12-31 18:18:54 +01:00
Andreas Kling
3f254bfbc8 Kernel+ping: Only allow superuser to create SOCK_RAW sockets
/bin/ping is now setuid-root, and will drop privileges immediately
after opening a raw socket.
2019-12-31 01:42:34 +01:00
Andreas Kling
a69734bf2e Kernel: Also add a process boosting mechanism
Let's also have set_process_boost() for giving all threads in a process
the same boost.
2019-12-30 20:10:00 +01:00
Andreas Kling
610f3ad12f Kernel: Add a basic thread boosting mechanism
This patch introduces a syscall:

    int set_thread_boost(int tid, int amount)

You can use this to add a permanent boost value to the effective thread
priority of any thread with your UID (or any thread in the system if
you are the superuser.)

This is quite crude, but opens up some interesting opportunities. :^)
2019-12-30 19:23:13 +01:00
Andreas Kling
50677bf806 Kernel: Refactor scheduler to use dynamic thread priorities
Threads now have numeric priorities with a base priority in the 1-99
range.

Whenever a runnable thread is *not* scheduled, its effective priority
is incremented by 1. This is tracked in Thread::m_extra_priority.
The effective priority of a thread is m_priority + m_extra_priority.

When a runnable thread *is* scheduled, its m_extra_priority is reset to
zero and the effective priority returns to base.

This means that a lower-priority thread will always eventually get
scheduled to run, once its effective priority becomes high enough to
exceed the base priority of threads "above" it.

The previous values for ThreadPriority (Low, Normal and High) are now
replaced as follows:

    Low -> 10
    Normal -> 30
    High -> 50

In other words, it will take 20 ticks for a "Low" priority thread to
get to "Normal" effective priority, and another 20 to reach "High".

This is not perfect, and I've used some quite naive data structures,
but I think the mechanism will allow us to build various new and
interesting optimizations, and we can figure out better data structures
later on. :^)
2019-12-30 18:46:17 +01:00
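The aging rule described in this entry can be simulated in a few lines. This is a toy model, not the scheduler itself: Thread here has only the two priority fields, every thread is assumed runnable, the list must be non-empty, and ties go to the earlier thread in the list, which is an arbitrary choice of this sketch.

```cpp
#include <cstddef>
#include <vector>

// Toy thread: effective priority = base_priority + extra_priority.
struct Thread {
    int base_priority;
    int extra_priority;
};

int effective_priority(const Thread& thread)
{
    return thread.base_priority + thread.extra_priority;
}

// Pick the thread with the highest effective priority (ties: lowest index),
// reset its extra priority, and bump every thread that was passed over.
// Assumes `threads` is non-empty and all threads are runnable.
std::size_t pick_next_thread(std::vector<Thread>& threads)
{
    std::size_t best = 0;
    for (std::size_t i = 1; i < threads.size(); ++i) {
        if (effective_priority(threads[i]) > effective_priority(threads[best]))
            best = i;
    }
    for (std::size_t i = 0; i < threads.size(); ++i) {
        if (i == best)
            threads[i].extra_priority = 0;
        else
            ++threads[i].extra_priority;
    }
    return best;
}
```

With one thread at base 50 and one at base 10, the low thread gains one point per tick while it waits. In this sketch it is first picked on tick 42, once its effective priority of 51 exceeds the other thread's 50.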
Andrew Kaster
cdcab7e5f4 Kernel: Retry mmap if MAP_FIXED is not in flags and addr is not 0
If an mmap fails to allocate a region, but the addr passed in was
non-zero, non-fixed mmaps should attempt to allocate at any available
virtual address.
2019-12-29 23:01:27 +01:00
Andreas Kling
fed3416bd2 Kernel: Embrace the SerenityOS name 2019-12-29 19:08:02 +01:00
Andreas Kling
1f31156173 Kernel: Add a mode flag to sys$purge and allow purging clean inodes 2019-12-29 13:16:53 +01:00