This commit adds a basic implementation of
the ptrace syscall, which allows one process
(the tracer) to control another process (the tracee).
While a process is being traced, it is stopped whenever a signal is
received (other than SIGCONT).
The tracer can start tracing another thread with PT_ATTACH,
which causes the tracee to stop.
From there, the tracer can use PT_CONTINUE
to continue the execution of the tracee,
or use other request codes (which haven't been implemented yet)
to modify the state of the tracee.
Additional request codes are PT_SYSCALL, which causes the tracee to
continue exection but stop at the next entry or exit from a syscall,
and PT_GETREGS which fethces the last saved register set of the tracee
(can be used to inspect syscall arguments and return value).
A special request code is PT_TRACE_ME, which is issued by the tracee
and causes it to stop when it calls execve and wait for the
tracer to attach.
This was only used by the mechanism for mapping executables into each
process's own address space. Now that we remap executables on demand
when needed for symbolication, this can go away.
Previously we would map the entire executable of a program in its own
address space (but make it unavailable to userspace code.)
This patch removes that and changes the symbolication code to remap
the executable on demand (and into the kernel's own address space
instead of the process address space.)
This opens up a couple of further simplifications that will follow.
Add an extra out-parameter to shbuf_get() that receives the size of the
shared buffer. That way we don't need to make a separate syscall to
get the size, which we always did immediately after.
This feels a lot more consistent and Unixy:
create_shared_buffer() => shbuf_create()
share_buffer_with() => shbuf_allow_pid()
share_buffer_globally() => shbuf_allow_all()
get_shared_buffer() => shbuf_get()
release_shared_buffer() => shbuf_release()
seal_shared_buffer() => shbuf_seal()
get_shared_buffer_size() => shbuf_get_size()
Also, "shared_buffer_id" is shortened to "shbuf_id" all around.
This allows a process wich has more than 1 thread to call exec, even
from a thread. This kills all the other threads, but it won't wait for
them to finish, just makes sure that they are not in a running/runable
state.
In the case where a thread does exec, the new program PID will be the
thread TID, to keep the PID == TID in the new process.
This introduces a new function inside the Process class,
kill_threads_except_self which is called on exit() too (exit with
multiple threads wasn't properly working either).
Inside the Lock class, there is the need for a new function,
clear_waiters, which removes all the waiters from the
Process::big_lock. This is needed since after a exit/exec, there should
be no other threads waiting for this lock, the threads should be simply
killed. Only queued threads should wait for this lock at this point,
since blocked threads are handled in set_should_die.
Replace Process::m_being_inspected with an inspector reference count.
This prevents an assertion from firing when inspecting the same process
in /proc from multiple processes at the same time.
It was trivially reproducible by opening multiple FileManagers.
This mechanism wasn't actually used to create any WeakPtr<Process>.
Such pointers would be pretty hard to work with anyway, due to the
multi-step destruction ritual of Process.
Calling shutdown prevents further reads and/or writes on a socket.
We should do a few more things based on the type of socket, but this
initial implementation just puts the basic mechanism in place.
Work towards #428.
sys$waitid() takes an explicit description of whether it's waiting for a single
process with the given PID, all of the children, a group, etc., and returns its
info as a siginfo_t.
It also doesn't automatically imply WEXITED, which clears up the confusion in
the kernel.
This patch introduces sys$perf_event() with two event types:
- PERF_EVENT_MALLOC
- PERF_EVENT_FREE
After the first call to sys$perf_event(), a process will begin keeping
these events in a buffer. When the process dies, that buffer will be
written out to "perfcore" in the current directory unless that filename
is already taken.
This is probably not the best way to do this, but it's a start and will
make it possible to start doing memory allocation profiling. :^)
When using dbg() in the kernel, the output is automatically prefixed
with [Process(PID:TID)]. This makes it a lot easier to understand which
thread is generating the output.
This patch also cleans up some common logging messages and removes the
now-unnecessary "dbg() << *current << ..." pattern.
This syscall is a complement to pledge() and adds the same sort of
incremental relinquishing of capabilities for filesystem access.
The first call to unveil() will "drop a veil" on the process, and from
now on, only unveiled parts of the filesystem are visible to it.
Each call to unveil() specifies a path to either a directory or a file
along with permissions for that path. The permissions are a combination
of the following:
- r: Read access (like the "rpath" promise)
- w: Write access (like the "wpath" promise)
- x: Execute access
- c: Create/remove access (like the "cpath" promise)
Attempts to open a path that has not been unveiled with fail with
ENOENT. If the unveiled path lacks sufficient permissions, it will fail
with EACCES.
Like pledge(), subsequent calls to unveil() with the same path can only
remove permissions, not add them.
Once you call unveil(nullptr, nullptr), the veil is locked, and it's no
longer possible to unveil any more paths for the process, ever.
This concept comes from OpenBSD, and their implementation does various
things differently, I'm sure. This is just a first implementation for
SerenityOS, and we'll keep improving on it as we go. :^)
As suggested by Joshua, this commit adds the 2-clause BSD license as a
comment block to the top of every source file.
For the first pass, I've just added myself for simplicity. I encourage
everyone to add themselves as copyright holders of any file they've
added or modified in some significant way. If I've added myself in
error somewhere, feel free to replace it with the appropriate copyright
holder instead.
Going forward, all new source files should include a license header.
The syscall is now called sys$open(), but it behaves like the old sys$openat().
In userspace, open_with_path_length() is made a wrapper over openat_with_path_length().
This patch adds a new "accept" promise that allows you to call accept()
on an already listening socket. This lets programs set up a socket for
for listening and then dropping "inet" and/or "unix" so that only
incoming (and existing) connections are allowed from that point on.
No new outgoing connections or listening server sockets can be created.
In addition to accept() it also allows getsockopt() with SOL_SOCKET
and SO_PEERCRED, which is used to find the PID/UID/GID of the socket
peer. This is used by our IPC library when creating shared buffers that
should only be accessible to a specific peer process.
This allows us to drop "unix" in WindowServer and LookupServer. :^)
It also makes the debugging/introspection RPC sockets in CEventLoop
based programs work again.
This patch changes how exec() figures out which program image to
actually load. Previously, we opened the path to our main executable in
find_shebang_interpreter_for_executable, read the first page (or less,
if the file was smaller) and then decided whether to recurse with the
interpreter instead. We then then re-opened the main executable in
do_exec.
However, since we now want to parse the ELF header and Program Headers
of an elf image before even doing any memory region work, we can change
the way this whole process works. We open the file and read (up to) the
first page in exec() itself, then pass just the page and the amount read
to find_shebang_interpreter_for_executable. Since we now have that page
and the FileDescription for the main executable handy, we can do a few
things. First, validate the ELF header and ELF program headers for any
shenanigans. ELF32 Little Endian i386 only, please. Second, we can grab
the PT_INTERP interpreter from any ET_DYN files, and open that guy right
away if it exists. Finally, we can pass the main executable's and
optionally the PT_INTERP interpreter's file descriptions down to do_exec
and not have to feel guilty about opening the file twice.
In do_exec, we now have a choice. Are we going to load the main
executable, or the interpreter? We could load both, but it'll be way
easier for the inital pass on the RTLD if we only load the interpreter.
Then it can load the main executable itself like any old shared object,
just, the one with main in it :). Later on we can load both of them
into memory and the RTLD can relocate itself before trying to do
anything. The way it's written now the RTLD will get dibs on its
requested virtual addresses being the actual virtual addresses.
Right now there is a significant amount of boiler plate code required
to validate user mode parameters in syscalls. In an attempt to reduce
this a bit, introduce validate_read_and_copy_typed which combines the
usermode address check and does the copy internally if the validation
passes. This cleans up a little bit of code from a significant amount
of syscalls.
Since a chroot is in many ways similar to a separate root mount, we can also
apply mount flags to it as if it was an actual mount. These flags will apply
whenever the chrooted process accesses its root directory, but not when other
processes access this same directory for the outside. Since it's common to
chdir("/") immediately after chrooting (so that files accessed through the
current directory inherit the same mount flags), this effectively allows one to
apply additional limitations to a process confined inside a chroot.
To this effect, sys$chroot() gains a mount_flags argument (exposed as
chroot_with_mount_flags() in userspace) which can be set to all the same values
as the flags argument for sys$mount(), and additionally to -1 to keep the flags
set for that file system. Note that passing 0 as mount_flags will unset any
flags that may have been set for the file system, not keep them.
While I was updating syscalls to stop passing null-terminated strings,
I added some helpful struct types:
- StringArgument { const char*; size_t; }
- ImmutableBuffer<Data, Size> { const Data*; Size; }
- MutableBuffer<Data, Size> { Data*; Size; }
The Process class has some convenience functions for validating and
optionally extracting the contents from these structs:
- get_syscall_path_argument(StringArgument)
- validate_and_copy_string_from_user(StringArgument)
- validate(ImmutableBuffer)
- validate(MutableBuffer)
There's still so much code around this and I'm wondering if we should
generate most of it instead. Possible nice little project.