These methods will be used later on to introduce symbolic links support
in the SysFS, so the kernel will be able to resolve relative paths of
components in filesystem based on using the m_parent_directory pointer
in each SysFSComponent object.
This folder in the SysFS code represents everything related to /sys/dev,
which is a directory meant to be a convenient interface to track all IDs
of all block and character devices (ID = major:minor numbers).
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
No functional changes.
I haven't found any POSIX specification on this, but the Linux kernel
appears to handle it like that.
This is required by QEMU, as it just bulk-unlocks all its file locking
bytes without checking first if they are held.
In a previous commit I moved everything into the new subdirectories in
FileSystem/SysFS directory without trying to actually make changes in
the code itself too much. Now it's time to split the code to make it
more readable and understandable, hence this change occurs now.
As with the previous commit, we put a distinction between filesystems
that require a file description and those which don't, but now in a much
more readable mechanism - all initialization properties as well as the
create static method are grouped to create the FileSystemInitializer
structure. Then when we need to initialize an instance, we iterate over
a table of these structures, checking for matching structure and then
validating the given arguments from userspace against the requirements
to ensure we can create a valid instance of the requested filesystem.
This exposes the child processes for a process as a directory
of symlinks to the respective /proc entries for each child.
This makes for an easier and possibly more efficient way
to find and count a process's children. Previously the only
method was to parse the entire /proc/all JSON file.
Coverage tools like LLVM's source-based coverage or GNU's --coverage
need to be able to write out coverage files from any binary, regardless
of its security posture. Not ignoring these pledges and veils means we
can't get our coverage data out without playing some serious tricks.
However this is pretty terrible for normal exeuction, so only skip these
checks when we explicitly configured userspace for coverage.
AnonymousFile always allocates in multiples of a page size when created
with anon_create. This is especially an issue if we use AnonymousFile
shared memory to store a shared data structure that isn't exactly a
multiple of a page in size. Therefore, we can just allow mmaps of
AnonymousFile to map only an initial part of the shared memory.
This makes SharedSingleProducerCircularQueue work when it's introduced
later.
This is important for dmidecode because it does an fstat on the DMI
blobs, trying to figure out their size. Because we already know the size
of the blobs when creating the SysFS components, there's no performance
penalty whatsoever, and this allows dmidecode to not use the /dev/mem
device as a fallback.
The obsolete ttyname and ptsname syscalls are removed.
LibC doesn't rely on these anymore, and it helps simplifying the Kernel
in many places, so it's an overall an improvement.
In addition to that, /proc/PID/tty node is removed too as it is not
needed anymore by userspace to get the attached TTY of a process, as
/dev/tty (which is already a character device) represents that as well.
Error codes can leak information about veiled paths, if the path
resolution fails with e.g. EACCESS.
This is non-trivial to fix, as there is a group of error codes we want
to propagate to the caller, such as ENOMEM.
VirtualFileSystem::mkdir() relies on resolve_path() returning an error,
since it is only interested in the out_parent passed as a pointer. Since
resolve_path_without_veil returns an error, no process veil validation
is done by resolve_path() in that case. Due to this problem, mkdir()
should use resolve_path_without_veil() and then manually validate if the
parent directory of the to-be-created directory is unveiled with 'c'
permissions.
This fixes a bug where the mkdir syscall would not respect the process
veil at all.
Previously, VirtualFileSystem::resolve_path() could return a non-null
RefPtr<Custody>* out_parent even if the function errored because the
path has been veiled.
If code relies on recieving the parent custody even if the path is
veiled, it should just call resolve_path_without_veil and do the veil
validation manually. This is because it could be that the parent is
unveiled but the child isn't or the other way round.
Rename the bound socket accessor from socket() to bound_socket().
Also return RefPtr<LocalSocket> instead of a raw pointer, to make it
harder for callers to mess up.
A mutex is useful when we need to be able to block the current thread
until it's available. This is overkill for OpenFileDescriptor.
First off, this patch wraps the main state member variables inside a
SpinlockProtected<State> to enforce synchronized access. This also
avoids "free locking" where figuring out which variables are guarded
by which lock is left as an unamusing exercise for the reader.
Then we remove mutex locking from the functions that simply call through
to the underlying File or Inode, since those fields never change anyway,
and the target objects perform their own synchronization.
InodeWatcher::register_inode was already partially fallible, but the
insertion of the inodes and watch descriptions into their respective
hash maps was not. Note that we cannot simply TRY the insertion into
both, as that could result in an inconsistent state, instead we must
remove the inode from the inode hash map if the insertion into the
watch description hash map failed.