When ProcFS could no longer allocate KBuffer objects to serve calls to
read, it would just return 0, indicating EOF. This then triggered
parsing errors because code assumed it read the file.
Because read isn't supposed to return ENOMEM, change ProcFS to populate
the file data upon file open or seek to the beginning. This also means
that calls to open can now return ENOMEM if needed. This allows the
caller to either be able to successfully open the file and read it, or
fail to open it in the first place.
There is a window between dropping the last reference and removing
a ProcFSInode from the lookup map. So, when looking up we need to
check if that Inode is being destructed.
Before this change, we would sometimes map a region into the address
space with !is_shared(), and then moments later call set_shared(true).
I found this very confusing while debugging, so this patch makes us pass
the initial shared flag to the Region constructor, ensuring that it's in
the correct state by the time we first map the region.
This adds the ability for a Region to define volatile/nonvolatile
areas within mapped memory using madvise(). This also means that
memory purging takes into account all views of the PurgeableVMObject
and only purges memory that is not needed by all of them. When calling
madvise() to change an area to nonvolatile memory, return whether
memory from that area was purged. At that time also try to remap
all memory that is requested to be nonvolatile, and if insufficient
pages are available notify the caller of that fact.
Compared to version 10 this fixes a bunch of formatting issues, mostly
around structs/classes with attributes like [[gnu::packed]], and
incorrect insertion of spaces in parameter types ("T &"/"T &&").
I also removed a bunch of // clang-format off/on and FIXME comments that
are no longer relevant - on the other hand it tried to destroy a couple of
neatly formatted comments, so I had to add some as well.
The unblock_all variant used to ASSERT if a blocker didn't unblock,
but it wasn't clear from the name that it would do that. Because
the BlockCondition already asserts that no blockers are left at
destruction time, it would still catch blockers that haven't been
unblocked for whatever reason.
Fixes#4496
The partitioning code was very outdated, and required a full refactor.
The new subsystem removes duplicated code and uses more AK containers.
The most important change is that all implementations of the
PartitionTable class conform to one interface, which made it possible
to remove unnecessary code in the EBRPartitionTable class.
Finding partitions is now done in the StorageManagement singleton,
instead of doing so in init.cpp.
Also, now we don't try to find partitions on demand - the kernel will
try to detect if a StorageDevice is partitioned, and if so, will check
what is the partition table, which could be MBR, GUID or EBR.
Then, it will create DiskPartitionMetadata object for each partition
that is available in the partition table. This object will be used
by the partition enumeration code to create a DiskPartition with the
correct minor number.
The DevFS along with DevPtsFS give a complete solution for populating
device nodes in /dev. The main purpose of DevFS is to eliminate the
need of device nodes generation when building the system.
Later on, DevFS will assist with exposing disk partition nodes.
BlockBasedFileSystem::read_block method should get a reference of
a UserOrKernelBuffer.
If we need to force caching a block, we will call other method to do so.
clang trunk with -std=c++20 doesn't seem to properly look for an
aggregate initializer here when the type being constructed is a simple
aggregate (e.g. `struct Thing { int a; int b; };`). This template fails
to compile in a usage added 12/16/2020 in `AK/Trie.h`.
Both forms of initialization are supposed to call the
aggregate-initializers but direct-list-initialization delegating to
aggregate initializers is a new addition in c++20 that might not be
implemented yet.
This was a goofy kernel API where you could assign an icon_id (int) to
a process which referred to a global shbuf with a 16x16 icon bitmap
inside it.
Instead of this, programs that want to display a process icon now
retrieve it from the process executable instead.
This new flag controls two things:
- Whether the kernel will generate core dumps for the process
- Whether the EUID:EGID should own the process's files in /proc
Processes are automatically made non-dumpable when their EUID or EGID is
changed, either via syscalls that specifically modify those ID's, or via
sys$execve(), when a set-uid or set-gid program is executed.
A process can change its own dumpable flag at any time by calling the
new sys$prctl(PR_SET_DUMPABLE) syscall.
Fixes#4504.
This is instead of the UID:GID, since that was allowing some very bad
information leaks like spawning "su" as an unprivileged user and having
full /proc access to it.
Work towards #4504.
ProcFS /proc/<pid>/vm map info no longer contains two `purgeable` keys.
The second `purgeable` key has been removed and replaced with keys for
`kernel` and `cacheable`.
This implements a number of changes related to time:
* If a HPET is present, it is now used only as a system timer, unless
the Local APIC timer is used (in which case the HPET timer will not
trigger any interrupts at all).
* If a HPET is present, the current time can now be as accurate as the
chip can be, independently from the system timer. We now query the
HPET main counter for the current time in CPU #0's system timer
interrupt, and use that as a base line. If a high precision time is
queried, that base line is used in combination with quering the HPET
timer directly, which should give a much more accurate time stamp at
the expense of more overhead. For faster time stamps, the more coarse
value based on the last interrupt will be returned. This also means
that any missed interrupts should not cause the time to drift.
* The default system interrupt rate is reduced to about 250 per second.
* Fix calculation of Thread CPU usage by using the amount of ticks they
used rather than the number of times a context switch happened.
* Implement CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE and use it
for most cases where precise timestamps are not needed.
Problem:
- `(void)` simply casts the expression to void. This is understood to
indicate that it is ignored, but this is really a compiler trick to
get the compiler to not generate a warning.
Solution:
- Use the `[[maybe_unused]]` attribute to indicate the value is unused.
Note:
- Functions taking a `(void)` argument list have also been changed to
`()` because this is not needed and shows up in the same grep
command.
Fix some problems with join blocks where the joining thread block
condition was added twice, which lead to a crash when trying to
unblock that condition a second time.
Deferred block condition evaluation by File objects were also not
properly keeping the File object alive, which lead to some random
crashes and corruption problems.
Other problems were caused by the fact that the Queued state didn't
handle signals/interruptions consistently. To solve these issues we
remove this state entirely, along with Thread::wait_on and change
the WaitQueue into a BlockCondition instead.
Also, deliver signals even if there isn't going to be a context switch
to another thread.
Fixes#4336 and #4330
This makes the Scheduler a lot leaner by not having to evaluate
block conditions every time it is invoked. Instead evaluate them as
the states change, and unblock threads at that point.
This also implements some more waitid/waitpid/wait features and
behavior. For example, WUNTRACED and WNOWAIT are now supported. And
wait will now not return EINTR when SIGCHLD is delivered at the
same time.
Use the TimerQueue to expire blocking operations, which is one less thing
the Scheduler needs to check on every iteration.
Also, add a BlockTimeout class that will automatically handle relative or
absolute timeouts as well as overriding timeouts (e.g. socket timeouts)
more consistently.
Also, rework the TimerQueue class to be able to fire events from
any processor, which requires Timer to be RefCounted. Also allow
creating id-less timers for use by blocking operations.
This makes misses in the BlockBasedFS's LRU block cache faster by
storing the cache entries in one of two doubly-linked list.
Dirty and clean cache entries are kept in two separate lists, and
move between them when their state changes. This can probably be
improved upon further.
If the inode's block list cache is empty, we forgot to assign the
result of computing the block list. The fact that this worked anyway
makes me wonder when we actually don't have a cache..
Thanks to szyszkienty for spotting this! :^)
Instead of doing a linear scan of the entire cache when doing a lookup,
we now have a nice O(1) HashMap in front of the cache.
The cache miss case can still be improved, this patch really only helps
the cache hit case.
This dramatically improves cached filesystem I/O. :^)
Since we're using UserOrKernelBuffers, SMAP will be automatically
disabled when we actually access the buffer later on. There's no need
to disable it wholesale across the entire read/write operations.
This is a new "browse" permission that lets you open (and subsequently list
contents of) directories underneath the path, but not regular files or any other
types of files.
e2fsck was complaining about blocks being allocated in an inode's list
of direct blocks while at the same time being free in the block bitmap.
It was easy to reproduce by creating a file with non-zero length and
then truncating it. This fixes the issue by clearing out the direct
block list when resizing a file to 0.
This makes most operations thread safe, especially so that they
can safely be used in the Kernel. This includes obtaining a strong
reference from a weak reference, which now requires an explicit
call to WeakPtr::strong_ref(). Another major change is that
Weakable::make_weak_ref() may require the explicit target type.
Previously we used reinterpret_cast in WeakPtr, assuming that it
can be properly converted. But WeakPtr does not necessarily have
the knowledge to be able to do this. Instead, we now ask the class
itself to deliver a WeakPtr to the type that we want.
Also, WeakLink is no longer specific to a target type. The reason
for this is that we want to be able to safely convert e.g. WeakPtr<T>
to WeakPtr<U>, and before this we just reinterpret_cast the internal
WeakLink<T> to WeakLink<U>, which is a bold assumption that it would
actually produce the correct code. Instead, WeakLink now operates
on just a raw pointer and we only make those constructors/operators
available if we can verify that it can be safely cast.
In order to guarantee thread safety, we now use the least significant
bit in the pointer for locking purposes. This also means that only
properly aligned pointers can be used.
When computing the list of blocks to deallocate when freeing an inode,
we would stop collecting blocks after reaching the inode's block count.
Since we're getting rid of the inode, we need to also include the meta
blocks used by the on-disk block list itself.
If you try to do this (e.g "mv directory directory"), sys$rename() will
now fail with EDIRINTOSELF.
Dr. POSIX says we should return EINVAL for this, but a custom error
code allows us to print a much more helpful error message when this
problem occurs. :^)