Function-local `static const` variables can be `constexpr`. This
can reduce memory consumption and binary size, and enable additional
compiler optimizations.
These changes result in a stripped x86_64 kernel binary size reduction
of 592 bytes.
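A sketch of the pattern (the function and values are made up):

    // Before, the table was `static const`, materializing a runtime
    // object; `static constexpr` lets the compiler fold it instead.
    int scale(int index)
    {
        static constexpr int factors[] = { 1, 2, 4, 8 };
        return factors[index & 3];
    }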
Move this architecture-specific sanity check (IOPL must be 0) out of
Scheduler and into the x86 enter_thread_context(). Also do this for
every thread and not just userspace ones.
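A sketch of the relocated check (helper and field names illustrative;
IOPL sits in bits 12 and 13 of EFLAGS):

    static u32 get_iopl_from_eflags(u32 eflags)
    {
        return (eflags >> 12) & 3;
    }

    // In the x86 enter_thread_context(), for every incoming thread:
    // VERIFY(get_iopl_from_eflags(to_thread->regs().eflags) == 0);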
It was annoyingly hard to spot these states when we were using them with
different amounts of qualification everywhere.
This patch uses Thread::State::Foo everywhere instead of Thread::Foo
or just Foo.
Signal dispatch is already taken care of elsewhere, so there appears to
be no need for the hack in enter_current().
This also allows us to remove the Thread::m_in_block flag, simplifying
thread blocking logic somewhat.
Verified with the original repro for #4336 which this was meant to fix.
This allows us to enable Write-Combine on e.g. framebuffers,
significantly improving performance on bare metal.
To keep things simple, we currently use only one of up to three bits
(bit 7 in the PTE), which maps to the PA4 entry in the PAT MSR, which
we set to Write-Combine mode on each CPU at boot time.
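A minimal sketch of the boot-time MSR programming, assuming x86 and
simple rdmsr/wrmsr wrappers (PA4 occupies bits 32-39 of IA32_PAT, and
0x01 is the Write-Combine encoding):

    static inline u64 read_msr(u32 msr)
    {
        u32 low, high;
        asm volatile("rdmsr" : "=a"(low), "=d"(high) : "c"(msr));
        return ((u64)high << 32) | low;
    }

    static inline void write_msr(u32 msr, u64 value)
    {
        asm volatile("wrmsr" : : "c"(msr), "a"((u32)value), "d"((u32)(value >> 32)));
    }

    static void set_pa4_to_write_combine()
    {
        constexpr u32 msr_ia32_pat = 0x277;
        u64 pat = read_msr(msr_ia32_pat);
        pat &= ~(0xffULL << 32); // clear PA4
        pat |= 0x01ULL << 32;    // PA4 = Write-Combine
        write_msr(msr_ia32_pat, pat);
    }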
Since the inline capacity of the Vector return type was not specified
explicitly, the vector was automatically copied to a 0-length inline
capacity one, essentially eliminating the optimization.
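An illustration with AK::Vector (names and the capacity are invented):

    #include <AK/Vector.h>

    // Keeps the inline buffer: the declared return type spells out the
    // same inline capacity the function builds with.
    Vector<u32, 16> gather_ids()
    {
        Vector<u32, 16> ids;
        ids.append(1);
        return ids;
    }

    // Had the return type been plain Vector<u32> (inline capacity 0),
    // the result would be converted on return, copying every element
    // and defeating the optimization.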
Contrary to the comment above it, this while loop was actually
clearing the selectors at or above the edited one (instead of the
selectors that were skipped when the GDT was extended). This wasn't
really an issue so far, as all calls to this function did extend the
GDT, which meant the condition was always false; but future calls
that try to edit an existing entry would fail.
Add a kernel data segment and make the user code segment come after
the data segment. We need the GDT to be in a certain order to support
the syscall and sysret instruction pair.
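For reference, this is the layout the 64-bit syscall/sysret pair
assumes, relative to the selectors programmed into the STAR MSR (a
sketch of the constraint, not our actual selector values):

    // syscall loads:  CS = STAR[47:32],      SS = STAR[47:32] + 8
    // sysret loads:   CS = STAR[63:48] + 16, SS = STAR[63:48] + 8
    // Hence kernel data must follow kernel code, and user code must
    // follow user data.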
In order to reduce our reliance on __builtin_{ffs, clz, ctz, popcount},
this commit removes all calls to these functions and replaces them with
the equivalent functions in AK/BuiltinWrappers.h.
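An example of the replacement pattern (assuming the wrapper's name is
count_trailing_zeroes):

    #include <AK/BuiltinWrappers.h>

    u32 first_set_bit_index(u32 mask)
    {
        // Previously: return __builtin_ctz(mask);
        return count_trailing_zeroes(mask);
    }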
Unsurprisingly, the /proc/PID/stacks/TID stack walk had the same
arbitrary memory read problem as the perf event stack walk.
It would be nice if the kernel had a single stack walk implementation,
but that's outside the scope of this commit.
The platform-independent Processor.h file contains the shared
processor code and pulls in the platform-specific header file.
All references to the Arch/x86/Processor.h file have been replaced with
a reference to Arch/Processor.h.
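A sketch of the dispatching header (guard macros as in AK/Platform.h;
the exact contents are illustrative):

    #pragma once

    #include <AK/Platform.h>

    #if ARCH(X86_64) || ARCH(I386)
    #    include <Kernel/Arch/x86/Processor.h>
    #else
    #    error "Unknown architecture"
    #endif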
This fixes a triple fault that occurs when compiling serenity with
the i686 clang toolchain. (The underlying issue is that the old inline
assembly did not specify that it clobbered the eax/ecx/edx registers,
so the compiler assumed they were unchanged and kept using their
values across it.)
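The general shape of the fix (the assembly itself is illustrative,
not the kernel's):

    void invoke_helper()
    {
        // Without the clobber list, the compiler may keep live values
        // in eax/ecx/edx across the asm and read garbage afterwards.
        asm volatile("call do_work" : : : "eax", "ecx", "edx", "memory");
    }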
Co-authored-by: Brian Gianforcaro <bgianf@serenityos.org>
Initializing the variable this way fixes a kernel panic with Clang,
where the object was zero-initialized, so `m_in_scheduler` contained the
wrong value. GCC got it right, but we're better off making this change,
as leaving uninitialized fields in constant-initialized objects can
cause other weird situations like this. Also, initializing only a single
field to a non-zero value isn't worth the cost of no longer fitting in
`.bss`.
Another two variables suffer from the same problem, even though their
values are supposed to be zero. Removing their initializers causes the
`_GLOBAL__sub_I_` function to no longer be generated and the (not
handled) `.init_array` section to be omitted.
This has several benefits:
1) We no longer just blindly dereference a null pointer in various places
2) We will get nicer runtime error messages if the current process does
turn out to be null at the call site
3) GCC no longer complains about possible nullptr dereferences when
compiling without KUBSAN
Leave interrupts enabled so that we can still process IRQs. Critical
sections should only prevent preemption by another thread.
Co-authored-by: Tom <tomut@yahoo.com>
By making these functions static we close a window where we could get
preempted after calling Processor::current() and be moved to another
processor.
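The hazard and the fix, sketched (using enter_critical as an example
of such a function; not verbatim kernel code):

    // Before: between materializing the Processor& and calling through
    // it, we could be preempted and resume on a different CPU, leaving
    // the reference pointing at the wrong processor's data.
    Processor::current().enter_critical();

    // After: a static function performs the per-CPU access in one step,
    // leaving no window in between.
    Processor::enter_critical();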
Co-authored-by: Tom <tomut@yahoo.com>
Processing SMP messages in non-SMP mode is a waste of time,
and now that we don't rely on the side effects of calling the message
processing function, let's stop calling it entirely. :^)
We were previously relying on a side effect of the critical section in
smp_process_pending_messages(): when exiting that section, it would
process any pending deferred calls.
Instead of relying on that, make the deferred invocations explicit by
calling deferred_call_execute_pending() in exit_trap().
This ensures that deferred calls get processed before entering the
scheduler at the end of exit_trap(). Since thread unblocking happens
via deferred calls, the threads don't have to wait until the next
scheduling opportunity when they could be ready *now*. :^)
This was the main reason Tom's SMP branch ran slowly in non-SMP mode.
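A minimal sketch of the resulting ordering in exit_trap() (only
deferred_call_execute_pending() is taken from the commit; the rest is
illustrative):

    void deferred_call_execute_pending(); // the call this commit adds
    bool should_enter_scheduler();        // illustrative
    void enter_scheduler();               // illustrative

    void exit_trap_sketch()
    {
        // Run deferred calls first: threads unblocked through them
        // become ready immediately...
        deferred_call_execute_pending();

        // ...so a scheduler entry right after can pick them up without
        // waiting for the next scheduling opportunity.
        if (should_enter_scheduler())
            enter_scheduler();
    }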
Enter a critical section in Processor::exit_trap so that processing
SMP messages doesn't enable interrupts upon leaving. We need to delay
that until the end of exit_trap, where we call into the Scheduler if
leaving the trap puts us outside of any critical section and IRQ
handler.
Co-authored-by: Tom <tomut@yahoo.com>
- Use the receiver's per-CPU entry in the message, instead of the
sender's. (Using the sender's entry wasn't safe for broadcast
messages since the same entry ended up on multiple message queues.)
- Retry the CAS until it *succeeds* instead of *fails*. This closes a
race window, and also ensures a correct return value. The return value
is used by the caller to decide whether to broadcast an IPI.
This was the main reason smp=on was so slow. We had CPUs busy-waiting
until someone else triggered an IPI and moved things along.
- Add a CPU pause hint to the spin loop. :^)
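The corrected loop shape, sketched with AK::Atomic (types and names
simplified, not the kernel's actual message structures):

    #include <AK/Atomic.h>

    struct Message {
        Message* next { nullptr };
    };

    // Returns true if the queue was previously empty, which is what
    // the caller uses to decide whether to broadcast an IPI.
    bool enqueue_message(Atomic<Message*>& queue, Message& message)
    {
        Message* expected = queue.load();
        do {
            message.next = expected;
            // Retry until the CAS *succeeds*; on failure,
            // compare_exchange_strong reloads `expected`. A pause hint
            // belongs in this retry path.
        } while (!queue.compare_exchange_strong(expected, &message));
        return expected == nullptr;
    }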
Due to a boolean mistake in smp_return_to_pool(), we didn't retry
pushing the message onto the freelist after a failed attempt.
This caused the message pool to eventually become completely empty
after enough contended access attempts.
This patch also adds a pause hint to the CPU in the failed attempt
code path.
To add a new per-CPU data structure, add an ID for it to the
ProcessorSpecificDataID enum.
Then call ProcessorSpecific<T>::initialize() when you are ready to
construct the per-CPU data structure on the current CPU. It can then
be accessed via ProcessorSpecific<T>::get().
This patch replaces the existing hard-coded mechanisms for the
Scheduler and MemoryManager per-CPU data structures.
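A usage sketch of the enum-ID/initialize()/get() steps above
(PerCPUCounters and its enum entry are hypothetical, and the exact
hookup between a type and its ID may differ):

    struct PerCPUCounters {
        u64 interrupt_count { 0 };
    };

    void init_counters_on_current_cpu()
    {
        // After adding ProcessorSpecificDataID::PerCPUCounters:
        ProcessorSpecific<PerCPUCounters>::initialize();
    }

    void count_interrupt()
    {
        ProcessorSpecific<PerCPUCounters>::get().interrupt_count++;
    }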
The non-CPU-specific code of the kernel shouldn't need to deal with
architecture-specific registers, and should instead deal with an
abstract view of the machine. This allows us to remove a variety of
architecture-specific ifdefs and helps keep the code slightly more
portable.
We do this by exposing the abstract representation of the instruction
pointer, stack pointer, base pointer, return register, etc. on the
RegisterState struct.
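A sketch of the kind of accessors this exposes, shown for x86 (member
and function names assumed, not verbatim):

    struct RegisterState {
        FlatPtr eax, ecx, edx, ebx, esp, ebp, esi, edi, eip;

        FlatPtr ip() const { return eip; }          // instruction pointer
        FlatPtr sp() const { return esp; }          // stack pointer
        FlatPtr bp() const { return ebp; }          // base pointer
        FlatPtr& return_register() { return eax; }  // e.g. syscall result
    };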