Commit graph

55 commits

Author SHA1 Message Date
Linus Groh
6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
sin-ack
3f3f45580a Everywhere: Add sv suffix to strings relying on StringView(char const*)
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).

No functional changes.
2022-07-12 23:11:35 +02:00
sin-ack
c70f45ff44 Everywhere: Explicitly specify the size in StringView constructors
This commit moves the length calculations out to be directly on the
StringView users. This is an important step towards the goal of removing
StringView(char const*), as it moves the responsibility of calculating
the size of the string to the user of the StringView (which will prevent
naive uses causing OOB access).
2022-07-12 23:11:35 +02:00
Tim Schumacher
e2036ca2ca LibELF: Store the full file path in DynamicObject
Otherwise, our `dirname` call on the parent object will always be empty
when trying to resolve dependencies.
2022-06-30 11:57:10 +02:00
Daniel Bertalan
08c459e495 LibELF: Add support for IFUNCs
IFUNC is a GNU extension to the ELF standard that allows a function to
have multiple implementations. A resolver function has to be called at
load time to choose the right one to use. The PLT will contain the entry
to the resolved function, so branching and more indirect jumps can be
avoided at run-time.

This mechanism is usually used when a routine can be made faster using
CPU features that are available in only some models, and a fallback
implementation has to exist for others.

We will use this feature to have two separate memset implementations for
CPUs with and without ERMS (Enhanced REP MOVSB/STOSB) support.
2022-05-01 12:42:01 +02:00
Daniel Bertalan
4d5965bd2c LibELF: Keep track of whether the PLT contains REL or RELA relocations 2022-05-01 12:42:01 +02:00
Idan Horowitz
086969277e Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00
Brian Gianforcaro
39f924a731 LibELF: Skip DynamicObject::dump() if logging isn't enabled
I noticed that we were populating this StringBuilder and then throwing
away the result while profiling `true` with UserSpace emulator.

Before:

    courage:~ $ time -n 1000 true
    Timing report: 3454 ms
    ==============
    Command:         true
    Average time:    3.45 ms (median: 3, stddev: 3.42, min: 0, max:11)
    Excluding first: 3.45 ms (median: 3, stddev: 3.42, min: 0, max:11)

After:

    courage:~ $ time -n 1000 true
    Timing report: 3308 ms
    ==============
    Command:         true
    Average time:    3.30 ms (median: 3, stddev: 3.28, min: 0, max:12)
    Excluding first: 3.30 ms (median: 3, stddev: 3.29, min: 0, max:12)
2022-03-31 10:18:07 +02:00
Daniel Bertalan
3974cac148 LibELF: Implement support for DT_RELR relative relocations
The DT_RELR relocation is a relatively new relocation encoding designed
to achieve space-efficient relative relocations in PIE programs.

The description of the format is available here:
https://groups.google.com/g/generic-abi/c/bX460iggiKg/m/Pi9aSwwABgAJ

It works by using a bitmap to store the offsets which need to be
relocated. Even entries are *address* entries: they contain an address
(relative to the base of the executable) which needs to be relocated.
Subsequent even entries are *bitmap* entries: "1" bits encode offsets
(in word size increments) relative to the last address entry which need
to be relocated.

This is in contrast to the REL/RELA format, where each entry takes up
2/3 machine words. Certain kinds of relocations store useful data in
that space (like the name of the referenced symbol), so not everything
can be encoded in this format. But as position-independent executables
and shared libraries tend to have a lot of relative relocations, a
specialized encoding for them absolutely makes sense.

The authors of the format suggest an overall 5-20% reduction in the file
size of various programs. Due to our extensive use of dynamic linking
and us not stripping debug info, relative relocations don't make up such
a large portion of the binary's size, so the measurements will tend to
skew to the lower side of the spectrum.

The following measurements were made with the x86-64 Clang toolchain:

- The kernel contains 290989 relocations. Enabling RELR decreased its
  size from 30 MiB to 23 MiB.
- LibUnicodeData contains 190262 relocations, almost all of them
  relative. Its file size changed from 17 MiB to 13 MiB.
- /bin/WebContent contains 1300 relocations, 66% of which are relative
  relocations. With RELR, its size changed from 832 KiB to 812 KiB.

This change was inspired by the following blog post:
https://maskray.me/blog/2021-10-31-relative-relocations-and-relr
2022-02-11 18:07:53 +01:00
Idan Horowitz
6c8f1e62db LibELF: Ignore unknown dynamic tags instead of asserting
Some programs (like wine) add custom dynamic tags to their binaries
that are used internally by them, so we can just ignore them.
2021-12-22 00:02:36 -08:00
Andreas Kling
8b1108e485 Everywhere: Pass AK::StringView by value 2021-11-11 01:27:46 +01:00
Rodrigo Tobar
5ae384fc33 LibELF: Indicate value of unimplemented dtag
For flags that are not mapped to a string this message prints DT_??,
which isn't really useful.
2021-09-26 12:45:55 +02:00
Rodrigo Tobar
a67e06184b LibC+LibELF: Add definitions for extra dtags
These are found in some libraries, and LibELF doesn't know how to handle
them, not even their name. Adding these definitions should at least help
readelf display information correctly, but more work is needed to
actually implement them.
2021-09-26 12:45:55 +02:00
Rodrigo Tobar
3efd7b458a LibELF+readelf: Remove duplicated dtag->string map
A copy of the same mapping was found both in LibELF and in the readelf
utility, which uses LibELF; keeping them both is redundant and removing
the duplicate saves (a bit of) space.
2021-09-26 12:45:55 +02:00
Brian Gianforcaro
2038d2c49e LibELF: Apply some minor optimizations to symbol lookup
Optimizations:

- Make sure `DT_SYMTAB` is a string view literal, instead of string.

- DynamicObject::HashSection::lookup_sysv_symbol should be using
  raw_name() from symbol comparison to avoid needlessly calling
  `strlen`, when the StrinView::operator= walks the cstring without
  calling `strlen` first.

- DynamicObject::HashSection::lookup_gnu_symbol shouldn't create a
  symbol unless we know the hashes match first.

In order to test these changes I enabled Undefined behavior sanitizer
which creates a huge amount of relocations, and then ran the browser
with the help argument 100 times. The browser is a fairly big app with
a few different libraries being loaded, so it seemed liked a good
target.

Command: `time -n 100 br --help`

Before:
```
Timing report:
==============
Command:         br --help
Average time:    3897.679931 ms
Excluding first: 3901.242431 ms
```

After:
```
Timing report:
==============
Command:         br --help
Average time:    3612.860107 ms
Excluding first: 3613.54541 ms
```
2021-08-28 20:03:08 +02:00
Daniel Bertalan
18b2484985 LibELF: Remove (FlatPtr)something.as_ptr() idiom
This is equivalent to `something.get()`, but more verbose.
2021-08-09 23:15:48 +02:00
Gunnar Beutner
f9a8c6f053 LibELF: Implement support for RELA relocations 2021-07-01 10:50:00 +02:00
Gunnar Beutner
d3127efc01 LibELF: Implement PLT relocations for x86_64 2021-06-29 20:03:36 +02:00
Gunnar Beutner
c81d959afb LibELF: Implement GNU hash section lookups for x86_64 2021-06-29 20:03:36 +02:00
Gunnar Beutner
73b9cfac1b LibELF: Support weak symbols when using BIND_NOW
When using BIND_NOW (e.g. via -Wl,-z,now) we would fail to load ELF
images while doing relocations when we encounter a weak symbol. Instead
we should just patch the PLT entry with a null pointer.

This can be reproduced with:

$ cat test.cpp
int main()
{
    std::cout << "Hello World!" << std::endl;
}
$ g++ -o test -Wl,-z,now test.cpp
$ ./test
did not find symbol while doing relocations for library test: _ITM_RU1
2021-05-31 11:49:32 +01:00
Andrew Kaster
7b4dc590e7 AK+Userland: Use akaster@serenityos.org for my copyright headers 2021-05-30 14:35:34 +01:00
Nicholas Baron
aa4d41fe2c
AK+Kernel+LibELF: Remove the need for IteratorDecision::Continue
By constraining two implementations, the compiler will select the best
fitting one. All this will require is duplicating the implementation and
simplifying for the `void` case.

This constraining also informs both the caller and compiler by passing
the callback parameter types as part of the constraint
(e.g.: `IterationFunction<int>`).

Some `for_each` functions in LibELF only take functions which return
`void`. This is a minimal correctness check, as it removes one way for a
function to incompletely do something.

There seems to be a possible idiom where inside a lambda, a `return;` is
the same as `continue;` in a for-loop.
2021-05-16 10:36:52 +01:00
Andreas Kling
ea027834df LibELF: Convert StringBuilder::appendf() => AK::Format 2021-05-07 21:12:09 +02:00
Gunnar Beutner
fdbe66a7b4 LibELF+LibC: Support building LibELF for 64-bit targets 2021-05-03 08:42:39 +02:00
Itamar
101ac45c1a LibELF: Change TLS offset calculation
This changes the TLS offset calculation logic to be based on the
symbol's size instead of the total size of the TLS.

Because of this change, we no longer need to pipe "m_tls_size" to so
many functions.

Also, After this patch, the TLS data of the main program exists at the
"end" of the TLS block (Highest addresses).

This fixes a part of #6609.
2021-04-30 18:47:39 +02:00
Gunnar Beutner
f40ee1b03f LibC+LibELF: Implement more fully-features dlfcn functionality
This implements more of the dlfcn functionality. Most notably:

* It's now possible to dlopen() libraries which were already
  loaded at program startup time. This does not cause those
  libraries to be loaded twice.
* Errors are reported via dlerror() rather than by crashing
  the program.
* Calls to the dl*() functions are thread-safe.
2021-04-25 10:14:50 +02:00
Gunnar Beutner
f74b8a2d1f LibELF: Avoid calculating symbol hashes when we don't need them 2021-04-23 23:35:36 +02:00
Andreas Kling
b91c49364d AK: Rename adopt() to adopt_ref()
This makes it more symmetrical with adopt_own() (which is used to
create a NonnullOwnPtr from the result of a naked new.)
2021-04-23 16:46:57 +02:00
Brian Gianforcaro
1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Gunnar Beutner
38619a9f24 LibELF: Ignore DT_SYMBOLIC entries
The shared library libicudata.so has a DT_SYMBOLIC entry:

Dynamic Section:
  NEEDED               libgcc_s.so
  SONAME               libicudata.so.69
  SYMBOLIC             0x00000000
  HASH                 0x00000094
  STRTAB               0x000000c8
  SYMTAB               0x000000a8
  STRSZ                0x0000002a
  SYMENT               0x00000010

According to the ELF spec DT_SYMBOLIC has no special meaning
for the dynamic loader.
2021-04-19 20:39:22 +02:00
Gunnar Beutner
0cca23def5 LibELF: Improve error message for missing symbols 2021-04-19 12:00:40 +02:00
Gunnar Beutner
6cb28ecee8 LibC+LibELF: Implement support for the dl_iterate_phdr helper
This helper is used by libgcc_s to figure out where the .eh_frame sections
are located for all loaded shared objects.
2021-04-18 10:55:25 +02:00
Gunnar Beutner
cd7512a2ad LibELF: Add support for loading objects with multiple data and text segments
This enables loading executables with multiple data and text segments. Also
it fixes loading executables where the text segment has a non-zero offset.

Example:

  $ echo "main () {}" > test.c
  $ gcc -Wl,-z,separate-code -o test test.c
  $ objdump -p test
  test:     file format elf32-i386

  Program Header:
      PHDR off    0x00000034 vaddr 0x00000034 paddr 0x00000034 align 2**2
           filesz 0x000000e0 memsz 0x000000e0 flags r--
    INTERP off    0x00000114 vaddr 0x00000114 paddr 0x00000114 align 2**0
           filesz 0x00000013 memsz 0x00000013 flags r--
      LOAD off    0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**12
           filesz 0x000003c4 memsz 0x000003c4 flags r--
      LOAD off    0x00001000 vaddr 0x00001000 paddr 0x00001000 align 2**12
           filesz 0x00000279 memsz 0x00000279 flags r-x
      LOAD off    0x00002000 vaddr 0x00002000 paddr 0x00002000 align 2**12
           filesz 0x00000004 memsz 0x00000004 flags r--
      LOAD off    0x00002004 vaddr 0x00003004 paddr 0x00003004 align 2**12
           filesz 0x00000100 memsz 0x00000124 flags rw-
   DYNAMIC off    0x00002014 vaddr 0x00003014 paddr 0x00003014 align 2**2
           filesz 0x000000c8 memsz 0x000000c8 flags rw-
2021-04-14 13:12:52 +02:00
Brendan Coles
8ad74684ea LibELF: DynamicObject: Add rpath and runpath helpers 2021-03-22 17:46:05 +01:00
Brian Gianforcaro
069fd58381 LibELF: Convert more string literals to StringView literals.
Most of these won't have perf impact, but the optimization is
practically free, so no harm in fixing these up.
2021-02-24 14:45:34 +01:00
Andreas Kling
5d180d1f99 Everywhere: Rename ASSERT => VERIFY
(...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED)

Since all of these checks are done in release builds as well,
let's rename them to VERIFY to prevent confusion, as everyone is
used to assertions being compiled out in release.

We can introduce a new ASSERT macro that is specifically for debug
checks, but I'm doing this wholesale conversion first since we've
accumulated thousands of these already, and it's not immediately
obvious which ones are suitable for ASSERT.
2021-02-23 20:56:54 +01:00
Andreas Kling
b33a6a443e LibELF: Inline DynamicObject::hash_section()
This was high up in profiles and gets almost entirely optimized out
when inlined, so let's do that.
2021-02-23 20:33:32 +01:00
Andreas Kling
22b8110554 LibELF: Avoid doing strlen() on everything while iterating GNU hash
It's a lot faster to iterate the GNU hash tables if we don't have to
compute the length of every symbol name before rejecting it anyway while
comparing the first character. :^)
2021-02-23 19:43:44 +01:00
Andreas Kling
46a94a9a9e LibELF: Rename lookup_elf_symbol() => lookup_sysv_symbol()
We have two kinds of lookup, SYSV and GNU hash. Both are ELF lookups.
2021-02-23 19:43:44 +01:00
Andreas Kling
cc00df0f0f LibELF: Avoid calling strlen() in DynamicObject::hash_section()
The long-term fix here is to make StringView recognize compile-time
string literals and do the right thing automatically.
2021-02-23 19:43:44 +01:00
Andreas Kling
d6af3302e8 LibELF: Don't recompute the same ELF hashes over and over
When performing a global symbol lookup, we were recomputing the symbol
hashes once for every dynamic object searched. The hash function was
at the very top of a profile (15%) of program startup.

With this change, the hash function is no longer visible among the top
stacks in the profile. :^)
2021-02-23 19:43:44 +01:00
Andreas Kling
37420f1baf LibELF: Move ELF hash functions to their own file (and make constexpr) 2021-02-23 19:43:44 +01:00
Andreas Kling
f23b29f605 LibELF: Move DynamicObject::lookup_symbol() to DynamicLoader
Also simplify it by removing an unreachable code path.
2021-02-21 00:29:52 +01:00
Andreas Kling
a43910acc3 LibELF: Make SymbolLookupResult::address a VirtualAddress
Let's use a stronger type than void* for this since we're talking
specifically about a virtual address and not necessarily a pointer
to something actually in memory (yet).
2021-02-21 00:02:21 +01:00
Andreas Kling
1997d5de1a LibELF: Make symbol lookup functions return Optional<Symbol>
It was very confusing how these functions used the "undefined" state
of Symbol to signal lookup failure. Let's use Optional<T> to make things
a bit more understandable.
2021-02-21 00:02:21 +01:00
Andreas Kling
ba1eea9898 LibELF+DynamicLoader: Rename DynamicObject::construct() => create() 2021-02-21 00:02:21 +01:00
Andreas Kling
01f1e480e5 LibELF: Fix various clang-tidy warnings
Remove a bunch of unused code, unnecessary const, and make some
non-object-specific member functions static.
2021-02-21 00:02:21 +01:00
Andreas Kling
0c0127dc3f LibELF: Use StringView instead of "const char*" in dynamic linker code
There's no reason to use C strings more than absolutely necessary.
2021-02-20 22:29:12 +01:00
Andreas Kling
fa4c249425 LibELF+Userland: Enable RELRO for all userland executables :^)
The dynamic loader will now mark RELRO segments read-only after
performing relocations. This is pretty cool!

Note that this only applies to main executables so far,.
RELRO support for shared libraries will require some reorganizing
of the dynamic loader.
2021-02-18 18:55:19 +01:00
AnotherTest
09a43969ba Everywhere: Replace dbgln<flag>(...) with dbgln_if(flag, ...)
Replacement made by `find Kernel Userland -name '*.h' -o -name '*.cpp' | sed -i -Ee 's/dbgln\b<(\w+)>\(/dbgln_if(\1, /g'`
2021-02-08 18:08:55 +01:00