Commit graph

17 commits

Author SHA1 Message Date
Andrzej Janik
a1890e87c3 Various fixes 2025-09-24 17:11:10 +00:00
Violet
93820e3159
Handle PrmtSlow (#518)
2025-09-23 12:31:55 -07:00
Andrzej Janik
b5f41c7cd0
More runtime fixes, add mma instruction (#509)
2025-09-18 11:15:22 -07:00
Violet
7b5fdb30c4
Implement ldmatrix (#503)
2025-09-09 19:31:56 -07:00
Violet
d342e1a06e
Implement redux.sync for u32 and s32 (#500) 2025-09-08 16:13:28 -07:00
Andrzej Janik
ea99dcc0b0
Implement vote instruction and add support for %laneid (#484) 2025-08-28 18:23:09 -07:00
Violet
8f484d6a5f
Add support for fp8 to cvt (#468)
This specifically implements the fp8 conversion instructions needed by llm.c:

* `cvt.rn.satfinite{.relu}.f8x2type.f32`
* `cvt.rn{.relu}.f16x2.f8x2type`

It uses HIP's fp8 and fp16 headers: https://rocm.docs.amd.com/projects/HIP/en/docs-develop/reference/low_fp_types.html#fp8-quarter-precision.
2025-08-28 17:54:07 -07:00
Andrzej Janik
547536de4a
Add more instructions, tighten generated assembly (#475)
2025-08-18 11:12:57 -07:00
Andrzej Janik
65367f04ee
Fix how full-precision fp32 sqrt and div are handled (#467)
Previously, when compiling full-precision `sqrt`/`div`, we'd leave it to LLVM. LLVM looks at the module's `denormal-fp-math-f32` mode, which is incompatible with how we handle denormals and could give wrong results in certain edge cases.
Instead, handle it fully inside ZLUDA.
2025-08-14 17:24:40 -07:00
Andrzej Janik
3746079b1a
Assorted instruction fixes (#423)
This fixes transcendentals and some other buggy instructions exposed by `ptx_tests` (abs, neg). It also adds tanh, which is slow because of a hardware limitation.
The only two remaining incorrect instructions are div and sqrt with non-default rounding, but this commit is already bloated enough.
2025-07-23 15:50:35 -07:00
Violet
27cfd50ddd
Implement nanosleep.u32 (#421)
2025-07-21 17:42:04 -07:00
Violet
dc69808e54
Add support for shfl.sync.MODE.b32 (#409)
2025-07-16 17:23:11 -07:00
Violet
5cb0a9b8e8
Add support for bar.red.and.pred (#402)
Implements bar.red.and.pred and bar.red.or.pred, using the undocumented __ockl_wgred functions. Doesn't yet add support for numbered barriers and threadcount, as these are not needed for llm.c.
2025-07-03 11:56:20 -07:00
Andrzej Janik
7ac67a89e9
Enable Geekbench 5 (#304) 2024-12-10 21:48:10 +01:00
Andrzej Janik
7a6df9dcbf
Fix host code and update to CUDA 12.4 (#299) 2024-12-02 00:29:57 +01:00
Andrzej Janik
970ba5aa25
Fix linking of AMD device libraries (#296)
It's weird that it fails without `-mno-link-builtin-bitcode-postopt`. I've tested it only on ROCm 6.2; it might be broken on older or newer ROCm versions.
2024-11-02 16:07:44 +01:00
Andrzej Janik
3870a96592
Re-enable all failing PTX tests (#277)
Additionally, remove unused compilation paths.
2024-10-16 03:15:48 +02:00