Andrzej Janik
a1890e87c3
Various fixes
2025-09-24 17:11:10 +00:00
Violet
93820e3159
Handle PrmtSlow ( #518 )
2025-09-23 12:31:55 -07:00
Andrzej Janik
b5f41c7cd0
More runtime fixes, add mma instruction ( #509 )
2025-09-18 11:15:22 -07:00
Violet
7b5fdb30c4
Implement ldmatrix ( #503 )
2025-09-09 19:31:56 -07:00
Violet
d342e1a06e
Implement redux.sync for u32 and s32 ( #500 )
2025-09-08 16:13:28 -07:00
Andrzej Janik
ea99dcc0b0
Implement vote instruction and add support for %laneid ( #484 )
2025-08-28 18:23:09 -07:00
Violet
8f484d6a5f
Add support for fp8 to cvt ( #468 )
This specifically implements the fp8 conversion instructions needed by llm.c:
* `cvt.rn.satfinite{.relu}.f8x2type.f32`
* `cvt.rn{.relu}.f16x2.f8x2type`
It uses HIP's fp8 and fp16 headers: https://rocm.docs.amd.com/projects/HIP/en/docs-develop/reference/low_fp_types.html#fp8-quarter-precision .
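For reference, here is a minimal Python sketch of the scalar E4M3 encoding that `cvt.rn.satfinite.e4m3x2.f32` applies per lane, and of the E4M3 decoding that `cvt.rn.f16x2.e4m3x2` starts from. It assumes the OCP FP8 E4M3 layout (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, S.1111.111 as NaN). The function names are hypothetical and this is not ZLUDA's code, which relies on HIP's headers. Every E4M3 value is exactly representable in f16, so the decode direction needs no extra rounding step. `.relu` and the E5M2 variants are not modeled, and the lane order in `unpack_e4m3x2` is only this sketch's convention; check the PTX ISA for which half of the register maps to which operand.

```python
import math

E4M3_NAN = 0x7F   # S.1111.111 is NaN for either sign; used here as the canonical NaN
E4M3_MAX = 448.0  # largest finite magnitude: 1.75 * 2**8


def decode_e4m3(byte: int) -> float:
    """Decode one E4M3 byte (1 sign, 4 exponent bits with bias 7, 3 mantissa bits)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return math.nan
    if exp == 0:
        # Subnormal: no implicit leading 1, value = (man / 8) * 2**-6
        return sign * man * 2.0 ** -9
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


def encode_e4m3_rn_satfinite(x: float) -> int:
    """Round to nearest even into E4M3, saturating out-of-range values to +-448."""
    if math.isnan(x):
        return E4M3_NAN
    sign = 0x80 if math.copysign(1.0, x) < 0 else 0x00
    mag = min(abs(x), E4M3_MAX)  # .satfinite: clamp (including inf) instead of NaN
    # Codes 0x00..0x7E are the finite non-negative values. Brute-force the
    # nearest one; on a tie, prefer the even code (mantissa LSB of 0).
    best = min(range(0x7F), key=lambda c: (abs(decode_e4m3(c) - mag), c & 1))
    return sign | best


def unpack_e4m3x2(packed: int) -> tuple[float, float]:
    """Split a 16-bit e4m3x2 value into (upper byte, lower byte) decoded lanes."""
    return decode_e4m3((packed >> 8) & 0xFF), decode_e4m3(packed & 0xFF)
```

The brute-force search is chosen for clarity over speed; a real implementation would manipulate exponent and mantissa bits directly.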
2025-08-28 17:54:07 -07:00
Andrzej Janik
547536de4a
Add more instructions, tighten generated assembly ( #475 )
2025-08-18 11:12:57 -07:00
Andrzej Janik
65367f04ee
Fix how full-precision fp32 sqrt and div are handled ( #467 )
Previously, when compiling full-precision `sqrt`/`div`, we'd leave it to LLVM. LLVM looks at the module's `denormal-fp-math-f32` mode, which is incompatible with how we handle denormals and could give wrong results in certain edge cases.
Instead, handle it fully inside ZLUDA.
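To illustrate why the denormal mode matters, here is a small Python model of PTX's per-instruction `.ftz` modifier on `sqrt.rn.f32`. It is a sketch, not ZLUDA's lowering: the helper names are hypothetical, f32 rounding is emulated with `struct`, and inputs are assumed to be non-negative. A subnormal input yields a normal result without `.ftz` but 0 with it, which is one way a single module-wide denormal mode can disagree with per-instruction semantics.

```python
import math
import struct

F32_MIN_NORMAL = 2.0 ** -126  # smallest positive normal f32


def round_to_f32(x: float) -> float:
    """Round a Python float (f64) to the nearest f32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def flush_denormal(x: float) -> float:
    """Replace an f32 subnormal with a zero of the same sign, as .ftz does."""
    if x != 0.0 and abs(x) < F32_MIN_NORMAL:
        return math.copysign(0.0, x)
    return x


def sqrt_rn_f32(x: float, ftz: bool) -> float:
    """Model sqrt.rn.f32 (ftz=False) or sqrt.rn.ftz.f32 (ftz=True) for x >= 0."""
    x = round_to_f32(x)
    if ftz:
        x = flush_denormal(x)
    # An f64 sqrt followed by rounding to f32 is correctly rounded for f32 inputs.
    result = round_to_f32(math.sqrt(x))
    return flush_denormal(result) if ftz else result
```

For example, `sqrt_rn_f32(2.0 ** -140, ftz=False)` is `2.0 ** -70`, while the `.ftz` variant flushes the subnormal input and returns `0.0`.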
2025-08-14 17:24:40 -07:00
Andrzej Janik
3746079b1a
Assorted instruction fixes ( #423 )
This fixes transcendentals and some other buggy instructions exposed by `ptx_tests` (abs, neg). Add tanh (slow, due to a hardware limitation).
The only two remaining incorrect instructions are div and sqrt with non-default rounding, but this commit is already bloated enough.
2025-07-23 15:50:35 -07:00
Violet
27cfd50ddd
Implement nanosleep.u32 ( #421 )
2025-07-21 17:42:04 -07:00
Violet
dc69808e54
Add support for shfl.sync.MODE.b32 ( #409 )
2025-07-16 17:23:11 -07:00
Violet
5cb0a9b8e8
Add support for bar.red.and.pred ( #402 )
Implements bar.red.and.pred and bar.red.or.pred, using the undocumented __ockl_wgred functions. Doesn't yet add support for numbered barriers and thread count, as these are not needed for llm.c.
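A minimal sequential Python model of what `bar.red.and.pred` and `bar.red.or.pred` compute: every participating thread in the CTA contributes a predicate, waits at the barrier, and then receives the same AND or OR of all contributions. The function name is hypothetical and this says nothing about how the `__ockl_wgred` lowering works; on the CUDA side, these instructions are typically what `__syncthreads_and` and `__syncthreads_or` compile to.

```python
def bar_red_pred(op: str, preds: list[bool]) -> list[bool]:
    """Simulate bar.red.{and,or}.pred over one CTA.

    preds[i] is thread i's input predicate; the returned list holds the
    predicate each thread receives once every thread has reached the barrier.
    """
    if op == "and":
        reduced = all(preds)
    elif op == "or":
        reduced = any(preds)
    else:
        raise ValueError(f"unsupported reduction: {op}")
    # Every thread observes the same block-wide result.
    return [reduced] * len(preds)
```

Like the commit itself, the sketch ignores numbered barriers and the thread-count operand.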
2025-07-03 11:56:20 -07:00
Andrzej Janik
7ac67a89e9
Enable Geekbench 5 ( #304 )
2024-12-10 21:48:10 +01:00
Andrzej Janik
7a6df9dcbf
Fix host code and update to CUDA 12.4 ( #299 )
2024-12-02 00:29:57 +01:00
Andrzej Janik
970ba5aa25
Fix linking of AMD device libraries ( #296 )
It's weird that it fails without `-mno-link-builtin-bitcode-postopt`. I've only tested it on ROCm 6.2; it might be broken on older or newer ROCm versions.
2024-11-02 16:07:44 +01:00
Andrzej Janik
3870a96592
Re-enable all failing PTX tests ( #277 )
Additionally, remove unused compilation paths.
2024-10-16 03:15:48 +02:00