Commit graph

483 commits

Author SHA1 Message Date
Andrzej Janik
a66fddc0fa Fail linking on undefined 2025-09-23 01:27:53 +00:00
Andrzej Janik
07acc64d33 Allow skipping post-values 2025-09-22 21:18:01 +00:00
Andrzej Janik
3bad9852a5 Minor compiler improvements 2025-09-22 20:29:22 +00:00
Andrzej Janik
18a2b765f7 Pass correct arguments 2025-09-20 01:54:52 +00:00
Andrzej Janik
2b9c8946ec Add replayer 2025-09-20 00:43:29 +00:00
Andrzej Janik
644a22fd43 Merge commit '160048a293' into trace_debugger 2025-09-19 23:33:11 +00:00
Andrzej Janik
160048a293 Fix cuCtxPopCurrent 2025-09-19 23:30:29 +00:00
Violet
875ac13be2
Support lists of variables to be declared (#516)
For example,

```
.reg .u32 a, b;
```
2025-09-19 13:36:48 -07:00
Violet
62ec652e7c
Disable virtual memory management (#515)
We don't currently support it, so report it as unsupported.
2025-09-18 19:11:30 -07:00
Andrzej Janik
3289d92f59 Filter enqueues to be saved 2025-09-19 02:07:59 +00:00
Andrzej Janik
f3e143d8dd Save source ptx and save to the right path 2025-09-19 01:53:01 +00:00
Andrzej Janik
d880ee78b5 Fix some bugs 2025-09-19 00:58:42 +00:00
Andrzej Janik
bfef3317dc Start working on trace replay 2025-09-19 00:39:27 +00:00
Andrzej Janik
b5f41c7cd0
More runtime fixes, add mma instruction (#509)
2025-09-18 11:15:22 -07:00
Violet
150ce171cf
Fix devcontainer (#514)
2025-09-17 14:38:42 -07:00
Violet
571dad0972
Add support for cuBLASLt functions used by llm.c (#512) 2025-09-17 11:02:21 -07:00
Violet
5185138596
Create bindings for hipblasLt (#510)
Generate bindings for hipblasLt and make some changes to the bindings for cublasLt. Notably, the `hip_type` `Option` is changed to a `Vec`, so that multiple `From` implementations (for `rocblas_error` and `hipblasLtError`) can be created for `cublasError_t`.
2025-09-16 16:23:15 -07:00
Violet
3afb8d39e7
Implement cuStreamCreate (#511) 2025-09-16 15:52:35 -07:00
Andrzej Janik
262c25c76e
Use LD_AUDIT instead of LD_PRELOAD (#508)
LD_AUDIT gives us more control than LD_PRELOAD, and I've observed it to work much better.
2025-09-15 11:58:45 -07:00
Andrzej Janik
044fab47e5
Update devcontainer (#507)
2025-09-14 19:03:14 -07:00
Violet
e3a99b7ee1
Fix min.ftz.nan.f16 for ROCm 6.3.4 (#506)
This PR fixes compatibility issues with ROCm 6.3.4 by replacing the unavailable llvm.minimum.f16 intrinsic with an equivalent implementation using llvm.minnum.f16 combined with NaN handling logic. The same applies to max operations.

This also adds the `"amdgpu-ieee"="false"` attribute everywhere, which gives us better codegen for min/max.
2025-09-12 16:48:47 -07:00
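The NaN handling described in #506 can be illustrated with a minimal Rust sketch. This is not ZLUDA's code (the actual change emits LLVM IR for f16); it uses `f32` and the hypothetical name `minimum_via_minnum` only to show how a NaN-propagating minimum can be built from a NaN-ignoring one. Rust's `f32::min` returns the non-NaN operand when exactly one operand is NaN, which is the behavior this sketch assumes for the `minnum`-style primitive.

```rust
/// Hypothetical helper: NaN-propagating minimum built from a NaN-ignoring one.
/// `f32::min` stands in for a `minnum`-style operation: if exactly one
/// operand is NaN, it returns the other operand.
fn minimum_via_minnum(a: f32, b: f32) -> f32 {
    if a.is_nan() || b.is_nan() {
        // Explicit NaN handling: any NaN input makes the result NaN.
        f32::NAN
    } else {
        // Neither operand is NaN, so the NaN-ignoring min is sufficient.
        a.min(b)
    }
}
```

A max variant would follow the same shape with `a.max(b)` in the non-NaN branch.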
Andrzej Janik
6c811a55d2
Random fixes (#504)
This is a collection of random changes coming from the workload I'm working on. The most important change is better support for `.param`: PTX uses the .param namespace both for some local variables and for kernel args. This is a problem for us because those are different address spaces on AMDGPU. So far we've made an effort to convert to local and const namespaces whenever possible, but this commit tries to handle more patterns, which are impossible to track precisely, by converting to generic space.
2025-09-12 13:52:33 -07:00
Violet
7b5fdb30c4
Implement ldmatrix (#503)
2025-09-09 19:31:56 -07:00
Violet
3da39364e0
Make blame ignore formatting commit (#502) 2025-09-09 13:12:31 -07:00
Violet
d81456a549
Add support for cvt_rn_bf16x2_f32 (#501)
2025-09-08 17:41:24 -07:00
Violet
d342e1a06e
Implement redux.sync for u32 and s32 (#500) 2025-09-08 16:13:28 -07:00
Andrzej Janik
869d291099
Progress compilation despite parsing errors (#495)
Previously, if we ran into a broken instruction we'd fail the whole compilation. This PR changes it so that (only in Release mode) we try to progress at all costs. This means that if we had trouble parsing an instruction, we just remove the function from the output and continue.

For some workloads we can still compile a semi-broken, but meaningful, subset of a module.
2025-09-08 14:35:29 -07:00
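The best-effort behavior described in #495 can be sketched roughly as follows. All names here (`Function`, `parse_function`, `compile_module`, the `best_effort` flag) are hypothetical and are not ZLUDA's API; the toy parser simply rejects any source containing the word "bad". How ZLUDA actually detects Release mode is not stated in the commit, so the sketch takes it as a plain parameter.

```rust
/// Hypothetical stand-in for a successfully parsed function.
#[derive(Debug, PartialEq)]
struct Function {
    name: String,
}

/// Toy parser (assumption for illustration): fails on any source containing "bad".
fn parse_function(src: &str) -> Result<Function, String> {
    if src.contains("bad") {
        Err(format!("cannot parse `{}`", src))
    } else {
        Ok(Function { name: src.to_string() })
    }
}

/// Compiles every function in a module. With `best_effort` set, a function
/// that fails to parse is dropped from the output; otherwise the first
/// error aborts the whole compilation.
fn compile_module(sources: &[&str], best_effort: bool) -> Result<Vec<Function>, String> {
    let mut output = Vec::new();
    for src in sources {
        match parse_function(src) {
            Ok(function) => output.push(function),
            // Best-effort mode: skip the broken function and keep going.
            Err(_) if best_effort => continue,
            // Strict mode: propagate the first parsing error.
            Err(error) => return Err(error),
        }
    }
    Ok(output)
}
```

Calling `compile_module(&["a", "bad", "c"], true)` keeps the two functions that parse, while the same call with `false` returns the parsing error.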
Violet
00ca92167d
Remove accidentally committed file (#499) 2025-09-08 13:18:11 -07:00
Violet
4306646739
Support immediates in vector operands (#488)
2025-09-08 10:26:58 -07:00
Violet
b11c17a48b
Fix const_ident.ll (#497)
A minor fix to the .ptx was not carried over to the .ll.
2025-09-08 09:49:59 -07:00
Violet
e7f10afb51
Use Vec<RegOrImmediate> as const/global variable initializer (#490)
2025-09-05 16:41:41 -07:00
Violet
b7f3a647d7
Implement fma.rn.fn.bf16x2 (#496)
* Add fma bf16x2 test

* Implement fma.rn.fn.bf16x2

* cargo fmt
2025-09-04 17:29:20 -07:00
Andrzej Janik
5309065cc1
Enhance bug report template with ZLUDA version input (#494)
Added version input field to bug report template with a note on version support.
2025-09-03 16:20:34 -07:00
Violet
ac54d9a8cc
Add issue form for zluda_trace logs (#432) 2025-09-03 14:34:08 -07:00
Andrzej Janik
6dd633d32a
Update quick start link in README.md (#493) 2025-09-03 14:18:28 -07:00
Andrzej Janik
a34a8da53f
Rework the documentation and landing page (#474) 2025-09-03 14:11:46 -07:00
Andrzej Janik
88b01c809e
Add small compiler fixes and a fake ptxas binary (#491) 2025-09-03 12:23:01 -07:00
Andrzej Janik
8a7a5b45be
Reorganize driver host tests, fix bugs around pointer host code (#492) 2025-09-03 12:22:07 -07:00
Violet
08f7e874e3
Update broken tests (#489)
It seems like #484 broke the LLVM IR tests; this PR updates the .ll files to match what's now being produced.
2025-09-02 16:14:54 -07:00
aiwhskruht
4752fcdcf2
Api traits test code (#487)
Add initial templated API test support. This needs to be improved to use an attribute macro, but that will require some major surgery :(
2025-09-02 08:57:54 -07:00
Andrzej Janik
9d4f1699d0
Do a better job in zluda_trace when saving opaque ELF binaries (#486)
2025-08-28 18:23:25 -07:00
Andrzej Janik
ea99dcc0b0
Implement vote instruction and add support for %laneid (#484) 2025-08-28 18:23:09 -07:00
Violet
8f484d6a5f
Add support for fp8 to cvt (#468)
This implements specifically the fp8 conversion instructions needed by llm.c:

* `cvt.rn.satfinite{.relu}.f8x2type.f32`
* `cvt.rn{.relu}.f16x2.f8x2type`

It uses HIP's fp8 and fp16 headers: https://rocm.docs.amd.com/projects/HIP/en/docs-develop/reference/low_fp_types.html#fp8-quarter-precision.
2025-08-28 17:54:07 -07:00
Andrzej Janik
3632f2bf03
Some fixes to BLASLt (#482)
2025-08-26 14:28:36 -07:00
Andrzej Janik
ec1358af1c
Add more NVML and cuBLAS coverage (#481)
2025-08-26 12:55:12 -07:00
Andrzej Janik
62d340e4bd
Add more host-side functionality (#480)
2025-08-25 16:55:17 -07:00
Violet
de319f7c00
Add test for conversion from .f16x2 to .b32 (#479) 2025-08-25 15:33:53 -07:00
Violet
00eb553454
Add pass test mechanism for insert_implicit_conversions (#477)
2025-08-22 13:01:39 -07:00
Andrzej Janik
e805cb72a5
Add nvml tracing (#476)
2025-08-18 23:09:15 -07:00
Andrzej Janik
547536de4a
Add more instructions, tighten generated assembly (#475)
2025-08-18 11:12:57 -07:00