Commit graph

45 commits

Author SHA1 Message Date
Andrzej Janik
00d7cd131b Add mma 2025-09-17 01:51:29 +00:00
Andrzej Janik
3c56ee446c Parse griddepcontrol 2025-09-16 20:12:25 +00:00
Andrzej Janik
6c811a55d2
Random fixes (#504)
This is a collection of random changes coming from the workload I'm working on. The most important change is better support for `.params`: PTX uses .param namespace both for some local variables and kernel args. This is a problem for us because those are different address spaces on AMDGPU. So far we've made an effort to convert to local and const namespaces whenever possible, but this commit tries to handle more patterns, which are impossible to track precisely, by converting to generic space.
2025-09-12 13:52:33 -07:00
Violet
7b5fdb30c4
Implement ldmatrix (#503)
Some checks failed
ZLUDA / Build (Linux) (push) Has been cancelled
ZLUDA / Build (Windows) (push) Has been cancelled
ZLUDA / Build AMD GPU unit tests (push) Has been cancelled
ZLUDA / Run AMD GPU unit tests (push) Has been cancelled
2025-09-09 19:31:56 -07:00
Violet
d81456a549
Add support for cvt_rn_bf16x2_f32 (#501)
Some checks are pending
ZLUDA / Build (Linux) (push) Waiting to run
ZLUDA / Build (Windows) (push) Waiting to run
ZLUDA / Build AMD GPU unit tests (push) Waiting to run
ZLUDA / Run AMD GPU unit tests (push) Blocked by required conditions
2025-09-08 17:41:24 -07:00
Violet
d342e1a06e
Implement redux.sync for u32 and s32 (#500) 2025-09-08 16:13:28 -07:00
Andrzej Janik
869d291099
Progress compilation despite parsing errors (#495)
Previously if we ran into a broken instruction we'd fail whole compilation. This PR changes it so (only in Release mode) we try and progress at all cost. Meaning that if we had trouble parsing an instruction we just remove function form the output and continue.

For some workloads we can still compile a semi-broken, but meaningful subset of a module
2025-09-08 14:35:29 -07:00
Violet
4306646739
Support immediates in vector operands (#488)
Some checks are pending
ZLUDA / Build (Linux) (push) Waiting to run
ZLUDA / Build (Windows) (push) Waiting to run
ZLUDA / Build AMD GPU unit tests (push) Waiting to run
ZLUDA / Run AMD GPU unit tests (push) Blocked by required conditions
2025-09-08 10:26:58 -07:00
Violet
e7f10afb51
Use Vec<RegOrImmediate> as const/global variable initializer (#490)
Some checks failed
ZLUDA / Build (Linux) (push) Has been cancelled
ZLUDA / Build (Windows) (push) Has been cancelled
ZLUDA / Build AMD GPU unit tests (push) Has been cancelled
ZLUDA / Run AMD GPU unit tests (push) Has been cancelled
2025-09-05 16:41:41 -07:00
Violet
b7f3a647d7
Implement fma.rn.fn.bf16x2 (#496)
Some checks are pending
ZLUDA / Build (Linux) (push) Waiting to run
ZLUDA / Build (Windows) (push) Waiting to run
ZLUDA / Build AMD GPU unit tests (push) Waiting to run
ZLUDA / Run AMD GPU unit tests (push) Blocked by required conditions
* Add fma bf16x2 test

* Implement fma.rn.fn.bf16x2

* cargo fmt
2025-09-04 17:29:20 -07:00
Andrzej Janik
88b01c809e
Add small compiler fixes and a fake ptxas binary (#491) 2025-09-03 12:23:01 -07:00
Andrzej Janik
ea99dcc0b0
Implement vote instruction and add support for %laneid (#484) 2025-08-28 18:23:09 -07:00
Violet
8f484d6a5f
Add support for fp8 to cvt (#468)
This implements specifically the fp8 conversion instructions needed by llm.c:

* `cvt.rn.satfinite{.relu}.f8x2type.f32`
* `cvt.rn{.relu}.f16x2.f8x2type`

It uses HIP's fp8 and fp16 headers: https://rocm.docs.amd.com/projects/HIP/en/docs-develop/reference/low_fp_types.html#fp8-quarter-precision.
2025-08-28 17:54:07 -07:00
Violet
00eb553454
Add pass test mechanism for insert_implicit_conversions (#477)
Some checks failed
ZLUDA / Build (Linux) (push) Has been cancelled
ZLUDA / Build (Windows) (push) Has been cancelled
ZLUDA / Build AMD GPU unit tests (push) Has been cancelled
ZLUDA / Run AMD GPU unit tests (push) Has been cancelled
2025-08-22 13:01:39 -07:00
Andrzej Janik
547536de4a
Add more instructions, tighten generated assembly (#475)
Some checks are pending
ZLUDA / Build (Linux) (push) Waiting to run
ZLUDA / Build (Windows) (push) Waiting to run
ZLUDA / Build AMD GPU unit tests (push) Waiting to run
ZLUDA / Run AMD GPU unit tests (push) Blocked by required conditions
2025-08-18 11:12:57 -07:00
Andrzej Janik
65367f04ee
Fix how full-precision fp32 sqrt and div are handled (#467)
Some checks failed
ZLUDA / Build (Linux) (push) Has been cancelled
ZLUDA / Build (Windows) (push) Has been cancelled
ZLUDA / Build AMD GPU unit tests (push) Has been cancelled
ZLUDA / Run AMD GPU unit tests (push) Has been cancelled
Previously, when compiling full precision `sqrt`/`div` we'd leave it to the LLVM. LLVM looks at module's `denormal-fp-math-f32` mode, which is incompatible with how we handle denormals and could give wrong results in certain edge cases.
Instead handle it fully inside ZLUDA
2025-08-14 17:24:40 -07:00
Violet
a420601128
Add test for unrecognized statement error with vector braces (#472)
The old code using `take_till_inclusive` assumed that a right brace would be the end of a block and therefore never part of a statement. However, some PTX statements can include vector operands. This meant that any unrecognized statement with a vector operand would backtrace and eventually produce an unhelpful context error rather than an `UnrecognizedStatement` error.

This pull request also adds a mechanism for testing parser errors.
2025-08-13 17:23:51 -07:00
Joëlle van Essen
fe7a18f912
zoc (ZLUDA offline compiler) (#344) 2025-08-13 15:27:02 -07:00
Violet
d2f92e4267
More tracing for custom parsers (#471)
Some checks failed
ZLUDA / Build (Linux) (push) Has been cancelled
ZLUDA / Build (Windows) (push) Has been cancelled
ZLUDA / Build AMD GPU unit tests (push) Has been cancelled
ZLUDA / Run AMD GPU unit tests (push) Has been cancelled
2025-08-08 12:19:27 -07:00
Violet
ef98c1e0ba
Add tracing to custom parsers (#469)
Some checks are pending
ZLUDA / Build (Linux) (push) Waiting to run
ZLUDA / Build (Windows) (push) Waiting to run
ZLUDA / Build AMD GPU unit tests (push) Waiting to run
ZLUDA / Run AMD GPU unit tests (push) Blocked by required conditions
2025-08-07 11:51:56 -07:00
Violet
94eec34bdb
More descriptive syntax errors (#466)
Some checks failed
ZLUDA / Build (Linux) (push) Has been cancelled
ZLUDA / Build (Windows) (push) Has been cancelled
ZLUDA / Build AMD GPU unit tests (push) Has been cancelled
ZLUDA / Run AMD GPU unit tests (push) Has been cancelled
* More descriptive syntax errors

* cargo fmt

* Remove brackets
2025-08-05 17:23:55 -07:00
Violet
dd05752fc4
Implement shf instruction (#463)
* Implement `shf` instruction

* Tests for `shf`

* cargo fmt
2025-08-04 10:15:46 -07:00
Violet
21ef5f60a3
Check Rust formatting on pull requests (#451)
* Check Rust formatting on pull requests

This should help us maintain consistent style, without having unrelated style changes in pull requests from running `rustfmt`.

* cargo fmt non-generated files

* Ignore generated files
2025-07-30 14:55:09 -07:00
Violet
2b90fdb56c
Add support for cp.async (#427)
Adds support for

* `cp.async`
* `cp.async.commit_group`
* `cp.async.wait_group`
* `cp.async.wait_all`

Asynchronous copy operations are only supported by AMD Instinct GPUs, so for now we lower them as synchronous copy operations. Because of this, `cp.async.commit_group`, `cp.async.wait_group`, and `cp.async.wait_all` are no-op.
2025-07-23 16:25:49 -07:00
Andrzej Janik
3746079b1a
Assorted instruction fixes (#423)
This fixes transcendentals and some other buggy instructions exposed by `ptx_tests` (abs, neg). Add (slow - hardware limitation) tanh.
Only two remaining incorrect instructions are div and sqrt with non-default rounding, but this commit is already bloated enough
2025-07-23 15:50:35 -07:00
Violet
a86ba3d642
Remove Type::Pointer (#428) 2025-07-23 11:22:17 -07:00
Violet
27cfd50ddd
Implement nanosleep.u32 (#421)
Some checks failed
ZLUDA / Build (Linux) (push) Has been cancelled
ZLUDA / Build (Windows) (push) Has been cancelled
ZLUDA / Run AMD GPU unit tests (push) Has been cancelled
ZLUDA / Build AMD GPU unit tests (push) Has been cancelled
2025-07-21 17:42:04 -07:00
Andrzej Janik
2f27c47acc
Improve error recovery (#418)
Some checks are pending
ZLUDA / Build (Linux) (push) Waiting to run
ZLUDA / Build (Windows) (push) Waiting to run
ZLUDA / Build AMD GPU unit tests (push) Waiting to run
ZLUDA / Run AMD GPU unit tests (push) Blocked by required conditions
2025-07-17 10:02:03 -07:00
Violet
dc69808e54
Add support for shfl.sync.MODE.b32 (#409)
Some checks are pending
ZLUDA / Build (Linux) (push) Waiting to run
ZLUDA / Build (Windows) (push) Waiting to run
ZLUDA / Build AMD GPU unit tests (push) Waiting to run
ZLUDA / Run AMD GPU unit tests (push) Blocked by required conditions
2025-07-16 17:23:11 -07:00
Violet
6fb09f393a
Handle WARP_SZ (#412)
* Add tests for `WARP_SZ`

* Handle WARP_SZ in parser
2025-07-16 11:02:17 -07:00
Violet
5cb0a9b8e8
Add support for bar.red.and.pred (#402)
Implements bar.red.and.pred and bar.red.or.pred, using the undocument __ockl_wgred functions. Doesn't yet add support for numbered barriers and threadcount, as these are not needed for llm.c.
2025-07-03 11:56:20 -07:00
Violet
1cf345329c
Make derive_parser work with all optional arguments (#397)
Some checks are pending
ZLUDA / Build (Linux) (push) Waiting to run
ZLUDA / Build (Windows) (push) Waiting to run
The current implementation using `winnow`'s `opt` does not work for optional arguments that are in the middle of the command. For example, `bar{.cta}.red.op.pred   p, a{, b}, {!}c;`. This is because `opt` is greedy, and will always match `{, b}` instead of `,{!} c`. This change switches to using a custom combinator that handles this properly
2025-06-30 18:54:31 -07:00
Andrzej Janik
2a374ad880
Add fp saturation, fix various bugs in cvt instruction exposed by ptx_tests (#379)
Some checks are pending
ZLUDA / Build (Linux) (push) Waiting to run
ZLUDA / Build (Windows) (push) Waiting to run
2025-06-16 19:14:16 -07:00
Andrzej Janik
3361046760
Fix mad.wide, replace external CUDA library in test with our own (#376) 2025-06-09 21:33:18 -07:00
Andrzej Janik
c790ab45ec
Redo logging to better log dark API and performance libraries (#372) 2025-06-09 15:29:14 -07:00
Andrzej Janik
cc83b9f1f6
Create infrastructure for performance libraries (#363) 2025-05-01 22:37:18 +02:00
Joëlle van Essen
7cdab7abc2
Implement mul24 (#351) 2025-04-08 12:27:19 +02:00
Andrzej Janik
d704e92c97
Support instruction modes (denormal and rounding) on AMD GPUs (#342) 2025-03-17 21:37:26 +01:00
Andrzej Janik
646d746e02 Start working on mul24 2025-02-07 19:37:11 +00:00
Andrzej Janik
7a6df9dcbf
Fix host code and update to CUDA 12.4 (#299) 2024-12-02 00:29:57 +01:00
Andrzej Janik
b4cb3ade63
Recover from and report unknown instructions and directives (#295) 2024-11-02 15:57:57 +01:00
Andrzej Janik
3870a96592
Re-enable all failing PTX tests (#277)
Additionally remove unused compilation paths
2024-10-16 03:15:48 +02:00
Andrzej Janik
c92abba2bb
Refactor compilation passes (#270)
The overarching goal is to refactor all passes so they are module-scoped and not function-scoped. Additionally, make improvements to the most egregiously buggy/unfit passes (so the code is ready for the next major features: linking, ftz handling) and continue adding more code to the LLVM backend
2024-09-23 16:33:46 +02:00
Andrzej Janik
46def3e7e0
Connect new parser to LLVM bitcode backend (#269)
Some checks failed
Rust / Build and publish (Linux) (push) Has been cancelled
Rust / Build and publish (Windows) (push) Has been cancelled
This is very incomplete. Just enough code to emit LLVM bitcode and continue further development
2024-09-13 01:07:31 +02:00
Andrzej Janik
193eb29be8
PTX parser rewrite (#267)
Some checks failed
Rust / Build and publish (Linux) (push) Has been cancelled
Rust / Build and publish (Windows) (push) Has been cancelled
Replaces traditional LALRPOP-based parser with winnow-based parser to handle out-of-order instruction modifer. Generate instruction type and instruction visitor from a macro instead of writing by hand. Add separate compilation path using the new parser that only works in tests for now
2024-09-04 15:47:42 +02:00