ZLUDA

mirror of https://github.com/vosen/ZLUDA.git synced 2025-10-04 15:19:37 +00:00

Author	SHA1	Message	Date
Andrzej Janik	48e20d06bf	Minor compiler improvements	2025-09-24 17:17:40 +00:00
Violet	93820e3159	Handle PrmtSlow (#518 ) Some checks failed ZLUDA / Build (Linux) (push) Has been cancelled Details ZLUDA / Build (Windows) (push) Has been cancelled Details ZLUDA / Build AMD GPU unit tests (push) Has been cancelled Details ZLUDA / Run AMD GPU unit tests (push) Has been cancelled Details	2025-09-23 12:31:55 -07:00
Violet	875ac13be2	Support lists of variables to be declared (#516 ) Some checks failed ZLUDA / Build (Linux) (push) Has been cancelled Details ZLUDA / Build (Windows) (push) Has been cancelled Details ZLUDA / Build AMD GPU unit tests (push) Has been cancelled Details ZLUDA / Run AMD GPU unit tests (push) Has been cancelled Details For example, ``` .reg .u32 a, b; ```	2025-09-19 13:36:48 -07:00
Andrzej Janik	b5f41c7cd0	More runtime fixes, add mma instruction (#509 ) Some checks are pending ZLUDA / Build (Linux) (push) Waiting to run Details ZLUDA / Build (Windows) (push) Waiting to run Details ZLUDA / Build AMD GPU unit tests (push) Waiting to run Details ZLUDA / Run AMD GPU unit tests (push) Blocked by required conditions Details	2025-09-18 11:15:22 -07:00
Andrzej Janik	6c811a55d2	Random fixes (#504 ) This is a collection of random changes coming from the workload I'm working on. The most important change is better support for `.params`: PTX uses .param namespace both for some local variables and kernel args. This is a problem for us because those are different address spaces on AMDGPU. So far we've made an effort to convert to local and const namespaces whenever possible, but this commit tries to handle more patterns, which are impossible to track precisely, by converting to generic space.	2025-09-12 13:52:33 -07:00
Violet	7b5fdb30c4	Implement ldmatrix (#503 ) Some checks failed ZLUDA / Build (Linux) (push) Has been cancelled Details ZLUDA / Build (Windows) (push) Has been cancelled Details ZLUDA / Build AMD GPU unit tests (push) Has been cancelled Details ZLUDA / Run AMD GPU unit tests (push) Has been cancelled Details	2025-09-09 19:31:56 -07:00
Violet	d81456a549	Add support for cvt_rn_bf16x2_f32 (#501 ) Some checks are pending ZLUDA / Build (Linux) (push) Waiting to run Details ZLUDA / Build (Windows) (push) Waiting to run Details ZLUDA / Build AMD GPU unit tests (push) Waiting to run Details ZLUDA / Run AMD GPU unit tests (push) Blocked by required conditions Details	2025-09-08 17:41:24 -07:00
Violet	d342e1a06e	Implement redux.sync for u32 and s32 (#500 )	2025-09-08 16:13:28 -07:00
Andrzej Janik	869d291099	Progress compilation despite parsing errors (#495 ) Previously if we ran into a broken instruction we'd fail whole compilation. This PR changes it so (only in Release mode) we try and progress at all cost. Meaning that if we had trouble parsing an instruction we just remove function form the output and continue. For some workloads we can still compile a semi-broken, but meaningful subset of a module	2025-09-08 14:35:29 -07:00
Violet	4306646739	Support immediates in vector operands (#488 ) Some checks are pending ZLUDA / Build (Linux) (push) Waiting to run Details ZLUDA / Build (Windows) (push) Waiting to run Details ZLUDA / Build AMD GPU unit tests (push) Waiting to run Details ZLUDA / Run AMD GPU unit tests (push) Blocked by required conditions Details	2025-09-08 10:26:58 -07:00
Violet	e7f10afb51	Use `Vec<RegOrImmediate>` as const/global variable initializer (#490 ) Some checks failed ZLUDA / Build (Linux) (push) Has been cancelled Details ZLUDA / Build (Windows) (push) Has been cancelled Details ZLUDA / Build AMD GPU unit tests (push) Has been cancelled Details ZLUDA / Run AMD GPU unit tests (push) Has been cancelled Details	2025-09-05 16:41:41 -07:00
Violet	b7f3a647d7	Implement `fma.rn.fn.bf16x2` (#496 ) Some checks are pending ZLUDA / Build (Linux) (push) Waiting to run Details ZLUDA / Build (Windows) (push) Waiting to run Details ZLUDA / Build AMD GPU unit tests (push) Waiting to run Details ZLUDA / Run AMD GPU unit tests (push) Blocked by required conditions Details * Add fma bf16x2 test * Implement fma.rn.fn.bf16x2 * cargo fmt	2025-09-04 17:29:20 -07:00
Andrzej Janik	88b01c809e	Add small compiler fixes and a fake ptxas binary (#491 )	2025-09-03 12:23:01 -07:00
Andrzej Janik	ea99dcc0b0	Implement vote instruction and add support for %laneid (#484 )	2025-08-28 18:23:09 -07:00
Violet	8f484d6a5f	Add support for fp8 to `cvt` (#468 ) This implements specifically the fp8 conversion instructions needed by llm.c: * `cvt.rn.satfinite{.relu}.f8x2type.f32` * `cvt.rn{.relu}.f16x2.f8x2type` It uses HIP's fp8 and fp16 headers: https://rocm.docs.amd.com/projects/HIP/en/docs-develop/reference/low_fp_types.html#fp8-quarter-precision.	2025-08-28 17:54:07 -07:00
Violet	00eb553454	Add pass test mechanism for insert_implicit_conversions (#477 ) Some checks failed ZLUDA / Build (Linux) (push) Has been cancelled Details ZLUDA / Build (Windows) (push) Has been cancelled Details ZLUDA / Build AMD GPU unit tests (push) Has been cancelled Details ZLUDA / Run AMD GPU unit tests (push) Has been cancelled Details	2025-08-22 13:01:39 -07:00
Andrzej Janik	547536de4a	Add more instructions, tighten generated assembly (#475 ) Some checks are pending ZLUDA / Build (Linux) (push) Waiting to run Details ZLUDA / Build (Windows) (push) Waiting to run Details ZLUDA / Build AMD GPU unit tests (push) Waiting to run Details ZLUDA / Run AMD GPU unit tests (push) Blocked by required conditions Details	2025-08-18 11:12:57 -07:00
Andrzej Janik	65367f04ee	Fix how full-precision fp32 sqrt and div are handled (#467 ) Some checks failed ZLUDA / Build (Linux) (push) Has been cancelled Details ZLUDA / Build (Windows) (push) Has been cancelled Details ZLUDA / Build AMD GPU unit tests (push) Has been cancelled Details ZLUDA / Run AMD GPU unit tests (push) Has been cancelled Details Previously, when compiling full precision `sqrt`/`div` we'd leave it to the LLVM. LLVM looks at module's `denormal-fp-math-f32` mode, which is incompatible with how we handle denormals and could give wrong results in certain edge cases. Instead handle it fully inside ZLUDA	2025-08-14 17:24:40 -07:00
Violet	a420601128	Add test for unrecognized statement error with vector braces (#472 ) The old code using `take_till_inclusive` assumed that a right brace would be the end of a block and therefore never part of a statement. However, some PTX statements can include vector operands. This meant that any unrecognized statement with a vector operand would backtrace and eventually produce an unhelpful context error rather than an `UnrecognizedStatement` error. This pull request also adds a mechanism for testing parser errors.	2025-08-13 17:23:51 -07:00
Joëlle van Essen	fe7a18f912	zoc (ZLUDA offline compiler) (#344 )	2025-08-13 15:27:02 -07:00
Violet	d2f92e4267	More tracing for custom parsers (#471 ) Some checks failed ZLUDA / Build (Linux) (push) Has been cancelled Details ZLUDA / Build (Windows) (push) Has been cancelled Details ZLUDA / Build AMD GPU unit tests (push) Has been cancelled Details ZLUDA / Run AMD GPU unit tests (push) Has been cancelled Details	2025-08-08 12:19:27 -07:00
Violet	ef98c1e0ba	Add tracing to custom parsers (#469 ) Some checks are pending ZLUDA / Build (Linux) (push) Waiting to run Details ZLUDA / Build (Windows) (push) Waiting to run Details ZLUDA / Build AMD GPU unit tests (push) Waiting to run Details ZLUDA / Run AMD GPU unit tests (push) Blocked by required conditions Details	2025-08-07 11:51:56 -07:00
Violet	94eec34bdb	More descriptive syntax errors (#466 ) Some checks failed ZLUDA / Build (Linux) (push) Has been cancelled Details ZLUDA / Build (Windows) (push) Has been cancelled Details ZLUDA / Build AMD GPU unit tests (push) Has been cancelled Details ZLUDA / Run AMD GPU unit tests (push) Has been cancelled Details * More descriptive syntax errors * cargo fmt * Remove brackets	2025-08-05 17:23:55 -07:00
Violet	dd05752fc4	Implement `shf` instruction (#463 ) * Implement `shf` instruction * Tests for `shf` * cargo fmt	2025-08-04 10:15:46 -07:00
Violet	21ef5f60a3	Check Rust formatting on pull requests (#451 ) * Check Rust formatting on pull requests This should help us maintain consistent style, without having unrelated style changes in pull requests from running `rustfmt`. * cargo fmt non-generated files * Ignore generated files	2025-07-30 14:55:09 -07:00
Violet	2b90fdb56c	Add support for `cp.async` (#427 ) Adds support for * `cp.async` * `cp.async.commit_group` * `cp.async.wait_group` * `cp.async.wait_all` Asynchronous copy operations are only supported by AMD Instinct GPUs, so for now we lower them as synchronous copy operations. Because of this, `cp.async.commit_group`, `cp.async.wait_group`, and `cp.async.wait_all` are no-op.	2025-07-23 16:25:49 -07:00
Andrzej Janik	3746079b1a	Assorted instruction fixes (#423 ) This fixes transcendentals and some other buggy instructions exposed by `ptx_tests` (abs, neg). Add (slow - hardware limitation) tanh. Only two remaining incorrect instructions are div and sqrt with non-default rounding, but this commit is already bloated enough	2025-07-23 15:50:35 -07:00
Violet	a86ba3d642	Remove Type::Pointer (#428 )	2025-07-23 11:22:17 -07:00
Violet	27cfd50ddd	Implement `nanosleep.u32` (#421 ) Some checks failed ZLUDA / Build (Linux) (push) Has been cancelled Details ZLUDA / Build (Windows) (push) Has been cancelled Details ZLUDA / Run AMD GPU unit tests (push) Has been cancelled Details ZLUDA / Build AMD GPU unit tests (push) Has been cancelled Details	2025-07-21 17:42:04 -07:00
Andrzej Janik	2f27c47acc	Improve error recovery (#418 ) Some checks are pending ZLUDA / Build (Linux) (push) Waiting to run Details ZLUDA / Build (Windows) (push) Waiting to run Details ZLUDA / Build AMD GPU unit tests (push) Waiting to run Details ZLUDA / Run AMD GPU unit tests (push) Blocked by required conditions Details	2025-07-17 10:02:03 -07:00
Violet	dc69808e54	Add support for `shfl.sync.MODE.b32` (#409 ) Some checks are pending ZLUDA / Build (Linux) (push) Waiting to run Details ZLUDA / Build (Windows) (push) Waiting to run Details ZLUDA / Build AMD GPU unit tests (push) Waiting to run Details ZLUDA / Run AMD GPU unit tests (push) Blocked by required conditions Details	2025-07-16 17:23:11 -07:00
Violet	6fb09f393a	Handle `WARP_SZ` (#412 ) * Add tests for `WARP_SZ` * Handle WARP_SZ in parser	2025-07-16 11:02:17 -07:00
Violet	5cb0a9b8e8	Add support for `bar.red.and.pred` (#402 ) Implements bar.red.and.pred and bar.red.or.pred, using the undocument __ockl_wgred functions. Doesn't yet add support for numbered barriers and threadcount, as these are not needed for llm.c.	2025-07-03 11:56:20 -07:00
Violet	1cf345329c	Make `derive_parser` work with all optional arguments (#397 ) Some checks are pending ZLUDA / Build (Linux) (push) Waiting to run Details ZLUDA / Build (Windows) (push) Waiting to run Details The current implementation using `winnow`'s `opt` does not work for optional arguments that are in the middle of the command. For example, `bar{.cta}.red.op.pred p, a{, b}, {!}c;`. This is because `opt` is greedy, and will always match `{, b}` instead of `,{!} c`. This change switches to using a custom combinator that handles this properly	2025-06-30 18:54:31 -07:00
Andrzej Janik	2a374ad880	Add fp saturation, fix various bugs in cvt instruction exposed by ptx_tests (#379 ) Some checks are pending ZLUDA / Build (Linux) (push) Waiting to run Details ZLUDA / Build (Windows) (push) Waiting to run Details	2025-06-16 19:14:16 -07:00
Andrzej Janik	3361046760	Fix mad.wide, replace external CUDA library in test with our own (#376 )	2025-06-09 21:33:18 -07:00
Andrzej Janik	c790ab45ec	Redo logging to better log dark API and performance libraries (#372 )	2025-06-09 15:29:14 -07:00
Andrzej Janik	cc83b9f1f6	Create infrastructure for performance libraries (#363 )	2025-05-01 22:37:18 +02:00
Joëlle van Essen	7cdab7abc2	Implement mul24 (#351 )	2025-04-08 12:27:19 +02:00
Andrzej Janik	d704e92c97	Support instruction modes (denormal and rounding) on AMD GPUs (#342 )	2025-03-17 21:37:26 +01:00
Andrzej Janik	646d746e02	Start working on mul24	2025-02-07 19:37:11 +00:00
Andrzej Janik	7a6df9dcbf	Fix host code and update to CUDA 12.4 (#299 )	2024-12-02 00:29:57 +01:00
Andrzej Janik	b4cb3ade63	Recover from and report unknown instructions and directives (#295 )	2024-11-02 15:57:57 +01:00
Andrzej Janik	3870a96592	Re-enable all failing PTX tests (#277 ) Additionally remove unused compilation paths	2024-10-16 03:15:48 +02:00
Andrzej Janik	c92abba2bb	Refactor compilation passes (#270 ) The overarching goal is to refactor all passes so they are module-scoped and not function-scoped. Additionally, make improvements to the most egregiously buggy/unfit passes (so the code is ready for the next major features: linking, ftz handling) and continue adding more code to the LLVM backend	2024-09-23 16:33:46 +02:00
Andrzej Janik	46def3e7e0	Connect new parser to LLVM bitcode backend (#269 ) Some checks failed Rust / Build and publish (Linux) (push) Has been cancelled Details Rust / Build and publish (Windows) (push) Has been cancelled Details This is very incomplete. Just enough code to emit LLVM bitcode and continue further development	2024-09-13 01:07:31 +02:00
Andrzej Janik	193eb29be8	PTX parser rewrite (#267 ) Some checks failed Rust / Build and publish (Linux) (push) Has been cancelled Details Rust / Build and publish (Windows) (push) Has been cancelled Details Replaces traditional LALRPOP-based parser with winnow-based parser to handle out-of-order instruction modifer. Generate instruction type and instruction visitor from a macro instead of writing by hand. Add separate compilation path using the new parser that only works in tests for now	2024-09-04 15:47:42 +02:00

47 commits