Rework the documentation and landing page (#474)
This commit is contained in:
parent 88b01c809e
commit a34a8da53f
9 changed files with 440 additions and 71 deletions

README.md (76 changed lines)
@@ -1,74 +1,8 @@
[](https://discord.gg/sg6BNzXuc7)

ZLUDA is a drop-in replacement for CUDA on non-NVIDIA GPUs. ZLUDA allows running unmodified CUDA applications using non-NVIDIA GPUs with near-native performance.

# ZLUDA

<div align="center">

ZLUDA is a drop-in replacement for CUDA on non-NVIDIA GPUs. ZLUDA allows running unmodified CUDA applications using non-NVIDIA GPUs with near-native performance.

<!-- 80x28 104.75x28 62x28-->

[<img src="https://img.shields.io/badge/quick start-green?style=for-the-badge&logo=readthedocs&logoColor=white" width="267.5" height="56">](https://zluda--474.org.readthedocs.build/474/) [<img src="https://img.shields.io/badge/Discord-%235865F2.svg?style=for-the-badge&logo=discord&logoColor=white" width="209.5" height="56">](https://discord.gg/sg6BNzXuc7) [<img src="https://img.shields.io/badge/news-red?style=for-the-badge&logo=book&logoColor=white" width="124" height="56">](https://vosen.github.io/ZLUDA/)

ZLUDA supports AMD Radeon RX 5000 series and newer GPUs (both desktop and integrated).

ZLUDA is a work in progress. Follow development here and say hi on [Discord](https://discord.gg/sg6BNzXuc7). For more details, see the announcement: https://vosen.github.io/ZLUDA/blog/zludas-third-life/

## Usage

**Warning**: This version of ZLUDA is under heavy development (more [here](https://vosen.github.io/ZLUDA/blog/zludas-third-life/)) and right now only supports Geekbench. ZLUDA probably will not work with your application just yet.

### Windows

You should have a recent AMD GPU driver ("AMD Software: Adrenalin Edition") installed.\
To run your application, you should either:
* (Recommended approach) Copy the ZLUDA-provided `nvcuda.dll` and `nvml.dll` from `target\release` (if built from sources) or `zluda` (if you downloaded a zip package) into a path which your application uses to load CUDA. Paths vary from application to application, but usually it's the directory where the .exe file is located
* Use the ZLUDA launcher as shown below. The ZLUDA launcher is known to be buggy and incomplete:
```
<ZLUDA_DIRECTORY>\zluda_with.exe -- <APPLICATION> <APPLICATIONS_ARGUMENTS>
```

### Linux

Run your application like this:
```
LD_LIBRARY_PATH=<ZLUDA_DIRECTORY> <APPLICATION> <APPLICATIONS_ARGUMENTS>
```

where `<ZLUDA_DIRECTORY>` is the directory which contains the ZLUDA-provided `libcuda.so`: `target/release` if you built from sources or `zluda` if you downloaded a prebuilt package.

### MacOS

Not supported

## Building

### Dependencies

* Git
* CMake
* Python 3
* Rust compiler (recent version)
* C++ compiler
* (Optional, but recommended) [Ninja build system](https://ninja-build.org/)

### Build steps

* Git clone the repo (make sure to use the `--recursive` option to fetch submodules):
  `git clone --recursive https://github.com/vosen/ZLUDA.git`
* Enter the freshly cloned `ZLUDA` directory and build with cargo (this takes a while):
  `cargo xtask --release`

## Contributing

The ZLUDA project has commercial backing and _does not_ accept donations.
The ZLUDA project accepts pull requests and other non-monetary contributions.

If you want to contribute a code fix or documentation update, feel free to open a Pull Request.

### Getting started

There's no architecture document (yet). The two most important crates in ZLUDA are `ptx` (the PTX compiler) and `zluda` (the AMD GPU runtime). A good starting point for tinkering with the project is to run one of the `ptx` unit tests under a debugger and understand what it is doing. `cargo test -p ptx -- ::add_amdgpu` runs a simple test that adds two numbers.
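
To step through that test in a debugger, a minimal sketch looks like this (assuming a Linux host with gdb; the exact test binary name under `target/debug/deps` varies per build, so the `<HASH>` below is a placeholder):

```bash
# Build the ptx test binary without running it; cargo prints the binary path.
cargo test -p ptx --no-run
# Launch the printed binary under gdb, filtering for the add_amdgpu test.
gdb --args target/debug/deps/ptx-<HASH> ::add_amdgpu
```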

GitHub issues tagged with ["help wanted"](https://github.com/vosen/ZLUDA/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22) are tasks that are self-contained. Their level of difficulty varies, and they are not always good beginner tasks, but they are defined unambiguously.

If you have questions, feel free to ask in the [#devtalk channel on Discord](https://discord.com/channels/1273316903783497778/1303329281409159270).

## License

This software is dual-licensed under either the Apache 2.0 license or the MIT license. See [LICENSE-APACHE](LICENSE-APACHE) or [LICENSE-MIT](LICENSE-MIT) for details.

</div>

docs/.gitignore (new file, 1 line)
@@ -0,0 +1 @@

book

docs/.readthedocs.yaml (new file, 11 lines)
@@ -0,0 +1,11 @@

version: 2
build:
  os: ubuntu-lts-latest
  tools:
    rust: latest
  jobs:
    install:
      - cargo install mdbook mdbook-alerts --root .
    build:
      html:
        - PATH=$PATH:./bin mdbook build docs --dest-dir $READTHEDOCS_OUTPUT/html

docs/book.toml (new file, 8 lines)
@@ -0,0 +1,8 @@

[book]
authors = ["Andrzej Janik"]
language = "en"
multilingual = false
src = "src"
title = "ZLUDA"

[preprocessor.alerts]

docs/src/SUMMARY.md (new file, 8 lines)
@@ -0,0 +1,8 @@

# Summary

# General
- [Quick start](./quick_start.md)
- [Troubleshooting](./troubleshooting.md)
- [FAQ](./faq.md)
# For developers
- [Building from source](./building.md)

docs/src/building.md (new file, 19 lines)
@@ -0,0 +1,19 @@

# Building

## Dependencies

* Git
* CMake
* Python 3
* Rust compiler (recent version)
* C++ compiler
* (Linux only) HIP ([instructions here](https://rocm.docs.amd.com/projects/HIP/en/latest/install/install.html))
* (Optional, but recommended) [Ninja build system](https://ninja-build.org/)

## Build steps

* Git clone the repo (make sure to use the `--recursive` option to fetch submodules):
  `git clone --recursive https://github.com/vosen/ZLUDA.git`
* Enter the freshly cloned `ZLUDA` directory and build with cargo (this takes a while):
  * `cargo xtask --release` for a Release build
  * `cargo xtask` for a Debug build
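
Put together, a typical first build looks like this (a minimal sketch of the steps above; directory names are up to you):

```bash
# Clone the repository together with its submodules.
git clone --recursive https://github.com/vosen/ZLUDA.git
cd ZLUDA
# Build ZLUDA in release mode; artifacts end up under target/release.
cargo xtask --release
```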

docs/src/faq.md (new file, 89 lines)
@@ -0,0 +1,89 @@

# FAQ

> [!WARNING]
> For legal reasons we can't help you with the pre-rollback versions (older than version 4). See more here: [https://www.theregister.com/2024/08/09/amd_zluda_take_down](https://www.theregister.com/2024/08/09/amd_zluda_take_down/)

## General

1. How can I donate to ZLUDA development?

    The ZLUDA project is fully funded and only accepts donations of labor.

1. What organization is funding ZLUDA development?

    This will be revealed in due time.

1. How can I follow ZLUDA's progress?

    * Join our [Discord](https://discord.gg/sg6BNzXuc7)
    * Every quarter we publish a progress report on [ZLUDA's blog](https://vosen.github.io/ZLUDA/)

## Hardware

1. AMD GPU support?

    ZLUDA supports AMD Radeon RX 5000 series and newer GPUs (both desktop and integrated).
    Older consumer GPUs (Polaris, Vega, etc.) and server‑class GPUs are not supported; these architectures differ significantly from recent desktop GPUs and would require substantial engineering effort.
    We expect that the near-future unified GPU architecture (UDNA) will be more similar to desktop GPUs.

1. Intel GPU support?

    ZLUDA previously supported Intel GPUs, but does not currently. It is possible to revive the Intel backend. The development team is focusing on high‑quality AMD GPU support and welcomes contributions.

1. NVIDIA GPU support?

    Unlikely to ever be on the roadmap, because NVIDIA users can use the original CUDA. That said, if someone wants to add support we are open to contributions.

1. Qualcomm GPU support?

    It would be interesting to have Qualcomm GPU support, but the development team is focusing on high‑quality AMD GPU support. We welcome contributions.

1. macOS support?

    Unlikely to ever happen. There is very little non‑deprecated CUDA software for macOS, and what remains will soon be unsupported.

1. ZLUDA on top of OpenCL or Vulkan?

    ZLUDA could be ported to OpenCL or Vulkan, but with significantly reduced functionality. This might be acceptable for a narrow use case, but it would not be as general‑purpose as using the native backend. Examples of features available with the current compilation path that are not exposed by either Vulkan or OpenCL:
    * Disabling FP contraction
    * Explicit alignment
    * Some subgroup and group operations
    * Bindless images
    * Pointer casts
    * Arbitrary virtual calls
    * Inline assembly
    * Rounding modes
    * Denormal modes

    Additionally, performance libraries (cuBLAS, cuDNN, etc.) cannot be easily mapped through Vulkan or OpenCL.

## Software

1. PyTorch support?

    PyTorch support is currently our top priority. We expect to have initial support in the fourth quarter of 2025.

1. TensorFlow support?

    TensorFlow support is currently a top priority for ZLUDA and will follow PyTorch support.

1. Blender support?

    Blender is not on the roadmap, but it's often requested. Support might be added at some point, but it's a low priority. If ZLUDA supports Blender, it will not support hardware ray-tracing (see the _Hardware ray-tracing (OptiX) support?_ question below).

1. Hardware ray-tracing (OptiX) support?

    OptiX support is exceedingly complex. While it's built on top of CUDA, it uses its own dialect of PTX and its own host code, and requires its own specific optimizations. It's unlikely that ZLUDA will ever support OptiX again; it would require a very dedicated contributor (or team of contributors) to step in.

1. Support for games using 32 bit PhysX?

    We are convinced that it's possible (both for AMD GPUs and NVIDIA GPUs). The necessary groundwork has been done (log collection) and there is a plan for how to implement the feature. It's not on the roadmap and we are hoping for outside contributors to step in.

1. Support for games using 64 bit PhysX (GameWorks)?

    It is definitely possible; pre-rollback ZLUDA had this capability. It's not on the roadmap and would require outside contributions.

1. DLSS support?

    Previously, DLSS support was blocked by missing functionality in AMD's Direct3D driver: the ability to enqueue HIP kernels into a Direct3D command list. This functionality now ships in the newest driver, so DLSS support should be possible. It's not on our roadmap, but if someone steps in to implement it, we'd be happy to merge.

docs/src/quick_start.md (new file, 44 lines)
@@ -0,0 +1,44 @@

# Quick start

> [!WARNING]
> This version of ZLUDA is under heavy development and will likely not work with your application yet. In the meantime, you are encouraged to try it and report results.

## How to get it

ZLUDA evolves quickly, so we recommend downloading the [most recent pre-release version](https://github.com/vosen/ZLUDA/releases).
Periodically, we mark a pre-release version as stable; you can download it from the "Releases" section on our [GitHub page](https://github.com/vosen/ZLUDA).

## Usage

### Windows

You should have a recent AMD GPU driver ("AMD Software: Adrenalin Edition") installed.\
To run your application, either:
* (_Recommended_) Copy all ZLUDA files (including `nvcuda.dll`) from `zluda` (if you downloaded a zip package) or `target\release` (if you built from sources) into a path which your application uses to load CUDA. Paths vary from application to application, but usually it's the directory where the .exe file is located (see the sketch after this list)
* Use the ZLUDA launcher:
```
<ZLUDA_DIRECTORY>\zluda_with.exe -- <APPLICATION> <APPLICATION_ARGUMENTS>
```
The ZLUDA launcher is known to be buggy and incomplete, but it is less invasive.
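
For the recommended copy-based approach above, a minimal sketch from a Windows command prompt might look like this (both paths are placeholders for your actual directories):

```
rem Copy every file from the ZLUDA package next to your application's .exe
copy <ZLUDA_DIRECTORY>\*.* <APPLICATION_DIRECTORY>\
```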

### Linux

Run your application like this:
* Recommended method
```
LD_LIBRARY_PATH="<ZLUDA_DIRECTORY>:$LD_LIBRARY_PATH" <APPLICATION> <APPLICATION_ARGUMENTS>
```
where `<ZLUDA_DIRECTORY>` is the directory which contains the ZLUDA-provided `libcuda.so`: `zluda` if you downloaded a prebuilt package or `target/release` if you built from sources.

* Alternative method
```
LD_PRELOAD="<ZLUDA_DIRECTORY>/zluda_preload" <APPLICATION> <APPLICATION_ARGUMENTS>
```
where `<ZLUDA_DIRECTORY>` is the directory which contains the ZLUDA-provided `libcuda.so`: `zluda` if you downloaded a prebuilt package or `target/release` if you built from sources.

### macOS

Not supported.

docs/src/troubleshooting.md (new file, 255 lines)
@@ -0,0 +1,255 @@

# Troubleshooting

## Introduction

zluda_trace is a [shim](https://en.wikipedia.org/wiki/Shim_(computing))
for the CUDA API which traces an application's CUDA usage. If your application is
encountering issues with ZLUDA, you should use zluda_trace to find out where and
why ZLUDA fails.

## Quick Start

### Linux

Run your application like this:

#### AMD GPU:

```bash
ZLUDA_CUDA_LIB=<ZLUDA_DIRECTORY>/libcuda.so LD_LIBRARY_PATH=<ZLUDA_DIRECTORY>/trace/ \
ZLUDA_LOG_DIR=<LOG_DIRECTORY> <APPLICATION> <APPLICATION_ARGUMENTS>
```

#### NVIDIA GPU:

```bash
LD_LIBRARY_PATH=<ZLUDA_DIRECTORY>/trace/ \
ZLUDA_LOG_DIR=<LOG_DIRECTORY> <APPLICATION> <APPLICATION_ARGUMENTS>
```

If you're [filing a GitHub
issue](https://github.com/vosen/ZLUDA/issues/new/choose), please create
an archive with your logs from `<LOG_DIRECTORY>` and attach it to the issue:

```bash
tar -czvf logs.tar.gz -C <LOG_DIRECTORY> .
```

### Windows

Run your application like this:

#### AMD GPU:

```bash
export ZLUDA_CUDA_LIB=<ZLUDA_DIRECTORY>/nvcuda.dll
export ZLUDA_LOG_DIR=<LOG_DIRECTORY>
zluda_with.exe --nvcuda <ZLUDA_DIRECTORY>/zluda_trace.dll -- <APPLICATION> <APPLICATION_ARGUMENTS>
```

#### NVIDIA GPU:

```bash
export ZLUDA_LOG_DIR=<LOG_DIRECTORY>
zluda_with.exe --nvcuda <ZLUDA_DIRECTORY>/zluda_trace.dll -- <APPLICATION> <APPLICATION_ARGUMENTS>
```

If you're [filing a GitHub
issue](https://github.com/vosen/ZLUDA/issues/new/choose), please create
a .zip file with your logs from `<LOG_DIRECTORY>` to attach to the
issue. In Windows Explorer, right-click on `<LOG_DIRECTORY>` and select
"Send to"/"Compressed (zipped) folder". Exact steps may vary between
Windows versions.

### Explanation

#### `LD_LIBRARY_PATH=<ZLUDA_DIRECTORY>/trace/`

`<ZLUDA_DIRECTORY>` is the directory that contains the ZLUDA driver (`libcuda.so`
and various other libraries). It will be `target/release` if you built from
source, or `zluda` if you downloaded one of the release packages. `<ZLUDA_DIRECTORY>/trace`
contains tracing shims for `libcuda.so` (zluda_trace) and other CUDA libraries.

> [!NOTE]
> `LD_LIBRARY_PATH` is an environment variable used by `ld.so`, Linux's dynamic linker. It tells `ld.so` that when it's looking for a shared library – for example, `libcuda.so` – it should first look in a specific list of directories, before system paths. It's just like `PATH`, but for shared libraries instead of executables.

#### `ZLUDA_CUDA_LIB=<ZLUDA_DIRECTORY>/libcuda.so`

By default, zluda_trace will log all calls and then redirect them to an actual
CUDA driver (`libcuda.so`). In order to use ZLUDA, `ZLUDA_CUDA_LIB` must be set
to the `libcuda.so` provided by ZLUDA. If `ZLUDA_CUDA_LIB` is not set,
zluda_trace will use NVIDIA’s `libcuda.so`.

#### `ZLUDA_LOG_DIR=<LOG_DIRECTORY>`

By default, zluda_trace prints logs to stderr. In order to save them to a
file, as well as save other useful information, you must provide a
directory that they should be saved in – for example, `/tmp/zluda`.

## Understanding the zluda_trace output

Let's look at the zluda_trace output for a simple application. Here's a
CUDA program that adds two numbers on the GPU:

```cpp,linenos
#include <iostream>

__global__ void add(int a, int b, int *out) {
    *out = a + b;
}

int main() {
    int *result;
    cudaMallocManaged(&result, sizeof(int));
    add<<<1, 1>>>(1, 2, result);
    cudaDeviceSynchronize();
    std::cout << "result: " << *result << std::endl;
    cudaFree(result);
    return 0;
}
```

I've saved this file as `add.cu`. ZLUDA doesn't successfully run this
application yet, so I'll compile it and run it using zluda_trace and CUDA
in order to demonstrate all of zluda_trace's features.

```bash
nvcc add.cu -o add -arch sm_80
LD_LIBRARY_PATH=~/ZLUDA/target/release/trace/ ZLUDA_LOG_DIR=/tmp/zluda ./add
```

The last few lines should look something like:

```
[ZLUDA_TRACE] cuCtxSynchronize() -> CUDA_SUCCESS
result: 3
[ZLUDA_TRACE] {CONTEXT_LOCAL_STORAGE_INTERFACE_V0301}::context_local_storage_get(value: 0x562c764a73c0, cu_ctx: 0x0, key: 0x562c764ba130) -> CUDA_SUCCESS
[ZLUDA_TRACE] cuMemFree_v2(dptr: 0x7f3ca2000000) -> CUDA_SUCCESS
[ZLUDA_TRACE] {CONTEXT_LOCAL_STORAGE_INTERFACE_V0301}::context_local_storage_delete(context: 0x562c764ba760, key: 0x562c764ba130) -> CUDA_ERROR_DEINITIALIZED
[ZLUDA_TRACE] cuLibraryUnload(library: 0x562c773ffb10) -> CUDA_ERROR_DEINITIALIZED
[ZLUDA_TRACE] cuDevicePrimaryCtxRelease(dev: 0) -> CUDA_ERROR_DEINITIALIZED
```

Now, let's take a look at our log directory:

```bash
ls /tmp/zluda
add
```

zluda_trace creates a new directory for each run, based on the name of
the command. If the `add` directory already existed, it'd create an `add_1`
directory, and so on. Next, let's look at that newly-created directory:

```bash
ls /tmp/zluda/add/
log.txt module_0001_01.elf module_0001_02.ptx
```

Let's take a look at each of these files.

### log.txt

```bash
#no_wrap
cat /tmp/zluda/add/log.txt
# ...
# cuModuleGetFunction(hfunc: 0x55ee94d645d0, hmod: 0x55ee94d63c40, name: "_Z3addiiPi") -> CUDA_SUCCESS
# cuLaunchKernel(f: 0x55ee94d645d0, gridDimX: 1, gridDimY: 1, gridDimZ: 1, blockDimX: 1, blockDimY: 1, blockDimZ: 1, sharedMemBytes: 0, hStream: 0x0, kernelParams: 0x7fffe0fa193c, extra: NULL) -> CUDA_SUCCESS
# {CONTEXT_LOCAL_STORAGE_INTERFACE_V0301}::context_local_storage_get(value: 0x55ee93e083c0, cu_ctx: 0x0, key: 0x55ee93e1b130) -> CUDA_SUCCESS
# cuCtxSynchronize() -> CUDA_SUCCESS
# {CONTEXT_LOCAL_STORAGE_INTERFACE_V0301}::context_local_storage_get(value: 0x55ee93e083c0, cu_ctx: 0x0, key: 0x55ee93e1b130) -> CUDA_SUCCESS
# cuMemFree_v2(dptr: 0x7fbde6000000) -> CUDA_SUCCESS
# {CONTEXT_LOCAL_STORAGE_INTERFACE_V0301}::context_local_storage_delete(context: 0x55ee93e1b760, key: 0x55ee93e1b130) -> CUDA_ERROR_DEINITIALIZED
# cuLibraryUnload(library: 0x55ee94d60ae0) -> CUDA_ERROR_DEINITIALIZED
# cuDevicePrimaryCtxRelease(dev: 0) -> CUDA_ERROR_DEINITIALIZED
```

As you can see, this is the same log that was written to stderr. It
records each call made to a CUDA library, the arguments it was passed,
and the status code returned. Most of these will be calls that you can
find in the NVIDIA documentation – for example,
[`cuModuleGetFunction`](https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1ga52be009b0d4045811b30c965e1cb2cf)
– but some of them aren't publicly documented.

<!-- I've used a zero-width space around :: below. -->
For example, look at the calls to
`{CONTEXT_LOCAL_STORAGE_INTERFACE_V0301}::context_local_storage_get`.
Calls with this format are to what we call NVIDIA's Dark API. We'll
write more documentation for this later, but for now all you need to
know is that these come from function pointer tables returned by
`cuGetExportTable`.

We're looking at a very simple example, so it doesn't use any
performance libraries. If you use zluda_trace for code calling one of
NVIDIA's performance libraries, zluda_trace will log both the call to
that library and all of the calls made by that library call. That
looks like:

```
cublasCreate_v2(handle: 0x55e502373120) -> CUBLAS_STATUS_SUCCESS
cuGetProcAddress_v2(symbol: "", pfn: 0x0, cudaVersion: 0, flags: 0, symbolStatus: NULL) -> CUDA_ERROR_NOT_FOUND
```

Here, the call to `cublasCreate_v2` is making a call to `cuGetProcAddress_v2`.

### module_0001_01.elf

This is precompiled SASS assembly for a single GPU architecture.

### module_0001_02.ptx

This is PTX assembly that is portable across many NVIDIA GPUs.

```bash
cat /tmp/zluda/add/module_0001_02.ptx
# //
# //
# //
# //
# //
# //
#
# .version 8.7
# .target sm_80
# .address_size 64
#
# //
#
# .visible .entry _Z3addiiPi(
#     .param .u32 _Z3addiiPi_param_0,
#     .param .u32 _Z3addiiPi_param_1,
#     .param .u64 _Z3addiiPi_param_2
# )
# {
#     .reg .b32 %r<4>;
#     .reg .b64 %rd<3>;
#
#
#     ld.param.u32 %r1, [_Z3addiiPi_param_0];
#     ld.param.u32 %r2, [_Z3addiiPi_param_1];
#     ld.param.u64 %rd1, [_Z3addiiPi_param_2];
#     cvta.to.global.u64 %rd2, %rd1;
#     add.s32 %r3, %r2, %r1;
#     st.global.u32 [%rd2], %r3;
#     ret;
#
# }
```

This is the `add` function from `add.cu`. `_Z3addiiPi` is `add(int, int, int*)` after [C++ name mangling](https://en.wikipedia.org/wiki/Name_mangling).
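
If you want to demangle such names yourself, `c++filt` (part of binutils) will do it:

```bash
echo _Z3addiiPi | c++filt
# prints: add(int, int, int*)
```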

### Compiler logs

There's one more kind of file zluda_trace might produce: a compiler error
log file. When zluda_trace encounters a PTX module, it tries to compile
it with ZLUDA's PTX compiler. Any errors produced will be saved into a
`module_NNNN_NN.log` file. For example, it might look like this:

```
Unrecognized statement "nanosleep.u32 %r101;"
```

We use this information to discover which PTX instructions are used by the
application and not supported by ZLUDA.