This implements an 8-bit front stencil buffer. Stencil operations are
SIMD optimized. LibGL changes include:
* New `glStencilMask` and `glStencilMaskSeparate` functions
* New context parameter `GL_STENCIL_CLEAR_VALUE`
Xash3D requires this otherwise it will crash with a jump to nullptr,
but it doesn't use it for anything interesting and just sets a default
value of ~0 during initialization.