It became irrelevant in 952dfcd610, when the define was removed; now, the code the comment is referring to is in JitRegister.cpp, and oprofile is controlled by cmake.
My change in 867cd99 caused farcode to fill up much more than before.
Most games still ran fine, but certain games would have multiple
cache clears per minute, on top of being overall slow.
Maybe there's something prettier we can do about this than just
allocating more RAM, but we have the resource budget for making
Dolphin use more RAM, so I consider increasing the size of the
cache to be a good solution at least for the time being.
At least for Prince of Persia: The Forgotten Sands, 48 MB isn't
enough to stop the cache clears, but 64 MB is. (I only played the
game for a few minutes when testing, though.)
Being able to preserve the address register is useful for the
next commit, and W0 is the address register used for loads. Saving
the address register used for stores, W1, was already supported.
If a host register has been newly allocated for the destination
guest register, and the load triggers an exception, we must make
sure to not write the old value in the host register into ppcState.
This commit achieves this by not marking the register as dirty
until after the load is done.
We implement this by first rounding to nearest integer using the current
rouding mode, then converting this value from floating point to an integral
value.
Prefer using eax to isolate the sign bit. This saves a byte when the
destination ends up as r8-15, because those require a REX prefix.
Before:
41 8B C5 mov eax,r13d
41 C1 ED 1F shr r13d,1Fh
44 03 E8 add r13d,eax
41 D1 FD sar r13d,1
After:
41 8B C5 mov eax,r13d
C1 E8 1F shr eax,1Fh
44 03 E8 add r13d,eax
41 D1 FD sar r13d,1
GCC complains about float_emit being null when inlining
ByteswapAfterLoad into MMIOLoadToReg. ByteswapAfterLoad
does dereference float_emit, but only when passing FLAG_FLOAT,
which MMIOLoadToReg has an assert for and does not support.
Also cleaning up some unnecessarily specified namespaces while
I'm at it.
Reusing the slowmem handlers of existing blocks meshes badly
with reusing the empty space left when destroying blocks.
I don't think reusing slowmem handlers was much of a gain anyway.
This is done entirely through interpreter fallbacks. It would
probably be possible to implement this using host exception
handlers instead, but I think it would be a lot of complexity
for a rarely used feature, so let's not do it for now.
For performance reasons, there are two settings for this feature:
One setting which does enables just what True Crime: New York City
needs and one setting which enables it all. The latter makes
almost all float instructions fall back to the interpreter.