-
Notifications
You must be signed in to change notification settings - Fork 5.3k
JIT: Use GT_NULLCHECK for all unused indirections on XARC #123314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This PR optimizes unused indirections on XARCH by transforming them into null checks regardless of whether the address is contained. This matches the behavior on ARM64 and improves codegen by avoiding unnecessary register allocations. Fixes dotnet#123302
|
Tagging subscribers to this area: @JulieLeeMSFT, @dotnet/jit-contrib |
…on XArch This fix ensures that the address operand of a GT_NULLCHECK is not marked as contained on XArch during lowering. This is required because GT_NULLCHECK is emitted as a comparison instruction that expects its operand to be in a register, preventing an LSRA assertion failure.
src/coreclr/jit/lower.cpp
Outdated
| { | ||
| if (ind->Addr()->isContained()) | ||
| { | ||
| ind->Addr()->clearContained(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this change will just lead to an extra instruction to populate address of the nullcheck to a separate register and will lead to a massive size regression in SPMI diffs (you will see it once the code is fixed, e.g. clearContained -> ClearContained, but likely will require more changes as well).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback @EgorBo! You're right about ClearContained potentially increasing the code size by forcing an extra register allocation.
My goal was to avoid the assertion failure when converting unused GT_IND to GT_NULLCHECK on x64, as it requires the address to be un-contained in some cases.
Regarding the SPMI diffs, I'll fix the typo to ClearContained and check the results. However, do you think we should only perform this transformation if the address is already not contained? Or is there a way to emit the GT_NULLCHECK while keeping the address contained on XARCH?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or is there a way to emit the GT_NULLCHECK while keeping the address contained on XARCH
the way GT_NULLCHECK is implemented today - it requires a non-contained operand. It probably can support contained if you implement it in genCodeForNullCheck on all targets (where you want to keep it).
I wonder if we can eliminate the GT_NULLCHECK node entirely (at least for post-Lower phases), but that will require some LSRA work
- Avoid converting unused GT_IND to GT_NULLCHECK on XARCH if the address is contained to prevent code size regression. - Fix typo: clearContained -> ClearContained. - Ensure the indirection remains a GT_IND with UnusedValue set when nullcheck conversion is skipped.
This change optimizes unused GT_IND/GT_BLK nodes by converting them to GT_NULLCHECK when the value is not required. Specifically on XARCH, this prevents the emitter from generating unnecessary register loads and sign-extension instructions (such as movsx). By using a null check, we can emit a simpler cmp or test instruction that doesn't require a target register, thereby reducing register pressure and decreasing code size. Key changes: Conversion logic added to Lowering::TransformUnusedIndirection. Ensures ClearContained() is called on the address to meet GT_NULLCHECK requirements. Clears DontExtend flag on XARCH to avoid stale extension constraints. Fixes inefficient codegen reported in dotnet#123302.
This PR optimizes unused indirections on XARCH by transforming them into null checks regardless of whether the address is contained. This matches the behavior on ARM64 and improves codegen by avoiding unnecessary register allocations.
Fixes #123302