Subject : Re: Calling conventions (particularly 32-bit ARM)
From : mitchalsup (at) *nospam* aol.com (MitchAlsup1)
Newsgroups : comp.arch
Date : 09. Jan 2025, 01:11:08
Organization : Rocksolid Light
Message-ID : <a318d8af91ca17885939aede2007871c@www.novabbs.org>
References : 1 2 3 4 5
User-Agent : Rocksolid Light
On Wed, 8 Jan 2025 23:20:43 +0000, Stefan Monnier wrote:
ABI calling conventions tend to be designed to support at least C,
including varargs and often also tolerant of differences between the
number of arguments in the caller and callee.
My 66000 ABI was designed for C, but is compatible with Fortran and
C++ {and I suspect most languages--under the assumption that those
languages have to clean up their own messes*}.
(*) C++ has to drop "stuff" on the stack so that it can properly
deallocate new structures when Try-Throw-Catch is performing walk
backs, and to utilize that "stack stuff" when searching for the
right exception block.
When C calls Fortran and Fortran is expecting an array, C has
to build the dope vector used by Fortran in accessing said array.
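To make the dope vector point concrete, here is a minimal sketch in C of
what the caller would have to build. The layout and every name in it
(dope_vector, dope_make_2d, dope_elem_2d, DOPE_MAX_RANK) are hypothetical;
real Fortran compilers each define their own descriptor format, and this
is not the My 66000 ABI's actual layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor layout, for illustration only. */
enum { DOPE_MAX_RANK = 7 };

typedef struct {
    double   *base;                   /* address of the first element */
    int       rank;                   /* number of dimensions in use */
    ptrdiff_t lower[DOPE_MAX_RANK];   /* lower bound of each dimension */
    ptrdiff_t extent[DOPE_MAX_RANK];  /* element count of each dimension */
    ptrdiff_t stride[DOPE_MAX_RANK];  /* step of each dimension, in elements */
} dope_vector;

/* Caller side: describe a contiguous C buffer as a column-major,
   1-based rows x cols array, which is how Fortran would index it. */
static dope_vector dope_make_2d(double *data, ptrdiff_t rows, ptrdiff_t cols)
{
    dope_vector dv = {0};
    dv.base = data;
    dv.rank = 2;
    dv.lower[0] = 1; dv.extent[0] = rows; dv.stride[0] = 1;
    dv.lower[1] = 1; dv.extent[1] = cols; dv.stride[1] = rows;
    return dv;
}

/* Callee side: the address computation behind a reference to A(i,j). */
static double *dope_elem_2d(const dope_vector *dv, ptrdiff_t i, ptrdiff_t j)
{
    assert(dv->rank == 2);
    assert(i >= dv->lower[0] && i < dv->lower[0] + dv->extent[0]);
    assert(j >= dv->lower[1] && j < dv->lower[1] + dv->extent[1]);
    return dv->base + (i - dv->lower[0]) * dv->stride[0]
                    + (j - dv->lower[1]) * dv->stride[1];
}
```

The C caller would fill in one such descriptor per array argument and
pass its address, so the callee never needs to know how the storage was
allocated.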
Any calling convention is pressed on both sides--more argument registers
and more callee-save registers--but the number of registers is fixed.
I can agree that it's important to support those use-cases (varargs
obviously, mismatched arg numbers less so), but I think the focus of
optimization of the ABI should be calls to functions known to take the
exact same number of arguments (after all, even in C we normally know
the prototype of the called function; only sloppy ancient C calls
functions without proper declarations), even if it comes at the cost of
using different calling conventions for the two cases.
In My 66000 ABI a varargs subroutine takes one more Prologue
instruction than a non-varargs subroutine and creates a vector of
doubleword (DW) arguments which can be picked off with
va_list = SP; va_start = 0,
and va_arg(va_list,arg) = LD Rd,[va_list,Rarg<<3];
One of the key reasons to have a unified register model.
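The varargs scheme described above can be modelled in portable C. This
is only a sketch under stated assumptions: the names dw_va_list,
dw_va_start, dw_va_arg and sum_dw_args are made up for illustration, the
count field exists only so the sketch can check bounds, and a plain
array stands in for the argument vector the prologue would create at SP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: every argument occupies one 8-byte doubleword
   in a contiguous vector that starts at SP. */
typedef struct {
    const uint64_t *base;   /* va_list = SP */
    size_t          count;  /* slots available; bookkeeping for this sketch only */
    size_t          next;   /* index of the next slot to read */
} dw_va_list;

/* va_start = 0: begin reading at the first slot. */
static dw_va_list dw_va_start(const uint64_t *sp, size_t count)
{
    dw_va_list ap = { sp, count, 0 };
    return ap;
}

/* Models LD Rd,[va_list,Rarg<<3]: one indexed load. Indexing a
   uint64_t array scales the index by 8 bytes, like the <<3 does. */
static uint64_t dw_va_arg(dw_va_list *ap)
{
    assert(ap->next < ap->count);
    return ap->base[ap->next++];
}

/* Example consumer: add up every doubleword argument. */
static uint64_t sum_dw_args(const uint64_t *sp, size_t count)
{
    dw_va_list ap = dw_va_start(sp, count);
    uint64_t total = 0;
    while (ap.next < ap.count)
        total += dw_va_arg(&ap);
    return total;
}
```

Because every slot has the same size and lives in one vector, fetching
the next argument is a single load with no per-type bookkeeping, which
is where a unified register model pays off.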
But in any case, I suspect there are also diminishing returns at some
point: how much faster is it in practice to pass/return 13 values in
registers instead of 8 of them in registers and the remaining 5 on
the stack?
Back when we looked at this in the mid-1990s, using more registers for
arguments (than the 8 we were using) was "well down" the list of
low-hanging fruit.