Subject: Calling conventions (particularly 32-bit ARM)
From: david.brown (at) *nospam* hesbynett.no (David Brown)
Newsgroups: comp.arch
Date: 06. Jan 2025, 14:57:51
Organization: A noiseless patient Spider
Message-ID: <vlgngv$1ks4a$1@dont-email.me>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0
I'm trying to understand the reasoning behind some of the calling conventions used with 32-bit ARM. I work primarily with small embedded systems, so the efficiency of code on 32-bit Cortex-M devices is very important to me - good calling conventions make a big difference.
No doubt most people here know this already, but in summary these devices are a 32-bit load/store RISC architecture with 16 registers. R0-R3 and R12 are scratch/volatile registers, R4-R11 are preserved registers, R13 is the stack pointer, R14 is the link register and R15 is the program counter. For most Cortex-M cores, there is no superscalar execution, out-of-order execution, speculative execution, etc., but instructions are pipelined.
The big problem I see is the registers used for returning values from functions. R0-R3 can all be used for passing arguments to functions, as 32-bit (or smaller) values, pointers, in pairs as 64-bit values, and as parts of structs.
But the ABI only allows returning a single 32-bit value in R0, or a scalar 64-bit value in R0:R1. If a function returns a non-scalar that is larger than 32 bits, the caller has to allocate space on the stack for the return value and pass a pointer to that space in R0.
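To make the case concrete, here is a minimal sketch of what I mean. The names (`Pair`, `make_pair_struct`, `make_pair_packed`, `packed_lo`, `packed_hi`) are made up for illustration, and the comments describe what I would expect from the return rules summarised above rather than the output of any particular compiler.

```cpp
#include <cstdint>

// Hypothetical struct made of two 32-bit parts (illustration only).
struct Pair {
    std::uint32_t lo;
    std::uint32_t hi;
};

// Returned by value. Under the rules described above, this 8-byte
// non-scalar is returned through memory: the caller reserves stack
// space and passes a hidden pointer to it in R0.
Pair make_pair_struct(std::uint32_t a, std::uint32_t b) {
    return Pair{a, b};
}

// Workaround sketch: pack the same two words into a 64-bit scalar,
// which is returned in the register pair R0:R1.
std::uint64_t make_pair_packed(std::uint32_t a, std::uint32_t b) {
    return (static_cast<std::uint64_t>(b) << 32) | a;
}

// Helpers to unpack the two halves on the caller's side.
inline std::uint32_t packed_lo(std::uint64_t p) {
    return static_cast<std::uint32_t>(p);
}

inline std::uint32_t packed_hi(std::uint64_t p) {
    return static_cast<std::uint32_t>(p >> 32);
}
```

Both functions carry exactly the same information, yet only the packed version gets to use registers for the result. Having to write that sort of thing by hand is precisely what I would like to avoid.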
To my mind, this is massively inefficient, especially when using structs that are made up of two 32-bit parts.
Is there any good reason why the ABI is designed with such limited register usage for returns? Newer ABIs like RISC-V 32-bit and x86_64 can at least use two registers for return values. Modern compilers are quite happy breaking structs into parts in individual registers - it's a /long/ time since they insisted that structs occupied a contiguous block of memory. Can anyone give me an explanation why return types can't simply use all the same registers that are available for argument passing?
I also think code would be a bit more efficient if there were more registers available for parameter passing and as scratch registers - perhaps 6 would make more sense.
In more modern C++ programming, it's very practical to use types like std::optional<>, std::variant<>, std::expected<> and std::tuple<> as a way of dealing safely with status and multiple return values rather than using C-style error codes or passing manual pointers to return value slots. But the limited return registers add significant overhead to small functions.
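As a rough illustration of the kind of small function I have in mind (the names `parse_digit` and `parse_digit_c` are hypothetical, and this is only a sketch), compare a std::optional<> return with the C-style equivalent:

```cpp
#include <cstdint>
#include <optional>

// A std::optional holding a 32-bit value needs room for the value plus
// an "engaged" flag, so it is a non-scalar larger than 32 bits and,
// under the rules described above, is returned through memory.
static_assert(sizeof(std::optional<std::int32_t>) > sizeof(std::int32_t),
              "optional<int32_t> does not fit in a single 32-bit register");

// Modern style: the status and the value travel together in the return type.
std::optional<std::int32_t> parse_digit(char c) {
    if (c >= '0' && c <= '9') {
        return static_cast<std::int32_t>(c - '0');
    }
    return std::nullopt;
}

// C style: status in the return value (R0), result through a manual
// pointer to a return value slot.
bool parse_digit_c(char c, std::int32_t* out) {
    if (c < '0' || c > '9') {
        return false;
    }
    *out = static_cast<std::int32_t>(c - '0');
    return true;
}
```

The first version is the safer interface, but with the current return rules neither version can hand back both the flag and the value purely in registers.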
Are there good technical reasons for the conventions on 32-bit ARM? Or is this all just historical from the days when everything was an "int" and that's all anyone ever returned from functions?
Thanks for any pointers or explanations here.