Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARM - dealing with large programs / global arrays - invalid constant (XXXX) after fixup #619

Open
tfaoliveira opened this issue Oct 16, 2023 · 6 comments · May be fixed by #921
Open

ARM - dealing with large programs / global arrays - invalid constant (XXXX) after fixup #619

tfaoliveira opened this issue Oct 16, 2023 · 6 comments · May be fixed by #921
Labels
arm Related to the arm port question

Comments

@tfaoliveira
Copy link
Member

I found the following issue while developing SHA256 for ARM. It isn't an issue right now for that particular implementation, but I will share my notes here as they might be of use and to see if there is a different way of handling these cases or if the language/compiler needs improvement.

Consider the following program:

param int L = 2042; // works for 2041, fails for 2042
u32[1] H = {1};

export fn main() -> reg u32
{
  inline int i;
  reg ptr u32[1] Hp;
  reg u32 t v;

  Hp = H;
  t = Hp[0];

  v = 0;
  for i=0 to L
  { v += t; }

  return v;
}

We have a global array H, from which we get the address into Hp and then use it once. Between Hp = H; and the glob_data: section, there are 1 load, 1 move, 2042 add instructions, and a pop instruction. When I try to run arm-linux-gnueabi-gcc -g -static -mcpu=cortex-m4 -o example example.s, where example.s contains the result of compiling the previous program, I get:

globals.s: Assembler messages:
globals.s:8: Error: invalid constant (101c) after fixup

The assembly instruction corresponding to Hp = H; is ADR r0, glob_data+0. For L = 2041, the program compiles and when observing the result of objdump one can see:

f60f 70fc       addw    r0, pc, #4092   ; 0xffc

Accordingly to p.197 of this manual it seems that the T2 encoding is used, and we get 12 bits (4095) to use (I compared the bits from f60f 70fc with the ones from the pdf).

Going back to L = 2042. The reported value by the compiler is 0x101c, corresponding to 4124 in decimal. The calculation goes as follows, if I'm not mistaken: 4 bytes for Hp = H;, 6 bytes for t = Hp[0]; v=0;, 4084 bytes corresponding to the for loop (2 bytes per instruction), and 2 bytes for the last pop. Total of 4096 bytes (not considering the assembly code before Hp = H;)

Due to the .p2align 5 directive before glob_data:, this seems to put glob_data at a distance of 4124 from the ADR instruction. Hence, this constant does not fit possible encodings for the instruction.


Questions:

  • Is it necessary to align global constants to 32 bytes? (I believe doing smaller alignments does not make a difference for this case, but it might help in other cases).
  • How to deal with large programs / large sets of global variables/arrays in ARM? (I imagine that branching to a code section closer to the data section could work, but I'm unsure if this is the best solution for this problem).

Some final notes regarding the SHA256 implementation:

  • The initial ARM implementation of SHA256 was obtained by copy-pasting the amd64 implementation with some minor adjustments (large implementation; previously described problem observed);
  • Since having a small implementation was the final goal, I wrote some temporary functions to initialize the corresponding constants directly into a stack array to have a working implementation;
  • Refactored the ARM implementation to be "small", which made the previous error disappear. Nonetheless, this issue may appear in future implementations.
@tfaoliveira tfaoliveira added question arm Related to the arm port labels Oct 16, 2023
@bgregoir
Copy link
Contributor

Hi Tiago,
We were aware of this problem. I remember discussing this with Vincent.
Unfortunately I don't remember the possible solution, maybe Vincent will remember them.
For the 32 bytes alignement, it is not necessary it was only a way to simplify the proof compiler.

@tfaoliveira
Copy link
Member Author

While I was developing this code, I tried something similar to the following to see if it would work:

param int L = 2042;
u32[1] H = {1};

fn _get_Hp() -> reg ptr u32[1]
{
  reg ptr u32[1] Hp;
  Hp = H;
  return Hp;
}

export fn test() -> reg u32
{
  inline int i;
  reg ptr u32[1] Hp;
  reg u32 t v;

  Hp = _get_Hp();
  t = Hp[0];

  v = 0;
  for i=0 to L
  { v += t; }

  return v;
}

The compiler did complain (as expected... but hope is the last to die):

typing error: Hp should be one of the paramaters

Providing Hp (unitialized...) did not work as well:

Fatal error: exception File "src/intervalGraphColoring.ml", line 43, characters 6-12: Assertion failed

The idea:

  • when starting to translate the mentioned code, from amd64, all internal functions were inlined;
  • this means that it was almost certain that a local function that would get the pointer would be next to the glob_data: section, hence, the ADR would work there;

Questions:

  • would it be possible for the compiler to support the example code in this message or something else that achieves the same goal? (I think I prefer something else, like a specific intrinsic; otherwise, I would imagine that such functions would need an annotation to instruct the compiler to place them in the right section of the code).

@bgregoir
Copy link
Contributor

I not sure that what I say is correct, but I think a systematic way of doing this kind of access, will be to replace
ADR r (.glob_data + i)
by
.L
.word (.glob_data + i)
ADR r .L
I am not sure this make sense ...

@sarranz
Copy link
Collaborator

sarranz commented Sep 12, 2024

I believe it should be

.L: .word (.glob_data + i)
LDR r, .L

otherwise we're taking the address of .L

@sarranz sarranz linked a pull request Oct 2, 2024 that will close this issue
@eponier
Copy link
Contributor

eponier commented Nov 7, 2024

I tried to explore a bit the solutions to this issue, and here are my findings.

First note that we have the problem both with LDR and ADR, depending on whether we want to access directly a constant or get its address.

When looking on the web, 2 solutions were mentioned:

  • placing the global closer to where it is loaded, potentially using .ltor to tell the assembler to create a "literal pool" and put some literals there
  • using movt/mow to compute it in two instructions

I found it strange that we cannot delegate the task of choosing the right assembly instruction to the assembler, just like we do with RISC-V and its pseudo-instructions. It turns out that we can:

  • there is a pseudo-instruction MOV32 that is understood only by ARM tool
  • there is a pseudo-instruction LDR r0, =constant that is understood only by GCC

Maybe the right solution is to rely on the pseudo-instructions, but that means we need to be able to produce different outputs depending on the toolchain used (ARM tool, GCC, maybe LLVM).

@eponier
Copy link
Contributor

eponier commented Nov 7, 2024

@sarranz What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arm Related to the arm port question
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants