Skip to content
This repository has been archived by the owner on Jan 30, 2024. It is now read-only.

Fix unwinding #383

Merged
merged 14 commits into from
Mar 15, 2023
Merged

Fix unwinding #383

merged 14 commits into from
Mar 15, 2023

Conversation

Urhengulas
Copy link
Member

@Urhengulas Urhengulas commented Feb 17, 2023

This PR "fixes" unwinding, by not relying on the sentinel stack frame anymore, but instead checking if unwinding ends in the Reset function. If unwinding ends in the Reset function, it is considered NOT corrupted; if it ends somewhere else, it is considered corrupted.

Fixes #382

Todo:

  • Is the Reset symbol always present? -> No, "an unmangled symbol named Reset is even less common than main"
  • Does any thumb bit need to be cleared? -> no
  • Test it with an corrupted stack -> not trivially possible from Rust

@Urhengulas Urhengulas requested a review from japaric February 17, 2023 16:29
@Urhengulas Urhengulas marked this pull request as ready for review February 17, 2023 16:55
@japaric
Copy link
Member

japaric commented Feb 21, 2023

an unmangled symbol named Reset is even less common than main (which at least is common in C programs). neither symbol is specified in the ABI of the cortex-m-rt crate so relying on either could lead to breakage in the future ("future" being any new patch version, i.e. 0.7.x).

on the other hand, _stack_start is part of the ABI of the cortex-m-rt crate so this idea may work at least for cortex-m-rt v0.7.x but the ABI could change in 0.8.0.

another way to figure out the address of the reset handler is to read the second entry in the vector table. this document states:

On system reset, the vector table is fixed at address 0x00000000.

thought I'm not sure that is specified in the Cortex-M ISA. I know STM32 devices use 0x2000_0000 as the start address of Flash memory so firmware put the vector table at that address, instead of at 0x0000_0000. It could be that the Flash memory at 0x2000_0000 is aliased at address 0x0000_0000 but I have not checked.

yet another way to get the address of the reset handler is to issue a reset-halt and read out the program counter (PC) register. I don't have a link to normative documentation but the Cortex-M ISA states that the program counter is set to the second entry of the vector table (the address of the reset handler) after a system reset. I think this runtime approach would be the sure way to get the address. after a reset-halt you could read the SP register and that would give you _stack_start as well.

@Urhengulas
Copy link
Member Author

Thank you for the answer @japaric. A few clarification questions:

  1. Are you suggesting doing both the _stack_start idea James and checking if we end up in the reset handler? Or are you suggesting that we do one or the other?
  2. Obtaining the reset handler during runtime sounds good. But as far as I understand it only gives me the address of the reset handler, not the size. In order to check if we end up inside of it I need both (as far as I understand). How can I get the size?

🥂

@Urhengulas
Copy link
Member Author

I've switched to James proposal of ending the unwinding depending on the _stack_start and that seems to work as well (see 3e7fe2d).

Atm I extract the stack start by reading the SP after a reset (as you've proposed). It seems to me that _stack_start is the same as the start of the active_ram_region we extract from the ELF. Is this just a coincidence, or always the case? If they are same we can just do one and not both.

@japaric
Copy link
Member

japaric commented Feb 23, 2023

Are you suggesting doing both the _stack_start idea James and checking if we end up in the reset handler? Or are you suggesting that we do one or the other?

I think it'd be good to check the stack pointer (SP) value against _stack_start during each step of the unwinding process. if the SP value ever goes higher than _stack_start then either the stack is corrupted or there's a bug in the unwinding logic; in either case we should stop the unwinding process.

Thinking more about it, I think it may be better not to use _stack_start == SP as the stop condition. reason being, the reset handler could adjust the stack pointer (e.g. to make it multiple of 4 or 8 bytes so that the next stack frame adheres to e.g. the AAPCS ABI) before jumping into the next function. the reset handler itself usually does not follow any ABI (as it's written in assembly) and could lack both a function prologue and enough CFI (call frame information) to let the unwinder compute the value of SP at the start of the reset handler so it may not be possible (for the unwinder) to compute that SP was equal to _stack_start at some point in time.

But as far as I understand it only gives me the address of the reset handler, not the size. In order to check if we end up inside of it I need both (as far as I understand)

correct.

. How can I get the size?

you should be able to iterate the symbol table to find the size (the API may be called symbols or symbol_table depending on whether you are using the xmas_elf or object library). each symbol entry has an associated start address and size (+) so you can compute the address range each symbol spans (start .. (start+size)). you can then use reset_handler_address_span.contains(current_pc) as the stop condition.

(+) should look bit like nm -CSn if you print the data

$ # first column is start address; second is size; numbers are in hexadecimal format
$ arm-none-eabi-nm -CSn some-elf
(..)
00000100 00000028 T Reset
(..)
0000013c 0000000a T main
00000148 00000028 t hello::__cortex_m_rt_main
00000170 0000000a T app::exit

you asked earlier:

Does any thumb bit need to be cleared?

you can pretty much always clear the thumb bit (bit 0) when comparing addresses. the start address of a function (as seen in the objdump output) must always be an even number; the value of PC will always be an even number.

values that eventually end up in an instructions like bl r0 ("branch into function whose address is stored in register R0") have to have the thumb bit set. as those values (effectively function pointer values) are stored in static variables, they are usually stored with the bit already set to avoid setting it at runtime. when you read the second entry of the vector table (either at runtime or from the rodata stored in the ELF) you'll get an odd number.

@Urhengulas
Copy link
Member Author

@japaric said:

Thinking more about it, I think it may be better not to use _stack_start == SP as the stop condition.

What then? That unwinding ends in the reset handler? This is what I implemented in 9f40539.

If this is what you mean we would have:

  • a successful stop if unwinding ends (!cfa_changed && !program_counter_changed) in the reset handler
  • unsuccessful stop if
    • unwinding stops outside of reset handler
    • SP > _stack_start

Correct?


@japaric said:

@Urhengulas said:

How can I get the size?

you should be able to iterate the symbol table to find the size

Didn't you say earlier that we cannot rely on the symbol table? I talking about following quote.

@japaric said:

an unmangled symbol named Reset is even less common than main (which at least is common in C programs). neither symbol is specified in the ABI of the cortex-m-rt crate

Or am I misunderstanding something? Can we figure out the mangled name of the symbol?

@Urhengulas Urhengulas requested review from japaric and removed request for japaric March 1, 2023 11:53
@Urhengulas
Copy link
Member Author

@japaric The PR is ready for review again. I think it makes the code oven messier than it already is, and I am working on a follow-up PR to clean things up. But I wouldn't do this in here.

elf: &Elf,
) -> anyhow::Result<(u32, Range<u32>)> {
let mut core = sess.core(0)?;
core.reset_and_halt(Duration::from_secs(5))?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it necessary to do a reset halt here? there's another call to reset_and_halt in main.rs due to Canary::install so this may result in the device being reset twice, which should not be a problem to the operation of probe-run but it's preferable to not do double work.

perhaps the reset_and_halt can be do once before calling this function, then it can also be removed from canary::install

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will address this in the follow-up clean-up PR, since I will change the order of operations there and also touch the canary

@Urhengulas
Copy link
Member Author

bors r=japaric

@bors
Copy link
Contributor

bors bot commented Mar 15, 2023

Build succeeded:

  • ci

@bors bors bot merged commit 8063d9d into main Mar 15, 2023
@bors bors bot deleted the fix-unwinding branch March 15, 2023 13:28
bors bot added a commit that referenced this pull request Mar 15, 2023
385: Update snapshot tests and test stack canary r=Urhengulas a=Urhengulas

This PR does two things:

1. It updates the test_elfs and the respective output.
2. It adds a new test_elf `overflow-no-flip-link` which triggers a stack overflow and does not have stack-overflow protection; therefore `probe-run` reports that data segments might be corrupted

Depends on #383; only the last two commits are specific to this PR.
Fixes #223.

Co-authored-by: Urhengulas <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider removing the need for a "sentinel stack frame" while unwinding
2 participants