Fix unwinding #383

Urhengulas · 2023-02-17T16:26:44Z

This PR "fixes" unwinding, by not relying on the sentinel stack frame anymore, but instead checking if unwinding ends in the Reset function. If unwinding ends in the Reset function, it is considered NOT corrupted; if it ends somewhere else, it is considered corrupted.

Fixes #382

Todo:

❌ ~~Is the Reset symbol always present?~~ -> No, "an unmangled symbol named Reset is even less common than main"
❌ ~~Does any thumb bit need to be cleared?~~ -> no
❌ ~~Test it with an corrupted stack~~ -> not trivially possible from Rust

japaric · 2023-02-21T17:20:20Z

an unmangled symbol named Reset is even less common than main (which at least is common in C programs). neither symbol is specified in the ABI of the cortex-m-rt crate so relying on either could lead to breakage in the future ("future" being any new patch version, i.e. 0.7.x).

on the other hand, _stack_start is part of the ABI of the cortex-m-rt crate so this idea may work at least for cortex-m-rt v0.7.x but the ABI could change in 0.8.0.

another way to figure out the address of the reset handler is to read the second entry in the vector table. this document states:

On system reset, the vector table is fixed at address 0x00000000.

thought I'm not sure that is specified in the Cortex-M ISA. I know STM32 devices use 0x2000_0000 as the start address of Flash memory so firmware put the vector table at that address, instead of at 0x0000_0000. It could be that the Flash memory at 0x2000_0000 is aliased at address 0x0000_0000 but I have not checked.

yet another way to get the address of the reset handler is to issue a reset-halt and read out the program counter (PC) register. I don't have a link to normative documentation but the Cortex-M ISA states that the program counter is set to the second entry of the vector table (the address of the reset handler) after a system reset. I think this runtime approach would be the sure way to get the address. after a reset-halt you could read the SP register and that would give you _stack_start as well.

Urhengulas · 2023-02-22T13:11:42Z

Thank you for the answer @japaric. A few clarification questions:

Are you suggesting doing both the _stack_start idea James and checking if we end up in the reset handler? Or are you suggesting that we do one or the other?
Obtaining the reset handler during runtime sounds good. But as far as I understand it only gives me the address of the reset handler, not the size. In order to check if we end up inside of it I need both (as far as I understand). How can I get the size?

🥂

Urhengulas · 2023-02-22T14:57:03Z

I've switched to James proposal of ending the unwinding depending on the _stack_start and that seems to work as well (see 3e7fe2d).

Atm I extract the stack start by reading the SP after a reset (as you've proposed). It seems to me that _stack_start is the same as the start of the active_ram_region we extract from the ELF. Is this just a coincidence, or always the case? If they are same we can just do one and not both.

japaric · 2023-02-23T15:09:37Z

Are you suggesting doing both the _stack_start idea James and checking if we end up in the reset handler? Or are you suggesting that we do one or the other?

I think it'd be good to check the stack pointer (SP) value against _stack_start during each step of the unwinding process. if the SP value ever goes higher than _stack_start then either the stack is corrupted or there's a bug in the unwinding logic; in either case we should stop the unwinding process.

Thinking more about it, I think it may be better not to use _stack_start == SP as the stop condition. reason being, the reset handler could adjust the stack pointer (e.g. to make it multiple of 4 or 8 bytes so that the next stack frame adheres to e.g. the AAPCS ABI) before jumping into the next function. the reset handler itself usually does not follow any ABI (as it's written in assembly) and could lack both a function prologue and enough CFI (call frame information) to let the unwinder compute the value of SP at the start of the reset handler so it may not be possible (for the unwinder) to compute that SP was equal to _stack_start at some point in time.

But as far as I understand it only gives me the address of the reset handler, not the size. In order to check if we end up inside of it I need both (as far as I understand)

correct.

. How can I get the size?

you should be able to iterate the symbol table to find the size (the API may be called symbols or symbol_table depending on whether you are using the xmas_elf or object library). each symbol entry has an associated start address and size (+) so you can compute the address range each symbol spans (start .. (start+size)). you can then use reset_handler_address_span.contains(current_pc) as the stop condition.

(+) should look bit like nm -CSn if you print the data

$ # first column is start address; second is size; numbers are in hexadecimal format
$ arm-none-eabi-nm -CSn some-elf
(..)
00000100 00000028 T Reset
(..)
0000013c 0000000a T main
00000148 00000028 t hello::__cortex_m_rt_main
00000170 0000000a T app::exit

you asked earlier:

Does any thumb bit need to be cleared?

you can pretty much always clear the thumb bit (bit 0) when comparing addresses. the start address of a function (as seen in the objdump output) must always be an even number; the value of PC will always be an even number.

values that eventually end up in an instructions like bl r0 ("branch into function whose address is stored in register R0") have to have the thumb bit set. as those values (effectively function pointer values) are stored in static variables, they are usually stored with the bit already set to avoid setting it at runtime. when you read the second entry of the vector table (either at runtime or from the rodata stored in the ELF) you'll get an odd number.

Urhengulas · 2023-02-27T15:09:36Z

@japaric said:

Thinking more about it, I think it may be better not to use _stack_start == SP as the stop condition.

What then? That unwinding ends in the reset handler? This is what I implemented in 9f40539.

If this is what you mean we would have:

a successful stop if unwinding ends (!cfa_changed && !program_counter_changed) in the reset handler
unsuccessful stop if
- unwinding stops outside of reset handler
- SP > _stack_start

Correct?

@japaric said:

@Urhengulas said:

How can I get the size?

you should be able to iterate the symbol table to find the size

Didn't you say earlier that we cannot rely on the symbol table? I talking about following quote.

@japaric said:

an unmangled symbol named Reset is even less common than main (which at least is common in C programs). neither symbol is specified in the ABI of the cortex-m-rt crate

Or am I misunderstanding something? Can we figure out the mangled name of the symbol?

Urhengulas · 2023-03-01T11:54:09Z

@japaric The PR is ready for review again. I think it makes the code oven messier than it already is, and I am working on a follow-up PR to clean things up. But I wouldn't do this in here.

src/main.rs

japaric · 2023-03-03T16:00:43Z

src/main.rs

+    elf: &Elf,
+) -> anyhow::Result<(u32, Range<u32>)> {
+    let mut core = sess.core(0)?;
+    core.reset_and_halt(Duration::from_secs(5))?;


is it necessary to do a reset halt here? there's another call to reset_and_halt in main.rs due to Canary::install so this may result in the device being reset twice, which should not be a problem to the operation of probe-run but it's preferable to not do double work.

perhaps the reset_and_halt can be do once before calling this function, then it can also be removed from canary::install

I will address this in the follow-up clean-up PR, since I will change the order of operations there and also touch the canary

src/backtrace/unwind.rs

Urhengulas · 2023-03-15T13:19:09Z

bors r=japaric

bors · 2023-03-15T13:28:29Z

Build succeeded:

ci

385: Update snapshot tests and test stack canary r=Urhengulas a=Urhengulas This PR does two things: 1. It updates the test_elfs and the respective output. 2. It adds a new test_elf `overflow-no-flip-link` which triggers a stack overflow and does not have stack-overflow protection; therefore `probe-run` reports that data segments might be corrupted Depends on #383; only the last two commits are specific to this PR. Fixes #223. Co-authored-by: Urhengulas <[email protected]>

Urhengulas requested a review from japaric February 17, 2023 16:29

Urhengulas marked this pull request as ready for review February 17, 2023 16:55

Urhengulas force-pushed the fix-unwinding branch from 3a8229b to 171bc07 Compare February 22, 2023 12:21

Urhengulas added 6 commits February 23, 2023 12:31

Extract address and size of the Reset symbol

db0bd93

Do not check for sentinal value during unwinding

c46ab28

Mark stack as intact if unwinding ends in Reset function

9f40539

Update CHANGELOG.md

e232ca8

WIP: end unwinding depending on stack pointer, not reset handler

320b976

Satisfy clippy

48c318a

Urhengulas force-pushed the fix-unwinding branch from 4fbec69 to 48c318a Compare February 23, 2023 11:32

Urhengulas mentioned this pull request Feb 23, 2023

Update snapshot tests and test stack canary #385

Merged

Undo accidental changes

a1d613b

Urhengulas added 5 commits February 27, 2023 16:20

Switch back to checking if we end in Reset handler as stop condition

51719de

Obtain the reset_range from the VTOR

b3b971a

Undo accidental changes

9c495aa

Clean up

c414909

Undo accidental changes

b83bb96

Urhengulas requested review from japaric and removed request for japaric March 1, 2023 11:53

japaric reviewed Mar 3, 2023

View reviewed changes

Urhengulas added 2 commits March 7, 2023 19:30

Assert that we have exactly one reset symbol

f8e3c6a

Remove .unwrap()

caba96f

Urhengulas mentioned this pull request Mar 7, 2023

Unwinding with -C force-frame-pointers=off #387

Open

Urhengulas requested a review from japaric March 7, 2023 18:53

japaric approved these changes Mar 15, 2023

View reviewed changes

bors bot merged commit 8063d9d into main Mar 15, 2023

bors bot deleted the fix-unwinding branch March 15, 2023 13:28

Urhengulas mentioned this pull request Jun 22, 2023

Implement --no-reset mode for live attach to a running device #414

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix unwinding #383

Fix unwinding #383

Urhengulas commented Feb 17, 2023 •

edited

Loading

japaric commented Feb 21, 2023

Urhengulas commented Feb 22, 2023

Urhengulas commented Feb 22, 2023

japaric commented Feb 23, 2023

Urhengulas commented Feb 27, 2023

Urhengulas commented Mar 1, 2023

japaric Mar 3, 2023

Urhengulas Mar 7, 2023

Urhengulas commented Mar 15, 2023

bors bot commented Mar 15, 2023

Fix unwinding #383

Fix unwinding #383

Conversation

Urhengulas commented Feb 17, 2023 • edited Loading

japaric commented Feb 21, 2023

Urhengulas commented Feb 22, 2023

Urhengulas commented Feb 22, 2023

japaric commented Feb 23, 2023

Urhengulas commented Feb 27, 2023

Urhengulas commented Mar 1, 2023

japaric Mar 3, 2023

Choose a reason for hiding this comment

Urhengulas Mar 7, 2023

Choose a reason for hiding this comment

Urhengulas commented Mar 15, 2023

bors bot commented Mar 15, 2023

Urhengulas commented Feb 17, 2023 •

edited

Loading