-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move autoreporting out of the temperature ISR #3180
Conversation
Code running in the temperature ISR needs to be fully reentrant, which is hard to track down. Move autoreporting to the main processing loop. This can make the autoreporting slower or pause at times, but removes the reentrant restriction, which allows us to use printf_P.
This makes autoreport work more consistently.
This avoids the "busy" output interleaving with regular output in very rare scenarios. We should focus in finding which calls are not using manage_inactivity() properly instead of working it around.
With the last commit, we also move the busy-protocol processing out of the temperature ISR. The call itself is safe but it can cause legitimate output to be broken if the ISR happens at the wrong time. The temperature ISR should ideally only manage temperature. |
The longpress function is currently called within the temperature ISR, which is bogus. Calling the longpress function at the wrong moment can corrupt the menu buffers. Move the call to the main loop by changing the logic slightly: - still sample the lcd buttons inside the temperature ISR, which keeps scrollong/pressing responsive, but... - set a flag to indicate that longpress has been triggered instead of calling the function directly - call the function on the next manage_inactivity run Combined with prusa3d#3180 this removes _most_ unsafe operations out of the ISR which can happen during a normal run (max/mintemp warnings are still an exception).
* Redraw "Preheating to load" in full when modified by other actions Check for lcd_draw_update to see if the lcd has been altered outside the function and redraw the screen if full. This fixes scenarios such as prusa3d#3129 where the "Card removed" message or the SD menu is entered from outside the function's control. This requires checking/resetting bFilamentWaitingFlag carefully to avoid re-entering this function _twice_ (raise_z_above will run the main loop until complete). * Fix the eeprom address documentation * Farm workaround M1 message Farmers want to abuse a bug from the previous firmware releases - they need to see the filename on the status screen instead of "Wait for user..." So we won't update the message in farm mode... * Allow D2 to be enabled selectively * Fix D2 to read the entire SRAM content Allow to read up to 0x21ff, which is the last byte of SRAM. Set default starting address to 0x200, which is the first true byte. 0x0-200 is mapped to register/io space and could cause issues when read from bi-directional registers. * Unify D2 and D3 handling Handle reading/writing memory using the same base functions in order to save space. * Implement reading XFLASH with D6 This requires expanding the dcode_core address type to 32bit type, thus enlarges the D2/D3 implementation as a result. Still allow to save all the original space if D6 is disabled, for now. * Remove PROGMEM handling from print_mem until D5 uses dcode_core Handling PROGMEM also requires a 32bit address type. * D[236]: remove "busy" messages while dumping, avoid WDT * D6: also hide declaration behind conditional * D6: add documentation * Move "xflash" include inside the conditional * Introduce "xflash_layout" to organize XFLASH's content Update the language code to use the new LANG_OFFSET definition and remove hard-coded flash sizes. * xflash: remove some duplication * xflash: add xflash_multipage_program and documentation Add a new high-level command to perform multipage writes quickly. * xfdump: implement dump-to-xflash functionality Update xflash_layout to include information about the dump, which sits at the end of xflash. * lang/fw-build.sh: check for language data size during build Ensure the language data always fits the reserved space in the XFLASH. The script *should* use the LANG_SIZE definition from "xflash_layout", which can be obtained by preprocessing the source code. At the moment though this step has been omitted since running arduino-builder to preprocess the source requires extra flags passed by build.sh. The size has been hard-coded (and it's unlikely to change given the content size is constant for the architecture). * Dcodes: add D20/D21/D22 to generate/read/clear dumps * Implement EMERGENCY_DUMP for offline analysis If EMERGENCY_DUMP is defined, crash and dump using the new xflash dump functionality instead of just continuing with an error message. When an emergency crash is stored, the first restart after a crash displays a message that debug data is available and to contact support to submit the crash for analysis. * Implement MENU_DUMP: offline memory dump from "Support" If MENU_DUMP is enabled, a new entry at the end of the "Support" menu is added that allows to dump memory for offline use. This allows to trigger a memory dump at any moment during regular usage (either idling or printing) and to recover the dump later even after a hardware reset. * Typo * xfdump: fix build with XFLASH_DUMP disabled * lang/fw-build.sh: fix padding calculation * config: add sanity checks for XFLASH_DUMP options * xfdump: reuse standard definitions for SRAM size/offset * Dcodes: fix daddr_t type when only XFLASH_DUMP is enabled * Document new applicable build options in the variant files Document, but don't enable them. Leave exiting functionality unchanged for now. * xfdump: defensive static checks to ensure dump location always fits * xfdump: fix size check * dcode_code: fix inverted define to print larger types * xfdump: fix another off-by-one static size check * xfdump_layout: add some comments * D6: remove option for unsupported models * Fix millis reference * WDR crash initial * Fix XFLASH_DUMP print_mem * Dump header as well * xfdump_full_dump_and_reset: set a guaranteed minimum WDT Just prior to dumping, reset the WDT to a known-safe (and not too long) interval that guarantees a complete dump. * Enable the "WDR reset" menu item in DEBUG_BUILD only * Move "WDR dump" inside EMERGENGENCY_DUMP * Rename dump_crash_source to dump_crash_reason * xfdump: report to the host that a dump is available As suggested by @3d-gussner, announce to the host that a dump is available for retrieval using an action "dump_available". Any kind of dump is announced (even if manually triggered). To avoid reading from xflash twice, remove some duplication and return the crash reason directly in xfdump_check_state(). * xfdump_erase: remove redundant XFLASH_SPI_ENTER() * Introduce STACK_GUARD_MARGIN in all variants Create a gap between the BSS and the stack guard. Set this gap (STACK_GUARD_MARGIN) to 32 bytes in all variants. The gap serves two purposes: - Detect a stack overflow earlier (falsely triggering in overtight situations is OK!), so that we can hopefully avoid smashing the heap and have a clean view during the dump. - Reserve spack space itself for the stack dumping machinery, which is going to grow the stack even further. Remove get_stack_guard_test_value() which was unused. * Fix usage of RAMEND RAMEND is the last valid address, not one-past as I expected it to be... * Implement an online crash dumper for MK2.5 boards When XFLASH is not available, allow users to request _online_ crash dumps by using D23 (since these require active user cooperation). Once enabled, instead of just rebooting, dump memory directly to the serial. As similarly done with EMERGENCY_DUMP, we have two features that can be enabled: EMERGENCY_SERIAL_DUMP: enables dumping on crash after being requested MENU_SERIAL_DUMP: allow triggering the same manually through the support menu. * Report crash also in MK2.5, fix stack_error abuse Rename EEPROM_CRASH_ACKNOWLEDGED to EEPROM_FW_CRASH_FLAG. Use EEPROM_FW_CRASH_FLAG to always set the last crash reason, which simplifies handling between the online/offline variants. Make stack_error safe, by setting the flag and restarting immediately, so that the error can be shown after restart. * Untangle a bit some recursive include mess * xflash_dump is now always required in all variants * Move stack checking to the temperature ISR Now that the stack_error function is truly minimal, we can check for stack errors much more frequently. Also move away stack_error from ultralcd to Marlin_main. * Move inclusion closer to the usage point * serial_dump_and_reset: do not completely disable WDT Set it to 8s which is long enough to complete the dump. * serial_dump_and_reset: turn on print fan while dumping To avoid scorching the sheet while dumping close to the bed. * Enable debugging features on all variants - XFlash crash dumper on MK3+ series - Online crash dumper on MK2.5+ series - D2/D6 on MK3+ series - D2 on MK2.5+ series * bad ISR catch * serial_dump: add description about bad_isr * Remove duplication in crash handlers It's kind of nice that all handlers eventually came to become the same. * emergency handlers: always save SP _at_ the crash location Save SP which is closest to the crash location, which simplifies debugging. For serial_dump, write SP just before the dump. For xfdump, save SP in the dump header. This makes xfdump_dump and xfdump_full_dump_and_reset() equivalent for stack debugging. * serial_dump: manipulate WDT just once * serial_dump_and_reset: do not call manage_heater with interrupts disabled Do not call manage_heater() in print_mem() if interrupts are already disabled. This means we're running inside the crash handler. * Add test code for the stack overflow handler * Fix last commit * xfdump: simplify stack debugging (sample pc+sp) Instead of having to guess the PC where the SP was sampled, always take both. This allows "seamless" stack decoding for both serial and xflash dumps, since we don't have to guess which function generated the dump. Make the core functions (doing the sampling) be ``noinline`` as well, so that they always have valid frame. * Handle Long-Press in the main loop The longpress function is currently called within the temperature ISR, which is bogus. Calling the longpress function at the wrong moment can corrupt the menu buffers. Move the call to the main loop by changing the logic slightly: - still sample the lcd buttons inside the temperature ISR, which keeps scrollong/pressing responsive, but... - set a flag to indicate that longpress has been triggered instead of calling the function directly - call the function on the next manage_inactivity run Combined with prusa3d#3180 this removes _most_ unsafe operations out of the ISR which can happen during a normal run (max/mintemp warnings are still an exception). * Add some warnings in lcd_buttons_update * GETPC: Do not manipulate the 32bit return address We can do that offline, saving over 30 bytes of instructions. * serial_dump: include hex prefix * Remove ignored/incorrect PROGMEM This PROGMEM is currently ignored by gcc, but even if it wasn't it wouldn't be correct since the following code is expecting to read "item" without fetching the array itself from PROGMEM. * Move autoreporting out of the temperature ISR Code running in the temperature ISR needs to be fully reentrant, which is hard to track down. Move autoreporting to the main processing loop. This can make the autoreporting slower or pause at times, but removes the reentrant restriction, which allows us to use printf_P. * Move host_autoreport() to manage_inactivity() This makes autoreport work more consistently. * Also move host_keepalive to manage_inactivity() This avoids the "busy" output interleaving with regular output in very rare scenarios. We should focus in finding which calls are not using manage_inactivity() properly instead of working it around. * Remove unnecessary assignment * Fix unused static declaration warnings Guard declarations using the appropriate defines * Fix misleading indentation warnings by expanding tabs * Clarify statement by adding extra braces * Silence explicit case-fallthru * Fix two new explicit case fallthru warnings * Fixed C++ bug * Use LCD_WIDTH instead of hardcoding 20 * optiboot_xflash comment about w25x20cl messages Mention supported ICs * Second attempt at retrieving the SN from the 32u2 IC * Minor fixes to SD presence handling (prusa3d#3139) * Remove forgotten function protorypes * Fix code indentation * Fix double sorting if SD card is inserted during setup() * Correctly handle SD removal during sorting * Fix spanish translation for MSG_UNLOAD_SUCCESSFUL (prusa3d#3185) Fix spanish transtalation for MSG_UNLOAD_SUCCESSFUL by: jfestrada <[email protected]> * Remove "bonus" exclamation points from the crash message * Fixup the DUMP_MAGIC constant * Use uint8_t consistently for the block buffer's index Instead of using a mixture of int8_t, unsigned char and (incorrectly) int, use uint8_t consistently for indexing the current block. This improves the performance of the wait loop in plan_buffer_line, which currently expands all comparisons to a word for no reason. This also extends the theoretical limit to 128 entries. Add some static assertions to ensure BLOCK_BUFFER_SIZE is correct. * Remove FW version parsing as it can be done at compile time. Code size dropped by >800 bytes. * xfdump: correctly erase all sectors in xfdump_erase * Remove useless function EEPROM_read_st * Work-around GCC LTO codegen bug in process_commands() When building with GCC 4.9.2 (bundled with PF-build-env-1.0.6.*), -Os and LTO enabled, PID_autotune gets automatically inlined into process_commands(). Sadly, due to the massive size of process_commands(), it results in codegen bug doing a partial stack overwrite in process_commands() itself, manifesting as random behavior depending on the timing of interrupts and the codepath taken inside the merged function. Mark the function as noinline and add a note about the affected compiler version in order to be checked again in the future. * write_command() no line number handling * Include "Dcodes.h" after "Marlin.h" for configuration This is needed in order to get the function prototypes right according to the actual enabled configuration. * Improve/fix D23 for M2.5/S printers - Move D23 into it's own function inside Dcodes - Correctly include a break in the switch statement - Show the dumper status (enabled/disabled) after toggling - Allow to generate an immediate dump via g-code using D23 E for symmetry with D20 E * Correctly read FW_VERSION_NR array from progmem In PR prusa3d#3093 the progmem array FW_VERSION_NR was introduced to store the version components, however the code didn't read it properly using the pgm_read_* functions, making version comparisons fail. Fix the existing/unused is_provided_version_newer() and reuse it in show_upgrade_dialog_if_version_newer(). Similarly also read/update correctly the version in the eeprom. * change boolean to bool * Change nrFiles from int16_t to uint16_t * Correct the C implementation for MultiU16X8toH16 The comment behind the ASM MultiU16X8toH16 was misleading. It actually computes ((a<<8)*b)>>16, or (a*b)>>8. Correct the comment and C reference implementation accordingly. * Use fabs() instead of abs() when using floats This saves 514 bytes of flash memory * Print an error on unknown D-codes This follows the same convention of M/G codes, so that the user knowns that the D-code has been either handled or ignored. * Remove spourious trailing whitespace in errors * Consolidate "Unknown X-Code" to save 16 bytes * Remove unused Sound_Save() function declaration * Remove unused function lcd_choose_color() * Improve mc_arc() parameters - Make the mc_arc() function declaration consistent with the definition - isclockwise is supposed to be bool type, given how it is used. * * Remove redundant externs already included with temperature.h * Add ifdefs in Dcodes.cpp when using extern variables * Remove useless extern in cmdqueue.cpp * lcd_change_fil_state has two extern's in Marlin.h, only one needed. * Remove redundant extern variable is_usb_printing from tmc2130.cpp This extern variable is included from Marlin.h * Remove redundant extern variable lcd_encoder from menu.cpp This extern variable is included from lcd.h Co-authored-by: Yuri D'Elia <[email protected]> Co-authored-by: Panayiotis-git <[email protected]> Co-authored-by: D.R.racer <[email protected]> Co-authored-by: DRracer <[email protected]> Co-authored-by: Voinea Dragos <[email protected]> Co-authored-by: Yuri D'Elia <[email protected]> Co-authored-by: Jonas Meyer <[email protected]> Co-authored-by: metacollin <[email protected]> Co-authored-by: jfestrada <[email protected]> Co-authored-by: Guðni Már Gilbert <[email protected]>
This code is responsible for my printer timing out in octoprint while using MMU2S, its not sending the keep-alive messages anymore like busy: processing and octoprint times out.
and after that point octoprint sends |
@knobik Thank you for reporting. Can you please upload an identical scenario as in the log above, but from serial.log of octoprint so that it has timestamps? Please also upload the gcode if possible. I assume version 3.10.1, MK3S with MMU2S (fw 1.0.6). Is this correct? |
@leptun yes that is correct. |
...
Because the printer is actually executing N16 instead of N17 (what octoprint thinks is happening), octoprint times out the N17 line because it gets no busy messages back, but rather heatup messages as is normal for M190. What plugins are you using? What are your octoprint serial settings (timeouts/behaviours/etc...) |
@leptun plugins: nothing special, just basic stuff: for timeouts, standard octoprint setup: same for behaviors, octoprint defaults: Problem started when i installed MMU2S and upgraded my i3 to 3.10.1 |
Are you sure this is related to fw 3.10.1? To me it looks like a bug in octoprint. Can you try downgrading the firmware to see if anything improves. I for one doubt anything will change |
@leptun might be related but its old: https://community.octoprint.org/t/prusa-mmu-t-not-working-via-octoprint/4726/3 |
@leptun shouldnt there be a |
For the record, yes, And OctoPrint sending |
Code running in the temperature ISR needs to be fully reentrant, which
is hard to track down.
Move autoreporting to the main processing loop. This can make the
autoreporting slower or pause at times, but removes the reentrant
restriction, which allows us to use printf_P.