Recovering from a Cold Update Fault

The cold update fault recovery function is added to hinicadm3. This function is disabled by default and is recommended for development and debugging.

  1. Enable the cold update fault recovery function when updating the user firmware. Use either of the following methods:
    hinicadm3 updatefw -i hinic0 -f Hinic3_flash.bin -a hot -t npu -sn  # Method 1
    hinicadm3 updatefw -i hinic0 -f Hinic3_flash.bin -a cold -n  # Method 2

    If information similar to the following is displayed, the update is successful:

    Please do not remove driver or network device.
    Loading...
    Firmware update start: 2025-05-24 00:40:52
    [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] [100%][\]
    Firmware update finish:2025-05-24 00:41:32
    Firmware update time used: 40s
    Loading firmware image succeed.
    Set update active cfg succeed!
    Please reboot OS to take firmware effect.
  2. Collect the mpu_ram log.
    The function is enabled successfully if the log contains the following information:
    Set recovery enable success.
  3. If a fault (such as an SSH connection failure or the firmware device not being found) occurs after the cold update, keep the device powered on for 12 minutes after the fault first occurs, then power cycle the device three times. On the fourth boot, the system automatically rolls back to the firmware in the backup region.
    • If the ping to the target device times out, the SSH connection may fail.
    • If the following information is displayed, the firmware device is not found.
      [root@localhost~]# hinicadm3 version -i hinic0
      Device name(hinic0) not exist.
      version command error(-6):Unknown device hinic0.
  4. Disable the cold update fault recovery function.
    hinicadm3 recovery -i hinic0 -s disable

    The function is disabled successfully if the following information is displayed:

    [root@localhost~]# hinicadm3 recovery -i hinic0 -s disable
    fw restore is disabled successfully.
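
The output checks in steps 1 and 3 can be scripted. The following is a minimal sketch and not part of hinicadm3: the function names update_succeeded and device_missing are hypothetical, and the matched strings are copied from the sample output above, so they may differ between tool versions.

```shell
#!/bin/sh
# Hypothetical helpers. They only inspect text that hinicadm3 has already
# printed; they do not run hinicadm3 themselves.

# update_succeeded: reads "hinicadm3 updatefw" output on stdin and succeeds
# only if both success messages from the sample output in step 1 are present.
update_succeeded() {
    out=$(cat)
    printf '%s\n' "$out" | grep -q 'Loading firmware image succeed' &&
        printf '%s\n' "$out" | grep -q 'Set update active cfg succeed'
}

# device_missing: reads "hinicadm3 version -i <device>" output on stdin and
# succeeds if it reports that the device does not exist (sample in step 3).
device_missing() {
    grep -q 'not exist'
}
```

For example, assuming the functions above are loaded in the current shell:

    hinicadm3 updatefw -i hinic0 -f Hinic3_flash.bin -a cold -n | update_succeeded && echo "update ok"
    hinicadm3 version -i hinic0 2>&1 | device_missing && echo "firmware device not found"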

If the device is still not recognized after the fourth boot (that is, after three power cycles), uninstall the current SDK driver by running rpm -e <driver-package-name>. Then repeat the power cycle procedure described in step 3 (three power cycles) and reinstall the SDK driver after the fourth boot.
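
The recovery flow described in step 3 and in the paragraph above can be summarized as a small decision helper. This is only an illustrative sketch: next_action and its return strings are hypothetical names, the boot count is assumed to start at 1 for the first boot after the cold update, and the helper does not model the 12-minute wait before the first power cycle.

```shell
#!/bin/sh
# Hypothetical decision helper; it runs no hinicadm3 or rpm commands itself.
# $1: number of boots since the cold update (first boot after the update = 1)
# $2: "present" or "missing", for example judged from the output of
#     "hinicadm3 version -i hinic0"
next_action() {
    boots=$1
    state=$2
    if [ "$state" = "present" ]; then
        # Device is recognized: nothing to recover.
        echo "none"
    elif [ "$boots" -lt 4 ]; then
        # Rollback is expected on the fourth boot, that is, after three power cycles.
        echo "power-cycle"
    else
        # Still missing after the fourth boot: uninstall the SDK driver,
        # then repeat the power cycle procedure.
        echo "uninstall-driver"
    fi
}
```

For example, next_action 2 missing prints power-cycle, and next_action 4 missing prints uninstall-driver.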