[Arm64 Linux kernel stability] 2. Kernel crash workflow

* Linux kernel: v6.1

Every time a kernel crash occurs, we may see a kernel crash log like the one below.

[ 4211.686621] <1>Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 4211.686658] <1>Mem abort info:
[ 4211.686675] <1>  ESR = 0x96000045
[ 4211.686693] <1>  EC = 0x25: DABT (current EL), IL = 32 bits
[ 4211.686713] <1>  SET = 0, FnV = 0
[ 4211.686730] <1>  EA = 0, S1PTW = 0
[ 4211.686747] <1>  FSC = 0x05: level 1 translation fault
[ 4211.686765] <1>Data abort info:
[ 4211.686779] <1>  ISV = 0, ISS = 0x00000045
[ 4211.686795] <1>  CM = 0, WnR = 1
[ 4211.686812] <1>user pgtable: 4k pages, 39-bit VAs, pgdp=000000006cf0b000
[ 4211.686833] <1>[0000000000000000] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[ 4211.686887] <0>Internal error: Oops: 96000045 [#1] PREEMPT SMP
...
[ 4211.687380] <4>sp : ffffffc00932bcd0
[ 4211.687390] <4>x29: ffffffc00932bcd0 x28: ffffff805d66dac0 x27: 0000000000000000
[ 4211.687420] <4>x26: 0000000000000000 x25: ffffffd174e01b30 x24: ffffffc00932bde0
[ 4211.687445] <4>x23: 000000000000000a x22: ffffff806ceb3000 x21: ffffffd17506c6c0
[ 4211.687470] <4>x20: ffffffd174e01b70 x19: 0000000000000004 x18: 0000000000000000
[ 4211.687494] <4>x17: 0000000000000000 x16: 0000000000000000 x15: 00000055b19f3330
[ 4211.687518] <4>x14: 0000000000000000 x13: 4e4f495450454358 x12: 45207972746e6520
[ 4211.687543] <4>x11: 7463657269642067 x10: ffffffd175556778 x9 : ffffffd17499d104
[ 4211.687568] <4>x8 : 00000000ffffefff x7 : ffffffd1755ae778 x6 : ffffffd1755ae778
[ 4211.687593] <4>x5 : 0000000000000000 x4 : 0000000000000002 x3 : 0000000000000000
[ 4211.687616] <4>x2 : 0000000000000000 x1 : ffffff805d66dac0 x0 : 0000000000000000
...
[ 4211.687640] <4>Call trace: 
[ 4211.687651] <4> lkdtm_EXCEPTION+0x14/0x1c
[ 4211.687671] <4> direct_entry+0x128/0x1c0
[ 4211.687688] <4> full_proxy_write+0x68/0xbc
[ 4211.687710] <4> vfs_write+0xf8/0x2b0
[ 4211.687730] <4> ksys_write+0x70/0x100
[ 4211.687744] <4> __arm64_sys_write+0x24/0x30
[ 4211.687760] <4> invoke_syscall+0x50/0x120
[ 4211.687780] <4> el0_svc_common.constprop.0+0x68/0x124
[ 4211.687799] <4> do_el0_svc+0x30/0x9c
[ 4211.687816] <4> el0_svc+0x2c/0x90
[ 4211.687833] <4> el0t_64_sync_handler+0xa4/0x130
[ 4211.687850] <4> el0t_64_sync+0x1a0/0x1a4        

After looking into many kernel crashes on Linux systems, I found that the workflow is always the same. The key points are:

(1) When an exception occurs at EL1 (the Linux kernel), the Arm core takes a synchronous exception and records its cause (the exception class) in ESR_EL1[31:26].

(2) The fault handler prints the register set and the stack trace, and then the system is reset.

The detailed workflow of a kernel crash is as follows:

[1]. When an exception occurs in the Linux kernel (EL1), the Arm core does the following:

1.1: It stays at the same exception level (EL1).

1.2: It takes a synchronous exception and branches to the exception vector at VBAR_EL1 + 0x200.

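[2]. From the exception vector, the kernel calls el1h_64_sync, el1h_64_sync_handler, ..., panic (older kernels name the first two el1_sync and el1_sync_handler). Note that ESR_EL1[31:26] holds the cause of the synchronous exception. For an Oops like the one above, die() reaches panic() only if panic_on_oops is set or the fault happened in interrupt context; otherwise the faulting task is killed and the system keeps running.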

[3]. In panic(), the reset routine is executed to reset the system.

What panic() does next depends on the type of kernel image:

[3.1] Engineering image: the system stays in panic mode.

[3.2] Release image: the system is reset.

Hope this post is helpful for debugging.



More articles by Austin Kim

社区洞察

其他会员也浏览了