Thread hangup issue during system operation
When memory usage on a Compute (host) server exceeds 99% (including cache) and I/O is generated on that Compute node,
the system I/O CPU usage of the VMs on that Compute node rises to 100% and the VMs hang,
and the system log (messages) shows kernel: BUG: soft lockup - CPU#2 stuck for 23s!
---------------------------------------------------------------------------------
Analysis found a large number of soft lockups, most of them occurring in the khugepaged task.
The issue appears likely to be related to resource overcommit on the hypervisor.
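The per-task tally behind this finding can be reproduced from the messages files. The following is a minimal sketch, assuming the kernel message format shown in the log excerpts of this post; the function name count_soft_lockups is hypothetical and not part of any existing tool.

```python
import re
from collections import Counter

# Matches the kernel message format seen in this report, for example:
#   BUG: soft lockup - CPU#2 stuck for 23s! [khugepaged:92]
# Groups: CPU number, stuck seconds, task name, PID.
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^\]:]+):(\d+)\]"
)


def count_soft_lockups(lines):
    """Return a Counter mapping task name to number of soft lockup messages."""
    counts = Counter()
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match:
            counts[match.group(3)] += 1
    return counts
```

Passing an open messages file (for example open("/var/log/messages", errors="replace")) to count_soft_lockups and printing most_common() on the result lists the tasks by lockup count, which makes it easy to see whether khugepaged dominates.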
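To resolve the problem:
- Disable THP so that the affected code path no longer runs, and also check whether resource usage on the hypervisor is at an appropriate level
- The running kernel version is fairly old, so consider upgrading the system to a recent kernel and packages with improved performance
- The abrt service was observed running continuously and an impact on performance cannot be ruled out, so also consider stopping the abrt service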
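Before and after changing the THP setting it helps to confirm which mode is active. The sketch below assumes the sysfs path used by RHEL 7 kernels, /sys/kernel/mm/transparent_hugepage/enabled, where the active mode is the one shown in square brackets; the helper names are hypothetical.

```python
import re

# sysfs file that reports the THP mode on RHEL 7 kernels (assumed path).
# Its content looks like "[always] madvise never"; brackets mark the active mode.
THP_ENABLED = "/sys/kernel/mm/transparent_hugepage/enabled"


def active_thp_mode(text):
    """Return the bracketed (active) mode from the sysfs file content."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError("no active THP mode found in: %r" % text)
    return match.group(1)


def read_thp_mode(path=THP_ENABLED):
    """Read the sysfs file and return the active THP mode."""
    with open(path) as f:
        return active_thp_mode(f.read())
```

A result of never means THP is off. Writing never to the same file as root turns THP off only until the next reboot; for a persistent setting, follow the procedure in reference [1].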
Related documents
[1] How to disable transparent hugepages (THP) on Red Hat Enterprise Linux 7
https://access.redhat.com/solutions/1320153
[2] Kernel panics due to soft lockup. It is part of a Openstack "compute node", running under KVM as its hypervisor
https://access.redhat.com/solutions/2137691
Log contents
---------------------------------------------------------------------------------
The khugepaged task was delayed for more than 20 seconds during a hugepage operation, which triggered the soft lockup.
[2219243.590785] BUG: soft lockup - CPU#6 stuck for 22s! [khugepaged:92]
[2219243.591882] Modules linked in: fuse btrfs zlib_deflate raid6_pq xor msdos ext4 mbcache jbd2 xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter binfmt_misc vfat fat virtio_balloon pcspkr crc32_pclmul ghash_clmulni_intel ppdev aesni_intel i2c_piix4 lrw gf128mul glue_helper ablk_helper parport_pc parport cryptd nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk virtio_net cirrus syscopyarea sysfillrect sysimgblt drm_kms_helper ttm ata_piix crct10dif_pclmul crct10dif_common drm crc32c_intel virtio_pci i2c_core serio_raw virtio_ring libata virtio floppy dm_mirror
[2219243.591939] dm_region_hash dm_log dm_mod
[2219243.591943] CPU: 6 PID: 92 Comm: khugepaged Tainted: G L ------------ 3.10.0-327.28.2.el7.x86_64 #1
[2219243.591944] Hardware name: Fedora Project OpenStack Nova, BIOS 0.5.1 01/01/2011
[2219243.591946] task: ffff880791178b80 ti: ffff8807911dc000 task.ti: ffff8807911dc000
[2219243.591947] RIP: 0010:[<ffffffff81300755>] [<ffffffff81300755>] copy_page_rep+0x5/0x10
[2219243.591954] RSP: 0018:ffff8807911dfd90 EFLAGS: 00010206
[2219243.591955] RAX: 000000042657b000 RBX: 000000006717b000 RCX: 0000000000000200
[2219243.591956] RDX: ffff880000000000 RSI: ffff8805b4fc8000 RDI: ffff88042657b000
[2219243.591957] RBP: ffff8807911dfe40 R08: 0000000000000048 R09: ffff8807bfb84500
[2219243.591958] R10: 000000000000009c R11: 0000000000000000 R12: 0000000010995ec0
[2219243.591959] R13: ffffea0016d3f200 R14: ffff8807911dc000 R15: ffff8800a5e53bd8
[2219243.591960] FS: 0000000000000000(0000) GS:ffff8807a1300000(0000) knlGS:0000000000000000
[2219243.591961] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2219243.591962] CR2: 0000000085bbf006 CR3: 000000000194a000 CR4: 00000000001406e0
[2219243.591966] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2219243.591967] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[2219243.591968] Stack:
[2219243.591969] ffffffff811ca53d 0000000067000000 00000000a5e53067 0000000067000000
[2219243.591973] ffff8807957497a8 ffffea0002971fb0 ffff8807911dfe60 ffff8800a5c7e9c0
[2219243.591976] ffff8807911dffd8 ffff8807911dffd8 ffff880035d96ab8 ffff880793c74870
[2219243.591979] Call Trace:
[2219243.591986] [<ffffffff811ca53d>] ? khugepaged_scan_mm_slot+0x9fd/0xc40
[2219243.591990] [<ffffffff811ca9d7>] khugepaged+0x257/0x480
[2219243.591994] [<ffffffff810a6b20>] ? wake_up_atomic_t+0x30/0x30
[2219243.591996] [<ffffffff811ca780>] ? khugepaged_scan_mm_slot+0xc40/0xc40
[2219243.592000] [<ffffffff810a5b2f>] kthread+0xcf/0xe0
[2219243.592002] [<ffffffff810a5a60>] ? kthread_create_on_node+0x140/0x140
[2219243.592006] [<ffffffff81646b98>] ret_from_fork+0x58/0x90
[2219243.592009] [<ffffffff810a5a60>] ? kthread_create_on_node+0x140/0x140
[2219243.592009] Code: b7 90 90 90 90 90 9c fa 65 48 3b 06 75 14 65 48 3b 56 08 75 0d 65 48 89 1e 65 48 89 4e 08 9d b0 01 c3 9d 30 c0 c3 b9 00 02 00 00 <f3> 48 a5 c3 0f 1f 80 00 00 00 00 eb ee 0f 1f 84 00 00 00 00 00
var/log/messages-20190923:Sep 21 12:52:22 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#2 stuck for 23s! [khugepaged:92]
var/log/messages-20190923:Sep 21 12:55:01 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#7 stuck for 22s! [sopagt:17813]
var/log/messages-20190923:Sep 21 12:55:02 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#7 stuck for 22s! [sopagt:17813]
...
var/log/messages-20190923:Sep 23 02:35:27 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#3 stuck for 45s! [cpd:25642]
var/log/messages-20190923:Sep 23 02:35:54 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#4 stuck for 23s! [khugepaged:92]
var/log/messages-20190923:Sep 23 03:00:42 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#2 stuck for 22s! [khugepaged:92]
var/log/messages:Sep 23 10:27:39 csgn-2-cp-3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
var/log/messages:Sep 23 10:27:39 csgn-2-cp-3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
var/log/messages:Sep 23 10:27:39 csgn-2-cp-3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
var/log/messages:Sep 23 10:27:39 csgn-2-cp-3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
var/log/messages:Sep 23 10:27:39 csgn-2-cp-3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
var/log/messages:Sep 23 10:27:39 csgn-2-cp-3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
var/log/messages:Sep 23 10:27:39 csgn-2-cp-3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
var/log/messages:Sep 23 10:27:39 csgn-2-cp-3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
var/log/messages:Sep 23 10:27:39 csgn-2-cp-3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
var/log/messages:Sep 23 10:27:39 csgn-2-cp-3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
var/log/messages:Sep 23 11:04:42 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#3 stuck for 22s! [khugepaged:92]
var/log/messages:Sep 23 11:05:22 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#1 stuck for 22s! [khugepaged:92]
var/log/messages:Sep 23 11:07:58 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#5 stuck for 23s! [khugepaged:92]
var/log/messages:Sep 23 11:23:38 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#3 stuck for 22s! [khugepaged:92]
var/log/messages:Sep 23 11:25:50 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#0 stuck for 22s! [khugepaged:92]
var/log/messages:Sep 23 11:45:31 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#6 stuck for 22s! [khugepaged:92]
var/log/messages:Sep 25 13:30:10 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#2 stuck for 23s! [supv:12188]
Sep 25 13:30:36 csgn-2-cp-3 sh: abrt-dump-oops: Found oopses: 1
Sep 25 13:30:36 csgn-2-cp-3 sh: abrt-dump-oops: Creating problem directories
Sep 25 13:30:36 csgn-2-cp-3 sh: abrt-dump-oops: Not going to make dump directories world readable because PrivateReports is on
Sep 25 13:30:37 csgn-2-cp-3 abrt-dump-oops: Reported 1 kernel oopses to Abrt
Sep 25 13:31:32 csgn-2-cp-3 python: detected unhandled Python exception in '/usr/sbin/sosreport'
Sep 25 13:31:36 csgn-2-cp-3 abrt-server: Lock file '/var/spool/abrt/post-create.lock' is locked by process 22904
Sep 25 13:31:37 csgn-2-cp-3 abrt-server: Lock file '/var/spool/abrt/post-create.lock' is locked by process 22904
Sep 25 13:31:38 csgn-2-cp-3 abrt-server: Lock file '/var/spool/abrt/post-create.lock' is locked by process 22904
DMIDECODE
BIOS:
Vend: Seabios Vers: 0.5.1 Date: 01/01/2011 BIOS Rev: 1.0 FW Rev:
Mfr: Fedora Project Prod: OpenStack Nova Vers: 12.0.4-1.el7
CPU:
10 of 10 CPU sockets populated, 0 cores/0 threads per CPU
10 Intel Core Processor (Haswell) (flags: aes,constant_tsc,lm,nx,pae,rdrand)
Memory:
Total: 30720 MiB (30 GiB)
OS
Hostname: csgn-2-cp-3
Distro: [redhat-release] Red Hat Enterprise Linux Server release 7.2 (Maipo)
Kernel:
Booted kernel: 3.10.0-327.28.2.el7.x86_64
Booted kernel cmdline:
root=/dev/mapper/rhel-root ro crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet LANG=en_US.UTF-8
Taint-check: 16384 (see https://access.redhat.com/solutions/40594)
14 SOFTLOCKUP: A soft lockup has previously occurred
- - - - - - - - - - - - - - - - - - -
Sys time: Wed Sep 25 13:34:48 KST 2019
Boot time: Wed Aug 28 10:18:26 UTC 2019 (epoch: 1566987506)
Uptime: 27 days, 18:16, 7 users
LoadAvg: [10 CPU] 10.06 (101%), 10.44 (104%), 13.30 (133%)
KDUMP CONFIG
kexec-tools rpm version:
kexec-tools-2.0.7-38.el7_2.1.x86_64
Service enablement:
UNIT STATE
kdump.service enabled
kdump initrd/initramfs:
18318863 Aug 20 2018 initramfs-3.10.0-327.28.2.el7.x86_64kdump.img
18211727 Aug 11 2016 initramfs-3.10.0-327.el7.x86_64kdump.img
Memory reservation config:
/proc/cmdline { crashkernel=auto }
GRUB default { crashkernel=auto }
Actual memory reservation per /proc/iomem:
2a000000-341fffff : Crash kernel
kdump.conf:
path /var/crash
core_collector makedumpfile -l --message-level 1 -d 31
kdump.conf "path" available space:
System MemTotal (uncompressed core size) { 29.29 GiB }
Available free space on target path's fs { 38.31 GiB } (fs=/)
Panic sysctls:
kernel.sysrq [bitmask] = "16" (see proc man page)
kernel.panic [secs] = 0 (no autoreboot on panic)
kernel.hung_task_panic = 0
kernel.panic_on_oops = 1
kernel.panic_on_io_nmi = 0
kernel.panic_on_unrecovered_nmi = 0
kernel.panic_on_stackoverflow = 0
kernel.softlockup_panic = 0
kernel.unknown_nmi_panic = 0
kernel.nmi_watchdog = 1
vm.panic_on_oom [0-2] = 0 (no panic)
MEMORY
Stats graphed as percent of MemTotal:
MemUsed ▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊.................................. 31.4%
Buffers .................................................. 0.1%
Cached ▊▊▊▊▊▊▊........................................... 13.6%
HugePages .................................................. 0.0%
Dirty .................................................. 0.3%
RAM:
29.3 GiB total ram
9.2 GiB (31%) used
5.2 GiB (18%) used excluding Buffers/Cached
0.08 GiB (0%) dirty
HugePages:
No ram pre-allocated to HugePages
THP:
260096 kB allocated to THP
LowMem/Slab/PageTables/Shmem:
0.57 GiB (2%) of total ram used for Slab
0.02 GiB (0%) of total ram used for PageTables
0.16 GiB (1%) of total ram used for Shmem
Swap:
0 GiB (0%) used of 1 GiB total
SYSCTLS
kernel.
hostname = "csgn-2-cp-3"
osrelease = "3.10.0-327.28.2.el7.x86_64"
tainted = "16384" (see https://access.redhat.com/solutions/40594)
14 SOFTLOCKUP: A soft lockup has previously occurred
hung_task_panic [bool] = "0"
hung_task_timeout_secs = "120" (secs task must be D-state to trigger)
hung_task_warnings [num_warnings] = "0" (warnings disabled, either intentionally or after original num_warnings reached)
vm.
dirty_ratio = "30" (% of total system memory)
dirty_background_ratio = "10" (% of total system memory)
dirty_expire_centisecs = "3000"
dirty_writeback_centisecs = "500"
max_map_count = "65530"
min_free_kbytes = "67584"
swappiness [0-100] = "30"
vfs_cache_pressure [0-100] = "100"