Thread Hangup Issue While the System Is Running

Author: jhseol    Posted: 20-10-20 15:33

When the memory usage of a Compute (host) server was above 99% (including cache) and I/O was generated on that Compute node,
the system I/O CPU usage of the VMs running on it climbed to 100% and the VMs froze.
The system log, /var/log/messages, showed "kernel: BUG: soft lockup - CPU#2 stuck for 23s!".
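The trigger condition above is host memory usage over 99% counting page cache. A minimal Python sketch for checking that figure from /proc/meminfo follows. The definitions are assumptions chosen to roughly match the RAM lines of the report further down: "used including cache" = MemTotal - MemFree, and "used excluding cache" additionally subtracts Buffers and Cached. The 99% alert level only echoes the number in this post, and the function names are hypothetical.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def usage_percent(fields):
    """Return (% used including cache, % used excluding Buffers/Cached)."""
    total = fields["MemTotal"]
    used_incl_cache = total - fields["MemFree"]
    used_excl_cache = used_incl_cache - fields.get("Buffers", 0) - fields.get("Cached", 0)
    return 100.0 * used_incl_cache / total, 100.0 * used_excl_cache / total


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        incl, excl = usage_percent(parse_meminfo(f.read()))
    print(f"used incl. cache: {incl:.1f}%  excl. buffers/cache: {excl:.1f}%")
    if incl > 99.0:  # threshold taken from the symptom described above
        print("warning: host memory usage (including cache) is above 99%")
```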

---------------------------------------------------------------------------------
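Analysis found multiple soft lockups, most of them occurring in the khugepaged task.
The problem appears to be possibly related to resource overcommit on the hypervisor.

To resolve the problem:
- Disable THP (Transparent Huge Pages) so that the khugepaged code path no longer runs, and check whether resource usage on the hypervisor is at an appropriate level.
- The running kernel version is fairly old, so consider upgrading the system to a newer kernel and packages with performance improvements.
- The abrt service was observed running continuously, and a performance impact cannot be ruled out, so consider stopping the abrt service as well.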
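The THP remedy can be checked and applied at runtime through sysfs. The sketch below assumes the RHEL 7 paths /sys/kernel/mm/transparent_hugepage/enabled and .../defrag, where the active value is shown in brackets (e.g. "always madvise [never]"). Writing requires root and does not survive a reboot; document [1] covers making the setting persistent. Setting "enabled" to never also stops khugepaged from scanning. The helper names are hypothetical.

```python
import os

# Assumed RHEL 7 sysfs location for THP controls.
THP_DIR = "/sys/kernel/mm/transparent_hugepage"


def current_mode(sysfs_text):
    """Return the bracketed (active) value, e.g. 'always [madvise] never' -> 'madvise'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {sysfs_text!r}")


def read_mode(name, thp_dir=THP_DIR):
    """Read the active mode of THP control file `name` ('enabled' or 'defrag')."""
    with open(os.path.join(thp_dir, name)) as f:
        return current_mode(f.read())


def disable_thp(thp_dir=THP_DIR):
    """Write 'never' to both control files (root only, not persistent)."""
    for name in ("enabled", "defrag"):
        with open(os.path.join(thp_dir, name), "w") as f:
            f.write("never")


if __name__ == "__main__" and os.path.isdir(THP_DIR):
    for name in ("enabled", "defrag"):
        print(name, "=", read_mode(name))
```

Reading is kept separate from disable_thp() so the current state can be inspected on a production host before anything is changed.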

Related documents
[1] How to disable transparent hugepages (THP) on Red Hat Enterprise Linux 7
    https://access.redhat.com/solutions/1320153

[2] Kernel panics due to soft lockup. It is part of a Openstack "compute node", running under KVM as its hypervisor
    https://access.redhat.com/solutions/2137691

Log excerpts
---------------------------------------------------------------------------------
The khugepaged task was delayed for more than 20 seconds while processing huge pages, causing a soft lockup.

[2219243.590785] BUG: soft lockup - CPU#6 stuck for 22s! [khugepaged:92]
[2219243.591882] Modules linked in: fuse btrfs zlib_deflate raid6_pq xor msdos ext4 mbcache jbd2 xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter binfmt_misc vfat fat virtio_balloon pcspkr crc32_pclmul ghash_clmulni_intel ppdev aesni_intel i2c_piix4 lrw gf128mul glue_helper ablk_helper parport_pc parport cryptd nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk virtio_net cirrus syscopyarea sysfillrect sysimgblt drm_kms_helper ttm ata_piix crct10dif_pclmul crct10dif_common drm crc32c_intel virtio_pci i2c_core serio_raw virtio_ring libata virtio floppy dm_mirror
[2219243.591939]  dm_region_hash dm_log dm_mod
[2219243.591943] CPU: 6 PID: 92 Comm: khugepaged Tainted: G            L ------------  3.10.0-327.28.2.el7.x86_64 #1
[2219243.591944] Hardware name: Fedora Project OpenStack Nova, BIOS 0.5.1 01/01/2011
[2219243.591946] task: ffff880791178b80 ti: ffff8807911dc000 task.ti: ffff8807911dc000
[2219243.591947] RIP: 0010:[<ffffffff81300755>]  [<ffffffff81300755>] copy_page_rep+0x5/0x10
[2219243.591954] RSP: 0018:ffff8807911dfd90  EFLAGS: 00010206
[2219243.591955] RAX: 000000042657b000 RBX: 000000006717b000 RCX: 0000000000000200
[2219243.591956] RDX: ffff880000000000 RSI: ffff8805b4fc8000 RDI: ffff88042657b000
[2219243.591957] RBP: ffff8807911dfe40 R08: 0000000000000048 R09: ffff8807bfb84500
[2219243.591958] R10: 000000000000009c R11: 0000000000000000 R12: 0000000010995ec0
[2219243.591959] R13: ffffea0016d3f200 R14: ffff8807911dc000 R15: ffff8800a5e53bd8
[2219243.591960] FS:  0000000000000000(0000) GS:ffff8807a1300000(0000) knlGS:0000000000000000
[2219243.591961] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2219243.591962] CR2: 0000000085bbf006 CR3: 000000000194a000 CR4: 00000000001406e0
[2219243.591966] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2219243.591967] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[2219243.591968] Stack:
[2219243.591969]  ffffffff811ca53d 0000000067000000 00000000a5e53067 0000000067000000
[2219243.591973]  ffff8807957497a8 ffffea0002971fb0 ffff8807911dfe60 ffff8800a5c7e9c0
[2219243.591976]  ffff8807911dffd8 ffff8807911dffd8 ffff880035d96ab8 ffff880793c74870
[2219243.591979] Call Trace:
[2219243.591986]  [<ffffffff811ca53d>] ? khugepaged_scan_mm_slot+0x9fd/0xc40
[2219243.591990]  [<ffffffff811ca9d7>] khugepaged+0x257/0x480
[2219243.591994]  [<ffffffff810a6b20>] ? wake_up_atomic_t+0x30/0x30
[2219243.591996]  [<ffffffff811ca780>] ? khugepaged_scan_mm_slot+0xc40/0xc40
[2219243.592000]  [<ffffffff810a5b2f>] kthread+0xcf/0xe0
[2219243.592002]  [<ffffffff810a5a60>] ? kthread_create_on_node+0x140/0x140
[2219243.592006]  [<ffffffff81646b98>] ret_from_fork+0x58/0x90
[2219243.592009]  [<ffffffff810a5a60>] ? kthread_create_on_node+0x140/0x140
[2219243.592009] Code: b7 90 90 90 90 90 9c fa 65 48 3b 06 75 14 65 48 3b 56 08 75 0d 65 48 89 1e 65 48 89 4e 08 9d b0 01 c3 9d 30 c0 c3 b9 00 02 00 00 <f3> 48 a5 c3 0f 1f 80 00 00 00 00 eb ee 0f 1f 84 00 00 00 00 00

var/log/messages-20190923:Sep 21 12:52:22 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#2 stuck for 23s! [khugepaged:92]
var/log/messages-20190923:Sep 21 12:55:01 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#7 stuck for 22s! [sopagt:17813]
var/log/messages-20190923:Sep 21 12:55:02 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#7 stuck for 22s! [sopagt:17813]
...
var/log/messages-20190923:Sep 23 02:35:27 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#3 stuck for 45s! [cpd:25642]
var/log/messages-20190923:Sep 23 02:35:54 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#4 stuck for 23s! [khugepaged:92]
var/log/messages-20190923:Sep 23 03:00:42 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#2 stuck for 22s! [khugepaged:92]

var/log/messages:Sep 23 10:27:39 csgn-2-cp-3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
var/log/messages:Sep 23 10:27:39 csgn-2-cp-3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
var/log/messages:Sep 23 10:27:39 csgn-2-cp-3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
var/log/messages:Sep 23 10:27:39 csgn-2-cp-3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
var/log/messages:Sep 23 10:27:39 csgn-2-cp-3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
var/log/messages:Sep 23 10:27:39 csgn-2-cp-3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
var/log/messages:Sep 23 10:27:39 csgn-2-cp-3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
var/log/messages:Sep 23 10:27:39 csgn-2-cp-3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
var/log/messages:Sep 23 10:27:39 csgn-2-cp-3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
var/log/messages:Sep 23 10:27:39 csgn-2-cp-3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

var/log/messages:Sep 23 11:04:42 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#3 stuck for 22s! [khugepaged:92]
var/log/messages:Sep 23 11:05:22 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#1 stuck for 22s! [khugepaged:92]
var/log/messages:Sep 23 11:07:58 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#5 stuck for 23s! [khugepaged:92]
var/log/messages:Sep 23 11:23:38 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#3 stuck for 22s! [khugepaged:92]
var/log/messages:Sep 23 11:25:50 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#0 stuck for 22s! [khugepaged:92]
var/log/messages:Sep 23 11:45:31 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#6 stuck for 22s! [khugepaged:92]
var/log/messages:Sep 25 13:30:10 csgn-2-cp-3 kernel: BUG: soft lockup - CPU#2 stuck for 23s! [supv:12188]
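The observation that most soft lockups come from khugepaged can be reproduced by tallying the messages per task. A minimal sketch, assuming the message format shown above ("BUG: soft lockup - CPU#<n> stuck for <s>s! [<comm>:<pid>]"); tally() is a hypothetical helper, and the log files to scan are passed as command-line arguments.

```python
import re
import sys
from collections import Counter

# Matches the RHEL 7 (3.10) soft lockup message format seen in the logs above.
SOFT_LOCKUP = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[(.+):(\d+)\]")


def tally(lines):
    """Count soft lockups per task name and record the longest stall per task."""
    by_task = Counter()
    worst = {}
    for line in lines:
        m = SOFT_LOCKUP.search(line)
        if not m:
            continue
        comm, secs = m.group(3), int(m.group(2))
        by_task[comm] += 1
        worst[comm] = max(worst.get(comm, 0), secs)
    return by_task, worst


if __name__ == "__main__":
    lines = []
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            lines.extend(f)
    by_task, worst = tally(lines)
    for comm, count in by_task.most_common():
        print(f"{comm:20} {count:5} lockups, longest {worst[comm]}s")
```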

Sep 25 13:30:36 csgn-2-cp-3 sh: abrt-dump-oops: Found oopses: 1
Sep 25 13:30:36 csgn-2-cp-3 sh: abrt-dump-oops: Creating problem directories
Sep 25 13:30:36 csgn-2-cp-3 sh: abrt-dump-oops: Not going to make dump directories world readable because PrivateReports is on
Sep 25 13:30:37 csgn-2-cp-3 abrt-dump-oops: Reported 1 kernel oopses to Abrt
Sep 25 13:31:32 csgn-2-cp-3 python: detected unhandled Python exception in '/usr/sbin/sosreport'
Sep 25 13:31:36 csgn-2-cp-3 abrt-server: Lock file '/var/spool/abrt/post-create.lock' is locked by process 22904
Sep 25 13:31:37 csgn-2-cp-3 abrt-server: Lock file '/var/spool/abrt/post-create.lock' is locked by process 22904
Sep 25 13:31:38 csgn-2-cp-3 abrt-server: Lock file '/var/spool/abrt/post-create.lock' is locked by process 22904

DMIDECODE
  BIOS:
    Vend: Seabios    Vers: 0.5.1    Date: 01/01/2011    BIOS Rev: 1.0    FW Rev: 
    Mfr:  Fedora Project    Prod: OpenStack Nova    Vers: 12.0.4-1.el7
  CPU:
    10 of 10 CPU sockets populated, 0 cores/0 threads per CPU
    10 Intel Core Processor (Haswell) (flags: aes,constant_tsc,lm,nx,pae,rdrand)
  Memory:
    Total: 30720 MiB (30 GiB)

OS
  Hostname: csgn-2-cp-3
  Distro:  [redhat-release] Red Hat Enterprise Linux Server release 7.2 (Maipo)
  Kernel:
    Booted kernel:  3.10.0-327.28.2.el7.x86_64
    Booted kernel cmdline:
      root=/dev/mapper/rhel-root ro crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet LANG=en_US.UTF-8
    Taint-check: 16384  (see https://access.redhat.com/solutions/40594)
      14  SOFTLOCKUP: A soft lockup has previously occurred
    - - - - - - - - - - - - - - - - - - -
  Sys time:  Wed Sep 25 13:34:48 KST 2019
  Boot time: Wed Aug 28 10:18:26 UTC 2019  (epoch: 1566987506)
  Uptime:    27 days, 18:16,  7 users
  LoadAvg:  [10 CPU] 10.06 (101%), 10.44 (104%), 13.30 (133%)
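The taint value 16384 is 2^14, so only bit 14 (flag "L", soft lockup) is set. This is also why the stack trace header above reads "Tainted: G L". A small decoder sketch follows; it lists only the upstream flag bits 0 to 15, and any other bit (RHEL also defines vendor-specific ones) is reported as unknown.

```python
# Upstream kernel taint flags, bits 0-15 (other bits reported as unknown).
TAINT_FLAGS = {
    0: ("P", "proprietary module loaded"),
    1: ("F", "module force-loaded"),
    2: ("S", "SMP kernel on hardware not designed for SMP"),
    3: ("R", "module force-unloaded"),
    4: ("M", "machine check exception"),
    5: ("B", "bad page referenced"),
    6: ("U", "taint requested by userspace"),
    7: ("D", "kernel died recently (OOPS or BUG)"),
    8: ("A", "ACPI table overridden"),
    9: ("W", "kernel issued a warning"),
    10: ("C", "staging driver loaded"),
    11: ("I", "platform firmware bug workaround applied"),
    12: ("O", "out-of-tree module loaded"),
    13: ("E", "unsigned module loaded"),
    14: ("L", "soft lockup occurred"),
    15: ("K", "kernel live-patched"),
}


def decode_taint(value):
    """Return (flag, description) for every set bit of a taint bitmask."""
    out = []
    bit = 0
    while value >> bit:
        if (value >> bit) & 1:
            out.append(TAINT_FLAGS.get(bit, ("?", f"unknown/vendor bit {bit}")))
        bit += 1
    return out


print(decode_taint(16384))  # value from /proc/sys/kernel/tainted in this report
```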

KDUMP CONFIG
  kexec-tools rpm version:
    kexec-tools-2.0.7-38.el7_2.1.x86_64
  Service enablement:
    UNIT          STATE
    kdump.service  enabled
  kdump initrd/initramfs:
    18318863 Aug 20  2018 initramfs-3.10.0-327.28.2.el7.x86_64kdump.img
    18211727 Aug 11  2016 initramfs-3.10.0-327.el7.x86_64kdump.img
  Memory reservation config:
    /proc/cmdline { crashkernel=auto }
    GRUB default  { crashkernel=auto }
  Actual memory reservation per /proc/iomem:
      2a000000-341fffff : Crash kernel
  kdump.conf:
    path /var/crash
    core_collector makedumpfile -l --message-level 1 -d 31
  kdump.conf "path" available space:
    System MemTotal (uncompressed core size) { 29.29 GiB }
    Available free space on target path's fs { 38.31 GiB }  (fs=/)
  Panic sysctls:
    kernel.sysrq [bitmask] =  "16"  (see proc man page)
    kernel.panic [secs] =  0  (no autoreboot on panic)
    kernel.hung_task_panic =  0
    kernel.panic_on_oops =  1
    kernel.panic_on_io_nmi =  0
    kernel.panic_on_unrecovered_nmi =  0
    kernel.panic_on_stackoverflow =  0
    kernel.softlockup_panic =  0
    kernel.unknown_nmi_panic =  0
    kernel.nmi_watchdog =  1
    vm.panic_on_oom [0-2] =  0  (no panic)
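On this kernel line the soft lockup detector reports a CPU after about 2 * kernel.watchdog_thresh seconds (default 10, so about 20 s), consistent with the "stuck for 22s"/"23s" messages above. The report also shows kernel.softlockup_panic = 0, so soft lockups are only logged and do not trigger a panic (and therefore no kdump vmcore). The sketch reads these values from /proc/sys; the helper names and the fallback of 10 when watchdog_thresh cannot be read are assumptions.

```python
import os


def sysctl_path(name):
    """Map a dotted sysctl name to its /proc/sys path."""
    return os.path.join("/proc/sys", *name.split("."))


def read_sysctl(name, default=None):
    """Return the sysctl value as a string, or `default` if it cannot be read."""
    try:
        with open(sysctl_path(name)) as f:
            return f.read().strip()
    except OSError:
        return default


def softlockup_threshold(watchdog_thresh):
    """Seconds a CPU must be stuck before a soft lockup is reported (2 * watchdog_thresh)."""
    return 2 * int(watchdog_thresh)


if __name__ == "__main__":
    thresh = read_sysctl("kernel.watchdog_thresh", "10")  # assumed default if unreadable
    print("soft lockup threshold:", softlockup_threshold(thresh), "s")
    for name in ("kernel.softlockup_panic", "kernel.hung_task_timeout_secs", "kernel.panic_on_oops"):
        print(name, "=", read_sysctl(name, "n/a"))
```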

MEMORY
  Stats graphed as percent of MemTotal:
    MemUsed    ▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊..................................  31.4%
    Buffers    ..................................................  0.1%
    Cached    ▊▊▊▊▊▊▊...........................................  13.6%
    HugePages  ..................................................  0.0%
    Dirty      ..................................................  0.3%
  RAM:
    29.3 GiB total ram
    9.2 GiB (31%) used
    5.2 GiB (18%) used excluding Buffers/Cached
    0.08 GiB (0%) dirty
  HugePages:
    No ram pre-allocated to HugePages
  THP:
    260096 kB allocated to THP
  LowMem/Slab/PageTables/Shmem:
    0.57 GiB (2%) of total ram used for Slab
    0.02 GiB (0%) of total ram used for PageTables
    0.16 GiB (1%) of total ram used for Shmem
  Swap:
    0 GiB (0%) used of 1 GiB total
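The THP line (260096 kB) presumably corresponds to AnonHugePages in /proc/meminfo; that is an assumption about how the report is generated. With the 2 MiB transparent huge page size of x86_64, it works out to 260096 / 2048 = 127 huge pages, or 254 MiB. A sketch of the conversion, with hypothetical helper names:

```python
HUGE_PAGE_KB = 2048  # x86_64 THP size in kB (assumed; differs on other architectures)


def anon_huge_kb(meminfo_text):
    """Return the AnonHugePages value (kB) from /proc/meminfo-style text, or 0."""
    for line in meminfo_text.splitlines():
        if line.startswith("AnonHugePages:"):
            return int(line.split()[1])
    return 0


def thp_pages(kb, page_kb=HUGE_PAGE_KB):
    """Number of whole huge pages represented by `kb` kilobytes."""
    return kb // page_kb


print(thp_pages(260096))  # 127 huge pages (254 MiB)
```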

SYSCTLS
  kernel.
    hostname =  "csgn-2-cp-3"
    osrelease =  "3.10.0-327.28.2.el7.x86_64"
    tainted =  "16384"  (see https://access.redhat.com/solutions/40594)
      14  SOFTLOCKUP: A soft lockup has previously occurred
    hung_task_panic [bool] =  "0"
    hung_task_timeout_secs =  "120"  (secs task must be D-state to trigger)
    hung_task_warnings [num_warnings] =  "0"  (warnings disabled, either intentionally or after original num_warnings reached)
  vm.
    dirty_ratio =  "30"  (% of total system memory)
    dirty_background_ratio =  "10"  (% of total system memory)
    dirty_expire_centisecs =  "3000"
    dirty_writeback_centisecs =  "500"
    max_map_count =  "65530"
    min_free_kbytes =  "67584"
    swappiness [0-100] =  "30"
    vfs_cache_pressure [0-100] =  "100"

Copyright © www.linuxdata.org All rights reserved.