From 41e3e29dc25f950fabe921384acb6146eb37548d Mon Sep 17 00:00:00 2001 From: Alexander Daichendt Date: Tue, 11 Jun 2024 20:47:22 +0200 Subject: [PATCH] feat: add smi blog post --- src/routes/blog/detecting-smi/+page.md | 159 +++++++++++++++++++++++++ 1 file changed, 159 insertions(+) create mode 100644 src/routes/blog/detecting-smi/+page.md diff --git a/src/routes/blog/detecting-smi/+page.md b/src/routes/blog/detecting-smi/+page.md new file mode 100644 index 0000000..1fdbec0 --- /dev/null +++ b/src/routes/blog/detecting-smi/+page.md @@ -0,0 +1,159 @@ +--- +created: '2024-01-15' +title: 'Detecting System Management Interrupts (SMIs)' +description: '' +keywords: + - SMI +--- + +# System Management Interrutps (SMI) + +- high priority interrupts caused by the hardware +- transparent to the operating system +- can be used by the mainboard for power management, thermal management, or other system-level functions independent of the OS +- can take a long time to execute, causing a CPU core to be blocked from other work + +## Detecting SMIs + +- compile a kernel with hwlat tracing capabilities; usually, a typical Linux kernel has this enabled; if not, the config can be found in the appendix +- after starting the machine with a trace-capable image + - check available tracers `cat /sys/kernel/debug/tracing/available_tracers` + - enable the tracer `echo hwlat > /sys/kernel/debug/tracing/current_tracer` + - there now should be a process "hwlat" running that takes up 50% of one CPU +- output of the hwlat tracer available `cat /sys/kernel/debug/tracing/trace` or `cat /sys/kernel/debug/tracing/trace_pipep` + +## Example Output + +``` +# tracer: hwlat +# +# entries-in-buffer/entries-written: 1/1 #P:32 +# +# _-----=> irqs-off +# / _----=> need-resched +# | / _---=> hardirq/softirq +# || / _--=> preempt-depth +# ||| / delay +# TASK-PID CPU# |||| TIMESTAMP FUNCTION +# | | | |||| | | + <...>-30395 [010] d... 533.362276: #1 inner/outer(us): 0/388 ts:1711469939.505579595 count:1 +``` + +- inner/outer: where the latency was detected, see next section + +### How does it work? + +- this hwlat process is taking timestamps in a loop +- if distance between two timestamps is unreasonably large (bigger than ns), there was an SMI +- we should lower the threshold of this distance to 1us by executing `echo 1 > /sys/kernel/debug/tracing/tracing_thresh` +- the hwlat process is migrated over all the cores to catch SMIs there +- inner vs outer latency + +``` + while (run) { + start_ts = trace_local_clock(); + end_ts = trace_local_clock(); + if (!first && start_ts - last_ts > thresh) + record_outer(); + if (end_ts - start_ts > thresh) + record_inner(); + last_ts = end_ts; + first = 0; + } +``` + +### Further options + +- by default, only 50% CPU time is used +- this can be increased by echoing into `echo 9999999 > /sys/kernel/debug/tracing/hwlat_detector/width`, where the value is smaller than the set window `cat /sys/kernel/debug/tracing/hwlat_detector/window` to avoid starving the system. +- from my experience, this, however, is not necessary to catch SMIs. The default option is "good enough". + +## Firing an SMI manually + +- There is a nice small kernel module [here](https://github.com/jib218/kernel-module-smi-trigger) for manually triggering an SMI to verify the setup +- follow the instructions in the readme to compile and load the module + +## Hardware Registers for counting SMIs + +- Intel: MSR0x34, can be read out with turbostat / perf +- AMD: ls\_msi\_rx, can be used with `perf stat -e ls_smi_rx -I 60000` +However, doesn't seem to count everything; counts seem incorrect + +--- + +## Sources, Appendix + +- https://wiki.linuxfoundation.org/realtime/documentation/howto/tools/hwlat +- https://www.kernel.org/doc/html/latest/trace/hwlat_detector.html +- https://lwn.net/Articles/860578/ +- https://www.jabperf.com/ima-let-you-finish-but-hunting-down-system-interrupts/ + +Custom Kernel Fragment: files/kernel\_config\_fragments/trace + +``` +CONFIG_USER_STACKTRACE_SUPPORT=y +CONFIG_NOP_TRACER=y +CONFIG_HAVE_RETHOOK=y +CONFIG_RETHOOK=y +CONFIG_HAVE_FUNCTION_TRACER=y +CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y +CONFIG_HAVE_FUNCTION_GRAPH_RETVAL=y +CONFIG_HAVE_DYNAMIC_FTRACE=y +CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y +CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y +CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y +CONFIG_HAVE_DYNAMIC_FTRACE_NO_PATCHABLE=y +CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y +CONFIG_HAVE_SYSCALL_TRACEPOINTS=y +CONFIG_HAVE_FENTRY=y +CONFIG_HAVE_OBJTOOL_MCOUNT=y +CONFIG_HAVE_OBJTOOL_NOP_MCOUNT=y +CONFIG_HAVE_C_RECORDMCOUNT=y +CONFIG_HAVE_BUILDTIME_MCOUNT_SORT=y +CONFIG_BUILDTIME_MCOUNT_SORT=y +CONFIG_TRACER_MAX_TRACE=y +CONFIG_TRACE_CLOCK=y +CONFIG_RING_BUFFER=y +CONFIG_EVENT_TRACING=y +CONFIG_CONTEXT_SWITCH_TRACER=y +CONFIG_RING_BUFFER_ALLOW_SWAP=y +CONFIG_PREEMPTIRQ_TRACEPOINTS=y +CONFIG_TRACING=y +CONFIG_GENERIC_TRACER=y +CONFIG_TRACING_SUPPORT=y +CONFIG_FTRACE=y +CONFIG_FUNCTION_TRACER=y +CONFIG_FUNCTION_GRAPH_TRACER=y +CONFIG_FUNCTION_GRAPH_RETVAL=y +CONFIG_DYNAMIC_FTRACE=y +CONFIG_DYNAMIC_FTRACE_WITH_REGS=y +CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y +CONFIG_DYNAMIC_FTRACE_WITH_ARGS=y +CONFIG_FPROBE=y +CONFIG_FUNCTION_PROFILER=y +CONFIG_TRACE_PREEMPT_TOGGLE=y +CONFIG_IRQSOFF_TRACER=y +CONFIG_PREEMPT_TRACER=y +CONFIG_SCHED_TRACER=y +CONFIG_HWLAT_TRACER=y +CONFIG_OSNOISE_TRACER=y +CONFIG_TIMERLAT_TRACER=y +CONFIG_MMIOTRACE=y +CONFIG_FTRACE_SYSCALLS=y +CONFIG_TRACER_SNAPSHOT=y +CONFIG_TRACER_SNAPSHOT_PER_CPU_SWAP=y +CONFIG_BRANCH_PROFILE_NONE=y +CONFIG_BLK_DEV_IO_TRACE=y +CONFIG_FPROBE_EVENTS=y +CONFIG_KPROBE_EVENTS=y +CONFIG_UPROBE_EVENTS=y +CONFIG_DYNAMIC_EVENTS=y +CONFIG_PROBE_EVENTS=y +CONFIG_FTRACE_MCOUNT_RECORD=y +CONFIG_FTRACE_MCOUNT_USE_CC=y +CONFIG_PROVIDE_OHCI1394_DMA_INIT=y +CONFIG_HAVE_SAMPLE_FTRACE_DIRECT=y +CONFIG_HAVE_SAMPLE_FTRACE_DIRECT_MULTI=y +CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y +CONFIG_STRICT_DEVMEM=y +```