登录查看更多内容

Corehandler für Linux-Systeme in Python3

Thomas Boor

Gesch?ftsführer bei thovid GmbH

发布日期: 2020年6月11日

Nicht nur, wenn man man Software in C oder C++ entwickelt, muss man im Betrieb mit einem core rechnen. Neben einigen nicht zu bejubelnden Neuerungen in manchen Linux-Systemen (die unter dem Signet 'Security' eingebracht wurden) gibt es ein sehr nützliches feature: Man kann ein eigenes Filterprogramm als corehandler hinterlegen, das dann vom kernel im Falle eines Abbruchs aufgerufen wird und dabei den eigentlichen core als Standard-Eingabe stdin/cin (daher Filterprogramm genannt) erh?lt. Da dies genau zum Zeitpunkt des Abbruchs geschieht, ist dies die beste Gelegenheit, alle m?glichen Informationen rund um den Prozess bzw. rund um das Betriebssystem einzuholen. Darüberhinaus kann man bei der Gelegenheit auch die shared-libraries der Anwendung einsammeln und zum Konvolut der Erhebung hinzufügen. (Damit wird das debuggen der Anwendung auch auf anderen OS-Versionen als der im Produktionsumfeld eingesetzten Version m?glich.)

Neben dem core kümmert sich der w.u. gezeigte corehandler um

shared-libraries Ermittlung (und Sicherung) via /usr/bin/ldd
Kopie des abgestürzten Executables
Kopie der Log- und Trace-files der Anwendung
/usr/sbin/iptables -n --list
/bin/netstat
/usr/sbin/ip addr
/usr/sbin/ip route
/usr/sbin/ip 6 route
/usr/bin/dmesg
/usr/bin/ps aux
cat /proc/cpuinfo
cat /proc/meminfo
cat /etc/os-release
cat /etc/hostname
cat /proc/cmdline

Alle diese Informationen werden in separaten files abgelegt, die zum Schluss in ein gezipptes tar gepackt werden. Zu diesem Zeitpunkt der Verarbeitung w?re dann eine Notifikation (via mail, sms, telegram, oder, wie hier im Script, per checkMK) an die entsprechende Abteilung sinnvoll. (Wir haben nun Maschine, Zeitpunkt und alle Informationen (als Pfad zum tar-Archiv) zu vermelden.)

Die Installation des eigenen corehandlers erfolgt durch folgenden Aufruf:

echo "|/opt/thovid/corephae/bin/corehandler %P %p %e %t" > /proc/sys/kernel/core_pattern

Informationen zu den Aufrufparametern erh?lt man mit "man 5 core".

Der Aufruf erfolgt am besten in dem Start-Script der Anwendung, da auch andere player auf dem System den corehandler setzen k?nnten.

Wie oben angemerkt, erh?lt 'unser' corehandler den eigentlichen core via Standard-Eingabe. Zu beachten ist dabei, dass - sobald die Standard-Eingabe vollst?ndig gelesen ist - die assoziierte proc-Struktur des abgebrochenen Prozesses gel?scht wird. Daher sind alle Zugriffe auf /proc des Prozesses vor dem Verarbeiten des cores zu erledigen.

#!/usr/bin/env python3
# core_handler in python 3
# install (as root):
#       echo "|$(pwd)/corehandler %P %p %e %t" > /proc/sys/kernel/core_pattern
# be careful: the corehandler runs as kernel-module!!! do not access NFS-mounts or java-stuff.
# As long as the corehandler runs, no other instance of the cored executable can be spawned!

import os
import sys
import time
import shlex
import socket
import subprocess
import ch_utils as util
import configparser

# first, read from config ip and port of the checkMK-system to send udp-messages to it:
UdpIp = None
UdpPort = None
config = configparser.ConfigParser()
try:
    config.read("/opt/thovid/corephae/corehandler.cfg")
    UdpIp = config["checkMK"]["ip"]
    UdpPort = int(config["checkMK"]["port"])
    print("ip=%s, port=%d" % (UdpIp, UdpPort))
except:
    pass

# we have to expand the local path, as SuSe has multiple places where to store system-utils...
os.environ["PATH"] = "/bin:/sbin:/usr/sbin:/usr/bin:/usr/local/bin"

# the 4 command-line-args (%P %p %e %t):
arg_ipid = int(sys.argv[1])
arg_pid = int(sys.argv[2])
arg_binary = str(sys.argv[3])
arg_timestamp = str(sys.argv[4])

workdir = "/var/opt/thovid/core_storage/pm.%s.%d.%s/" % (arg_binary, arg_pid, arg_timestamp)
try:
    os.makedirs(workdir)
except:
    print("Failed to creat workdir %s - EXIT now" % workdir)
    sys.exit(2)

msg = open(workdir + "info_file", 'w+') # the message file

proc_exe_path = "/proc/%d/exe" % (arg_pid)
executable = os.readlink(proc_exe_path)

if len(executable) == 0:
    print("corehandler: could'nt get executable by proc-map - early exit", file = msg)
    sys.exit(4)

print("executable by proc-map = %s" % (executable), file = msg)
bin_path = os.path.dirname(executable)
binary = os.path.basename(executable)
print("binary path by proc-map = %s, basename = %s" % (bin_path, binary), file = msg)

# access the proc's /proc/-subtree as long it exists:
proc_so_map = "/proc/%d/maps" % (arg_pid)
util.fetch_cmd(["cat", proc_so_map], workdir + "proc_so_map", msg)

# extract the shared libraries used via 'ldd':
libdir = workdir + "libs"    # the local root, where to store the shared libraries
try:
    os.makedirs(libdir)
except:
    pass
util.fetch_cmd(["ldd", executable], workdir + "so_list", msg)
with open(workdir + "so_list", 'r') as so:
    for line in so.readlines():
        tokens = line.split()
        if len(tokens) > 3 and tokens[2].startswith("/"):
            a_so_lib = tokens[2]
            so_target_dir = libdir + os.path.dirname(a_so_lib)
            so_target     = libdir + a_so_lib
            print("so: %s" % a_so_lib, file=msg)
            print("so-target_dir=%s" % so_target_dir, file=msg)
            print("so-target=%s" % so_target, file=msg)
            try:
                os.makedirs(so_target_dir)
                os.mkdir(so_target_dir)
            except:
                pass
            cmd = "cp %s %s" % (a_so_lib, so_target)
            rc = subprocess.call(cmd, shell=True)

# create a conf-file for the gdb that expands the search-path for shared libs by our storage:
with open(workdir + "gdb.conf", 'w') as gdb:
    print("set sysroot libs\nset substitute-path /ffmbuild/systest .\n", file=gdb)

# /// additional info:
util.fetch_cmd(["iptables", str("--list"), "-n"], workdir + "iptables", msg)
util.fetch_cmd(["ip6tables", str("--list"), "-n"], workdir + "ip6tables", msg)
util.fetch_cmd(["ss"], workdir + "netstat", msg)
util.fetch_cmd(["ip", "addr"], workdir + "ip_addr", msg)
util.fetch_cmd(["ip", "route"], workdir + "ip_route", msg)
util.fetch_cmd(["ip", "-6", "route"], workdir + "ip_route_6", msg)

util.fetch_cmd(["dmesg"], workdir + "kernel_dmesg", msg)
util.fetch_cmd(["ps", "-aux"], workdir + "processes", msg)

util.fetch_cmd(["cat", "/proc/cpuinfo"], workdir + "proc_cpu_info", msg)
util.fetch_cmd(["cat", "/proc/meminfo"], workdir + "proc_mem_info", msg)
util.fetch_cmd(["cat", "/etc/os-release"], workdir + "etc_os_release", msg)
util.fetch_cmd(["cat", "/etc/hostname"], workdir + "etc_hostname", msg)
util.fetch_cmd(["cat", "/proc/cmdline"], workdir + "kernel_cmd_line", msg)

# cp the excutable:
try:
    cmd = "cp %s %s" % (executable, workdir + binary)
    print("corehandler going to exec <%s>" % (cmd), file=msg)
    rc = subprocess.call(cmd, shell=True)
    print("corehandler copied binary from %s to %s ended with rc=%d" % (executable, workdir + binary, int(rc)), file=msg)
except:
    print("corehandler failed to cp executable (%s) to workdir (%s)" % (ececutable, workdir + binary), file=msg)

# Call the application specific "collect_appl_crash_info"-file if it exists:
cmd_path = bin_path + "/collect_appl_crash_info"
cmd = cmd_path + ' ' + workdir
print("corehandler: checking for crash_info-script <%s>" % (cmd_path), file=msg)
if os.path.isfile(cmd_path):
    try:
        rc = subprocess.call(cmd, shell=True)
        print("corehandler: executed <%s> with rc=%d" % (cmd, int(rc)), file=msg)
    except:
        pass
else:
    print("corehandler: <%s> does not exist" % (cmd_path), file=msg)

# Notify cerberus (checkMK) via UDP:
if UdpIp != None:
    date = time.asctime(time.localtime(time.time()))
    message = str("linux coredump of <%s> on <%s> at %s, storage: %s" % (str(binary), str(socket.gethostname()), str(date), workdir + "tar.gz"))
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(message.encode(), (UdpIp, UdpPort))
        print("corehandler: notified cerberus with '%s' ..." % (message), file=msg)
    except:
        print("corehandler: failed to notify cerberus (%s:%d) (message was '%s')." % (UdpIp, UdpPort, message), file=msg)
else:
        print("corehandler: does not notify cerberus as it's not configured.", file=msg)

t_util = time.process_time()

# FINALLY write the core itself. finally, cause after stdin has been consumed,
# the /proc//-subtree vanishes away:

# /// ------------------------------------------ write the core
core_file_name = workdir + "core"
written = 0
sum = 0
with open(core_file_name, "ab") as f:
    while True:
        core_data = sys.stdin.buffer.read(65536)
        if len(core_data) < 1:
            break;
        else:
            written = f.write(core_data)
            sum = sum + written
t_sum = time.process_time()
print("core data written into file, %d bytes" % (sum), file=msg)
# /// ------------------------------------------ / write the core

print("corehandler: done, t(process)=%lf seconds" % (t_sum), file=msg)
print("corehandler: core-creation took %lf, anything else took %lf" % (t_sum - t_util, t_util), file=msg)

msg.flush()

try:
    # invoke the compressed tar creator as backgound-job:
    p = subprocess.Popen(['/opt/thovid/corephae/creat_compressed_tar.sh', workdir], cwd=workdir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = p.communicate()    # returns byteareas
    print("corehandler: invoked compressed-tar-creation in backgound, stdout=%s, stderr=%s", stdout, stderr, file=msg)
except:
    print("corehandler: failed to invoke compressed-tar-creation in backgound", file=msg)

msg.close() # finally, close message file

Optimierungspotential: Man k?nnte das executable auch schon beim Start kopieren, zum Zeitpunkt des cores referenzieren (man hat ja die pid). Grund: Ist die Anwendung l?ngere Zeit gelaufen, k?nnte das executable l?ngst ausgetauscht worden sein - in diesem Fall würden wir das neue executable kopieren, nicht das ge-core-te!

Zu dem eigentlichen corehandler geh?rt noch ein kleines Submodul, ch_utils.py:

import subprocess
import sys
import string
import threading
import os.path


# executor of given commandline. The commandline has to be an array
def execute(cmd, directory):
    proc = subprocess.Popen(
        cmd,
        cwd=directory,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE)


    stdout, stderr = proc.communicate() # returns byteareas
    try:
        stdout = stdout.decode('utf-8')
    except:
        pass
    return proc.returncode, str(stdout)


# execute a command and read its output, write into 'file', trace at msg_fd
def fetch_cmd(cmd, file, msg_fd):
    rc, result = execute(cmd, ".")
    result = result.replace('\\n', '\n')
    result = result.replace('\\t', '\t')
    with open(file, "w") as x:
        for item in result:
            x.write("%s" % item)
    print(str("Retrieved data from command %s into file %s" % (cmd, file)), file=msg_fd)
~

要查看或添加评论，请登录

Thomas Boor的更多文章

git-repo als file transferieren

2023年2月27日

git-repo als file transferieren

Im zu exportierenden Repository den export als Bundle starten: tar cf /tmp/myrepo.tar .
Netzwerk-Performance messen und darstellen

2022年11月22日

Netzwerk-Performance messen und darstellen

In modernen, verteilten Systemen ist die Zuverl?ssigkeit des Netzwerks essentiell. Unabh?ngig von Schichtung und…
Konvertierung von Python-floats in IEEE754-binaries und vice versa

2021年1月6日

Konvertierung von Python-floats in IEEE754-binaries und vice versa

Einige knappe Python3-Hilfsfunktionen zum Konvertieren von floats in IEEE754-Bin?rzahlen und umgekehrt. Am Ende des…
DevOps (a.k.a. CI/CD) – warum ?

2020年10月9日

DevOps (a.k.a. CI/CD) – warum ?

Durchforstet man einschl?gige Projekt-Angebote (projektwerk.de, freelancermap.
Python/openCV: Gesichtskennung, Teil 2: face-recognition

2020年6月10日

Python/openCV: Gesichtskennung, Teil 2: face-recognition

Die openCV-Bibliothek - in C++ geschrieben - steht für etliche Sprachen zur Verfügung. OpenCV steht für Open Source…
C++: Wann bedarf es einer virtuellen Ableitung

2020年6月3日

C++: Wann bedarf es einer virtuellen Ableitung

Wenn es um performante Anwendungen geht, ist C++ noch immer eine der Top-Sprachen. Seit seiner Einführung hat C++…
Python/openCV: Gesichtskennung, Teil 1: face-detection

2020年5月29日

Python/openCV: Gesichtskennung, Teil 1: face-detection

Die openCV-Bibliothek - in C++ geschrieben - steht für etliche Sprachen zur Verfügung. OpenCV steht für Open Source…
AI (Artificial Intelligence, aka AI) with Python, tensorflow and keras

2020年5月23日

AI (Artificial Intelligence, aka AI) with Python, tensorflow and keras

Even without in-depth knowledge of 2-D and 3-D graphics processing, Python - thanks to the tensorflow module provided…
Künstliche Intelligenz mit Python, tensorflow und keras

2020年5月22日

Künstliche Intelligenz mit Python, tensorflow und keras

KI (Künstliche Intelligenz, aka AI) mit Python, tensorflow und keras Auch ohne fundierte Kenntnis von 2- und 3-D…
Daten-Visualisierung in Python3

2020年4月3日

Daten-Visualisierung in Python3

Die folgenden, kleinen Python-Scripts zeigen, wie man in Python mit wenig code Daten-Visualisierung betreiben kann…

See all articles

Corehandler für Linux-Systeme in Python3

Thomas Boor

Gesch?ftsführer bei thovid GmbH

Thomas Boor的更多文章

社区洞察

其他会员也浏览了

The Power of Clustering in Node.js for Optimal Performance

A Beginner's Tutorial on Basics of Delegates, Anonymous Functions, and Lambda Expressions in C#

Spring Boot Configuration: @ConditionalOnProperty

Operating System For Interviews-Part 1

When to Use volatile?

Learning Ring And Building Echo

Anagrams as primes

Downside of the Executor Service with context to thread local

Built-in Macros in the Linux Kernel

Supercharge your JS Arrays with Mutils

Thomas Boor的更多文章

git-repo als file transferieren

Netzwerk-Performance messen und darstellen

Konvertierung von Python-floats in IEEE754-binaries und vice versa

DevOps (a.k.a. CI/CD) – warum ?

Python/openCV: Gesichtskennung, Teil 2: face-recognition

C++: Wann bedarf es einer virtuellen Ableitung

Python/openCV: Gesichtskennung, Teil 1: face-detection

AI (Artificial Intelligence, aka AI) with Python, tensorflow and keras

Künstliche Intelligenz mit Python, tensorflow und keras

Daten-Visualisierung in Python3

社区洞察

其他会员也浏览了

The Power of Clustering in Node.js for Optimal Performance

A Beginner's Tutorial on Basics of Delegates, Anonymous Functions, and Lambda Expressions in C#

Spring Boot Configuration: @ConditionalOnProperty

Operating System For Interviews-Part 1

When to Use volatile?

Learning Ring And Building Echo

Anagrams as primes

Downside of the Executor Service with context to thread local

Built-in Macros in the Linux Kernel

Supercharge your JS Arrays with Mutils