Advanced Ubuntu Troubleshooting Techniques for Site Reliability Engineers with Maximum Security Measures
Here is another set of "Mad Scientist" Fidel Vetino advanced troubleshooting techniques, tailored for Site Reliability Engineers (SREs) working with Ubuntu systems.
I'll cover in-depth system monitoring, network troubleshooting, kernel tuning, file system management, process management, container and virtualization management, security auditing, and high availability configurations. Each section integrates maximum security measures. I'll provide detailed scripts, commands, and configurations to ensure the reliability, performance, and security of Ubuntu environments.
Below are the advanced troubleshooting techniques with added security configurations for each step.
1. In-depth System Monitoring and Logging
Prometheus and Grafana
Prometheus Installation with Security:
bash
# Install Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.26.0/prometheus-2.26.0.linux-amd64.tar.gz
tar xvf prometheus-2.26.0.linux-amd64.tar.gz
cd prometheus-2.26.0.linux-amd64
# Create Prometheus user and directories
sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
sudo cp prometheus /usr/local/bin/
sudo cp promtool /usr/local/bin/
sudo cp -r consoles /etc/prometheus
sudo cp -r console_libraries /etc/prometheus
# Secure permissions
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus
# Configuration file with scrape settings
sudo nano /etc/prometheus/prometheus.yml
# Add scrape configs. Note: basic_auth here sets the credentials Prometheus
# sends when scraping the target; it does not password-protect Prometheus itself.
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
    basic_auth:
      username: 'admin'
      password: 'password'  # placeholder; replace with a strong secret
# Create systemd service file
sudo nano /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
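# --web.enable-admin-api exposes snapshot and series-deletion endpoints; drop that flag unless you need them
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/ \
  --web.enable-admin-api \
  --web.listen-address=localhost:9090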
[Install]
WantedBy=multi-user.target
# Start Prometheus
sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl enable prometheus
Grafana Installation with Security:
bash
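# Install Grafana
sudo apt-get install -y software-properties-common wget
# Import the repository signing key so apt can verify the packages
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"
sudo apt-get update
sudo apt-get install -y grafana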
# Start Grafana
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
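# Secure Grafana with authentication and TLS
sudo nano /etc/grafana/grafana.ini
[security]
admin_user = admin
# Placeholder. admin_password is applied only when Grafana first initialises its database;
# afterwards change it with: sudo grafana-cli admin reset-admin-password <new_password>
admin_password = strongpassword
[server]
protocol = https
cert_file = /path/to/your/cert/file
cert_key = /path/to/your/cert/key
# Restart Grafana so the new settings take effect
sudo systemctl restart grafana-server
# Access Grafana at https://your_server_ip:3000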
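The configurations in this guide use placeholder credentials such as 'password' and 'strongpassword'. As a minimal sketch of one way to replace them, the helper below prints a random alphanumeric string read from /dev/urandom. The name gen_secret is a hypothetical helper chosen for illustration; it is not part of Grafana, Prometheus, or any other tool covered here.

```shell
# Print a random alphanumeric secret of the requested length (default 32).
# gen_secret is a hypothetical helper name used only for this sketch.
gen_secret() (
    length="${1:-32}"
    # Keep only letters and digits from the kernel's random source,
    # then stop after the requested number of characters.
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$length"
    # Finish with a newline so the value prints cleanly in a terminal.
    echo
)
```

For example, gen_secret 40 prints a 40-character value that can be stored in a password manager and pasted in place of a placeholder password.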
ELK Stack
Elasticsearch Installation with Security:
bash
# Install Elasticsearch
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
sudo apt-add-repository "deb https://artifacts.elastic.co/packages/7.x/apt stable main"
sudo apt-get update
sudo apt-get install elasticsearch
# Secure Elasticsearch
sudo nano /etc/elasticsearch/elasticsearch.yml
# Add or update the following lines
network.host: localhost
xpack.security.enabled: true
# Start Elasticsearch
sudo systemctl start elasticsearch
sudo systemctl enable elasticsearch
# Set up passwords for built-in users
sudo /usr/share/elasticsearch/bin/elasticsearch-setup-passwords interactive
Logstash Installation with Security:
bash
# Install Logstash
sudo apt-get install logstash
# Secure Logstash configuration
sudo nano /etc/logstash/conf.d/logstash.conf
input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/logstash/logstash.crt"
    ssl_key => "/etc/logstash/logstash.key"
  }
}
filter {
  # Add your filters here
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
    user => "elastic"
    password => "your_password"
  }
}
# Start Logstash
sudo systemctl start logstash
sudo systemctl enable logstash
Kibana Installation with Security:
bash
# Install Kibana
sudo apt-get install kibana
# Secure Kibana
sudo nano /etc/kibana/kibana.yml
# Add or update the following lines
server.host: "localhost"
# Use https:// here only after enabling xpack.security.http.ssl.enabled in elasticsearch.yml
elasticsearch.hosts: ["http://localhost:9200"]
elasticsearch.username: "kibana"
elasticsearch.password: "your_password"
server.ssl.enabled: true
server.ssl.certificate: /path/to/your/cert/file
server.ssl.key: /path/to/your/cert/key
# Start Kibana
sudo systemctl start kibana
sudo systemctl enable kibana
# Access Kibana at https://localhost:5601 (server.host is bound to localhost; use an SSH tunnel or a reverse proxy for remote access)
Sysdig Installation with Security:
bash
# Install Sysdig
curl -s https://s3.amazonaws.com/download.draios.com/stable/install-sysdig | sudo bash
# Secure Sysdig capture
sudo sysdig -z -w capture.scap
# Restrict access to captured files
sudo chown root:root capture.scap
sudo chmod 600 capture.scap
2. Advanced Network Troubleshooting
tcpdump and Wireshark
tcpdump Usage with Security:
bash
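# Optional: grant tcpdump capture capabilities so it can run without sudo (any user who can execute the binary will then be able to capture)
sudo setcap cap_net_raw,cap_net_admin=eip "$(command -v tcpdump)"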
# Capture packets on interface eth0 and save to file with secure permissions
sudo tcpdump -i eth0 -w capture.pcap
sudo chown root:root capture.pcap
sudo chmod 600 capture.pcap
Wireshark Usage with Security:
bash
# Install Wireshark
sudo apt-get install wireshark
# Allow non-root users to capture packets securely
sudo dpkg-reconfigure wireshark-common
sudo usermod -aG wireshark $USER
nmap and netcat
nmap Usage with Security:
bash
# Scan a single IP with limited user privileges
sudo -u limiteduser nmap 192.168.1.1
# Scan specific ports
sudo -u limiteduser nmap -p 22,80,443 192.168.1.1
netcat Usage with Security:
bash
# Check if port is open
nc -zv 192.168.1.1 22
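# Start listening on a port; unprivileged ports (1024 and above) need no extra privileges
nc -l 12345
# Only for ports below 1024: grant the bind capability to the real nc binary rather than the symlink
sudo setcap cap_net_bind_service=+ep "$(readlink -f "$(command -v nc)")"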
Traceroute and MTR
Traceroute Usage with Security:
bash
# Install traceroute
sudo apt-get install traceroute
# Use traceroute
sudo traceroute google.com
MTR Usage with Security:
bash
# Install MTR
sudo apt-get install mtr
# Use MTR with restricted permissions
sudo mtr google.com
3. Kernel and Performance Tuning
Perf and eBPF
Perf Usage with Security:
bash
# Install perf (the tools package must match the running kernel)
sudo apt-get install linux-tools-common linux-tools-generic "linux-tools-$(uname -r)"
# Record performance data with restricted access
sudo perf record -a -g sleep 10
sudo chown root:root perf.data
sudo chmod 600 perf.data
# Generate report
sudo perf report
eBPF Usage with bpftrace:
bash
# Install bpftrace
sudo apt-get install bpftrace
# Run a simple eBPF program that prints every file opened (runs until Ctrl-C), saving the output
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s\n", str(args->filename)); }' > bpftrace_output.txt
sudo chown root:root bpftrace_output.txt
sudo chmod 600 bpftrace_output.txt
Tuning Sysctl Parameters
bash
# Edit sysctl configuration
sudo nano /etc/sysctl.conf
# Example settings with security considerations
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 2048
vm.swappiness = 10
kernel.randomize_va_space = 2
fs.protected_hardlinks = 1
fs.protected_symlinks = 1
# Apply changes
sudo sysctl -p
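After applying the settings, it can help to confirm that the running kernel actually holds the expected values. The following is a minimal sketch, assuming each parameter is a single value readable under /proc/sys. The names check_sysctl and SYSCTL_ROOT are hypothetical and exist only for this illustration.

```shell
# Compare a live kernel parameter against an expected value.
# check_sysctl and SYSCTL_ROOT are hypothetical names used for this sketch;
# SYSCTL_ROOT exists only so the function can be pointed at a test directory.
check_sysctl() (
    key="$1"
    expected="$2"
    root="${SYSCTL_ROOT:-/proc/sys}"
    # sysctl keys map to files under /proc/sys with dots replaced by slashes.
    path="$root/$(printf '%s' "$key" | tr . /)"
    if [ ! -r "$path" ]; then
        echo "MISSING $key"
        return 2
    fi
    actual="$(cat "$path")"
    if [ "$actual" = "$expected" ]; then
        echo "OK $key = $actual"
    else
        echo "DIFF $key = $actual (expected $expected)"
        return 1
    fi
)
```

For example, check_sysctl vm.swappiness 10 prints an OK line when the value matches and a DIFF line, with a non-zero exit status, when it does not.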
4. File System and Disk Management
iostat, vmstat, and dstat
iostat Usage with Security:
bash
# Install iostat
sudo apt-get install sysstat
# Use iostat with restricted permissions
sudo iostat -x 1 10 > iostat_output.txt
sudo chown root:root iostat_output.txt
sudo chmod 600 iostat_output.txt
vmstat Usage with Security:
bash
# Use vmstat with restricted permissions
vmstat 1 10 > vmstat_output.txt
sudo chown root:root vmstat_output.txt
sudo chmod 600 vmstat_output.txt
dstat Usage with Security:
bash
# Install dstat
sudo apt-get install dstat
# Use dstat with restricted permissions (sample every 5 seconds, 10 times, so the command terminates)
dstat -cdngy 5 10 > dstat_output.txt
sudo chown root:root dstat_output.txt
sudo chmod 600 dstat_output.txt
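The commands above write each report first and tighten its permissions afterwards, which leaves a short window in which the file has default permissions. As a sketch of an alternative, the helper below creates the output file with mode 600 from the start by setting a restrictive umask inside a subshell. The name secure_capture is hypothetical, and the file ends up owned by whoever runs the helper rather than by root.

```shell
# Run a command and store its stdout in a file that is private from the start.
# secure_capture is a hypothetical helper name used only for this sketch.
secure_capture() (
    outfile="$1"
    shift
    # umask 077 masks all group and other bits, so a new file is created as 0600.
    # The function body is a subshell, so the caller's umask is left unchanged.
    umask 077
    # Remove any stale file so the restrictive mode applies to a freshly created one.
    rm -f -- "$outfile"
    "$@" > "$outfile"
)
```

For example, secure_capture vmstat_output.txt vmstat 1 10 records ten samples into a file that only the invoking user can read.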
Filesystem Check and Repair
bash
# Check a filesystem; unmount it first, because running fsck on a mounted filesystem can corrupt it
sudo umount /dev/sda1
sudo fsck /dev/sda1
LVM and RAID
LVM Setup with Security:
bash
# Create physical volume
sudo pvcreate /dev/sdb
# Create volume group
sudo vgcreate myvg /dev/sdb
# Create logical volume
sudo lvcreate -L 10G -n mylv myvg
# Format and mount with secure permissions
sudo mkfs.ext4 /dev/myvg/mylv
sudo mount /dev/myvg/mylv /mnt
sudo chown root:root /mnt
sudo chmod 700 /mnt
RAID Setup with mdadm and Security:
bash
# Install mdadm
sudo apt-get install mdadm
# Create a RAID 5 array; this destroys existing data, so use three unused disks (the device names here are examples)
sudo mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 /dev/sdc /dev/sdd /dev/sde
# Format and mount with secure permissions
sudo mkfs.ext4 /dev/md0
sudo mount /dev/md0 /mnt
sudo chown root:root /mnt
sudo chmod 700 /mnt
5. Memory Management
OOM Killer Analysis
bash
# Check OOM events in kernel log
sudo grep -i 'out of memory' /var/log/kern.log
# Check memory info
sudo cat /proc/meminfo > meminfo_output.txt
sudo chown root:root meminfo_output.txt
sudo chmod 600 meminfo_output.txt
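When the kernel log contains many OOM events, a per-process summary shows which workloads are killed most often. The sketch below assumes the kernel's "Killed process <pid> (<name>)" message format and reads log text from standard input. The name oom_victims is hypothetical.

```shell
# Summarise OOM-killer victims from kernel log text on stdin.
# oom_victims is a hypothetical helper name used only for this sketch.
oom_victims() {
    # Extract the "(name)" that follows "Killed process <pid>",
    # count occurrences per name, and print the most frequent first.
    sed -n 's/.*Killed process [0-9][0-9]* (\([^)]*\)).*/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{ count = $1; sub(/^ *[0-9]+ /, ""); print $0 ": " count }'
}
```

For example, sudo cat /var/log/kern.log | oom_victims prints one line per process name with the number of times it was killed.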
Heap and Stack Analysis
Valgrind Usage with Security:
bash
# Install valgrind
sudo apt-get install valgrind
# Check for memory leaks with secure output; Valgrind reports on stderr, so write the report with --log-file
valgrind --leak-check=full --log-file=valgrind_output.txt ./my_application
sudo chown root:root valgrind_output.txt
sudo chmod 600 valgrind_output.txt
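For scripted checks, the number of definitely lost bytes can be pulled out of a saved Valgrind report. This is a minimal sketch, assuming the report was written to a file (for example with Valgrind's --log-file option) and contains the usual "definitely lost: N bytes in M blocks" summary line. The name leaked_bytes is hypothetical.

```shell
# Print the "definitely lost" byte count from a saved Valgrind report.
# leaked_bytes is a hypothetical helper name used only for this sketch.
leaked_bytes() (
    log="$1"
    # Valgrind prints large numbers with thousands separators, so strip commas.
    # If several summaries are present, use the last one.
    bytes="$(sed -n 's/.*definitely lost: \([0-9,]*\) bytes.*/\1/p' "$log" | tr -d , | tail -n 1)"
    # A report without a leak summary is treated as zero bytes lost.
    echo "${bytes:-0}"
)
```

For example, a CI step could fail the build when leaked_bytes valgrind_output.txt prints anything other than 0.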
GDB Usage with Security:
bash
# Install gdb
sudo apt-get install gdb
# Debug application with restricted access
gdb ./my_application
# (gdb) run
# (gdb) backtrace
6. Process and Service Management
strace and lsof
strace Usage with Security:
bash
# Trace system calls of a process with secure output
sudo strace -p <pid> -o strace_output.txt
sudo chown root:root strace_output.txt
sudo chmod 600 strace_output.txt
# Trace a command with secure output
sudo strace -o output.txt ls
sudo chown root:root output.txt
sudo chmod 600 output.txt
lsof Usage with Security:
bash
# List open files with secure output
lsof > lsof_output.txt
sudo chown root:root lsof_output.txt
sudo chmod 600 lsof_output.txt
# List files opened by a specific process with secure output
lsof -p <pid> > lsof_pid_output.txt
sudo chown root:root lsof_pid_output.txt
sudo chmod 600 lsof_pid_output.txt
# List files opened by a specific user with secure output
lsof -u <user> > lsof_user_output.txt
sudo chown root:root lsof_user_output.txt
sudo chmod 600 lsof_user_output.txt
systemd-analyze with Security
bash
# Analyze boot-up performance with secure output
systemd-analyze > systemd_analyze_output.txt
sudo chown root:root systemd_analyze_output.txt
sudo chmod 600 systemd_analyze_output.txt
# Critical chain analysis with secure output
systemd-analyze critical-chain > systemd_critical_chain_output.txt
sudo chown root:root systemd_critical_chain_output.txt
sudo chmod 600 systemd_critical_chain_output.txt
# Blame (list units ordered by time) with secure output
systemd-analyze blame > systemd_blame_output.txt
sudo chown root:root systemd_blame_output.txt
sudo chmod 600 systemd_blame_output.txt
7. Application-Level Debugging
Application Profilers
JProfiler (for Java Applications) with Security:
bash
# Download JProfiler
# https://www.ej-technologies.com/products/jprofiler/download.html
# Extract and run
tar xvfz jprofiler_linux.tar.gz
cd jprofiler<version>/bin
./jprofiler
# Secure JProfiler session data
sudo chown root:root *.jps
sudo chmod 600 *.jps
Debugging Tools
Python pdb with Security:
python
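# Example usage in a script
import pdb
def buggy_function():
    pdb.set_trace()  # execution pauses here and opens the debugger prompt
    x = [1, 2, 3]
    print(x[3])  # deliberate bug: index 3 is out of range
buggy_function()
bash
# Secure a saved pdb session log, if you keep one (run in a shell, not in Python)
sudo chown root:root pdb_log.txt
sudo chmod 600 pdb_log.txt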
8. Container and Virtualization Troubleshooting
Docker and Kubernetes
Docker Logs and Inspect with Security:
bash
# View logs of a container with secure permissions (2>&1 also captures the container's stderr stream)
docker logs <container_id> > docker_logs.txt 2>&1
sudo chown root:root docker_logs.txt
sudo chmod 600 docker_logs.txt
# Inspect a container with secure permissions
docker inspect <container_id> > docker_inspect.txt
sudo chown root:root docker_inspect.txt
sudo chmod 600 docker_inspect.txt
Kubernetes Logs and Describe with Security:
bash
# View logs of a pod with secure permissions
kubectl logs <pod_name> > kubectl_logs.txt
sudo chown root:root kubectl_logs.txt
sudo chmod 600 kubectl_logs.txt
# Describe a pod with secure permissions
kubectl describe pod <pod_name> > kubectl_describe.txt
sudo chown root:root kubectl_describe.txt
sudo chmod 600 kubectl_describe.txt
# Monitor resources with secure permissions
kubectl top nodes > kubectl_top_nodes.txt
sudo chown root:root kubectl_top_nodes.txt
sudo chmod 600 kubectl_top_nodes.txt
kubectl top pods > kubectl_top_pods.txt
sudo chown root:root kubectl_top_pods.txt
sudo chmod 600 kubectl_top_pods.txt
Virtual Machine Management with virsh and Security
bash
# List all VMs with secure permissions
sudo virsh list --all > virsh_list.txt
sudo chown root:root virsh_list.txt
sudo chmod 600 virsh_list.txt
# Start a VM with restricted access
sudo virsh start <vm_name>
# Shutdown a VM with restricted access
sudo virsh shutdown <vm_name>
9. Security Auditing and Hardening
Auditd
Auditd Installation and Configuration:
bash
# Install auditd
sudo apt-get install auditd
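# Secure auditd configuration; Ubuntu rebuilds /etc/audit/audit.rules from /etc/audit/rules.d/ on restart, so add rules there
sudo nano /etc/audit/rules.d/audit.rules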
# Example rule: Monitor /etc/passwd
-w /etc/passwd -p wa -k passwd_changes
# Restart auditd
sudo systemctl restart auditd
# View audit logs with secure permissions
sudo ausearch -k passwd_changes > audit_logs.txt
sudo chown root:root audit_logs.txt
sudo chmod 600 audit_logs.txt
SELinux/AppArmor
AppArmor with Security:
bash
# Check AppArmor status
sudo aa-status
# Put a profile into enforce mode
sudo aa-enforce /etc/apparmor.d/usr.sbin.mysqld
# Check logs for AppArmor denials with secure permissions
sudo grep -i 'apparmor="DENIED"' /var/log/syslog > apparmor_denials.txt
sudo chown root:root apparmor_denials.txt
sudo chmod 600 apparmor_denials.txt
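Once denials have been collected, grouping them by profile makes it easier to see which profile needs attention. The sketch below assumes the standard AppArmor log fields apparmor="DENIED" and profile="..." and reads log text from standard input. The name apparmor_denials_by_profile is hypothetical.

```shell
# Count AppArmor denials per profile from log text on stdin.
# apparmor_denials_by_profile is a hypothetical helper name used only for this sketch.
apparmor_denials_by_profile() {
    # Keep denial lines, extract the profile="..." value,
    # then count per profile and print the most frequent first.
    grep 'apparmor="DENIED"' |
        sed -n 's/.*profile="\([^"]*\)".*/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{ count = $1; sub(/^ *[0-9]+ /, ""); print $0 ": " count }'
}
```

For example, sudo cat /var/log/syslog | apparmor_denials_by_profile prints each profile name followed by its denial count.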
10. Automation and Configuration Management
Ansible with Security
Ansible Playbook Example with Secure Configuration:
bash
# Install Ansible
sudo apt-get install ansible
# Secure Ansible configuration
sudo nano /etc/ansible/ansible.cfg
yaml
# Example playbook (save as playbook.yml)
---
- name: Install and start Apache securely
  hosts: webservers
  become: yes
  tasks:
    - name: Install Apache
      apt:
        name: apache2
        state: present
    - name: Start Apache
      service:
        name: apache2
        state: started
        enabled: yes
    - name: Secure Apache
      lineinfile:
        path: /etc/apache2/conf-available/security.conf
        regexp: '^#?ServerTokens'
        line: 'ServerTokens Prod'
        state: present
    # The Header directive needs mod_headers, which is not enabled by default
    - name: Enable mod_headers
      command: a2enmod headers
      args:
        creates: /etc/apache2/mods-enabled/headers.load
    - name: Enable security headers
      lineinfile:
        path: /etc/apache2/conf-available/security.conf
        line: 'Header always set X-Content-Type-Options "nosniff"'
        state: present
    - name: Restart Apache to apply the changes
      service:
        name: apache2
        state: restarted
Run the playbook with secure inventory:
bash
# Create secure inventory file
nano inventory.ini
[webservers]
server1 ansible_host=192.168.1.1
# Restrict the inventory to the user who runs Ansible (root ownership would make it unreadable to that user)
chmod 600 inventory.ini
# Run the playbook
ansible-playbook -i inventory.ini playbook.yml
Puppet with Security
Puppet Manifests Example with Secure Configuration:
bash
# Install Puppet
sudo apt-get install puppet
# Secure Puppet configuration
sudo nano /etc/puppet/puppet.conf
puppet
# Example manifest (save as /etc/puppet/manifests/site.pp)
node 'webserver' {
  package { 'apache2':
    ensure => installed,
  }
  service { 'apache2':
    ensure => running,
    enable => true,
  }
  file { '/etc/apache2/conf-available/security.conf':
    ensure  => present,
    # Double quotes are required for \n to be written as a newline
    content => "ServerTokens Prod\nHeader always set X-Content-Type-Options \"nosniff\"\n",
    mode    => '0644',
    owner   => 'root',
    group   => 'root',
  }
}
Apply the manifest with secure permissions:
bash
# Secure Puppet manifest file
sudo chown root:root /etc/puppet/manifests/site.pp
sudo chmod 600 /etc/puppet/manifests/site.pp
# Apply the manifest (the node block takes effect only on a host whose certname is 'webserver')
sudo puppet apply /etc/puppet/manifests/site.pp
11. High Availability and Load Balancing
HAProxy and Keepalived with Security
HAProxy Configuration with Security:
bash
# Install HAProxy
sudo apt-get install haproxy
# Secure HAProxy configuration
sudo nano /etc/haproxy/haproxy.cfg
frontend http_front
    bind *:80
    stats uri /haproxy?stats
    stats auth admin:strongpassword
    default_backend http_back
backend http_back
    balance roundrobin
    server web1 192.168.1.2:80 check
    server web2 192.168.1.3:80 check
# Secure HAProxy configuration file
sudo chown root:root /etc/haproxy/haproxy.cfg
sudo chmod 600 /etc/haproxy/haproxy.cfg
# Start HAProxy
sudo systemctl start haproxy
sudo systemctl enable haproxy
Keepalived Configuration with Security:
bash
# Install Keepalived
sudo apt-get install keepalived
# Secure Keepalived configuration
sudo nano /etc/keepalived/keepalived.conf
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    authentication {
        auth_type PASS
        # Placeholder; replace with your own secret (only the first 8 characters are used)
        auth_pass 1234
    }
    virtual_ipaddress {
        192.168.1.100
    }
}
# Secure Keepalived configuration file
sudo chown root:root /etc/keepalived/keepalived.conf
sudo chmod 600 /etc/keepalived/keepalived.conf
# Start Keepalived
sudo systemctl start keepalived
sudo systemctl enable keepalived
Corosync and Pacemaker with Security
Install Corosync and Pacemaker:
bash
# crmsh provides the crm command used later in this section
sudo apt-get install corosync pacemaker crmsh
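# Generate the shared authentication key, then copy /etc/corosync/authkey to every node
sudo corosync-keygen
# Secure Corosync configuration
sudo nano /etc/corosync/corosync.conf
# Example configuration
totem {
    version: 2
    cluster_name: mycluster
    transport: udpu
    # Authenticate and encrypt cluster traffic using /etc/corosync/authkey
    crypto_cipher: aes256
    crypto_hash: sha256
}
nodelist {
    node {
        ring0_addr: node1
        nodeid: 1
    }
    node {
        ring0_addr: node2
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
}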
# Secure Corosync configuration file
sudo chown root:root /etc/corosync/corosync.conf
sudo chmod 600 /etc/corosync/corosync.conf
# Start and enable Corosync
sudo systemctl start corosync
sudo systemctl enable corosync
# Define a cluster resource managed by Pacemaker (myservice is a placeholder for an LSB init script)
sudo crm configure primitive myservice lsb:myservice op monitor interval=30s
Implementing advanced troubleshooting techniques with a focus on security is essential for maintaining the integrity and efficiency of Ubuntu systems. By following the detailed steps and incorporating robust security measures outlined in this guide, Site Reliability Engineers can enhance their ability to diagnose and resolve issues while safeguarding their infrastructure against potential threats. This proactive approach not only ensures system reliability but also fortifies the overall security posture of their environments.
Your support means a lot!
Thank you so much for taking the time to review my project.
Fidel Vetino (the Mad Scientist)
Tech Innovator & Solution Engineer