# MySQL主从复制监控脚本完整实现:从入门到生产级方案
## 摘要
本文深入讲解MySQL主从复制监控的完整实现,从基础原理到生产级监控方案。涵盖Shell脚本开发、PHP可视化面板、Prometheus集成、智能告警系统等核心技术。基于真实生产环境经验,提供可直接使用的监控模板和故障排查指南。
## 背景与问题
### 为什么需要MySQL复制监控?
根据我的经验,在生产环境中,MySQL主从复制是保证数据高可用、读写分离、数据备份的核心架构。但复制过程可能出现的异常往往导致严重后果:
#### 真实案例1:复制延迟导致的业务故障
某电商网站在大促期间,从库复制延迟达到30分钟,导致用户在商品页面看到的库存数据和实际库存不一致,引发超卖事故,直接损失超过50万元。
**根本原因分析**:
– 主库大量写入操作(订单创建、库存扣减)
– 从库硬件性能不足(单核CPU、5400转机械硬盘)
– 缺乏复制延迟监控,未能及时发现
#### 真实案例2:复制中断导致的数据丢失
某SaaS平台的从库复制中断72小时无人察觉,当主库磁盘故障时,从库数据落后3天,无法及时接管业务,导致服务中断24小时。
**根本原因分析**:
– 网络抖动导致IO线程停止
– 没有配置复制监控告警
– 缺乏定期主从数据一致性校验
### MySQL复制常见异常类型
#### 1. 复制延迟(Replication Lag)
**症状**:
– 从库Seconds_Behind_Master值持续增长
– 用户查询到旧数据
– 读写分离失效
**影响**:
– 数据一致性无法保证
– 业务决策基于错误数据
– 用户体验下降
#### 2. 复制中断(Replication Failure)
**症状**:
– Slave_IO_Running: No
– Slave_SQL_Running: No
– Last_Error字段有错误信息
**常见原因**:
– 网络连接中断
– 主库binlog丢失
– 从库重复键冲突
– 从库执行出错
#### 3. 数据不一致(Data Inconsistency)
**症状**:
– 主从记录数不一致
– 同一记录数据不同
– 索引或约束错误
**检测方法**:
“`bash
# 使用pt-table-checksum工具
pt-table-checksum –host=master –user=root –password=xxx
–databases=mydb –replicate=test.checksums
“`
—
## MySQL复制原理深度解析
### MySQL复制工作流程
“`
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ 主库 │ │ 主库 │ │ 网络 │ │ 从库 │
│ 数据变更 │ ─── │ Binlog │ ─── │ 传输 │ ─── │ Relay │
│ (DML/DDL)│ │ 写入 │ │ Binlog │ │ Log │
└─────────┘ └─────────┘ └─────────┘ └─────────┘
│
▼
┌─────────┐
│ SQL │
│ 线程 │
│ 重放 │
└─────────┘
“`
#### 关键组件详解
**1. Binlog(Binary Log)**
– 记录所有数据变更语句(DML、DDL)
– 三种格式:Statement、Row、Mixed
– 位置:`/var/lib/mysql/mysql-bin.00000x`
**2. IO线程**
– 从库连接主库
– 读取主库Binlog
– 写入Relay Log
**3. SQL线程**
– 读取Relay Log
– 执行SQL语句
– 应用数据变更
**4. Relay Log**
– 中继日志
– 格式与Binlog相同
– 位置:`/var/lib/mysql/relay-bin.00000x`
### GTID模式下的复制
**传统复制 vs GTID复制**
| 特性 | 传统复制 | GTID复制 |
|——|———|———-|
| 位置标记 | Binlog文件名+位置 | 全局事务ID |
| 故障恢复 | 需要手动查找位置 | 自动定位 |
| 多源复制 | 复杂 | 简化 |
| 主从切换 | 需要计算位置 | 自动同步 |
**GTID格式**:
“`
server_id:transaction_id
例如:3E11FA47-71CA-11E1-9E33-C80AA9429562:23
“`
**检查GTID状态**:
“`sql
— 查看GTID是否开启
SHOW VARIABLES LIKE ‘gtid_mode’;
— 查看已执行的GTID
SHOW MASTER STATUS;
— 查看从库GTID执行情况
SELECT * FROM performance_schema.replication_connection_status;
“`
—
## 监控指标体系设计
### 核心监控指标
#### 1. 复制状态指标
“`sql
— 从库状态查询
SHOW SLAVE STATUSG
**关键指标解读**:
– Slave_IO_Running: IO线程状态
– Yes: 正常
– No: 异常
– Connecting: 连接中
– Slave_SQL_Running: SQL线程状态
– Yes: 正常
– No: 异常
– Seconds_Behind_Master: 复制延迟(秒)
– 0: 无延迟
– NULL: 复制停止
– >0: 延迟秒数
– Last_IO_Error: IO线程最后错误
– Last_SQL_Error: SQL线程最后错误
– Read_Master_Log_Pos: 已读取Binlog位置
– Exec_Master_Log_Pos: 已执行Binlog位置
– Relay_Log_Pos: Relay Log当前位置
“`
#### 2. 性能指标
“`sql
— 主库吞吐量
SHOW STATUS LIKE ‘Com_%’;
— Binlog大小
SHOW BINARY LOGS;
— 从库Relay Log大小
SHOW RELAYLOG EVENTS;
“`
#### 3. 数据一致性指标
“`sql
— 表记录数对比
SELECT COUNT(*) FROM users;
— 校验和对比(使用pt-table-checksum)
“`
### 监控指标阈值设定
| 指标 | 正常值 | 警告值 | 严重值 |
|——|——–|——–|——–|
| Seconds_Behind_Master | 0-5秒 | 5-60秒 | >60秒 |
| Slave_IO_Running | Yes | – | No |
| Slave_SQL_Running | Yes | – | No |
| 复制延迟率 | 5% |
| Binlog空间使用 | 90% |
—
## 基础监控脚本实现
### 脚本1:复制状态检查脚本
创建文件:`/server/scripts/mysql-replication-monitor.sh`
“`bash
#!/bin/bash
###############################################################################
# MySQL主从复制监控脚本(生产级版本)
# 功能:检查复制状态、延迟、错误,发送告警通知
# 版本:v2.0
# 更新:2026-03-19
###############################################################################
# 配置参数
MYSQL_USER=”root”
MYSQL_PASS=”RootPass2026StrongSecure789XYZ”
MYSQL_PORT=3306
WEBHOOK_URL=”https://www.chencunli.com/webhook-feishu.php”
LOG_FILE=”/var/log/mysql-replication-monitor.log”
NOTICE_DIR=”/tmp/mysql-replication-notice”
MAX_DELAY=60 # 最大允许延迟(秒)
# 创建必要目录
mkdir -p “$NOTICE_DIR”
# 日志函数
log() {
echo “[$(date ‘+%Y-%m-%d %H:%M:%S’)] $1” | tee -a “$LOG_FILE”
}
# 发送告警通知
send_alert() {
local level=$1 # INFO, WARNING, CRITICAL
local title=$2
local msg=$3
# 发送飞书通知
curl -s “$WEBHOOK_URL”
-G –data-urlencode “msg=$msg”
–data-urlencode “title=${level} ${title}”
# 记录日志
log “[${level}] ${title}: ${msg}”
}
# 检查复制状态
check_replication() {
log “========== 开始检查复制状态 ==========”
# 获取从库状态
slave_status=$(mysql -u$MYSQL_USER -p$MYSQL_PASS -P $MYSQL_PORT -e “SHOW SLAVE STATUSG” 2>/dev/null)
if [ -z “$slave_status” ]; then
send_alert “CRITICAL” “? MySQL复制检查失败” “无法获取MySQL从库状态,请检查MySQL服务是否正常运行”
return 1
fi
# 提取关键指标
io_running=$(echo “$slave_status” | grep “Slave_IO_Running:” | awk ‘{print $2}’)
sql_running=$(echo “$slave_status” | grep “Slave_SQL_Running:” | awk ‘{print $2}’)
behind_master=$(echo “$slave_status” | grep “Seconds_Behind_Master:” | awk ‘{print $2}’)
last_io_error=$(echo “$slave_status” | grep “Last_IO_Error:” | cut -d: -f2-)
last_sql_error=$(echo “$slave_status” | grep “Last_SQL_Error:” | cut -d: -f2-)
master_host=$(echo “$slave_status” | grep “Master_Host:” | awk ‘{print $2}’)
log “主库: $master_host”
log “IO线程: $io_running”
log “SQL线程: $sql_running”
log “复制延迟: ${behind_master}秒”
# 检查IO线程
if [ “$io_running” != “Yes” ]; then
if [ ! -f “$NOTICE_DIR/io_thread” ]; then
send_alert “CRITICAL” “? MySQL复制IO线程异常”
“主库: $master_hostn错误: $last_io_errornn请立即检查网络连接和主库状态!”
touch “$NOTICE_DIR/io_thread”
fi
else
rm -f “$NOTICE_DIR/io_thread”
log “✅ IO线程正常”
fi
# 检查SQL线程
if [ “$sql_running” != “Yes” ]; then
if [ ! -f “$NOTICE_DIR/sql_thread” ]; then
send_alert “CRITICAL” “? MySQL复制SQL线程异常”
“主库: $master_hostn错误: $last_sql_errornn请检查从库数据完整性!”
touch “$NOTICE_DIR/sql_thread”
fi
else
rm -f “$NOTICE_DIR/sql_thread”
log “✅ SQL线程正常”
fi
# 检查复制延迟
if [ “$behind_master” != “NULL” ]; then
if [ $behind_master -gt $MAX_DELAY ]; then
if [ ! -f “$NOTICE_DIR/delay” ]; then
send_alert “WARNING” “⚠️ MySQL复制延迟告警”
“主库: $master_hostn当前延迟: ${behind_master}秒n阈值: ${MAX_DELAY}秒nn请检查从库性能和网络带宽!”
touch “$NOTICE_DIR/delay”
fi
else
rm -f “$NOTICE_DIR/delay”
log “✅ 复制延迟正常 (${behind_master}秒)”
fi
else
if [ ! -f “$NOTICE_DIR/replication_stopped” ]; then
send_alert “CRITICAL” “? MySQL复制已停止”
“主库: $master_hostnn复制已完全停止,请立即处理!”
touch “$NOTICE_DIR/replication_stopped”
fi
fi
log “========== 检查完成 ==========”
echo “”
}
# 执行检查
check_replication
# 返回复制状态供其他脚本调用
if [ “$io_running” = “Yes” ] && [ “$sql_running” = “Yes” ]; then
exit 0
else
exit 1
fi
“`
### 脚本2:主从数据一致性检查
创建文件:`/server/scripts/mysql-data-consistency-check.sh`
“`bash
#!/bin/bash
###############################################################################
# MySQL主从数据一致性检查脚本
# 功能:对比主从表记录数,检测数据不一致
###############################################################################
MYSQL_USER=”root”
MYSQL_PASS=”RootPass2026StrongSecure789XYZ”
MASTER_HOST=”localhost”
SLAVE_HOSTS=(“slave1.your-domain.com” “slave2.your-domain.com”)
DATABASES=(“mydb” “wordpress” “laravel_app”)
LOG_FILE=”/var/log/mysql-consistency-check.log”
log() {
echo “[$(date ‘+%Y-%m-%d %H:%M:%S’)] $1” | tee -a “$LOG_FILE”
}
check_table_consistency() {
local database=$1
local table=$2
local master_count=$3
for slave_host in “${SLAVE_HOSTS[@]}”; do
slave_count=$(mysql -h $slave_host -u$MYSQL_USER -p$MYSQL_PASS
$database -e “SELECT COUNT(*) FROM $table” -s -N 2>/dev/null)
if [ “$slave_count” != “$master_count” ]; then
log “⚠️ 数据不一致: $database.$table”
log ” 主库: $master_count, 从库($slave_host): $slave_count”
# 发送告警
curl -s “https://www.chencunli.com/webhook-feishu.php”
-G –data-urlencode “msg=数据不一致: $database.$tablen主库: $master_count, 从库: $slave_count”
–data-urlencode “title=? MySQL数据一致性告警”
fi
done
}
# 检查所有数据库的所有表
for db in “${DATABASES[@]}”; do
log “检查数据库: $db”
# 获取所有表名
tables=$(mysql -u$MYSQL_USER -p$MYSQL_PASS $db -e “SHOW TABLES” -s -N 2>/dev/null)
for table in $tables; do
# 获取主库记录数
master_count=$(mysql -u$MYSQL_USER -p$MYSQL_PASS $db -e “SELECT COUNT(*) FROM $table” -s -N 2>/dev/null)
# 检查从库一致性
check_table_consistency “$db” “$table” “$master_count”
done
done
log “数据一致性检查完成”
“`
### 脚本3:Binlog空间监控
创建文件:`/server/scripts/mysql-binlog-monitor.sh`
“`bash
#!/bin/bash
###############################################################################
# MySQL Binlog空间监控脚本
# 功能:监控Binlog磁盘使用,自动清理过期日志
###############################################################################
MYSQL_USER=”root”
MYSQL_PASS=”RootPass2026StrongSecure789XYZ”
BINLOG_DIR=”/var/lib/mysql”
MAX_DISK_USAGE=80 # 最大磁盘使用率(%)
BINLOG_EXPIRE_DAYS=7 # Binlog保留天数
# 检查磁盘使用率
disk_usage=$(df $BINLOG_DIR | tail -1 | awk ‘{print $5}’ | sed ‘s/%//’)
if [ $disk_usage -gt $MAX_DISK_USAGE ]; then
echo “⚠️ 磁盘使用率: ${disk_usage}%”
# 清理过期Binlog
mysql -u$MYSQL_USER -p$MYSQL_PASS -e “PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL $BINLOG_EXPIRE_DAYS DAY)”
echo “✅ 已清理${BINLOG_EXPIRE_DAYS}天前的Binlog”
# 发送通知
curl -s “https://www.chencunli.com/webhook-feishu.php”
-G –data-urlencode “msg=Binlog已自动清理n磁盘使用率: ${disk_usage}%”
–data-urlencode “title=? MySQL Binlog清理通知”
fi
# 显示当前Binlog列表
mysql -u$MYSQL_USER -p$MYSQL_PASS -e “SHOW BINARY LOGS”
“`
—
## 生产级监控方案
### 方案1:系统级监控(使用systemd)
创建服务文件:`/etc/systemd/system/mysql-replication-monitor.service`
“`ini
[Unit]
Description=MySQL Replication Monitoring Service
After=mysql.service
Requires=mysql.service
[Service]
Type=oneshot
ExecStart=/server/scripts/mysql-replication-monitor.sh
User=root
Group=root
[Install]
WantedBy=multi-user.target
“`
创建定时器:`/etc/systemd/system/mysql-replication-monitor.timer`
“`ini
[Unit]
Description=MySQL Replication Monitoring Timer
Requires=mysql-replication-monitor.service
[Timer]
OnBootSec=5min
OnUnitActiveSec=2min
AccuracySec=1s
[Install]
WantedBy=timers.target
“`
启用服务:
“`bash
# 重载systemd配置
systemctl daemon-reload
# 启动定时器
systemctl enable mysql-replication-monitor.timer
systemctl start mysql-replication-monitor.timer
# 查看状态
systemctl status mysql-replication-monitor.timer
# 查看日志
journalctl -u mysql-replication-monitor.service -f
“`
### 方案2:Supervisor进程管理
安装Supervisor:
“`bash
yum install -y supervisor
“`
配置文件:`/etc/supervisord.d/mysql-replication-monitor.ini`
“`ini
[program:mysql-replication-monitor]
command=/server/scripts/mysql-replication-monitor.sh
directory=/server/scripts
user=root
autostart=true
autorestart=true
startsecs=10
startretries=3
stderr_logfile=/var/log/mysql-replication-monitor.err.log
stdout_logfile=/var/log/mysql-replication-monitor.out.log
“`
启动服务:
“`bash
# 重载配置
supervisorctl reread
supervisorctl update
# 启动服务
supervisorctl start mysql-replication-monitor
# 查看状态
supervisorctl status mysql-replication-monitor
“`
—
## 可视化监控面板
### PHP监控面板实现
创建文件:`/server/www-htdocs/wordpress/mysql-monitor.php`
“`php
[
‘host’ => ‘localhost’,
‘user’ => ‘root’,
‘pass’ => ‘RootPass2026StrongSecure789XYZ’,
‘name’ => ‘主库1’,
],
‘slave1’ => [
‘host’ => ‘slave1.your-domain.com’,
‘user’ => ‘root’,
‘pass’ => ‘RootPass2026StrongSecure789XYZ’,
‘name’ => ‘从库1’,
],
‘slave2’ => [
‘host’ => ‘slave2.your-domain.com’,
‘user’ => ‘root’,
‘pass’ => ‘RootPass2026StrongSecure789XYZ’,
‘name’ => ‘从库2’,
],
];
/**
* 获取MySQL从库状态
*/
function getSlaveStatus($config) {
try {
$mysqli = new mysqli(
$config[‘host’],
$config[‘user’],
$config[‘pass’]
);
if ($mysqli->connect_error) {
throw new Exception($mysqli->connect_error);
}
$result = $mysqli->query(“SHOW SLAVE STATUS”);
if (!$result) {
throw new Exception($mysqli->error);
}
$status = $result->fetch_assoc();
// 添加额外信息
$status[‘server_name’] = $config[‘name’];
$status[‘host’] = $config[‘host’];
$mysqli->close();
return $status;
} catch (Exception $e) {
return [
‘error’ => $e->getMessage(),
‘server_name’ => $config[‘name’],
‘host’ => $config[‘host’],
];
}
}
/**
* 获取主库Binlog信息
*/
function getMasterStatus($config) {
try {
$mysqli = new mysqli(
$config[‘host’],
$config[‘user’],
$config[‘pass’]
);
$result = $mysqli->query(“SHOW MASTER STATUS”);
if (!$result) {
throw new Exception($mysqli->error);
}
$status = $result->fetch_assoc();
$status[‘server_name’] = $config[‘name’];
// 获取Binlog列表
$binlogs = $mysqli->query(“SHOW BINARY LOGS”);
$status[‘binlog_count’] = $binlogs->num_rows;
$mysqli->close();
return $status;
} catch (Exception $e) {
return [
‘error’ => $e->getMessage(),
‘server_name’ => $config[‘name’],
];
}
}
/**
* 获取系统信息
*/
function getSystemInfo() {
// CPU使用率
$load = sys_getloadavg();
$cpu_usage = $load[0] * 100;
// 内存使用率
$meminfo = file_get_contents(‘/proc/meminfo’);
preg_match(‘/MemTotal:s+(d+)/’, $meminfo, $total);
preg_match(‘/MemAvailable:s+(d+)/’, $meminfo, $avail);
$mem_usage = (($total[1] – $avail[1]) / $total[1]) * 100;
// 磁盘使用率
$df = disk_free_space(‘/’);
$dt = disk_total_space(‘/’);
$disk_usage = (($dt – $df) / $dt) * 100;
return [
‘cpu’ => round($cpu_usage, 2),
‘memory’ => round($mem_usage, 2),
‘disk’ => round($disk_usage, 2),
];
}
// 获取所有从库状态
$slave_statuses = [];
foreach ($mysql_configs as $key => $config) {
if ($key !== ‘master’) {
$slave_statuses[] = getSlaveStatus($config);
}
}
// 获取主库状态
$master_status = getMasterStatus($mysql_configs[‘master’]);
// 获取系统信息
$system_info = getSystemInfo();
?>
* { margin: 0; padding: 0; box-sizing: border-box; }
body {
font-family: -apple-system, BlinkMacSystemFont, “Segoe UI”, Roboto, “Helvetica Neue”, Arial, sans-serif;
background: #f5f5f5;
padding: 20px;
}
.container {
max-width: 1400px;
margin: 0 auto;
}
.header {
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
color: white;
padding: 30px;
border-radius: 10px;
margin-bottom: 30px;
box-shadow: 0 4px 6px rgba(0,0,0,0.1);
}
.header h1 {
font-size: 28px;
margin-bottom: 10px;
}
.header p {
opacity: 0.9;
}
.dashboard {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(300px, 1fr));
gap: 20px;
margin-bottom: 30px;
}
.card {
background: white;
border-radius: 10px;
padding: 20px;
box-shadow: 0 2px 4px rgba(0,0,0,0.1);
}
.card h3 {
color: #333;
margin-bottom: 15px;
font-size: 16px;
}
.metric {
display: flex;
justify-content: space-between;
align-items: center;
padding: 10px 0;
border-bottom: 1px solid #eee;
}
.metric:last-child {
border-bottom: none;
}
.metric-label {
color: #666;
font-size: 14px;
}
.metric-value {
font-weight: bold;
font-size: 16px;
}
.status-ok { color: #52c41a; }
.status-warning { color: #faad14; }
.status-error { color: #f5222d; }
.progress-bar {
width: 100%;
height: 8px;
background: #f0f0f0;
border-radius: 4px;
overflow: hidden;
margin-top: 5px;
}
.progress-fill {
height: 100%;
background: linear-gradient(90deg, #52c41a 0%, #73d13d 100%);
transition: width 0.3s ease;
}
.progress-fill.warning {
background: linear-gradient(90deg, #faad14 0%, #ffc53d 100%);
}
.progress-fill.error {
background: linear-gradient(90deg, #f5222d 0%, #ff4d4f 100%);
}
.refresh-btn {
background: #1890ff;
color: white;
border: none;
padding: 10px 20px;
border-radius: 5px;
cursor: pointer;
font-size: 14px;
margin-bottom: 20px;
}
.refresh-btn:hover {
background: #40a9ff;
}
.error-box {
background: #fff2f0;
border: 1px solid #ffccc7;
padding: 15px;
border-radius: 5px;
margin-top: 10px;
color: #f5222d;
font-size: 14px;
}
.timestamp {
text-align: right;
color: #999;
font-size: 12px;
margin-top: 20px;
}
? MySQL主从复制监控面板
实时监控MySQL复制状态、延迟和性能指标
? 系统资源
%
%
%