Prometheus

Introduction to Prometheus

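Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. It joined the Cloud Native Computing Foundation (CNCF) in 2016 as the foundation's second hosted project, after Kubernetes. Prometheus is known for its multidimensional data model, flexible query language, efficient time series database, and modern approach to alerting.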

Core Features

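Multidimensional data model: time series are identified by a metric name and key/value pairs (labels)
PromQL, a powerful query language: supports complex queries and analysis over the collected data
No reliance on distributed storage: single server nodes are autonomous
Time series collection through a pull model over HTTP
Pushing time series is supported through an intermediary gateway
Targets are discovered through service discovery or static configuration
Multiple modes of graphing and dashboarding, notably through Grafana integration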

Architecture Components

[Figure: Prometheus architecture diagram]

The Prometheus ecosystem consists of several components:

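Prometheus Server: the core component, which scrapes and stores time series data
Exporters: expose metrics for specific services, such as Node Exporter and MySQL Exporter
Pushgateway: accepts pushed metrics from short-lived jobs
Alertmanager: handles alerts (deduplication, grouping, and routing of notifications)
Web UI: the built-in expression browser and graphing interface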

Installing Prometheus

Installing with Docker

# Pull the Prometheus image
docker pull prom/prometheus

# Create the configuration directory
mkdir -p /etc/prometheus

# Create a basic configuration file
cat > /etc/prometheus/prometheus.yml << EOF
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
EOF

# Start the Prometheus container
docker run -d \
  --name prometheus \
  -p 9090:9090 \
  -v /etc/prometheus:/etc/prometheus \
  prom/prometheus \
  --config.file=/etc/prometheus/prometheus.yml

Installing from the Binary Release

# Download a release (v2.37.0 shown here; check the releases page for the current version)
wget https://github.com/prometheus/prometheus/releases/download/v2.37.0/prometheus-2.37.0.linux-amd64.tar.gz

# Extract the archive
tar -xvf prometheus-2.37.0.linux-amd64.tar.gz
cd prometheus-2.37.0.linux-amd64/

# Start Prometheus
./prometheus --config.file=prometheus.yml
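
Once the server is running, one way to confirm that it is scraping is to ask its HTTP API for the built-in up metric. The following is a minimal Python sketch using only the standard library. The base URL http://localhost:9090 and the helper names build_query_url, parse_instant_vector, and query are assumptions made for illustration; SAMPLE is a hand-written response in the documented shape, not captured output.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def parse_instant_vector(body: str) -> dict:
    """Map each series' label set (as a sorted tuple of pairs) to its float value."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    result = {}
    for series in payload["data"]["result"]:
        labels = tuple(sorted(series["metric"].items()))
        _timestamp, value = series["value"]  # the value arrives as a string
        result[labels] = float(value)
    return result


def query(base_url: str, promql: str, timeout: float = 5.0) -> dict:
    """Run an instant query against a live server (requires network access)."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_instant_vector(resp.read().decode("utf-8"))


# Hand-written sample response so the parser can be tried without a server.
SAMPLE = (
    '{"status":"success","data":{"resultType":"vector","result":['
    '{"metric":{"__name__":"up","instance":"localhost:9090","job":"prometheus"},'
    '"value":[1700000000.0,"1"]}]}}'
)

if __name__ == "__main__":
    for labels, value in parse_instant_vector(SAMPLE).items():
        print(dict(labels), value)
```

Against a live server, query("http://localhost:9090", "up") would return one entry per scrape target, with 1.0 for targets whose last scrape succeeded and 0.0 otherwise.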

Basic Concepts

1. Data Model

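All data stored in Prometheus is time series data. Each series is identified by its metric name and labels, and each sample in a series consists of a timestamp and a value:

Metric name: for example, http_requests_total
Labels: key/value pairs, for example {method="GET", endpoint="/api"}
Timestamp: a point in time with millisecond precision
Value: a 64-bit floating-point number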
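
To make the identity rule concrete, here is a small Python sketch that renders a sample in the style of the Prometheus text exposition format. The function names escape_label_value, series_id, and render_sample are hypothetical helpers written for this illustration, and the sketch does not validate metric or label names.

```python
from typing import Mapping, Optional


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def series_id(name: str, labels: Mapping[str, str]) -> str:
    """Metric name plus sorted labels: the part that identifies a time series."""
    if not labels:
        return name
    pairs = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}}"


def render_sample(name: str, labels: Mapping[str, str], value: float,
                  timestamp_ms: Optional[int] = None) -> str:
    """Render one sample line: series identity, float value, optional ms timestamp."""
    line = f"{series_id(name, labels)} {float(value)!r}"
    if timestamp_ms is not None:
        line += f" {timestamp_ms}"
    return line


if __name__ == "__main__":
    # Same metric name, different label values: two distinct time series.
    print(render_sample("http_requests_total", {"method": "GET", "endpoint": "/api"}, 1027))
    print(render_sample("http_requests_total", {"method": "POST", "endpoint": "/api"}, 3))
```

Changing any label value produces a different series identity, which is why labels with many possible values multiply the number of stored series.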

2. Metric Types

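Counter: a cumulative value that only increases (or resets to zero on restart), such as the total number of requests
Gauge: a value that can go up and down, such as memory usage
Histogram: samples observations and counts them in configurable buckets, also exposing the sum and count of all observations
Summary: similar to Histogram, but computes configurable quantiles on the client side over a sliding time window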
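
The behavior of the first three types can be sketched with a few toy Python classes. These are not the official client library; the class names and methods below are simplified stand-ins written for illustration, and Summary is omitted because its sliding-window quantile estimation does not fit in a short example.

```python
import math


class Counter:
    """Toy counter: the value may only increase."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """Toy gauge: the value may go up or down, or be set directly."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = float(value)

    def inc(self, amount: float = 1.0) -> None:
        self.value += amount

    def dec(self, amount: float = 1.0) -> None:
        self.value -= amount


class Histogram:
    """Toy histogram with cumulative buckets, plus a running sum and count."""

    def __init__(self, upper_bounds) -> None:
        # An implicit +Inf bucket always exists and counts every observation.
        self.upper_bounds = sorted(upper_bounds) + [math.inf]
        self.bucket_counts = [0] * len(self.upper_bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        self.sum += value
        self.count += 1
        for i, bound in enumerate(self.upper_bounds):
            if value <= bound:
                # Cumulative: every bucket whose upper bound covers the value.
                self.bucket_counts[i] += 1


if __name__ == "__main__":
    latency = Histogram([0.1, 0.5, 1.0])
    for seconds in (0.05, 0.3, 0.7, 2.0):
        latency.observe(seconds)
    print(dict(zip(latency.upper_bounds, latency.bucket_counts)))
```

Because buckets are cumulative, the count in each bucket includes all observations from the smaller buckets, and the +Inf bucket always equals the total count.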

3. The PromQL Query Language

PromQL is the Prometheus query language, used to select and aggregate time series data:

# Select all series of the HTTP request counter
http_requests_total

# Filter by the method label
http_requests_total{method="GET"}

# Per-second request rate, averaged over the last 5 minutes
rate(http_requests_total[5m])

# Per-instance CPU usage in cores (non-idle CPU seconds consumed per second)
sum(rate(node_cpu_seconds_total{mode!="idle"}[1m])) by (instance)
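
To show roughly what rate() does with a counter, here is a simplified Python sketch. It is an approximation for illustration only: the real function also extrapolates to the edges of the range window, which this version skips. The name simple_rate and the sample data are assumptions.

```python
from typing import List, Optional, Tuple


def simple_rate(samples: List[Tuple[float, float]]) -> Optional[float]:
    """Approximate per-second increase of a counter over the given samples.

    samples: (timestamp_seconds, counter_value) pairs ordered by time.
    Returns None when fewer than two samples are available.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            # Counter reset (for example a process restart): count from zero.
            increase += value
        else:
            increase += value - previous
        previous = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


if __name__ == "__main__":
    # Scraped every 15 seconds, growing by 30 requests per scrape.
    steady = [(0, 100), (15, 130), (30, 160), (45, 190), (60, 220)]
    print(simple_rate(steady))
    # The counter drops from 130 to 10, which is treated as a reset.
    with_reset = [(0, 100), (15, 130), (30, 10), (45, 40)]
    print(simple_rate(with_reset))
```

Handling resets this way is why rate() should be applied to counters rather than gauges: a drop in a gauge would be misread as a restart.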

Monitoring Example: Node Exporter

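Node Exporter is the exporter maintained by the Prometheus project for host-level hardware and operating system metrics on Linux and other Unix-like systems.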

Installing Node Exporter

# Using Docker
docker run -d \
  --name node-exporter \
  --net="host" \
  --pid="host" \
  -v "/:/host:ro,rslave" \
  prom/node-exporter \
  --path.rootfs=/host

# Or using the binary release
wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
tar -xvf node_exporter-1.3.1.linux-amd64.tar.gz
cd node_exporter-1.3.1.linux-amd64/
./node_exporter

Configuring Prometheus to Scrape Node Exporter

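Add the following job to the scrape_configs section of prometheus.yml (merge it into the existing scrape_configs list rather than adding a second scrape_configs key):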

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
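
What Prometheus receives from that target is plain text served at /metrics. The Python sketch below parses a few such lines to show the structure. It is deliberately simplified: it does not unescape label values, it ignores any line it cannot match, and the SAMPLE text is hand-written rather than real Node Exporter output. The function names and the default URL http://localhost:9100/metrics are assumptions for illustration.

```python
import re
import urllib.request
from typing import Dict, List, Tuple

# metric_name{label="value",...} value [timestamp]
LINE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+(-?\d+))?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_exposition(text: str) -> List[Sample]:
    """Parse metric lines into (name, labels, value); skip comments and blanks."""
    samples: List[Sample] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE comments and empty lines
        match = LINE_RE.match(line)
        if match is None:
            continue  # this sketch ignores lines it does not understand
        name, label_blob, value, _timestamp = match.groups()
        labels = dict(LABEL_RE.findall(label_blob or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url: str = "http://localhost:9100/metrics", timeout: float = 5.0) -> List[Sample]:
    """Fetch and parse a live metrics endpoint (requires a running exporter)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_exposition(resp.read().decode("utf-8"))


# Hand-written example in the style of Node Exporter output.
SAMPLE = """
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_cpu_seconds_total{cpu="0",mode="user"} 234.5
"""

if __name__ == "__main__":
    for name, labels, value in parse_exposition(SAMPLE):
        print(name, labels, value)
```

At scrape time Prometheus additionally attaches the job and instance labels from the scrape configuration to every sample it ingests.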

Alerting Configuration

Creating Alert Rules

Create alert.rules.yml in the same directory as prometheus.yml:

groups:
- name: example
  rules:
  - alert: HighCPULoad
    expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High CPU load (instance {{ $labels.instance }})"
      description: "CPU load is > 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
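
The for: 5m clause means the expression must stay true across consecutive evaluations for five minutes before the alert fires; until then the alert is pending. The Python sketch below models that state progression for a single series. The function name alert_states and the evaluation data are assumptions for illustration, and the model leaves out details such as rule-group scheduling.

```python
from typing import List, Tuple


def alert_states(evaluations: List[Tuple[float, bool]], for_seconds: float) -> List[str]:
    """Return the alert state after each rule evaluation.

    evaluations: (timestamp_seconds, condition_is_true) pairs ordered by time.
    """
    states: List[str] = []
    active_since = None  # when the condition first became true in this streak
    for timestamp, condition in evaluations:
        if not condition:
            # Any false evaluation ends the streak and clears the alert.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        held_for = timestamp - active_since
        states.append("firing" if held_for >= for_seconds else "pending")
    return states


if __name__ == "__main__":
    # One evaluation per minute; CPU above the threshold until the last one.
    evals = [(0, True), (60, True), (120, True), (180, True),
             (240, True), (300, True), (360, False)]
    print(alert_states(evals, for_seconds=300))
```

A single false evaluation restarts the pending timer, so a brief dip below the threshold delays firing by the full for duration.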

Referencing the Rule File in prometheus.yml

rule_files:
  - "alert.rules.yml"

Integrating with Grafana

Grafana is a widely used tool for visualizing monitoring data and works well with Prometheus:

Install Grafana:

docker run -d \
  --name grafana \
  -p 3000:3000 \
  grafana/grafana

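Add the Prometheus data source:
- Open http://localhost:3000 (default username/password: admin/admin)
- Go to Configuration -> Data Sources -> Add data source -> select Prometheus
- Set the URL to http://prometheus:9090 if Grafana and Prometheus share a Docker network, or to http://localhost:9090 if Grafana runs directly on the same host as Prometheus
Import a dashboard:
- grafana.com hosts many prebuilt dashboards, such as the Node Exporter dashboard (ID: 1860)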

Best Practices

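Set scrape intervals sensibly: balance monitoring needs against resource consumption
Use labels for multidimensional analysis: take full advantage of the Prometheus label system
Set an appropriate retention period: avoid excessive storage consumption (the --storage.tsdb.retention.time flag controls this and defaults to 15d)
Implement high availability: deploy a high-availability setup for Prometheus in critical environments
Back up data regularly: the --storage.tsdb.path flag sets the data directory; back up that directory, preferably from a TSDB snapshot rather than from live files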
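
Scrape interval and retention together determine disk usage. The Python sketch below applies the rough sizing rule from the Prometheus storage documentation: retention time in seconds, times ingested samples per second, times bytes per sample, where a sample typically takes about 1 to 2 bytes after compression. The function name and the example figures (100,000 active series) are assumptions for illustration, and real usage varies with data shape and churn.

```python
def estimate_disk_bytes(retention_days: float, active_series: int,
                        scrape_interval_seconds: float,
                        bytes_per_sample: float = 2.0) -> float:
    """Rough disk estimate: retention seconds * samples per second * bytes per sample."""
    samples_per_second = active_series / scrape_interval_seconds
    retention_seconds = retention_days * 86400
    return retention_seconds * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical setup: 100,000 series scraped every 15s, kept for 15 days.
    estimate = estimate_disk_bytes(15, 100_000, 15)
    print(f"{estimate / 1e9:.2f} GB")
```

Doubling the scrape interval or halving the retention period halves the estimate, which makes these two settings the first ones to revisit when storage runs short.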

Troubleshooting Common Issues

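Targets cannot be scraped: check network connectivity, firewall settings, and the state of the target service (the Status -> Targets page in the web UI shows the last scrape error)
Incomplete data: check the scrape interval and the retention policy
High memory usage: reduce query complexity and the number of concurrent queries
Alerts not firing: verify the alert rule expression, its threshold, and its for duration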