Calculating CPU Utilization with PromQL (the Prometheus Query Language)

Reference: https://blog.csdn.net/qq_35753140/article/details/105121525

-------------- CPU utilization --------------

100 * (1 - sum by (instance)(increase(node_cpu_seconds_total{mode="idle"}[5m])) / sum by (instance)(increase(node_cpu_seconds_total[5m])))

This shows the CPU utilization of every host in a single panel, with one series per instance.
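Written as a formula, the query above computes one value per instance i. The notation is shorthand introduced only for this note: Δ_{i,c,m} stands for increase(node_cpu_seconds_total[5m]) for instance i, CPU c and mode m. The fraction is the idle share of all CPU time, and subtracting it from 1 gives the busy share.

$$
U_i = 100 \times \left(1 - \frac{\sum_{c} \Delta_{i,c,\mathrm{idle}}}{\sum_{c} \sum_{m} \Delta_{i,c,m}}\right)
$$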

Below are some notes on the reference document:

increase(node_cpu_seconds_total{cpu="0",mode="idle"}[5m])

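First, node_cpu_seconds_total is a counter of how many seconds each CPU has spent in each mode since the system booted. {cpu="0"} selects the first CPU, and {mode="idle"} selects the time that CPU spent in the idle state. [5m] selects all samples from the last 5 minutes, so the window starts 5 minutes ago.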
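An equality selector such as {cpu="0",mode="idle"} simply keeps the series whose labels match every listed value. A minimal sketch of that filtering in Python; the series and their counter values are hypothetical, and only exact equality matching is modelled (no regex or negative matchers).

```python
from typing import Dict, List, Tuple

# (labels, current counter value); the data below is hypothetical.
Series = Tuple[Dict[str, str], float]

def match_selector(series: List[Series], metric: str, **matchers: str) -> List[Series]:
    """Keep series with the given metric name whose labels equal every matcher."""
    return [
        (labels, value)
        for labels, value in series
        if labels.get("__name__") == metric
        and all(labels.get(name) == wanted for name, wanted in matchers.items())
    ]

series: List[Series] = [
    ({"__name__": "node_cpu_seconds_total", "cpu": "0", "mode": "idle"}, 13567.0),
    ({"__name__": "node_cpu_seconds_total", "cpu": "0", "mode": "user"}, 812.5),
    ({"__name__": "node_cpu_seconds_total", "cpu": "1", "mode": "idle"}, 13420.0),
]

# Equivalent in spirit to node_cpu_seconds_total{cpu="0",mode="idle"}
picked = match_selector(series, "node_cpu_seconds_total", cpu="0", mode="idle")
print(picked)
```

Dropping the cpu matcher (mode="idle" only) would keep one idle series per CPU, which is what the later sum() calculations rely on.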

increase() returns how much the counter grew across that window.

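Example (illustrative numbers): the host booted at 13:00, the counter read 13000 at 13:45, and it reads 13567 now, at 13:50.

[5m] therefore reaches back to 13:45.

increase() returns the growth between 13:45 and 13:50, i.e. 13567 - 13000 = 567. In practice Prometheus also extrapolates to the edges of the window and compensates for counter resets, so the result is an approximation and need not be an integer.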
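The idea behind increase() can be sketched as summing the growth between consecutive samples, treating any drop as a counter reset. This is a simplified model under stated assumptions: it does not reproduce Prometheus's extrapolation to the window edges, and the function name simple_increase is hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[str, float]  # (timestamp label, counter value)

def simple_increase(samples: List[Sample]) -> float:
    """Total counter growth across the samples, handling resets.

    Simplification: no extrapolation to the window boundaries.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # The counter went down (e.g. a reboot): assume it restarted
            # at 0 and has since grown to `cur`.
            total += cur
    return total

# The timeline from the text: 13000 at 13:45, 13567 at 13:50.
window = [("13:45", 13000.0), ("13:50", 13567.0)]
print(simple_increase(window))  # 567.0
```

With a reset in the middle, e.g. 100 → 150 → 20, the sketch counts 50 + 20 = 70 rather than a negative growth.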

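Fraction of the last 5 minutes that cpu0 spent idle (single host):

sum(increase(node_cpu_seconds_total{cpu="0",mode="idle"}[5m])) / sum(increase(node_cpu_seconds_total{cpu="0"}[5m]))

Both sides need sum(): the denominator returns one series per mode, and dividing without aggregation would match only its idle series, so the result would always be 1.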

The selector first restricts the series to CPU 0, and the range then restricts the window to the last 5 minutes.

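Suppose that for cpu0 the idle counter grew by 20 in these 5 minutes, while the total over all modes (user + system + idle + ...) grew by 500. These numbers are only illustrative: one CPU accrues roughly one second per second across all of its modes, so a real 5-minute total would be about 300.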

cpu0 was therefore idle for 20 / 500 = 4% of these 5 minutes.
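The single-CPU arithmetic is just the idle increase divided by the sum of the increases of all modes. A minimal sketch using the illustrative numbers from the text (idle = 20, total = 500); how the remaining 480 seconds split between user and system is a hypothetical choice, and idle_fraction is a hypothetical helper name.

```python
from typing import Dict

def idle_fraction(increases_by_mode: Dict[str, float]) -> float:
    """increase(idle) / increase(all modes) for a single CPU."""
    total = sum(increases_by_mode.values())
    return increases_by_mode["idle"] / total

# Hypothetical split of cpu0's 500 seconds: 20 idle, the rest busy.
cpu0 = {"idle": 20.0, "user": 300.0, "system": 180.0}
print(f"{idle_fraction(cpu0):.0%}")  # 4%
```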

A server may have four CPUs, but the calculation above covers only one of them. To cover all of them, sum across the CPUs:

sum(increase(node_cpu_seconds_total{mode="idle"}[5m])) / sum(increase(node_cpu_seconds_total[5m]))

increase(cpu0 idle [5m])  5-minute increase: 20
increase(cpu1 idle [5m])  5-minute increase: 30
increase(cpu2 idle [5m])  5-minute increase: 40
increase(cpu3 idle [5m])  5-minute increase: 70

sum() adds them up: 20 + 30 + 40 + 70 = 160

increase(cpu0 [5m])  5-minute increase: 1000
increase(cpu1 [5m])  5-minute increase: 1200
increase(cpu2 [5m])  5-minute increase: 1300
increase(cpu3 [5m])  5-minute increase: 1500

sum() adds them up: 1000 + 1200 + 1300 + 1500 = 5000

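The share of the server's total CPU time spent idle during those 5 minutes is therefore:

160 / 5000 = 3.2% (0.032), which corresponds to a CPU utilization of 100 * (1 - 0.032) = 96.8%.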
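The whole-server calculation sums the idle increases and the all-mode increases separately, then divides. A minimal sketch with the illustrative per-CPU numbers from the text; server_idle_ratio is a hypothetical helper name, not part of any Prometheus API.

```python
from typing import List

def server_idle_ratio(idle: List[float], total: List[float]) -> float:
    """sum(increase(idle)) / sum(increase(all modes)) across all CPUs."""
    return sum(idle) / sum(total)

idle_per_cpu = [20.0, 30.0, 40.0, 70.0]           # cpu0..cpu3, idle mode
total_per_cpu = [1000.0, 1200.0, 1300.0, 1500.0]  # cpu0..cpu3, all modes

ratio = server_idle_ratio(idle_per_cpu, total_per_cpu)
utilization = 100 * (1 - ratio)
print(f"idle {ratio:.1%}, utilization {utilization:.1f}%")  # idle 3.2%, utilization 96.8%
```

Summing before dividing weights each CPU by its own total time. Averaging the four per-CPU ratios instead would give a slightly different idle share with these illustrative numbers (about 3.1%), although on a real host every CPU accrues nearly the same total time, so the two approaches agree closely.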

If you want to monitor multiple hosts:

node_cpu_seconds_total returns the series of every monitored host, for example:

increase(cpu0 instance="localhost:8080" [5m])  5-minute increase: 1000
increase(cpu1 instance="localhost:8080" [5m])  5-minute increase: 1200
increase(cpu2 instance="localhost:8080" [5m])  5-minute increase: 1300
increase(cpu3 instance="localhost:8080" [5m])  5-minute increase: 1500
(these four form one group)

increase(cpu0 instance="localhost:8081" [5m])  5-minute increase: 1000
increase(cpu1 instance="localhost:8081" [5m])  5-minute increase: 1200
increase(cpu2 instance="localhost:8081" [5m])  5-minute increase: 1300
increase(cpu3 instance="localhost:8081" [5m])  5-minute increase: 1500
(these four form another group)

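A plain sum() would add all of these into a single number across both hosts. Instead, group by host and sum within each group:

sum by (instance) (...)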
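Putting the pieces together, sum by (instance) keeps one running total per host, and the utilization formula is applied to each host separately. A minimal sketch, assuming each dictionary value stands in for increase(...[5m]) of one (instance, cpu, mode) series; the numbers for localhost:8081 and the user/idle split are hypothetical, and utilization_by_instance is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, Tuple

Key = Tuple[str, str, str]  # (instance, cpu, mode)

def utilization_by_instance(increases: Dict[Key, float]) -> Dict[str, float]:
    """100 * (1 - idle / total), with both sums grouped by instance."""
    idle: Dict[str, float] = defaultdict(float)   # sum by (instance), idle mode only
    total: Dict[str, float] = defaultdict(float)  # sum by (instance), all modes
    for (instance, _cpu, mode), value in increases.items():
        total[instance] += value
        if mode == "idle":
            idle[instance] += value
    return {inst: 100 * (1 - idle[inst] / total[inst]) for inst in total}

increases: Dict[Key, float] = {
    ("localhost:8080", "0", "idle"): 20.0,
    ("localhost:8080", "0", "user"): 980.0,
    ("localhost:8080", "1", "idle"): 30.0,
    ("localhost:8080", "1", "user"): 1170.0,
    ("localhost:8081", "0", "idle"): 500.0,
    ("localhost:8081", "0", "user"): 500.0,
}

for inst, pct in sorted(utilization_by_instance(increases).items()):
    print(f"{inst}: {pct:.1f}%")
```

This prints 97.7% for localhost:8080 (50 idle seconds out of 2200) and 50.0% for localhost:8081. Without the grouping, the two hosts would be blended into a single idle/total ratio.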
