欢迎光临
我们一直在努力

数据库性能优化分析案例—解决SQL语句过度消耗CPU问题-数据库专栏,SQL Server

建站超值云服务器,限时71元/月

问题描述:
10月25日上午滨州网通的工程师报告oss应用系统运行缓慢,具体操作是通过oss系统查询话单,很长时间才能返回结果,严重影响了客户的正常使用。

问题处理:
1.登陆数据库主机,用sar命令看到idle的值持续为0,cpu的资源已经耗尽:

bz_db1# sar 2 4

sunos kest 5.8 generic_108528-19 sun4u    10/26/04

10:56:46    %usr    %sys    %wio   %idle
10:56:48       1       4      95       0
10:56:50       1       5      94       0
10:56:52       0       6      93       0
10:56:54       1       6      93       0

average        1       5      94      0

2.使用top命令看到有两个明显占用cpu利用率过高的进程,以下是top命令的结果:

bz_db1# top

last pid:  1664;load averages:  3.26,  3.24,  3.69                       
159 processes: 152 sleeping, 2 running, 2 zombie, 1 stopped, 2 on cpu
cpu states:  1.5% idle, 72.5% user, 17.9% kernel,  8.0% iowait,  0.0% swap
memory: 2.0g real, 233m free, 2.0g swap in use, 3.4g swap free

   pid username thr pr nce  size   res state   time flts    cpu command
 27420 oracle     1 10   0  1.3g  1.2g cpu01  22.9h    2 31.94% oracle
 27418 oracle     1 10   0  1.3g  1.2g run    23.0h    6 26.86% oracle
  5943 oracle     1 59   0  1.3g  1.2g sleep  25:26   37  4.92% oracle
  6295 oracle     1 55   0  1.3g  1.2g run    25:14   74  4.90% oracle
  7778 oracle     1 43   0  1.3g  1.2g sleep  11:43  110  4.86% oracle
 13270 oracle     1 59   0  1.3g  1.2g sleep 210.6h    0  0.96% oracle
 13056 oracle     1 48   0  1.3g  1.2g sleep 303:30    0  0.37% oracle
 10653 root       1 58   0 2560k 1624k cpu00   0:00    0  0.32% top
 18827 oracle     1 58   0  1.3g  1.2g sleep  18.4h    0  0.31% oracle
 12748 oracle   258 58   0  1.3g  1.2g sleep 555:14    0  0.21% oracle
 10634 oracle     1 59   0  1.3g  1.2g sleep   0:01    0  0.21% oracle
 28458 oracle     1 58   0  1.3g  1.2g sleep 535:02    0  0.18% oracle
 13075 oracle     1 59   0  1.3g  1.2g sleep 326:33    0  0.15% oracle
 13173 oracle     1 58   0  1.3g  1.2g sleep 593:07    0  0.13% oracle
  4927 oracle     1 59   0  1.3g  1.2g sleep  33.4h    0  0.11% oracle

可以看到这两个进程号分别是27420和27418.

3.捕获占用cpu利用率过高的sql语句:

以下用到了我总结的sql语句:

sql>set line 240
sql>set verify off
sql>column sid format 999
sql>column pid format 999
sql>column s_# format 999
sql>column username format a9 heading “ora user”
sql>column program  format a29
sql>column sql format a60
sql>column osname format a9 heading “os user”
sql>select p.pid pid,s.sid sid,p.spid spid,s.username username,
s.osuser osname,p.serial# s_#,p.terminal,p.program  program,
p.background,s.status,rtrim(substr(a.sql_text, 1, 80))  sql
from v$process p, v$session s,v$sqlarea a where p.addr = s.paddr
and s.sql_address = a.address (+)  and p.spid like %&1%;

enter value for 1: 27420(注意这里应输入占用cpu最高的进程对应的pid)

得到以下sql语句:

select nvl(sum(localcharge),0),nvl(sum(usage),0) from localusage where to_char(endtime,yyyymmdd)=20041016
and localcharge>0 and caller like 0543886%;

27418进程对应的sql语句如下:
select nvl(sum(localcharge),0) from localusage where to_char(endtime,yyyymmdd)=20041016 and caller like 0543888%;

4.使用相关用户连接到数据库,检查其执行计划:
sql>connect wacos/oss
connected.

sql>@?/rdbms/admin/utlxplan.sql
table created.

sql>set autotrace on

sql>set timing on

sql>select nvl(sum(localcharge),0),nvl(sum(usage),0) from localusage where to_char(endtime,yyyymmdd)=20041016
 and localcharge>0 and caller like 0543886%;

nvl(sum(localcharge),0) nvl(sum(usage),0)
———————– —————–
                      0                 0

elapsed: 00:02:56.37

execution plan
———————————————————-
   0      select statement optimizer=choose (cost=13435 card=1 bytes=5
          3)

   1    0   sort (aggregate)
   2    1     partition range (all)
   3    2       table access (full) of localusage (cost=13435 card=1
          81 bytes=9593)

statistics
———————————————————-
        258  recursive calls
          0  db block gets
      88739  consistent gets
      15705  physical reads
          0  redo size
        580  bytes sent via sql*net to client
        651  bytes received via sql*net from client
          2  sql*net roundtrips to/from client
          8  sorts (memory)
          0  sorts (disk)
          1  rows processed

发现对localusage表做了全表扫描,什么记录也没有返回居然用了2分多钟。

sql> select nvl(sum(localcharge),0) from localusage where to_char(endtime,yyyymmdd)=20040816 and caller like 0543888%;

nvl(sum(localcharge),0)
———————–
                   27.6

elapsed: 00:03:56.46

execution plan
———————————————————-
   0      select statement optimizer=choose (cost=13435 card=1 bytes=4
          0)

   1    0   sort (aggregate)
   2    1     partition range (all)
   3    2       table access (full) of localusage (cost=13435 card=3
          615 bytes=144600)

statistics
———————————————————-
          0  recursive calls
          0  db block gets
      88588  consistent gets
      15615  physical reads
          0  redo size
        507  bytes sent via sql*net to client
        651  bytes received via sql*net from client
          2  sql*net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

这个sql语句有结果返回,发现也是对localusage表做了全表扫描,但速度也很慢,用了3分多钟。

sql> select count(*) from localusage;

  count(*)
———-
   5793776

该表有579万多条记录,数据量很大,全表扫描已经不再适合。

5.检查该表的类型:

sql> select index_name, table_name, status, partitioned from user_indexes where table_name=localusage;

index_name                     table_name                     status   par
—————————— —————————— ——– —
i_localusage_sid               localusage                     n/a      yes
ui_localusage_st_seq           localusage                     n/a      yes

sql> select index_name,table_name,locality from user_part_indexes where table_name=localusage;

index_name                     table_name                     locali
—————————— —————————— ——
i_localusage_sid               localusage                     local
ui_localusage_st_seq           localusage                     local

发现该表是分区表,并在serviceid,startime和cdrsequence列上建立了分区索引,索引类型是local索引。

6.查看分区索引的索引键值:

sql> select index_name,column_name,index_owner from dba_ind_columns where table_name=localusage;

index_name           column_name          index_owner
——————– ——————– ——————————
i_localusage_sid     serviceid            wacos
ui_localusage_st_seq starttime            wacos
ui_localusage_st_seq cdrsequence          wacos

发现在endtime和caller列上都没有建立索引,这也是导致sql语句做全表扫描的最终原因。

7.决定创建新的分区索引以消除全表扫描:

(1).首先查看localusage表分区情况:

sql> select partition_name,tablespace_name from user_tab_partitions where table_name=localusage;

partition_name                 tablespace_name
—————————— ——————————
localusage_200312              wacos
localusage_200401              wacos
localusage_200402              wacos
localusage_200404              wacos
localusage_200405              wacos
localusage_200406              wacos
localusage_200407              wacos
localusage_200409              wacos
localusage_200410              wacos
localusage_200411              wacos
localusage_200403              wacos
localusage_200408              wacos
localusage_200412              wacos

13 rows selected.

(2).在caller列上创建local分区索引:
sql>set timing on
sql>create index i_localusage_caller on localusage(caller)
local
(      
        partition localusage_200312,
 partition localusage_200401,
 partition localusage_200402,
 partition localusage_200404,
 partition localusage_200405,
 partition localusage_200406,
 partition localusage_200407,
 partition localusage_200409,
 partition localusage_200410,
 partition localusage_200411,
 partition localusage_200403,
 partition localusage_200408,
 partition localusage_200412
)
tablespace wacos
storage(
 initial 6553600
 next 6553600
 maxextents unlimited
 pctincrease 0)
 pctfree 5
 nologging;

index created.

elapsed: 00:06:27.90  (由于数据量比较大,耗时6分钟)

8.再次查看执行计划:
sql>select nvl(sum(localcharge),0),nvl(sum(usage),0) from localusage where to_char(endtime,yyyymmdd)=20041016
and localcharge>0  and caller like 0543886%;

nvl(sum(localcharge),0) nvl(sum(usage),0)
———————– —————–
                      0                 0

elapsed: 00:00:03.00

execution plan
———————————————————-
   0      select statement optimizer=choose (cost=22 card=1 bytes=53)
   1    0   sort (aggregate)
   2    1     partition range (all)
   3    2       table access (by local index rowid) of localusage (cost=22 card=181 bytes=9593)
   4    3         index (range scan) of i_localusage_caller (non-unique) (cost=14 card=65063)

statistics
———————————————————-
          0  recursive calls
          0  db block gets
      16813  consistent gets
        569  physical reads
          0  redo size
        580  bytes sent via sql*net to client
        651  bytes received via sql*net from client
          2  sql*net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

这次走了索引后速度明显快多了,用了3秒钟就返回了结果。

sql>select nvl(sum(localcharge),0) from localusage where to_char(endtime,yyyymmdd)=20040816 and caller like 0543888%;

nvl(sum(localcharge),0)
———————–
                   27.6

elapsed: 00:00:24.73

execution plan
———————————————————-
   0      select statement optimizer=choose (cost=22 card=1 bytes=40)
   1    0   sort (aggregate)
   2    1     partition range (all)
   3    2       table access (by local index rowid) of localusage (cost=22 card=3615 bytes=144600)
   4    3         index (range scan) of i_localusage_caller (non-unique) (cost=14 card=65063)

statistics
———————————————————-
          0  recursive calls
          0  db block gets
     129336  consistent gets
       7241  physical reads
          0  redo size
        507  bytes sent via sql*net to client
        651  bytes received via sql*net from client
          2  sql*net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

这个sql语句走了索引,用了24秒钟返回结果,性能明显提高了很多。

赞(0)
版权申明:本站文章部分自网络,如有侵权,请联系:west999com@outlook.com 特别注意:本站所有转载文章言论不代表本站观点! 本站所提供的图片等素材,版权归原作者所有,如需使用,请与原作者联系。未经允许不得转载:IDC资讯中心 » 数据库性能优化分析案例—解决SQL语句过度消耗CPU问题-数据库专栏,SQL Server
分享到: 更多 (0)