
2013.05.27

Fixing abnormally high CPU load after an OpenStack install

OpenStack was installed following this guide:
http://docs.openstack.org/grizzly/basic-install/apt/content/

The installed version is Grizzly.
The OS is Ubuntu 12.04.
The setup consists of one controller node, one network node, and two compute nodes.

After the install, opening the dashboard revealed an OpenStack that was unusually slow.

First, check the situation with vmstat.

hoge@fuga:~$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  1  34888  73116  35168 881976    0    0  2151   139   83    6  6  1 51 41
 0  1  34888  68032  35200 887116    0    0  5068    76  638 1142  4  1 48 47
 1  0  34888  75720  35144 879248    0    0  6944    48  580 1052  3  1 49 48
 0  1  34888  72124  35144 882692    0    0  4816     0  316  653  1  1 50 49
 2  0  34888  67412  35184 887384    0    0  4736   828  576 1058  3  1 48 48
 0  1  34888  74232  35168 880540    0    0  5580    64 1149 2193  7  1 45 46
 0  1  34888  70512  35192 884328    0    0  3712    80  885 1625  5  1 45 49
 0  1  34888  65428  35200 889336    0    0  5008    24  473  899  2  1 45 53
 0  1  34888  73240  35120 881360    0    0  5872    28  458  847  2  1 51 47
 0  1  34888  68776  35160 885860    0    0  4480   104  672 1174  4  1 40 56
 0  1  34888  64808  35176 889948    0    0  4016    52  547 1064  3  1 49 48
 0  1  34888  72124  35148 882600    0    0  5120     0  300  638  1  0 50 49
 0  1  34888  66420  35148 888280    0    0  5664     0  310  627  1  1 50 49
 0  1  34888  75100  35100 879676    0    0  3760    52 1135 2110  9  1 42 49
 0  1  34888  71504  35180 883196    0    0  3620  1052 1161 2023  6  2 43 50

The CPU wait is far too high.

hoge@fuga:~$ top

top - 14:37:50 up 2 days, 21:56,  1 user,  load average: 1.67, 1.54, 1.46
Tasks: 138 total,   2 running, 136 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.8%us,  1.0%sy,  0.0%ni, 47.8%id, 47.3%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:   2050020k total,  1977740k used,    72280k free,    23848k buffers
Swap:  2097148k total,    34888k used,  2062260k free,   894848k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1436 mysql     20   0 1310m 191m 3928 S    2  9.6 100:21.52 mysqld
 1996 rabbitmq  20   0 1139m 106m 1844 S    2  5.3  97:33.42 beam.smp
 1341 quantum   20   0  186m  39m 3748 S    1  2.0 100:18.12 python
 1345 cinder    20   0 91428  19m 3324 S    1  1.0  22:43.32 cinder-volume
 1362 nova      20   0  213m  49m 3712 S    0  2.5   1:44.69 nova-consoleaut
 1397 cinder    20   0  197m  35m 3672 S    0  1.8   2:20.62 cinder-schedule
    1 root      20   0 24584 2156 1132 S    0  0.1   4:37.98 init

The load average is high too. If anything is busy, it looks like mysqld.
beam.smp was an unfamiliar face, so I googled it: it belongs to RabbitMQ. It doesn't seem related to the problem.

Since it's all I/O wait, let's check the disk I/O as well.

hoge@fuga:~$ sudo iotop
Total DISK READ:       4.95 M/s | Total DISK WRITE:     144.30 K/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
22748 be/4 mysql       4.78 M/s    0.00 B/s  0.00 % 96.90 % mysqld
 2227 be/4 rabbitmq    0.00 B/s    7.80 K/s  0.00 % 17.65 % beam.smp -W w -K true -A30 -P 1048576 --~ir "/var/lib/rabbitmq/mnesia/rabbit@fuga"
27136 be/4 www-data    0.00 B/s    0.00 B/s  0.00 %  0.00 % apache2 -k start
    1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init
    2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
    3 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]

That confirms mysqld as the culprit.

Let's see what MySQL is actually doing.

mysql> show processlist;
+------+----------+-----------+----------+---------+------+--------------+------------------------------------------------------------------------------------------------------+
| Id   | User     | Host      | db       | Command | Time | State        | Info                                                                                                 |
+------+----------+-----------+----------+---------+------+--------------+------------------------------------------------------------------------------------------------------+
| 1923 | quantum  | localhost | quantum  | Sleep   |    1 |              | NULL                                                                                                 |
| 1924 | nova     | localhost | nova     | Sleep   |    1 |              | NULL                                                                                                 |
| 1925 | nova     | localhost | nova     | Sleep   |    1 |              | NULL                                                                                                 |
| 1926 | nova     | localhost | nova     | Sleep   |    6 |              | NULL                                                                                                 |
| 1927 | cinder   | localhost | cinder   | Sleep   |    3 |              | NULL                                                                                                 |
| 1928 | cinder   | localhost | cinder   | Sleep   |    7 |              | NULL                                                                                                 |
| 1929 | nova     | localhost | nova     | Sleep   |   11 |              | NULL                                                                                                 |
| 1931 | nova     | localhost | nova     | Sleep   |    7 |              | NULL                                                                                                 |
| 1932 | nova     | localhost | nova     | Sleep   |    7 |              | NULL                                                                                                 |
| 1933 | quantum  | localhost | quantum  | Sleep   |    1 |              | NULL                                                                                                 |
| 1935 | quantum  | localhost | quantum  | Sleep   |    1 |              | NULL                                                                                                 |
| 1936 | quantum  | localhost | quantum  | Sleep   |    0 |              | NULL                                                                                                 |
| 1938 | nova     | localhost | nova     | Sleep   |   16 |              | NULL                                                                                                 |
| 1939 | nova     | localhost | nova     | Sleep   |    6 |              | NULL                                                                                                 |
| 1946 | quantum  | localhost | quantum  | Sleep   |    2 |              | NULL                                                                                                 |
| 1949 | nova     | localhost | nova     | Sleep   |    1 |              | NULL                                                                                                 |
| 1950 | keystone | localhost | keystone | Query   |  238 | Sending data | SELECT token.id AS token_id, token.expires AS token_expires, token.extra AS token_extra, token.valid |
| 1951 | root     | localhost | NULL     | Query   |    0 | NULL         | show processlist                                                                                     |
+------+----------+-----------+----------+---------+------+--------------+------------------------------------------------------------------------------------------------------+

So a Keystone query has been stuck in the "Sending data" state for 238 seconds.

Take a look at the keystone log.

root@fuga:/var/log/keystone# tail -f keystone.log
2013-05-27 14:50:00     INFO [sqlalchemy.engine.base.Engine] SELECT role.id AS role_id, role.name AS role_name, role.extra AS role_extra_
FROM role_
WHERE role.id = %s_
 LIMIT %s
2013-05-27 14:50:00     INFO [sqlalchemy.engine.base.Engine] ('bc1d3f04415a4feb89fee7ad327a9f6b', 1)
2013-05-27 14:50:00    DEBUG [keystone.policy.backends.rules] enforce admin_required: {'tenant_id': u'b1493d6491b54dcabb33792632df4945', 'user_id': u'6b06407ab6be40c7a90185774d082921', u'roles': [u'_member_', u'admin']}
2013-05-27 14:50:00     INFO [sqlalchemy.engine.base.Engine] SELECT token.id AS token_id, token.expires AS token_expires, token.extra AS token_extra, token.valid AS token_valid, token.user_id AS token_user_id, token.trust_id AS token_trust_id_
FROM token_
WHERE token.expires > %s AND token.valid = %s
2013-05-27 14:50:00     INFO [sqlalchemy.engine.base.Engine] (datetime.datetime(2013, 5, 27, 5, 50, 0, 186563), 0)

It does indeed look like a query issued by keystone.

Log into MySQL and run it by hand.

mysql> SELECT token.id AS token_id, token.expires AS token_expires, token.extra AS token_extra, token.valid AS token_valid, token.user_id AS token_user_id, token.trust_id AS token_trust_id  FROM token  WHERE token.expires > '2013-05-27 06:11:45' and token.valid = 0;
Empty set (4 min 44.50 sec)

It takes nearly five minutes. This is almost certainly the bottleneck.

Check the row counts.

mysql> select table_name,table_rows,data_length/1024/1024 as 'data/MB',index_length/1024/1024 as 'index/MB' from information_schema.TABLES where table_schema='keystone';
+------------------------+------------+---------------+------------+
| table_name             | table_rows | data/MB       | index/MB   |
+------------------------+------------+---------------+------------+
| credential             |          0 |    0.01562500 | 0.03125000 |
| domain                 |          1 |    0.01562500 | 0.01562500 |
| ec2_credential         |          0 |    0.01562500 | 0.00000000 |
| endpoint               |         18 |    0.01562500 | 0.01562500 |
| group                  |          0 |    0.01562500 | 0.01562500 |
| group_domain_metadata  |          0 |    0.01562500 | 0.01562500 |
| group_project_metadata |          0 |    0.01562500 | 0.01562500 |
| migrate_version        |          1 |    0.01562500 | 0.00000000 |
| policy                 |          0 |    0.01562500 | 0.00000000 |
| project                |          3 |    0.01562500 | 0.01562500 |
| role                   |          3 |    0.01562500 | 0.01562500 |
| service                |          6 |    0.01562500 | 0.00000000 |
| token                  |     152946 | 1098.00000000 | 0.00000000 |
| trust                  |          0 |    0.01562500 | 0.00000000 |
| trust_role             |          0 |    0.01562500 | 0.00000000 |
| user                   |          6 |    0.01562500 | 0.01562500 |
| user_domain_metadata   |          0 |    0.01562500 | 0.01562500 |
| user_group_membership  |          0 |    0.01562500 | 0.01562500 |
| user_project_metadata  |          6 |    0.01562500 | 0.01562500 |
+------------------------+------------+---------------+------------+
19 rows in set (0.07 sec)

The token table is huge: roughly 150,000 rows and over 1 GB of data.

Check the indexes.

mysql> show index from token;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| token |          0 | PRIMARY  |            1 | id          | A         |      145268 |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.17 sec)

Only the primary key on id exists.

This has to be it.
A full scan over 100,000+ rows on every single query is bound to be slow.
Indexes on expires and valid should help.
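The effect can be sketched with a small self-contained example. This uses sqlite3 purely as a stand-in for MySQL, with an illustrative table and index name rather than the real keystone schema:

```python
import sqlite3

# Toy stand-in for the keystone token table (sqlite3, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE token (id TEXT PRIMARY KEY, expires TEXT, valid INTEGER)")
conn.executemany(
    "INSERT INTO token VALUES (?, ?, ?)",
    ((str(i), "2013-05-20 00:00:00", int(i % 100 != 0)) for i in range(10000)),
)

query = "SELECT id FROM token WHERE expires > '2013-05-19 00:00:00' AND valid = 0"

# With only the primary key, the planner has to scan the whole table.
before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]
print(before)  # e.g. "SCAN token"

# After indexing the WHERE columns, it can search the index instead.
conn.execute("CREATE INDEX ix_token_valid_expires ON token (valid, expires)")
after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]
print(after)   # e.g. "SEARCH token USING INDEX ix_token_valid_expires ..."
```

Two single-column indexes (as added below) are enough to fix this query; a composite index like the one above covers both conditions in a single structure and is another option.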

Let's create the indexes.

mysql> alter table token add index expires(expires);
Query OK, 0 rows affected (6 min 6.79 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table token add index valid(valid);
Query OK, 0 rows affected (2 min 59.38 sec)
Records: 0  Duplicates: 0  Warnings: 0

Check the effect.

mysql> explain SELECT token.id AS token_id, token.expires AS token_expires, token.extra AS token_extra, token.valid AS token_valid, token.user_id AS token_user_id, token.trust_id AS token_trust_id  FROM token  WHERE token.expires > '2013-05-27 06:11:45' and token.valid = 0;
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------------+
|  1 | SIMPLE      | token | ref  | expires,valid | valid | 1       | const |    4 | Using where |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------------+
1 row in set (0.02 sec)

mysql> SELECT token.id AS token_id, token.expires AS token_expires, token.extra AS token_extra, token.valid AS token_valid, token.user_id AS token_user_id, token.trust_id AS token_trust_id  FROM token  WHERE token.expires > '2013-05-27 06:11:45' and token.valid = 0;
Empty set (0.00 sec)

It's working: the query now uses the valid index, examines only a handful of rows, and returns instantly.

hoge@fuga:~$ top

top - 17:48:25 up 3 days,  1:07,  2 users,  load average: 0.16, 0.35, 0.61
Tasks: 144 total,   2 running, 142 sleeping,   0 stopped,   0 zombie
Cpu(s):  4.6%us,  1.0%sy,  0.0%ni, 93.9%id,  0.5%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2050020k total,  1932860k used,   117160k free,    19116k buffers
Swap:  2097148k total,    52204k used,  2044944k free,   862236k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1341 quantum   20   0  186m  39m 3732 S    4  2.0 104:51.66 python
 1996 rabbitmq  20   0 1132m 105m 1860 S    3  5.3 101:57.68 beam.smp
 1384 nova      20   0  226m  61m 3740 S    1  3.1  17:19.29 nova-conductor
 1345 cinder    20   0 91428  17m 3324 S    0  0.9  23:46.03 cinder-volume
 1436 mysql     20   0 1310m 185m 4196 S    0  9.3 105:59.44 mysqld
 2309 root      20   0 91016 4328 2116 S    0  0.2   0:05.59 apache2

The load average has come down.

hoge@fuga:~$ sudo iotop

Total DISK READ:       0.00 B/s | Total DISK WRITE:       2.76 M/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 1739 be/4 mysql       0.00 B/s  769.22 K/s  0.00 %  0.82 % mysqld
  387 be/3 root        0.00 B/s   26.92 K/s  0.00 %  0.31 % [jbd2/dm-12-8]
 2079 be/4 mysql       0.00 B/s    3.85 K/s  0.00 %  0.03 % mysqld
22747 be/4 mysql       0.00 B/s    3.85 K/s  0.00 %  0.03 % mysqld
 1511 be/4 mysql       0.00 B/s    3.85 K/s  0.00 %  0.02 % mysqld
    1 be/4 root        0.00 B/s    7.69 K/s  0.00 %  0.00 % init

MySQL's I/O has come down too.

hoge@fuga:~$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0  52328 130860  24160 847128    0    0  2140   139   15   38  6  1 51 41
 0  0  52328 130852  24160 847132    0    0     0     0  155  297  0  0 100  0
 0  0  52328 130852  24176 847116    0    0     0    52  368  664  2  1 97  0
 0  0  52328 130852  24192 847124    0    0     0   404  959 1834  8  1 91  1
 0  0  52328 130852  24240 847112    0    0     0  1628  598 1147  4  0 96  0
 0  0  52328 130844  24272 847156    0    0     0    64  386  744  2  1 97  0
 0  0  52328 130844  24288 847156    0    0     0    48  484  876  3  0 97  0
 0  0  52328 130844  24312 847160    0    0     0    76  524  991  3  1 96  0
 0  0  52328 130844  24328 847172    0    0     4    44  360  666  2  1 97  1
 0  0  52328 130844  24328 847192    0    0     0     0  151  289  0  0 100  0

The I/O wait is gone as well.

With that, the dashboard works normally again!
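One caveat: the indexes only make reads fast, and on Grizzly the token table itself keeps growing, because Keystone does not delete expired tokens on its own (later releases added a keystone-manage token_flush command for exactly this). A minimal sketch of a periodic purge, again using sqlite3 as a stand-in for the MySQL keystone database:

```python
import sqlite3
from datetime import datetime, timedelta

# Stand-in for the keystone token table (sqlite3, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE token (id TEXT PRIMARY KEY, expires TEXT, valid INTEGER)")

now = datetime(2013, 5, 27, 6, 0, 0)
fmt = "%Y-%m-%d %H:%M:%S"
conn.executemany(
    "INSERT INTO token VALUES (?, ?, ?)",
    [
        ("old",   (now - timedelta(days=1)).strftime(fmt), 1),   # already expired
        ("fresh", (now + timedelta(hours=1)).strftime(fmt), 1),  # still live
    ],
)

# Purge everything that has already expired; only live tokens remain.
conn.execute("DELETE FROM token WHERE expires < ?", (now.strftime(fmt),))
remaining = [r[0] for r in conn.execute("SELECT id FROM token")]
print(remaining)  # ['fresh']
```

On the real system this would be a cron'd `DELETE FROM token WHERE expires < NOW();` run against the keystone database.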