2013.05.27
Dealing with abnormally high CPU load after building OpenStack
OpenStack was installed by following this guide:
http://docs.openstack.org/grizzly/basic-install/apt/content/
The installed version is Grizzly.
The OS is Ubuntu 12.04.
The setup is one controller node, one network node, and two compute nodes.
After the install, though, opening the dashboard revealed an OpenStack that was unbelievably slow.
First, let's check the situation with vmstat.
hoge@fuga:~$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  1  34888  73116  35168 881976    0    0  2151   139   83    6  6  1 51 41
 0  1  34888  68032  35200 887116    0    0  5068    76  638 1142  4  1 48 47
 1  0  34888  75720  35144 879248    0    0  6944    48  580 1052  3  1 49 48
 0  1  34888  72124  35144 882692    0    0  4816     0  316  653  1  1 50 49
 2  0  34888  67412  35184 887384    0    0  4736   828  576 1058  3  1 48 48
 0  1  34888  74232  35168 880540    0    0  5580    64 1149 2193  7  1 45 46
 0  1  34888  70512  35192 884328    0    0  3712    80  885 1625  5  1 45 49
 0  1  34888  65428  35200 889336    0    0  5008    24  473  899  2  1 45 53
 0  1  34888  73240  35120 881360    0    0  5872    28  458  847  2  1 51 47
 0  1  34888  68776  35160 885860    0    0  4480   104  672 1174  4  1 40 56
 0  1  34888  64808  35176 889948    0    0  4016    52  547 1064  3  1 49 48
 0  1  34888  72124  35148 882600    0    0  5120     0  300  638  1  0 50 49
 0  1  34888  66420  35148 888280    0    0  5664     0  310  627  1  1 50 49
 0  1  34888  75100  35100 879676    0    0  3760    52 1135 2110  9  1 42 49
 0  1  34888  71504  35180 883196    0    0  3620  1052 1161 2023  6  2 43 50
CPU wait is way too high.
hoge@fuga:~$ top
top - 14:37:50 up 2 days, 21:56,  1 user,  load average: 1.67, 1.54, 1.46
Tasks: 138 total,   2 running, 136 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.8%us,  1.0%sy,  0.0%ni, 47.8%id, 47.3%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:   2050020k total,  1977740k used,    72280k free,    23848k buffers
Swap:  2097148k total,    34888k used,  2062260k free,   894848k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1436 mysql     20   0 1310m 191m 3928 S    2  9.6 100:21.52 mysqld
 1996 rabbitmq  20   0 1139m 106m 1844 S    2  5.3  97:33.42 beam.smp
 1341 quantum   20   0  186m  39m 3748 S    1  2.0 100:18.12 python
 1345 cinder    20   0 91428  19m 3324 S    1  1.0  22:43.32 cinder-volume
 1362 nova      20   0  213m  49m 3712 S    0  2.5   1:44.69 nova-consoleaut
 1397 cinder    20   0  197m  35m 3672 S    0  1.8   2:20.62 cinder-schedule
    1 root      20   0 24584 2156 1132 S    0  0.1   4:37.98 init
The load average is also high. If anything here is busy, it looks like mysqld.
beam.smp was unfamiliar, but a quick search shows it belongs to RabbitMQ; it doesn't seem related to the problem.
Since the CPU is stuck in wait, let's check I/O as well.
hoge@fuga:~$ sudo iotop
Total DISK READ: 4.95 M/s | Total DISK WRITE: 144.30 K/s
  TID  PRIO  USER      DISK READ  DISK WRITE  SWAPIN      IO>    COMMAND
22748  be/4  mysql      4.78 M/s    0.00 B/s  0.00 %  96.90 %  mysqld
 2227  be/4  rabbitmq   0.00 B/s    7.80 K/s  0.00 %  17.65 %  beam.smp -W w -K true -A30 -P 1048576 --~ir "/var/lib/rabbitmq/mnesia/rabbit@fuga"
27136  be/4  www-data   0.00 B/s    0.00 B/s  0.00 %   0.00 %  apache2 -k start
    1  be/4  root       0.00 B/s    0.00 B/s  0.00 %   0.00 %  init
    2  be/4  root       0.00 B/s    0.00 B/s  0.00 %   0.00 %  [kthreadd]
    3  be/4  root       0.00 B/s    0.00 B/s  0.00 %   0.00 %  [ksoftirqd/0]
mysqld is confirmed as the culprit.
Let's see what MySQL is actually doing.
mysql> show processlist;
+------+----------+-----------+----------+---------+------+--------------+------------------------------------------------------------------------------------------------------+
| Id   | User     | Host      | db       | Command | Time | State        | Info                                                                                                   |
+------+----------+-----------+----------+---------+------+--------------+------------------------------------------------------------------------------------------------------+
| 1923 | quantum  | localhost | quantum  | Sleep   |    1 |              | NULL                                                                                                   |
| 1924 | nova     | localhost | nova     | Sleep   |    1 |              | NULL                                                                                                   |
| 1925 | nova     | localhost | nova     | Sleep   |    1 |              | NULL                                                                                                   |
| 1926 | nova     | localhost | nova     | Sleep   |    6 |              | NULL                                                                                                   |
| 1927 | cinder   | localhost | cinder   | Sleep   |    3 |              | NULL                                                                                                   |
| 1928 | cinder   | localhost | cinder   | Sleep   |    7 |              | NULL                                                                                                   |
| 1929 | nova     | localhost | nova     | Sleep   |   11 |              | NULL                                                                                                   |
| 1931 | nova     | localhost | nova     | Sleep   |    7 |              | NULL                                                                                                   |
| 1932 | nova     | localhost | nova     | Sleep   |    7 |              | NULL                                                                                                   |
| 1933 | quantum  | localhost | quantum  | Sleep   |    1 |              | NULL                                                                                                   |
| 1935 | quantum  | localhost | quantum  | Sleep   |    1 |              | NULL                                                                                                   |
| 1936 | quantum  | localhost | quantum  | Sleep   |    0 |              | NULL                                                                                                   |
| 1938 | nova     | localhost | nova     | Sleep   |   16 |              | NULL                                                                                                   |
| 1939 | nova     | localhost | nova     | Sleep   |    6 |              | NULL                                                                                                   |
| 1946 | quantum  | localhost | quantum  | Sleep   |    2 |              | NULL                                                                                                   |
| 1949 | nova     | localhost | nova     | Sleep   |    1 |              | NULL                                                                                                   |
| 1950 | keystone | localhost | keystone | Query   |  238 | Sending data | SELECT token.id AS token_id, token.expires AS token_expires, token.extra AS token_extra, token.valid   |
| 1951 | root     | localhost | NULL     | Query   |    0 | NULL         | show processlist                                                                                       |
+------+----------+-----------+----------+---------+------+--------------+------------------------------------------------------------------------------------------------------+
Keystone has apparently been stuck "Sending data" for quite a while.
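As an aside, the Info column above is truncated, so the statement is cut off mid-way. The full text could also be pulled straight from MySQL (just a side note, not something that was run here):

SHOW FULL PROCESSLIST;  -- unlike SHOW PROCESSLIST, does not truncate the Info column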
Instead, let's look at Keystone's log.
root@fuga:/var/log/keystone# tail -f keystone.log
2013-05-27 14:50:00    INFO [sqlalchemy.engine.base.Engine] SELECT role.id AS role_id, role.name AS role_name, role.extra AS role_extra
FROM role
WHERE role.id = %s
 LIMIT %s
2013-05-27 14:50:00    INFO [sqlalchemy.engine.base.Engine] ('bc1d3f04415a4feb89fee7ad327a9f6b', 1)
2013-05-27 14:50:00   DEBUG [keystone.policy.backends.rules] enforce admin_required: {'tenant_id': u'b1493d6491b54dcabb33792632df4945', 'user_id': u'6b06407ab6be40c7a90185774d082921', u'roles': [u'_member_', u'admin']}
2013-05-27 14:50:00    INFO [sqlalchemy.engine.base.Engine] SELECT token.id AS token_id, token.expires AS token_expires, token.extra AS token_extra, token.valid AS token_valid, token.user_id AS token_user_id, token.trust_id AS token_trust_id
FROM token
WHERE token.expires > %s AND token.valid = %s
2013-05-27 14:50:00    INFO [sqlalchemy.engine.base.Engine] (datetime.datetime(2013, 5, 27, 5, 50, 0, 186563), 0)
Sure enough, that matches the query Keystone is issuing.
Let's log into MySQL and run it.
mysql> SELECT token.id AS token_id, token.expires AS token_expires, token.extra AS token_extra, token.valid AS token_valid, token.user_id AS token_user_id, token.trust_id AS token_trust_id FROM token WHERE token.expires > '2013-05-27 06:11:45' and token.valid = 0;
Empty set (4 min 44.50 sec)
It takes nearly 5 minutes. This is most likely what's clogging things up.
Let's check the record counts.
mysql> select table_name,table_rows,data_length/1024/1024 as 'data/MB',index_length/1024/1024 as 'index/MB' from information_schema.TABLES where table_schema='keystone';
+------------------------+------------+---------------+------------+
| table_name             | table_rows | data/MB       | index/MB   |
+------------------------+------------+---------------+------------+
| credential             |          0 |    0.01562500 | 0.03125000 |
| domain                 |          1 |    0.01562500 | 0.01562500 |
| ec2_credential         |          0 |    0.01562500 | 0.00000000 |
| endpoint               |         18 |    0.01562500 | 0.01562500 |
| group                  |          0 |    0.01562500 | 0.01562500 |
| group_domain_metadata  |          0 |    0.01562500 | 0.01562500 |
| group_project_metadata |          0 |    0.01562500 | 0.01562500 |
| migrate_version        |          1 |    0.01562500 | 0.00000000 |
| policy                 |          0 |    0.01562500 | 0.00000000 |
| project                |          3 |    0.01562500 | 0.01562500 |
| role                   |          3 |    0.01562500 | 0.01562500 |
| service                |          6 |    0.01562500 | 0.00000000 |
| token                  |     152946 | 1098.00000000 | 0.00000000 |
| trust                  |          0 |    0.01562500 | 0.00000000 |
| trust_role             |          0 |    0.01562500 | 0.00000000 |
| user                   |          6 |    0.01562500 | 0.01562500 |
| user_domain_metadata   |          0 |    0.01562500 | 0.01562500 |
| user_group_membership  |          0 |    0.01562500 | 0.01562500 |
| user_project_metadata  |          6 |    0.01562500 | 0.01562500 |
+------------------------+------------+---------------+------------+
19 rows in set (0.07 sec)
That's a lot of tokens.
Let's check the indexes.
mysql> show index from token;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| token |          0 | PRIMARY  |            1 | id          | A         |      145268 |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.17 sec)
Only id has an index on it.
This has to be it. This really smells like the cause.
A full scan over more than 100,000 rows on every request is bound to be slow.
Indexes on expires and valid would probably help.
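Before adding anything, a quick sanity check would be to EXPLAIN the same statement. This wasn't captured at the time, but with only the PRIMARY KEY on id it should report type=ALL, i.e. a full scan of all ~150k rows:

-- Hypothetical pre-index check: expect possible_keys = NULL and type = ALL (full table scan)
EXPLAIN SELECT token.id, token.expires, token.extra, token.valid, token.user_id, token.trust_id
  FROM token
 WHERE token.expires > '2013-05-27 06:11:45'
   AND token.valid = 0;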
Let's create the indexes.
mysql> alter table token add index expires(expires);
Query OK, 0 rows affected (6 min 6.79 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table token add index valid(valid);
Query OK, 0 rows affected (2 min 59.38 sec)
Records: 0  Duplicates: 0  Warnings: 0
Check the effect.
mysql> explain SELECT token.id AS token_id, token.expires AS token_expires, token.extra AS token_extra, token.valid AS token_valid, token.user_id AS token_user_id, token.trust_id AS token_trust_id FROM token WHERE token.expires > '2013-05-27 06:11:45' and token.valid = 0;
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------------+
|  1 | SIMPLE      | token | ref  | expires,valid | valid | 1       | const |    4 | Using where |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------------+
1 row in set (0.02 sec)

mysql> SELECT token.id AS token_id, token.expires AS token_expires, token.extra AS token_extra, token.valid AS token_valid, token.user_id AS token_user_id, token.trust_id AS token_trust_id FROM token WHERE token.expires > '2013-05-27 06:11:45' and token.valid = 0;
Empty set (0.00 sec)
The indexes are clearly kicking in.
hoge@fuga:~$ top
top - 17:48:25 up 3 days,  1:07,  2 users,  load average: 0.16, 0.35, 0.61
Tasks: 144 total,   2 running, 142 sleeping,   0 stopped,   0 zombie
Cpu(s):  4.6%us,  1.0%sy,  0.0%ni, 93.9%id,  0.5%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2050020k total,  1932860k used,   117160k free,    19116k buffers
Swap:  2097148k total,    52204k used,  2044944k free,   862236k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1341 quantum   20   0  186m  39m 3732 S    4  2.0 104:51.66 python
 1996 rabbitmq  20   0 1132m 105m 1860 S    3  5.3 101:57.68 beam.smp
 1384 nova      20   0  226m  61m 3740 S    1  3.1  17:19.29 nova-conductor
 1345 cinder    20   0 91428  17m 3324 S    0  0.9  23:46.03 cinder-volume
 1436 mysql     20   0 1310m 185m 4196 S    0  9.3 105:59.44 mysqld
 2309 root      20   0 91016 4328 2116 S    0  0.2   0:05.59 apache2
The load average has dropped.
hoge@fuga:~$ sudo iotop
Total DISK READ: 0.00 B/s | Total DISK WRITE: 2.76 M/s
  TID  PRIO  USER      DISK READ  DISK WRITE  SWAPIN      IO>    COMMAND
 1739  be/4  mysql      0.00 B/s  769.22 K/s  0.00 %   0.82 %  mysqld
  387  be/3  root       0.00 B/s   26.92 K/s  0.00 %   0.31 %  [jbd2/dm-12-8]
 2079  be/4  mysql      0.00 B/s    3.85 K/s  0.00 %   0.03 %  mysqld
22747  be/4  mysql      0.00 B/s    3.85 K/s  0.00 %   0.03 %  mysqld
 1511  be/4  mysql      0.00 B/s    3.85 K/s  0.00 %   0.02 %  mysqld
    1  be/4  root       0.00 B/s    7.69 K/s  0.00 %   0.00 %  init
MySQL's I/O has dropped too.
hoge@fuga:~$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0  52328 130860  24160 847128    0    0  2140   139   15   38  6  1 51 41
 0  0  52328 130852  24160 847132    0    0     0     0  155  297  0  0 100  0
 0  0  52328 130852  24176 847116    0    0     0    52  368  664  2  1 97  0
 0  0  52328 130852  24192 847124    0    0     0   404  959 1834  8  1 91  1
 0  0  52328 130852  24240 847112    0    0     0  1628  598 1147  4  0 96  0
 0  0  52328 130844  24272 847156    0    0     0    64  386  744  2  1 97  0
 0  0  52328 130844  24288 847156    0    0     0    48  484  876  3  0 97  0
 0  0  52328 130844  24312 847160    0    0     0    76  524  991  3  1 96  0
 0  0  52328 130844  24328 847172    0    0     4    44  360  666  2  1 97  1
 0  0  52328 130844  24328 847192    0    0     0     0  151  289  0  0 100  0
The wait is gone as well.
With that, the dashboard works properly again!
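One caveat: with the SQL token backend in Grizzly, Keystone doesn't appear to delete expired tokens on its own, so the token table keeps growing and the same slowdown could eventually creep back even with the indexes. A periodic cleanup along these lines (a rough sketch, assuming nothing still needs the expired rows) would keep it in check:

-- Rough sketch: purge expired tokens now and then (e.g. from cron).
-- Assumes expired rows are no longer needed by anything; expires is stored in UTC.
DELETE FROM token WHERE expires < UTC_TIMESTAMP();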