A few days ago we received a complaint from the networking department: some connections from one of our database servers were consuming most of the available bandwidth.
The most likely culprit was an Oracle process, since our server was the one sending the data and Oracle is the only application running on it.
But here we faced an issue. This is a big corporate server running around 15 Oracle RAC instances, with
more than one thousand Oracle server processes. So it was not easy to find out which process or processes
were consuming most of the bandwidth.
Of course, one option is to use tcpdump or another network sniffer to inspect the traffic, but that approach has
a couple of drawbacks:
1. For security reasons, some companies do not allow sniffers, or make it very difficult to get permission to use one.
2. The output is not easy to interpret; you will usually need to ask your colleagues in the network department to analyze it.
This is where I want to introduce you to a nice Linux utility: NetHogs.
NetHogs monitors the traffic going to and from a machine, per process. It is available for most Linux distributions, and
the source code can be downloaded from GitHub.
From the NetHogs project page:
NetHogs is a small ‘net top’ tool. Instead of breaking the traffic down per protocol or per subnet,
like most tools do, it groups bandwidth by process. NetHogs does not rely on a special kernel module to be loaded.
If there’s suddenly a lot of network traffic, you can fire up NetHogs and immediately see which PID is causing this.
This makes it easy to identify programs that have gone wild and are suddenly taking up your bandwidth.
For Red Hat users, nethogs is included in the EPEL repository (Extra Packages for Enterprise Linux).
See this link to learn how to enable it:
http://www.tecmint.com/how-to-enable-epel-repository-for-rhel-centos-6-5/
The NetHogs project page is:
https://github.com/raboof/nethogs
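As a rough sketch of the install-and-run step on a Red Hat style system: the commands below assume that EPEL is already enabled, that you run them as root, and that the interface to watch is eth2 (the one that shows up later in this post). The run helper and the DRY_RUN variable are only my illustration; by default nothing is executed, the commands are just printed.

```shell
# Hypothetical helper: print each command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Install nethogs from EPEL (assumes the repository is already enabled).
run yum install -y nethogs

# Watch one interface, refreshing every 5 seconds (-d sets the refresh delay).
run nethogs -d 5 eth2
```

Set DRY_RUN=0 to really execute the commands. The device name and the refresh delay are examples, so adjust them to your server.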
Going back to my issue, this is what I used to check what was happening. I installed nethogs and ran it. After it had been running for a while, it was quite clear which
processes were sending most of the data over the network:
NetHogs version 0.8.5
PID USER PROGRAM DEV SENT RECEIVED
37168 oracle oraclePROD1 eth2 6881.052 193.939 KB/sec
58423 oracle oraclePROD1 eth2 4567.934 119.514 KB/sec
25691 oracle oraclePROD2 eth2 2.008 3.216 KB/sec
? root 192.168.14.109:15410-192.168.75.15:4884 0.860 1.071 KB/sec
? root 192.168.45.136:19509-192.168.45.138:24889 0.195 0.345 KB/sec
? root 192.168.14.109:49703-192.168.14.113:1521 0.648 0.337 KB/sec
? root 192.168.14.109:49705-192.168.14.113:1521 0.366 0.109 KB/sec
49491 oracle ora_nsa2_DB1 eth2 1.072 0.103 KB/sec
? root 192.168.14.109:27261-192.168.156.39:1525 0.166 0.058 KB/sec
46395 oracle ora_nsa2_MARKT1 eth2 0.536 0.052 KB/sec
? root 192.168.45.136:61021-192.168.45.138:45471 0.021 0.029 KB/sec
38445 oracle ora_nsv1_COA1 bond0: 0.107 0.019 KB/sec
? root unknown TCP 0.000 0.000 KB/sec
TOTAL 11466.336 320.938 KB/sec
Clearly, the processes with PIDs 37168 and 58423 are the ones taking most of the bandwidth.
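With more than a thousand processes the list can be long, so picking out the top senders can also be scripted. This is only a sketch: it assumes the lines have the same layout as the output above (PID, user, program, device, sent, received, KB/sec), for example pasted from the screen into a file. The function name top_senders is my own.

```shell
# Print "SENT PID PROGRAM" for the N busiest senders (default 2),
# reading nethogs-style lines on stdin. Lines without a numeric PID
# (header, "?" connections, TOTAL) are ignored.
top_senders() {
  awk '$1 ~ /^[0-9]+$/ && $NF == "KB/sec" { print $(NF-2), $1, $3 }' |
    LC_ALL=C sort -k1,1nr | head -n "${1:-2}"
}

# Example with a few lines copied from the output above.
top_senders 2 <<'EOF'
25691 oracle oraclePROD2 eth2 2.008 3.216 KB/sec
58423 oracle oraclePROD1 eth2 4567.934 119.514 KB/sec
37168 oracle oraclePROD1 eth2 6881.052 193.939 KB/sec
? root unknown TCP 0.000 0.000 KB/sec
EOF
```

For these sample lines it prints PID 37168 first and PID 58423 second, each with its sent rate and program name.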
From then on, it was easy. I just ran this query:
select ses.sid, ses.serial#, ses.event, ses.username, ses.sql_id, ses.machine, ses.module, ses.program from v$session ses, v$process p where ses.paddr = p.addr and p.spid in (37168, 58423);
and got the information about who was accessing the database, from where, and with which program. Then I sent this information to the development department so they could fix the issue.
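On a server with many instances, the query has to be run in the instance that owns the process. Oracle names dedicated server processes oracle<SID> and background processes ora_<name>_<SID>, so the program column of nethogs already tells you where to connect (PROD1 in this case). The sketch below derives the SID and builds the query for a list of PIDs. The function names are my own, and the sqlplus line in the comment assumes OS authentication as the oracle user.

```shell
# Derive the instance name (ORACLE_SID) from a process name as shown by nethogs.
sid_of() {
  case "$1" in
    ora_*_*) rest=${1#ora_}; echo "${rest#*_}" ;;  # background: ora_<name>_<SID>
    oracle*) echo "${1#oracle}" ;;                 # server process: oracle<SID>
    *) return 1 ;;
  esac
}

# Build the v$session lookup for one or more OS PIDs.
# SPID is a character column, so the PIDs are quoted as strings.
session_query() {
  [ "$#" -gt 0 ] || return 1
  pids=$(printf "'%s'," "$@")
  printf 'select ses.sid, ses.serial#, ses.event, ses.username, ses.sql_id, ses.machine, ses.module, ses.program from v$session ses, v$process p where ses.paddr = p.addr and p.spid in (%s);\n' "${pids%,}"
}

sid_of oraclePROD1          # prints PROD1
session_query 37168 58423   # prints the query for both PIDs

# Possible use (assumption, not executed here):
#   ORACLE_SID=$(sid_of oraclePROD1); export ORACLE_SID
#   session_query 37168 58423 | sqlplus -s "/ as sysdba"
```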
1 comment:
Excellent, thanks for sharing!