Friday, December 23, 2016

How to find out what is consuming all of my bandwidth

A few days ago we received a complaint from the networking department: connections from one of our database servers were consuming most of the available bandwidth.
The most likely culprit was an Oracle process, since our server was the one sending the data and the only application running on it is Oracle.

But here we faced a problem. This is a big corporate server running around 15 Oracle RAC instances, with
more than one thousand Oracle server processes. So it was not easy to discover which process or processes
were consuming most of the bandwidth.


Of course, one option is to use something like tcpdump or another network sniffer to inspect the traffic, but that approach has
a couple of drawbacks.

1. Some companies, for security reasons, do not allow sniffers or make it very difficult to get permission to run one.
2. The output is not easy to interpret; you will usually need to ask your colleagues in the network department to analyze it.

This is where I want to introduce you to a nice Linux utility: NetHogs.

NetHogs monitors the traffic going to and from a machine, per process. It is available for most Linux distributions, and
the source code can be downloaded from GitHub.

From the NetHogs project page:

    NetHogs is a small ‘net top’ tool. Instead of breaking the traffic down per protocol or per subnet,
    like most tools do, it groups bandwidth by process. NetHogs does not rely on a special kernel module to be loaded.
    If there’s suddenly a lot of network traffic, you can fire up NetHogs and immediately see which PID is causing this.
    This makes it easy to identify programs that have gone wild and are suddenly taking up your bandwidth.


For Red Hat users, nethogs is included in the EPEL repository (Extra Packages for Enterprise Linux).

Check this link to learn how to enable it.

http://www.tecmint.com/how-to-enable-epel-repository-for-rhel-centos-6-5/


The NetHogs project page is:

https://github.com/raboof/nethogs
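
With EPEL enabled, installing and starting NetHogs could look like the sketch below. It assumes a yum-based system, sudo rights, and that the EPEL package is named nethogs. The two function names are made up for illustration, and the interface name is whatever your server uses (eth2 in the output shown in this post).

```shell
#!/bin/sh
# Hypothetical helpers; nothing runs until you call them.

# Install NetHogs (assumes the EPEL repository is already enabled).
install_nethogs() {
    sudo yum install -y nethogs
}

# Watch per-process traffic on one interface.
# $1: interface name, $2: refresh delay in seconds (defaults to 1).
# NetHogs normally needs root privileges to capture traffic.
watch_interface() {
    sudo nethogs -d "${2:-1}" "$1"
}

# Example calls, left commented out on purpose:
#   install_nethogs
#   watch_interface eth2 5
```

Passing the interface explicitly keeps NetHogs from listening on a default device that may not be the one carrying the traffic you care about.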

Going back to my issue, this is what I did to find out what was happening. I installed NetHogs and ran it. After it had been running for a while, it was quite clear which
processes were sending most of the data through the network:


NetHogs version 0.8.5

    PID USER     PROGRAM                                     DEV           SENT     RECEIVED
  37168 oracle   oraclePROD1                                 eth2      6881.052      193.939 KB/sec
  58423 oracle   oraclePROD1                                 eth2      4567.934      119.514 KB/sec
  25691 oracle   oraclePROD2                                 eth2         2.008        3.216 KB/sec
      ? root     192.168.14.109:15410-192.168.75.15:4884                  0.860        1.071 KB/sec
      ? root     192.168.45.136:19509-192.168.45.138:24889                0.195        0.345 KB/sec
      ? root     192.168.14.109:49703-192.168.14.113:1521                 0.648        0.337 KB/sec
      ? root     192.168.14.109:49705-192.168.14.113:1521                 0.366        0.109 KB/sec
  49491 oracle   ora_nsa2_DB1                                eth2         1.072        0.103 KB/sec
      ? root     192.168.14.109:27261-192.168.156.39:1525                 0.166        0.058 KB/sec
  46395 oracle   ora_nsa2_MARKT1                             eth2         0.536        0.052 KB/sec
      ? root     192.168.45.136:61021-192.168.45.138:45471                0.021        0.029 KB/sec
  38445 oracle   ora_nsv1_COA1                               bond0:       0.107        0.019 KB/sec
      ? root     unknown TCP                                              0.000        0.000 KB/sec

  TOTAL                                                               11466.336      320.938 KB/sec




Clearly, the processes with PIDs 37168 and 58423 are the ones taking most of the bandwidth.
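
On a server with over a thousand processes the interesting rows may not be so easy to spot, so the ranking can also be done with a small filter. This is only a sketch: it assumes the NetHogs screen text was copied into a file or pipe in the column layout shown above, and top_senders is a hypothetical helper name, not part of NetHogs.

```shell
#!/bin/sh
# top_senders (hypothetical helper): rank processes by the SENT column.
# Reads NetHogs screen text on stdin; $1 is how many rows to print (default 5).
top_senders() {
    # Keep only rows whose first field is a numeric PID, which skips the
    # header, the "?" connection rows and the TOTAL line. Counting fields
    # from the end (SENT RECEIVED KB/sec) avoids depending on column widths.
    awk '$1 ~ /^[0-9]+$/ && NF >= 6 { print $(NF-2), $1, $3 }' |
        LC_ALL=C sort -k1,1nr |
        head -n "${1:-5}"
}
```

Each output line is "SENT PID PROGRAM". Fed the screen above, `top_senders 2` would print 37168 and 58423 first, with 6881.052 and 4567.934 KB/sec sent.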

From then on, it was easy. I just ran this query:

select ses.sid, ses.serial#, ses.event, ses.username, ses.sql_id, ses.machine, ses.module, ses.program
  from v$session ses, v$process p
 where ses.paddr = p.addr
   and p.spid in (37168, 58423);

and got the information about who was accessing the database, from where, and with which program. Then I sent this information to the development department so they could fix the issue.
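
If the PIDs change from one check to the next, the query text can be generated instead of edited by hand. The following is a rough sketch, not the way it was done here: session_query is a hypothetical helper, and the sqlplus call in the comment assumes OS authentication as sysdba with the environment already set for the local instance that owns the processes.

```shell
#!/bin/sh
# session_query (hypothetical helper): print the v$session / v$process
# lookup for the OS PIDs given as arguments.
session_query() {
    # Join the arguments with commas, e.g. "37168,58423".
    pids=$(printf '%s,' "$@")
    pids=${pids%,}
    # \$ keeps the dollar sign literal inside the unquoted here-document.
    cat <<EOF
select ses.sid, ses.serial#, ses.event, ses.username, ses.sql_id,
       ses.machine, ses.module, ses.program
  from v\$session ses, v\$process p
 where ses.paddr = p.addr
   and p.spid in ($pids);
EOF
}

# Possible usage (assumption: run as the oracle OS user on the node
# where NetHogs reported the PIDs):
#   session_query 37168 58423 | sqlplus -s "/ as sysdba"
```

The PIDs reported by NetHogs belong to the local node, so the query has to run against the instance on that node (PROD1 for the two processes above).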


