Saturday, 28 July 2012

PHP Memory Monitoring For All Scripts

We recently had a web server crash because memory ran so low that the Linux kernel started killing off processes. As this is an extremely busy server with an SLA, I immediately took a look and found Apache chowing away at pretty much all the memory. The next problem was finding out which PHP scripts were consuming so much memory (with a 24MB memory limit per script), so I came up with this:

The script (/var/www/memory.php in our case) should have the same ownership and permissions as your existing PHP scripts, and it needs write access to the log file named in the script:

<?php
// Runs at the end of every request and logs the peak memory the request used.

// Peak memory allocated by PHP for this request, in bytes.
$used_bytes = memory_get_peak_usage();

// Convert a byte count into a human-readable string (KB, MB or GB).
function ByteSize($bytes){
 $size = $bytes / 1024;
 if ($size < 1024){
  return number_format($size, 2) . ' KB';
 }
 $size = $size / 1024;
 if ($size < 1024){
  return number_format($size, 2) . ' MB';
 }
 return number_format($size / 1024, 2) . ' GB';
}

$human_size = ByteSize($used_bytes);

$fp = @fopen("/memory_usage.log","a"); // Change this to where you want the log...
if ($fp){
 // REQUEST_URI is not set when PHP runs from the command line.
 $script_uri = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : $_SERVER['SCRIPT_NAME'];
 fwrite($fp, "$human_size consumed by $script_uri\n");
 fclose($fp);
}

?>

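Adding this line to php.ini makes PHP run the script at the end of every request (note that auto-append is skipped when a script terminates with exit()):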

auto_append_file = /var/www/memory.php


Restart Apache and give it a try. Hopefully this helps someone else!
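
Once the log has collected some traffic, the heaviest scripts still have to be picked out of it. Below is a minimal sketch for that, assuming the log lines look exactly like the ones written above ("<size> <unit> consumed by <uri>"). The function name top_memory_scripts is made up for this example, and grouping requests by URI without the query string is my own assumption about what is useful:

```shell
#!/bin/sh
# top_memory_scripts LOGFILE [COUNT]
# Print the highest peak seen per URI, largest first (default: top 10).
top_memory_scripts() {
    LC_ALL=C awk '
    {
        value = $1
        gsub(/,/, "", value)      # number_format() inserts thousands separators
        value += 0                # force a numeric value
        if ($2 == "MB") value *= 1024             # normalise everything to KB
        else if ($2 == "GB") value *= 1024 * 1024
        uri = $5
        sub(/\?.*/, "", uri)      # ignore the query string when grouping
        if (!(uri in peak) || value > peak[uri]) peak[uri] = value
    }
    END {
        for (uri in peak) printf "%.2f MB\t%s\n", peak[uri] / 1024, uri
    }' "$1" | LC_ALL=C sort -nr | head -n "${2:-10}"
}

# Example: top_memory_scripts /memory_usage.log 5
```

Anything that keeps showing up near the 24MB limit is a good candidate for a closer look.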

Thursday, 26 July 2012

Ubuntu dual boot grub menu not showing on boot...

Installed Ubuntu for dual boot (12.04 and other versions seem to have this issue), but the machine boots straight into Windows without showing the menu that lets you choose Ubuntu in the first place? The steps below were tested during a support call from South Africa to Namibia; hopefully they work for you too:

1. Boot from a live CD, open a terminal and enter "sudo su -" to become the root user; to check that you're root, use "whoami"
2. Use fdisk -l to locate your Linux partition, eg. /dev/sda6 (it could be on /dev/sdb, /dev/sdc etc.)
3. Make a directory: mkdir /linfix
4. Mount the file system: mount /dev/sda6 /linfix
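5. Use nano to edit the grub defaults on the mounted partition, not the live CD's own copy: nano /linfix/etc/default/grub (change GRUB_TIMEOUT=0 to 10 (seconds) or -1 (wait indefinitely); the new timeout is picked up the next time grub.cfg is regenerated, eg. by update-grub on the installed system)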
6. Save the file in nano using: ctrl+o <enter> ctrl+x
7. Reinstall grub: grub-install --root-directory=/linfix/ /dev/sda
8. Reboot if step 7 reported no errors; if it did, Google them!

Tuesday, 3 July 2012

Linux DDoS on Power Systems?

Imagine receiving a notice from your monitoring systems that your facility is using a megawatt more electricity for no specific reason... It's enough to make the average person freak out completely, but unfortunately that is exactly what happened this past weekend when an additional leap second was added...

Part of the notice we received from Hetzner - "During the night of 30.06.2012 to 01.07.2012 our internal monitoring systems registered an increase in the level of IT power usage by approximately one megawatt.

The reason for this huge surge is the additional switched leap second which can lead to permanent CPU load on Linux servers."

Apparently a bug in the Linux kernel's hrtimer code (present in versions including 3.3) meant the system time was not set correctly when the leap second was added. This in turn caused an infinite loop on many systems, which pushed CPU utilization up to 100% and affected power distribution systems in a huge way!

I can't help but wonder how many facilities would be able to handle the sudden impact of such a bug without going offline... The bug also seems to be the reason why Amazon EC2, Reddit, Mozilla, Gawker and more went down over the weekend...

Setting the date seems to fix the problem, with a reboot as the last resort for an affected server...
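
The "setting the date" workaround that did the rounds at the time amounted to writing the clock's current value straight back to itself. Here is a cautious sketch of that, assuming GNU date and root privileges. The reset_clock function and the APPLY variable are hypothetical names for this example, and unless APPLY=1 is set it only prints what it would do:

```shell
#!/bin/sh
# Re-set the system clock to its own current value.
# Nothing is changed unless APPLY=1 is set explicitly.
reset_clock() {
    now=$(LC_ALL=C date)          # current time in the default C-locale format
    if [ "${APPLY:-0}" = "1" ]; then
        date -s "$now"            # needs root; GNU date parses its own output
    else
        printf 'would run: date -s "%s"\n' "$now"
    fi
}

reset_clock
```

Run it as root with APPLY=1 on a server that is stuck at 100% CPU, and keep the reboot in reserve if the load does not drop.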

Well done to Hetzner for staying online and notifying customers of the problem.

The Linux vmstat command

The vmstat command is often overlooked or forgotten by administrators of Linux machines, even though it contains some nuggets of information that are otherwise hard to obtain. It can show you IO blocks sent/received, context switches per second (not too long ago we had a server doing 10000+ context switches per second due to a software bug, and vmstat let us confirm we weren't going insane...), interrupts per second etc.

Example output:

root@onms.net:~# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 157312   7832 1034456    0    0   420   358  650  347  7  4 83  7

From the vmstat man page, for translating the fields returned above to geek:

   Procs
       r: The number of processes waiting for run time.
       b: The number of processes in uninterruptible sleep.

   Memory
       swpd: the amount of virtual memory used.
       free: the amount of idle memory.
       buff: the amount of memory used as buffers.
       cache: the amount of memory used as cache.
       inact: the amount of inactive memory. (-a option)
       active: the amount of active memory. (-a option)

   Swap
       si: Amount of memory swapped in from disk (/s).
       so: Amount of memory swapped to disk (/s).

   IO
       bi: Blocks received from a block device (blocks/s).
       bo: Blocks sent to a block device (blocks/s).

   System
       in: The number of interrupts per second, including the clock.
       cs: The number of context switches per second.

   CPU
       These are percentages of total CPU time.
       us: Time spent running non-kernel code. (user time, including nice time)
       sy: Time spent running kernel code. (system time)
       id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
       wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.
       st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.
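
When you are chasing one number, such as the context switches mentioned earlier, it helps to pull a single column out of the output. A small sketch, assuming the two-letter header row shown in the example output above; the function name vmstat_field is made up for this example. It looks the column up by name rather than position, so extra columns such as st do not throw it off:

```shell
#!/bin/sh
# vmstat_field NAME
# Read vmstat output on stdin and print the value of column NAME per sample.
vmstat_field() {
    awk -v name="$1" '
    $1 == "r" && $2 == "b" {      # header row holding the column names
        col = 0
        for (i = 1; i <= NF; i++) if ($i == name) col = i
        next
    }
    col && $1 ~ /^[0-9]+$/ { print $col }   # data rows start with a number
    '
}

# Example: five one-second samples of context switches per second
# vmstat 1 5 | vmstat_field cs
```

Keep in mind that the first sample vmstat prints is an average since boot, so the later samples are the ones to watch.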