Archive for the ‘Sysadmin’ Category
Monday, September 3rd, 2007
Today we’d like to share the simple idea of using pivot queries for Oracle workload reporting.
We’ve used the Statspack tables as a data source and queried for total Program Global Area memory utilization, as an example.
He’re is the query used (looks a lot better when not wrapped ) :
select ps.NAME, to_char(snap_time,’YYYY-MM-DD’),
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘00′ then ps.value else null end)*10/1024/1024/1024)/10 h00,
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘01′ then ps.value else null end)*10/1024/1024/1024)/10 h01,
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘02′ then ps.value else null end)*10/1024/1024/1024)/10 h02,
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘03′ then ps.value else null end)*10/1024/1024/1024)/10 h03,
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘04′ then ps.value else null end)*10/1024/1024/1024)/10 h04,
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘05′ then ps.value else null end)*10/1024/1024/1024)/10 h05,
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘06′ then ps.value else null end)*10/1024/1024/1024)/10 h06,
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘07′ then ps.value else null end)*10/1024/1024/1024)/10 h07,
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘08′ then ps.value else null end)*10/1024/1024/1024)/10 h08,
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘09′ then ps.value else null end)*10/1024/1024/1024)/10 h09,
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘10′ then ps.value else null end)*10/1024/1024/1024)/10 h10,
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘11′ then ps.value else null end)*10/1024/1024/1024)/10 h11,
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘12′ then ps.value else null end)*10/1024/1024/1024)/10 h12,
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘13′ then ps.value else null end)*10/1024/1024/1024)/10 h13,
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘14′ then ps.value else null end)*10/1024/1024/1024)/10 h14,
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘15′ then ps.value else null end)*10/1024/1024/1024)/10 h15,
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘16′ then ps.value else null end)*10/1024/1024/1024)/10 h16,
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘17′ then ps.value else null end)*10/1024/1024/1024)/10 h17,
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘18′ then ps.value else null end)*10/1024/1024/1024)/10 h18,
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘19′ then ps.value else null end)*10/1024/1024/1024)/10 h19,
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘20′ then ps.value else null end)*10/1024/1024/1024)/10 h20,
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘21′ then ps.value else null end)*10/1024/1024/1024)/10 h21,
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘22′ then ps.value else null end)*10/1024/1024/1024)/10 h22,
floor(avg(case when to_char(ss.snap_time,’HH24′) = ‘23′ then ps.value else null end)*10/1024/1024/1024)/10 h23
from perfstat.stats$snapshot ss,
perfstat.stats$pgastat ps
where ps.name = ‘total PGA allocated’
and ss.SNAP_ID = ps.SNAP_ID
group by ps.name,to_char(snap_time,’YYYY-MM-DD’);
You can now copy and paste returned results into excel and use conditional formatting (just a proposal).
Result (click to enlarge) :

The report presented above is showing a common workload pattern of bigger memory usage during office hours.
Posted in Sysadmin, Capacity management, Monitoring, Oracle, SQL, Database | No Comments »
Thursday, July 26th, 2007
Before deploying any application to production servers it’s always good to check if the application won’t harm the environment. It’s good to check at least CPU usage & I/O load (local & network). That’s where troubles come from in large scale environments.
The CPU usage for one process can be easliy measured using sar command (SYSSTAT tool) with -x option:
sar -x pid.
Choosing one second monitoring interval will generate lots of results but only this way we can capture any CPU peaks generatedy by a process.
Network I/O Load can also be easily monitored using for example tcpdump.
The biggest problem is with checking the I/O Load of local disks. Linux system counts internally I/O load of a whole system, but not of separate processes. I tried to get an average I/O load generated by the application.
Procedure for one host:
- Start the application.
- Run iostat for defined amount of time. Get results: results0
- Stop the application
- Run iostat for defined amount of time. Get results: results1
- Compute change in I/O load for a host: results=results0-results1
The problem with this approach is that everything can affect the I/O Load during measurements time. I tried to use statistcs to solve it. I assumed that when I run this procedure on larger number of hosts and ranodimze procedure startup time, I will be able to extract I/O Load which was generated by the tested application.
I ran this procedure on random time on each of 450 hosts 2 times. Here are results:

It looks like the application generates 0.8 increase in tps on average.
Posted in Unix, Sysadmin, Linux, Capacity management, Monitoring | No Comments »
Saturday, June 9th, 2007
Let’s say you have the script that is executed every minute via cron. Most likely you’ve tested it, whether it is executing in less than one minute or not, but have you tested it while your box CPU resources are consumed at 100% ? Have you tested it how long it will be executing when your system is swapping ?
What if it will be running longer than the scheduled time interval is ? Then you’re likely to hit CPU, memory, number of processes or other resource’s limits.
Such issues can happen, and if your scripts are not immune to such situations, they’ll act just like nails in your system’s coffin. Especially if they’re running with low priority ( like “nice -n 19 ./your_script.sh” for example ).
Securing scripts that way could also be needed, when more then one execution at a time could have bad influence on information consistency - it may be needed to have such executions atomic.
Here are some code examples of how the scripts could be secured. Those examples are showing the trick to lock the script file itself.
For Perl scripts :
#!/usr/bin/perl
use Fcntl qw(:flock);
open SELF, “< $0″ or die;
flock SELF, LOCK_EX | LOCK_NB or exit;
For Python scripts :
#!/usr/local/bin/python
import fcntl
import sys
f = open(sys.argv[0], ‘r+’)
try:
fcntl.flock(f.fileno(),fcntl.LOCK_EX|fcntl.LOCK_NB)
except IOError:
sys.exit(0)
For Ruby scripts :
#!/usr/local/bin/ruby
f = File.open($0, File::RDONLY)
f.flock File::LOCK_EX | File::LOCK_NB or exit
Those snippets are showing the technique of creating exclusive lock on the script file itself. If another instance is trying to obtain that lock, it is not getting it and exits the execution. After the script quits, the lock is released.
You can just place similar code in the beginning of your script to not let it to run more than once at a time.
Posted in Unix, Scripting, Ruby, Perl, Python, Sysadmin, Automation, Linux | 1 Comment »
|