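This post covers which values to monitor, which values the baseline captures, and how to set up alerts.
Environment: Windows Server 2008 R2, SQL Server 2008 R2 SP1
Performance baseline:
CPU: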
\Processor(_Total)\% Processor Time
\Processor(_Total)\% Privileged Time
\SQLServer:SQL Statistics\Batch Requests/sec
\SQLServer:SQL Statistics\SQL Compilations/sec
\SQLServer:SQL Statistics\SQL Re-Compilations/sec
\System\Processor Queue Length
\System\Context Switches/sec
Memory:
\Memory\Available Bytes
\Memory\Pages/sec
\Memory\Page Faults/sec
\Memory\Pages Input/sec
\Memory\Pages Output/sec
\Process(sqlservr)\Private Bytes
\SQLServer:Buffer Manager\Buffer cache hit ratio
\SQLServer:Buffer Manager\Page life expectancy
\SQLServer:Buffer Manager\Lazy writes/sec
\SQLServer:Memory Manager\Memory Grants Pending
\SQLServer:Memory Manager\Target Server Memory (KB)
\SQLServer:Memory Manager\Total Server Memory (KB)
Disk:
\PhysicalDisk(_Total)\% Disk Time
\PhysicalDisk(_Total)\Current Disk Queue Length
\PhysicalDisk(_Total)\Avg. Disk Queue Length
\PhysicalDisk(_Total)\Disk Transfers/sec
\PhysicalDisk(_Total)\Disk Bytes/sec
\PhysicalDisk(_Total)\Avg. Disk sec/Read
\PhysicalDisk(_Total)\Avg. Disk sec/Write
SQL Server:
\SQLServer:Access Methods\FreeSpace Scans/sec
\SQLServer:Access Methods\Full Scans/sec
\SQLServer:Access Methods\Table Lock Escalations/sec
\SQLServer:Access Methods\Worktables Created/sec
\SQLServer:General Statistics\Processes blocked
\SQLServer:General Statistics\User Connections
\SQLServer:Latches\Total Latch Wait Time (ms)
\SQLServer:Locks(_Total)\Lock Timeouts (timeout > 0)/sec
\SQLServer:Locks(_Total)\Lock Wait Time (ms)
\SQLServer:Locks(_Total)\Number of Deadlocks/sec
\SQLServer:SQL Statistics\Batch Requests/sec
\SQLServer:SQL Statistics\SQL Re-Compilations/sec
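The counters above are what the performance baseline collects. Performance alerts watch the same counters, and their thresholds are derived from the captured baseline.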
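One possible way to turn captured baseline samples into thresholds is sketched below. The rule used here (mean plus K standard deviations) and the function name Get-BaselineThreshold are assumptions made for illustration, not something the setup described in this post prescribes. The input is any list of objects with Path and CookedValue properties, which is the shape of the CounterSamples that Get-Counter returns.

```powershell
# Sketch: derive an upper alert threshold per counter from baseline samples.
# Assumption: threshold = mean + K * standard deviation (population form).
function Get-BaselineThreshold {
    param(
        [object[]]$Samples,   # objects exposing Path and CookedValue
        [double]$K = 2        # how many standard deviations above the mean
    )
    foreach ($group in ($Samples | Group-Object -Property Path)) {
        $values = @($group.Group | ForEach-Object { [double]$_.CookedValue })
        $mean = ($values | Measure-Object -Average).Average
        $sumSq = 0.0
        foreach ($v in $values) { $sumSq += ($v - $mean) * ($v - $mean) }
        $std = [math]::Sqrt($sumSq / $values.Count)
        New-Object PSObject -Property @{
            Path      = $group.Name
            Mean      = $mean
            StdDev    = $std
            Threshold = $mean + $K * $std
        }
    }
}

# Synthetic baseline samples standing in for real Get-Counter output.
$baseline = foreach ($v in 20, 30, 40) {
    New-Object PSObject -Property @{
        Path        = '\Processor(_Total)\% Processor Time'
        CookedValue = $v
    }
}
$result = @(Get-BaselineThreshold -Samples $baseline -K 2)
$result | Format-List Path, Mean, StdDev, Threshold
```

On a real server the samples could instead be collected with Get-Counter using -SampleInterval and -MaxSamples and flattened with ForEach-Object { $_.CounterSamples }. For counters where a low value is the problem (Page life expectancy, for example), the same idea would subtract from the mean rather than add to it.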
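For alerting, I wrote a PowerShell script that runs as a SQL Server Agent job. When an alert fires, it sends an email through Database Mail (dbmail).
The PowerShell script and its configuration are shown below: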
# Connection and mail settings (placeholder values)
$server = "(local)"
$uid = "sa"
$db = "master"
$pwd = "pwd"
$mailprfname = "sina"
$recipients = "xxxxx@qq.com"
$subject = "Performance Alert"
$computernamexml = "f:\computername.xml"
$alter_cpuxml = "f:\alter_cpu.xml"

# Read the list of machine names from the XML file
function GetServerName($xmlpath)
{
    $xml = [xml] (Get-Content $xmlpath)
    $return = New-Object Collections.Generic.List[string]
    for ($i = 0; $i -lt $xml.computernames.ChildNodes.Count; $i++)
    {
        if ($xml.computernames.ChildNodes.Count -eq 1)
        {
            # A single child is returned as a scalar, not an array
            $cp = [string]$xml.computernames.computername
        }
        else
        {
            $cp = [string]$xml.computernames.computername[$i]
        }
        $return.Add($cp.Trim())
    }
    $return
}

# Read the Counter elements (path, threshold, operator) from the XML file
function GetAlterCounter($xmlpath)
{
    $xml = [xml] (Get-Content $xmlpath)
    $xml.counters.Counter
}

# Send the alert text by email through msdb..sp_send_dbmail
function CreateAlter($message)
{
    $SqlConnection = New-Object System.Data.SqlClient.SqlConnection
    $SqlConnection.ConnectionString = "Server = $server; Database = $db;User Id = $uid; Password = $pwd"
    $cc = $SqlConnection.CreateCommand()
    if (-not ($SqlConnection.State -like "Open")) { $SqlConnection.Open() }
    # Double any single quotes so the message cannot break the SQL string
    $body = $message.Replace("'", "''")
    $cc.CommandText = "EXEC msdb..sp_send_dbmail @profile_name = '$mailprfname',@recipients = '$recipients',@body = '$body',@subject = '$subject'"
    $cc.ExecuteNonQuery() | Out-Null
    $SqlConnection.Close()
}

$names = GetServerName $computernamexml
$pfcounters = GetAlterCounter $alter_cpuxml
foreach ($cp in $names)
{
    # Build the full counter paths for this machine
    $p = New-Object Collections.Generic.List[string]
    $report = ""
    foreach ($pfc in $pfcounters)
    {
        $counter = "\\" + $cp + $pfc.get_InnerText().Trim()
        $p.Add($counter)
    }

    $count = Get-Counter $p
    for ($i = 0; $i -lt $count.CounterSamples.Count; $i++)
    {
        $v = $count.CounterSamples.Get($i).CookedValue
        $pfc = $pfcounters[$i]
        $b = ""
        $lg = ""
        if ($pfc.operator -eq "lt")
        {
            # Value is expected to stay below the threshold
            if ($v -ge [double]$pfc.alter) { $b = "alter"; $lg = "Greater Than" }
        }
        elseif ($pfc.operator -eq "gt")
        {
            # Value is expected to stay above the threshold
            if ($v -le [double]$pfc.alter) { $b = "alter"; $lg = "Less Than" }
        }
        if ($b -eq "alter")
        {
            $path = "\\" + $cp + $pfc.get_InnerText()
            $item = "{0}:{1};{2} Threshold:{3}" -f $path, $v.ToString(), $lg, $pfc.alter.Trim()
            $report += $item + "`n"
        }
    }
    if ($report -ne "")
    {
        # Raise the alert; the report carries the counter, current value, and threshold
        CreateAlter $report
    }
}
The script relies on two configuration files, computernamexml and alter_cpuxml. The first lists the machines to check and the second lists the counters with their thresholds.

f:\computername.xml:
<computernames>
  <computername>fanr-pc</computername>
</computernames>

f:\alter_cpu.xml:
<Counters>
  <Counter alter="10" operator="gt">\Processor(_Total)\% Processor Time</Counter>
  <Counter alter="10" operator="gt">\Processor(_Total)\% Privileged Time</Counter>
  <Counter alter="10" operator="gt">\SQLServer:SQL Statistics\Batch Requests/sec</Counter>
  <Counter alter="10" operator="gt">\SQLServer:SQL Statistics\SQL Compilations/sec</Counter>
  <Counter alter="10" operator="gt">\SQLServer:SQL Statistics\SQL Re-Compilations/sec</Counter>
  <Counter alter="10" operator="lt">\System\Processor Queue Length</Counter>
  <Counter alter="10" operator="lt">\System\Context Switches/sec</Counter>
</Counters>
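Here alter is the threshold and operator is the comparison a healthy value is expected to satisfy. Take the first entry: with operator "gt" the counter value should stay greater than the threshold, so an alert is sent when the threshold is greater than or equal to the counter value. With "lt" it works the other way round: an alert is sent when the counter value reaches or exceeds the threshold. In this sample every threshold is set to 10; in practice each value should come from the baseline.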
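The comparison rule can be sketched as a small standalone function. Test-CounterAlert is a hypothetical helper name and the current values passed in are made up; only the attribute names alter and operator come from the configuration file.

```powershell
# Sketch of the alert rule: "operator" names the comparison a healthy value
# must satisfy; an alert is raised when the value violates it.
function Test-CounterAlert {
    param([double]$Value, [double]$Threshold, [string]$Operator)
    switch ($Operator) {
        'gt' { return ($Value -le $Threshold) }  # expected: value > threshold
        'lt' { return ($Value -ge $Threshold) }  # expected: value < threshold
        default { return $false }                # unknown operator: no alert
    }
}

# Parse a config fragment in the same shape as alter_cpu.xml.
$config = [xml]@'
<Counters>
  <Counter alter="10" operator="gt">\Processor(_Total)\% Processor Time</Counter>
  <Counter alter="10" operator="lt">\System\Processor Queue Length</Counter>
</Counters>
'@

# Made-up current values, one per counter, in config order.
$current = 5.0, 3.0
$alerts = @()
$i = 0
foreach ($c in $config.Counters.Counter) {
    if (Test-CounterAlert -Value $current[$i] -Threshold ([double]$c.alter) -Operator $c.operator) {
        $alerts += $c.InnerText.Trim()
    }
    $i++
}
$alerts   # lists only the first counter: 5 <= 10 violates "gt", 3 < 10 satisfies "lt"
```

Keeping the rule in one function makes the boundary case explicit: a value exactly equal to the threshold raises an alert under either operator.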