WS2008FCS – Cluster disk failing to come online on a Windows 2008 cluster node

This is a real-world scenario from one of my clients that impacted production file shares running on Windows 2008 failover cluster nodes. The cluster service automatically moved the cluster disk into maintenance mode and a chkdsk was initiated on it, which took all the shares offline, and the clustered disk was marked as dirty.

Analyzing the cluster logs, I found that the cluster was not able to enumerate files under the root of the clustered disk. The cluster log showed error 5: “VerifyFS: Ignoring failure to open file \\?\GLOBALROOT\Device\Harddisk10\Partition2\technicaldetails.xls Error: 5”

In Windows 2008, run the command “cluster log /gen” to generate the cluster logs; the log is written to the “C:\Windows\Cluster\Reports” folder on every cluster node. Please note that timestamps in the cluster log are in UTC, not in the server's local time zone.
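
Because the log is in UTC, it can help to convert a timestamp before comparing it with local event logs. The following is a minimal PowerShell sketch and was not part of the original troubleshooting: the function name Convert-ClusterLogTime is hypothetical, and the timestamp layout is assumed to match the cluster.log excerpt in this post (yyyy/MM/dd-HH:mm:ss.fff).

```powershell
# Hypothetical helper: convert a cluster.log timestamp (UTC) to local time.
# Assumes the layout seen in this post, e.g. '2011/10/01-09:12:23.393'.
function Convert-ClusterLogTime {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Timestamp
    )

    # Treat the parsed value as UTC and keep it as UTC
    $styles = [System.Globalization.DateTimeStyles]'AssumeUniversal, AdjustToUniversal'

    $utc = [datetime]::ParseExact(
        $Timestamp,
        'yyyy/MM/dd-HH:mm:ss.fff',
        [System.Globalization.CultureInfo]::InvariantCulture,
        $styles)

    # Return both values so they can be compared side by side
    New-Object PSObject -Property @{
        Utc   = $utc
        Local = $utc.ToLocalTime()
    }
}

# Example call; the Local value depends on the time zone of the machine
Convert-ClusterLogTime -Timestamp '2011/10/01-09:12:23.393'
```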

Whenever the cluster service performs a health check on the storage for possible access issues, it tries to enumerate the files stored in the root of the clustered disk volume, and it runs this check in the ‘Local System’ context.
In this scenario the cluster was not able to open a handle to a file at the root of the clustered disk because of permission issues. You can see from the cluster log that the file the cluster is trying to open is technicaldetails.xls and that it gets an access denied (‘Error: 5’) message.

Resolution: I removed the read-only attribute from the file located in the root of the cluster disk and performed a failover test of the resource group that hosts the clustered disk.

Cluster.log
————————————————————————————————————————————
000012b0.0001b900::2011/10/01-09:12:23.393 WARN  [RES] File Server <FileServer-(Filesrv001)(Filesrv001_I Drive)>: Failed in NetShareGetInfo(Filesrv001, Value_Navigator), status 2310. Tolerating…
0000132c.0000e57c::2011/10/01-09:13:21.755 WARN  [RES] Physical Disk <Fileserv1-Data>: VerifyFS: Ignoring failure to open file \\?\GLOBALROOT\Device\Harddisk10\Partition2\technicaldetails.xls Error: 5.
0000132c.0000e57c::2011/10/01-09:14:22.191 WARN  [RES] Physical Disk <Fileserv1-Data>: VerifyFS: Ignoring failure to open file \\?\GLOBALROOT\Device\Harddisk10\Partition2\technicaldetails.xls Error: 5.
0000132c.0000e57c::2011/10/01-09:15:22.628 WARN  [RES] Physical Disk <Fileserv1-Data>: VerifyFS: Ignoring failure to open file \\?\GLOBALROOT\Device\Harddisk10\Partition2\technicaldetails.xls Error: 5.
————————————————————————————————————————————
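
To find every file that trips the health check, the VerifyFS warnings can be pulled out of a generated cluster.log. This is only a sketch under stated assumptions: the function name Get-VerifyFsFailure is hypothetical, and the regular expression is modelled on the three log lines above, so other cluster versions may format the message differently.

```powershell
# Hypothetical helper: extract VerifyFS open failures from cluster.log lines.
# The pattern is based on the log lines shown in this post.
function Get-VerifyFsFailure {
    param(
        [Parameter(Mandatory = $true)]
        [string[]]$Line
    )

    $pattern = '::(?<Time>\S+)\s+WARN\s+\[RES\] Physical Disk <(?<Resource>[^>]+)>: ' +
               'VerifyFS: Ignoring failure to open file (?<File>.+?) Error: (?<Code>\d+)'

    foreach ($l in $Line) {
        if ($l -match $pattern) {
            New-Object PSObject -Property @{
                TimeUtc   = $Matches['Time']       # timestamp as logged (UTC)
                Resource  = $Matches['Resource']   # physical disk resource name
                File      = $Matches['File']       # file the cluster could not open
                ErrorCode = [int]$Matches['Code']  # Win32 error, 5 = access denied
            }
        }
    }
}

# Usage on a cluster node (path as described in this post):
# Get-VerifyFsFailure -Line (Get-Content 'C:\Windows\Cluster\Reports\Cluster.log')
```

Grouping the result by the File property gives a quick list of the files at the disk root that need their attributes or permissions checked.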

It is not recommended to store files at the root of a cluster disk, because the cluster needs to open handles to files and folders there as part of the health detection mechanism that determines possible access issues to storage. The cluster service runs in the context of the ‘Local System’ account, so if that account does not have permission to files at the root of a drive, the health check may fail. If files must be kept at the root of a cluster disk, make sure they are not read-only and that the Local System account has full access to them.

Posted in MS-Clustering, Windows, Windows Troubleshooting

Troubleshooting non-paged pool memory leak event id 2019 using poolmon

Event ID 2019 from source srv usually indicates that the server is running short of non-paged pool memory, which is used by the kernel and device drivers. On 32-bit Windows 2003 the non-paged pool is limited to 256 MB.

When the NP pool is exhausted, the system becomes slow and unresponsive and some software components cease to work normally (for example, IIS starts refusing connections).

The NP memory pool shortage can be caused by memory leaks in third-party software, malware, or generally overstraining the system with resource-intensive operations.

I encountered one such issue during day-to-day support for one of my clients. The server repeatedly hung and logged event ID 2019: “The server was unable to allocate from the system nonpaged pool because the pool was empty.”

———————————————————————————————————————–
Log Name: System
Source: srv
Date: 6/16/2014 5:21:06 AM
Event ID: 2019
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: Mytestserver.tech.com
Description:
The server was unable to allocate from the system nonpaged pool because the pool was empty.
———————————————————————————————————————–

After analyzing the server I found that the issue was caused by a third-party driver installed on it. The steps below explain how poolmon was used to track it down.

Use Windows Task Manager to check the non-paged pool value. If it is high (more than 200 MB on a 32-bit system), it makes sense to analyze how the pool is being used.

Poolmon.exe shows the number of allocations and the outstanding bytes for each pool type and for each tag passed to the allocation calls. Hotkeys sort the display by different columns; to find the leaking allocation type, press ‘b’ to sort by bytes or ‘d’ to sort by the difference between the number of allocations and frees.

In my case Poolmon showed that the BLFP tag had leaked 445342 allocations and the BCM0 tag had leaked 40 allocations.

Once you have identified the tag name in the left column, the next step is to find the driver file that uses it. Search for the tag with the findstr command in “C:\Windows\System32\drivers”, where most drivers are located. For more about using findstr with a pool tag, see Microsoft KB http://support.microsoft.com/kb/298102
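
The same search can be sketched in PowerShell. This is an illustration only and not what I ran on the server: the function name Find-PoolTagDriver is hypothetical, and it relies on Select-String doing a plain text match inside binary .sys files, which is a best-effort approach; findstr as described in the KB remains the reference method.

```powershell
# Hypothetical helper: list driver files that contain a given pool tag.
# Select-String reads the binary as text, so this is a best-effort search.
function Find-PoolTagDriver {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Tag,

        # Default location where most drivers are stored
        [string]$DriverPath = (Join-Path $env:SystemRoot 'System32\drivers')
    )

    Get-ChildItem -Path $DriverPath -Filter *.sys |
        Where-Object {
            # -SimpleMatch: literal search, -CaseSensitive: tags are case-sensitive,
            # -Quiet: return only true/false for the file
            Select-String -Path $_.FullName -Pattern $Tag -SimpleMatch -CaseSensitive -Quiet
        } |
        Select-Object -ExpandProperty Name
}

# Usage on the affected server:
# Find-PoolTagDriver -Tag 'BLFP'
```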

The BLFP and BCM0 tags belonged to the Broadcom network adapter driver, which was old and outdated and caused all the problems on the server. Upgrading the Broadcom drivers to the latest version fixed the issue.

Some known pool tag names are listed on the Microsoft TechNet site; the list is very helpful when troubleshooting such issues: http://blogs.technet.com/b/yongrhee/archive/2009/06/24/pool-tag-list.aspx

Posted in Windows, Windows Troubleshooting

WS2008R2 – Windows Time service doesn’t start during server boot on Windows 2008 R2 server in a work group environment

Even if you set Windows Time service to Automatic start, it stops within a few seconds. Why?

Event ID 7042 is logged in the System event log on a Windows 2008 R2 server that is part of a workgroup. This can lead to time sync issues on servers that host critical applications in a DMZ environment.

In Windows 2008 R2 the Windows Time service is configured by default as a trigger-start service, which means the service starts when a specific event occurs in the system, depending on the trigger info configured in the registry for that service.
Background services and processes can have a substantial impact on overall system performance. Trigger-start services are a new feature in Windows Server 2008 R2 that reduces the number of services that start automatically, which lowers power consumption, improves performance, and increases the stability of the whole system. For this purpose the Service Control Manager was extended so that it can start and stop a service in response to specific system events.

For example, the trigger info might say to start the service when the machine is joined to a domain, or when the machine has an IP address assigned to it.

Log Name: System
Source: Service Control Manager
Event ID: 7042
Level: Information
The Windows Time service was successfully sent a stop control. The reason specified was: 0x40030011 [Operating System: Network Connectivity (Planned)]

The trigger settings of a service can be checked by running the sc qtriggerinfo command:

sc qtriggerinfo w32time
Service Name: w32time

To have the Windows Time service start at system start-up, use one of the following methods.

Method 1:
Run the following command to remove the trigger events that are registered by default, then change the startup type of the Windows Time service from Manual to Automatic (for example with sc config w32time start= auto).
sc triggerinfo w32time delete

Method 2:
Run the following command to define trigger events that are appropriate for your environment.
As an example, the command below starts the service when the host has an IP address assigned and stops it when it does not.

sc triggerinfo w32time start/networkon stop/networkoff

Posted in Windows, Windows Troubleshooting

Windows #The Right way of deleting user profiles from Servers 2000/2003/2008

When the system drive runs low on space we often find older profiles consuming much of it, and we then delete the old profiles of user accounts that no longer exist in the environment.

Whenever a user logs on to a Windows computer, a user profile folder is created for that user under C:\Documents and Settings on Windows 2003 servers and under C:\Users on Windows 2008 servers.
If you want to delete one of these user profiles, don’t simply delete the profile folder (for example C:\Documents and Settings\username), because that leaves the registry settings for the profile intact, which can confuse the profile service and cause unpredictable results. Improper deletion of profiles also leaves orphaned registry keys behind in the ProfileList key.

The right way to delete a user profile is to use the User Profiles dialog, which is accessible from the Advanced tab of the System applet in Control Panel.

Select the profile name and click the Delete button to delete the selected profile.

Registry path for the profile list:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList

Note: The profile folder can't be deleted in two situations: first, when the user is logged on to the server with the profile we are trying to delete, and second, when the profile folder is being accessed by some process or application.
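
To spot entries that were left behind by an improper deletion, the ProfileList entries can be compared with the folders on disk. The sketch below is an assumption-laden illustration: Find-OrphanedProfile is a hypothetical name, it only reports entries and deletes nothing, and it expects objects with Sid and Path properties such as those built from the ProfileImagePath value under the ProfileList key.

```powershell
# Hypothetical helper: return profile entries whose folder no longer exists.
# Each input object is expected to have Sid and Path properties.
function Find-OrphanedProfile {
    param(
        [Parameter(Mandatory = $true)]
        [object[]]$ProfileEntry
    )

    $ProfileEntry | Where-Object {
        # Orphaned = a path is recorded in the registry but the folder is gone
        $_.Path -and -not (Test-Path -LiteralPath $_.Path -PathType Container)
    }
}

# Usage on a Windows server (read-only, nothing is deleted):
# $key = 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList'
# $entries = Get-ChildItem $key | ForEach-Object {
#     New-Object PSObject -Property @{
#         Sid  = $_.PSChildName
#         Path = (Get-ItemProperty $_.PSPath).ProfileImagePath
#     }
# }
# Find-OrphanedProfile -ProfileEntry $entries
```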

Posted in Windows, Windows Troubleshooting

WS2008 – Dynamic cache service for managing filesystem cache WS2008 and R2

The Dynamic cache service is used for managing the file system cache dynamically. It is required in situations where 80% of physical memory is consumed by the file system cache, which can cause performance issues on servers such as system hangs, application errors, and application crashes.

Memory management in Microsoft Windows operating systems uses a demand-based algorithm. If any process requests and uses a large amount of memory, the size of the working set (the number of memory pages in the physical RAM) of the process increases. If these requests are continuous and unchecked, the working set of the process will grow to consume all the physical RAM. In this situation, the working sets for all the other processes are paged out to the hard disk. This behavior decreases the performance of applications and services because the memory pages are continuously written to the hard disk and read from the hard disk.

This behavior also applies to the working set of the system file cache. If there is a continuous and high volume of cached read requests from any process or from any driver, the working set size of the system file cache will grow to meet this demand. The system file cache consumes the physical RAM. Therefore, sufficient amounts of physical RAM are not available for other processes.

On 64-bit versions of Windows operating systems, the size of the virtual address range is typically larger than the physical RAM. In this situation, the working set for the system file cache can increase to consume most of the physical RAM.

To work around this issue, use the GetSystemFileCacheSize API function and the SetSystemFileCacheSize API function to set the maximum or minimum size value for the working sets of the system file cache. The use of these functions is the only supported method to restrict the consumption of physical memory by the system file cache.

The Microsoft Windows Dynamic Cache Service is a sample service that demonstrates one strategy to use these APIs to minimize the effects of this issue.

The memory management algorithms in Windows Server 2008 R2 operating systems were updated to address many file caching problems that were found in earlier versions of Windows. There are only certain unique situations in which you have to implement this service on computers that are running Windows Server 2008 R2.

The Dynamic cache service applies only to Windows 2008 and Windows 2008 R2. It was initially released only for Windows 2008; later this year Microsoft released a newer version that is also supported on Windows 2008 R2.

I had a scenario where a Windows 2008 file server cluster node had 32 GB of physical memory and the file system cache was consuming 22 GB of it. This caused the server to hang, cluster resources to deadlock, and then countless unexpected failovers of cluster resources, resulting in downtime for the file share resources.

The RamMap.exe tool revealed that most of the memory was consumed by MetaFile, which is part of the file system cache. To fix this I configured the Dynamic cache service to limit the file system cache to 2 GB, which reduced overall cache usage by 20 GB. After the Dynamic cache service was configured there were no more unexpected failovers or downtime for the file share resources, and the high memory utilization by the file system cache was resolved (see my previous post on how to use RamMap.exe).

The parameters I used to limit file system cache are mentioned below:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DynCache\Parameters
MaxSystemCacheMBytes: 2048
MinSystemCacheMBytes: 100

The latest version of the Dynamic cache service can be downloaded from: http://www.microsoft.com/en-us/download/details.aspx?id=9258

How to configure Dynamic Cache service on Windows 2008 or 2008 R2

1) Copy DynCache.exe to %SystemRoot%\System32.
2) From a command prompt, run: sc create DynCache binpath= %SystemRoot%\System32\DynCache.exe start= auto type= own DisplayName= "Dynamic Cache Service"
3) Import the DynCache.reg registry file.  This registry file contains default settings that you will probably want to modify.
4) The Dynamic cache service monitors its parameters in the registry, so changes to them take effect without rebooting the server.

To uninstall this service, execute the following commands:
sc stop DynCache
sc delete DynCache

Posted in Dynamic cache service, Windows, Windows Troubleshooting

PowerCLI – Upgrading VMtools without rebooting a Virtual Machine

VMware Tools is a suite of utilities that enhances the performance of the virtual machine’s guest operating system and improves management of the virtual machine. Without VMware Tools installed in your guest operating system, guest performance lacks important functionality. Whenever an ESX/ESXi host is upgraded, the latest version of VMtools also becomes available on the host and can be installed on the virtual machines.

Upgrading VMtools manually in a large environment is a tedious task, and the virtual machine is rebooted multiple times during the upgrade process.

Another way to upgrade VMtools is to enable the virtual machine option that checks and upgrades VMtools during a reboot. However, with this option enabled, a production virtual machine that restarts after a heartbeat loss between host and VM may upgrade VMtools at that point, causing downtime to production applications during business hours.

I have written a PowerCLI script that upgrades VMtools on multiple virtual machines without rebooting them. The script connects to vCenter and looks for the VMs specified in a text file. During the upgrade you will observe 3-4 ping request time-outs to the virtual machine, and the VMware Tools service will be restarted.

Below are the steps to run this PowerCLI script to upgrade VMware Tools on virtual machines without a reboot:

1) Log in to the vCenter server and open the PowerCLI command line.
2) In PowerCLI, change to the folder where PSC-VMToolsUpgd.ps1 is saved.
3) List the names of the virtual machines whose VMware Tools are to be upgraded in the VMnames.txt file.
4) Make sure VMnames.txt is saved in the same folder as PSC-VMToolsUpgd.ps1.
5) Run the PSC-VMToolsUpgd.ps1 script from the PowerCLI command prompt.
6) The script asks for the vCenter server name to connect to. Provide the vCenter server where the VMs are present; otherwise the script will not be able to find the virtual machines listed in VMnames.txt.
7) The script then looks up each VM listed in VMnames.txt one by one and upgrades VMware Tools without a reboot. A VMware Tools upgrade status bar is shown in the PowerCLI command prompt.
8) You can also see the recently initiated task in the vCenter console.
9) During the VMware Tools upgrade the virtual machine will show two ping timeouts; these do not cause issues on a production server, and VMware Tools is upgraded without any reboot.
Note: Virtual machines running Windows 2003 do not show any ping timeouts.
10) After the upgrade the script displays a message stating that the VMware Tools upgrade completed, along with the virtual machine name. You can also see the completed task in vCenter.
11) After upgrading VMware Tools, don’t forget to check the DNS record of the virtual machine with nslookup.

———————– PSC-VMToolsUpgd.ps1 script content ———————–

# PSC-VMToolsUpgd.ps1 PowerCLI script to upgrade VMware Tools without a reboot
# List the virtual machine names in VMnames.txt and save it in the same folder as PSC-VMToolsUpgd.ps1
# Interactive script that asks for the vCenter server name
# The script looks up each VM in vCenter and then performs the upgrade
# If a VM is not present in vCenter, a message is displayed on the screen

$vcname = Read-Host "Enter the vCenter or ESXi host name to connect"
Connect-VIServer $vcname
foreach ($computer in Get-Content "VMnames.txt")
{
    # Skip empty lines in VMnames.txt
    if (-not $computer.Trim()) { continue }

    # Look up the VM; suppress the error raised when the name is not found
    $vm = Get-VM -Name $computer -ErrorAction SilentlyContinue
    if ($vm)
    {
        try
        {
            # -NoReboot upgrades VMware Tools without restarting the guest
            $vm | Update-Tools -NoReboot -ErrorAction Stop
            Write-Host "VMtools upgrade on $vm completed"
        }
        catch
        {
            Write-Host "VMtools upgrade on $vm failed: $_"
        }
    }
    else
    {
        Write-Host "$computer virtual machine is not present in vCenter"
    }
}
Disconnect-VIServer -Server $vcname -Confirm:$false

———————– PSC-VMToolsUpgd.ps1 script content ———————–
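
Blank lines or duplicate names in VMnames.txt make the loop do needless lookups. A small helper like the one below can clean the list first. It is a sketch, not part of the script above: the name Get-VMNameList is hypothetical, and treating lines that start with # as comments is my own assumption, not a VMnames.txt convention.

```powershell
# Hypothetical helper: read VM names from a text file, one name per line.
# Trims whitespace, skips blank lines and lines starting with '#'
# (the comment convention is an assumption), and removes exact duplicates.
function Get-VMNameList {
    param(
        [string]$Path = 'VMnames.txt'
    )

    Get-Content -Path $Path |
        ForEach-Object { $_.Trim() } |
        Where-Object { $_ -and -not $_.StartsWith('#') } |
        Select-Object -Unique   # note: the comparison is case-sensitive
}

# Usage inside the script:
# foreach ($computer in Get-VMNameList -Path 'VMnames.txt') { ... }
```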

Posted in PowerCLI, VMtools, VMware

PowerShell :: Automatic Remote Desktop Connection

Using the PowerShell function “Connect-RDP” we can open RDP sessions to one or more servers using securely cached credentials.

To cache credentials from the PowerShell command line we use cmdkey.exe with the name of the target server for which the credentials should be cached. A single cached credential can also be used against multiple servers.

Save the script in a file with the .ps1 PowerShell extension
Open the PowerShell command line
To load the function into the current PowerShell session, dot-source the script: “. .\Connect-RDP.ps1”
To list the cached credentials, type: cmdkey /list
To cache credentials for a single server (target), use: cmdkey /add:targetname /user:username /pass:password

Example: the target testpc1 uses the domainname\user account to connect to the server.
In the same way we can pass multiple server names to connect with cached credentials, for example: Connect-RDP server1 server2 server4

Another way to cache a credential for a target is to run the Connect-RDP function with the -Credential switch and the user name to connect with.

To RDP to a server, run Connect-RDP TargetServer. A PowerShell credential request window will appear.

Open the drop-down list under User Name and select the cached user name you want to connect with.

Once you have selected the user name, click OK.
The RDP session to the server opens using the securely cached user name.

Connect-RDP.PS1 script content
————————————————————————————————————————————————

function Connect-RDP {

    param (
        [Parameter(Mandatory=$true)]
        $ComputerName,

        [System.Management.Automation.Credential()]
        $Credential
    )

    # take each computername and process it individually
    $ComputerName | ForEach-Object {

        # if the user has submitted a credential, store it
        # safely using cmdkey.exe for the given connection
        if ($PSBoundParameters.ContainsKey('Credential'))
        {
            # extract username and password from credential
            $User = $Credential.UserName
            $Password = $Credential.GetNetworkCredential().Password

            # save information using cmdkey.exe
            cmdkey.exe /generic:$_ /user:$User /pass:$Password
        }

        # initiate the RDP connection
        # connection will automatically use cached credentials
        # if there are no cached credentials, you will have to log on
        # manually, so on first use, make sure you use -Credential to submit
        # logon credential

        mstsc.exe /v $_ /f
    }
}
————————————————————————————————————————————————

Posted in Power-Shell, Windows

RamMap – usage of rammap.exe for troubleshooting unexplained high memory usage on windows servers

RAMMap is a Sysinternals physical memory usage analysis utility for Windows Vista and later.

This great tool will provide a graphical view of physical memory usage in different tabs.

Use Counts: usage summary by type and paging list
Processes: process working set sizes
Priority Summary: prioritized standby list sizes
Physical Pages: per-page use for all physical memory
Physical Ranges: physical memory addresses
File Summary: file data in RAM by file
File Details: individual physical pages by file

This tool can be used to analyze memory usage on servers when Task Manager or Resource Monitor does not give enough detail. It can also be used to analyze how memory has been allocated to an application.

You can download this tool from the Sysinternals site: http://download.sysinternals.com/files/RAMMap.zip

Situations where I found this tool useful:

1. Windows 2008 file servers where high memory is being utilized by file system cache.
2. Windows 2008 server with high memory utilization and task manager or resource monitor are not showing the exact physical memory utilization on server.

To learn more about this tool visit http://blogs.technet.com/b/askperf/archive/2010/08/13/introduction-to-the-new-sysinternals-tool-rammap.aspx

Here is a scenario where I used this tool for real-time troubleshooting of high memory utilization on Windows 2008 Enterprise SP2 running MS SQL Server.

One of the SQL servers running on Windows 2008 SP2 was continuously at 99% memory utilization, and neither Task Manager nor Resource Monitor showed accurately which process was using the memory.
The server had 64 GB of physical memory, of which 17% was used by the SQL process; no other information was available in Task Manager.
Running rammap.exe on the server showed that Mapped File was using 62 GB of memory.
A SQL job was running to copy (back up) DB files from the production server to the DR server.
Stopping the SQL copy job resolved the high memory utilization on the server.

Posted in Windows Troubleshooting

WS2012 – Windows could not share your printer there are no more endpoints

We experienced a problem on Windows Server 2012: when we tried to share a print queue, the printer wizard showed the error message “Windows can’t open Add Printer. There are no more endpoints available from the endpoint mapper.”

This issue occurs if the Windows Firewall service on Windows Server 2012 is disabled or stopped. To resolve it, enable and start the Windows Firewall service.

The Spooler service uses the Firewallapi.dll file to make an API call that checks the availability of the Windows Firewall service. If sharing is being performed for the first time, the following incoming rules are enabled during this process:

File and Printer Sharing (Spooler Service – RPC-EPMAP)
File and Printer Sharing (Spooler Service – RPC)
File and Printer Sharing (Echo Request – ICMPv4-In)
File and Printer Sharing (Echo Request – ICMPv6-In)
File and Printer Sharing (LLMNR-UDP-In)
File and Printer Sharing (NB-Datagram-In)
File and Printer Sharing (NB-Name-In)
File and Printer Sharing (NB-Session-In)
File and Printer Sharing (SMB-In)

Note: Some of these rules are also enabled when a folder is shared on Windows for the first time.

When the Firewall service is running, the following registry key is checked for the firewall rules:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\SharedAccess\Parameters\FirewallPolicy\FirewallRules

Posted in Windows Troubleshooting

PowerCLI – Disable/Enable hotadd cpu and memory options on multiple virtual machines

The Hot Add Memory and Hot Add vCPU options in VMware let you add memory and vCPUs to a virtual machine while the VM is powered on. However, there might be situations where we want to disable these options on virtual machines because of critical applications running on them.

To disable the Hot Add Memory and Hot Add vCPU options manually we have to edit the settings of each virtual machine after shutting it down. The task can be performed easily with a PowerCLI script; however, once the script has run we need to power off (shut down) and then power on the VM for the changes to take effect.

1. Save the VM names in VMnames.txt in the folder where you copy this script.
2. When you run the script it asks for the name of the vCenter server or ESXi host to connect to.
3. You need to power off (shut down) and power on the VM for the changes made by this script to take effect.
4. The script looks for the VM names supplied in the VMnames.txt file and makes the change only if the VM is present on the vCenter server or ESXi host.

Note: The same script can be used to enable the hot add memory and vCPU options; the only thing you need to do is change the value assigned to $extra.Value from "false" to "true" in both functions.

Script content PSC-DisableHMVM.ps1:

—————————————————————————————————————————————————–
#Function to disable hotadd memory
Function Disable-MemHotAdd($vm)
{
    $vmview = Get-VM $vm | Get-View
    $vmConfigSpec = New-Object VMware.Vim.VirtualMachineConfigSpec
    $extra = New-Object VMware.Vim.OptionValue
    $extra.Key = "mem.hotadd"
    $extra.Value = "false"
    $vmConfigSpec.ExtraConfig += $extra
    $vmview.ReconfigVM($vmConfigSpec)
}
#Function to disable hotadd vCpu
Function Disable-CPUHotAdd($vm)
{
    $vmview = Get-VM $vm | Get-View
    $vmConfigSpec = New-Object VMware.Vim.VirtualMachineConfigSpec
    $extra = New-Object VMware.Vim.OptionValue
    $extra.Key = "vcpu.hotadd"
    $extra.Value = "false"
    $vmConfigSpec.ExtraConfig += $extra
    $vmview.ReconfigVM($vmConfigSpec)
}

$vcname = Read-Host "Enter the vCenter or ESXi host name to connect"
Connect-VIServer $vcname
foreach ($computer in Get-Content "VMnames.txt")
{
    # Skip empty lines in VMnames.txt
    if (-not $computer.Trim()) { continue }

    # Look up the VM; suppress the error raised when the name is not found
    $vm = Get-VM -Name $computer -ErrorAction SilentlyContinue
    if ($vm)
    {
        Disable-MemHotAdd $vm
        Disable-CPUHotAdd $vm
        Write-Host "Changed the hot add memory and cpu option on vm $vm"
    }
    else
    {
        Write-Host "$computer virtual machine is not present in vCenter"
    }
}
# -Confirm:$false avoids the interactive confirmation prompt
Disconnect-VIServer -Server $vcname -Confirm:$false

—————————————————————————————————————————————————–

Posted in PowerCLI