Troubleshooting a Hung Windows Server

Troubleshooting a hung or non-responsive Windows server can be challenging. Simply hitting the reset button is no longer acceptable, because more and more of these servers run business-critical operations.

A server may hang for a variety of reasons, covering both hardware and software issues: for example, a bad NIC, device driver conflicts, or resource pool depletion.

Here are some basic steps for troubleshooting an incident involving a physical Windows server in a non-responsive (hung) state:

1. Try to ping the hung server.
2. If the server responds to ping, try to connect over RDP and see whether you can log in.
3. If ping works but RDP does not, try to manage the server remotely with Computer Management, PsTools, or perfmon from another server on the same network.
4. If the server responds to ping but not to any remote management tool, log in to the hardware remote console (RSA/DRAC/iLO), if one is available.
5. Check the server status in the hardware console and see whether you can log in to the server.
6. If you can log in to the server through the RSA/DRAC/iLO console, use perfmon to generate a performance report, or use Task Manager to look for a process consuming high CPU or memory, and look for errors in the event logs.
7. If the server is non-responsive even from the RSA/DRAC/iLO console, perform an NMI reboot. An NMI forces a crash dump, so use it only when other means of troubleshooting have proved unsuccessful.
8. The NMI reboot option is found under the power or diagnostics options in RSA/DRAC/iLO, depending on the vendor (please see the screenshots below).
9. Before performing an NMI reboot to generate a memory dump, make sure crash control is enabled in the Windows registry.
10. After you have created the crash dump file (memory.dmp), you are ready to use WinDbg to determine what caused the server to hang.
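
The steps above amount to a small decision tree. As a rough illustration only, here is a minimal Python sketch of it. The function names `tcp_port_open` and `next_step` are hypothetical, 3389 is the default RDP port (your server may use another), and the branch for a failed ping (go straight to the hardware console) is an assumption, since the steps above do not cover that case.

```python
import socket
from typing import Optional


def tcp_port_open(host: str, port: int = 3389, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (3389 = default RDP port)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, name lookup failed
        return False


def next_step(ping_ok: bool,
              rdp_ok: Optional[bool] = None,
              remote_mgmt_ok: Optional[bool] = None,
              console_login_ok: Optional[bool] = None) -> str:
    """Suggest the next checklist action. None means 'not tried yet'."""
    if ping_ok:
        if rdp_ok is None:
            return "try RDP"                                    # step 2
        if rdp_ok:
            return "investigate over RDP"
        if remote_mgmt_ok is None:
            return "try remote management tools"                # step 3
        if remote_mgmt_ok:
            return "investigate with remote management tools"
    # Reached when no remote tool worked, or when ping failed
    # (assumption: the checklist does not say what to do if ping fails).
    if console_login_ok is None:
        return "log in to the hardware console"                 # steps 4-5
    if console_login_ok:
        return "check perfmon, Task Manager and event logs"     # step 6
    # Steps 7-10; crash control must already be enabled (step 9).
    return "send NMI, then analyze memory.dmp with WinDbg"
```

For example, `next_step(True, rdp_ok=False)` suggests trying the remote management tools, and `tcp_port_open("server01")` (a placeholder host name) gives a quick hint whether the RDP listener is still accepting connections before you open an RDP client.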

NOTE: Please take a screenshot at every step you perform; screenshots are important when drafting an RCA.
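
Step 9 has to be done ahead of time, because the registry cannot be changed once the server is hung. The following is a hedged Python sketch of such a check, not a definitive tool. It assumes the settings live under `HKLM\SYSTEM\CurrentControlSet\Control\CrashControl`, that `CrashDumpEnabled` uses the usual values (0 = none, 1 = complete, 2 = kernel, 3 = small, 7 = automatic), and that `NMICrashDump = 1` matters only on releases older than Windows Server 2012. The function names are hypothetical, and `read_crash_control` runs only on Windows.

```python
CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"

# Assumed meaning of CrashDumpEnabled values.
DUMP_TYPES = {0: "none", 1: "complete", 2: "kernel", 3: "small", 7: "automatic"}


def crash_control_findings(values: dict) -> list:
    """Return human-readable findings for a dict of CrashControl values."""
    findings = []
    dump_type = values.get("CrashDumpEnabled")
    if dump_type is None:
        findings.append("CrashDumpEnabled is missing")
    elif dump_type == 0:
        findings.append("CrashDumpEnabled is 0: no memory.dmp will be written")
    elif dump_type not in DUMP_TYPES:
        findings.append("CrashDumpEnabled has an unexpected value: %r" % dump_type)
    if values.get("NMICrashDump") != 1:
        # Only relevant on older releases (assumption stated in the lead-in).
        findings.append("NMICrashDump is not 1 (needed before Windows Server 2012)")
    return findings


def read_crash_control() -> dict:
    """Read the two values from the local registry (Windows only)."""
    import winreg  # standard library, available only on Windows
    found = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        for name in ("CrashDumpEnabled", "NMICrashDump"):
            try:
                found[name], _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                pass  # value not present; reported as a finding by the checker
    return found
```

On a healthy server, run `print(crash_control_findings(read_crash_control()))` as part of routine checks; an empty list means nothing was flagged under the assumptions above.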

About asifkhandevadi

Hello, I have been working on Windows for 9 years and currently work as a Windows, VMware, and MS clustering SME at IBM. Whenever I get free time I participate in Microsoft forums and write blog posts to enhance my technical and communication skills through knowledge sharing. Please contact me on FB or LinkedIn if you need any assistance with troubleshooting, implementation, or virtualization.