WS2008FCS – Cluster disk failing to come online on a Windows 2008 cluster node

This is a real-world scenario from one of my clients that impacted production file shares running on Windows 2008 failover cluster nodes. The cluster service automatically moved the cluster disk into maintenance mode, the disk was marked as dirty, and a chkdsk was initiated on it, which took all the shares offline.

Analyzing the cluster logs, I found that the cluster was not able to enumerate files under the root of the clustered disk. The cluster log showed error 5: “VerifyFS: Ignoring failure to open file \\?\GLOBALROOT\Device\Harddisk10\Partition2\technicaldetails.xls Error: 5”

In Windows 2008, to generate cluster logs you run the command “cluster log /gen”, and the log is generated in the “C:\windows\cluster\reports” folder on all the cluster nodes. Please note that the timestamps in the cluster logs are written in UTC and not in the server's local time zone.
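Because the log is in UTC, it helps to convert a timestamp before comparing it with events seen on the server. The following is a minimal Python sketch, assuming the timestamp layout seen in the cluster.log excerpt in this post; the function name and the UTC+05:30 server time zone are hypothetical and only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Timestamp layout as it appears in cluster.log, e.g. 2011/10/01-09:13:21.755
CLUSTER_LOG_TIME_FORMAT = "%Y/%m/%d-%H:%M:%S.%f"


def cluster_log_time_to_local(stamp, local_tz):
    """Parse a cluster.log timestamp (written in UTC) and convert it to local_tz."""
    utc_time = datetime.strptime(stamp, CLUSTER_LOG_TIME_FORMAT)
    utc_time = utc_time.replace(tzinfo=timezone.utc)
    return utc_time.astimezone(local_tz)


# Hypothetical server time zone (UTC+05:30), chosen only for this example.
server_tz = timezone(timedelta(hours=5, minutes=30))
local = cluster_log_time_to_local("2011/10/01-09:13:21.755", server_tz)
print(local.isoformat())
```

Replace `server_tz` with the offset of your own server when lining up cluster log entries with the event viewer.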

Whenever the cluster service performs a health check on the storage to detect possible access issues, it tries to enumerate the files stored in the root of the clustered disk volume, and it runs this check in the ‘Local System’ context.
In this scenario the cluster was not able to open a handle to a file at the root of the clustered disk because of permission issues. You can see from the cluster log that the file the cluster was trying to open is technicaldetails.xls, and that it was getting an access denied ‘Error: 5’ message.

Resolution: I removed the read-only attribute from the file located in the root of the cluster disk and performed a failover test of the resource group in which the clustered disk was hosted.

Cluster.log
————————————————————————————————————————————
000012b0.0001b900::2011/10/01-09:12:23.393 WARN  [RES] File Server <FileServer-(Filesrv001)(Filesrv001_I Drive)>: Failed in NetShareGetInfo(Filesrv001, Value_Navigator), status 2310. Tolerating…
0000132c.0000e57c::2011/10/01-09:13:21.755 WARN  [RES] Physical Disk <Fileserv1-Data>: VerifyFS: Ignoring failure to open file \\?\GLOBALROOT\Device\Harddisk10\Partition2\technicaldetails.xls Error: 5.
0000132c.0000e57c::2011/10/01-09:14:22.191 WARN  [RES] Physical Disk <Fileserv1-Data>: VerifyFS: Ignoring failure to open file \\?\GLOBALROOT\Device\Harddisk10\Partition2\technicaldetails.xls Error: 5.
0000132c.0000e57c::2011/10/01-09:15:22.628 WARN  [RES] Physical Disk <Fileserv1-Data>: VerifyFS: Ignoring failure to open file \\?\GLOBALROOT\Device\Harddisk10\Partition2\technicaldetails.xls Error: 5.
————————————————————————————————————————————
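On a large cluster.log it can be quicker to filter these warnings with a script than to read the file by hand. Below is a rough Python sketch that pulls the resource name, file path and error code out of VerifyFS warnings. The regular expression is my own assumption based only on the lines shown above, so it may need adjusting for other log layouts; the function name is hypothetical.

```python
import re

# Pattern derived from the VerifyFS warning lines shown above (an assumption,
# not an official log format definition).
VERIFYFS_FAILURE = re.compile(
    r"::(?P<time>\S+)\s+\w+\s+\[RES\] Physical Disk <(?P<resource>[^>]+)>: "
    r"VerifyFS: Ignoring failure to open file (?P<path>.+?) Error: (?P<code>\d+)"
)


def find_verifyfs_failures(lines):
    """Return one dict per VerifyFS open failure found in the given log lines."""
    failures = []
    for line in lines:
        match = VERIFYFS_FAILURE.search(line)
        if match:
            entry = match.groupdict()
            entry["code"] = int(entry["code"])
            failures.append(entry)
    return failures


# Two lines copied from the excerpt above; only the second is a VerifyFS warning.
sample = [
    r"000012b0.0001b900::2011/10/01-09:12:23.393 WARN  [RES] File Server <FileServer-(Filesrv001)(Filesrv001_I Drive)>: Failed in NetShareGetInfo(Filesrv001, Value_Navigator), status 2310. Tolerating…",
    r"0000132c.0000e57c::2011/10/01-09:13:21.755 WARN  [RES] Physical Disk <Fileserv1-Data>: VerifyFS: Ignoring failure to open file \\?\GLOBALROOT\Device\Harddisk10\Partition2\technicaldetails.xls Error: 5.",
]
failures = find_verifyfs_failures(sample)
for failure in failures:
    print(failure["time"], failure["resource"], failure["path"], failure["code"])
```

To scan a real log, pass an open file object instead of `sample`, since iterating over a file yields its lines.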

It is not recommended to store files at the root of a clustered disk, because the cluster needs to open handles to files and folders there as part of the health detection mechanism it uses to determine possible access issues to storage. Since the cluster service runs in the context of the ‘Local System’ account, the health check may fail if that account does not have permission to the files at the root of the drive. If we do keep files at the root of a cluster disk, we need to ensure that they are not read-only and that the Local System account has full access to them.
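As a quick way to spot such files, here is a small Python sketch that lists the read-only files sitting directly at the root of a volume and clears the flag on one of them. It is only an illustration under assumptions: both function names are hypothetical, the demo runs against a temporary folder rather than a real cluster disk, and the script checks the read-only attribute only. It does not inspect the NTFS permissions of the Local System account, which you still need to verify separately.

```python
import os
import stat
import tempfile


def read_only_files_at_root(root):
    """Return the names of regular files directly under root that are read-only."""
    flagged = []
    with os.scandir(root) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            # On Windows the read-only attribute shows up as a missing write bit.
            if not os.stat(entry.path).st_mode & stat.S_IWRITE:
                flagged.append(entry.name)
    return sorted(flagged)


def clear_read_only(path):
    """Add the write bit; on Windows this clears the read-only attribute."""
    os.chmod(path, os.stat(path).st_mode | stat.S_IWRITE)


# Demo against a temporary folder standing in for the root of the disk.
with tempfile.TemporaryDirectory() as demo_root:
    for name in ("technicaldetails.xls", "notes.txt"):
        with open(os.path.join(demo_root, name), "w") as handle:
            handle.write("demo")
    locked = os.path.join(demo_root, "technicaldetails.xls")
    os.chmod(locked, stat.S_IREAD)  # mark the file read-only
    before = read_only_files_at_root(demo_root)
    clear_read_only(locked)
    after = read_only_files_at_root(demo_root)

print("read-only before:", before)
print("read-only after:", after)
```

On a real node you would call `read_only_files_at_root` with the drive root of the clustered disk, from the node that currently owns the disk.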

About asifkhandevadi

Hello, I have been working on Windows for 9 years and currently work as a Windows, VMware and MS clustering SME at IBM. Whenever I get free time I participate in Microsoft forums and write blog posts to enhance my technical and communication skills through knowledge sharing. Please contact me on FB or LinkedIn if you need any assistance with troubleshooting, implementation and virtualization.