Testing the Cluster
Before you simulate a failure you should create some test mailboxes and send a few test e-mails back and forth.
Once this has been done we will verify that the cluster is working correctly by using the Exchange Management Shell to fail the Exchange cluster over to the secondary node. Then we will power down the primary node to test a server crash scenario.
Verify the cluster is functioning
1. Log in as a user at http://<CAS server>/OWA
o This URL only works for users whose mailboxes are on Exchange 2007. All users can go to http://<CAS server>/Exchange, which directs them to the correct OWA server depending on where their mailbox is.
o The clustered mailbox server does not include the CAS role, so users must connect to a server with the CAS role to use OWA. Outlook clients using MAPI over RPC connect directly to the mailbox server role; RPC over HTTP (Outlook Anywhere) clients connect to a CAS server in the same site as their mailbox server.
2. Confirm that the user can access their mailbox and open e-mails
Carrying out a scheduled move
One of the key advantages of a clustered Exchange server is that you can carry out routine server maintenance on one node while the other node continues to service clients. Once the maintenance is done on the secondary (passive) node, you can move the clustered resources to that node and carry out maintenance on the primary node. The steps below cover moving the Exchange clustered resources to the passive node. In previous versions of Exchange you could use the Cluster Administrator UI or the CLUSTER.EXE CLI to move the Exchange cluster. With Exchange 2007 these tools SHOULD NOT be used. Using them can corrupt the database files and cause other issues.
- Open up the Exchange Management Shell, located in the Microsoft Exchange Server 2007 program group
- Run the following command:
Move-ClusteredMailboxServer -Identity <Exchange Clustered Server Name> -MoveComment <Move comment, saved to the event log> -TargetMachine <Machine name of the target node>
Example:
Move-ClusteredMailboxServer -Identity EXC01 -MoveComment "Test Move" -TargetMachine EXB03
o Don't forget to hit the TAB key to auto-complete the command. You should only have to type "MOVE-C <TAB>".
o The Move-ClusteredMailboxServer command should always be used to move the clustered Exchange server between nodes. It executes additional checks and steps that the Cluster Administrator tool and CLUSTER.EXE do not. Using Cluster Administrator to do the move, especially when a node was not shut down cleanly, can corrupt the database files.
o This command can also be used to move or fail an Exchange cluster over from a node that has a corrupted database to a secondary node. For more information on this command, and general information about Exchange 2007 clustering, read the Scheduled and Unscheduled Outages section on TechNet.
- Enter "Y" when asked to confirm the move
- You should see something similar to this:

o The command prompt will not return until the cluster has finished failing over.
- Confirm resources have failed over
o If you have Cluster Administrator open on the Exchange cluster group, EXC01 in my environment, you will see the resources go off-line and then move to the secondary node

o In the Application event log you should see multiple ESE, MSExchangeRepl, MSExchangeIS, and MSExchange Cluster events recording that the databases were taken off-line and moved, that replication was reestablished, and more
a. Using Exchange Management Shell
1. Run the following command:
Get-ClusteredMailboxServerStatus -Identity <Exchange Clustered Server Name>
- Example:
Get-ClusteredMailboxServerStatus -Identity EXC01
2. You should see something similar to this:

- Test server access using OWA
- Move the Exchange clustered resources back to the primary node
a. Repeat the above command, setting the TargetMachine parameter to the primary node
1. Example:
Move-ClusteredMailboxServer -Identity EXC01 -MoveComment "Test Move" -TargetMachine EXB02

b. Wait for all resources to show as on-line on the primary node
- Test server access using OWA
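
The status check in the steps above can also be scripted. The following is a minimal sketch, not a definitive procedure. It assumes the clustered mailbox server is named EXC01, as in the examples, and that the status object exposes the State, OperationalMachines, and FailedResources properties seen in the cmdlet's default output. Verify the property names in your own environment before relying on it.

```powershell
# Sketch only: run from the Exchange Management Shell on a cluster node.
# EXC01 is the example clustered mailbox server name used in this article.
$status = Get-ClusteredMailboxServerStatus -Identity EXC01

# Show the fields that matter after a move.
$status | Format-List Identity, State, OperationalMachines, FailedResources

# Assumption: State reads "Online" once the move has completed successfully.
if ("$($status.State)" -eq "Online") {
    Write-Host "EXC01 reports Online. Check FailedResources above to confirm it is empty."
} else {
    Write-Host "EXC01 is not Online. Review the output above and the event logs."
}
```

OperationalMachines should list the node named in -TargetMachine as the active node after a successful move.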
Simulating a server failure
- Power off the primary cluster node VM
- Wait a few minutes and confirm the cluster has failed over to the 2nd node
a. Open up Cluster Administrator on the 2nd node
- If you get an error that the cluster cannot be contacted, wait a few more minutes. The failover took about 5 minutes in my VM environment. This can be adjusted by changing the heartbeat settings.
b. Confirm the primary node, "EXB02" in my environment, has a red X on it
c. Expand the secondary node and click on "Active Resources"
d. Confirm all resources show as Online

- Test server access using OWA
- Power on the primary node and wait for all processes to finish starting up
- Move the Exchange clustered resources back to the primary node
a. Run the following command. Example:
Move-ClusteredMailboxServer -Identity EXC01 -MoveComment "Failback Move" -TargetMachine EXB02
b. Wait for all resources to show as on-line on the primary node
- Test server access using OWA
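
Rather than rechecking by hand while the failover completes, the wait can be sketched as a polling loop. This is an illustration under stated assumptions: the name EXC01 comes from the examples above, the retry count and 30-second interval are arbitrary values chosen for this sketch, and it assumes a failed status query surfaces as a non-terminating error that -ErrorAction can suppress.

```powershell
# Sketch only: run on the surviving node after powering off the primary.
$name = "EXC01"        # example clustered mailbox server name from this article
$maxTries = 20         # hypothetical limit: 20 tries x 30 seconds = 10 minutes
$online = $false

for ($i = 1; $i -le $maxTries -and -not $online; $i++) {
    # The query may fail while the cluster is still unreachable; suppress and retry.
    $status = Get-ClusteredMailboxServerStatus -Identity $name -ErrorAction SilentlyContinue
    if ($status -and "$($status.State)" -eq "Online") {
        $online = $true
    } else {
        Write-Host "Attempt $i of ${maxTries}: not online yet, waiting 30 seconds..."
        Start-Sleep -Seconds 30
    }
}

if ($online) {
    # Show which node now owns the clustered mailbox server.
    $status | Format-List State, OperationalMachines
} else {
    Write-Host "$name did not come online. Check the cluster service and the event logs."
}
```

A loop like this only confirms what the cluster reports. The OWA test is still needed to confirm that clients can actually reach their mailboxes.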
Troubleshooting steps
- Shut down and start up procedures for both nodes at the same time
a. Shutdown
1. Confirm which server is the active node for the cluster groups
2. Shut down the passive node
3. Shut down the active node once the passive node has stopped
b. Startup
1. Keep the passive node off
2. Start the node that was active (the one shut down last)
3. Confirm Exchange is running
4. Start the passive node
- If one of the storage groups will not start, carry out the following steps
a. Open up the Exchange Management Shell
b. Run the following commands to check the status of replication:
Get-StorageGroupCopyStatus
Get-ClusteredMailboxServerStatus
c. Research any errors or warnings displayed by the above commands
d. Try moving the resources back to the node that was working correctly
1. Example, assuming EXB02 is able to start all resources successfully:
Move-ClusteredMailboxServer -Identity EXC01 -MoveComment "Test Move" -TargetMachine EXB02
e. Run the following command on the node that now hosts the databases, EXB02 in this example
i. Suspend-StorageGroupCopy -Identity:"EXC01\First Storage Group"
ii. Choose Y when prompted
iii. Repeat the above command for any storage group that would not come on-line
f. Remove the database files, all log files, and checkpoint files on the node that will not start the storage groups successfully, EXB03 in this example
- Remove the *.log, *.jtx, *.chk, and .edb files. You can just move all files under the storage group directory to an Old directory if you wish.
g. Run the following commands on the failed node, EXB03 in this example
i. Update-StorageGroupCopy -Identity:"EXC01\First Storage Group"
ii. Resume-StorageGroupCopy -Identity:"EXC01\First Storage Group"
iii. Repeat the above commands for any storage group that would not come on-line
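
The reseed steps above can be summarized as the following sketch for one storage group. It uses the example names from this article (EXC01, EXB02, EXB03) and is not a single script to run in one place: the comments mark which node each command belongs on. The final check assumes the copy status object exposes the Name, SummaryCopyStatus, CopyQueueLength, and ReplayQueueLength properties shown in the cmdlet's default output.

```powershell
# Sketch only: reseed one storage group copy. Names are this article's examples.
$sg = "EXC01\First Storage Group"

# On the node that hosts the databases (EXB02 here): stop replication for this storage group.
Suspend-StorageGroupCopy -Identity $sg

# On the failed node (EXB03 here), after moving the old database, log, and
# checkpoint files aside: reseed the copy, then resume replication.
Update-StorageGroupCopy -Identity $sg
Resume-StorageGroupCopy -Identity $sg

# Check the result. Assumption: SummaryCopyStatus reports Healthy once the copy
# has caught up, and the two queue lengths drop toward zero.
Get-StorageGroupCopyStatus -Identity $sg | Format-List Name, SummaryCopyStatus, CopyQueueLength, ReplayQueueLength
```

Seeding copies the whole database across the network, so allow time for large databases before judging the result of the status check.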
Conclusion
Using the steps above, a two-node CCR cluster can be set up with low-cost hardware, or even virtual hardware. The biggest advantage of a CCR cluster is that it allows for greater uptime. The increase in uptime mainly comes from scheduled maintenance: you carry out maintenance on the passive node, move the Exchange clustered mailbox server to that node, and then carry out maintenance on the primary node. In the case of unscheduled downtime, where a server crashes, the store crashes, or another system failure occurs, the passive node can take over. If only the store fails, or another application-level failure occurs while the server is still available as far as the clustering service is concerned, manual intervention or a monitoring solution will be required to fail the Exchange clustered mailbox server over to the passive node.
While CCR's main disadvantage is that the file share witness becomes a single point of failure, it does have some key advantages over SCC clusters. SCC clusters do not help if the databases get corrupted, the disk subsystem fails, or the data center fails, and they do not offload backup overhead.
Of course both SCC and CCR clusters require at least two nodes, of which one must be passive with Exchange 2007. Thus, if you are looking to reduce downtime in the case of a storage system failure or possible database corruption, LCR provides the cheapest method of database redundancy. With both LCR and CCR the space required by Exchange must be doubled since two copies of the databases and transaction logs will exist.
More Information