
Exchange 2010 Database Activation Coordination (DAC)

Introduction and Database Activation Coordination (DAC) Support

Exchange 2010 introduced a large number of changes to the High Availability model with the addition of the Database Availability Group (DAG). Some features of the DAG are support for up to 16 members, automatic database failover to another site as long as you still have quorum, and much more. Exchange also introduced Database Activation Coordination (DAC) mode, which Microsoft’s documentation calls Datacenter Activation Coordination mode, as an optional addition to the new High Availability model. DAC mode prevents split brain syndrome from occurring during a site switchover in a multi-site DAG configuration with at least 3 DAG members and more than one Active Directory Site. DAC is off by default, and in Exchange 2010 RTM it should not be enabled for:

  • 2 member DAGs
  • Non-Multisite DAGs
  • Multi-site DAGs that are in the same stretched Active Directory Site

In Exchange 2010 SP1, DAC support is extended to the following:

  • DAGs that contain 2 or more members
  • DAGs that are stretched across a single AD Site

Majority Node Set

Before we can understand how DAC works, we have to understand the Cluster Model that DAGs utilize. Both Exchange 2007 and Exchange 2010 Clusters use Majority Node Set Clustering (MNS). This means that more than 50% of your votes (server votes and/or 1 file share witness) need to be up and running. The proper formula for this is (n / 2) + 1, rounded down, where n is the number of DAG nodes within the DAG. With DAGs, if you have an odd number of DAG nodes in the same DAG (Cluster), you have an odd number of votes, so you don’t need a witness. If you have an even number of DAG nodes, you will have a file share witness, so that if half of your nodes go down, the witness acts as that extra +1 vote.

So let’s go through an example. Let’s say we have 3 servers. This means that we need (3 / 2) + 1 votes, which equals 2 as you round down since you can’t have half a server/witness. At any given time, we need 2 of our nodes to be online, which means we can sustain only 1 server failure (a 3 node DAG has an odd number of votes, so there is no file share witness). Now let’s say we have 4 servers. This means that we need (4 / 2) + 1 votes, which equals 3. With the file share witness there are 5 votes in total, so at any given time we need 3 of our servers/witness to be online, which means we can sustain 2 server failures or 1 server failure and 1 witness failure.
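The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the (n / 2) + 1 rule as described in this post; the function name `dag_quorum` and its return keys are made up for the example and are not part of Exchange or Windows clustering.

```python
def dag_quorum(nodes: int) -> dict:
    """Vote arithmetic for a DAG with `nodes` members (illustrative only)."""
    if nodes < 1:
        raise ValueError("a DAG needs at least one member")
    # An even number of nodes gets a file share witness as the extra vote.
    uses_witness = nodes % 2 == 0
    total_votes = nodes + (1 if uses_witness else 0)
    # (n / 2) + 1, rounded down, where n is the number of DAG nodes.
    votes_needed = nodes // 2 + 1
    return {
        "uses_witness": uses_witness,
        "total_votes": total_votes,
        "votes_needed": votes_needed,
        "failures_tolerated": total_votes - votes_needed,
    }


for n in (3, 4, 5):
    print(n, dag_quorum(n))
```

For 3 nodes this gives 2 votes needed and 1 tolerated failure; for 4 nodes plus a witness it gives 3 votes needed and 2 tolerated failures, matching the examples above.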

Note: Exchange 2010 DAGs do not use the term Majority Node Set anymore.  That term is deprecated; the quorum models are now called Node Majority and Node and File Share Majority.

Database Activation Coordination (DAC)

In short, DAC mode is enabled on DAGs with at least 3 members to prevent split brain syndrome. It’s as simple as that. Let’s take a look at an example and see how DAC can help. The longer explanation below walks through this specific model.

Prevention of Split Brain Syndrome

Short Explanation

When the Primary Site goes offline (or we lose too many servers; refer to Majority Node Set above), the Secondary Site has to be activated manually. Whether you do so is your decision, based on the magnitude of the failure and how long you anticipate the primary site or its servers will be down. But when the Primary Site comes back online, the WAN link may still be offline. Because the Primary Site’s Exchange Servers don’t necessarily know about the manual site switchover, they will come up thinking they have Quorum, since the Primary Site has the majority of the servers and they are still connected to the old file share witness (FSW). Because of this, they will begin to mount databases.

DAC mode enables the usage of a new protocol, Database Activation Coordination Protocol (DACP). With DACP, DAG members start up with a special memory bit set to 0 and need to contact another DAG node that has this bit set to 1. The bit will be set to 1 on one of the DAG members in the Secondary Site, since that site is hosting active databases. Because the WAN link is down, the Primary Site’s DAG members that just came online won’t be able to contact this DAG member, so they won’t be able to mount databases. Once the WAN link comes back online, the Primary Site’s DAG members can contact the DAG member that has the bit set to 1, which puts them in a state where they are allowed to mount databases.

Longer Explanation

In this example, there are 5 DAG nodes (3 in the Primary Site and 2 in the Secondary Site) and no FSW, as we have an odd number of DAG nodes. Suppose our entire Primary Datacenter fails (or we lose too many servers; in our case, (5 / 2) + 1 means 3 of our nodes need to remain operational for the DAG to remain operational). The Secondary Site will then need to be manually activated, should you decide that a secondary site activation is required based on the magnitude of the failure and how long you anticipate the primary site or its servers will be down.

Part of the switchover process will have us shrink the DAG by removing the DAG nodes in the Primary Site from the cluster, so all that remain are the existing 2 DAG nodes in the Secondary Site. Instructions for shrinking the DAG and doing a manual site activation are located here. Should we decide to proceed with a manual site switchover, we will provision the FSW in the secondary site during manual site activation to the secondary datacenter. But what happens if the Primary Site’s Exchange Servers come back online? They will think they have majority, because the primary site has the majority of the servers (and, in a DAG that uses a witness, the original FSW is located there). Because of this, when they start up, they will begin mounting databases.

Now this is where DAC comes in. Without DAC enabled, the Primary Site’s Exchange Servers would indeed come online, think they have majority, and begin mounting databases, and you run into a split brain scenario. This is because when power is restored to the datacenter, the servers will usually come up before WAN connectivity is fully restored. The servers in the two sites cannot communicate with each other to see that the active databases are already mounted. Because of that, the Primary Site’s Exchange Servers will see they have majority, since the majority of your servers and your FSW should be in the Primary Site, and mount the databases.
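To make the failure mode concrete, here is a small Python sketch of the 5 node example without DAC. It assumes 3 nodes in the Primary Site and 2 in the Secondary Site, and it models each site as checking quorum against the membership it believes in. The function `has_quorum` and the variable names are hypothetical and only mirror the vote counting described in this post; they are not how the cluster service is implemented.

```python
def has_quorum(reachable_votes: int, believed_total_votes: int) -> bool:
    """A partition has quorum if it sees (total / 2) + 1 votes, rounded down."""
    return reachable_votes >= believed_total_votes // 2 + 1


# Secondary Site after the manual switchover: the DAG was shrunk to its
# 2 nodes and an FSW was provisioned there, so it sees 3 of 3 votes.
secondary_ok = has_quorum(reachable_votes=3, believed_total_votes=3)

# Primary Site after power returns with the WAN still down: its 3 servers
# do not know about the switchover and still count the original 5 votes.
primary_ok = has_quorum(reachable_votes=3, believed_total_votes=5)

# Both sides believe they have majority, so both would mount databases.
split_brain = primary_ok and secondary_ok
print("primary:", primary_ok, "secondary:", secondary_ok, "split brain:", split_brain)
```

Each site passes its own majority check, which is exactly why quorum alone cannot prevent split brain after a manual switchover.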

If the servers were allowed to mount databases and you ran into a split brain scenario, something called Database Divergence would occur. Database Divergence is where the databases in the primary site become different from those in the secondary site. This requires a reseed from the authoritative database, which causes data loss: any new data that went into the diverged database while split brain was occurring is discarded.

The way DAC works is that all servers use a new protocol known as Database Activation Coordination Protocol (DACP).  One of the DAG nodes will always have a special memory bit set to 1. With DAC on, any time a server starts up and wants to mount a database, it will attempt to communicate with other DAG members, with one of the following outcomes:

  • If the starting DAG member can communicate with all other members, DACP bit switches to 1
  • If the starting DAG member can communicate with another member, and that other member’s DACP bit is set to 1, starting DAG member DACP bit switches to 1
  • If the starting DAG member can communicate with another member, and that other member’s DACP bit is set to 0, starting DAG member DACP bit remains at 0

Because of this, when the Primary Site’s DAG Servers come back online, they will need to either contact all other DAG members or contact a DAG member with its DACP bit set to 1 in order to be in a state where they can begin mounting databases. Because the WAN is down, these servers won’t be able to mount databases, because none of them will have that special memory bit set to 1; the bit will be set on one of the DAG Servers in the Secondary Site. Once WAN connectivity is restored, the Primary Datacenter DAG Servers will be able to communicate with the DAG Server that has the bit set to 1, and they will then be allowed to mount databases.
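The three DACP rules listed above can be sketched as a small simulation. This is a toy model under stated assumptions: the `Member` class, the `evaluate_dacp` function, and the server names P1 to P3 and S1 to S2 are all invented for illustration, and the real protocol runs inside Exchange rather than in anything resembling this code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Member:
    name: str
    site: str
    dacp_bit: int = 0  # every member starts up with the bit at 0


def evaluate_dacp(starting: Member, dag: Sequence[Member],
                  reachable: Sequence[Member]) -> int:
    """Apply the three rules to a starting member and return its DACP bit."""
    others = [m for m in dag if m is not starting]
    seen = [m for m in reachable if m is not starting]
    if len(seen) == len(others):
        starting.dacp_bit = 1  # rule 1: can talk to all other members
    elif any(m.dacp_bit == 1 for m in seen):
        starting.dacp_bit = 1  # rule 2: a reachable member has its bit at 1
    # rule 3: otherwise the bit remains at 0
    return starting.dacp_bit


# Hypothetical 5 node DAG; S1 in the Secondary Site holds the bit set to 1.
p1, p2, p3 = (Member(n, "primary") for n in ("P1", "P2", "P3"))
s1 = Member("S1", "secondary", dacp_bit=1)
s2 = Member("S2", "secondary")
dag = [p1, p2, p3, s1, s2]

# WAN down: P1 only reaches the other primary servers, all with bit 0.
wan_down = evaluate_dacp(p1, dag, reachable=[p2, p3])
# WAN restored: P1 can now reach S1, whose bit is 1.
wan_up = evaluate_dacp(p1, dag, reachable=[p2, s1])
print("bit with WAN down:", wan_down, "| bit after WAN restored:", wan_up)
```

A member whose bit is still 0 is not allowed to mount databases, so in this sketch P1 stays passive until it can reach S1.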

Thankfully, in SP1, DAC will work with 2 node DAGs and multi-site DAGs that are using a stretched AD Site.

DAC and ForceQuorum

If you do not know what forcequorum is, have a quick look at my blog post here. Essentially, forcequorum allows you to forcefully start a cluster when that cluster has lost quorum.  You’re forcing it to bypass the Majority Node Set requirement to become operational.  In CCR, forcequorum was used in a geographically dispersed CCR cluster.  When the Primary Site went offline, you had to run forcequorum on the node in the Secondary Site and then set a new File Share Witness.  The process is similar for Exchange 2010 DAGs when the Primary Site goes offline.

The article here is entitled Datacenter Switchovers and is the article to use when planning Site Resiliency with Exchange 2010.  You can see, in the procedure for terminating a failed site, there are two methods:

  • When the DAG is in DAC mode
  • When the DAG isn’t in DAC mode

When looking at the procedures for when DAC is NOT enabled, there are more steps that have to be done, which involve running clussvc commands.  When looking at the procedures for when DAC is enabled, there are no steps which involve running clussvc commands.  This is because when you have DAC mode on, Exchange’s Site Resilience tasks perform these clussvc tasks for you in the background. As you can see, it is well worth ensuring you have at least 3 DAG nodes in a DAG just to utilize DAC.  But again, in Exchange 2010 SP1, DAC can be utilized with DAGs that contain two nodes.


26 Responses to “Exchange 2010 Database Activation Coordination (DAC)”

  1. […] This post was mentioned on Twitter by Mike Pfeiffer. Mike Pfeiffer said: RT @ExchServPro: Exchange 2010 Database Activation Coordination (DAC) – http://bit.ly/arjj1v #exchangeserver2010 […]

  2. on 02 Jul 2010 at 8:35 pm VMR

    Nice Job!

  3. on 03 Jul 2010 at 4:42 pm Turbomcp

    isnt there amistake here:
    Exchange 2010 RTM it should not be enabled for:

    •2 member DAGs
    •Non-Multisite DAGs
    •Multi-site DAGs that are in the same stretched Active Directory Site
    In Exchange 2010 SP1, the following changes are introduced and supported for DAC:

    •DAGs that contain 2 or more members
    •DAGs that are stretched across a single AD Site

    i thought in rtm you need 3 members minimum and in sp1 its 2 membes?

  4. […] http://www.shudnow.net/2010/06/30/exchange-2010-database-activation-coordination-dac/ […]

  5. on 03 Aug 2010 at 1:08 pm 3nodeDAG

    How do you activate one of the servers in the primary site that have the bit set to 0 if the WAN is not available?
    (And we know that the remote server has not activated the databases)

  6. on 02 Oct 2010 at 11:52 am Anwar A.Siddiqui

    Dear Shudnow,

    I have two questions

    1. if i have two servers on Primary site and only one server on secondary site which is on extended LAN, then should I create a FSW on secondary site as well or what?
    2. Is this possible to have automatic secondary site failover without using any commands if a common file share witness is used on another site.

    Anwar A.Siddiqui

  7. on 04 Oct 2010 at 1:04 pm Anwar A.Siddiqui

    Thanks a lot Alan for your quick response,

    I will be making a transition to exchange server 2010 with site resilience in October. I will share my experiences with you. Please pray for smooth transition.

    Anwar A.Siddiqui

  8. on 16 Oct 2010 at 3:44 am Jorge Salinas

    Dear Friends,

    I have a two node DAG, each Exchange 2010 Server Mailbox is located in a different Active Directory Site. Each Exchange 2010 Mailbox has its own mailbox database and each one of them serves local users. (In short, I have active users in both sites). FSW is located in Site 1

    Anytime I have a WAN outage, The Exchange Server located in Site 2 will loose contact with ths FSW and will dismount its database… so users located in Site 2 cant work.

    I would like to know what should I do to enable service on Site 2

    I really hope you can help me with his one..!

    Jorge Salinas

  9. on 02 Nov 2010 at 5:44 am ZAHIR HUSSAIN SHAH

    Hi Elan,

    Leeme first tell you that I’m the biggest fan of your articles, because the way you describe scenario and their solution, is simply commendable!

    I have some questions, for your kind attention:

    Scenario: I’m designing a Exchange 2010 SP1 based Messaging design, where we want DR setup to be available for us using DAG.

    All users will connect to Primary Datacenter, and only in the event of disaster happens, then all users will move to the DR site.

    We are sharing the same name space for both data centers, HO.ABC.COM on both datacenters.

    Primary Data Center will contain:
    2 Nodes for CAS / HUB Exchange 2010 SP1 on Windows Server 2008 R2
    2 Nodes for Mailbox Servers Exchange 2010 SP1 on Windows Server 2008 R2
    Witness Server would one of the HUB Server in Primary Server

    Questions:
    1) Can we have 1 witness server and two Mailbox Server in the primary site, and only one Mailbox/CAS/HUB Server in the DR site?

    2) In this scenario what name space structure do you recommend, either shared name space or different name space?

    3) In the event of complete Primary Datacenter failover, and where I have only single server in DR site containing Exchange 2010 SP1 Mailbox/CAS/HUB, do I also need to create standby Witness server to keep required votes in the DR for mounting databases?

    Looking forward to hear from you soon;

    Zahir Hussain Shah

  10. on 16 May 2011 at 4:10 am Ashif

    Hello Elan,

    In my Environment i have implemented two Mailbox server in Primary site and two in DR site, FSW is hosted by CASHT01 server in primary site and Alternate FSW in hosted in DR CASHT03 server. In my case what will happen if primary site gose down ? All DB will mount to DR autometically ? Will Alternate FSW will be act as voter node in case of primary FSW fail ? How the mejority and FSW will count if my primary site goes down?

    Ashif.

  11. on 02 Feb 2012 at 4:03 pm kysellbook

    If I move all the active databases from the primary datacenter to the standby datacenter first, then perform the following steps below. Do you think the mailbox servers in the primary datacenter will mount the database, assuming that the DAC is enabled?

    (1) Stop the DAG in the primary site

    (2) Activate the DAG member servers in the standby site. This step will evict all the nodes in primary site. Also it will stop and disable the cluster service on the mailbox servers in the primary site.

    (3) Transfer the file witness server from primary datacenter to the standby datacenter.

    (4) Suspend the database copy replication between primary datacenter and standby datacenter.

    (5) Sever the WAN link between primary site and standby site.

    Primary Datacenter: 4 mailbox-role servers and one witness server
    Standby Datacenter: 2 mailbox-role servers and one witness server
    One DAG spans two datacenters: primary and standby

    We are trying to simulate the site failure recovery procedure as much as we can, without any data loss.

    I am not able to find any information about this type of simulation on the Internet. I would really appreciate if you can provide me some insights.

  12. […] Exchange 2010 Database Activation Coordination (DAC) http://www.shudnow.net/2010/06/30/exchange-2010-database-activation-coordination-dac/ […]

