RSS Subscription 167 Posts and 2,643 Comments

Outlook Certificate Error and Autodiscover.Domain.Com Not Working?

I have a blog post on Outlook Certificate Errors which applies to Outlook 2007, Outlook 2010, and Outlook 2013.  You can see that post here. That blog post describes an incorrect certificate on Exchange itself.  For example, you make a connection to Exchange and your InternalURLs, ExternalURLs, and AutodiscoverServiceInternalURI FQDN is not defined on the certificate.  Therefore, you must update the InternalURLs, ExternalURLs, and AutodiscoverServiceInternalURI to match the certificate FQDN.

This specific issue is a bit different.  This issue is that when you are trying to make a connection to Autodiscover via https://autodiscover.domain.com, the Outlook client does not successfully make a connection to it and you get a certificate error.  The certificate you see pop up in Outlook during the error isn’t even the certificate that is located on Exchange.  The certificate error that pops up shows you that it is finding the certificate on your company’s public website.

So the million dollar question? Why the error and why is it showing the company’s public website’s certificate.

Well first, let’s explore a little on the steps External Autodiscover goes through in order to find Exchange.

Internal Autodiscover and the Service Connection Point

The Autodiscover service is a mechanism that can do several things.

  • Automatic Mailbox Creation
  • Redirects Outlook 2007/2010/2013 clients to point to the correct server in which their mailbox is located
  • Provides URLs to Web Services for Outlook 2007/2010/2013

When you first launch your Outlook client (Outlook 2007 or above required for Autodiscover access), it will search Active Directory for a Service Connection Point (SCP) record.  Every time a CAS Server is installed, it will register this SCP record within Active Directory in the following location:

CN=Autodiscover,CN=Protocols,CN=<CASServer>,CN=Servers,CN=Exchange Administrative Group,CN=Administrative Groups,CN=First Organization,CN=Microsoft Exchange,CN=Services

When an Outlook client has the ability to find this record because they are domain joined and on the internal network, they will locate all SCP records and will choose the oldest SCP.  If you have slow links or want to scope what SCPs are available to users in various sites, Autodiscover Site Affinity can be used.  This SCP will return the AutodiscoverInternalURI FQDN in which this client should contact the Autodiscover service.  You can modify this FQDN by using the following command:

Set-ClientAccessServer -Identity CASServer -AutoDiscoverServiceInternalUri https://webmail.domain.com/Autodiscover/Autodiscover.xml

By default, the SCP is configured with the following URL (Uses NetBIOS instead of FQDN).

https://CASServer/Autodiscover/Autodiscover.xml

It is a best practice not to use server FQDNs or NetBIOS names on a cert. So please be sure to update all InternalURLs, ExternalURLs, and the AutodiscoverServiceInternalURI to use an FQDN other than the server FQDN and other than the NetBIOS names of the servers.

So an example of how this works for domain joined clients who have access to Active Directory is included on the Autodiscover Whitepaper:

A domain joined Outlook client (again, only Outlook 2007 or above clients use Autodiscover) will find this SCP record which will return the AutodiscoverInternalServiceURI you specify with Set-ClientAccessServer.  Again, don’t forget that it’ll get a list of all SCPs for all CAS servers in your environment and use one at random.  If you have slow links or just want to force Outlook clients to use a CAS in their own site, check out my article here.

External Autodiscover Access

So we now understand how we access the Autodiscover service when we are domain joined and internal to the network.  What happens if we are on the corporate network and are not domain joined or we are external to the network and don’t have AD connectivity?  Or what happens if we connect via Outlook Anywhere when internal or external to the network? Essentially, what happens when we don’t have access to Active Directory?

Because our Outlook clients don’t have access to Active Directory, we cannot obtain the AutodiscoverServiceinternalURI since the client can’t get to the SCP record.  Because of this, Outlook will fall back to utilizing a different method.  The first method is to contact the following DNS records in order (domain = the user’s primary SMTP domain):

Every client I’ve ever worked with leverages autodiscover.domain.com and does not rely on domain.com.  Taking a close look at the URLs, we will utilize https.  This means that we will need the autodiscover.domain.com name in our certificate.  To have multiple names in our certificate, we will need a Unified Communications Certificate that is provided by various vendors.

So let’s say we want our NetBIOS name on our certificate, FQDN of CAS, our OWA FQDN, and our Autodiscover name, we’d have the following FQDNs on our certificate.

  • webmail.domain.com
  • autodiscover.domain.com

So an example of how this works for domain joined clients who don’t have access to Active Directory, Outlook Anywhere clients, or non-domain joined clients is included on the Autodiscover Whitepaper:

As you can see, Outlook attempts to connect to Active Directory to get a list of all the SCPs and it fails.  Because of this, Outlook then attempts to contact autodiscover.domain.com.  There would need to be an A record for autodiscover which will lead the client to a CAS server.  But, as I mentioned, you’re not getting this far.

What’s the Issue?

As I mentioned, the issue is that when you attempt External Autodiscover, you get a certificate error and see your Public Website’s Certificate, not the Exchange Certificate.  Why is it hitting my Public Company’s website?

When I am creating a new Outlook Profile, that is one point when Autodiscover will be contacted.

OutlookAutoD01

When you are in the process of Searching for your e-mail settings (aka utilizing Autodiscover), you see a certificate error as follows:

OutlookAutoD02

If you click View Certificate, you will see your public website’s certificate.

OutlookAutoD03

In the “External Autodiscover Access” section, I mentioned that https://domain.com/autodiscover/autodiscover.xml is attempted before https://autodiscover.domain.com/autodiscover/autodiscover.xmlhttps://domain.com may be going to your Public Facing website.  This isn’t generally a problem.  Outlook will detect that https://domain.com is not Exchange and Outlook will still fallback to https://autodiscover.domain.com and then eventually connect to Exchange.

However, if your https://domain.com site is having a certificate error even in a Web Browser, Outlook will prompt you with the certificate error and the fallback process to https://autodiscover.domain.com will fail.  Your public facing website is most likely http://www.domain.com, but the website has been set up to allow for http://domain.com, and SSL is activated on https://domain.com.  But, your public facing website only has the www.domain.com name on the cert.  It does not have domain.com on the certificate.  It is this reason that going to https://domain.com will cause a certificate error in a web browser or in Outlook.

There are one of two fixes for this:

  • Add the domain.com to your Public Facing Website’s certificate.  That way, Outlook makes a successful connection to https://domain.com, determines it’s not Exchange, and will fallback to attempting autodiscover via https://autodiscover.domain.com.  (Preferred Option for obvious secure reason)
  • Disable SSL on the public facing website’s root.  That way, no cert error would ever happen and Outlook will happily fallback to using autodiscover.domain.com. (Less Preferred Option for obvious secure reasons.  Be absolutely sure you do not require any kind of encrypted security on your website if you go this route).
Share

How Anonymous Relay works in Exchange 2013

This article is to provide you, the reader, the knowledge on how to properly create an Exchange 2013 Relay Connector.

In Exchange 2013, I am utilizing a multi-role server that has both the Client Access Server and Mailbox Server roles. We’ll want to head to the mail flow section in the Exchange Administration Center (EAC) that you can access by going to https://OWA.domain.com/ECP.

E15Relay02

Once in this mail flow section, we’ll click the tab called receive connectors which will allow us to see all receive connectors that exist.

E15Relay03

As you can see, there are connectors for FrontendTransport and connectors for HubTransport.  FrontEndTransport belongs to the Client Access Server Role and the HubTransport role belongs to the Mailbox Server role.

Let’s take a look at the “Default B-E15DAG1″ receive connector that belongs to the HubTransport role  as well as the “Default Frontend B-E15DAG1″ that belongs to the FrontendTransport role.

Taking a look at the “Default FrontEnd B-E15DAG1″, we can see that the connector listens on port 25 as we would expect.

E15Relay05

Taking a look at the “Default B-E15DAG1″ receive connector, we can see it listens on port 2525 which is something we haven’t seen before.

E15Relay04

All mail flow should come into the Frontend Transport which then delivers it to the appropriate mailbox server where the mailboxes exist.  On a multi-role server, these two roles cannot utilize the same ports as they are two different services.  What this means is, when creating a relay connector, this connector must be created on the Client Access Server role that owns the Frontend Transport because this service is the service that owns port 25.  If you try to create a receive connector on the Mailbox Server role that owns the HubTransport service, mail flow may work temporarily, but it will eventually fail due to both the FrontendTransport and HubTransport services fighting each other for port 25.  Obviously if the Client Access Server and Mailbox Server roles are on different servers, it’s not an issue.

To create our relay connector, we’ll choose the + symbol to create a new Receive Connector.

E15Relay06

Give the connector a name and be sure to choose Frontend Transport and Custom. Click Next.

E15Relay07

The default settings here are fine.  We want port 25 due to what I mentioned above. Click Next.

E15Relay08

In the remote network settings, it is important to ensure that you remove 0.0.0.0-255.255.255.255.  We want to explicitly define what servers are allowed to relay to ensure our server does not turn into an open relay for everybody.  In my case, I am going to add 192.168.50.2 which may be a printer, custom application, etc…  But the server that owns 192.168.50.2 would need to relay.  Once this is done, click Finish.

E15Relay09

Once the relay connector is created, open its properties, go to security, and make sure you check Anonymous Users.

E15Relay10

So what really happens when you place a check mark in the Anonymous users group in the above screenshot?  A lot of people are afraid to place a checkmark in that box in fear that anonymous users will be able to relay off your Exchange Server.  This is NOT the case.

When you place a checkmark in that box, the following permissions are given to the Anonymous Logon group:

  • Ms-Exch-SMTP-Submit
  • Ms-Exch-SMTP-Accept-Any-Sender
  • Ms-Exch-SMTP-Accept-Authoritative-Domain-Sender
  • Ms-Exch-Accept-Headers-Routing

So, as you can see, there is no Ms-Exch-SMTP-Accept-Any-Recipient permission added by default.  Because of this, users will NOT be able to relay off your Exchange Server by default.

To activate Anonymous users to use this connector for relaying, you must issue the following command: Get-ReceiveConnector “Receive Connector Name” | Add-ADPermission -User “NT AUTHORITY\ANONYMOUS LOGON” -ExtendedRights “Ms-Exch-SMTP-Accept-Any-Recipient”

The command should be easy enough to read, but what it essentially does is retrieve the receive connector that you created, add a permission into Active Directory for the Anonymous Logon group, and assign that group the Ms-Exch-SMTP-Accept-Any-Recipient permission for that group on that connector.  Once this is done, any server IPs you added to the Remote Network settings will be allowed to relay off your server utilizing port 25.

E15Relay11

Now you may be thinking, why should I create this new connector?  Well, Exchange will always look to see how specific you are on a connector.  So let’s say we have a SharePoint Server at 192.168.119.150.  We would create a relay connector and allow ONLY 192.168.119.150 to relay.  So when Exchange receives SMTP from an address of 192.168.119.150, it will see there are a few connectors.  One being the Default Receive Connector and one being the Relay Connector.  The Default Receive Connector allows connections from any IP Address while the Relay Connector only allows connections from 192.168.119.150.  Because you explicitly set the address on your Relay Connector, that is given higher preference in serving that SMTP connection from SharePoint and your SharePoint Server will now be able to relay off of Exchange (even though you can configure SharePoint to authenticate, but still just giving an example).

Now, for servers that will have a lot of relay traffic, there are some more steps you need to do on your Receive Connector.  If you see that you have mail flow issues where things periodically work with relaying and sometimes they don’t, it’s recommended to run the following commands on your Relay Connector.

Set-ReceiveConnector -identity “Relay Connector Name” -TarpitInterval 00:00:00

Set-ReceiveConnector -identity “Relay Connector Name” -ConnectionTimeout 00:30:00

Set-ReceiveConnector -identity “Relay Connector Name” -ConnectionInactivityTimeout 00:20:00

Set-ReceiveConnector -identity “Relay Connector Name” -MaxAcknowledgementDelay 00:00:00

Set-ReceiveConnector -identity “Relay Connector Name” -MaxInboundConnection 10000

Set-ReceiveConnector -identity “Relay Connector Name” -MaxInboundConnectionPercentagePerSource 100

Set-ReceiveConnector -identity “Relay Connector Name” -MaxInboundConnectionPerSource unlimited

So in my case, I would run the following command which would allow me to do Get-ReceiveConnector and pipe into Set-ReceiveConnector to make all the modifications in one command:

Get-ReceiveConnector -Identity “Relay Connector Name” | Set-ReceiveConnector -TarpitInterval 00:00:00 -ConnectionTimeout 00:30:00 -ConnectionInactivityTimeout 00:20:00 -MaxAcknowledgementDelay 00:00:00 -MaxInboundConnection 10000 -MaxInboundConnectionPercentagePerSource 100 -MaxInboundConnectionPerSource unlimited

E15Relay12

If you are wondering what the default settings were, I ran the following to view the defaults before running Set-ReceiveConnector.

E15Relay13

 

Share

Exchange 2010 Site Resilient DAGs and Majority Node Set Clustering – Part 3

Welcome to Part 3 of Exchange 2010 Site Resilient DAGs and Majority Node Set Clustering.  In Part 1, I discussed what Majority Node Set Clustering is and how it works with Exchange Site Resilience when you have one DAG member in a Primary Site and one DAG member in a Failover Site.  In Part 2, I discussed how Majority Node Set Clustering works with Exchange Site Resileince when you have two DAG members in a Primary Site and one DAG member in a Failover Site. In this Part, I will show an example of how Majority Node Set Clustering works with Exchange Site Resilience when you have two DAG members in a Primary Site and two DAG members in a Failover Site.

Part 1

Part 2

Part 3

Real World Examples

Each of these examples will show DAG Models with a Primary Site and a Failover Site.

4 Node DAG  (Two in Primary and Two in Failover)

In the following screenshot, we have 4 Servers.  Four are Exchange 2010 Multi-Role Servers; two in the Primary Site and two in the Failover Site.  The Cluster Service is running only on the four Exchange Multi-Role Servers.  More specifically, it would run on the Exchange 2010 Servers that have the Mailbox Server Role. When Exchange 2010 utilizes an even number of Nodes, it utilizes Node Majority with File Share Witness.  If you have dedicated HUB and/or HUB/CAS Servers, you can place the File Share Witness on those Servers.  However, the File Share Witness cannot be placed on the Mailbox Server Role.

So now we have our five Servers; four of them being Exchange.  This means we have five voters.  Four of the Mailbox Servers that are running the cluster service are voters and the File Share Witness is a witness that the voters use to maintain cluster quorum.  So the question is, how many voters/servers/cluster objects can I lose?  Well if you read the section on Majority Node Set (which you have to understand), you know the formula is (number of nodes /2) + 1.  This means we have (4 Exchange Servers / 2) = 2 + 1 = 3.  This means that 3 cluster objects must always be online for your Exchange Cluster to remain operational.

But now let’s say one or two of your Exchange Servers go offline.  Well, you still have at least three cluster objects online.  This means your cluster will be still be operational.  If all users/services were utilizing the Primary Site, then everything continues to remain completely operational.  If you were sending SMTP to the one of the servers in the Failover Site or users were for some reason connecting to the Failover Site, they will need to be pointed to another Exchange Server that is operational in the Primary Site or the Failover Site. This of course depends on whether the user databases are being replicated from a mailbox database failover standpoint.

But what happens if you lose a third node in which all DAG members in the Failover Site go offline including the FSW? Well, based on the formula above we need to ensure we have 3 cluster objects operational at all times.  At this time, the entire cluster goes offline.  You need to go through steps provided in the site switchover process but in this case, you would be activating the Primary Site and specify a new Alternative File Share Witness Server that exists in the Primary Site so you can active the Exchange 2010 Server in the Primary Site.  The DAG will actively use the File Share Witness since there will be 2 Exchange DAG Members remaining which is an even number of nodes.  And again, when you have an even number of nodes, you will use a File Share Witness.

But what happens if you lose two nodes in the Primary Site as well as the FSW due to something such as Power Failure or a Natural Disaster? Well, based on the formula above we need to ensure we have 3 cluster objects operational at all times.  At this time, the entire cluster goes offline.  You need to go through steps provided in the site switchover process but in this case, you would be activating the Failover Site and specify a new Alternative File Share Witness Server that exists (or will exist) in the Failover Site so you can activate the Exchange 2010 Servers in the Failover Site.   The DAG will actively use the Alternate File Share Witness since there will be 2 Exchange DAG Members remaining which is an even number of nodes.  And again, when you have an even number of nodes, you will use a File Share Witness.

Once the Datacenter Switchover has occurred, you will be in a state that looks as such.  An Alternate File Share Witness is not for redundancy for your 2010 FSW that was in your Primary Site.  It’s used only during a Datacenter Switchover which is a manual process.

Once your Primary Site becomes operational, you will re-add the two Primary DAG Servers to the existing DAG which will still be using the 2010 Alternate FSW Server in the Failover Site and you will now be switched into a Node Majority with File Share Witness Cluster instead of just Node Majority.  Remember I said with an odd number of DAG Servers, you will be in Majority Node Witness and with an even number, the Cluster will automatically switch itself to Node Majority with File Share Witness?  You will now be in a state that looks as such.

Part of the Failback Process would be to switch back to the old FSW Server in the Primary Site.  Once done, you will be back into your original operational state.

As you can see with how this works, the question that may arise is where to put your FSW?  Well, it should be in the Primary Site with the most users or the site that has the most important users.  With that in mind, I bet another question arises?  Well, why with the most users or the most important users?  Because some environments may want to use the above with an Active/Active Model instead of an Active/Passive.  Some databases may be activated in both sites.  But, with that, if the WAN link goes down, the Exchange 2010 Server in the Failover Site loses quorum since it can’t contact at least 2 other cluster objects.  Again, you must have three cluster objects online.  This also means that each cluster object must be able to see two other cluster objects.  Because of that, the Exchange 2010 Server will go completely offline.

To survive this, you really must use 2 different DAGs.  One DAG where the FSW is in the First Site and a second DAG where its FSW is in the Second Site.  In my example, users that live in the First Active Site would primarily be using the Exchange 2010 DAG Members in the First Active Site which would be on DAG 2.  Users that live in the Second Active Site would primarily be using the Exchange 2010 DAG Members in the Second Active Site which would be on DAG 1. This way, if anything happens with the WAN link, users in the First Active Site would still be operational as the FSW for their DAG is in the First Active Site and DAG 2 would maintain Quorum.  Users in the Second Active Site would still be operational as the FSW for their DAG is in the Second Active Site and DAG 1 would maintain Quorum.

Note: This would require twice the amount of servers since a DAG Member cannot be a part of more than one DAG.  As shown below, each visual representation below of a 2010 HUB/CAS/MBX is a separate server.

The Multi-DAG Model would look like this.

Share

Exchange 2010 Site Resilient DAGs and Majority Node Set Clustering – Part 2

Welcome to Part 2 of Exchange 2010 Site Resilient DAGs and Majority Node Set Clustering.  In Part 1, I discussed what Majority Node Set Clustering is and how it works with Exchange Site Resilience when you have one DAG member in a Primary Site and one DAG member in a Failover Site.  In this Part, I will show an example of how Majority Node Set Clustering works with Exchange Site Resilience when you have two DAG members in a Primary Site and one DAG member in a Failover Site.

Part 1

Part 2

Part 3

Real World Examples

In Part 1, I showed a Real World example when you have one Exchange DAG member in the Primary Site and one Exchange DAG member in the Failover Site.  In this Part, I am showing a Real World example when you have two Exchange DAG members in the Primary Site and one Exchange DAG member in the Failover Site.

3 Node DAG  (Two in Primary and One in Failover)

In the following screenshot, we have 3 Servers.  Two are Exchange 2010 Multi-Role Servers; one in the Primary Site and one on the Failover Site.  The Cluster Service is running on all three Exchange Multi-Role Servers.  More specifically, it would run on the Exchange 2010 Servers that have the Mailbox Server Role. When Exchange 2010 utilizes an even number of Nodes, it utilizes Node Majority with File Share Witness.  Because we have an odd number of Nodes, we are utilizing Node Majority and will not utilize a File Share Witness.

So now we have our three Servers; all three of them being Exchange.  This means we have three voters and do not need a File Share Witness as we have a third node.  So the question is, how many voters/servers/cluster objects can I lose?  Well if you read the section on Majority Node Set (which you have to understand), you know the formula is (number of nodes /2) + 1.  This means we have (3 Exchange Servers / 2) rounded down = 1 + 1 = 2.  This means that 2 cluster objects must always be online for your Exchange Cluster to remain operational just like if we were utilizing 2 DAG members with a File Share Witness.

But now let’s say one of your Exchange Servers go offline.  Well, you still have at least two cluster objects online.  This means your cluster will be still be operational.  If all users/services were utilizing the Primary Site, then everything continues to remain completely operational.  If you were sending SMTP to the Failover Site or users were for some reason connecting to the Failover Site, they will need to be pointed to the Exchange Server in the Primary Site.

But what happens if you lose a second node? Well, based on the formula above we need to ensure we have 2 cluster objects operational at all times.  At this time, the entire cluster goes offline.  You need to go through steps provided in the site switchover process but in this case, you would be activating the Primary Site and specify a new Alternative File Share Witness Server that exists in the Primary Site so you can active the Exchange 2010 Server in the Primary Site.  The DAG won’t actively use the File Share Witness but you should specify it anyways because part of the Failback process is re-adding the Primary Site Servers back to the DAG once they become operational. And once you re-add the second DAG node, you now have two DAG members in the DAG which will want to switch the DAG Cluster into a Node Majority with File Share Witness which is why you need to still specify a File Share Witness.

But what happens if you lose two nodes in the Primary Site? Well, based on the formula above we need to ensure we have 2 cluster objects operational at all times.  At this time, the entire cluster goes offline.  You need to go through steps provided in the site switchover process but in this case, you would be activating the Failover Site and specify a new Alternative File Share Witness Server that exists (or will exist) in the Failover Site so you can activate the Exchange 2010 Server in the Primary Site.   The DAG won’t actively use the File Share Witness but you should specify it anyways because part of the Failback process is re-adding the Primary Site Servers back to the DAG once they become operational.

Once the Datacenter Switchover has occurred, you will be in a state that looks as such.  An Alternate File Share Witness is not for redundancy for your 2010 FSW that was in your Primary Site.  It’s used only during a Datacenter Switchover which is a manual process.

Once your Primary Site becomes operational, you will re-add the Primary DAG Server to the existing DAG which will still be using the 2010 Alternate FSW Server in the Failover Site and you will now be switched into a Node Majority with File Share Witness Cluster instead of just Node Majority.  Remember I said with an odd number of DAG Servers, you will be in Majority Node Witness and with an even number, the Cluster will automatically switch itself to Node Majority with File Share Witness?  You will now be in a state that looks as such.

Part of the Failback Process would be to switch to a FSW Server in the Primary Site.  Once done, you will be back into your original operational state.

Now the final step of the Failback Process would be to re-add your final remaining DAG Member in the Primary Site.  Once done, your cluster will switch back into a Node Majority Cluster and will no longer be utilizing the FSW.

As you can see with how this works, the question that may arise is where to put your the majority of your Exchange DAG Members?  Well, it should be in the Primary Site with the most users or the site that has the most important users.  With that in mind, I bet another question arises?  Well, why with the most users or the most important users?  Because some environments may want to use the above with an Active/Active Model instead of an Active/Passive.  Some databases may be activated in both sites.  But, with that, if the WAN link goes down, the Exchange 2010 Server in the Failover Site loses quorum since it can’t contact at least 1 other cluster object.  Again, you must have two cluster objects online.  This also means that each cluster object must be able to see one other cluster object.  Because of that, the Exchange 2010 Server will go completely offline.

To survive this, you really must use 2 different DAGs.  One DAG where the majority of your Exchange 2010 DAG Members is in the First Site and a second DAG where the majority of the Exchange 2010 DAG Members is in the Second Site.  Users that live in the First Active Site would primarily be using the Exchange 2010 DAG Members in the First Active Site.  Users that live in the Second Active Site would primarily be using the Exchange 2010 DAG Members in the Second Active Site. This way, if anything happens with the WAN link, users in the First Active Site would still be operational as the majority of its Exchange 2010 DAG Members for their DAG is in the First Active Site and DAG 1 would maintain Qourum.  Users in the Second Active Site would still be operational as the majority of its Exchange 2010 DAG Members for their DAG is in the Second Active Site and DAG 2 would maintain Quorum.

Note: This would require twice the amount of servers since a DAG Member cannot be a part of more than one DAG.  As shown below, each visual representation below of a 2010 HUB/CAS/MBX is a separate server.

The Multi-DAG Model would look like this.

 

Share

Exchange 2010 Site Resilient DAGs and Majority Node Set Clustering – Part 1

I’ve talked about this topic in some of my other articles but wanted to create an article that talks specifically about this model and show several different examples in a Database Availability Group (DAG)’s tolerance for node and File Share Witness (FSW) failure.  Many people don’t properly understand how the Majority Node Set Clustering Model works.  In my article here, I talk about Database Activation Coordination Mode and have a section on Majority Node Set.  In this article, I want to visibly show show some real world examples on how the Majority Node Set Clustering Model works.  This will be a multi-part article and each Part will have its own example.

Part 1

Part 2

Part 3

Majority Node Set

Majority Node Set is a Windows Clustering Model such as the Shared Quorum Model, but different.  Both Exchange 2007 and Exchange 2010 Clusters use Majority Node Set Clustering (MNS).  This means that 50% of your votes (server votes and/or 1 file share witness) need to be up and running.  The proper formula for this is (n / 2) + 1 where n is the number of DAG nodes within the DAG. With DAGs, if you have an odd number of DAG nodes in the same DAG (Cluster), you have an odd number of votes so you don’t have a witness.  If you have an even number of DAGs nodes, you will have a file share witness in case half of your nodes go down, you have a witness who will act as that extra +1 number.

So let’s go through an example.  Let’s say we have 3 servers. This means that we need (number of nodes which is 3 / 2) + 1  which equals 2 as you round down since you can’t have half a server/witness.  This means that at any given time, we need 2 of our nodes to be online which means we can sustain only 1 (either a server or a file share witness) failure in our DAG.  Now let’s say we have 4 servers.  This means that we need (number of nodes which is 4 / 2) + 1 which equals 3.  This means at any given time, we need 3 of our servers/witness to be online which means we can sustain 2 server failures or 1 server failure and 1 witness failure.

Real World Examples

Each of these examples will show DAG Models with a Primary Site and a Failover Site.

2 Node DAG  (One in Primary and One in Failover)

In the following screenshot, we have 3 Servers.  Two are Exchange 2010 Multi-Role Servers; one in the Primary Site and one on the Failover Site.  The Cluster Service is running only on the two Exchange Multi-Role Servers.  More specifically, it would run on the Exchange 2010 Servers that have the Mailbox Server Role. When Exchange 2010 utilizes an even number of Nodes, it utilizes Node Majority with File Share Witness.  If you have dedicated HUB and/or HUB/CAS Servers, you can place the File Share Witness on those Servers.  However, the File Share Witness cannot be placed on the Mailbox Server Role.

So now we have our three Servers; two of them being Exchange.  This means we have two voters and a File Share Witness.  Two of the Mailbox Servers that are running the cluster service are voters and the File Share Witness is just a witness that the voters use to determine cluster majority.  So the question is, how many voters/servers can I lose?  Well if you read the section on Majority Node Set (which you have to understand), you know the formula is (number of nodes /2) + 1.  This means we have (2 Exchange Servers / 2) = 1 + 1 = 2.  This means that 2 cluster objects must always be online for your Exchange Cluster to remain operational.

But now let’s say one of your Exchange Servers go offline.  Well, you still have at least two cluster objects online.  This means your cluster will be still be operational.  If all users/services were utilizing the Primary Site, then everything continues to remain completely operational.  If you were sending SMTP to the Failover Site or users were for some reason connecting to the Failover Site, they will need to be pointed to the Exchange Server in the Primary Site.

But what happens if you lose a second node? Well, based on the formula above we need to ensure we have 2 cluster objects operational at all times.  At this time, the entire cluster goes offline.  You need to go through steps provided in the site switchover process but in this case, you would be activating the Primary Site and specify a new Alternative File Share Witness Server that exists in the Primary Site so you can active the Exchange 2010 Server in the Primary Site.  The DAG won’t actively use the File Share Witness but you should specify it anyways because part of the Failback process is re-adding the Primary Site Servers back to the DAG once they become operational.

But what happens if you lose two nodes in the Primary Site? Well, based on the formula above we need to ensure we have 2 cluster objects operational at all times.  At this time, the entire cluster goes offline.  You need to go through steps provided in the site switchover process but in this case, you would be activating the Failover Site and specify a new Alternative File Share Witness Server that exists (or will exist) in the Failover Site so you can activate the Exchange 2010 Server in the Primary Site.   The DAG won’t actively use the File Share Witness but you should specify it anyways because part of the Failback process is re-adding the Primary Site Servers back to the DAG once they become operational.

Once the Datacenter Switchover has occurred, you will be in a state that looks as such.  An Alternate File Share Witness is not for redundancy for your 2010 FSW that was in your Primary Site.  It’s used only during a Datacenter Switchover which is a manual process.

Once your Primary Site becomes operational, you will re-add the Primary DAG Server to the existing DAG which will still be using the 2010 Alternate FSW Server in the Failover Site and you will now be switched into a Node Majority with File Share Witness Cluster instead of just Node Majority.  Remember I said with an odd number of DAG Servers, you will be in Node Majority and with an even number, the Cluster will automatically switch itself to Node Majority with File Share Witness?  You will now be in a state that looks as such.

Part of the Failback Process would be to switch back to the old FSW Server in the Primary Site.  Once done, you will be back into your original operational state.

As you can see with how this works, the question that may arise is where to put your FSW?  Well, it should be in the Primary Site with the most users or the site that has the most important users.  With that in mind, I bet another question arises?  Well, why with the most users or the most important users?  Because some environments may want to use the above with an Active/Active Model instead of an Active/Passive.  Some databases may be activated in both sites.  But, with that, if the WAN link goes down, the Exchange 2010 Server in the Failover Site loses quorum since it can’t contact at least 1 other voter.  Again, you must have two voters online.  This also means that each voter must be able to see one other voter.  Because of that, the Exchange 2010 Server will go completely offline.

To survive this, you really must use 2 different DAGs.  One DAG where the FSW is in the First Site and a second DAG where its FSW is in the Second Site.  Users that live in the First Active Site would primarily be using the Exchange 2010 DAG Members in the First Active Site.  Users that live in the Second Active Site would primarily be using the Exchange 2010 DAG Members in the Second Active Site. This way, if anything happens with the WAN link, users in the First Active Site would still be operational as the FSW for their DAG is in the First Active Site and DAG 1 would maintain Qourum.  Users in the Second Active Site would still be operational as the FSW for their DAG is in the Second Active Site and DAG 2 would maintain Quorum.

Note: This would require twice the amount of servers since a DAG Member cannot be a part of more than one DAG.  As shown below, each visual representation below of a 2010 HUB/CAS/MBX is a separate server.

The Multi-DAG Model would look like this.

 

Share

Export Spoken Name in Exchange 2010 UM

I was asked by a client recently if there was anyway to export the Spoken Name in Exchange UM to a WAV file.  You can’t export this to a WAV file but you can export it to a WMV-9 file which you can then use other means to convert it to a WAV file.

Now when I say SpokenName, I am referring to the audio you hear when you press this audio icon in the Outlook Contact Card.

To export this, the steps are relatively simple (though I didn’t figure this one out on my own and a very helpful Microsoft fellow CYC gave me most of the code).

?View Code POWERSHELL
1
Export-RecipientDataProperty -Identity  -SpokenName | foreach-object { Add-Content -Value $_.FileData -Path "C:\Exports\identity.wma" -Encoding Byte }

Let’s look at an example.  I will export my own Spoken Name to WMA file.

We can see that no file currently exists.

We will now export the data to a WMV-9 file as well as re-verify that the file was created.

Share

Exchange 2010 Site Resilience, Multiple DAG IPs, and Cluster Resources

Exchange 2010 allows us to have Database Availability Group (DAG) members in several AD Sites.  For every subnet a DAG member’s MAPI NIC is in, we must obtain a DAG IP.  This DAG IP is a separate IP than is located on the MAPI NICs themselves. We take this DAG IP to the DAG using the Set-DatabaseAvailabilityGroup command.

Multiple DAG IPs

Let’s take a look at an example of how the architecture may look.

Taking a look at the above Visio diagram, we have two sites, Primary Site and DR Site, with one node in each.  The MAPI NIC in the Primary Site has an IP Address of 172.17.24.200.  That means that we’ll need to have a DAG IP that lives in this same subnet.  We choose a DAG IP of 172.17.24.120.  The MAPI NIC in the DR Site has an IP Address of 172.16.24.200. That means that we’ll need to have a DAG IP that lives in this same subnet.  We choose a DAG  IP of 172.16.24.120.

In order to add these MAPI IP Addresses, we’ll need to run the following the command.

Note: IPs on Replication NIC’s subnet do not get added to the Database AvailabilityGroupIPAddresses. Only MAPI NIC Subnets get added.

Keep in mind, when adding additional IPs in the future, it is important that you include all existing DAG IPs.  The Set-DatabaseAvailabilityGroup -DatabaseAvailabilityGroupIPAddresses property is not additive.

To verify the DAG IPs were added successfully, let’s check out our DAG Properties.

In Exchange 2010 SP1, we have the ability to add our DAG IPs via the GUI. If we go to the DAG Properties, we now see we can manage our Witness Server and Alternate Witness Server.

This allows us to do our IP Address configuration right from the GUI instead of needing to use Set-DatabaseAvailabilityGroup  with the DatabaseAvailabilityGroupIPAddresses property and needing to worry about all previous IP Addresses being included since the property isn’t additive.

Cluster Resources

So, let’s take a look at what really happens to the cluster resources and what determines which DAG IP is active.  Let’s open the Failover Cluster Manager.  Start > Administrative Tools > Failover Cluster Manager.

After selecting our DAG, let’s take a look at the cluster resources.  We can see from here that we have two Network IP Resources.

But let’s take even a deeper look.

Select the DAG from within the Cluster Core Resources > Right-Click > Choose Properties.

Now let’s take a look at the Dependencies Tab.

As we can see, the two DAG IPs are set up with an OR dependency which means that the cluster can activate either DAG IP at any given time.  As we saw earlier, the 172.16.24.120 IP is the existing DAG IP that is online which means the DRSiteNode’s DAG IP is currently the online Network IP resource.

Let’s run a cluster command so we can failover the default “Cluster Group” from one cluster node to another.

We now see the PrimarySiteNode is the node that has the “Cluster Group.”  Let’s go ahead and take a look at the Cluster Resources again and see which Network IP Resource is online.

Looks like the PrimarySiteNode’s DAG IP is now Online instead of the DRSiteNode’s DAG IP.  This means that the Network IP Resource that is online depends on which DAG Node has the “Cluster Group.”  If you recall from my previous articles, the DAG Node that has the “Cluster Group” is the DAG Node that acts as the Primary Active Manager.  The Primary Active Manager is the DAG Node responsible for choosing what databases get activated in a failover.  For more information on Active Manager, click here.

Share

Next »