
Exchange 2010 RTM High Availability Load Balancing Options

With Exchange 2010 come many advantages in the HA realm.  One of them is that clients now connect to the Client Access Server for RPC.  This means that when a Mailbox Server does a *over (a failover or a switchover), the user is still connected to their RPC Endpoint.  You can also create a Client Access Array, which load balances your RPC Endpoint across your CAS Servers.  There is lots of information on the RPC Client Access Service here and here.  So what options are available for load balancing this new RPC Client Access Array while also load balancing all our other services?  And what are the pros and cons of each method?  If you want to know, read on…

Exchange Load Balancing Options

In Exchange 2007, if you wanted any type of HA, you needed at least four servers: 2 for CCR Nodes and 2 for HUB/CAS Nodes.  You could not get by with only 2 servers because CCR Nodes were limited to the Mailbox role.  For an Exchange Site to be operational, it always needs at least the HUB, CAS, and MBX roles.  In Exchange 2010, more options are available.  You now have something called Database Availability Groups (DAGs).  DAG members can host all Exchange roles (HUB/CAS/MBX/UM) except the Edge Transport role.

There is a catch, though.  Windows does not allow you to run Windows Network Load Balancing on a server that also runs Failover Clustering, and every DAG member uses Failover Clustering.  So while we can now build a highly available deployment with only 2 Exchange 2010 Servers, we need another way to load balance the CAS role to provide High Availability for the following CAS Services:

  • Outlook Web App (formerly Outlook Web Access) (HTTP Traffic)
  • Exchange Control Panel (HTTP Traffic)
  • Exchange Web Services (HTTP Traffic)
  • Exchange ActiveSync (HTTP Traffic)
  • Autodiscover (HTTP Traffic)
  • Offline Address Book (HTTP Traffic)
  • Outlook Anywhere (HTTP Traffic)
  • RPC Client Access (RPC Traffic)

There are a few options for load balancing.  The first is to use ISA.  The problem here is that ISA can only load balance HTTP-based traffic.  In the bulleted list above, the RPC Client Access Service uses RPC traffic, which means ISA cannot load balance it.  That leaves us with the following load balancing options:

  1. 2 Multi-Role DAG Members and Hardware Load Balancers - Utilize 2 Multi-Role DAG Members (MBX/HUB/CAS).  Use a hardware load balancer to load balance all of the bulleted items above, including the RPC Client Access Service, by using an RPC Client Access Array that load balances port 135 for the RPC Endpoint Mapper plus the dynamic RPC port range 1024-65535.  Because the goal is High Availability, you will most likely want 2 hardware load balancers so that the load balancer itself is not a single point of failure.
  2. 2 DAG Members, 2 HUB/CAS Servers, and Windows Network Load Balancing - Utilize 2 DAG Members (MBX).  Use 2 HUB/CAS Servers with Windows Network Load Balancing; because these HUB/CAS Servers are not DAG members, they do not run Failover Clustering and can therefore run Windows Network Load Balancing.  Windows Network Load Balancing will load balance all of the bulleted items above, including the RPC Client Access Service, using an RPC Client Access Array that load balances port 135 for the RPC Endpoint Mapper and ports 1024-65535.
  3. 2 DAG Members and DNS Round Robin - Use 2 Multi-Role DAG Members (MBX/HUB/CAS).  Use DNS Round Robin to achieve a “poor man’s solution” type of load balancing.  With this scenario, you will not have automatic failover for the RPC Client Access Service.  You will essentially create two A records for the RPC Client Access Array: one pointing to the first multi-role DAG Member and one pointing to the second.  You will most likely want to lower the TTL of these DNS records to 5 minutes, so that if a failure does happen, you can remove one of the A records and the clients' cached copies of it will expire within 5 minutes.
  4. 2 DAG Members, ISA/TMG/UAG, and either Hardware Load Balancing or DNS Round Robin - Use 2 Multi-Role DAG Members (MBX/HUB/CAS).  Use ISA/TMG/UAG to load balance all HTTP items from the bulleted list above.  The issue here is that with Exchange 2010, users connect to the Client Access Server for their RPC Endpoint for mailbox access.  To make this redundant, we create an RPC Client Access Array.  This RPC Client Access Array can be load balanced through a hardware load balancer, DNS Round Robin, or Windows Network Load Balancing (the latter only on CAS Servers that are not DAG members).  ISA/TMG/UAG cannot load balance non-HTTP traffic, so even with ISA/TMG/UAG handling all HTTP traffic, you still need a Hardware Load Balancer, DNS Round Robin, or Windows Network Load Balancing for the RPC Client Access Array.  The example picture below shows UAG used together with a Hardware Load Balancer.
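To make the coverage gap concrete, here is a minimal Python sketch. It models the CAS service list above and reports which services a given combination of methods leaves unbalanced. The method names and the capability table are hypothetical simplifications of what this post describes, not any product's API. `RPC_PORT_RANGES` encodes the port 135 plus 1024-65535 ranges mentioned in the options.

```python
# Hypothetical model of the CAS services listed above and the traffic each uses.
CAS_SERVICES = {
    "Outlook Web App": "http",
    "Exchange Control Panel": "http",
    "Exchange Web Services": "http",
    "Exchange ActiveSync": "http",
    "Autodiscover": "http",
    "Offline Address Book": "http",
    "Outlook Anywhere": "http",
    "RPC Client Access": "rpc",
}

# Traffic types each method can load balance, as described in this post.
CAPABILITIES = {
    "hardware_lb": {"http", "rpc"},
    "windows_nlb": {"http", "rpc"},
    "dns_round_robin": {"http", "rpc"},  # spreads load, but no failure detection
    "isa_tmg_uag": {"http"},             # HTTP only: cannot handle RPC
}

# RPC Client Access Array ports: endpoint mapper plus the dynamic range.
RPC_PORT_RANGES = [(135, 135), (1024, 65535)]


def is_rpc_port(port: int) -> bool:
    """True if the port falls in a range the RPC Client Access Array must balance."""
    return any(low <= port <= high for low, high in RPC_PORT_RANGES)


def uncovered_services(methods: list[str]) -> list[str]:
    """Services that none of the chosen methods can load balance."""
    supported: set[str] = set()
    for method in methods:
        supported |= CAPABILITIES[method]
    return sorted(name for name, kind in CAS_SERVICES.items() if kind not in supported)


if __name__ == "__main__":
    print(uncovered_services(["isa_tmg_uag"]))                 # ['RPC Client Access']
    print(uncovered_services(["isa_tmg_uag", "hardware_lb"]))  # []
```

Pairing ISA/TMG/UAG with any RPC-capable method empties the list. That is the reasoning behind option 4.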

Exchange Load Balancing Options and their benefits

Looking at the list of options above, we can use several different load balancing methods, including Windows Network Load Balancing, Hardware Load Balancing, ISA/TMG/UAG, and DNS Round Robin.  Each has its pros and cons in terms of cost and functionality.

Hardware Load Balancing

Hardware Load Balancers offer the most capacity in terms of user connections.  For SMBs, though, load won't be a concern; it mainly matters for very large organizations.  In fact, Microsoft recommends using Hardware Load Balancers instead of Windows Network Load Balancing if you will need more than 8 HUB/CAS Servers in a load balanced farm.  Hardware Load Balancers are also the most expensive option.

Depending on the vendor, Hardware Load Balancers also offer the best Client to Server Affinity functionality.  For example, we can use multiple affinity methods and fall back to another method if our preferred affinity fails.  We can set up our hardware load balancers to use the following affinity methods, in order of preference:

  • Existing Browser-Based Cookie
  • Cookie created by the Hardware Load Balancer
  • SSL Session ID
  • Source IP

The goal here is to make sure that every user is load balanced evenly and that automatic failover can occur quickly and smoothly.
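The fallback behavior above can be sketched in Python as an ordered list of affinity methods. The balancer uses the first method that meets two conditions: the request actually carries it, and it still points at a healthy server. Otherwise it picks a new server and records it. The request fields, the `AffinityTable` class, and the "first healthy server" choice are hypothetical placeholders. Real hardware load balancers implement this in vendor-specific ways.

```python
from dataclasses import dataclass, field
from typing import Optional

# Affinity methods in order of preference, matching the list above.
AFFINITY_ORDER = ["browser_cookie", "lb_cookie", "ssl_session_id", "source_ip"]


@dataclass
class Request:
    # Hypothetical request attributes; None means the request doesn't carry it.
    source_ip: str
    browser_cookie: Optional[str] = None
    lb_cookie: Optional[str] = None
    ssl_session_id: Optional[str] = None


@dataclass
class AffinityTable:
    # (affinity method, key) -> server the client was last sent to
    entries: dict = field(default_factory=dict)

    def pick(self, req: Request, healthy: list[str]) -> tuple[str, str]:
        """Return (method used, server) for this request."""
        for method in AFFINITY_ORDER:
            key = getattr(req, method)
            if key is None:
                continue  # affinity not available: fall back to the next method
            server = self.entries.get((method, key))
            if server in healthy:
                return method, server
        if not healthy:
            raise RuntimeError("no healthy servers")
        # No usable affinity (new client, or its server failed): choose a server.
        server = healthy[0]  # placeholder for a real balancing algorithm
        for method in AFFINITY_ORDER:
            key = getattr(req, method)
            if key is not None:
                self.entries[(method, key)] = server
        return "new", server


table = AffinityTable()
req = Request(source_ip="10.0.0.5", lb_cookie="abc")
print(table.pick(req, ["cas1", "cas2"]))  # ('new', 'cas1')
print(table.pick(req, ["cas1", "cas2"]))  # ('lb_cookie', 'cas1')
print(table.pick(req, ["cas2"]))          # ('new', 'cas2') once cas1 has failed
```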

Windows Network Load Balancers

Windows Network Load Balancers do not achieve as much capacity in terms of user connections as a Hardware Load Balancer, but they can still handle a lot of connections.  A Windows Network Load Balanced farm can use as many as 8 CAS Servers without suffering performance degradation, and you'll only need more than 8 CAS Servers if you have a very large number of users (tens of thousands).  Windows Network Load Balancing is built into Windows Server, so it's a large cost savings compared to purchasing hardware load balancers.

Windows Network Load Balancers do not offer as much Client to Server Affinity functionality as Hardware Load Balancers.  The only affinity they support is based on the client's Source IP.  The downside of Source IP affinity shows up when many connections come from a single NAT’d Source IP: all of those connections end up on the same Client Access Server, because Source IP is the only affinity method Windows Network Load Balancing has.
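To see why a NAT’d source IP hurts, here is a minimal Python sketch of source-IP affinity. It hashes the client address with SHA-256 to pick a server. That hash is an illustrative stand-in, not the algorithm Windows Network Load Balancing actually uses, and the server names and addresses are hypothetical. The only point is that identical source IPs always map to the same server.

```python
import hashlib
from collections import Counter

CAS_SERVERS = ("cas1", "cas2", "cas3")  # hypothetical farm


def server_for(source_ip: str, servers: tuple[str, ...] = CAS_SERVERS) -> str:
    """Source-IP affinity: the same client address always maps to the same server."""
    digest = hashlib.sha256(source_ip.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


# 300 users behind one NAT device all present the same public source IP...
nat_users = ["203.0.113.10"] * 300
# ...while 300 users with distinct source IPs are spread across the farm.
direct_users = [f"10.0.{i // 250}.{i % 250}" for i in range(300)]

if __name__ == "__main__":
    print("NAT'd: ", Counter(server_for(ip) for ip in nat_users))
    print("Direct:", Counter(server_for(ip) for ip in direct_users))
```

The NAT’d counter always shows a single server carrying all 300 users. The direct counter spreads them across the farm.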

Most likely, if you don’t need more than 8 CAS Servers, Windows Network Load Balancing will suffice for your needs.  It’s cheap, comes with Windows, and does its job.
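The rules of thumb in this post can be condensed into a small Python helper that lists the methods able to front the RPC Client Access Array. It encodes only two constraints stated here. First, Windows Network Load Balancing cannot coexist with the Failover Clustering used by DAG members. Second, it is recommended for at most 8 CAS Servers. The function name and its inputs are hypothetical, and it ignores cost, which you still have to weigh yourself.

```python
WNLB_MAX_CAS_SERVERS = 8  # beyond this, Microsoft recommends hardware load balancers


def rpc_array_options(cas_servers: int, cas_on_dag_members: bool) -> list[str]:
    """List load-balancing methods that can front the RPC Client Access Array.

    ISA/TMG/UAG is deliberately absent: it cannot load balance RPC traffic.
    """
    options = ["hardware load balancer"]
    # WNLB can't run on servers that also run Failover Clustering (DAG members).
    if not cas_on_dag_members and cas_servers <= WNLB_MAX_CAS_SERVERS:
        options.append("windows network load balancing")
    # Always possible, but with no automatic failure detection.
    options.append("dns round robin")
    return options


print(rpc_array_options(2, cas_on_dag_members=True))   # two multi-role DAG members
print(rpc_array_options(2, cas_on_dag_members=False))  # dedicated HUB/CAS pair
```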

ISA Server, TMG, or UAG

ISA/TMG/UAG Servers have more capabilities than Windows Network Load Balancers.  Their one downside is that they cannot load balance RPC traffic.  Because of that, you can still use ISA/TMG/UAG to load balance your HTTP traffic, but you’ll still need a Hardware Load Balancer, Windows Network Load Balancing, or DNS Round Robin for your RPC Client Access Array.

ISA/TMG/UAG scale better than Windows Network Load Balancing but not as well as a Hardware Load Balancer.  They cost less than a Hardware Load Balancer but more than Windows Network Load Balancing.  Depending on the protocol being published, ISA/TMG/UAG can also use load balancer-created cookies or Source IP affinity.

Another upside to using ISA/TMG/UAG is that they can do pre-authentication.  If the server a client has affinity with goes down, ISA/TMG/UAG still holds the user's authentication context and automatically re-authenticates the user to the new Client Access Server.
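The pre-authentication benefit can be illustrated with a toy Python model of a publishing proxy. The proxy keeps each user's authentication context after the initial login. When the Client Access Server the user has affinity with fails, it authenticates to a healthy server on the user's behalf without prompting again. `PreAuthProxy` and its methods are hypothetical and do not reflect how ISA/TMG/UAG actually store or delegate credentials.

```python
class PreAuthProxy:
    """Toy model of a proxy that pre-authenticates users before forwarding."""

    def __init__(self, backends: list[str]):
        self.healthy = set(backends)
        self.auth_context: dict[str, str] = {}  # user -> context captured at pre-auth
        self.affinity: dict[str, str] = {}      # user -> current Client Access Server

    def login(self, user: str, credentials: str) -> None:
        # Pre-authentication happens at the proxy, which keeps the context.
        self.auth_context[user] = credentials

    def mark_down(self, backend: str) -> None:
        self.healthy.discard(backend)

    def forward(self, user: str) -> tuple[str, str]:
        if user not in self.auth_context:
            raise PermissionError(f"{user} has not pre-authenticated")
        backend = self.affinity.get(user)
        if backend in self.healthy:
            return backend, "existing session"
        if not self.healthy:
            raise RuntimeError("no healthy Client Access Servers")
        # No affinity yet, or the affinitized server failed: move the user and
        # authenticate to the new server with the stored context.
        backend = min(self.healthy)
        self.affinity[user] = backend
        return backend, "authenticated on user's behalf"


proxy = PreAuthProxy(["cas1", "cas2"])
proxy.login("alice", "secret")
print(proxy.forward("alice"))  # ('cas1', "authenticated on user's behalf")
proxy.mark_down("cas1")
print(proxy.forward("alice"))  # ('cas2', "authenticated on user's behalf"), no new prompt
```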

DNS Round Robin

DNS Round Robin scales just as high as Hardware Load Balancers because connections go directly to the Client Access Servers.  If anything, it scales the highest, since there is nothing in the middle handling the connections.  It’s also free to use!  But in this case, free is not necessarily good, because you lose a lot of functionality.  Hardware Load Balancers, Windows Network Load Balancers, and ISA/TMG/UAG can all detect server failures, automatically stop sending traffic to a failed server, and direct all traffic to servers that are operational.

DNS Round Robin has no automatic server failure detection.  If a host goes down, an Administrator has to notice it and remove the DNS A/HOST record for the failed server.  Clients then have to wait for the TTL on their cached copy of the old record to expire, and only then will they begin connecting to a working server.  So you save a lot of money with this option, but you lose all automation and gain downtime instead.
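The DNS Round Robin failure sequence can be simulated in a few lines of Python. A client caches both A records for the TTL. The administrator then removes the failed server's record, but the client keeps the stale answer until its cached copy expires. The `StubResolver` class, the `casarray.contoso.com` name, the addresses, and the timeline are all hypothetical, and real clients may cache differently than this simple model.

```python
from dataclasses import dataclass


@dataclass
class CachedAnswer:
    addresses: list[str]
    expires_at: int  # seconds on the simulated clock


class StubResolver:
    """Client-side DNS cache that honors each record set's TTL."""

    def __init__(self, zone: dict[str, tuple[list[str], int]]):
        self.zone = zone  # name -> (A records, TTL in seconds); the "DNS server"
        self.cache: dict[str, CachedAnswer] = {}

    def resolve(self, name: str, now: int) -> list[str]:
        entry = self.cache.get(name)
        if entry is None or now >= entry.expires_at:
            addresses, ttl = self.zone[name]
            entry = CachedAnswer(list(addresses), now + ttl)
            self.cache[name] = entry
        return entry.addresses


NAME = "casarray.contoso.com"
zone = {NAME: (["10.0.0.11", "10.0.0.12"], 300)}  # 5-minute TTL
client = StubResolver(zone)

client.resolve(NAME, now=0)            # caches both records until t=300
# t=120: 10.0.0.11 fails; t=240: the administrator removes its A record.
zone[NAME] = (["10.0.0.12"], 300)
print(client.resolve(NAME, now=250))  # ['10.0.0.11', '10.0.0.12'] - still stale
print(client.resolve(NAME, now=300))  # ['10.0.0.12'] - cache expired
```

In the worst case, a client's outage lasts as long as it takes to notice the failure and remove the record, plus one full TTL. That is why lowering the TTL to 5 minutes helps.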



36 Responses to “Exchange 2010 RTM High Availability Load Balancing Options”

  1. on 18 Mar 2010 at 11:07 am, AG

    Interesting article thanks.

    I'm currently looking at these scenarios for smaller offices where I expect to have two multi-role servers at different physical locations. Due to cost constraints, HLBs and stretched VLANs will not be available in this case.

    The solution I'm considering is using the DAG IP address to create a CAS array then pointing the databases on both servers to this CAS array. This provides an automatic failover equivalent of the DNS solutions you mention. I've only done very basic testing so far but it does look like a workable solution.

  2. on 18 Mar 2010 at 1:14 pm, Elan Shudnow

    Yep, that's a non-supported scenario though. I tried that scenario and it failed for me and it has failed for someone else as well. It seems to work flawlessly if you create a new cluster group though. Hopefully Microsoft tests this scenario out and starts supporting it as it makes a ton of sense to do for SMBs with 2 multi-role DAGs.

  3. on 18 Mar 2010 at 1:42 pm, @FriendsOfQuest

    Great post Elan!

  4. on 18 Mar 2010 at 2:47 pm, Elan Shudnow

    Thank you! :)

  5. on 18 Mar 2010 at 5:16 pm, Mike Pfeiffer

    Good point about ISA/TMG/UAG not load balancing RPC Traffic…that seems to be a common oversight.

  6. on 19 Mar 2010 at 3:10 pm, Chris Lehr

    Good article Elan!

  7. on 19 Mar 2010 at 6:21 pm, Richard

    In Exchange 2010, does this require Exchange Enterprise, or will Exchange Standard do?
    We have a disaster recovery center that currently uses Double-Take for replication. Would Exchange 2010 allow Standard Edition servers to make use of the built-in replication features?

  8. on 19 Mar 2010 at 6:57 pm, Elan Shudnow

    You can use DAGs with Exchange Standard. You're just limited to 5 databases. You'd still need Windows Enterprise for the Clustering support. You'd only ever need to go to Exchange Enterprise when you want more than 5 databases. Be sure to check with your licensing specialist instead of just taking my advice!

  9. on 23 Mar 2010 at 11:37 am, Dag

    Considering only 2 servers: how about setting up a cluster resource group with a virtual IP and a DNS name? Windows clustering is already installed and Exchange barely uses it, so could this be the answer to the fault tolerance problem for the CAS role with only two servers? Of course, there will be no load balancing, but…..

  10. on 23 Mar 2010 at 1:26 pm, Elan Shudnow

    I actually posted about this in a comment just above. It works but isn't supported. I only wrote about supported scenarios.

  11. on 24 Mar 2010 at 7:11 am, MJP

    You also have to remember that Microsoft are pushing smaller customers toward BPOS and appear to have no real desire to provide a fully fault tolerant solution for small sites.

  12. on 25 Mar 2010 at 4:36 pm, Richard

    Thanks, I'll keep this in mind when we get pushed to upgrade to 2010.
    We'd be looking more for a failover type setup with one Exchange server in a remote location (in case a hurricane wipes out the office one day).

  13. on 25 Mar 2010 at 12:05 pm, Anonymous

    [...] [...]

  14. on 13 Apr 2010 at 10:12 am, AEK

    What are the options around dual physical sites? I'm looking at load balancing options for Edge (1 per site), CAS/Hub (combined role, 2 per site), and 3 dedicated mailbox servers (2 in the primary site and 1 in the secondary datacentre). I'm looking to use a single DAG (9 DBs in total: 3 active, one on each of the 3 mailbox servers, and 2 passive on each). I want to ensure resilience is available in the event of a site failure……we have Hardware load balancers available in both the perimeter & internal network to use, but the failover side of things is confusing me slightly.

  15. on 13 Apr 2010 at 1:08 pm, eshudnow

    For Edge, you'll either load balance (using MX records with the same preference) or just have fault tolerance (using MX records with different preferences).
    For CAS, you'll just have the load balancer in Site A present a VIP that goes to the CAS servers in the first site. You'll also create a CAS Array for this primary site.

    There are a lot more variables. You'll want to read the following documentation: http://technet.microsoft.com/en-us/library/dd6381... http://technet.microsoft.com/en-us/library/dd3510...
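The Edge advice above depends on how sending servers use the MX preference value (RFC 5321, section 5.1). Hosts with the lowest preference are tried first, and hosts that share a preference are tried in random order, which spreads the load between them. Here is a minimal Python sketch of that ordering. The Edge host names are hypothetical.

```python
import random
from itertools import groupby
from typing import Optional


def delivery_order(mx_records: list[tuple[int, str]],
                   rng: Optional[random.Random] = None) -> list[str]:
    """Order MX hosts as RFC 5321 senders do: lowest preference first,
    with hosts that share a preference shuffled to spread the load."""
    rng = rng or random.Random()
    ordered: list[str] = []
    by_pref = sorted(mx_records, key=lambda record: record[0])
    for _, group in groupby(by_pref, key=lambda record: record[0]):
        hosts = [host for _, host in group]
        rng.shuffle(hosts)
        ordered.extend(hosts)
    return ordered


# Same preference: both Edge servers share the load (order varies per call).
print(delivery_order([(10, "edge1.contoso.com"), (10, "edge2.contoso.com")]))
# Different preferences: edge2 is only used if edge1 is unreachable.
print(delivery_order([(20, "edge2.contoso.com"), (10, "edge1.contoso.com")]))
```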

  16. on 13 Apr 2010 at 9:34 pm, Abhi

    A quick question. Can I install Exchange 2010 Client Access and Hub Transport on the same server with NLB? These are the steps I'm going to follow, please correct me if I'm wrong:

    - Configure NLB on 2 Servers running Windows 2008 x64.
    - Install the Client Access and Hub Transport roles on those servers.
    - Associate the Array with databases.

    All Set?

  17. on 14 Apr 2010 at 3:35 am, eshudnow

    Sounds good. You can even load balance SMTP ports. You just need to make sure you don't load balance SMTP traffic between Hub Transport Servers. To do this, go into your Default Receive Connector and change Any IP to the LAN NIC IP. Then create a new Receive Connector and lock it down to your VIP. Now you can use Windows NLB on the VIP, and that traffic will go over the custom Receive Connector, while HUB to HUB traffic still uses the Default Receive Connector listening on your LAN IP, which does not use NLB.

  18. on 22 Apr 2010 at 3:52 am, Kevin

    Hi Elan,

    I am currently running 2 Exchange 2010 servers. I created and configured a DAG and tested failover, and it didn't work. I only have 2 servers; how do I configure them so that if one server fails, the other takes over and Outlook clients can still send and receive mail? I read your info about creating a cluster group with a network name and network IP resource, then setting the CAS array FQDN to this new network IP resource. Could you please elaborate a bit more on this?

    Many thanks Elan

    cheers
    Kevin

  19. on 25 Apr 2010 at 12:53 pm, eshudnow

    You just go into your Failover Cluster Management GUI, go to services, create a generic (I think it was generic) cluster group, give it a new unused IP, make up some FQDN resolvable on the internal network like 2010array.internalnamespace.com, point it to that cluster IP, then stamp your databases (Set-MailboxDatabase -Identity Database -RpcClientAccessServer 2010array.internalnamespace.com).

  20. on 29 Apr 2010 at 7:34 pm, Michael

    "The way to do this is by going into your Default Receive Connector and change Any IP to the LAN NIC IP. Then create a new Receive Connector and lock it down to your VIP. Now you can leverage your Windows NLB to the VIP which will go over the custom Receive Connector while HUB to HUB Traffic will still use the Default Receive Connector which is listening on your LAN IP which is not leverage NLB."

    Is this supported/documented by Microsoft? Do you have a link to a Microsoft article describing this?

  21. on 29 Apr 2010 at 7:42 pm, Elan Shudnow

    This has been supported since Exchange 2007 SP1. There's a link somewhere that simply tells you it's supported to use WNLB for non-server-to-server communication.

  22. on 29 Apr 2010 at 9:22 pm, AGSS

    Elan

    Can you elaborate on how round robin DNS works for redundancy? I see that you lose some of the high availability that NLB provides. Is there any manual intervention if one of the CAS servers is unavailable? Aside from creating 2 A records for the CAS array pointing to the CAS servers and configuring the TTL, is there anything else that needs to be configured?

  23. on 03 May 2010 at 3:19 am, eshudnow

    Well, it's basic Round Robin DNS. If 1 server goes down, the client will still potentially use the downed server. So yes, some manual intervention is required. Nothing else really needs to be configured. Just the 2 DNS Records and modifying the TTL. The TTL piece obviously isn't a requirement, but rather a recommendation.

  24. [...] http://www.shudnow.net/2010/03/17/exchange-2010-rtm-high-availability-load-balancing-options/ [...]

  25. on 04 May 2010 at 9:31 pm, Matt_Wade

    If I may Elan, I'd like to add that Microsoft has recently released two new TechNet articles on the subject of load balancing Exchange 2010.
    http://aspoc.net/archives/2010/05/04/load-balanci...

  26. [...] Exchange 2010 RTM High Availability Load Balancing Options [...]

  27. on 08 Jun 2010 at 6:30 pm, Mike

    Great post Elan! Thanks for the info.

    I am going with the two server multi role dag members and two hardware load balancers.
    My question is: since I am not using CAS NLB, how do I do my UCC certificate? Do I need 2 certs? I am seeing a lot of conflicting information regarding UCC certs and 2010.

    mail.externaldomain.com
    casserver1.internaldomain.local
    casserver2.internaldomain.local
    autodiscover.externaldomain.com
    autodiscover.internaldomain.local (eliminate use split dns?)

    What about OWA and autodiscover, etc etc?

  28. on 08 Jun 2010 at 11:43 pm, tyler

    So if I am understanding correctly, in order to offer redundancy for both client access and the DB, you can do this 2 ways:
    4 servers, 2 for a DAG and 2 for NLB. This requires 4 OS licenses and 4 Exchange licenses.
    2 servers and a hardware NLB device (or 2 hardware NLB devices for true redundancy).

    It is too bad you cannot get it all in a 2 server solution. Offering a true high availability solution is getting expensive.

  29. on 09 Jun 2010 at 5:14 am, eshudnow

    It depends. At minimum, you need 2 names:
    1. autodiscover.primarysmtpdomain.com
    2. webmail.domain.com

    If you're going to be co-existing with Exchange 2003 and/or Exchange 2007, you'll want to add a legacy FQDN which can be anything such as legacy.domain.com. If you will have several FQDNs on your Connectors, I would add the FQDN of those connectors as well so you can use this certificate for TLS negotiation.

  30. on 09 Jun 2010 at 12:11 pm, Mike

    Thanks for the reply.

    So if I am understanding correctly, if you are running 2010 you no longer need entries for the NetBIOS name of the CAS server or the CAS array FQDN? I think I found a post from you where you explain how to use a single cert with 2007, which worked for me, but I still get Autodiscover errors with Outlook 2007 when I have multiple Exchange servers. I do in fact have an FQDN on one of my connectors, but I was using it for anonymous authentication with TLS. I am doing an upgrade from 2007, so they will need to coexist for a short time. So if I get a Unified Cert and I get up to 5 hosts, which do I choose? Especially with a load balancer?

    1. autodiscover.primarysmtpdomain.com
    2. webmail.domain.com
    3. smtp.domain.com (internal connector)
    4. ?
    5. ?

    Thanks again

  31. on 09 Jun 2010 at 5:50 pm, eshudnow

    Well, you really didn't need to do it with Exchange 2007 either. I need to update one of my articles. Truth is, you only ever really need the minimum I just told you. The key here is that you would need to update all your InternalURLs, ExternalURLs, and AutodiscoverServiceInternalURI to your webmail.domain.com FQDN and either DNS Round Robin that FQDN or load balance it. And you don't absolutely have to have the FQDN of the connectors on the cert. The TLS selection process in Exchange will always fall back to the self-signed certificate, if it's enabled for TLS, when the 3rd party certificate doesn't have the matching FQDN. I personally always leave the self-signed certificate on my servers, only for SMTP, as a precaution for TLS fallback. I will still include Connector FQDNs on the 3rd party certificate.

    So either way, you only ever need the absolute minimum I posted previously. Just take that cert, export it, import it on the other servers, and update all your web service URLs as I stated in the above paragraph.
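eshudnow's point is that once every web service URL points at one FQDN, the certificate only needs the names clients actually connect to. That can be checked mechanically. The Python sketch below pulls the host name out of each configured URL and reports any that the certificate's subject alternative names don't list. It does exact, case-insensitive matching only (wildcard certificates are not modeled). The URLs are hypothetical examples built from the names in the comments above.

```python
from urllib.parse import urlparse


def missing_cert_names(urls: list[str], san_names: list[str]) -> list[str]:
    """Host names used by the URLs that the certificate does not list."""
    covered = {name.lower() for name in san_names}
    hosts = {urlparse(url).hostname for url in urls}  # .hostname is lower-cased
    return sorted(host for host in hosts if host not in covered)


cert_names = ["autodiscover.primarysmtpdomain.com", "webmail.domain.com"]
configured_urls = [
    "https://webmail.domain.com/owa",
    "https://webmail.domain.com/EWS/Exchange.asmx",
    "https://webmail.domain.com/Autodiscover/Autodiscover.xml",
    "https://autodiscover.primarysmtpdomain.com/Autodiscover/Autodiscover.xml",
    "https://legacy.domain.com/owa",  # coexistence name not yet on the cert
]

print(missing_cert_names(configured_urls, cert_names))  # ['legacy.domain.com']
```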

  32. on 09 Jun 2010 at 5:30 pm, tyler

    Thanks for the clarification on that.

    One more question. If I choose not to do NLB and just load two machines with all the features, setup a DAG but do not setup a CAS array if one machine fails it will auto fail over to the other correct? I am just trying to put together each option to present as a possible solution. Thanks.

    Tyler

  33. on 10 Jun 2010 at 1:27 am, eshudnow

    It will not automatically failover. This is what NLB or HLBs are for. There's also the DNS Round Robin option. But as I state in my articles, that's not automatic and there is some administrator intervention but you won't have to change the FQDNs on the clients in regards to where they should connect.

  34. on 20 Jul 2010 at 8:00 am, Ekrem

    Thank you very much Elan. It's a really useful and helpful article.

  35. [...] If you want to know, read on here. [...]

  36. on 29 Jul 2010 at 9:18 am, Eugene

    Thanks for the article! The one I was looking for.
