Lync 2010 Central Site Resilience w/ Backup Registrars, Failovers, and Failbacks – Part 1

Introduction

I’ve seen quite a bit of discussion on Lync 2010 Backup Registrars.  There have been PowerPoint decks that show how a Backup Registrar works, and blog posts that discuss how clients are handed back a Primary Registrar and Backup Registrar.  What I have not seen is one article that wraps everything together and shows it all in action.  That is what this multi-part article is going to do.

In this first part, we’ll go over what the lab setup is going to look like and take a look at the base topology and configuration before we really start diving into the actual scenario testing.

Part 1

Part 2

Part 3

General Information

The two best blog articles I’ve found on Backup Registrars (both from Microsoft employees, and in addition to the official Planning for Central Site Resilience documentation) are one by Doug and one by Chris Norman.

To summarize the articles above and set the stage for my own: you will see below that I am not including a Director.  The key point Doug makes in his article is that when a Lync 2010 Registrar (in Doug’s article, a Director) redirects a user to their home pool, that 301 redirect contains the user’s Primary Registrar and their Backup (secondary) Registrar.  Any Registrar in the deployment can accept a user login, and if that Registrar is not the home pool for the given user, it issues a 301 redirect for that user.  One of the jobs of a Director is to issue these 301 redirects, since one of its purposes is to handle client logins.  In my lab, I have two pools and one pool will always be authenticating all users.  So in my case, I can simulate the 301 redirect piece by having Client2, which is homed on Pool2, sign in through Pool1.  That client will then get the 301 redirect.  We’ll take a look at the SIP traces later.
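The redirect behavior described above can be sketched roughly as follows. This is a simplified model and not how Lync is actually implemented: the `Response` class, the `handle_register` function, and the two lookup tables are hypothetical names made up for illustration, and the real sign-in also involves authentication challenges that are left out. Only the pool and user names come from this lab.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model of the lab: which pool homes each user,
# and which pool is the Backup Registrar for each pool.
HOME_POOL = {
    "ClientUser1@shudlab.net": "A-L14FE1.shudlab.net",
    "ClientUser2@shudlab.net": "A-L14FE2.shudlab.net",
}
BACKUP_REGISTRAR = {
    "A-L14FE1.shudlab.net": "A-L14FE2.shudlab.net",
    "A-L14FE2.shudlab.net": "A-L14FE1.shudlab.net",
}


@dataclass
class Response:
    status: int                    # 200 = handled here, 301 = redirected
    primary: Optional[str] = None  # only set on a 301
    backup: Optional[str] = None   # only set on a 301


def handle_register(receiving_pool: str, user: str) -> Response:
    """Decide how the registrar that received the login answers it."""
    home = HOME_POOL[user]
    if home == receiving_pool:
        # The user is homed here, so no redirect is needed.
        return Response(status=200)
    # Not the home pool: redirect and hand back both registrars.
    return Response(status=301, primary=home, backup=BACKUP_REGISTRAR[home])


# Client2 signs in through Pool1 and is redirected to Pool2.
print(handle_register("A-L14FE1.shudlab.net", "ClientUser2@shudlab.net"))
```

In this model, ClientUser1 signing in through Pool1 never sees a 301, which is why the two users in this lab will have different sign-in experiences.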

Chris Norman then makes the point that there is a file called Endpointconfiguration.cache which caches your last connected server.  In my example above, once Client2 gets a 301 redirect to his pool, he knows about his Primary and Backup Registrar.  On subsequent connects, he will use the information in his Endpointconfiguration.cache, will no longer get a 301 redirect, and so won’t know about his Backup Registrar anymore.  Because of this, we need another mechanism for clients to find another Registrar in the environment.  Chris Norman suggests adding additional SRV records with a different priority.  This way, if Client2 looks in his Endpointconfiguration.cache file, sees that he should connect to Pool2, and Pool2 happens to be down (or goes down while he is connected), the client will find the other SRV record, learn how to connect to Pool1, and voila, he is connected.
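As a minimal sketch of that lookup order, assuming the client tries its cached server first and then walks the SRV records from the lowest priority value upward: the `pick_registrar` function and the `is_reachable` callback are hypothetical, the priority numbers are illustrative, and the second SRV record is the one we would add later rather than something already in the lab.

```python
from typing import Callable, List, Optional, Tuple

# (priority, target) pairs; in SRV records, lower priority values
# are tried first. The numbers used below are illustrative only.
SrvRecord = Tuple[int, str]


def pick_registrar(
    cached_server: Optional[str],
    srv_records: List[SrvRecord],
    is_reachable: Callable[[str], bool],
) -> Optional[str]:
    """Return the first reachable server, or None if nothing answers."""
    candidates: List[str] = []
    if cached_server:
        # Stand-in for the last connected server kept in the cache file.
        candidates.append(cached_server)
    for _priority, target in sorted(srv_records):
        if target not in candidates:
            candidates.append(target)
    for server in candidates:
        if is_reachable(server):
            return server
    return None


records = [(0, "A-L14FE1.shudlab.net"), (10, "A-L14FE2.shudlab.net")]


def pool2_down(host: str) -> bool:
    # Simulate an outage of Pool2 only.
    return host != "A-L14FE2.shudlab.net"


# Client2 has Pool2 cached, Pool2 is down, so the SRV lookup leads to Pool1.
print(pick_registrar("A-L14FE2.shudlab.net", records, pool2_down))
```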

That is a very quick summary.  I would recommend giving the above two blog articles a read.  Or just read on as I will explain everything and show multiple scenarios.  Throughout the rest of the article series, I will be showing the following scenarios:

  • Showing failover without a secondary SRV record.  What happens to ClientUser1 on A-L14FE1 (Pool1) when we take down one of the Pools?  What happens to ClientUser2 on A-L14FE2 (Pool2) when that same Pool goes offline?
  • How does failback work?
  • What changes after we’ve cached our Primary Registrar for both users in the Endpointconfiguration.cache file?
  • What changes if we add a secondary SRV record?

Lab Setup

Guest Virtual Machines

One Server 2008 R2 Enterprise (Standard can be used) SP1 x64 Domain Controller with Certificate Services installed as the Enterprise Root Certificate Authority.

Two Server 2008 R2 Standard (Enterprise can be used) x64 (x64 required) Member Servers where Lync Server 2010 is installed. Both of these Lync Server 2010 Servers are installed as two separate Lync Standard Edition Pools.

Two Windows 7 x64 Enterprise (Professional and Ultimate can be used instead) Client Machines where the Lync 2010 Client will be installed.  One client machine is using a user account to connect to one Standard Edition Pool.  The other client machine is using a separate user account to connect to the other Standard Edition Pool.

Assumptions

  • You have a domain that contains at least one Server 2003 Domain Controller (DC/GC).
  • The client machines have network connectivity and can talk to both Lync 2010 Front End Servers.
  • You are using the latest updates on this Lync 2010 infrastructure.  At the time of writing this lab, we are utilizing Cumulative Update (CU) 5.

Computer Names

Lync 2010 Standard Edition Front End Server – A-L14FE1

Lync 2010 Standard Edition Front End Server – A-L14FE2

Domain Controller  / Global Catalog /  Root Enterprise CA – A-DC1

Domain Controller  / Global Catalog – A-DC2

Client  – A-Client1 (Lync User Account Homed on A-L14FE1)

Client  – A-Client2 (Lync User Account Homed on A-L14FE2)

Configuration of  Domain Controllers

Operating System: Windows Server 2008 R2 SP1

Processor: 1

Memory: 1024MB static

Virtual Network Type - External NIC

Virtual Disk Type – System Volume (C:\): 60GB Dynamic

Note: In a real-world environment, depending on the needs of the business and environment, it is best practice to install your database and logs on separate disks/spindles. I installed Active Directory and Certificate Services on the same disks/spindles for simplicity’s sake in this lab.

Configuration of Lync 2010 Standard Edition Front End Servers

Operating System: Windows Server 2008 R2 SP1

Processor: 2

Memory: 2048 MB static

Virtual Network Type - External NIC

Virtual Disk Type – System Volume (C:\): 60 GB Dynamic

Configuration of Client Machines

Operating System: Windows 7 Enterprise x64

Processor: 1

Memory: 1024 MB dynamic (512 startup)

Virtual Network Type - External NIC

Virtual Disk Type – System Volume (C:\): 60 GB Dynamic

IP Addressing Scheme (Corporate Subnet)

IP Address – 192.168.1.x

Subnet Mask – 255.255.255.0

Default Gateway – 192.168.1.1

DNS Server – 192.168.1.51 (Primary, A-DC1) / 192.168.1.55 (Secondary)

Base Topology and Configuration

Lync 2010 Topology

What I did was create two Central Sites.  The A-L14FE1.shudlab.net Standard Edition Pool Server will go in Chicago.  The A-L14FE2.shudlab.net Standard Edition Pool Server will go in Detroit.  There wasn’t a particular reason I decided to go with two Central Sites.  I could have put both Standard Pools in the same site.  The Lync 2010 Topology does let you have two Pools in the same Central Site and set the other Pool as a Backup Registrar just like if both Pools were in separate Central Sites.   Now I’m not going to go into detail on why we would want to use multiple Central Sites, but here’s a breakdown of some of the reasons I can think of off the top of my head:

  • Want to have separate Edge Pools for Media Traffic.  That way, we can put one set of users in Chicago and have it use Chicago pipe for Chicago users and we can put the other set of users in the other pool (for example, Detroit which would most likely be an entirely different region much farther away) so those users use an entirely different pipe for Edge media traffic.
  • We want to associate a specific Survivable Branch Appliance (SBA) or Survivable Branch Server (SBS) with a specific Pool.  This is done by creating a Branch Site within the Central Site.  The Branch Site would contain the SBA or SBS and because the SBA or SBS is within the Branch Site which is in a Central Site, that SBA is associated with the Pools in that Central Site.
  • We want more granular control over Call Admission Control.  I’m not going to get into detail on CAC, but CAC uses something called Regions, each of which is associated with a Central Site.  If we have our Pools in different Central Sites, it allows us to create more Regions, which gives us more control over how we link Regions together and therefore how we can route Audio/Video traffic across our sites.
  • A thing to note is that when you have a PSTN Gateway associated to a Central Site, that PSTN Gateway cannot be associated to a Mediation Server in another Central Site.

With that out of the way, let’s take a look at our topology from a Site level.  As stated, we have two Sites: Chicago and Detroit.  We only have one SIP Domain and that SIP Domain is shudlab.net.

As you can see, we have two Standard Edition Front End Pools: A-L14FE1.shudlab.net and A-L14FE2.shudlab.net.

If we take a look at the Properties of our A-L14FE1.shudlab.net Pool, we can see A-L14FE2.shudlab.net is set as our Backup Registrar.  I set the Failure detection interval (sec) to 30 seconds and the Failback interval (sec) to 40 seconds.  After around 30 seconds, a Lync 2010 client that is connected (or connecting) to a Primary Registrar that is no longer available will fail over to this Backup Registrar.  Once the Primary Registrar has been back online for a little over 40 seconds and the client has detected that it is operational, the Lync 2010 client will automatically sign out and sign back in to its Primary Registrar.

The defaults for these values are 300 seconds for Failure detection interval and 600 seconds for Failback interval.  But because this is a lab, and I will be taking and embedding video in this article series for you to witness the failover and failback, I didn’t want you guys to wait for long hence the lower values I have set.
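To put rough numbers on the difference between the lab values and the defaults, here is a small back-of-the-envelope sketch. It assumes failover happens about one failure detection interval after the Primary goes down and failback about one failback interval after it returns; it ignores client retry timing, so real events will land somewhat later. The `ResiliencySettings` class and `expected_timeline` function are hypothetical helpers, not part of Lync.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ResiliencySettings:
    failure_detection_s: int  # "Failure detection interval (sec)"
    failback_s: int           # "Failback interval (sec)"


def expected_timeline(
    settings: ResiliencySettings, primary_down_at: int, primary_up_at: int
) -> Optional[Tuple[int, int]]:
    """Approximate (failover, failback) times in seconds.

    Returns None when the outage is shorter than the detection
    interval, in which case this model expects no failover at all.
    """
    outage = primary_up_at - primary_down_at
    if outage < settings.failure_detection_s:
        return None
    failover_at = primary_down_at + settings.failure_detection_s
    failback_at = primary_up_at + settings.failback_s
    return failover_at, failback_at


lab = ResiliencySettings(failure_detection_s=30, failback_s=40)
defaults = ResiliencySettings(failure_detection_s=300, failback_s=600)

# Primary goes down at t=0 and comes back at t=900 (a 15 minute outage).
print(expected_timeline(lab, 0, 900))       # (30, 940)
print(expected_timeline(defaults, 0, 900))  # (300, 1500)
```

Under these assumptions, the lab settings shave the wait for failover from five minutes to half a minute, which is what makes the recorded demonstrations practical.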

If we take a look at the A-L14FE2.shudlab.net Pool’s Resiliency settings, we will see a similar setup where A-L14FE1.shudlab.net is the Backup Registrar.

Note: Keep in mind that an SBA and/or SBS cannot be the Backup Registrar for a Pool.  You can, however, have a Pool be the Backup Registrar for an SBA and/or SBS.  This is for registration only.  You can still have redundant voice routes that use the voice gateway and/or SIP Trunks in any location you would like.

Domain Name System (DNS)

If we take a look at the single SRV record currently in our environment, we see it is pointing to the Chicago Pool, A-L14FE1.shudlab.net.  We can point it directly to the Pool name instead of something like sip.shudlab.net because both AD and our single SIP namespace use shudlab.net.  If your SIP namespace and your AD namespace (which the Pool FQDN uses) are different, you would have a SAN on the Pool certificate containing sip.<sip domain> for each SIP domain in your environment, and the SRV record for each namespace would point to its corresponding sip.<sip domain>.  While we could still use sip.shudlab.net, I would rather use the Pool FQDN so that when a user connects, we can see in the logs that it is connected to the Pool FQDN rather than sip.shudlab.net.  You’ll see what I’m talking about when we start looking at logs; I’ll point it out then.
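The rule of thumb in the paragraph above can be written down as a short sketch. It assumes the standard internal TLS auto-discovery record name, `_sipinternaltls._tcp.<sip domain>`, which this article does not spell out; the helper function names and the `example.com` domain are hypothetical and only there for illustration.

```python
def srv_record_name(sip_domain: str) -> str:
    # Assumed record name for internal TLS auto-discovery; the article
    # itself does not name the record it uses.
    return f"_sipinternaltls._tcp.{sip_domain}"


def srv_target(sip_domain: str, pool_fqdn: str) -> str:
    """Pick the host the SRV record points to, per the rule of thumb above."""
    if pool_fqdn.lower().endswith("." + sip_domain.lower()):
        # The Pool lives in the SIP namespace, so its FQDN can be used.
        return pool_fqdn
    # Namespaces differ: point at sip.<sip domain>, which then needs to
    # be present as a SAN on the Pool certificate.
    return f"sip.{sip_domain}"


# This lab: AD and SIP namespaces match, so the Pool FQDN is the target.
print(srv_record_name("shudlab.net"), "->",
      srv_target("shudlab.net", "A-L14FE1.shudlab.net"))

# Hypothetical second SIP domain that differs from the AD namespace.
print(srv_record_name("example.com"), "->",
      srv_target("example.com", "A-L14FE1.shudlab.net"))
```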

Client Connectivity

We have 2 Client Computers, both running Windows 7 x64 with the Lync 2010 CU5 client installed:

  • A-Client1.shudlab.net
  • A-Client2.shudlab.net

We have 2 Users:

  • ClientUser1 (ClientUser1@shudlab.net)
  • ClientUser2 (ClientUser2@shudlab.net)

We have 2 Pools:

  • A-L14FE1.shudlab.net
  • A-L14FE2.shudlab.net

Everything with 1 is aligned together and everything with 2 is aligned together.  Therefore:

  • ClientUser1 is using A-Client1 computer which is associated to the A-L14FE1 Pool
  • ClientUser2 is using A-Client2 computer which is associated to the A-L14FE2 Pool

Conclusion

Thanks for reading Part 1.  In this Part, I went over what the lab setup looks like and took a look at the base topology and configuration before we really start diving into the actual scenario testing.

In Part 2, I will go through the sign-in process for each user.  Because the SRV record for sign-in is pointed to A-L14FE1.shudlab.net, the sign-in process will be different for each user logging in.  We’ll take a look at this in detail.

To read Part 2 of this article series, click here.

4 Responses to “Lync 2010 Central Site Resilience w/ Backup Registrars, Failovers, and Failbacks – Part 1”

  1. on 07 Aug 2012 at 7:27 am Andrea

    Hi, are the pictures the same? In your text you refer to different pools, but the info in the windows relates only to the first pool. Am I wrong? :(

  2. on 06 Sep 2012 at 12:28 pm Paul

    What happens to the contacts of ClientUser1 (ClientUser1@shudlab.net) when he fails over to A-L14FE2.shudlab.net? Does the user still have access to them?

  3. on 06 Sep 2012 at 1:59 pm Elan Shudnow

    No, they will not have contacts. However, this experience is better in Lync 2013, which provides Pool to Pool replication. In Lync 2013, when a user fails over, they will still not have their contacts, but Lync 2013 provides a Pool Failover command which allows the Backup Pool to activate the other pool's data, which includes contacts. The other benefit in Lync 2013 is that Front Ends now also store a copy of the contact data, so even if the backend goes down, the contacts remain.

  4. on 24 Oct 2012 at 8:18 am Mike W

    I have 2 sites (all internal networking, no EDGE deployed yet, no EV either) and customer wants to have IM/Presence even if the primary site fails. I see the failover, the client registers with the backup site (director deployed, and two SRV records), we see the big red banner but there is no address book search available. We currently have WebSearchOnly enabled for the address book to allow new users immediate access to the AB without having to modify registry settings for the GALDownloadInitialDelay settings. They all expect no contact list on the backup registrar, but not being able to even search the AB for a 1-1 IM is not what they are looking for either.

    Nice series. helped me grasp better how things work.
