Wi-Fi Roaming: how it works, how to successfully deploy it and how to troubleshoot it

The general topic of Wi-Fi roaming covers a lot of knowledge domains. When the topic of roaming comes up, it’s common to hear questions like:

  • What do you mean by “roaming”?
  • How does it work?
  • Why doesn’t mine work?
  • How do I fix my problems?
  • How do I set it up so it works?
  • Why do I have to do things that way?

How Wi-Fi roaming works is a subject that is difficult to simplify because of the range of factors involved in roaming. Sometimes when roaming “doesn’t work”, it is working exactly as it should, but not exactly as expected.

Explaining common roaming “problems” requires a fairly deep dive into how roaming works.

The focus of the following discussion will be mostly on the most difficult Wi-Fi environment: the enterprise business. Enterprise level businesses are businesses with 100 or more employees. These businesses typically have the most complex requirements: a higher level of security, a larger number of access points, and so on. For the most part, if the requirements of an enterprise business are satisfied, so are the requirements of all other environments.

Also, with the growing trend of BYOD (bring your own device), an increasing variety of mobile devices (tablets, etc.) and the growing use of voice and video applications in enterprise networks, quality “glitch free” roaming is now a major requirement.

Types of Wi-Fi Roaming

There are two fundamental types of Wi-Fi roaming: “nomadic” and “seamless”.

When a Wi-Fi user moves their laptop from their desk to a conference room, they are engaged in “nomadic” roaming. Roaming of this sort typically places a low demand on AP “hand off” speed, since the user is simply moving to a different location, not wandering around. This is the type of roaming that is seen in hospitals and clinics when an equipment cart is moved from one location to another, such as a patient’s bedside. In this instance, the client will usually be powered up after moving and then connect to an AP. The client remains connected to that AP until powered down.

The more demanding type of roaming is “seamless” roaming. In this instance a client is moving while in use, and will switch between APs as it travels.

Consider the case of a Wi-Fi VoIP handset where a user makes a call while they are walking around. The VoIP handset connection may be handed off to several different APs in turn while the call is active. In this scenario, the need to make a fast hand off from one access point to another is vital: anything greater than a 150 millisecond transition may terminate their VoIP call.

VoIP not only requires good signal strength in the service area, it also requires AP capabilities that allow for fast client “hand off” to another AP. These are two of the reasons that VoIP is considered an excellent test case for proper AP installation and configuration.

Roaming and Security in Enterprise Environments

The need for security poses a problem for roaming: handing off a client in a secure environment requires that the client authenticate with the new AP using the same process it used with the current AP. In enterprise environments (WPA/WPA2 Enterprise) a client authenticates using 802.1X, which brings an authentication server (such as RADIUS) into play. The 802.1X transactions take time, and can easily exceed the 150 millisecond ceiling for VoIP transitions, especially when there is network congestion.

There are 802.11 amendments that have been developed to improve hand off times: 802.11k, 802.11r and 802.11v. Together, they provide information needed for faster BSS-transition of clients to other APs:

  • 802.11k: Radio Resource Management provides clients with a list of potential transition APs which can speed up the search for a likely transition AP.
  • 802.11r: Fast BSS transition (FT), also called fast roaming, allows for early authentication with an 802.1X authentication server.
  • 802.11v: BSS Transition Management provides an “early warning” mechanism for transition to assist clients moving out of the AP’s service area.

In a nutshell, 802.11k provides a mechanism for the AP to give the client a list of neighboring APs with strong signals prior to transition. The APs in this list are good candidates for the client to consider as the next AP for connection.

With 802.11r, the client can authenticate while remaining connected to the current AP. When transition does occur, the early authentication reduces transition time to a new AP significantly. A full discussion of 802.11r is too complex for this post, but it is still an important part of roaming in the enterprise environment.

Finally, 802.11v defines a mechanism to send a “disassociation imminent” message to clients when the AP determines that a signal is becoming too weak for service. This provides the client additional time to authenticate with a candidate AP or to begin searching for an alternate AP.

It is worth noting that in the case of SOHO (small office / home office) environments, WPA/WPA2-Personal are common access modes used. These methods use a PSK (Pre-Shared Key) for authentication, which is generated from a passphrase entered by the user. What is important to remember is that an authentication server is not in use.

When BSS-transition roaming occurs in these SOHO environments, the use of WPA-Personal or WPA2-Personal allows transitions that are fast: speed is rarely an issue. WPA/WPA2-Personal most often meets the ideal transition time of 50 milliseconds or less.
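The timing argument above can be sketched as a simple budget. This is a minimal illustration only: the 50 ms and 150 ms thresholds come from the text, while the phase names and per-phase delays are hypothetical numbers chosen to show the arithmetic, not measurements.

```python
# Thresholds from the text: 50 ms is the ideal, 150 ms is the VoIP ceiling.
IDEAL_MS = 50
VOIP_CEILING_MS = 150


def classify_transition(phases_ms):
    """Sum per-phase delays (in ms) and grade the total transition time."""
    total = sum(phases_ms.values())
    if total <= IDEAL_MS:
        grade = "ideal"
    elif total <= VOIP_CEILING_MS:
        grade = "acceptable for VoIP"
    else:
        grade = "may drop VoIP call"
    return total, grade


# Hypothetical delays for illustration only.
psk_roam = {"scan": 20, "auth/assoc": 10, "4-way handshake": 15}
dot1x_roam = {"scan": 20, "auth/assoc": 10, "802.1X/RADIUS": 250, "4-way handshake": 15}

for name, phases in (("WPA2-Personal", psk_roam), ("WPA2-Enterprise", dot1x_roam)):
    total, grade = classify_transition(phases)
    print(f"{name}: {total} ms -> {grade}")
```

With these assumed figures, the only difference between the two roams is the authentication server exchange, which is enough on its own to break the VoIP budget.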

Signal Strength, Low Data Rates and Connection Issues

Sometimes AP placement results in poor reception at spots in the service area, and as a result clients may have difficulty communicating with access points, which will interfere with roaming.

Wi-Fi APs and clients strive to preserve connections. When communication at a particular data rate is unsustainable, the connection will switch to a lower data rate in an attempt to preserve the connection at the expense of throughput. Downshifting to lower data rates can occur even though a connection is active and in use.

In environments where roaming issues have been identified, a floor-plan based site survey should be performed, even if it is an informal “walk-through” of the service area with a simple signal strength monitor. A site survey can reveal signal voids (areas of low signal strength). However, weak signals are not always the source of issues. Sometimes a strong signal can be compromised when the RF “noise floor” (ambient RF) is high.
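One way to capture both problems during a walk-through is to record the noise floor alongside the signal strength and compare the two. The sketch below computes the signal-to-noise ratio (SNR) for a few readings. The location names, the readings and the 25 dB target are assumptions for illustration; use whatever target your client vendor recommends.

```python
def snr_db(signal_dbm, noise_floor_dbm):
    """SNR in dB is the signal level minus the noise floor (both in dBm)."""
    return signal_dbm - noise_floor_dbm


def assess(signal_dbm, noise_floor_dbm, min_snr_db=25):
    """Return (snr, ok). min_snr_db=25 is an assumed target, not a standard."""
    snr = snr_db(signal_dbm, noise_floor_dbm)
    return snr, snr >= min_snr_db


# Hypothetical survey readings: (location, signal dBm, noise floor dBm).
readings = [
    ("Lobby", -55, -95),       # strong signal, quiet channel
    ("Shop floor", -55, -70),  # same signal, but a high noise floor
    ("Stairwell", -80, -95),   # quiet channel, but a signal void
]

for location, signal, noise in readings:
    snr, ok = assess(signal, noise)
    print(f"{location}: SNR {snr} dB -> {'ok' if ok else 'investigate'}")
```

The second reading is the case described above: the signal is as strong as in the lobby, but the raised noise floor leaves too little margin.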

Industrial equipment, such as motors, or common appliances like microwave ovens can be the source of RF interference (“noise”) that generates issues for Wi-Fi, and in turn, for roaming. While equipment and appliances can cause problems, another source of noise is important to consider: other Wi-Fi access points. This problem is very prevalent on the 2.4 GHz band.

RF noise can be defined as RF radiation in a particular frequency that is unwanted. It doesn’t have to be what is commonly called “static”; it can be a well modulated and clear Wi-Fi signal that is not intended for the AP. The Wi-Fi signal may originate from a neighboring business’s access points, but it still interferes with our AP/client communication. When that occurs it is noise. Sometimes the source of this noise may be the very APs that we have set up in our own coverage areas.

APs with overlapping coverage areas can cause problems due to co-channel interference (CCI). In this case, two or more APs transmitting on the same Wi-Fi channels can cause interference which reduces the total throughput of the APs. By reducing the signal strength on these APs, it may be possible to improve the throughput. Reducing the signal strength of the competing APs reduces their coverage area, and in turn their arena of signal contention. This can reduce disruption of the channel, allowing better communication with clients. Note that reducing signal strength seems counter-intuitive; typically increasing signal strength improves communications, but not in this case!  This is especially true in dense deployment of APs, where there is overlapping coverage.



Co-channel interference between NETGEAR62 and NETGEAR62_EXT

(By Andrew Crouthamel - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/...)

Another phenomenon, known as adjacent channel interference (ACI), can also cause problems. Some APs permit a channel width larger than 20 MHz, which can result in channel overlaps that are especially detrimental to the signal reception of APs. The source of the interference in this case is the adjacent and overlapping channels.



Adjacent Channel Interference: Interloper31 Overlaps with Other AP Channels

For instance, in the diagram above co-located APs on channels 6 and 9 may cause adjacent channel interference. The signal for CrowdedSky24 on channel 6 overlaps with that from Interloper31, reducing the effective clean signal in the overlapping areas.

Building a channel map of a Wi-Fi installation can help identify potential ACI issues. This simple pencil-and-paper channel map lists the channels in use by each AP and seeks to find adjacent APs that are using identical or overlapping channels. Using 40 MHz channel widths on the 2.4 GHz band is considered bad practice by wireless professionals because it extends the effect of adjacent channel interference. Instead, carefully selecting non-overlapping channels with a 20 MHz channel width can reduce ACI significantly, and as a consequence, improve roaming performance. Datto APs have a feature which will optimize the channel settings using a scheduled procedure which will “listen” to the RF spectrum and make adjustments to power and channels used.
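The pencil-and-paper channel map can also be kept as a small script. The following is a rough sketch for the 2.4 GHz band with 20 MHz channels. It assumes a simplified model in which each channel is an ideal 20 MHz block, so two channels overlap when their centers are less than 20 MHz apart. The AP names and the list of physically adjacent pairs are hypothetical.

```python
def center_mhz(channel):
    """Center frequency of a 2.4 GHz channel in the range 1-13."""
    return 2407 + 5 * channel


def classify_pair(ch_a, ch_b, width_mhz=20):
    """Return 'CCI' for identical channels, 'ACI' for overlapping ones, else None."""
    if ch_a == ch_b:
        return "CCI"
    if abs(center_mhz(ch_a) - center_mhz(ch_b)) < width_mhz:
        return "ACI"
    return None


def audit(channel_map, adjacent_pairs):
    """Check every pair of physically adjacent APs for CCI or ACI."""
    findings = []
    for ap_a, ap_b in adjacent_pairs:
        kind = classify_pair(channel_map[ap_a], channel_map[ap_b])
        if kind is not None:
            findings.append((ap_a, ap_b, kind))
    return findings


# Hypothetical site: AP name -> 2.4 GHz channel.
channel_map = {"AP-Lobby": 6, "AP-Office": 9, "AP-Warehouse": 6, "AP-Dock": 11}
# Pairs of APs whose coverage areas overlap, taken from the location map.
adjacent_pairs = [
    ("AP-Lobby", "AP-Office"),
    ("AP-Lobby", "AP-Warehouse"),
    ("AP-Warehouse", "AP-Dock"),
]

for ap_a, ap_b, kind in audit(channel_map, adjacent_pairs):
    print(f"{ap_a} / {ap_b}: possible {kind}")
```

Under this model channels 1, 6 and 11 never flag each other, which matches the usual non-overlapping channel plan for 2.4 GHz.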

To simplify things, Datto provides a knowledge base article on channel selection to help with configuration of APs.

“Sticky APs” are Really “Sticky Clients”

Clients, not APs, decide when to roam. From an end user viewpoint, however, when there are issues with roaming it seems to be the AP that is the problem, not the client.

Typically a client will begin to search for alternate APs when the current AP signal strength falls below a “search” threshold. The client searches for and qualifies candidate APs against a set of criteria that includes a minimum signal strength and data rate.

The candidate search happens in advance of the transition threshold, which is the signal strength at which the client will actually begin the move to an AP with a better signal strength. The search threshold varies from vendor to vendor, but it will be considerably higher (a stronger signal) than the transition threshold. The search happens early so the client saves time when transition becomes necessary.

Clients will remain connected to an AP as long as the signal strength is sufficient. This way a client avoids frequent switching to the latest strong signal candidate. This helps connection continuity and avoids unnecessary and frequent switching between APs when roaming.

To the end user, it appears that their client is “stuck” to an AP since it is refusing to switch to an AP with a stronger signal. An end user may wonder why their device remains connected to a distant AP even though they may be standing directly under another AP with a stronger signal. The simple answer is that transition is unnecessary since the distant AP is still providing sufficient signal strength and throughput.
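The two-threshold behavior described above can be sketched as a small state function. The -65 dBm search threshold and -75 dBm transition threshold are hypothetical values picked for illustration; real values differ from vendor to vendor.

```python
def client_state(rssi_dbm, search_dbm=-65, transition_dbm=-75):
    """Map the current AP's RSSI to what the client is doing.

    Threshold defaults are assumptions for illustration only.
    """
    if rssi_dbm >= search_dbm:
        return "stay"        # signal is sufficient: no scanning, no roaming
    if rssi_dbm >= transition_dbm:
        return "search"      # qualify candidate APs, but remain connected
    return "transition"      # move to the best qualified candidate


# A user walks away from the current AP; RSSI drops along the way.
walk = [-50, -60, -68, -74, -78]
for rssi in walk:
    print(f"{rssi} dBm -> {client_state(rssi)}")
```

Note that the signal strength of other APs never appears in the "stay" branch. A client reading -60 dBm from a distant AP stays put even if a nearby AP would offer -40 dBm, which is exactly the “sticky” behavior the end user sees.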

Wi-Fi Bands and Free Space Path Loss

A complication to the issue of roaming is that, for a 2.4 GHz signal and a 5 GHz signal with the same EIRP (equivalent isotropically radiated power), the 2.4 GHz signal will appear stronger than the 5 GHz signal at a distance. This loss of signal strength over distance (called “Free Space Path Loss”) increases with frequency, and the difference may cause shifting between the 5 and 2.4 GHz bands, which can result in changes to data rates during an active connection.

The different signal strengths result in coverage areas that are unequal for the two bands. The following diagram for an access point in an ideal environment with both bands at identical EIRP power settings illustrates this:

In the case that the “hand off” signal strengths are identical, the two transition points will be unequal, with the 5 GHz transition boundary completely subsumed by the 2.4 GHz transition boundary.
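The size of the difference can be estimated with the standard free space path loss formula, FSPL(dB) = 20·log10(d) + 20·log10(f) − 27.55, with d in meters and f in MHz. The sketch below is an idealized calculation: it assumes open space with no walls or reflections and a 0 dBi client antenna, and the 20 dBm EIRP, the -70 dBm “hand off” level and the channel frequencies (2437 MHz and 5180 MHz) are example values.

```python
import math


def fspl_db(distance_m, freq_mhz):
    """Free space path loss in dB (distance in meters, frequency in MHz)."""
    return 20 * math.log10(distance_m) + 20 * math.log10(freq_mhz) - 27.55


def rssi_dbm(eirp_dbm, distance_m, freq_mhz):
    """Received level, assuming a 0 dBi receive antenna and free space only."""
    return eirp_dbm - fspl_db(distance_m, freq_mhz)


def range_m(eirp_dbm, threshold_dbm, freq_mhz):
    """Distance at which the received level falls to threshold_dbm."""
    allowed_loss_db = eirp_dbm - threshold_dbm
    return 10 ** ((allowed_loss_db + 27.55 - 20 * math.log10(freq_mhz)) / 20)


EIRP_DBM = 20        # example value, identical on both bands
HANDOFF_DBM = -70    # example "hand off" signal strength

for freq in (2437, 5180):
    print(f"{freq} MHz: {rssi_dbm(EIRP_DBM, 10, freq):.1f} dBm at 10 m, "
          f"boundary at {range_m(EIRP_DBM, HANDOFF_DBM, freq):.0f} m")

extra_loss = fspl_db(10, 5180) - fspl_db(10, 2437)
print(f"Extra loss at 5180 MHz vs 2437 MHz: {extra_loss:.2f} dB")
```

The extra loss depends only on the ratio of the two frequencies, so it is the same at every distance: roughly 6.5 dB for these two channels. In free space that puts the 5 GHz boundary at a little under half the radius of the 2.4 GHz boundary. Real buildings attenuate far more than free space, so the absolute distances printed here are not deployment guidance.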

The roaming problem this poses can be illustrated with a simple use case involving a client that roams between two APs with overlapping but unequal band coverage.

In this case the device is roaming between two APs where the transition boundaries of the 2.4 GHz band overlap, but those of the 5 GHz band do not. A client device may have a roaming path between the two that results in multiple transition points, which may result in changes in data rates.

In the diagram above, a client device passing through the area as shown would detect trigger conditions at each of the indicated points:

  1. Transition from 5 GHz to 2.4 GHz.
  2. AP 2 becomes a candidate for transition in the 2.4 GHz band.
  3. Switch to AP 2 in the 2.4 GHz band.
  4. Switch to 5 GHz if AP 2 supports band steering on active connections.
  5. Transition from 5 GHz to 2.4 GHz.

Depending upon the AP feature set, it’s possible that the client device transitions to the 2.4 GHz band and remains there. Since the 2.4 GHz signal strength remains adequate it may not transition to 5 GHz, even though the 5 GHz signal becomes stronger at point 4.

In this example it becomes clear that proper configuration of AP power is important, as is placement of the APs. Most important of all, however, is the validation of the environment using a signal strength monitor.

The actual behavior of the client depends a great deal on the client device itself. The 802.11 standard defines the mechanisms used for roaming, but it does not define the conditions at which transitions occur. The decision to switch access points is made by the client alone and involves algorithms that usually differ from vendor to vendor.

Apple Roaming

Apple devices are an excellent “case study” for roaming since their roaming strategy and thresholds are documented and consistent. Apple reports that its roaming strategy is consistent across all models and iOS versions beginning with iOS 10.

Roaming is triggered when the received signal strength indicator (RSSI) reaches -70 dBm for Apple cell phones (-75 dBm for Macs). When the trigger RSSI is reached, then:

  1. The Apple device begins a search for an AP with an RSSI that is 8 dB or 12 dB greater than that of the current AP (8 dB if data is being sent, otherwise 12 dB).
  2. If 802.11k is enabled, it searches the first six entries in the neighbor report and reviews them to prioritize the scans. If 802.11k is not enabled, it must scan all channels which can add several seconds to the discovery process.
  3. If using 802.1X-based authentication, more time can be added while completing the RADIUS server exchange. According to Apple, “This can take several seconds ...”.
  4. If using 802.11r, the client can authenticate with candidate access points in advance, which saves time, but realistically only saves a small amount (1-3 ms).
  5. Until a client can find an AP that satisfies the conditions, it “hangs on” to its current AP, even though the signal may fade below ideal levels.
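The trigger and candidate rules in the list above can be modeled in a few lines. This is a sketch built only from the numbers quoted in this section, not Apple's actual implementation; the function names, the AP names and the scan results are hypothetical.

```python
# Trigger levels quoted above, in dBm.
TRIGGER_DBM = {"ios": -70, "macos": -75}
NEIGHBOR_REPORT_LIMIT = 6  # only the first six 802.11k entries are considered


def required_margin_db(sending_data):
    """A candidate must be 8 dB better while sending data, otherwise 12 dB."""
    return 8 if sending_data else 12


def pick_roam_target(platform, current_rssi, scan_results, sending_data,
                     neighbor_report=None):
    """Return the AP to roam to, or None if the client keeps its current AP."""
    if current_rssi > TRIGGER_DBM[platform]:
        return None  # trigger level not reached: no search at all
    if neighbor_report is not None:
        # With 802.11k, limit the search to the head of the neighbor report.
        shortlist = neighbor_report[:NEIGHBOR_REPORT_LIMIT]
        scan_results = {ap: rssi for ap, rssi in scan_results.items()
                        if ap in shortlist}
    needed = current_rssi + required_margin_db(sending_data)
    qualified = {ap: rssi for ap, rssi in scan_results.items() if rssi >= needed}
    if not qualified:
        return None  # no candidate is good enough: "hang on" to the current AP
    return max(qualified, key=qualified.get)


# Hypothetical scan: candidate AP -> RSSI in dBm.
scan = {"ap-a": -66, "ap-b": -62}
print(pick_roam_target("ios", -72, scan, sending_data=True))
print(pick_roam_target("ios", -72, scan, sending_data=False))
```

With the current AP at -72 dBm, the same scan gives two different outcomes: during a data transfer the client needs -64 dBm or better and roams to ap-b, while an idle client needs -60 dBm or better and stays where it is.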

Apple provides additional information online about roaming for its Wi-Fi devices in an enterprise environment. For information on Apple devices and 802.11k, 802.11r and 802.11v, see: Wi-Fi network roaming with 802.11k, 802.11r, and 802.11v on iOS

Datto AP Roaming Features

Datto AP42 and AP62 APs can be configured to use “band steering” to encourage prospective clients to connect to the 5 GHz band. Datto APs use a proprietary algorithm to analyze the candidate channels for connection. This analysis derives a channel metric based on signal strength, number of connected clients, the traffic through the channel and other factors. The resulting metric is used to direct the client to the best connection with the AP. This helps reduce transition time and data rate shifts when moving to a new AP.

Best Practices for Wi-Fi Roaming

In the initial design of a Wi-Fi deployment that supports roaming, there are several best practices that will help with a successful deployment:

  1. Restrict mobile (roaming) devices to 5 GHz: This will not only isolate the devices from 2.4 GHz co-channel and adjacent channel interference but also isolate them from throughput consumption by legacy devices and sensors that are 2.4 GHz only. This can be done by configuring an SSID that exists on 5 GHz only, and assigning mobile devices to that SSID. Make sure the 5 GHz service area coverage and strength will support roaming clients.
  2. Enable 802.11k and 802.11r: If devices support 802.11k or 802.11r, then enable those features by default.
  3. Use bridged SSIDs: This will automatically provide layer 3 roaming. Otherwise, the stations will have to negotiate IP addresses with every roam, which increases the roaming delays.
  4. Consider a Separate SSID for roaming devices: Some legacy devices do not support 802.11r or 802.11k. Enabling 802.11k/r on an AP might help, but might affect connectivity on legacy devices. Creating an SSID exclusively with 802.11k/r enabled will allow new devices to take advantage of fast transition roaming features.
  5. Use the Auto-RF feature of Datto APs: This allows APs to configure use of non-overlapping channels. Auto-RF will also adjust the channel signal strength based on detected interference.
  6. Enable DFS: Dynamic frequency selection (DFS) allows the APs a bigger choice of non-overlapping channels. Before enabling DFS, make sure that you will be in compliance with the appropriate regulatory restrictions in your area.
  7. Consider AP coverage area: The cell size (coverage area) is important when considering AP configuration. If the cell size is large, then enable 2.4 GHz legacy rates to increase the range, but remember the impact on 5 GHz clients. Disabling the slower legacy rates might create dead-zones in the service area.

Troubleshooting Wi-Fi Roaming

When a Wi-Fi installation is experiencing difficulties with roaming there are several simple checks which can correct problems without extensive analysis:

  1. Use 20 MHz channel widths for 2.4 GHz. (40 MHz channel widths can cause adjacent channel interference.)
  2. 802.11k/r: Check AP configurations to make sure 802.11k and 802.11r are enabled.
  3. Auto RF: Ensure that auto RF is enabled.
  4. Perform a site survey to ensure client locations have sufficient signal strength. If possible use the client device to check signal strength.
  5. Build channel and location maps of the site APs to determine if there may be any co-channel interference (CCI) or adjacent channel interference (ACI).
  6. Check for co-location of APs: If there are APs that are close to each other and have the same SSID, reduce the signal strengths of both APs if there is suspected co-channel or adjacent channel interference.

There are a lot of “dials and buttons”

The number of factors that can disrupt roaming is fairly large. Remedies for problems often require knowledge of the physical environment, security requirements, and the configuration of the Wi-Fi network. An understanding of the capabilities and unique features of equipment involved is essential, and a collection of good references is helpful.

The number of devices that can be configured to use Wi-Fi is quite large and growing almost daily. Mike Albino maintains a useful online document called “The List” which documents the capabilities of many devices. This reference is very useful when device documentation may not be readily available.

Apple provides an online document to help with the configuration and deployment of iPhone and iPad devices which can be very useful.

While an understanding of Wi-Fi roaming can help diagnose some elementary issues, more complex problems can present confusing symptoms. Datto support engineers have the experience and insights for resolving roaming issues and are ready to assist whenever problems surface.

About the Author

Joe Maybee

Certified Wireless Security Professional and Certified Wireless Network Administrator
