Estimating Mail Migration Speeds – Theoretical vs. Practical

Binary Tree specializes in email migrations, and we often attend planning workshops with customers who are preparing for their transition to Exchange 2010. During these meetings, a very common question is, “How fast can we migrate to the new environment?” It may sound tongue-in-cheek, but our standard answer is, “You can migrate as fast as you want.”
This usually comes as quite a shock to the project teams. However, as we explain how migration processing scales, they come to understand that the theoretical limits for migration speeds are bounded only by the available bandwidth on their network and the speed of their servers. The more migration infrastructure (sessions and processing methods) you configure within your environment, the faster the data will get moved.

The true gating factor is the practical limitation of migrating end-users, NOT their mailbox data. Moving mailbox data is easy; moving end-users is hard. The change management aspects of a migration project, which include end-user training, desktop updates/refresh, communications, migration scheduling, and help-desk support, among other things, are truly the limiting factors for how fast you can migrate email to a new platform.

But customers always want to know how they can perform an accurate estimation of their data throughput for a migration to see what their theoretical limitations are for the project. There are two important factors that I’ll describe in this post which are key to crystalizing a migration throughput estimate:
  • Identify and Validate a Single Migration Speed Unit (for each source and target location)
  • Perform a Migration Test in the Production Environment with Real Mailbox Data
Let me delve deeper into the first point above. A migration speed unit is a measurement of data migration from a single mailbox located on a source server to a target server running within the new production Exchange 2010 infrastructure. This throughput is driven by a single migration session handling the processing between the source and target servers.
For the simplicity of this post, let’s say that this measurement is 1 GB/hr from location X to Y, and 2 GB/hr within location Y, where the source servers sit in the same datacenter as the target environment. If there are 1,000 users to migrate from location X and another 1,000 in location Y, all with an average mailbox size of 500 MB (0.5 GB), then each location holds 500 GB of mail, and the following formula estimates the time needed to perform these migrations using a single processing session.
  • Location X ⇒ Y (500 GB at 1 GB per hour) = 500 hours to migrate the 1,000 users
  • Location Y ⇒ Y (500 GB at 2 GB per hour) = 250 hours to migrate the 1,000 users
  • TOTAL: 750 hours of migration processing using a single session.
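The arithmetic above can be sketched in a few lines of Python. The function and variable names are illustrative only, and the rates are the example figures from this post rather than measured values.

```python
def single_session_hours(users, avg_mailbox_gb, rate_gb_per_hour):
    """Hours needed for one migration session to move every mailbox."""
    total_gb = users * avg_mailbox_gb
    return total_gb / rate_gb_per_hour

# Example figures from this post (illustrative, not measured values).
hours_x_to_y = single_session_hours(1000, 0.5, 1.0)  # remote site X to Y
hours_y_to_y = single_session_hours(1000, 0.5, 2.0)  # within datacenter Y
total_hours = hours_x_to_y + hours_y_to_y

print(hours_x_to_y, hours_y_to_y, total_hours)  # 500.0 250.0 750.0
```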
By increasing the number of sessions so that they run in parallel, you can easily see how the time to migrate groups of users drops roughly in proportion to the number of sessions, as long as the network and servers are not saturated. Ten sessions, for example, would bring the 750 hours above down to about 75. You just need to configure more migration infrastructure to drive the scale of those migration sessions and reduce the amount of time to complete the data transfer. It’s as simple as that.
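As a rough sketch of that scaling, assume every added session sustains the same single-session rate, which only holds until the network or the servers saturate. Under that assumption the elapsed time simply divides by the session count. The function name below is hypothetical.

```python
def elapsed_hours(single_session_hours, sessions):
    """Elapsed time when the workload is spread evenly over parallel sessions.

    Assumes every session sustains the same rate, so this is a best case.
    """
    if sessions < 1:
        raise ValueError("sessions must be at least 1")
    return single_session_hours / sessions

# 750 single-session hours, taken from the example in this post.
for n in (1, 5, 10, 25):
    print(n, elapsed_hours(750, n))
```

Treat the result as a lower bound on elapsed time: real sessions compete for the same bandwidth and server resources.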
But first you must find out what that single migration unit of measurement is for each geographic location pair (source to target). For that task, follow the second recommendation, described below: perform a migration test in the production environment with real mailbox data.
The biggest mistake most inexperienced migration teams make is using lab environments and fake mailboxes to perform throughput estimates. When you fill a generic test mailbox with thousands of messages generated by automated scripts, the resulting size may be 500 MB, but the actual mix of items is nowhere near that of a real user mailbox.
Production end-users have mailboxes that contain messages with attachments and rich text and calendar items and large distribution list headers. Migrating these items is VERY different than migrating thousands of small emails generated by an automated script.  
Add to that the difficulty of simulating the network bandwidth between a remote source server location and a centralized target server environment. You can guesstimate as much as possible by looking at network usage reports between sites, and try your best to throttle the network within your lab to simulate the available bandwidth. But it is NEVER close enough to what truly happens on the production network during a migration window (usually run on weeknights and weekends).
So the rule of thumb for this recommendation is to perform your testing in the production server environments over your production network. And please use realistic mailboxes for your validation testing, NOT auto-generated data stored in generic mailboxes.  
Once you’ve run single migration processing tests between your different site pairs (source to target) you will have the accurate measurements of migration units that you can use for your estimation formulas. From there you simply need to scale your migration infrastructure to increase the number of sessions necessary to meet your goal for migration throughput.
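A minimal sketch of that last step: derive the speed unit from one test run, then size the number of sessions for a target window. It assumes sessions scale linearly, and the test figures and the 48-hour window are hypothetical values chosen for illustration.

```python
import math

def speed_unit_gb_per_hour(gb_moved, elapsed_minutes):
    """Single-session throughput measured from one production test run."""
    return gb_moved / (elapsed_minutes / 60)

def sessions_needed(total_gb, rate_gb_per_hour, window_hours):
    """Smallest number of parallel sessions that fits the migration window.

    Assumes sessions scale linearly, which is a best case.
    """
    single_session_hours = total_gb / rate_gb_per_hour
    return math.ceil(single_session_hours / window_hours)

# Hypothetical test: a 1.5 GB mailbox moved in 45 minutes.
rate = speed_unit_gb_per_hour(1.5, 45)
# Hypothetical goal: move 500 GB within a 48-hour weekend window.
print(rate, sessions_needed(500, rate, 48))  # 2.0 6
```

Run the calculation once per site pair, because each source-to-target path has its own measured rate.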
In closing, let me remind you that these migration throughput estimates are theoretical limits that can be achieved by increasing the size of your migration infrastructure to meet your needs. The practical measurement for any mail migration speed on a project should be based on the transition of the end-users. Mail migration projects are deemed successful not only on the speed of the transition but also on the impact to end-user productivity. So even though you might theoretically be able to move everyone’s mailbox data in a single weekend, you really need to decide whether you should move that quickly. For more information on migration methods and best practices, please check out the white papers, case studies, and additional resources on our website.