This post was authored by John Dilley and first appeared on LinkedIn: https://www.linkedin.com/pulse/how-much-does-network-latency-really-matter-john-dilley/
Does lower network latency make a difference to performance? We absolutely think so!
When I described the Programmable Edge concept to my father, a career scientist with deep research and teaching experience, he said, “Light can circle the Earth 7 times in a second, how much difference does even a thousand miles really make?” It’s a great question – he always finds challenging questions (ask for a story about that some time!).
Putting servers at the network edge reduces network latency (round-trip time) for clients, and I assert that this makes applications faster and more highly available. The “edge” as I use the term is as close as you (the application owner) can get to your end users. It’s not easy to put software in hundreds of locations, so should you really invest in deploying apps at the edge?
Let’s frame the question like this:
How does latency affect web application response time?
First, I’m sure you agree that response time matters (a lot) to a web-based business. Many studies have linked faster response time to improved user engagement and e-commerce conversion rates, and fast response time becomes more important every year.
Web page response time is the total time your browser took to fetch and render the page. Allow me to dip into network queuing theory to formalize a few terms before we go on.
- Network Latency: The minimum (latent) time a packet takes to get from here to there. We normally talk in terms of round-trip time (RTT), which is the delay there and back.
- Queuing Delay: The time spent waiting in a queue – perhaps in a router as your packet moves from network to network between your browser and the server. Or delay waiting for access to resources on an overloaded server. I’ll call it QD below.
- Server Residence Time: How long the server takes to compute and start sending a response back to you. This includes generating dynamic content, looking up the response in a cache, or whatever. Let’s call it CPU.
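To make these three terms concrete, here is a minimal Python sketch of a deliberately simplified model: response time is the number of round trips times the RTT, plus the total queuing delay (QD) and server residence time (CPU). The function name, the choice to lump all queuing delay into a single number, and the example values are assumptions for illustration, not measurements.

```python
def response_time_ms(round_trips: int, rtt_ms: float,
                     queuing_ms: float = 0.0, cpu_ms: float = 0.0) -> float:
    """Simplified model: latency is paid once per round trip, then the
    total queuing delay (QD) and server residence time (CPU) are added."""
    return round_trips * rtt_ms + queuing_ms + cpu_ms


# Hypothetical example: 6 round trips at 20 msec RTT, no congestion,
# and 15 msec of server work.
print(response_time_ms(6, 20.0, cpu_ms=15.0))  # 135.0
```

The model makes one point obvious: the RTT term is multiplied by the round-trip count, so shaving latency pays off on every round trip, while server work is paid only once.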
As a developer, you own the residence time: it is your code doing its thing. You work to make it fast, and it dominates overall response time when you test at your desk or on a local network. Only when you push your code to the open Internet do you really face latency and queuing.
If you are reading this article on LinkedIn your browser fetched the page from a server near you via a Content Delivery Network (CDN), which serves cacheable content from the network edge. CDNs defined the network edge, and at Akamai I helped build some of the infrastructure that delivers content, and accelerates and secures applications worldwide. So I’ve seen this movie before!
Let’s look in detail at what your browser had to do to fetch the page, focusing in on the network queuing theory we reviewed above. Here are the steps, assuming a first fetch of the page today:
- Look up the server hostname to resolve its IP address – 1 or more RTT to DNS (more on this later)
- Open a TCP connection to the server – 1 RTT plus queuing delay (QD) if the network is congested
- Establish a secure TLS session – 1 RTT (plus maybe QD) plus crypto work on the server (residence time)
- Send the web request to the server – 1 RTT (+QD, which I’ll stop saying explicitly) to send the request and receive acknowledgment (TCP ACK) that it arrived
- Compute the response – 0 RTT but server residence time to do its thing, like look up the page in cache or fetch it from the origin server if it’s not in cache
- Send the response – multiple RTT, one for each TCP window of content (explained below)
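Tallying the steps above shows how round trips pile up before any content arrives. This is a minimal Python sketch, assuming exactly one RTT for DNS (the list says one or more) and ignoring queuing delay and server CPU time; the names are hypothetical.

```python
# Round trips per step for a first fetch of the base page, following the
# list above. DNS is "1 or more"; this sketch assumes exactly 1.
FIRST_FETCH_STEPS = {
    "dns_lookup": 1,
    "tcp_handshake": 1,
    "tls_handshake": 1,
    "send_request": 1,
    "compute_response": 0,  # costs server residence time, not round trips
}


def base_page_round_trips(response_windows: int) -> int:
    """Total RTTs for a first fetch, given how many TCP windows the response needs."""
    return sum(FIRST_FETCH_STEPS.values()) + response_windows


print(base_page_round_trips(3))  # 7
```

Even a small page that fits in three windows costs seven round trips on a first fetch under these assumptions.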
The base page refers to many other objects (images, JavaScript, CSS and so on). Each of these requires:
- DNS lookup if it’s not the same hostname …
- Open a TCP Connection to the server – 1 RTT …
- Establish TLS connection – 1 RTT + crypto CPU …
- Send the request – 1 RTT …
- Compute the response – 0 RTT + CPU …
- Send the response – multiple RTT… just like the base page above.
To render the page you need many responses back. Your browser will make some requests in parallel; each new connection requires the TCP and TLS handshakes before it gets any data.
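Parallel connections help, but each one pays for its own handshakes. The sketch below is a simplified Python model, assuming objects are spread evenly across connections, each connection pays TCP + TLS (2 RTT) once, and every object costs 2 RTT (1 for the request, 1 for a single-window response). The object count of 90 and the 6 parallel connections are hypothetical values; the model also ignores DNS and the fact that a browser only discovers many objects after parsing the base page.

```python
import math


def page_round_trips(objects: int, parallel_connections: int,
                     rtts_per_object: int = 2, handshake_rtts: int = 2) -> int:
    """Round trips on the critical path when `objects` fetches are spread
    evenly over `parallel_connections` connections, each paying the
    TCP + TLS handshake once before carrying any data."""
    per_connection = math.ceil(objects / parallel_connections)
    return handshake_rtts + per_connection * rtts_per_object


print(page_round_trips(90, 6))  # 2 + 15 * 2 = 32
print(page_round_trips(90, 1))  # 2 + 90 * 2 = 182
```

Parallelism divides the per-object cost, but every round trip that remains on the critical path is still multiplied by the RTT.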
The “multiple RTT” on the response reflects the fact that the TCP stack can send only one TCP window of data before it has to wait for acknowledgment that the client has received some or all of the data sent. You can’t have more than one window of data in flight per TCP connection. The window typically starts out small and grows to perhaps 64 KB. Servers and fancy clients can turn the window up further, but unless you break TCP you have to start small and grow. This prevents fast senders from causing congestion, queuing delay and packet drops.
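Here is a rough Python sketch of how many round trips a response needs under this rule, assuming an idealized slow start: no packet loss, the window doubles every round trip, and it stops growing at 64 KB. The 14,600-byte starting window (10 segments of 1,460 bytes, the initial window suggested in RFC 6928) is an assumption; real stacks differ.

```python
def response_round_trips(response_bytes: int, initial_window: int = 14_600,
                         max_window: int = 65_536) -> int:
    """RTTs needed to deliver a response when at most one window is in
    flight and the window doubles each round trip up to max_window."""
    window, sent, rtts = initial_window, 0, 0
    while sent < response_bytes:
        sent += window                          # one window in flight
        rtts += 1                               # wait for the ACK
        window = min(window * 2, max_window)    # slow-start growth, capped
    return rtts


print(response_round_trips(44 * 1024))   # 3
print(response_round_trips(100 * 1024))  # 4
```

Under these assumptions a 44 KB response needs three round trips and a 100 KB object needs four, regardless of how fast the server or the link is.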
The page you’re reading now probably took 80-100 network requests: mostly images, some JS and CSS, and four or so HTML objects across 15-20 domains (most of them served via edge servers). Some of the objects transferred were over 100 KB. The total number of round trips for the page to load is well over 100. Multiply the RTT by 100 round trips and it really adds up!
At 20 msec RTT that’s two seconds, a reasonably responsive page. At 100 msec the page would take ten seconds to load, which is unacceptably long for many users.
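The arithmetic is easy to check with a tiny Python sketch, assuming the page costs 100 sequential round trips and ignoring CPU time, queuing delay and parallel fetches (the function name is hypothetical):

```python
def page_load_seconds(round_trips: int, rtt_ms: float) -> float:
    """Latency-only load time: round trips times RTT, converted to seconds."""
    return round_trips * rtt_ms / 1000.0


for rtt in (20, 100):
    print(f"{rtt} msec RTT -> {page_load_seconds(100, rtt):.1f} s")
# 20 msec RTT -> 2.0 s
# 100 msec RTT -> 10.0 s
```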
To put RTT latency numbers into perspective: 10 msec will get you to a server at most a few hundred miles away, but typically in the same well-connected metro area. Conventional wisdom is that each millisecond of round-trip time gets you about 30 miles of distance. A cross-country USA round trip gets into the 100 msec range. On long-haul, trans-oceanic paths we’re looking at up to many hundreds of msec, as illustrated below.
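A rough Python sketch of that rule of thumb, assuming a linear 30 miles of distance per millisecond of RTT (real paths vary with routing and queuing). The 3,000-mile cross-country distance is an assumed round figure.

```python
MILES_PER_RTT_MS = 30  # rule of thumb: ~30 miles of distance per msec of RTT


def rtt_estimate_ms(distance_miles: float) -> float:
    """Approximate RTT to a server the given distance away."""
    return distance_miles / MILES_PER_RTT_MS


def max_distance_miles(rtt_ms: float) -> float:
    """Approximate distance reachable within the given RTT."""
    return rtt_ms * MILES_PER_RTT_MS


print(max_distance_miles(10))  # 300
print(rtt_estimate_ms(3000))   # 100.0
```

The two helpers reproduce the figures in the paragraph: 10 msec covers a few hundred miles, and a coast-to-coast hop lands around 100 msec.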
Now I hope you can see how going from 20 to 100 msec delay can turn an acceptable page load into one that people may browse away from.
Show Me The Data!
The first of the following graphs shows network latency, client-to-edge and client-to-origin, for a web server tested from the client locations listed on the horizontal (x) axis. The green bars show client-to-origin latency; coral shows client-to-edge latency. From Asia and South America we saw over 100 msec RTT (measured by TCP connection establishment time) to the origin server, down to about 10 msec in VA, where the server is evidently located.
The second graph shows the full-page response time and a clear correlation between the latency in the top graph and full-page response time: the endpoints in Asia took a full second to load a simple (44 KB, static) base web page from the origin. Note that the worst-case full-page response time is well over a second, even though edge response time was consistently under 100 msec across the world.
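For a rough sense of why a small page can take a second from a distant origin, here is a Python sketch that follows the per-step counts from the fetch walkthrough (DNS, TCP, TLS and the request at 1 RTT each) plus one RTT per response window. The RTT values (10 msec for a nearby server, 150 msec for a distant origin) and the 3-window figure for a 44 KB page (assuming a window that starts near 14.6 KB and doubles) are assumptions chosen to sit in the ranges described above. This is an illustration, not a fit to the measured data.

```python
def full_fetch_ms(rtt_ms: float, response_windows: int, cpu_ms: float = 0.0) -> float:
    """First fetch of one page: DNS + TCP + TLS + request (4 RTT),
    one RTT per response window, plus server residence time."""
    return (4 + response_windows) * rtt_ms + cpu_ms


# Hypothetical RTTs; 3 windows assumed for a 44 KB static page.
for label, rtt in (("nearby", 10), ("distant origin", 150)):
    print(f"{label}: {full_fetch_ms(rtt, 3):.0f} ms")
# nearby: 70 ms
# distant origin: 1050 ms
```

With identical server work, the only thing separating the two results is the RTT, multiplied by the same seven round trips.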
A website that requires many round trips from the client to authenticate, retrieve dynamic personalized content, or provide other computation should see performance gains at least as strong as the case above.
If this is what the Programmable Edge can do for a website, imagine what it can do for custom logic that needs to run closer to a drone to help it maneuver faster, or closer to a sensor to help it make quick decisions based on the data it’s collecting.
I only discussed queuing delay briefly here. The more networks your traffic crosses between the browser and the server, the more likely you are to hit congestion and delay. I’ll explore that in a future blog post; suffice it to say that queuing delays can seriously magnify the latency issue.
Finally, if you objected to my counting 1 RTT for sending the web request (instead of 1/2, since technically we don’t need to wait for the ACK), you may subtract 1/2 RTT from the “multiple RTT” on the response side. Bonus points for sharp eyes.