Large Scale Site Optimizations

nic
07/14/09 11:40 PM

Hello everyone!

As I've alluded to in a few other posts, I've been steadily making changes to the VGT "backend" (SQL database, web server, and web page optimization) over the past few months.

I thought I'd share with you what I've been up to, as much of my work isn't immediately visible, yet I hope my changes contribute to improving the overall quality of VGT. Lately there has been a ton of great research in the area of web performance, both on the server and in the browser. There are some great books out there (High Performance Web Sites and Even Faster Web Sites, both by Steve Souders), as well as a lot of great tools (Firebug/YSlow, PageSpeed, WebPageTest.org, etc).

Coinciding with this, a lot of research has been published by Google and Microsoft lately regarding the correlation of how fast a page "loads" vs. visitor happiness and retention rate. While I don't think that VGT is a "slow" website, I have focused on several areas to improve the page load times for the site.

Here are several of the resources I use:
YSlow: http://developer.yahoo.com/yslow/
Firebug: https://addons.mozilla.org/en-US/firefox/addon/1843
Google PageSpeed: http://code.google.com/speed/page-speed/
WebPageTest: http://www.webpagetest.org/
Smush.it: http://smush.it/
Yahoo performance rules: http://developer.yahoo.com/performance/rules.html#num_http

The main performance techniques for minimizing page load times can be summed up in a few key areas. The last link above goes into these topics in more detail:
* Minimize the number of requests (html, images, css, javascript, etc)
* Minimize the size (KB) of components (ditto above)
* Serve content closer to the consumer (via a Content Delivery Network, aka CDN)
* Minimize the number of requests for repeat visitors (setting far-future Expires times so their browser doesn't re-request components to see if they've changed on later visits)

With these in mind, I attacked thumbnails, pics, JavaScript, CSS, and images, and added a CDN:

Thumbnails
I fixed a long-standing bug where thumbnail uploads, no matter what format they're in, were just renamed to '.gif'. About 50% of the previous 70,000 thumbnails are JPG, 40% are GIF and 10% are PNG. Yet no matter what the format, the website simply renamed them to nnnnn.gif. Most browsers handle this by 'sniffing' the image header to find that the .gif file is really a .jpg, and show it properly -- however, I have had reports of Mac browsers not showing thumbnails and I believe this was the cause.
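
For anyone curious what "sniffing" means here: browsers look at the first few bytes of the file rather than trusting the extension. Below is a minimal Python sketch of the same idea. The function name is my own for illustration, and this is not the code the site runs.

```python
def sniff_image_format(data: bytes) -> str:
    """Guess the real image format from the leading "magic" bytes.

    Returns 'jpg', 'gif', 'png', or 'unknown'. The file extension is
    deliberately ignored, since that is the part that was wrong.
    """
    if data.startswith(b"\xff\xd8\xff"):           # JPEG start-of-image marker
        return "jpg"
    if data.startswith((b"GIF87a", b"GIF89a")):    # both GIF versions
        return "gif"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):      # 8-byte PNG signature
        return "png"
    return "unknown"
```

A check like this on upload is enough to tell that a file named nnnnn.gif is really a JPG before deciding how to convert it.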

To fix this, I have converted all thumbnails to JPG (@ 70% quality). This covers previous thumbnails that were GIF and PNG, and all new thumbnail uploads are now converted (not just renamed) to .jpg. From my experiments, I have found that JPGs are about 40-50% smaller than GIFs for the thumbnail-style images submitted. I experimented and took samples across JPG quality levels 0 - 100%, GIF, and PNG compression levels 0 - 9. JPG @ 70% quality seems to provide the best "bang for the buck" of file size to image quality without any noticeable loss in quality. Thus, I converted all 70,000 previous thumbnails (1,226 MB) to 70% JPG and they now only take up 572 MB. The thumbnail archive for old imagery was 105 MB but after the conversion is only 61 MB. Besides being converted to 70% JPG, all images are run through jpegtran, which strips unnecessary JPG header information to save another ~2% in file size.
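
As a rough illustration of the jpegtran step, here is a small Python sketch that builds and runs the command. It assumes jpegtran is on the PATH; "-copy none" asks it to drop the extra markers (comments, EXIF and similar) and "-outfile" names the output. The helper names and file names are hypothetical, and the exact options used on VGT may differ.

```python
import subprocess
from pathlib import Path


def jpegtran_strip_command(src: Path, dst: Path) -> list[str]:
    """Build a jpegtran command line that losslessly drops extra JPG markers."""
    return ["jpegtran", "-copy", "none", "-outfile", str(dst), str(src)]


def strip_jpeg_metadata(src: Path, dst: Path) -> None:
    """Run jpegtran; raises CalledProcessError if it exits with an error."""
    subprocess.run(jpegtran_strip_command(src, dst), check=True)
```

Because jpegtran works on the compressed data directly, this step does not re-encode the image, so there is no further quality loss on top of the 70% conversion.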

This translates directly into faster page load times due to less content to download -- especially on the "browse" pages with lots of thumbnails.

Next, all thumbnails are versioned by file name. For example, previously the thumbnail for map # 100 would be /thumb/100.gif. Now, each thumbnail is named with its version number, such as /thumb/100-v1.jpg. When it is updated, the next version will be put at /thumb/100-v2.jpg. The reason for this is that I can finally set an Expires HTTP header to far-future (1 year) for the thumbnail, which helps reduce repeat-visit page load times (more on this later). Previously, newly updated thumbnails would overwrite the old nnn.gif image, so if an Expiry had been set on that thumbnail, users (especially moderators) re-loading a page would not see the new imagery.

Since the thumbnails are versioned and have an Expiry, reloads of browse pages such as /maps/ will not trigger 30 requests to the webserver to see if any of the thumbnails have been updated, since we're guaranteeing that the thumbnail at that exact location will never change.
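
To make the naming scheme and the header concrete, here is a minimal Python sketch. The function names are made up for this post, and the Cache-Control line is my addition for illustration (the text above only mentions Expires).

```python
import time
from email.utils import formatdate
from typing import Optional

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def thumb_path(map_id: int, version: int) -> str:
    """Versioned thumbnail location, e.g. /thumb/100-v1.jpg."""
    return f"/thumb/{map_id}-v{version}.jpg"


def far_future_headers(now: Optional[float] = None) -> dict:
    """Response headers telling browsers the file is good for a year."""
    if now is None:
        now = time.time()
    return {
        # HTTP dates must be in GMT, hence usegmt=True.
        "Expires": formatdate(now + ONE_YEAR_SECONDS, usegmt=True),
        # Assumption: not mentioned above, but commonly sent alongside Expires.
        "Cache-Control": f"public, max-age={ONE_YEAR_SECONDS}",
    }
```

The two pieces only work together: the long expiry is safe precisely because an updated thumbnail gets a new URL (v2) instead of overwriting the old one.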

Finally, all thumbnails have been moved to a CDN (see below).

Pics
All pics have their JPG header info stripped via jpegtran (2% file size reduction) and have been moved to the CDN.

JavaScript
The site's JavaScript was referenced in every page's HTML HEAD section. This causes page rendering to pause, before any content is displayed, while the browser downloads the JavaScript when it first loads a page (http://developer.yahoo.com/performance/rules.html#js_bottom). Additionally, the JavaScript for the site used to be delivered, on the fly, by a script that would minify (jsmin) the JavaScript to strip whitespace. Unfortunately the minified JS would take 500-700ms to send back to the browser, so everyone would have to wait about 1/2 second for the site to even display a pixel.

To solve this, first, I moved the JavaScript SCRIPT tag to the bottom of the page, so the browser doesn't block rendering the page while it loads the JavaScript.

Additionally, I now pre-compile the minified JavaScript just once, save the result, and store it on the CDN for delivery.
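
The "minify once, save the result" idea can be sketched in Python like this. It is only an outline under my own assumptions: build_minified is a hypothetical helper, and the minify argument stands in for whatever minifier is used (jsmin in the text above), which is not reimplemented here.

```python
from pathlib import Path
from typing import Callable


def build_minified(src: Path, dst: Path, minify: Callable[[str], str]) -> bool:
    """Write minify(src) to dst unless dst is already up to date.

    Returns True if dst was (re)built, False if the saved copy was reused.
    """
    # Reuse the saved result when it is at least as new as the source file.
    if dst.exists() and dst.stat().st_mtime >= src.stat().st_mtime:
        return False
    dst.write_text(minify(src.read_text(encoding="utf-8")), encoding="utf-8")
    return True
```

The expensive step now runs at build time rather than on every page view, and the output is just a static file that can be uploaded to the CDN.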

CSS
VGT's CSS was spread across two requests, vgt.css and vgt-print.css. I thought this was the only way to specify a "normal" CSS and a "print" CSS for when someone wants to print a page. I found a trick to combine the two CSS files into one (the @media rule), which saves 1 request.
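
Here is a small Python sketch of how the two files could be merged at build time: the normal rules stay as they are and the print rules are wrapped in an @media print block. The function name and the sample rules are invented for illustration, and it assumes the print stylesheet has no @import or @charset lines, which can't live inside an @media block.

```python
def combine_stylesheets(base_css: str, print_css: str) -> str:
    """Return one stylesheet: the base rules, then the print rules
    scoped so they only apply when the page is printed."""
    return f"{base_css.strip()}\n\n@media print {{\n{print_css.strip()}\n}}\n"


if __name__ == "__main__":
    print(combine_stylesheets("body { color: #333; }", "#nav { display: none; }"))
```

The page then needs a single stylesheet link instead of two, which is where the saved request comes from.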

The CSS is also now minified via YUI Compressor, saving 190 bytes.

The CSS is also served via the CDN.

Images
Images such as the Google, Bing and Yahoo icons, the VGT main logo, icons for the "Network" sites, comment icons, etc have been converted into two CSS sprites. This technique basically combines all of your separate images into a single image, then you specify the offset into that image for displaying on the web (http://www.alistapart.com/articles/sprites).

This technique combined 22 separate images into 2 single images (sprites.png and sprites-gradient.png), and this slimmed the image sizes from 26,639 bytes total down to 14,518 bytes total.

This has a direct impact on first-time page load for a visitor, as they're making 20 fewer HTTP requests.
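
To show what "specify the offset into that image" looks like in practice, here is a Python sketch that generates the CSS for icons stacked top to bottom in one sprite sheet. The class names, icon sizes and vertical layout are assumptions for the example, not the actual VGT sprite layout.

```python
from typing import NamedTuple


class Icon(NamedTuple):
    name: str
    width: int
    height: int


def sprite_rules(icons: list[Icon], sheet_url: str) -> str:
    """Emit one CSS rule per icon, assuming icons are stacked vertically
    in the sheet in the order given."""
    rules = []
    y = 0  # running distance from the top of the sheet
    for icon in icons:
        # A negative offset shifts the sheet up so this icon shows through.
        offset = f"-{y}px" if y else "0"
        rules.append(
            f".icon-{icon.name} {{ background: url({sheet_url}) 0 {offset} no-repeat; "
            f"width: {icon.width}px; height: {icon.height}px; }}"
        )
        y += icon.height
    return "\n".join(rules)


if __name__ == "__main__":
    print(sprite_rules([Icon("google", 16, 16), Icon("bing", 16, 16)], "sprites.png"))
```

Every icon points at the same image URL, so the browser downloads the sheet once and reuses it for all of them.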

Images are also served from the CDN.

Favicon
http://virtualglobetrotting.com/favicon.ico is the icon displayed when you bookmark the site. It was a 32x32 RGBA .ico, which was larger than necessary, and the Alpha channel in RGBA isn't used. I was able to reduce this to a 16x16 RGB .ico. File size shrank from 4,286 bytes to 894 bytes.

The favicon.ico now also has a 1-year expiry date, which saves browsers from re-checking this file on subsequent visits.

CDN
Best of all, all of VGT's static content (per above) is now served from Amazon's CDN, called CloudFront. Their CDN is user-friendly, pretty cheap, and built on Amazon's S3 files-in-the-sky web service for storing content. Basically, I upload a file to S3, then Amazon's CloudFront distribution network, which has 20+ servers all over the world, automatically serves the content from closer to the user than I can from Virginia, where the VGT webserver is.

Right now, CloudFront is serving about 600k requests / day, or 3.4 GB a day of images, thumbnails, pics, JavaScript, and CSS.

This has several wonderful effects:
* Content is served closer to the consumer. Since the VGT server is in Virginia, the farther away you are the slower the site will be. Now, visitors in Europe, Japan, etc. will get content from closer to them and thus faster.
* The load on the VGT server itself is reduced. Before the CDN was set up, the Apache web server would serve 20-25 requests a second. Now, it's only serving 8 requests a second.
* Since the load on the server is reduced, the page generation time for the HTML/PHP pages is sped up. Page generation times over the last two months averaged 171ms. Now, page generation averages 86ms over the last week (half the time!)

WebPageTest.org
As tested from an outside service (webpagetest.org):
/ (home page) load time BEFORE, as seen from a 1.5 Mbps DSL connection:
First load: 8.68s load time, 1.27s start render, 97 requests, 820 KB
Repeat visitor: 5.33s load time, 0.51s start render, 67 requests, 547 KB

/ (home page) load time NOW:
First load: 5.31s load time, 0.92s start render, 87 requests, 481 KB
Repeat visitor: 1.69s load time, 0.42s start render, 6 requests, 23 KB

As you can see, all components have improved. First-load requests are down by 10 (this can vary due to home page content), and the page is 41% fewer KB (820 KB down to 481 KB, or about 59% of the original). The total user-perceived page load time has been reduced from 8.68 seconds to 5.31 seconds (61% of the original time), and the page starts rendering at 920ms instead of 1270ms (about 1/3 second faster).

The repeat visit is the real winner. With all of the thumbnails versioned and given an Expiry, requests drop from 67 to 6 (4 of which are Google ads), and 547 KB drops to 23 KB. Page load time is about 32% of the original as well.
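
If you want to check the percentages yourself, this short Python snippet recomputes the reductions from the WebPageTest figures quoted above. The helper name is my own; the numbers are copied straight from the before/after results.

```python
def percent_reduction(before: float, after: float) -> float:
    """How much smaller 'after' is than 'before', as a percentage of 'before'."""
    return (before - after) / before * 100


# (label, before, after) taken from the WebPageTest results above.
MEASUREMENTS = [
    ("first load time (s)", 8.68, 5.31),
    ("first load size (KB)", 820, 481),
    ("repeat load time (s)", 5.33, 1.69),
    ("repeat requests", 67, 6),
    ("repeat size (KB)", 547, 23),
]

if __name__ == "__main__":
    for label, before, after in MEASUREMENTS:
        print(f"{label}: {percent_reduction(before, after):.1f}% less")
```

Note the difference between "X% less" and "X% of the original": 481 KB is about 59% of 820 KB, which is a reduction of about 41%.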

Lots of work -- but I'm really happy with the results!
Parabellum
07/15/09 06:32 AM

Thanks nic, very impressive. The world needs more webmasters that actually care about page load time (hint, hint WaPo and NYTimes).

Up here in the mountains the only 'fast' internet connection available is Hughesnet (spit), and it's pretty slow when it's cloudy or rainy, so page load times really mean something to me. VGT is working great.

Now you need to go optimize the other 150 sites I read regularly! {8^)


--------------------
VGT Moderator
Pdunn
07/15/09 08:03 AM

I second Parabellum's kudos. Super job!
kjfitz
07/15/09 09:43 AM

Wow. Things have changed a bit since I used to hand code HTML 1.0. I'm very impressed.


--------------------
kjfitz
Virtual Globetrotting Moderator
nic
07/15/09 06:24 PM

Now that most of the performance optimizations are complete, I am planning on moving on to a "new features" phase as discussed in some of the other threads here.

Let me know if you encounter any problems with the latest set of changes!