Caching Minified files with Nginx

A quick introduction to Minify

Minify is a popular PHP5 library best described by the Minify site itself:

Minify is a PHP5 app that helps you follow several of Yahoo!’s Rules for High Performance Web Sites.
It combines multiple CSS or Javascript files, removes unnecessary whitespace and comments, and serves them with gzip encoding and optimal client-side cache headers.

The uses of Minify are twofold:

  1. Speed up websites by combining and minifying files
  2. Help eliminate the problems of stale cached CSS and JavaScript files in users’ browsers.

Minify is fairly easy to set up and can dramatically increase the performance of websites. However, as mentioned in the FAQs, each request is served by PHP, so it can actually slow your site down (for example, if your site receives a lot of traffic or you are on a feeble shared server).

A simple solution to this problem, suggested in the Minify FAQs, is to serve your Minified files through a reverse proxy.

An even quicker introduction to Nginx

Nginx is a popular high-performance web server and reverse proxy server. I’m not going to try and summarise what it does here; if you are reading this, the chances are you already know.

We have used Nginx for a while in a standard way, with Nginx in front of Apache. This approach is easy to configure, and it is also easy to revert to a pure Apache setup. We serve all static content (CSS, JS, images etc.) directly from Nginx and pass requests for dynamic content to a backend (PHP on an Apache server) using the Nginx HttpProxyModule.

Although our Minified CSS and JavaScript rarely change, with this setup a request for either is still a PHP request. We can avoid this request by using the Nginx HttpProxyModule to cache responses from the backend, which gives a faster response and reduces the load on the backend server.

Nginx configuration

This is a simple Nginx configuration taken from a development server running Ubuntu. For ease of maintenance, the configuration is split up into several files using sites-available and sites-enabled directories (Debian/Ubuntu style). In this case Nginx is listening on port 80 and Apache is listening on port 8080.

nginx.conf

The main file is nginx.conf, which includes all the live virtual hosts that are in the sites-enabled directory (often organised as symbolic links to the actual files in the sites-available directory).

The white-space in the config files doesn’t matter so it makes sense to organise your .conf files in the way you find most readable.

worker_processes  1;

events {
    worker_connections  1024;
}

http {
    include mime.types;
    default_type application/octet-stream;

    log_format  new_log
    '$remote_addr - $remote_user [$time_local] $request '
    '"$status" $body_bytes_sent "$http_referer" '
    '"$http_user_agent" "$http_x_forwarded_for"';

    # Proxy cache and temp configuration.
    proxy_cache_path 	/var/www/nginx_cache levels=1:2
			keys_zone=main:10m
			max_size=1g inactive=30m;
    proxy_temp_path 	/var/www/nginx_temp;

    sendfile on;

    include /etc/nginx/sites-enabled/*; 
}

The key directives here are the lines that set up the proxy cache, which allows us to cache the responses returned by the backend server (see documentation).

    # Proxy cache and temp configuration.
    proxy_cache_path 	/var/www/nginx_cache levels=1:2
			keys_zone=main:10m
			max_size=1g inactive=30m;
    proxy_temp_path 	/var/www/nginx_temp;

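proxy_cache_path – where on the filesystem the cached files will be stored
levels – defines the directory hierarchy used for files stored in the cache directory
keys_zone – the name and size of the shared memory zone that holds the cache keys and metadata
max_size – the maximum size of the cache
inactive – how long a cached file may go without being requested before it is removed from the cache
proxy_temp_path – the directory for temporary files holding responses received from the backend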
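
To make the levels parameter a little more concrete, here is a rough Python sketch of how, as far as I understand the Nginx documentation, a cache key ends up as a file under the cache directory: the file name is the MD5 hash of the key, and with levels=1:2 the last character of the hash names the first directory and the two characters before it name the second. The function name and the sample key are my own illustrations, not anything Nginx provides.

```python
import hashlib


def cache_file_path(cache_root, cache_key, levels=(1, 2)):
    """Approximate the on-disk path Nginx uses for a given cache key.

    Assumption: the file name is the MD5 hex digest of the key and each
    level takes its characters from the end of that digest.
    """
    digest = hashlib.md5(cache_key.encode("utf-8")).hexdigest()
    parts = []
    end = len(digest)
    for width in levels:
        # Take `width` characters, working backwards from the end of the digest.
        parts.append(digest[end - width:end])
        end -= width
    return "/".join([cache_root.rstrip("/")] + parts + [digest])


if __name__ == "__main__":
    # Illustrative key only; the real key depends on your proxy_cache_key setting.
    print(cache_file_path("/var/www/nginx_cache", "httpwww.example.com/min/?g=css"))
```

This is handy when you want to find (or delete) the cached copy of one particular URL by hand.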

example.conf

This is a simple example of one of the virtual host .conf files that are included by the include /etc/nginx/sites-enabled/*; directive in the nginx.conf file.

server {
    listen 80;
    access_log /var/log/nginx/www.example.com.access.log;
    error_log /var/log/nginx/www.example.com.error.log;
    root /var/www/www.example.com/;
    index index.php index.html;
    server_name www.example.com;

    # send appropriate headers to enable browser caching for static files
    # static files are identified by file extension
    location ~* ^.+\.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|doc|xls|exe|pdf|ppt|txt|tar|mid|midi|wav|bmp|rtf|js)$ {
        access_log off;
        expires 30d;
    }

    # Set the proxy cache key
    set $cache_key $scheme$host$uri$is_args$args;

    location ~ /min/ {
        # Set proxy headers for the passthrough
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://192.168.1.5:8080;
        proxy_cache main;
        proxy_cache_key $cache_key;
        proxy_cache_valid 30m; # 200, 301 and 302 will be cached.
        # Fallback to stale cache on certain errors.
        # 503 is deliberately missing, if we're down for maintenance
        # we want the page to display.
        proxy_cache_use_stale error
                              timeout
                              invalid_header
                              http_500
                              http_502
                              http_504
                              http_404;
    }

    # proxy any other requests back to the Apache server listening on Port 8080
    location / {
        # more_clear_headers comes from the third-party headers-more module
        more_clear_headers 'Content-Length' 'Transfer-Encoding';
        # never serve these requests from, or store them in, the proxy cache
        proxy_cache_bypass 1;
        proxy_no_cache 1;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_pass http://192.168.1.5:8080;
    }

}

The virtual host .conf file above contains three different location blocks, each with a different role.

The first location block catches requests for static files and serves them directly from the file system; in this case the Apache server is not accessed at all. Unfortunately, requests to Minify are all .php requests, so these headers are not added and every request for a minified file is passed back to Apache.

The role of the third location block is to send any request that hasn’t already been dealt with back to the Apache server.

The role of the second location block is to deal with any requests to Minify. We have set a cache timeout of 30 minutes, and any request within that time period will be served from the Nginx cache; if there is no match in the cache, the request will be passed back to Apache. If you are using minify_group, when the underlying JavaScript and CSS files change, the timestamp on the minified URL will change and therefore no matching cache file will exist, so the Nginx cache will update.
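
The behaviour described above can be modelled in a few lines of Python. This is only a toy sketch of the idea, not how Nginx is implemented: the class, the function names and the timestamped query strings are all made up for illustration, and the key simply imitates the $scheme$host$uri$is_args$args pattern from example.conf.

```python
class TinyProxyCache:
    """Toy in-memory model of a proxy cache with a fixed validity period."""

    def __init__(self, backend, valid_seconds=30 * 60):
        self.backend = backend              # callable standing in for Apache/PHP
        self.valid_seconds = valid_seconds  # mirrors proxy_cache_valid 30m
        self.store = {}                     # cache key -> (stored_at, body)

    def get(self, key, now):
        """Return (status, body); status is "HIT" or "MISS"."""
        entry = self.store.get(key)
        if entry is not None and now - entry[0] < self.valid_seconds:
            return "HIT", entry[1]
        body = self.backend(key)            # only cache misses reach the backend
        self.store[key] = (now, body)
        return "MISS", body


def make_key(scheme, host, uri, args=""):
    """Build a key shaped like $scheme$host$uri$is_args$args."""
    return f"{scheme}{host}{uri}{'?' if args else ''}{args}"


if __name__ == "__main__":
    backend_calls = []

    def backend(key):
        backend_calls.append(key)
        return f"minified output for {key}"

    cache = TinyProxyCache(backend)
    # Hypothetical Minify URLs: same group, different timestamp.
    old_url = make_key("http", "www.example.com", "/min/", "g=css&1300000000")
    new_url = make_key("http", "www.example.com", "/min/", "g=css&1300000999")

    print(cache.get(old_url, now=0)[0])    # first request goes to the backend
    print(cache.get(old_url, now=60)[0])   # repeat within 30 minutes is cached
    print(cache.get(new_url, now=120)[0])  # new timestamp means a new key
    print(len(backend_calls))
```

The point to take away is that a changed timestamp produces a different key, so the old cached file is simply never asked for again and eventually ages out.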

How do I know if it’s working?

The easiest way to see that this is working is to enable Apache mod_status. You will know it works because Apache no longer sees a request to /min/? for every page view; only requests that miss the Nginx cache reach it.

I love my Kindle, but…

I was given a Kindle for my birthday earlier this year. I’ve always loved books; to be honest I am a bit of a book hoarder (much to the exasperation of my partner). I knew I was getting a Kindle (it was one of those presents…) and I was looking forward to it, but not without a certain amount of trepidation. What if the entire E-book experience turned out to be crap? What if I just hated reading off the Kindle’s screen?

So is it any good?

Yes. It is great. You can prise it from my cold dead hands.

One of my main justifications to my partner was that all the hefty technical tomes that lie about the house could be ditched and I could read them all on my Kindle (I think that was what persuaded her that it wasn’t a complete waste of money in the end). Unfortunately most technical books only seem to be available as PDFs – to be frank PDF rendering on the Kindle leaves something to be desired (although to be fair this depends on the formatting of the PDF itself).

My other worries turned out to be unfounded. I find reading on my Kindle a very pleasant experience; I’ve read it on the beach, in blazing sunlight – just like the ads.

I’ve just returned from holiday. I switched off wireless and came back without having run out of battery.

As long as you remember to put reading material in it, it is great and it is amazingly liberating to be able to carry about hundreds of books, articles and RSS periodicals (via Calibre).

Would I recommend it?

Absolutely, although I have severe reservations – not to do with the Kindle itself – more to do with the entire E-book ecosystem.

Perhaps the best thing about having a Kindle is that I have started to read books that I simply never would have read – think classic books that are out of copyright and that can be acquired for free. There is no risk in trying out a free E-book. Obviously there is no cost, but more importantly there is no book to find space on a shelf for; no unread book sitting around making you feel guilty.

Reservations

Over the years I have discovered countless authors from browsing second hand bookshops and charity shops. I’ve bought scores of books for 50p or so, the majority of which I still have years later. It’s great that so many out of print or merely out of copyright books are available through sites like Project Gutenberg (or even the Amazon Kindle Store itself) – but what if you want to read a book that is still in copyright? You can buy it, borrow it or steal it.

I don’t want to steal books. Why? Rowena Macdonald, an old friend of mine, is an author, and other friends write for their livings. Authors work hard and deserve to be paid. (BTW you can buy Ro’s latest book Smoked Meat.)

You can borrow E-books from some libraries. Where I live though, there is no deal in place for Kindle. If I was looking for a particular dead tree style book, I could locate a used copy from any number of sources. It is true that authors do not directly benefit from the resale of a book, but the resale of books does help support the publishing and book industry in a wider sense. The environment of books and bookshops is of very tangible benefit.

I can’t count the number of new books that I have bought as a direct result of discovering an author in a second hand book – each of those purchases goes to support the publishing company, the author and a shop. Similarly I have spent too much money on the spur of the moment when a book has caught my eye in a bookshop or art gallery.

In the UK at least, E-books are subject to tax, unlike paper books, so prices can actually be more than the price of the physical book. Personally I’m surprised that anybody coughs up for an E-book when the real book costs no more… There is no romance in owning an E-book (although if romance is what you are after you’ve found the right medium). There is nothing to touch or hold. You can’t feel the texture of the paper or smell the ink. There is no cover art, precious little blurb. In short, other than the utter practicality, the E-book experience is inferior in every way.

The overheads of publishing an E-book are clearly lower than those of a printed book, and it would be lovely to see more publishers set more realistic prices for their E-books. It is very hard to estimate, but I don’t think it would be too unfair to suggest that printing + retailing costs make up 50% of the cost of a physical book.

If there is one thing that most E-books could do with, it is a decent editor. Authors and publishers could be a lot better paid and we as consumers could still get mainstream E-books at a much reduced price.

The Kindle’s ability to download a book sample for free is a lovely idea and it works – it persuaded me to buy Hugh Howey’s book Wool. However, while being able to read a sample at your leisure is great, it’s not going to replace that random bookshop browsing experience, where you stumble across a gem you would never have found in a million years in a curated online experience.

E-books aren’t going anywhere, they are here to stay. As technology improves they are only going to provide a richer reading experience, but I do worry about the damage already being done to the world of second hand, passed on books. You can’t give an E-book away when you’ve read it.

On Digital Reader I came across this article. What E-books (and indeed music) need is a legal way to pass them on or resell them, analogous to the way we can give away or sell a paper book or a CD.

DRM shouldn’t just be a stick to beat the consumer with – it could and should be neutral, balancing the needs of content creators and consumers (but that’s another story).

References:
http://ireaderreview.com/2009/05/03/book-cost-analysis-cost-of-physical-book-publishing/


Meta description is the new black

Meta description is one of those areas of SEO that seems to drift in and out of popular attention. I don’t think it has ever made sense to ignore the value of meta description, although I have rarely given it the love it deserves.

Right now is a good time to think about this tag, and think about it carefully. The meta description tag is frequently overlooked for a number of reasons: it is old (it seems to have been around forever), it is not perceived as exciting, and it is not a quick fix, because writing good descriptions takes both thought and time. The ubiquity of content management systems does nothing to help either – meta data is too often either left to be generated automatically or somehow just gets forgotten, buried under a backwater tab or lost at the bottom of a rarely used menu.

Just imagine

Everything has slotted into place: your site now has optimal URLs, the content is good (but still getting better of course), your page titles are spot on. Your site has reached page 1 of Google for the terms you are focussing on… everything is going to be rosy and you can start to think seriously about your new improved life spent sipping cocktails on the beach whilst your website just hums away.

Wait a minute, there’s a problem. People see your site in the search results, but nobody clicks through. This is where meta description comes in. Think of everything you have done so far as setting the stage, getting your product into that shop front in the prime position on the high street. The trouble is your shop window just doesn’t appeal and nobody bothers to come in.

The shop window

Meta description is your shop window; it is a key component of the snippets that Google shows on its results page. There is no guarantee that Google will use your description word for word, but the chances are it will use at least part of it.

As usual Webmaster Tools is a great reference http://support.google.com/webmasters/bin/answer.py?hl=en&answer=35624 and it pays to re-read the published advice from time to time.

Write for people

Too many people just use meta description as an opportunity to place keywords and key phrases. There is clearly a place for this, but it’s no good just targeting machines. If the snippet that appears in the search results is your shop window, then it is also your chance to engage the viewer.

Don’t just treat your description as an opportunity to get one up on the system, treat it as a ‘Call to action’ – grab the viewer’s attention and make sure they want to click through to your website.

I miss the old Streetview

Recently I’ve been updating a site with some quite complex map related functionality from Google Maps v2 to Google Maps v3. All in all it has been straightforward and worth doing.

The one disappointment IMO has been Streetview – the old version was so much slicker. The new version is easier to implement, but that is as positive as I can be. I miss being able to click in the distance and zoom down the road; I miss the swoosh as you move from position to position; I miss built-in Full Screen.

On the flip side we now have an API for static Streetview images – I’ve been using Jamie Thompson’s brilliant work on getting static map images up until now, but it is time to move on.

Slightly irritating is the fact that, if you have old (v2) Streetview panorama locations saved, you need to convert them before you can show them as static images or as v3 Streetview panoramas.

To convert a v2 panorama to a v3 panorama:

  • ‘yaw’ has become ‘heading’
  • change the sign of ‘pitch’ e.g. -1.80999 becomes 1.80999
  • add 1 to your ‘zoom’ level
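
If you have a lot of saved panoramas, the three rules above are easy to apply in bulk. Here is a minimal Python sketch, assuming each saved v2 point of view is stored as a dictionary with 'yaw', 'pitch' and 'zoom' keys; that storage format and the function name are my own assumptions, so adjust them to however your data is actually saved.

```python
def v2_pov_to_v3(pov_v2):
    """Convert a saved v2 Streetview point of view to its v3 equivalent.

    Assumes pov_v2 is a dict with numeric 'yaw', 'pitch' and 'zoom' values.
    """
    return {
        "heading": pov_v2["yaw"],    # 'yaw' has become 'heading'
        "pitch": -pov_v2["pitch"],   # the sign of 'pitch' is flipped
        "zoom": pov_v2["zoom"] + 1,  # v3 zoom levels are one higher
    }


if __name__ == "__main__":
    saved = {"yaw": 120.5, "pitch": -1.80999, "zoom": 0}
    print(v2_pov_to_v3(saved))
```

The function returns a new dictionary rather than changing the saved one, so you can keep the original v2 values around until you are happy with the results.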