Towards Continuous Deployment: Zero Downtime WebApp Deployment
Let’s assume you have a simple web application that runs on a web server such as Tomcat, Jetty, IIS or Mongrel and is backed by a database. Let’s also say you have only one (non-clustered) instance of your application running in production.
Now you want to deploy your application several times a week. The single biggest issue that gets in the way of continuous deployment is downtime: every time you deploy a new version of your application, you don’t want to take it offline or destroy your users’ sessions. In this blog post, I’ll describe how to deploy your application without interrupting the user.
First time set-up steps:
- On your local machine, set up a web server cluster for session replication and ensure your application works fine in a clustered environment. (Tips on setting up a Tomcat cluster for session replication). You might want to look at all the objects you are storing in your session and check whether they are serializable.
- On your production server, set up another web server instance. We’ll call this temp_webserver. Make sure the temp_webserver runs on a different port than your production web server. (In Tomcat, update the ports in the tomcat/conf/server.xml file). For now, don’t enable clustering.
- In your browser, access the temp_webserver (on its different port) and make sure everything is working as expected. Usually the ports on which the production web server and the temp_webserver are running should be blocked and not directly accessible from any other machine. In that case, set up an SSH tunnel to the relevant port to access the webapp in your browser. (ssh -L 3333:your.domain.com:web_server_port user@server_ip_or_name, then browse to localhost:3333). Alternatively, you could SSH to the production box and use Lynx (a text browser) to test your webapp.
- Now enable clustering on both web servers, start them and make sure the session is replicated. To test session replication, bring up one web server instance, log in, bring up the other instance, then bring down the first instance and make sure your app does not prompt you to log in again. Wait a sec! When you brought down the first server, you got an error page instead of your app. Of course: even though clustering might be working fine, your browser has no way of knowing about the other web server instance, which is running on a different port. It expects a web server on the production server’s port.
- To solve this problem, we’ll have to set up a reverse proxy server like Nginx on your production box or on any of your other publicly accessible servers. Configure the reverse proxy to run on the port your web server was using, and change your web server to run on a different port that is not exposed publicly. The reverse proxy will listen on the public port and proxy all web requests to your web server. (sample Nginx Configuration). This lets us start and stop one of our web servers without the user noticing. It’s also good practice to let your reverse proxy serve all static content, which is usually much faster than serving it through the application server.
- After setting up a round robin reverse proxy, you should be able to test your application in a clustered environment.
- Once you know your webapp works fine in a clustered environment in production, you can change the reverse proxy configuration to direct all traffic to just your actual production web server. You can comment out the temp_webserver line to ensure that only the production web server receives requests. (Every time you change your reverse proxy settings, you’ll have to reload the configuration or restart the reverse proxy server, which usually takes a fraction of a second.)
- Now un-deploy the application on the temp_webserver and stop the temp_webserver. Everything should continue working as before.
- At each step of this process, it’s handy to run a battery of functional tests (Selenium or Sahi) to make sure that your application actually works the way you expect. Manual testing is neither sustainable nor scalable.
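To make the reverse proxy steps above a little more concrete, here is a minimal sketch in Python that renders the kind of Nginx configuration they describe. Everything specific in it is an assumption for illustration: the helper name `render_proxy_config`, the upstream name `webapp`, and the ports 8080 (production web server) and 8081 (temp_webserver). Nginx balances across the enabled `server` lines of an `upstream` block in round-robin fashion by default, and a disabled backend is simply written as a commented-out line.

```python
def render_proxy_config(backends, listen_port=80):
    """Return an Nginx config fragment (meant for the http context) that
    proxies listen_port to the given backends.

    backends: list of (address, enabled) pairs, e.g. ("127.0.0.1:8080", True).
    Disabled backends are emitted as commented-out lines so they can be
    re-enabled later by hand or by a script.
    """
    if not any(enabled for _, enabled in backends):
        raise ValueError("at least one backend must be enabled")

    lines = ["upstream webapp {"]
    for address, enabled in backends:
        prefix = "" if enabled else "# "
        lines.append(f"    {prefix}server {address};")
    lines += [
        "}",
        "",
        "server {",
        f"    listen {listen_port};",
        "    location / {",
        "        proxy_pass http://webapp;",
        "    }",
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical ports: production web server on 8080, temp_webserver on 8081.
    # With both enabled this is the round-robin test set-up; passing False for
    # the second backend gives the "production only" configuration.
    print(render_proxy_config([("127.0.0.1:8080", True), ("127.0.0.1:8081", True)]))
```

Your real configuration will likely need more than this (static content locations, headers, timeouts), so treat the output as a starting point rather than a finished file.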
This concludes our initial set-up. We have enabled ourselves to do continuous deployment without interrupting the user.
Note: Even though our web-server is clustered for session replication, we are still using the same database on both instances.
Now let’s see what steps we need to take when we want to deploy a new version of our application.
- FTP the latest web app archive (war) to the production server.
- If you have made any Database changes follow Owen’s advice on Zero-Downtime Database Deployment. This will help you upgrade the DB without affecting the existing, running production app.
- Next, bring up the temp_webserver and deploy the latest web application on it. In most cases, it’s just a matter of dropping the web archive into the webapps folder.
- Set up an SSH tunnel from your machine to access the temp_webserver. Run all your smoke tests to make sure the new version of the web app works fine.
- Go back into your reverse proxy configuration, comment out the production web server line and uncomment the temp_webserver line. Reload or restart your reverse proxy; all requests should now be routed to temp_webserver. Since your reverse proxy does not hold any session state, reloading or restarting it should not affect users. And since your sessions are replicated in the cluster, users should see no difference, except that they are now working on the latest version of your web app.
- Now undeploy the old version and deploy the latest version of your web app on the production web server. Bring it up and test it through an SSH tunnel from your local machine.
- Once you know the production web server is up and running on the latest version of your app, comment out the temp_webserver line and uncomment the production web server line in the reverse proxy settings. Reload the configuration or restart the reverse proxy. Now all traffic should be routed to your production web server again.
- At this point the temp_webserver has done its job. It’s time to undeploy the application from it and stop the temp_webserver.
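The comment/uncomment switch in the steps above is the part most prone to slips when done by hand, so here is a hedged Python sketch of it. It assumes the upstream block contains plain `server host:port;` lines with no extra parameters; the function name `route_to` and the addresses used with it are hypothetical. It only rewrites the configuration text. Writing the file back and reloading the proxy (for Nginx, `nginx -s reload`) are left out.

```python
import re

# Matches a plain "server host:port;" line, optionally commented out with "#".
# Lines such as "server {" or "server_name ...;" do not match and are kept as-is.
SERVER_LINE = re.compile(
    r"^(?P<indent>\s*)(?P<hash>#\s*)?server\s+(?P<addr>\S+?);\s*$"
)


def route_to(config_text, active_address):
    """Return config_text with only active_address enabled.

    Every other plain upstream "server" line is commented out. Raises
    ValueError if active_address is not present, so a typo can never
    leave the proxy with no backend at all.
    """
    out, found = [], False
    for line in config_text.splitlines():
        m = SERVER_LINE.match(line)
        if m is None:
            out.append(line)
            continue
        enabled = m["addr"] == active_address
        found = found or enabled
        prefix = "" if enabled else "# "
        out.append(f"{m['indent']}{prefix}server {m['addr']};")
    if not found:
        raise ValueError(f"no upstream server line for {active_address}")
    return "\n".join(out) + "\n"
```

During an upgrade you would call it once to route traffic to the temp_webserver's address, and once more at the end to route it back to the production web server.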
Congrats, you have just upgraded your web application to the latest version without interrupting your users.
Note: All the above steps are straightforward to automate with a script. Given the speed and accuracy of automation, I would bet all my money on the automated script over doing this by hand.
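As one possible shape for such a script, here is a minimal Python sketch of a step runner. It is an illustration under assumptions, not a finished tool: `run_deployment` is a hypothetical name, and every action below is a placeholder lambda standing in for whatever real command you would run (copying the archive, starting a server, running smoke tests, reloading the proxy). The one design choice it makes is to stop at the first failed step, so a broken build never gets traffic switched to it.

```python
def run_deployment(steps, log=print):
    """Run (description, action) pairs in order and stop at the first failure.

    Each action is a zero-argument callable that returns True on success.
    Returns the descriptions of the steps that completed.
    """
    completed = []
    for description, action in steps:
        log(f"-> {description}")
        if not action():
            log(f"FAILED: {description}; stopping, traffic stays where it is")
            break
        completed.append(description)
    return completed


if __name__ == "__main__":
    # Placeholder actions: replace each lambda with a real command wrapper.
    steps = [
        ("copy the new web archive to the production server", lambda: True),
        ("start temp_webserver with the new version", lambda: True),
        ("smoke-test temp_webserver through an SSH tunnel", lambda: True),
        ("route traffic to temp_webserver and reload the proxy", lambda: True),
        ("upgrade and smoke-test the production web server", lambda: True),
        ("route traffic back to production and reload the proxy", lambda: True),
        ("undeploy and stop temp_webserver", lambda: True),
    ]
    run_deployment(steps)
```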