XNSIO
  About   Slides   Home  

 
Managed Chaos
Naresh Jain's Random Thoughts on Software Development and Adventure Sports
     
`
 
RSS Feed
Recent Thoughts
Tags
Recent Comments

Towards Continuous Deployment: Zero Downtime WebApp Deployment

Tuesday, November 24th, 2009

Lets assume you have a simple web application which runs on a web server like tomcat, jetty, IIS or mongrel and is backed by a database. Also lets say you have only one instance of your application runningĀ  (non-clustered) in production.

Now you want to deploy your application several times a week. The single biggest issue that gets in the way of continuous deployment is, every time you deploy a new version of your application, you don’t want a downtime (destroy your user’s session). In this blog, I’ll describe how to deploy your applications without interrupting the user.

First time set-up steps:

  • On your local machine set up a web server cluster for session replication and ensure your application works fine in a clustered environment. (Tips on setting up a tomcat cluster of session replication). You might want to look at all the objects you are storing in you session and whether they are serializable or not.
  • On your production server, set up another web server instance. We’ll call this temp_webserver. Make sure the temp_webserver runs onĀ  a different port than your production server. (In tomcat update the ports in the tomcat/config/server.xml file). Also for now, don’t enable clustering yet.
  • In your browser access the temp_webserver (different port) and make sure everything is working as expected. Usually both the port on which the production web server and the temp_webserver is running should be blocked and not accessible directly from any other machine. In such cases, set up an SSH-tunnel on the specified port to access the webapp in your browser. (ssh -L 3333:your.domain.com:web_server_port username@server_ip_or_name). Alternatively you could SSH to the production box and use Lynx (text browser) to test your webapp.
  • Now enable clustering on both web servers, start them and make sure the session is replicated. To test session replication, bring up one webserver instance, login, then bring up the other instance, now bring down the first instance and make sure your app does not prompt you to login again. Wait a sec! When you brought down the first server, you get a 404 Page not found. Of course, even though clustering might be working fine, your browser has no way to know about the other instance of web server, which is running on a different port. It expects a webserver on the production server’s port.
  • To solve this problem, we’ll have to set up a reverse-proxy server like Nginx on your production box or any of your other publically accessible server. You will have to configure the reverse proxy server to run on the port on which your web server was running and change your webserver to run on a different (more secure) port. The reverse proxy server will listen on the required port and proxy all web requests to your server. (sample Nginx Configuration). This will help us start and stop one of our webservers without the user noticing it. Also notice that its a good practice to let your reverse proxy server serve all static content. Its usually a magnitude faster.
  • After setting up a round robin reverse proxy, you should be able to test your application in a clustered environment.
  • Once you know your webapp works fine in a clustered env in production, you can change the reverse-proxy configuration to direct all traffic to just your actual production webserver. You can comment out the temp_webserver line to ensure only production webserver is getting all requests. (Every time you make a change to your reverse proxy setting, you’ll have to reload the configuration or restart the reverse proxy server. Which usually takes a fraction of a second.)
  • Now un-deploy the application on the temp_webserver and stop the temp_webserver. Everything should continue working as before.
  • * At each step of this process, its handy to run a battery of functional tests (Selenium or Sahi) to make sure that your application is actually work the way you expect it. Manual testing is not sustainable and scalable.

This concludes our initial set-up. We have enabled ourselves to do continuous deployment without interrupting the user.

Note: Even though our web-server is clustered for session replication, we are still using the same database on both instances.

Now lets see what steps we need to take when we want to deploy a new version of our application.

  • FTP the latest web app archive (war) to the production server.
  • If you have made any Database changes follow Owen’s advice on Zero-Downtime Database Deployment. This will help you upgrade the DB without affecting the existing, running production app.
  • Next bring up the temp_webserver and deploy the latest web application. In most cases, its just a matter of dropping the web archive in the web apps folder.
  • Set up a SSH-Proxy from your machine to access the temp_webserver. Run all your smoke tests to make sure the new version of the web-app works fine.
  • Go back into your reverse proxy configuration and comment out the production webserver line and uncomment the temp_webserver line. Reload/Restart your reverse proxy, now all request should be redirected to temp_webserver. Since your reverse proxy does not hold any state, reloading/restarting it should not make any difference. Also since your sessions are replicated in the cluster, users should see no difference, except that now they are working on the latest version of your web app.
  • Now undeploy the old version and deploy the latest version of your web app on the production webserver. Bring it up and test it using a SSH_proxy from your local machine.
  • Once you know the production web-server is up and running on the latest version of your app, comment out the temp_webserver and uncomment the production webserver in the reverse proxy setting . Reload the configuration or restart the reverse proxy. Now all traffic should get redirected to your production web server.
  • At this point the temp_webserver has done its job. Its time to undeploy the application and stop the temp_webserver.

Congrats, you have just upgraded your web application to the latest version without interrupting your users.

Note: All the above steps are very trivial to automate using a script. Because of the speed and accuracy, I would bet all my money on the automated script.

Setting up Tomcat Cluster for Session Replication

Monday, November 9th, 2009

If you have your web application running on one tomcat instance and want to add another tomcat instance (ideally on a different machine), following steps will guide you.

Step 1: Independently deploy your web application (WAR file) on each instance and make sure they can work independently.

Step 2: Stop tomcat

Step 3: Update the <Cluster> element under the <Engine> element in the Server.xml file (under the conf dir in tomcat installation dir) on both your servers with:

<Engine name="<meaningful_unique_name>" defaultHost="localhost">      
     <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
              channelSendOptions="8">
          <Manager className="org.apache.catalina.ha.session.DeltaManager"
                   expireSessionsOnShutdown="false"
                   notifyListenersOnReplication="true"/>
          <Channel className="org.apache.catalina.tribes.group.GroupChannel">
               <Membership className="org.apache.catalina.tribes.membership.McastService"
                           address="228.0.0.4"
                           port="45564"
                           frequency="500"
                           dropTime="3000"/>
               <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
                         address="auto"
                         port="4000"
                         autoBind="100"
                         selectorTimeout="5000"
                         maxThreads="6"/>
               <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
                   <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
               </Sender>
               <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
               <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
          </Channel>
          <Valve className="org.apache.catalina.ha.tcp.ReplicationValve"
                 filter=".*\.gif;.*\.js;.*\.jpg;.*\.png;.*\.css;.*\.txt;"/>
          <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
     </Cluster>
     ...
</Engine>

For more details on these parameters, check https://sec1.woopra.com/docs/cluster-howto.html

Step 4: Start tomcat and make sure it starts up correctly. You should be able to access http://locahost:8080. Most default tomcat installations come with an examples web app. Try access http://localhost:8080/examples/jsp/ You should see a list of JSP files.

Step 4.a: Also if you see catalina.out log file, you should see:

INFO: Initializing Coyote HTTP/1.1 on http-8080
Nov 9, 2009 9:29:43 AM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 762 ms
Nov 9, 2009 9:29:43 AM org.apache.catalina.core.StandardService start
INFO: Starting service <server_name>
Nov 9, 2009 9:29:43 AM org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/6.0.16
Nov 9, 2009 9:29:43 AM org.apache.catalina.ha.tcp.SimpleTcpCluster start
INFO: Cluster is about to start
Nov 9, 2009 9:29:43 AM org.apache.catalina.tribes.transport.ReceiverBase bind
INFO: Receiver Server Socket bound to:/<server_ip>:4000
Nov 9, 2009 9:29:43 AM org.apache.catalina.tribes.membership.McastServiceImpl setupSocket
INFO: Setting cluster mcast soTimeout to 500
Nov 9, 2009 9:29:43 AM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Sleeping for 1000 milliseconds to establish cluster membership, start level:4

Step 5: Stop tomcat.

Step 6: We’ll use the examples web app to test if our session replication is working as expected.

Step 6.a: Open the Web.xml file of the “examples” web app in your webapps. Mark this web app distributable, by adding a <distributable/> element at the end of the Web.xml file (just before the </web-app> element)

Step 6.b: Add the session JSP file. This JSP prints the contents of the session and also adds/increments a counter stored in the session.

Step 6.c: Start tomcat on both machines

Step 6.d: You should see the following log in catalina.out

Nov 9, 2009 9:29:44 AM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:org.apache.catalina.tribes.membership.MemberImpl[tcp://{-64, -88, 0, 101}:4000,{-64, -88, 0, 101},4000, alive=10035,id={68 106 92 39 -110 -8 73 124 -116 -122 -15 -3 11 117 56 105 }, payload={}, command={}, domain={}, ] 
 
Nov 9, 2009 9:29:49 AM org.apache.catalina.ha.session.DeltaManager start
INFO: Register manager /examples to cluster element Engine with name <server_name>
Nov 9, 2009 9:29:49 AM org.apache.catalina.ha.session.DeltaManager start
INFO: Starting clustering manager at /examples
Nov 9, 2009 9:29:49 AM org.apache.catalina.ha.session.DeltaManager getAllClusterSessions
WARNING: Manager [localhost#/examples], requesting session state from org.apache.catalina.tribes.membership.MemberImpl[tcp://{-64, -88, 0, 101}:4000,{-64, -88, 0, 101},4000, alive=15538,id={68 106 92 39 -110 -8 73 124 -116 -122 -15 -3 11 117 56 105 }, payload={}, command={}, domain={}, ]. This operation will timeout if no session state has been received within 60 seconds.
Nov 9, 2009 9:29:49 AM org.apache.catalina.ha.session.DeltaManager waitForSendAllSessions
INFO: Manager [localhost#/examples]; session state send at 11/9/09 9:29 AM received in 101 ms.
 
Nov 9, 2009 9:29:49 AM org.apache.catalina.core.ApplicationContext log
INFO: ContextListener: contextInitialized()
Nov 9, 2009 9:29:49 AM org.apache.catalina.core.ApplicationContext log
INFO: SessionListener: contextInitialized()
Nov 9, 2009 9:29:50 AM org.apache.coyote.http11.Http11Protocol start
INFO: Starting Coyote HTTP/1.1 on http-8080
Nov 9, 2009 9:29:50 AM org.apache.jk.common.ChannelSocket init
INFO: JK: ajp13 listening on /0.0.0.0:8009
Nov 9, 2009 9:29:50 AM org.apache.jk.server.JkMain start
INFO: Jk running ID=0 time=0/49  config=null
Nov 9, 2009 9:29:50 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 6331 ms

Step 6.e: Try to access http://localhost:8080/examples/jsp/session.jsp Try refreshing the page a few times, you should see the counter getting updated.

Step 6.f: You should see the same behavior when you try to access the other tomcat server. Open another tab in your browser and hit http://<other_server_ip>:8080/examples/jsp/session.jsp

Step 6.g: At this point we know the app works fine and the session is working correctly. Now we want to check if the tomcat cluster is replicating the session info. To check this, we want to pass the session from server 1 to session 2 and see if it increments the counter from where we left.

Step 6.h: Before accessing the page, make sure you copy the j_session_id from server 1 (displayed on the http://localhost:8080/examples/jsp/session.jsp). Also make sure to clear all cookies from server 2. (All browsers give you a facility to clear cookies from a specific host/ip).

Step 6.i: Now hit http://<server_2_ip>:8080/examples/jsp/session.jsp;jsessionid=<jsession_id_from_server1>

Step 6.j: If you see the counter incrementing from where ever you had left, congrats! You have session replication working.

Step 6.k: Also catalina.out log file should have:

Nov 9, 2009 9:42:03 AM org.apache.catalina.core.ApplicationContext log
INFO: SessionListener: sessionCreated('CDC57B8C5CFDFDDC2C8572E7D14C0D28')
Nov 9, 2009 9:42:03 AM org.apache.catalina.core.ApplicationContext log
INFO: SessionListener: attributeAdded('CDC57B8C5CFDFDDC2C8572E7D14C0D28', 'counter', '1')
Nov 9, 2009 9:42:05 AM org.apache.catalina.core.ApplicationContext log
INFO: SessionListener: attributeReplaced('CDC57B8C5CFDFDDC2C8572E7D14C0D28', 'counter', '2')

While this might like smooth, I ran into lot of issues when getting to this point. Following are some trap routes I ran into:

1) java.sql.SQLException: No suitable driver tomcat cluster
Make sure your DB Driver jar (in our case mysql-connector-java-x.x.xx-bin.jar) is in tomcat/lib folder

2) In catalina.org if you see the following exception:

Nov 7, 2009 3:48:53 PM org.apache.catalina.ha.session.DeltaManager requestCompleted
SEVERE: Unable to serialize delta request for sessionid [1F43C3926FF3CC231574EF248896DCA6]
java.io.NotSerializableException: com.company.product.Class
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1156)
	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:326)
	at java.util.ArrayList.writeObject(ArrayList.java:570)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)

This means that you are storing com.company.product.Class object (or some other object that holds a reference to this Object) in your session. And you’ll need to make com.company.product.Class implement Serializable interface.

3) In your catalina.out log if you see

INFO: Register manager /<your_app_name> to cluster element Engine with name <tomcat_engine_name>
Nov 7, 2009 11:56:20 AM org.apache.catalina.ha.session.DeltaManager start
INFO: Starting clustering manager at /<your_app_name>
Nov 7, 2009 11:56:20 AM org.apache.catalina.ha.session.DeltaManager getAllClusterSessions
INFO: Manager [localhost#/<your_app_name>]: <strong>skipping state transfer. No members active in cluster group</strong>.

If both your tomcat instance are up and running, then check if your tomcat servers can communicate with each other using Multicast with the following commands:

$ ping -t 1 -c 2 228.0.0.4
PING 228.0.0.4 (228.0.0.4): 56 data bytes
64 bytes from <server_1_ip>: icmp_seq=0 ttl=64 time=0.076 ms
64 bytes from <server_2_ip>: icmp_seq=0 ttl=64 time=0.645 ms

— 228.0.0.4 ping statistics —
1 packets transmitted, 1 packets received, +1 duplicates, 0.0% packet loss

or

$ sudo tcpdump -ni en0 host 228.0.0.4
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on en0, link-type EN10MB (Ethernet), capture size 65535 bytes
22:11:50.016147 IP <server_1_ip>.45564 > 228.0.0.4.45564: UDP, length 69
22:11:50.033336 IP <server_2_ip>.45564 > 228.0.0.4.45564: UDP, length 69
22:11:50.516746 IP <server_1_ip>.45564 > 228.0.0.4.45564: UDP, length 69
22:11:50.533613 IP <server_2_ip>.45564 > 228.0.0.4.45564: UDP, length 69

If you don’t see the results as described above, you might want to read my blog on Enabling Multicast.

    Licensed under
Creative Commons License