JMS Architectural Pros and Cons

Why JMS Isn't A "One Messaging Technology Fits All" Solution for Distributed System Development

Author's note: This article was written with help of Darrell Cavens, CTO of BlueNile.com,
who provided many insightful and invaluable comments and suggestions.

INTRODUCTION

Never before has the design and implementation of a message-proccessing layer within a distributed system been as complex as it is today. This is mostly due to the dramatic increase in potential functionality enabled by standards like JMS which allow developers to connect many vendors technologies together in a single system, and the proliferation of the Internet, which has given rise to new, expansive user bases.

Eventually most Java developers developing distributed systems will need to determine if Java Messaging Server (JMS) is a technology that meets the requirements as a message-processing layer for their distributed systems. Architects and developers of these system must take into account the many implications of a JMS-based solution on their systems, including application server performance, data distribution, security, error handling, etc, and then make critical technical decisions to design and implement said systems. One of the best ways to evaluate JMS as a technology and to understand it's implications on system design is through a case study of real-world distributed systems. Such an analysis will illustrate the pros and cons of JMS and how these relate to crucial parts of distributed system design.

In this article, I will examine two distributed systems that are architecturally very similar in order to understand the implications of imposing a JMS message-processing layer on each system. I will illustrate both the assets, as well as the liabilities of JMS w/r/t each systems specific requirements to develop a set of basic guidelines that help determine if JMS is an appropriate technology for a distributed system, as well as for when JMS might lead to an overengineered and/or overly-complicated solution. I will present an alternate, simpler strategy for data distribution and evaluate this against the specific areas of the two-system analysis.

BACKGROUND

Distributed system development is profilerating rapidly as software developers try to build systems that keep up with the ever-increasing requirements imposed by e-business. The availability of the Internet has been one of the main contributors to distributed system development, with several protocols being introduced to allow communication to occur within a distributed system. Examples of such protocols include CORBA IIOP, Microsoft DCOM, and Java RMI.

The natural evolution of these protocols has lead to the introducion of message-oriented middleware (MOM), which allow for looser coupling within distributed systems by abstracting translation, security, and the underlying communications protocols away from the clients and servers. Examples of these middleware solutions include SOAP and JMS. Middle layer transaction processing has been around since the early days of COBOL with proprietary solutions but was somewhat limited in it's complexity because of the limits of these early messaging technologies.

With the advent of standards like JMS, developers now have the ability to connect many vendors technologies together. Coupled together with the proliferation of the Internet and it's subsequent expansive (and oftentimes poorly understood) user bases, developers are presented with new and complex challenges. Design decisions relating to distributed system design are getting more difficult to make, and their implications on data integrity and distribution are critical to systems success or failure.

In most development shops I've ever been in, a pervasive/tacit assumption is that the introduction of technology is generally regarded as as asset while it's liabilities are oftentimes ignored. Not accounting for the liabilities (aka: count the hits but not the misses) most often results in a system that is either unnecessarily complicated and/or over-engineered.

A basic understanding of JMS and it's inherent qualities (those qualities that are independent of any specific system), followed by a careful analysis of JMS in relation to specific distributed system scenarios can indicate how well a JMS might solve the requirements presented by your system vs either commuting existing problems or even introducing new ones.

JMS OVERVIEW

JMS, introduced by Sun in 1999 as part of the J2EE specification, is a set of standards which taken together describe the foundations for a message-processing middleware layer. JMS allows systems to communicate both synchronously and asynchronously via both point-to-point and publish-subscribe models. Today, there are several vendors to choose from that provide implementations of the JMS specification.

Figure 1 shows a very simple JMS-based system in which an outgoing queue is populated with messages for clients to pick up for processing, as well as an incoming queue, which collects the results of the client processing for insertion into a database.

Figure 1: JMS Turning Database Rows into Messages for Distribution

Message-oriented middleware (like JMS) was introduced to allow for looser coupling within distributed systems by abstracting translation, security, and the underlying communications protocols away from the clients and servers. One of the main assets of a message-processing layer being that because it introduces this layer of abstraction, the implementation of either the client or server can change, sometimes radically, without affecting the other components within the system.

JMS has the distinction of being a standard, and while it has been criticized for being Java-centric, there are many vendors that provide implementations of the standard (e.g.: BEA, HP, IBM, Macromedia, and Oracle), thereby allowing JMS to interact with multiple vendor technologies.

TWO SPECIFIC SCENARIOS

In this section I'd like to present two distributed systems that are potential candidates for introduction of a JMS-based message-processing layer, as well as what the goals are for each system and why they are candidates for JMS.

Scenario 1

The first candidate system for JMS imposition is within a distributed encoding system (shown in Figure 2). This system has a set of N clients that retrieve encoding jobs from a central database server. The clients then excecute the actual transformation (aka: encoding) from digital master to encoded files, and finish by reporting their post-processing status (eg: SUCCESS/FAILED) back to the central database server.

Figure 2: Scenario #1

It doesn't matter what type of encoding is occurring (eg: TEXT, AUDIO, or VIDEO), nor what the transformations are (eg: .pdf to .xml, .wav to .mp3, .avi to .qt). What is important to understand is that encoding is a very CPU-intensive process that in order to scale, requires processing to be distributed across multiple clients.

At a glance, this system is a potential candidate for JMS for the following reasons:

The processing must be distributed as it is extremely processor (CPU) intensive.
It may be problematic, from a sytem performance standpoint, to connect multiple clients directly to a single database server.

Scenario 2

The second candidate system for potential application of JMS is a global registration system for an Internet portal. Global registration involves handling requests for new user creation (registration), login, and authentication, in this case for an Internet Portal.

Figure 3: Scenario #2

It doesn't matter what specific information is being registered (eg: name, address, favorite color), or how users are being authenticated (eg: server side user objects, HTTP cookies). What is important is to understand that this system must scale to handle millions of users, and that usage patterns are difficult, if not impossible to predict (eg: during an ESPN television segment during the World Cup, the announcer suggests Log in and vote in our online poll, we'll present the results at the end of the show. All of a sudden you have 500,000 users all trying to log in within a 3 minute interval).

(3 minutes = 180 seconds. 500,000 user logins/180 seconds = 2778 user logins/sec)

At a glance, this system is a potential candidate for JMS for the following reasons:

To scale the transaction volume, the system must be distributed.
Transactions are atomic (eg: login), so they are stateless and are therefore candidates for being distributed.

The two systems are architecturally extremely similar. Several client machines need to extract data from a central database server (possibly replicated out to M read-only database servers), execute some logic local on the client, and then report the status back to the central database server. One system is delivering encoded files to a filesystem over UNC/FTP, the other is delivering HTML content to Web browsers over HTTP. Both systems are distributed.

It is my experience that this is as far as many engineers go with their analysis before applying a technology like JMS. In the rest of this article I hope to show that although these systems do share many characteristics, the appropriateness of imposing a JMS message-processing layer becomes more clear, and divergant, as requirements specific to each system are examined, including system performance, data distribution, security, and scalability.

SYSTEM ANALYSIS: TO USE JMS OR NOT TO USE JMS (THIS IS THE QUESTION)

JMS has a set of intrinsic qualities that are independent of the system that employs JMS. Some of these qualities (pros denoted by +, cons denoted by -) that will apply to both systems include:

(+) JMS is a set of standards for which multiple vendors provide implementations, thereby helping to avoid the dreaded vendor lock-in problem.
(-) JMS is complicated: it's an entirely new layer with a new set of servers and all of the implications inherent in operating a distributed system. Managing software rollout, server monitoring, and security are just a few of the non-trivial problems that are inherent in technology rollouts similar to JMS whose costs need to be considered.
(+) JMS allows for abstraction (via a generic API) between client and server, the implication being that one can change a database schema or platform without changing the application layer (implicit here are other potential system changes, isolated from one another by the messaging layer).
(-) Vendors do not always interpret, and therefore implement, the standards exactly the same way, so there do exist differences between different implementations.
(+/-) JMS can help a system scale (this in itself is a pro). The con being that any system that scales using JMS can be scaled without it.
(-) With JMS, you need more checks and balances in your system, as you're not only introducing an entirely new layer, you're also introducing asynchronous data distribution and acknowledgement, which has the added complexity of asynchronous notification.
(-) No reporting/updating/monitoring of messages in queues without custom software.

JMS also has a set of qualities that are system-dependent. The appropriateness of JMS imposition depends on how well these qualities map to the problem set you're trying to solve. Following are a list of some of these qualities and how these relate to the two systems of interest.

Caching

Caching should be a primary consideration for capacity planning within any distributed system. JMS has many features that allow it to be used as a caching technology (mainly that it's distributed, asynchronous or synchronous, and data is exchanged as objects in messages). In this way, an existing JMS installation could be leveraged for use as a caching infastructure if required.

When considering the Encoding System, caching is generally not useful to increase overall system performance, as most file transformations are executed once and delivered to a hosting facility or SAN, coupled with the fact that there is very little content overlap between customers.

Global Registration is a prime candidate for a cache of user-information, as users tend to log in, browse for a while, and then log out. Log in can create a cache entry for a user, and this object can be used for subsequent user authentication while the user is on the site.

Processing Order

Within the Global Registration system, there is no scheduling and/or order for transaction processing. Pseudorandom users enter the system at pseudorandom intervals upon login, browse content (and are therefore authenticated when accessing restricted content and/or applications), and then log out.

Within the encoding system, processing is ordered. Content is batched up into groups for delivery depending on the availability of removable storage (eg: DLT or NetApp storage). No content can be delivered until the batch is complete, so batches must be executed in order (though transforms within a batch could potentially be unordered).

It is possible to implement priority queues within JMS to preserve order of processing, but maintaining this order can becomes quite complicated when considering the preservation of batches (groups) of messages between multiple JMS servers and multiple queues. A more suitable technology for managing this workflow is a relational database server with support for transactions.

Security

Security is not part of the JMS specification. In this way, the problem of security is commuted with a JMS-based implementation (this is to say that if you have a security requirement pre-JMS, you'll have a similar security requirement post-JMS). Knowing this, it becomes important to understand how JMS might relate to leveraging your existing technological infastructure for security.

In general, the more technology you use, the more vulnerable your system is to hackers and security violations. Because the global registration application servers are Web-facing, security flaws discovered in your vendors JMS implementation and published in news groups on the Internet quickly become security liabilities for your site. As well, because JMS is just a generic API, it's more prone to being hacked than a system that uses propietary technology and an unpublished API.

While it may be possible to leverage your existing firewall and IP-based network security to protect your back-end (read: not Web-facing/pun intended) application and database servers from security violations, there is a significant security risk created by exposing application servers running JMS directly to the Internet.

The encoding system will generally be all on the same network (also a network that is isolated from the Internet). This would suggest that there's nothing inherent about this systems network topography that relates to JMS and leveraging this topography to provide security (there are far fewer security requirements for the encoding system, as it is not Web-facing).

Scalability

Because the Global Registration system is subject to the whims of a large and capriciously-clicking user base, the scalability requirements of this system warrant the use of JMS (or similar message-processing layer). JMS will not only help scale the system, it will queue transactions, though it will do little to help with the system being flooded by user requests.

Because the distributed encoding system has data traffic that can be carefully regulated (as this is presumably an entirely self-contained system), the scalability requirements for this system are not as formidable. For distributed encoding, you'll probably be able to connect your O[100] clients directly to your database and throttle their traffic to balance encoding throughput with database server performance.

Performance

The introduction of a single JMS server does more to commute performance issues than it does to solve them. It is for this reason that any system that uses JMS should be designed with multiple JMS servers (and therefore multiple queues) in mind. To understand why performance problems are commuted instead of solved, consider the following figure, which shows the processing layers that are required for a generic data server to respond to client connection requests:

Figure 4: Data Access on a Server

There are two parts to exchanging data between client and server, whether this is client-to-database or client-to-JMS server:

Data access.
Thread and socket management, pooling, and caching.

A JMS and a database server look exactly the same when considering the SERVER side in Figure 4. This is because they both have to handle socket connections and thread management, as well as access to the data that resides on the server.

When not using using more than one JMS server, potential performance problems will simply be commuted from the database server to the JMS server. In addition to possible performance degradation associated with context switching within your database server, there is now this same performance problem, with greater potential due to JVM performance issues, within your JMS server.

A single JMS server will not only add significant complexity to your system but will also commute potential performance problems related to connecting multiple clients to a single server. Considering the impact of multiple JMS servers on your systems design and data flow is tremendous and can mean the difference between a successful and failed system rollout.

In summary, features vs. potential JMS impact looks like so:

Feature	JMS Notes/Impact
Caching	Extremely beneficial for systems that require caching
Processing Order	Best for systems that allow for independent processing order
Security	Commutes security requirements and/or introduces potential security holes
Scalability	Beneficial for systems subject to extreme/unpredictable surges in traffic
Performance	Beneficial only if multiple JMS servers are used

JMS ALTERNATIVES

If your distributed system does not warrant the complexity of a solution like JMS, as is the case for the distributed encoding system, an alternate data distribution (and data collection) strategy is required. Following is one alternate strategy for distributing data within a distributed system. By data distribution, I mean:

Getting data from a central database server to N clients
Collecting data from N clients and storing it into a central DB server
Reconciling messages with ACKnowledgements

The system shown in Figure 5 shows two databases (master and secondary), fronted by one or more web servers (IIS/Netscape/etc) into which is loaded a single ISAPI/NSAPI module (written in either C or C++). The clients communicate with the Web servers using XML over HTTP. The master database server maintains aggregate data, the secondary database is used to collect raw data for aggregation.

Figure 5: Alternate ISAPI/NSAPI/HTTP/XML-based Strategy

The ISAPI/NSAPI modules maintain a pool of ODBC connections to the databases, as well as a bunch of threads awaiting HTTP requests delivered via the web servers. Data reads are cached in memory within the ISAPI/NSAPI environment, which concurrency issues are handled using mutexes/critical sections/etc.

Data writes are also cached in memory and dumped to local disk (Web Server) periodcially in BCP format. Clients connect to the web servers using HTTP and request data (or write data) using XML (possibly with a multi-part FORM POST).

Data writes can be synchronous (HTTP POST -> SQL Stored Procedure INSERT -> HTTP 200 ACK returned to client), or asynchronous (HTTP POST -> in-memory cache -> HTTP 200 returned to client, cache later dumped to disk for BCP insert into data collection server).

It should also be noted that this strategy has innumerable permutations which include additions like database replication to a set of read-only databases, adding another set of web servers for data reads, and adding a second set of web servers for data writes.

This strategy has the following set of pros and cons which map quite closely to the set of evaluation criteria for the JMS-based analysis:

Pros:

NSAPI/ISAPI modules are written in native C/C++, which is very fast.
The system leverages existing web server software for port, socket, and thread management to improve performance (e.g.: able to maintain 50k simultaneous connections).
ISAPI/NSAPI modules are relatively simple and lightweight (read: not very much code).
Can throttle traffic out of database, as connection pool is configurable.
Can maintain many records in memory (O[100k..1m] ] (read: not limited by JVM memory).
Easy to make data access thread safe using mutexes, semaphores, and critical sections.
Uses HTTP, so any client can connect and client/server implementation is decoupled.
Simpler rollout and configuration than JMS servers.
Use existing database connectivity technologies (ODBC, Oracle DB connectivity, etc).
ODBC is relatively fast (read: it's native).
Asynchronous data collection for Internet traffic surges.
Choice of either periodically writing data to disk in BCP format for bulk insert into database, or single INSERTs and/or groups of inserts over ODBC, etc (single inserts are O[500/sec] or more ... bulk inserts are O[10k/sec] or more).
Queueing of data is limited by local disk space, not RAM.
If the same module is used for both data distribution and data collection, the asynchronous gap has been narrowed such that the module can report status of DB server to clients (e.g.: shut down services, stop collecting data, accumulate data to secondary storage, etc) in real time.

Cons:

Custom code means development time.
C++ is generally more difficult to write than Java.
Single point of failure (if a single web server is used).
No reporting/updating of data cached within ISAPI/NSAPI module without custom code.
Messaging between client and server is asynchronous (this is possibly desired).
Messages are not persistent in a relational database server that supports transactions.

CONCLUSION

JMS can be a real win from a scalability, caching, and avoiding-vendor-lock-in standpoint. More specific to the system and/or application is how JMS relates to areas such as performance, security, processing order, and data integrity. One thing that is well understood: integrating JMS into a distributed system is a very complicated process.

It is because of this complexity that JMS should be avoided unless it's benefits far outweight it's liabilities. In making this determination, your systems analysis should focus on both the features of JMS that are system-independent, as well as the features that relate specifically to the requirements of your system. Consider the areas of your system that relate to scalability, caching, security, and performance.

It is my experience that a careful systems analysis such as this yields that the profound majority of JMS implementations result in an inappropriate application of technology that results in overly-complex and over-engineered systems. If your analysis suggests that you could benefit from caching, performance, and scalability from multiple JMS servers, JMS may be appropriate for your system. However, there are many alternate strategies that are much simpler than JMS and can provide the requisite layer of abstraction between client and server while taking advantage of ubiquitous protocols like HTTP and data description languages like XML, as well as providing the scalibility and synchronous/ asynchronous communication desired.

Understanding JMS and message-processing basics, coupled together with a careful systems analysis will allow Java developers to evaluate the trade-offs associated with either the application of JMS within their systems or eliminating this message-processing layer altogether to pursue alternate strategies. Once these tradeoffs are well understood, developers will be on the way to making an informed decision regarding system design and architecture for their distributed software systems.