WebSphere eXtreme Scale Design and Performance Considerations

Elastic and Scalable infrastructure

Fundamentals: How does WXS solve the Scalability problem?
Understanding Scalability

To understand the scalability challenge addressed by WebSphere eXtreme Scale, let us first define scalability.

Wikipedia defines scalability as a "desirable property of a system, a network, or a process, which indicates its ability to either handle growing amounts of work in a graceful manner, or to be readily enlarged. For example, it can refer to the capability of a system to increase total throughput under an increased load when resources (typically hardware) are added."

  • Scalability in a system is the ability to do more, whether that means processing more data or handling more traffic, resulting in higher transaction volumes.
  • Scalability poses great challenges to database and transaction systems.
  • An increase in data can expose capacity constraints on back-end database servers.
  • The usual first response is to "scale up" by adding hardware to the existing servers. This can be a very expensive and short-term approach to processing ever-growing data and transactions.

At some point, due to practical, fiscal, or physical limits, enterprises are unable to continue to "scale up" by simply adding hardware. The progressive approach then adopted is to "scale out" by adding database servers and using a high-speed connection between them to provide a fabric of database servers. This approach, while viable, poses challenges around keeping the database servers synchronized. The databases must be kept in sync for data integrity and crash recovery.

Solution: WebSphere eXtreme Scale
WebSphere eXtreme Scale complements the database layer to provide a fault-tolerant, highly available, and scalable data layer that addresses the growing concern around the data and, eventually, the business.

  • Scalability is never an IT problem alone. It directly impacts the business applications and the business unit that owns the applications.
  • Scalability is treated as a competitive advantage.
  • Applications that are scalable can easily accommodate growth and aid the business functions in analysis and business development.

WebSphere eXtreme Scale provides a set of interconnected Java processes that hold the data in memory, thereby acting as shock absorbers for the back-end databases. This not only enables faster data access, because the data is read from memory, but also reduces the stress on the database.

Design Approach:
This short paper is intended to serve as a checklist. It is designed for clients and the professional community who use, or are considering using, WebSphere eXtreme Scale (WXS) as an elastic, scalable in-memory data cache, and who are interested in implementing a highly available and scalable e-business infrastructure with it. Through WebSphere eXtreme Scale, customers can postpone or virtually eliminate the costs associated with upgrading more expensive, heavily loaded back-end database and transactional systems, while meeting the high availability and scalability requirements of today's environments. While not an exhaustive list, this paper primarily covers the infrastructure planning requirements of a WXS environment.

This document is broken into two sections:

  1. Application Design Discussion: This section should be considered when discussing application design. Its intent is to discuss the architectural implications of including a WXS grid as part of the application design.
  2. Layered Approach to WXS Environment Performance Tuning: This is the recommended approach for a WXS implementation. The approach can be applied top to bottom or bottom-up. We usually recommend a top-to-bottom approach, simply due to control boundaries around middleware infrastructure.

1. Application Design Discussion:
Part of application design and consideration is understanding various WXS components. This is an important exercise as this provides insights into performance tuning and application design considerations discussed in this section. The idea is to implement a consistent tuning methodology during operations and apply appropriate application design principles during the design of the WXS application. This is an important distinction, as tuning will not be of much help during operational runtime if the application design is inadequate to achieve scalability. It is therefore much more important to spend sufficient time in application design, which will lead to significantly less effort in performance tuning. A typical WXS application includes the following components:

a. WXS Client - The entity that interacts with the WXS server. It is a JVM runtime with ORB communication to the WXS grid containers. It can be a JEE application hosted in a WAS runtime or in a standalone IBM JVM.

b. WXS Grid Server - An entity that stores Java objects/data. It is a JVM runtime with ORB communication to the other WXS grid containers. It can be hosted in a WAS ND cell or in standalone, interconnected JVMs.

c. WXS Client loader (optional, for bulk pre-load) - A client loader that pre-loads data (possibly in bulk) into the grid. It is a JVM runtime with ORB communication to the WXS grid containers. The client loaders pre-load the data and push it to the grid servers; this activity happens at regular intervals.

d. Back-end database - A persistent data store, such as DB2 or Oracle.

(Note: please see General performance Principles for general performance guidelines)

 

Figure - WXS components

 

Discussed below are top 10 IMDG application design considerations:

I. Understand Data Access and Granularity of data model

a. JDBC

b. ORM ( JPA,Hibernate etc)

i. Fetch - Join

ii. Fetch batch size

c. EJB ( CMP,BMP, JPA)

 

II. Understand Transaction management requirements

a. XA -2PC – Impact on latency and performance

b. JMS

c. Compensation

 

III. Ascertain stateful vs. Stateless

a. Stateless – more apt for IMDG

b. Stateful – determine the degree of state to be maintained.

IV. Application data design ( data and Object Model) – CTS and De-normalized data

a. CTS – Constrained Tree Schema: In a constrained tree schema, each root entity owns its own tree of child entities, and the tree has no references to other root entities. For example, each customer is independent of all other customers, and the same applies to users. This type of schema lends itself to partitioning. Applications of this kind use constrained tree schemas and only execute transactions that use a single root entity at a time. This means that transactions do not span partitions, and complex protocols such as two-phase commit are not needed. A one-phase or native transaction is enough to work with a single root entity, given that it is fully contained within a single transaction.

b. De-normalized data: De-normalization adds redundant data so that related data can be read together. The ability of WXS (IMDG) to support ultra-high scalability depends on uniformly partitioning data and spreading the partitions across machines. Developing scalable applications that access partitioned data demands a paradigm shift in programming discipline. De-normalization of data, creation of application-specific (non-generic) data models, and avoidance of complex transactional protocols such as two-phase commit are some of the basic principles of this new programming methodology.
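The partitioning idea behind a constrained tree schema can be sketched in plain Java. The following is a minimal illustration, not the WXS placement algorithm: the class name `RootKeyRouter`, the hash-modulo routing rule, and the sample keys are all assumptions made for the example. The point it shows is that when every child entity is routed by its root entity's key, a transaction on one root touches exactly one partition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: routes every entity to a partition by its root (customer) key. */
final class RootKeyRouter {
    private final int numPartitions;
    // Partition id -> entities stored there (a stand-in for grid shards).
    private final Map<Integer, List<String>> partitions = new HashMap<>();

    RootKeyRouter(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
    }

    /** floorMod keeps the result in [0, numPartitions) even for negative hash codes. */
    int partitionFor(String rootKey) {
        return Math.floorMod(rootKey.hashCode(), numPartitions);
    }

    /** Child entities are placed using the root key, never their own id. */
    void store(String rootKey, String entity) {
        partitions.computeIfAbsent(partitionFor(rootKey), p -> new ArrayList<>()).add(entity);
    }

    List<String> entitiesIn(int partition) {
        return partitions.getOrDefault(partition, new ArrayList<>());
    }

    public static void main(String[] args) {
        RootKeyRouter router = new RootKeyRouter(13);
        // The customer and everything hanging off it share one routing key.
        router.store("customer-42", "Customer(42)");
        router.store("customer-42", "Order(1001)");
        router.store("customer-42", "OrderLine(1001,1)");
        int p = router.partitionFor("customer-42");
        System.out.println("partition " + p + " holds " + router.entitiesIn(p));
    }
}
```

Because the whole tree lives in one partition, the sketch never needs a cross-partition commit, which is the property the CTS discussion relies on.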

V. Distributing and synchronizing object graphs across the grid

Synchronizing objects in a grid can result in many RPC calls, which keep the grid containers busy and impact performance and scalability.

VI. Single User Decoupled system

a. Typically, single-user decoupled systems are designed with stateless applications in mind.

b. This is unlike stateful enterprise systems, whose scalability may be limited by a number of factors such as the number of resources, operations, cluster services, and data synchronization.

c. Every application system is single-function and is usually co-located with the data.

VII. Invasive vs. Non-Invasive change to IMDG

a. Test! Test! Test!

b. Invasive application changes include changes to data access and the data model to fit an IMDG/XTP scenario. Such changes are expensive and error-prone, and such applications are less likely to adopt IMDG solutions in the immediate future. In such cases, IMDG adoption will be a long-term approach.

c. Non-invasive applications plug into WXS easily with little or no code change, and require no change to application data access or the data model. These are low-hanging fruit and are more readily receptive to WXS solutions.

VIII. Data Partitioning

a. Data partitioning is a formal process of determining which data, or subset of data, needs to be contained in a WXS data partition or shard.

b. Design with data density in mind.

c. Data partitioning assists in planning for growth.
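A back-of-the-envelope sizing calculation helps when designing for data density and growth. The sketch below is an assumption-laden illustration, not a WXS sizing formula: it assumes that each replica holds a full copy of its primary, that a fixed amount of heap per container JVM is usable for data, and it uses made-up figures (40 GiB of data, one replica, 1.5 GiB usable per JVM). The class and method names are hypothetical.

```java
/** Illustrative grid sizing arithmetic; all figures are hypothetical. */
final class GridSizing {
    /** Total bytes the grid must hold: one primary copy plus one copy per replica. */
    static long totalBytes(long primaryBytes, int replicasPerPartition) {
        return primaryBytes * (1L + replicasPerPartition);
    }

    /** Container JVMs needed, rounded up, given the heap each JVM can devote to data. */
    static long containersNeeded(long totalBytes, long usableHeapBytesPerJvm) {
        return (totalBytes + usableHeapBytesPerJvm - 1) / usableHeapBytesPerJvm;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        long total = totalBytes(40 * gib, 1);              // 40 GiB of data, one replica
        long jvms = containersNeeded(total, 3 * gib / 2);  // assume 1.5 GiB usable per JVM
        System.out.println(total / gib + " GiB total, " + jvms + " container JVMs");
    }
}
```

Rerunning the same arithmetic with projected data volumes gives a first estimate of how many containers growth will require; real sizing also has to account for object overhead and the runtime footprint.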

IX. Data Replication and availability

a. In synchronous data replication, a put request from one process blocks all other processes' access to the cache until the data change has been successfully replicated to all other processes that use the cache. You can view it in terms of a database transaction: the process updates its own cache and propagates the modification to the other processes in the same unit of work. This would be the ideal mode of operation, because all processes see the same data in the cache and no one ever gets stale data. However, in a distributed cache the processes are likely to live on different machines connected through a network, so the fact that a write request in one process blocks all other reads from the cache may make this method inefficient. In addition, all involved processes must acknowledge the update before the lock is released. Caches are supposed to be fast and network I/O is not, not to mention that it is prone to failure, so it may not be wise to assume that all participants are in sync unless you have some mechanism for failure notification.

Advantages: Data is kept in sync.

Disadvantages: Network I/O is not fast and is prone to failure.

b. In contrast, the asynchronous data replication method does not propagate an update to the other processes in the same transaction. Rather, the replication messages are sent to the other processes at some time after the update of one process's cache. This could be implemented, for example, as a background thread that periodically wakes and sends the replication messages from a queue to the other processes. An update operation on a process's local cache therefore finishes very fast, since it does not have to block until it receives an acknowledgment of the update from the other processes. If a peer process does not respond to a replication message, the message can be retried later, without in any way hindering or blocking the other processes.

Advantages: Updates do not generate long blocks across processes. Failures are simpler to deal with; for example, after a network failure the modification can be resent.

Disadvantages: Data may not be in sync across processes.
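The trade-off between the two replication modes can be made concrete with a small single-process model. This is only a sketch under simplifying assumptions: two in-memory maps stand in for the primary and a replica, there is no network, and the hypothetical `drain()` method plays the role of the background replication thread so that the behavior is deterministic. None of the names come from the WXS API.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Illustrative model of synchronous versus asynchronous replication. */
final class ReplicationSketch {
    private final Map<String, String> primary = new HashMap<>();
    private final Map<String, String> replica = new HashMap<>();
    // Changes accepted by the primary but not yet applied to the replica.
    private final Queue<String[]> pending = new ArrayDeque<>();

    /** Synchronous: the replica is updated before the caller regains control. */
    void putSync(String key, String value) {
        primary.put(key, value);
        replica.put(key, value);
    }

    /** Asynchronous: the caller returns immediately; the change is only queued. */
    void putAsync(String key, String value) {
        primary.put(key, value);
        pending.add(new String[] {key, value});
    }

    /** Stand-in for the background thread that drains the replication queue. */
    void drain() {
        String[] change;
        while ((change = pending.poll()) != null) {
            replica.put(change[0], change[1]);
        }
    }

    String readPrimary(String key) { return primary.get(key); }

    String readReplica(String key) { return replica.get(key); }

    public static void main(String[] args) {
        ReplicationSketch grid = new ReplicationSketch();
        grid.putSync("a", "1");
        grid.putAsync("b", "2");
        System.out.println("replica before drain: a=" + grid.readReplica("a")
                + ", b=" + grid.readReplica("b"));
        grid.drain();
        System.out.println("replica after drain:  b=" + grid.readReplica("b"));
    }
}
```

Between `putAsync` and `drain()` the replica serves stale (here, missing) data, which is the window of inconsistency that asynchronous replication accepts in exchange for faster writes.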

X. Cache (grid) pre-load :

a. Grid pre-load is an essential consideration and should be planned with the business requirements in mind. The reason to move to a WXS or IMDG solution is to be able to access massive amounts of data in a way that is transparent to the end-user application, so grid pre-load strategies become vital.

b. Server-side pre-load: Partition-specific load; it depends on the data model and is complex.

c. Client-side pre-load: Easy, but the pre-load takes longer because the DB becomes a bottleneck.

d. Range-based pre-load with multiple clients: Multiple clients on different systems each pre-load a range of the data to warm the grid.
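For the range-based option, each client needs a contiguous, non-overlapping slice of the key space. The helper below is a minimal sketch of one way to compute those slices; it assumes numeric, reasonably dense keys, and the class and method names are hypothetical. It ignores arithmetic overflow for extreme key values.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: divides a key range among pre-load clients. */
final class PreloadRanges {
    /**
     * Splits the inclusive range [minKey, maxKey] into at most {@code clients}
     * contiguous, non-overlapping ranges. Each element is {from, to}, inclusive.
     */
    static List<long[]> split(long minKey, long maxKey, int clients) {
        if (clients <= 0 || maxKey < minKey) {
            throw new IllegalArgumentException("need clients > 0 and maxKey >= minKey");
        }
        long total = maxKey - minKey + 1;
        long base = total / clients;   // keys every client receives
        long extra = total % clients;  // the first 'extra' clients receive one more
        List<long[]> ranges = new ArrayList<>();
        long from = minKey;
        for (int i = 0; i < clients && from <= maxKey; i++) {
            long size = base + (i < extra ? 1 : 0);
            ranges.add(new long[] {from, from + size - 1});
            from += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] r : split(1, 10, 3)) {
            System.out.println("client loads keys " + r[0] + " to " + r[1]);
        }
    }
}
```

Each client would then query the database only for its own range, so the clients do not contend for the same rows while warming the grid.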

 

Figure - WXS client interaction

  2. Layered Approach to Performance Tuning:

As discussed earlier, this is the recommended approach for a WXS implementation. It can be applied top to bottom or bottom-up. We usually recommend a top-to-bottom approach, simply due to control boundaries around middleware infrastructure.

Figure - WXS Layered Tuning approach

This approach adds structure to the tuning process, and it also helps eliminate layers during problem determination. Applying the top-to-bottom approach enables administrators to inspect the various tiers involved and methodically isolate the layer(s) responsible for performance degradation. A short description of each layer follows:

I. ObjectGrid.xml and deployment policy descriptor files:

A deployment policy descriptor XML file is passed to an ObjectGrid container server during start-up. This file (in conjunction with the ObjectGrid.xml file) defines the grid policy, such as the replication policy (which has an impact on grid performance) and shard placement. It is vital to define policies that are aligned with business goals, and to discuss the performance and sizing implications during the design and planning process.

II. WebSphere Tuning (if grid servers use the WAS runtime): Standard WAS tuning related to the JVM, such as GC policy and heap limits, applies. An important consideration is to factor in the WAS footprint when estimating the overall grid size.

III. ORB Tuning:

    1. The ORB is used by WXS to communicate over a TCP stack. The necessary orb.properties file is in the java/jre/lib directory.
    2. The orb.properties file is used to pass the properties used by the ORB to modify the transport behavior of the grid. The following settings are a good baseline but not necessarily the best settings for every environment. The descriptions of the settings should be understood to help make a good decision on what values are appropriate in your environment. Note that when the orb.properties file is modified in a WebSphere Application Server java/jre/lib directory, the application servers configured under that installation will use the settings.

com.ibm.CORBA.RequestTimeout=30

com.ibm.CORBA.ConnectTimeout=10

com.ibm.CORBA.FragmentTimeout=30

com.ibm.CORBA.ThreadPool.MinimumSize=256

com.ibm.CORBA.ThreadPool.MaximumSize=256

com.ibm.CORBA.ThreadPool.IsGrowable=false

com.ibm.CORBA.ConnectionMultiplicity=1

com.ibm.CORBA.MinOpenConnections=1024

com.ibm.CORBA.MaxOpenConnections=1024

com.ibm.CORBA.ServerSocketQueueDepth=1024

com.ibm.CORBA.FragmentSize=0

com.ibm.CORBA.iiop.NoLocalCopies=true

com.ibm.CORBA.NoLocalInterceptors=true

Request Timeout

The com.ibm.CORBA.RequestTimeout property indicates how many seconds any request should wait for a response before giving up. This property influences the amount of time a client takes to fail over in the event of a network outage. Setting this property too low may cause valid requests to time out inadvertently, so care should be taken when determining a correct value.

Connect Timeout

The com.ibm.CORBA.ConnectTimeout property indicates how many seconds a socket connection attempt should wait before giving up. This property, like the request timeout, can influence the time a client takes to fail over in the event of a network outage. It should generally be set to a smaller value than the request timeout, because establishing a connection should take a relatively constant amount of time.

Fragment Timeout

The com.ibm.CORBA.FragmentTimeout property is used to indicate how many seconds a fragment request should wait before giving up. This property is similar to the request timeout in effect.

Thread Pool Settings

These properties constrain the thread pool to a specific number of threads. The threads are used by the ORB to spin off the server requests after they are received on the socket. Setting these too small will result in increased socket queue depth and possibly timeouts.
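The effect of an undersized, non-growable pool can be illustrated with the standard java.util.concurrent classes. This is an analogy rather than the ORB's own implementation: a `ThreadPoolExecutor` with equal core and maximum sizes stands in for MinimumSize equal to MaximumSize with IsGrowable=false, and the hypothetical `backlog` method counts the requests left waiting while every worker is busy.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Illustrative only: shows requests queuing behind a fixed-size thread pool. */
final class FixedPoolBacklog {
    /**
     * Submits {@code requests} tasks that block until released, and returns how
     * many of them are waiting in the queue because all workers are occupied.
     */
    static int backlog(int poolSize, int requests) {
        LinkedBlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        // core == max: the pool never grows beyond poolSize threads.
        ThreadPoolExecutor pool =
                new ThreadPoolExecutor(poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, queue);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < requests; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // simulate a slow request
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return queue.size();
        } finally {
            release.countDown(); // let the simulated requests finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("pool of 2, 10 requests, queued: " + backlog(2, 10));
        System.out.println("pool of 10, 10 requests, queued: " + backlog(10, 10));
    }
}
```

Queued requests wait for a free thread, and a request that waits long enough will exceed its timeout, which is why the pool size and the timeouts should be considered together.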

Connection Multiplicity

The connection multiplicity argument allows the ORB to use multiple connections to any server. In theory this should promote parallelism over the connections. In practice, ObjectGrid performance does not benefit from setting the connection multiplicity, and we do not currently recommend using this parameter.

Open Connections

The ORB keeps a cache of connections established with clients. These connections may be purged when the max open connections value is exceeded. This may cause poor behavior in the grid.

Server Socket Queue Depth
The ORB queues incoming connections from clients. If the queue is full then connections will be refused. This may cause poor behavior in the grid.

Fragment Size

The fragment size property can be used to modify the maximum packet size that the ORB will use when sending a request. If a request is larger than the fragment size limit then that request will be chunked into request “fragments” each of which is sent separately and reassembled on the server. This is helpful on unreliable networks where packets may need to be resent but on reliable networks this may just cause overhead.
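The chunk-and-reassemble behavior described above can be sketched as follows. This models the idea only and is not the ORB's wire protocol; the class and method names are hypothetical, and treating a size of 0 as "do not fragment" is an assumption based on the baseline value of FragmentSize=0.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: splits a request into fragments and puts it back together. */
final class FragmentSketch {
    /** A fragmentSize of 0 or less is treated as "send the request in one piece". */
    static List<byte[]> fragment(byte[] request, int fragmentSize) {
        List<byte[]> fragments = new ArrayList<>();
        if (fragmentSize <= 0 || request.length <= fragmentSize) {
            fragments.add(request.clone());
            return fragments;
        }
        for (int offset = 0; offset < request.length; offset += fragmentSize) {
            int end = Math.min(request.length, offset + fragmentSize);
            fragments.add(Arrays.copyOfRange(request, offset, end));
        }
        return fragments;
    }

    /** Concatenates fragments in order, as the receiving side would. */
    static byte[] reassemble(List<byte[]> fragments) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        for (byte[] f : fragments) {
            buffer.write(f, 0, f.length);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] request = new byte[2500];
        System.out.println("fragments at 1024 bytes: " + fragment(request, 1024).size());
        System.out.println("fragments at 0 (disabled): " + fragment(request, 0).size());
    }
}
```

Every extra fragment adds a copy and a separate send, which is the overhead the text refers to on reliable networks.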

No Local Copies
The ORB uses pass-by-value invocation by default. This adds extra garbage and serialization cost to the path when an interface is invoked locally. Setting com.ibm.CORBA.iiop.NoLocalCopies=true, as in the baseline settings above, causes the ORB to use pass by reference, which is more efficient.
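The cost difference between the two invocation styles can be seen with plain Java serialization. This is a sketch of the general idea, not the ORB's actual copy mechanism: the hypothetical `copyByValue` method serializes and deserializes its argument, which is roughly what a pass-by-value local call has to pay for, while pass by reference simply hands over the same object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

/** Illustrative only: contrasts a serialized copy with a shared reference. */
final class LocalCopySketch {
    /** Pass by value: the callee receives a serialized-then-deserialized copy. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copyByValue(T argument) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(argument);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("copy failed", e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> original = new ArrayList<>();
        original.add("order-1001");

        ArrayList<String> byValue = copyByValue(original); // new object, extra garbage
        ArrayList<String> byReference = original;          // same object, no copy

        System.out.println("by value is same object: " + (byValue == original));
        System.out.println("by reference is same object: " + (byReference == original));
    }
}
```

The flip side of pass by reference is that caller and callee share the object, so a callee that mutates its argument changes the caller's data as well.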

No Local Interceptors
The ORB invokes request interceptors even when making local (intra-process) requests. The interceptors that WXS uses are not required in this case, so these calls are unnecessary overhead. Setting the no local interceptors property makes this path more efficient.
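Since orb.properties is an ordinary Java properties file, a few of the relationships described in this section can be checked mechanically before a file is rolled out. The checker below is a minimal sketch: the class `OrbPropertiesCheck` and its rules are hypothetical and encode only guidance stated above (connect timeout smaller than request timeout, thread pool minimum not above maximum, connection multiplicity left at 1). It does not validate a file against the ORB itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Illustrative only: sanity checks for a few orb.properties settings. */
final class OrbPropertiesCheck {
    private static final String PREFIX = "com.ibm.CORBA.";

    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the property as an Integer, or null when it is not set. */
    private static Integer intOf(Properties props, String name) {
        String raw = props.getProperty(PREFIX + name);
        return raw == null ? null : Integer.valueOf(raw.trim());
    }

    /** Returns one warning per rule that the given settings violate. */
    static List<String> check(Properties props) {
        List<String> warnings = new ArrayList<>();
        Integer request = intOf(props, "RequestTimeout");
        Integer connect = intOf(props, "ConnectTimeout");
        if (request != null && connect != null && connect >= request) {
            warnings.add("ConnectTimeout should be smaller than RequestTimeout");
        }
        Integer min = intOf(props, "ThreadPool.MinimumSize");
        Integer max = intOf(props, "ThreadPool.MaximumSize");
        if (min != null && max != null && min > max) {
            warnings.add("ThreadPool.MinimumSize exceeds ThreadPool.MaximumSize");
        }
        Integer multiplicity = intOf(props, "ConnectionMultiplicity");
        if (multiplicity != null && multiplicity != 1) {
            warnings.add("ConnectionMultiplicity other than 1 is not recommended");
        }
        return warnings;
    }

    public static void main(String[] args) {
        Properties baseline = parse(
                "com.ibm.CORBA.RequestTimeout=30\n"
              + "com.ibm.CORBA.ConnectTimeout=10\n"
              + "com.ibm.CORBA.ThreadPool.MinimumSize=256\n"
              + "com.ibm.CORBA.ThreadPool.MaximumSize=256\n"
              + "com.ibm.CORBA.ConnectionMultiplicity=1\n");
        System.out.println("warnings: " + check(baseline));
    }
}
```

A check like this could run as part of a deployment script, so that a mistyped timeout is caught before the application servers under that installation pick it up.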

IV. JVM Tuning:

    1. GC tuning: Analyze for the optimum GC policy (generational GC vs. optthruput vs. optavgpause).
    2. 32-bit vs. 64-bit:

Considerations:

1. The IBM Java 6 SDK shipped with WAS V7 (and the most recent Sun Java 6 SDK, shipped with fix pack 9 for V7) provides compressed references, which significantly decrease the memory footprint overhead of 64-bit but do not eliminate it.

2. There is no hard requirement for the DMGR to be 64-bit when all of the nodes/app servers are in 64-bit mode, but we strongly recommend that the DMGR and the nodes in a cell all be at the same level. So if you decide to run your grid at 64-bit, please keep the DMGR at the same level.

3. Depending on the OS, 32-bit address spaces allow for heaps of ~1.8 GB to 3.2 GB, as shown in the table below.

Bottom line, a comparison of 32-bit versus 64-bit is rather straightforward:

a) 64-bit without compressed references takes significantly more physical memory than 32-bit.

b) 64-bit with compressed references takes more physical memory than 32-bit.

c) 64-bit performs slower than 32-bit unless an application is computationally intensive, which allows it to leverage 64-bit registers, or a large heap allows it to avoid out-of-process calls for data access.

d) JDK Compressed References: WAS V7.0 introduced compressed reference (CR) technology. CR technology allows 64-bit WAS to allocate large heaps without the memory footprint growth and performance overhead. Using CR technology, instances can allocate heap sizes up to 28 GB with physical memory consumption similar to an equivalent 32-bit deployment (more and more applications fall into this category, being only "slightly larger" than the 32-bit OS process limit). For applications with larger memory requirements, full 64-bit addressing kicks in as needed. CR technology allows your applications to use just enough memory and have maximum performance, no matter where along the 32-bit/64-bit address space spectrum your application falls.

Figure - JVM heap memory table

    3. Threads: See the ORB thread pool properties.
    4. ORB tuning: See ORB Tuning.

 

V. Operating System (including network) Tuning:

(Note: Tuning options for different operating systems may differ, concept remains the same)

Network tuning can reduce Transmission Control Protocol (TCP) stack delay by changing connection settings and can improve throughput by changing TCP buffers.

1. Example of AIX tuning:

a. TCP_KEEPINTVL

The TCP_KEEPINTVL setting is part of a socket keep-alive protocol that enables detection of network outage. It specifies the interval between packets that are sent to validate the connection. The recommended setting is 10.

To check the current setting:

# no -o tcp_keepintvl

To change the current setting:

# no -o tcp_keepintvl=10

b. TCP_KEEPINIT

The TCP_KEEPINIT setting is part of a socket keep-alive protocol that enables detection of network outage. It specifies the initial timeout value for TCP connection. The recommended setting is 40.

To check the current setting:

# no -o tcp_keepinit

To change the current setting:

# no -o tcp_keepinit=40

c. Various TCP buffers: The network has a huge impact on performance, so it is vital to ensure that OS-specific properties such as the following are optimized:

i. tcp_sendspace

ii. tcp_recvspace

iii. send and recv buffers



General performance Principles to be aware of:

  1. Multi-JVM / Multi-Thread Pre-load
    1. Use multiple threads to query the DB.
    2. Give each thread a defined record range from the DB.
    3. Implement a thread pool on the client loader side.
    4. A grid agent is required for the client pre-loader. This agent communicates with the client loader for pre-load ONLY.

 

 

(Figure: Agent communication with client loader –pre-load)
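The multi-threaded pre-load pattern described above (a thread pool on the loader side, with each thread owning one record range) can be sketched with standard Java concurrency utilities. Everything here is a stand-in: a `ConcurrentHashMap` plays the grid, the `fetchFromDb` function plays the database query, and the class name is hypothetical. A real loader would write through the WXS client API and would typically batch its inserts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.LongFunction;

/** Illustrative only: pre-loads a key range using a pool of loader threads. */
final class ThreadedPreload {
    /** Loads keys [minKey, maxKey]; each worker owns one contiguous range. */
    static Map<Long, String> preload(long minKey, long maxKey, int threads,
                                     LongFunction<String> fetchFromDb) {
        Map<Long, String> grid = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long total = maxKey - minKey + 1;
            long chunk = (total + threads - 1) / threads; // range size, rounded up
            List<Future<?>> jobs = new ArrayList<>();
            for (long from = minKey; from <= maxKey; from += chunk) {
                final long lo = from;
                final long hi = Math.min(maxKey, from + chunk - 1);
                jobs.add(pool.submit(() -> {
                    for (long key = lo; key <= hi; key++) {
                        grid.put(key, fetchFromDb.apply(key));
                    }
                }));
            }
            for (Future<?> job : jobs) {
                job.get(); // wait for the range to finish and surface any failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("preload interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("preload failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return grid;
    }

    public static void main(String[] args) {
        Map<Long, String> grid = preload(1, 1000, 4, key -> "row-" + key);
        System.out.println("loaded " + grid.size() + " records");
    }
}
```

The number of threads and the records held per thread drive both CPU use and heap consumption in the loader JVM, so they are the two knobs to tune when sizing the loaders.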

  2. Query – Loader to DB
    1. One-to-many relationship – Lazy
    2. Many-to-many – Eager

  3. Operational ‘churn’
    1. Impact of teardown
    2. Impact of abrupt shutdown

 

  4. For complex object graphs
    1. Do NOT use the JPA or JDBC loader.
    2. Use a custom loader.
    3. Have the client load the data, i.e. pre-load the data into the grid; after that, grid operation is business as usual.
    4. After the (client-based) pre-load, updates to the database are done by the backing maps and the loader plug-in.

 

  5. Consider database tuning, such as DB buffer pools and RAMDisk
    1. A well-tuned database is instrumental in preload performance.
    2. Consider indexing: index and populate.

 

  6. CPU, Memory, and Heap Consumption
    1. Consider the number of threads; generally, the more threads, the higher the CPU consumption.
    2. When using multiple threads for client loaders, consider the heap size of the client loader JVMs, which depends on the number of records retrieved per thread. Tune the threads per JVM accordingly. This is where the multi-JVM, multi-thread option comes into play.

 

    3. The client loaders pre-load the data and push it to the grid servers at regular intervals, so we can expect to see a CPU spike (due to network traffic and serialization) and a gradual increase in JVM heap. The JVM heap will eventually level off as the grid becomes stable.

 

    4. WXS maintenance-related issues:

i. GC takes too long:

1. Can cause high CPU consumption.

2. Can cause the JVM to be marked down, causing shard churn, i.e. replica-to-primary conversion and subsequent replica serialization, which is an expensive process.

ii. Replication traffic:

1. Shard churn, i.e. replica-to-primary conversion and subsequent replica serialization, is an expensive process.

2. Evaluate the replication policy in the objectgriddeployment.xml file, or tune the HA manager heartbeat and HA detection.

iii. CPU starvation:

1. Can cause the JVM/host to be marked unreachable, triggering the high-availability mechanism.

2. Can cause the JVM to be marked down, causing shard churn, i.e. replica-to-primary conversion and subsequent replica serialization, which is an expensive process.

3. Excessive GC is often the culprit behind excessive shard churn.
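Because long GC pauses sit behind several of the issues listed above, it is worth watching GC overhead from inside the JVM. The sketch below uses the standard java.lang.management API; the class name is hypothetical, and any alerting threshold applied to the result would be a local choice rather than a WXS recommendation.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Illustrative only: reports how much of the JVM's uptime was spent in GC. */
final class GcOverhead {
    /** Cumulative GC time in milliseconds across all collectors that report one. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Fraction of JVM uptime spent in GC, clamped to the range 0.0 to 1.0. */
    static double gcFraction() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        if (uptime <= 0) {
            return 0.0;
        }
        return Math.min(1.0, (double) totalGcMillis() / uptime);
    }

    public static void main(String[] args) {
        System.out.printf("GC time: %d ms, fraction of uptime: %.4f%n",
                totalGcMillis(), gcFraction());
    }
}
```

Sampling this figure periodically on grid containers and loaders gives an early signal that a JVM is at risk of being marked down because of GC pauses.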

 

Conclusion:

If the application design is faulty, no amount of tuning will help; hence the recommendation to spend more time on design. Spending more time planning your application design and infrastructure topology will not only lay the foundation for a more resilient infrastructure, but also enable the application to get the most out of the elastic and scalable infrastructure enabled by WebSphere eXtreme Scale.

 

About the Author: Nitin Gaur

Nitin Gaur is currently working as a Senior WebSphere Consulting IT Specialist with IBM's S&D organization. Prior to joining the IBM S&D organization, Nitin spent several years with the WebSphere OEM team, an SWG entity, and with AIX support, an ITS/IGS entity. In his 11 years with IBM he has achieved various industry-recognized certifications and enriched his career by doing more than his defined job responsibilities required. Prior to beginning his career with IBM, he was a graduate student at University of Maryland University College. Apart from excelling in his normal job responsibilities, Nitin has been involved with many ongoing projects at IBM Austin. To name one, Nitin has been an active member of the Austin TVC (Technical Vitality Council), an IBM Academy affiliate, since 2002.

As a technical leader, Nitin has presented technical papers at various conferences at IBM and outside. The topics he has presented range from software architectures to the improvement of management processes. Nitin has focused on staying close to customers and remaining attuned to their needs and problems. One of his primary job responsibilities is positioning WebSphere infrastructure products and providing technical solutions and support to field sales teams. He is relentless in researching and presenting the industry best practices of IT infrastructure. Performing advanced technical research and providing IBM clients with strategic solutions on WebSphere offerings is one of his fortes.