Benchmarking RRE and SAS

by Thomas Dinsmore

Regular readers of this blog may be familiar with our ongoing effort to benchmark Revolution R Enterprise (RRE) across a range of use cases and on different platforms. We take these benchmarks seriously at Revolution Analytics, and constantly seek to improve the performance of our software.

Previously, we shared results from a performance test conducted by Allstate. In that test, RRE ran a GLM analysis in five minutes; SAS took five hours to complete the same task. A reader objected that the test was unfair because SAS ran on a single machine, while RRE ran on a five-node cluster. It's a fair point, except that given the software in question (PROC GLM in SAS/STAT), the performance would be the same on five nodes or a million nodes, since PROC GLM can scale up but not out.

Arguing that the Allstate benchmark was "apples to oranges", SAS responded by publishing its own apples-to-oranges benchmark. In this benchmark, SAS demonstrated that its new HPGENSELECT procedure is very fast when it runs on a 144-node grid with 2,304 cores. As noted in the paper, this performance is only possible if you license more software, since HPGENSELECT can only run in Distributed mode if the customer licenses SAS High Performance Statistics. We will be happy to stipulate that PROC HPGENSELECT runs faster on 2,304 cores than RRE on 20 cores.

As a matter of best practice, software benchmarks should run in comparable hardware environments, so that we can attribute performance differences to the software alone and not to differences in available computing resources. Consequently, we engaged an outside vendor with experience running SAS in clustered environments to perform an "apples to apples" benchmark of RRE vs. SAS. The consultant used a clustered computing environment consisting of five four-core commodity servers (with 16 GB of RAM each) running CentOS, with Ethernet connections and a separate NFS server.
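
To put the hardware gap and the earlier Allstate result into numbers, here is a small arithmetic sketch in Python. It uses only figures quoted above; the function and variable names are illustrative and are not part of any benchmark script.

```python
# Figures quoted in the text above.
SAS_GRID_NODES = 144        # nodes in the grid SAS used for HPGENSELECT
SAS_GRID_CORES = 2304       # total cores in that grid
RRE_NODES = 5               # servers in the "apples to apples" cluster
RRE_CORES_PER_NODE = 4      # four-core commodity servers


def core_ratio(cores_a: int, cores_b: int) -> float:
    """Return how many times more cores environment A has than B."""
    return cores_a / cores_b


def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """Return the ratio of the slower run time to the faster run time."""
    return slow_seconds / fast_seconds


rre_cores = RRE_NODES * RRE_CORES_PER_NODE              # 5 * 4 = 20 cores
cores_per_grid_node = SAS_GRID_CORES // SAS_GRID_NODES  # 2304 / 144 = 16
hardware_gap = core_ratio(SAS_GRID_CORES, rre_cores)    # 2304 / 20 = 115.2

# Allstate test: SAS took five hours, RRE took five minutes.
allstate_speedup = speedup(5 * 60 * 60, 5 * 60)         # 18000 / 300 = 60.0

print(f"Cores in the five-node cluster: {rre_cores}")
print(f"Cores per SAS grid node: {cores_per_grid_node}")
print(f"Hardware gap: {hardware_gap:.1f}x more cores")
print(f"Allstate speedup: {allstate_speedup:.0f}x")
```

With more than a hundred times as many cores on one side, a raw run-time comparison says little about the software itself, which is the reason for running both products on the same five-node cluster.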

We tested RRE 7 versus SAS Release 9.4, with Base SAS, SAS/STAT and SAS Grid Manager. (We did not test with SAS High Performance Statistics because we could find no vendors with experience using this new software. We note that more than two years into General Availability, SAS appears to have no public reference customers for this software.) In our experience, when customers ask how we perform compared to SAS, they are most interested in how we compare with the SAS software they already use.

To test Revolution R Enterprise ScaleR, we first deployed IBM Platform LSF and Platform MPI Release 9 on the grid, then installed Revolution R Enterprise Release 7 on each node. SAS Grid Manager uses an OEM version of IBM Platform LSF that cannot run concurrently with the standard version from IBM, so we configured the environment and ran the tests sequentially.

To simplify test replication across different environments, we used data manufactured through a random process. The time needed to manufacture the data is not included in the benchmark results. Prior to running the actual tests, we loaded the randomized data into each software product’s native file format: for SAS, a SAS Data Set; for Revolution R Enterprise, an XDF file.

Although we have benchmarked Revolution R Enterprise on data sets as large as a billion rows, typical data sets used by even the largest enterprises tend to be much smaller. We chose to perform the tests on wide files of 591 columns and row counts ranging from 100,000 to 5,000,000, file sizes that we consider typical for many analysts. We also ran scoring tests on “narrow” files of 21 columns with row counts ranging up to 50,000,000.

Rather than comparing performance on a single task, we prepared a list of multiple tasks, then wrote programs in SAS and RRE to implement the test. Readers will find the benchmarking scripts here, on Git, together with a script to produce the manufactured data.
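
For readers who want a feel for what "manufactured" data looks like before pulling the published scripts, here is a minimal Python sketch. It is not the script from the repository. The column names (`x1`, `x2`, ...), the standard normal distribution, the seed value and the CSV output format are all assumptions made for illustration. Only the 591-column width and the row counts come from the description above.

```python
import csv
import random


def manufacture_rows(n_rows, n_cols, seed=42):
    """Yield n_rows rows of n_cols pseudo-random floats.

    A fixed seed makes the data reproducible, so the same file can be
    regenerated in a different environment.
    """
    rng = random.Random(seed)
    for _ in range(n_rows):
        # Assumption: standard normal values; the real script may differ.
        yield [rng.gauss(0.0, 1.0) for _ in range(n_cols)]


def write_csv(path, n_rows, n_cols, seed=42):
    """Write the manufactured data to a CSV file and return the header."""
    # Hypothetical column names x1..xN.
    header = [f"x{i}" for i in range(1, n_cols + 1)]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in manufacture_rows(n_rows, n_cols, seed):
            writer.writerow(row)
    return header
```

A call such as `write_csv("wide.csv", 100_000, 591)` would produce a file shaped like the smallest wide test file. Because the rows are streamed from a generator, memory use stays flat as the row count grows. The resulting file would still need to be loaded into a SAS Data Set or an XDF file before timing anything, as described above.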

To implement a fair test, we asked the SAS consultant to review the SAS programs and enable them for best performance in the clustered computing environment.

Detailed results of the benchmark test are shown here, in our published white paper:

- RRE ran the tasks forty-two times faster than SAS on the larger data set
- RRE outperformed SAS on every task
- The RRE performance advantage ranged from 10X to 300X
- The RRE advantage increased when we tested on larger data sets
- SAS’ new HP PROC, where available, only marginally improved SAS performance

We invite you to use the scripts in your own environment; let us know the results you achieve.

Revolution Analytics Whitepapers: Revolution R Enterprise: Faster Than SAS

More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire), overseeing the product management of S-PLUS and other statistical and data mining products.

David Smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on Twitter at @RevoDavid.
