Benchmarking RRE and SAS

by Thomas Dinsmore

Regular readers of this blog may be familiar with our ongoing effort to benchmark Revolution R Enterprise (RRE) across a range of use cases and on different platforms. We take these benchmarks seriously at Revolution Analytics, and constantly seek to improve the performance of our software.

Previously, we shared results from a performance test conducted by Allstate. In that test, RRE ran a GLM analysis in five minutes; SAS took five hours to complete the same task. A reader objected that the test was unfair because SAS ran on a single machine, while RRE ran on a five-node cluster. It's a fair point, except that given the software in question (PROC GLM in SAS/STAT) the performance would be the same on five nodes or a million nodes, since PROC GLM can scale up but not out.

Arguing that the Allstate benchmark was "apples to oranges", SAS responded by publishing its own apples-to-oranges benchmark. In this benchmark, SAS demonstrated that its new HPGENSELECT procedure is very fast when it runs on a 144-node grid with 2,304 cores. As noted in the paper, this performance is only possible if you license more software, since HPGENSELECT can only run in Distributed mode if the customer licenses SAS High Performance Statistics. We will be happy to stipulate that PROC HPGENSELECT runs faster on 2,304 cores than RRE on 20 cores.

As a matter of best practices, software benchmarks should run in comparable hardware environments, so that we can attribute performance differences to the software alone and not to differences in available computing resources. Consequently, we engaged an outside vendor with experience running SAS in clustered environments to perform an "apples to apples" benchmark of RRE vs. SAS. The consultant used a clustered computing environment consisting of five four-core commodity servers (with 16 GB of RAM each) running CentOS, with Ethernet connections and a separate NFS server.
We tested RRE 7 versus SAS Release 9.4, with Base SAS, SAS/STAT and SAS Grid Manager. (We did not test with SAS High Performance Statistics because we could find no vendors with experience using this new software. We note that more than two years into General Availability, SAS appears to have no public reference customers for this software.) In our experience, when customers ask how we perform compared to SAS, they are most interested in how we compare with the SAS software they already use.

To test Revolution R Enterprise ScaleR, we first deployed IBM Platform LSF and Platform MPI Release 9 on the grid, then installed Revolution R Enterprise Release 7 on each node. SAS Grid Manager uses an OEM version of IBM Platform LSF that cannot run concurrently with the standard version from IBM, so we configured the environment and ran the tests sequentially.

To simplify test replication across different environments, we used data manufactured through a random process. The time needed to manufacture the data is not included in the benchmark results. Prior to running the actual tests, we loaded the randomized data into each software product's native file format: for SAS, a SAS Data Set; for Revolution R Enterprise, an XDF file.

Although we have benchmarked Revolution R Enterprise on data sets as large as a billion rows, typical data sets used by even the largest enterprises tend to be much smaller. We chose to perform the tests on "wide" files of 591 columns with row counts ranging from 100,000 to 5,000,000, file sizes that we consider typical for many analysts. We also ran scoring tests on "narrow" files of 21 columns with row counts ranging up to 50,000,000.

Rather than comparing performance on a single task, we prepared a list of multiple tasks, then wrote programs in SAS and RRE to implement the test. Readers will find the benchmarking scripts here, on Git, together with a script to produce the manufactured data.
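The benchmark's own data-generation script is not reproduced in this post. As a rough illustration of what manufacturing a "wide" random file could look like, here is a minimal Python sketch using only the standard library. The function names, the `x1`..`x591` column names, the fixed seed, the standard-normal distribution, and the CSV output are all assumptions made for illustration; the post says only that the data came from a random process and was then loaded into a SAS Data Set or an XDF file.

```python
import csv
import random
from typing import Iterator, List

N_COLS = 591  # width of the "wide" files described in the post


def manufacture_rows(n_rows: int, n_cols: int = N_COLS, seed: int = 42) -> Iterator[List[float]]:
    """Yield n_rows rows of n_cols pseudo-random values, reproducibly."""
    # A private generator, so the seed alone determines the data.
    rng = random.Random(seed)
    for _ in range(n_rows):
        # Assumption: standard-normal values; the post does not name a distribution.
        yield [rng.gauss(0.0, 1.0) for _ in range(n_cols)]


def write_csv(path: str, n_rows: int, n_cols: int = N_COLS, seed: int = 42) -> None:
    """Stream the manufactured rows to a CSV file with a header line."""
    header = [f"x{i}" for i in range(1, n_cols + 1)]  # hypothetical column names
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        # Rows are generated lazily, so memory use stays flat as n_rows grows.
        writer.writerows(manufacture_rows(n_rows, n_cols, seed))
```

Calling `write_csv("wide_100k.csv", 100_000)` would produce a file at the smallest row count mentioned above. Fixing the seed means the same file can be regenerated in another environment, which is the point of using manufactured data for replication.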
To implement a fair test, we asked the SAS consultant to review the SAS programs and enable them for best performance in the clustered computing environment.

Detailed results of the benchmark test are shown here, in our published white paper:

- RRE ran the tasks forty-two times faster than SAS on the larger data set.
- RRE outperformed SAS on every task.
- The RRE performance advantage ranged from 10X to 300X.
- The RRE advantage increased when we tested on larger data sets.
- SAS' new HP PROC, where available, only marginally improved SAS performance.

We invite readers to run the scripts in their own environments and let us know the results they achieve.

Revolution Analytics Whitepapers: Revolution R Enterprise: Faster Than SAS

More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire), overseeing the product management of S-PLUS and other statistical and data mining products.

David Smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on Twitter at @RevoDavid.
