R and Data Week 2013

by Joseph Rickert

Data Week 2013 is being held this week in sunny San Francisco at the Fort Mason conference center overlooking the Bay. Holding a Bay Area R User Group (BARUG) meeting at Data Week helped to raise R consciousness among the hip conference crowd attracted by the intoxicating mix of blue skies, big data hype, startups and visionaries. The BARUG members, on the other hand, came mostly for the free beer and lightning talks. There were six 12-minute talks, with themes that ranged from basic R applications to using R to replace SAS in a big-league manufacturing process.

Timothy Sweetser began the evening by showing the regression model he used to analyze BART fares. This was an elementary but clever analysis of an everyday kind of question, the sort that briefly floats through your mind while you are buying a ticket: “How come this trip costs this much, but I paid a different amount last week for what seemed like a similar trip?” The plot below shows the strata in fares by distance as well as Timothy’s regression model.

Utham Kamath described Mathpak, a new cloud-based platform for building collaborative analytical applications and for marketing and monetizing them, and showed how R-based applications would fit nicely into this scheme. It seemed to me that Utham and his fellow developers are envisioning a new “pick up game” kind of collaboration, where developers from around the world will undertake serious projects that any one of them alone would not have the resources to even contemplate.

Clark Fitzgerald spoke about the favorable economics of running R in Amazon cloud (EC2) virtual machines. He compared serious computational hardware to tractors: most people just rent a tractor when they need to do some heavy lifting. He went on to make the case that the economics of cloud-based computing are favorable even for relatively small projects involving teaching and automation. You don’t necessarily have to be working on some high performance computing project to see the benefits.

Elaine Jones showed how her IBM tape storage manufacturing group achieved some serious cost cutting by replacing an expensive ($150K) SAS group license with R to do a number of ETL tasks that are fundamental to the production workflow. Critical tasks such as extracting raw data from DB2, summarizing it, formatting it and loading it into a different DB2 database, which used to take 30 or so SAS programs, are now handled by R scripts. The following graph shows the production workflow and where R replaced SAS. For someone who blogs about R, it was really encouraging to hear that Elaine first heard about R from reading the 2009 NY Times article about R, posted on an internal IBM webpage.

Mathias Brandewinder talked about the new F# to R type provider, a kind of “bridge mechanism” for sharing data and resources between the two languages. Types enable R to be expressed as an F# resource. Now, F# users can call R from within the F# environment, and R developers can make use of F# in production code. Mathias gave a very convincing live demo in which, working from his F# IDE, he seemed to be mixing F# and R code on the fly to achieve an impressive level of integration. It was like watching a musician switch between instruments.

Harrison Decker finished up the evening by describing how reproducible research tools in R are evolving to meet the needs of scientists and researchers. Reproducible research:

- Allows authors to reproduce the results and figures in their research publications
- Aids verification of results by other researchers
- Allows researchers to learn from and build on the work of others
- Builds community

Harrison very eloquently articulated one of the major strengths of R when he said, almost in passing: “R grows because people are building and sharing”.

The slides from all of the presenters will be posted on the BARUG meetup website.

Other R-related activities include a well-attended R Bootcamp that was held on Tuesday; "The R Summit", a series of talks by Tess Nesbit of Data Song, Uday Tennety of Revolution Analytics, Ryan Walker of Blue Shield of California and Ryan White of A9; and a panel discussion, "R means: Business", led by David Smith. The talks and panel discussion are taking place today.


More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire), overseeing the product management of S-PLUS and other statistical and data mining products.

David Smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on Twitter at @RevoDavid.
