By David Smith
September 19, 2012 11:00 AM EDT
This guest post is by Alex Guazzelli, VP of Analytics at Zementis Inc. -- ed.
PMML, the Predictive Model Markup Language, is the de facto standard for representing
predictive analytics and data mining models. With PMML, it is easy to move a
predictive solution from one system to another, since it avoids proprietary
issues and incompatibilities.
Companies around the globe are using PMML to
make instant use of their predictive solutions. With PMML, there is no
need for custom coding: you can easily move
your solution from the scientist’s desktop, where it was built, to the production
environment, where it is operationally deployed. Companies
also use PMML as the common language between service providers and external vendors.
In this way, it defines a single and clear process for the exchange of
predictive solutions. It becomes the bridge not only between data analysis,
model building, and deployment systems, but also between all the people and
teams involved in the analytical process. This is extremely important, since PMML
is used to disseminate knowledge and best practices, and to ensure
All the top analytical tools, commercial and open-source,
support PMML. The language itself has also reached a high level of maturity
and refinement. PMML 4.1, its latest version, makes it easy for
predictive solutions to be represented in an open and standard way. With PMML, you
can represent a myriad of pre- and post-processing steps, besides the
predictive modeling techniques per se. PMML 4.1 allows for multiple models
(model composition, chaining, segmentation, and ensemble, which includes random
forest models) to be represented by a single and concise language element. It
also allows for model outputs to be transformed into business decisions. Therefore,
a PMML file is able to represent the entire solution, from raw data to business
decision, with one or multiple predictive models.
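To make the idea of "one file, entire solution" a bit more concrete, here is a minimal sketch of how a consuming system might inspect a PMML document. It uses only Python's standard library. The PMML text is a hand-written, hypothetical example (the field names, model name, and coefficients are invented for illustration and were not exported by any tool), and the `summarize` helper is an assumption of this sketch, not part of any PMML library.

```python
import xml.etree.ElementTree as ET

# Namespace used by PMML 4.1 documents.
NS = {"pmml": "http://www.dmg.org/PMML-4_1"}

# Hypothetical, hand-written PMML 4.1 document (illustration only).
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Illustrative linear regression"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="spend_model" functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="spend" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="10.0">
      <NumericPredictor name="age" coefficient="0.5"/>
      <NumericPredictor name="income" coefficient="0.001"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def summarize(pmml_text):
    """Return the PMML version, the declared data fields, and the model elements."""
    root = ET.fromstring(pmml_text)
    fields = [
        field.get("name")
        for field in root.findall("pmml:DataDictionary/pmml:DataField", NS)
    ]
    # Heuristic for this sketch: treat top-level elements whose tag ends in
    # "Model" as models (e.g. RegressionModel, TreeModel, MiningModel).
    models = [
        child.tag.split("}")[1] for child in root if child.tag.endswith("Model")
    ]
    return {"version": root.get("version"), "fields": fields, "models": models}


summary = summarize(PMML_DOC)
print(summary)
```

The point of the sketch is that everything a receiving system needs (the input fields, the target, and the model parameters) is declared in the file itself, so no knowledge of the tool that built the model is required.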
A standard such as PMML, combined with scoring solutions in the cloud, for
Hadoop, and in-database, makes it possible for predictive analytics to fulfill
its promise and crack the big data code. Zementis, Inc. has been at the
forefront of PMML-based scoring, first through its ADAPA Scoring Engine, which
is available for on-site deployment or as a service in the cloud (Amazon and IBM),
and lately through its Universal PMML Plug-in, which is offered for a range of
databases and for Hadoop. Zementis has partnered with Revolution Analytics, so
that predictive solutions built in R can benefit from the vast scoring infrastructure
already in place. I am proud to be associated with Zementis and excited to be
part of an ever-growing PMML community.
A PMML package for R that exports all kinds of predictive
models is available directly from CRAN.
Traditionally, the PMML package offered support for the
following data mining algorithms:
Support Vector Machines
rpart: C&RT Decision Trees
lm & glm (stats): Linear and Binary Logistic Regression Models
kmeans and hclust: Cluster Models
Recently, it has been expanded to support:
Multinomial Logistic Regression Models;
Generalized Linear Models for classification and regression with a wide variety of link functions;
Random Forest Models for classification and regression;
Random Survival Forest Models (randomSurvivalForest).
This expansion is still ongoing as the R community implements support for
other packages and techniques. For more on the PMML package, please take a look
at the paper we published with Graham Williams from Togaware in “The R Journal”:
“PMML: An Open Standard for Sharing Models”.
There may be quite a few reasons for you to move your
predictive solution from R to an independent deployment platform. Among them,
you may want parallel execution on big data or real-time scoring for
applications such as fraud detection or recommender systems. With PMML, you can
easily move your model to the cloud or inside the database for scoring, or
even have it executed on Hadoop. It is really up to you! On top of that, PMML
allows for side-by-side deployment of predictive assets from R as well as from
commercial data mining tools, supporting a multi-vendor environment as well as
platform-independent deployment.
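As a rough illustration of what scoring outside of R can look like, the sketch below evaluates a PMML regression model using nothing but Python's standard library. It is a toy, not a substitute for a real scoring engine: the PMML text is a hypothetical, hand-written document, the `score` function is an assumption of this sketch, and it handles only a single `RegressionTable` with `NumericPredictor` terms.

```python
import xml.etree.ElementTree as ET

# Namespace used by PMML 4.1 documents.
NS = {"pmml": "http://www.dmg.org/PMML-4_1"}

# Hypothetical PMML 4.1 regression model: y = 1.5 + 2.0 * x (illustration only).
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def score(pmml_text, record):
    """Score one record against the first RegressionTable in the document."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    if table is None:
        raise ValueError("no RegressionModel/RegressionTable found")
    result = float(table.get("intercept"))
    for predictor in table.findall("pmml:NumericPredictor", NS):
        # The exponent attribute is optional in PMML and defaults to 1.
        exponent = float(predictor.get("exponent", "1"))
        coefficient = float(predictor.get("coefficient"))
        result += coefficient * record[predictor.get("name")] ** exponent
    return result


prediction = score(PMML_DOC, {"x": 3.0})
print(prediction)  # 1.5 + 2.0 * 3.0 = 7.5
```

Because the model travels as data rather than as code, the same file could in principle be handed to any PMML-aware consumer (in the cloud, in-database, or on Hadoop) without rewriting the model logic for each platform.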
More and more companies and individuals are using the PMML
standard for the obvious benefits it provides, putting their predictive
solutions on the fast track. With PMML, the speed of predictive solutions can
be on par with the speed of business.
Dr. Alex Guazzelli is the VP of Analytics
at Zementis Inc. where he is responsible for developing core technology and
predictive solutions under ADAPA, a PMML-based decisioning platform. With more
than 20 years of experience in predictive analytics, Dr. Guazzelli holds a PhD
in Computer Science from the University of Southern California and has co-authored
the book PMML in Action: Unleashing the Power of Open Standards for Data Mining and
Predictive Analytics, now in its second edition (paperback and
Kindle). You can follow him at @DrAlexGuazzelli.