
IBM Cloud Authors: Elizabeth White, Pat Romanski, Carmen Gonzalez, Jnan Dash, Yeshim Deniz

Blog Feed Post

Hadoop & NoSQL – Friends, not frenemies (Published in SDTimes, January 7, 2014)

The term Big Data is an all-encompassing phrase that has various subdivisions addressing different needs of the customers. The most common description of Big Data talks about the four V’s: Volume, Velocity, Variety and Veracity.

Volume represents terabytes to exabytes of data, but this is data at rest. Velocity talks about streaming data requiring milliseconds to seconds of response time and is about data in motion. Variety is about data in many forms: structured, unstructured, text, spatial, and multimedia. Finally, veracity means data in doubt arising out of inconsistencies, incompleteness and ambiguities.

Hadoop is the first commercial version of Internet-scale supercomputing, akin to what HPC (high-performance computing) has done for the scientific community. It performs, and is affordable, at scale. No wonder it originated with companies operating at Internet scale, such as Yahoo in the 1990s, and then at Google, Facebook and Twitter.

In the scientific community, HPC was used for meteorology (weather simulation) and for solving engineering equations. Hadoop is used more for discovery and pattern matching. The underlying technology is similar: clustering, parallel processing and distributed file systems. Hadoop addresses the “volume” aspect of Big Data, mostly for offline analytics.

NoSQL products such as MongoDB address the “variety” aspect of Big Data: how to represent different data types efficiently with humongous read/write scalability and high availability for transactional systems operating in real time. The existing RDBMS solutions are inadequate to address this need with their schema rigidity and lack of scale-out solutions at low cost. Therefore, Hadoop and NoSQL are complementary in nature and do not compete at all.

Whether data is in NoSQL or RDBMS databases, Hadoop clusters are required for batch analytics (using Hadoop’s distributed file system and MapReduce programming model). Several Hadoop solutions, such as Cloudera’s Impala and Hortonworks’ Stinger, are introducing high-performance SQL interfaces for easy query processing.
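To make the batch model concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps, using word counting as the job. It only illustrates the idea and does not use Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample records are assumptions made for this example. On a real cluster the same steps would run in parallel across many machines.

```python
from collections import defaultdict
from functools import reduce


def map_phase(record):
    """Emit (key, value) pairs for one input record: (word, 1) per word."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, reduce(lambda a, b: a + b, values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over a list of text records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input records, one string per record.
counts = run_job(["big data at rest", "data in motion", "Big Data in doubt"])
```

Because each call to `map_phase` looks at one record and each call to `reduce_phase` looks at one key, both steps can be spread over many workers, which is what makes the model suit large offline jobs.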

Hadoop’s low cost and high efficiency have made it very popular. As an example, Sears’ process for analyzing marketing campaigns for loyalty club members used to take six weeks on mainframe, Teradata and SAS servers. The new process running on Hadoop can be completed weekly.

The Hadoop systems, at 200TB, cost about one-third as much as 200TB relational platforms. Mainframe costs have been reduced by more than US$500,000 per year while delivering 50x to 100x better performance on batch jobs. The volume of data on Hadoop is currently at 2PB. Sears uses Datameer, a spreadsheet-style tool that supports data exploration and visualization directly on Hadoop. With it, interactive reports are said to take three days to develop, a process that used to take six to 12 weeks.

NoSQL products such as MongoDB are getting hugely popular in the developer community. They seamlessly blend with modern programming languages like JavaScript, Ruby and Python, thus imparting high coding velocity. This simplicity has made them very popular in a short amount of time.

With RDBMS, there was an impedance mismatch whenever an object-oriented programming model had to map to the row-column structure of the database (like translating Swahili to French). The rich document data model of products such as MongoDB avoids that translation and can handle varieties of data with full indexing and ad hoc query capabilities.
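The mismatch is easier to see side by side. The sketch below stores one nested object as a JSON document, then splits the same object into rows for two tables. The customer record, its field names and the `to_rows` helper are hypothetical, and JSON stands in for a document store's native format; no particular database API is used.

```python
import json

# A hypothetical application object with a nested list of policies.
customer = {
    "name": "Ada Example",
    "policies": [
        {"type": "life", "premium": 120.0},
        {"type": "auto", "premium": 80.5},
    ],
}

# Document style: the nested object is serialized and restored as one unit.
document = json.dumps(customer)
restored = json.loads(document)


def to_rows(obj, customer_id):
    """Relational style: split the object into one parent row and child rows."""
    customer_row = (customer_id, obj["name"])
    policy_rows = [
        (customer_id, policy["type"], policy["premium"])
        for policy in obj["policies"]
    ]
    return customer_row, policy_rows


customer_row, policy_rows = to_rows(customer, customer_id=1)
```

The document round trip needs no mapping code, while the relational form needs a helper like `to_rows` plus a matching join to rebuild the object on the way back.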

The other reason for NoSQL’s popularity is its ability to scale horizontally over commodity servers and provide massively parallel processing. This aspect is similar to Hadoop’s distributed architecture. However, NoSQL has to deal with the operational aspects of production databases running on premises or in the cloud, whereas Hadoop basically operates in offline batch mode for analysis.
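One common way to scale horizontally is to route each record to a server by hashing its key. The sketch below models that idea with in-memory dictionaries standing in for servers. It is an illustration under simplifying assumptions, not how any particular NoSQL product implements sharding; `shard_for`, `ShardedStore` and the key format are names invented for this example.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Toy key-value store that spreads keys over several in-memory shards."""

    def __init__(self, num_shards):
        # Each dict stands in for one commodity server.
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for i in range(100):
    store.put(f"customer-{i}", {"id": i})
```

Reads and writes for different keys land on different shards, so capacity grows by adding servers. A production system also has to handle rebalancing when the shard count changes, replication and failover, which this sketch leaves out.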

NoSQL is used by large enterprises to build “systems of engagement.” Enterprise IT has spent decades building “systems of record” to run their business—essentially technology that contains a database. Now, CIOs are under pressure to build systems of engagement in which the focus is on using modern technology and the Internet to better communicate internally and externally.

One such system of engagement was recently built at MetLife, the 145-year-old insurance company. The goal was to provide a 360-degree view of the customer (switching from a policy-centric view to a customer-centric view), whose information was scattered across 20 legacy systems of record. This way, any agent at MetLife can get a complete picture of a customer’s activities using a mobile device, anytime, from anywhere.

The entire system was developed and deployed in three months using the MongoDB platform. The rapid deployment was attributed to MongoDB’s flexible data model, linear scaling via its sharding architecture, high coding velocity, and iterative development using JSON.

NoSQL and Hadoop have a peaceful coexistence. MongoDB, for example, offers a Hadoop connection pipe for easy movement of data between the two stores. Similarly, Oracle offers a connection for data movement between Hadoop and the Oracle DB. Future additions to Hadoop such as YARN and Tez are aimed at extending it for real-time data loading and queries, but not to solve the needs of mission-critical production systems (the domain of NoSQL).

Jnan Dash is a technology visionary and executive consultant in Silicon Valley. He spent 10 years at Oracle and was the Group Vice President of Systems Architecture and Technology. Prior to joining Oracle, he spent 16 years at IBM in various positions, including in development of the DB2 family of products and leading IBM’s database architecture and technology efforts.




Jnan Dash is Senior Advisor at EZShield Inc., Advisor at ScaleDB and Board Member at Compassites Software Solutions. He has lived in Silicon Valley since 1979. Formerly he was the Chief Strategy Officer (Consulting) at Curl Inc., before which he spent ten years at Oracle Corporation and was the Group Vice President, Systems Architecture and Technology till 2002. He was responsible for setting Oracle's core database and application server product directions and interacted with customers worldwide in translating future needs to product plans. Before that he spent 16 years at IBM. He blogs at http://jnandash.ulitzer.com.
