Welcome!

IBM Cloud Authors: Liz McMillan, Jeev Trika, Yeshim Deniz, Elizabeth White, John Esposito

Blog Feed Post

Hadoop & NoSQL – Friends, not frenemies (Published in SDTimes, January 7, 2014)

The term Big Data is an all-encompassing phrase that has various subdivisions addressing different needs of the customers. The most common description of Big Data talks about the four V’s: Volume, Velocity, Variety and Veracity.Volume represents terabytes to exabytes of data, but this is data at rest. Velocity talks about streaming data requiring milliseconds to seconds of response time and is about data in motion. Variety is about data in many forms: structured, unstructured, text, spatial, and multimedia. Finally, veracity means data in doubt arising out of inconsistencies, incompleteness and ambiguities.Hadoop is the first commercial version of Internet-scale supercomputing, akin to what HPC (high-performance computing) has done for the scientific community. It performs, and is affordable, at scale. No wonder it originated with companies operating at Internet scale, such as Yahoo in the 1990s, and then at Google, Facebook and Twitter.

In the scientific community, HPC was used for meteorology (weather simulation) and for solving engineering equations. Hadoop is used more for discovery and pattern matching. The underlying technology is similar: clustering, parallel processing and distributed file systems. Hadoop addresses the “volume” aspect of Big Data, mostly for offline analytics.

NoSQL products such as MongoDB address the “variety” aspect of Big Data: how to represent different data types efficiently with humongous read/write scalability and high availability for transactional systems operating in real time. The existing RDBMS solutions are inadequate to address this need with their schema rigidity and lack of scale-out solutions at low cost. Therefore, Hadoop and NoSQL are complementary in nature and do not compete at all.

Whether data is in NoSQL or RDBMS databases, Hadoop clusters are required for batch analytics (using its distributed file system and Map/Reduce computing algorithm). Several Hadoop solutions such as Cloudera’s Impala or Hortonworks’ Stinger, are introducing high-performance SQL interfaces for easy query processing.

Hadoop’s low cost and high efficiency has made it very popular. As an example, Sears’ process for analyzing marketing campaigns for loyalty club members used to take six weeks on mainframe, Teradata and SAS servers. The new process running on Hadoop can be completed weekly.
The Hadoop systems, at 200TB, cost about one-third of 200TB relational platforms. Mainframe costs have been reduced by more than US$500,000 per year while delivering 50x to 100x better performance on batch jobs. The volume of data on Hadoop is currently at 2PB. Sears uses Datameer, a spreadsheet-style tool that supports data exploration and visualization directly on Hadoop. It claims to develop interactive reports in three days, a process that use to take six to 12 weeks.

NoSQL products such as MongoDB are getting hugely popular in the developer community. They seamlessly blend with modern programming languages like JavaScript, Ruby and Python, thus imparting high coding velocity. This simplicity has made them very popular in a short amount of time.

With RDBMS, there was impedance mismatch when an object-oriented programming model had to map to the row-column structure of the database (like translating Swahili to French). The rich data model can handle varieties of data with full indexing and ad hoc query capabilities.

The other reason is its ability to scale horizontally over commodity servers and provide massively parallel processing. This aspect is similar to Hadoop’s distributed architecture. However, NoSQL has to deal with the operational aspects of production databases running on premise or in the cloud, whereas Hadoop basically operates in offline batch mode for analysis.

NoSQL is used by large enterprises to build “systems of engagement.” Enterprise IT has spent decades building “systems of record” to run their business—essentially technology that contains a database. Now, CIOs are under pressure to build systems of engagement in which the focus is on using modern technology and the Internet to better communicate internally and externally.

One such system of engagement was recently built at MetLife, the 145-year old insurance company. The goal was to provide a 360-degree view of the customer (switching from a policy-centric view to a customer-centric view), whose information was scattered across 20 legacy systems of record. This way, any agent at MetLife can get a complete picture of a customer’s activities using a mobile device, anytime, from anywhere.

The entire system was developed and deployed in three months using the MongoDB platform. The reasons for the rapid deployment were attributed to MongoDB’s flexible data model, linear scaling via its sharding architecture, high coding velocity, and iterative development using JSON.NoSQL and Hadoop have a peaceful coexistence. MongoDB, for example, offers a Hadoop connection pipe for easy movement of data between the two stores. Similarly, Oracle offers a connection for data movement between Hadoop and the Oracle DB. Future additions to Hadoop such as YARN and Tez are aimed at extending it for real-time data loading and queries, but not to solve the needs of mission-critical production systems (the domain of NoSQL).Jnan Dash is a technology visionary and executive consultant in Silicon Valley. He spent 10 years at Oracle and was the Group Vice President of Systems Architecture and Technology. Prior to joining Oracle, he spent 16 years at IBM in various positions, including in development of the DB2 family of products and leading IBM’s database architecture and technology efforts.


Read the original blog entry...

More Stories By Jnan Dash

Jnan Dash is Senior Advisor at EZShield Inc., Advisor at ScaleDB and Board Member at Compassites Software Solutions. He has lived in Silicon Valley since 1979. Formerly he was the Chief Strategy Officer (Consulting) at Curl Inc., before which he spent ten years at Oracle Corporation and was the Group Vice President, Systems Architecture and Technology till 2002. He was responsible for setting Oracle's core database and application server product directions and interacted with customers worldwide in translating future needs to product plans. Before that he spent 16 years at IBM. He blogs at http://jnandash.ulitzer.com.

@ThingsExpo Stories
The IoTs will challenge the status quo of how IT and development organizations operate. Or will it? Certainly the fog layer of IoT requires special insights about data ontology, security and transactional integrity. But the developmental challenges are the same: People, Process and Platform. In his session at @ThingsExpo, Craig Sproule, CEO of Metavine, will demonstrate how to move beyond today's coding paradigm and share the must-have mindsets for removing complexity from the development proc...
SYS-CON Events announced today TechTarget has been named “Media Sponsor” of SYS-CON's 18th International Cloud Expo, which will take place on June 7–9, 2016, at the Javits Center in New York City, NY, and the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. TechTarget is the Web’s leading destination for serious technology buyers researching and making enterprise technology decisions. Its extensive global networ...
SYS-CON Events announced today that MangoApps will exhibit at SYS-CON's 18th International Cloud Expo®, which will take place on June 7-9, 2016, at the Javits Center in New York City, NY. MangoApps provides modern company intranets and team collaboration software, allowing workers to stay connected and productive from anywhere in the world and from any device. For more information, please visit https://www.mangoapps.com/.
SYS-CON Events announced today that Commvault, a global leader in enterprise data protection and information management, has been named “Bronze Sponsor” of SYS-CON's 18th International Cloud Expo, which will take place on June 7–9, 2016, at the Javits Center in New York City, NY, and the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Commvault is a leading provider of data protection and information management...
The essence of data analysis involves setting up data pipelines that consist of several operations that are chained together – starting from data collection, data quality checks, data integration, data analysis and data visualization (including the setting up of interaction paths in that visualization). In our opinion, the challenges stem from the technology diversity at each stage of the data pipeline as well as the lack of process around the analysis.
SYS-CON Events announced today that Alert Logic, Inc., the leading provider of Security-as-a-Service solutions for the cloud, will exhibit at SYS-CON's 18th International Cloud Expo®, which will take place on June 7-9, 2016, at the Javits Center in New York City, NY. Alert Logic, Inc., provides Security-as-a-Service for on-premises, cloud, and hybrid infrastructures, delivering deep security insight and continuous protection for customers at a lower cost than traditional security solutions. Ful...
A strange thing is happening along the way to the Internet of Things, namely far too many devices to work with and manage. It has become clear that we'll need much higher efficiency user experiences that can allow us to more easily and scalably work with the thousands of devices that will soon be in each of our lives. Enter the conversational interface revolution, combining bots we can literally talk with, gesture to, and even direct with our thoughts, with embedded artificial intelligence, wh...
In his session at 18th Cloud Expo, Bruce Swann, Senior Product Marketing Manager at Adobe, will discuss how the Adobe Marketing Cloud can help marketers embrace opportunities for personalized, relevant and real-time customer engagement across offline (direct mail, point of sale, call center) and digital (email, website, SMS, mobile apps, social networks, connected objects). Bruce Swann has more than 15 years of experience working with digital marketing disciplines like web analytics, social med...
Designing IoT applications is complex, but deploying them in a scalable fashion is even more complex. A scalable, API first IaaS cloud is a good start, but in order to understand the various components specific to deploying IoT applications, one needs to understand the architecture of these applications and figure out how to scale these components independently. In his session at @ThingsExpo, Nara Rajagopalan is CEO of Accelerite, will discuss the fundamental architecture of IoT applications, ...
SYS-CON Events announced today that Tintri Inc., a leading producer of VM-aware storage (VAS) for virtualization and cloud environments, will exhibit at the 18th International CloudExpo®, which will take place on June 7-9, 2016, at the Javits Center in New York City, New York, and the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA.
SYS-CON Events announced today that ContentMX, the marketing technology and services company with a singular mission to increase engagement and drive more conversations for enterprise, channel and SMB technology marketers, has been named “Sponsor & Exhibitor Lounge Sponsor” of SYS-CON's 18th Cloud Expo, which will take place on June 7-9, 2016, at the Javits Center in New York City, New York. “CloudExpo is a great opportunity to start a conversation with new prospects, but what happens after the...
SYS-CON Events announced today that EastBanc Technologies will exhibit at SYS-CON's 18th International Cloud Expo®, which will take place on June 7-9, 2016, at the Javits Center in New York City, NY. EastBanc Technologies has been working at the frontier of technology since 1999. Today, the firm provides full-lifecycle software development delivering flexible technology solutions that seamlessly integrate with existing systems – whether on premise or cloud. EastBanc Technologies partners with p...
WebRTC is bringing significant change to the communications landscape that will bridge the worlds of web and telephony, making the Internet the new standard for communications. Cloud9 took the road less traveled and used WebRTC to create a downloadable enterprise-grade communications platform that is changing the communication dynamic in the financial sector. In his session at @ThingsExpo, Leo Papadopoulos, CTO of Cloud9, will discuss the importance of WebRTC and how it enables companies to fo...
The IoT is changing the way enterprises conduct business. In his session at @ThingsExpo, Eric Hoffman, Vice President at EastBanc Technologies, discuss how businesses can gain an edge over competitors by empowering consumers to take control through IoT. We'll cite examples such as a Washington, D.C.-based sports club that leveraged IoT and the cloud to develop a comprehensive booking system. He'll also highlight how IoT can revitalize and restore outdated business models, making them profitable...
SYS-CON Events announced today the How to Create Angular 2 Clients for the Cloud Workshop, being held June 7, 2016, in conjunction with 18th Cloud Expo | @ThingsExpo, at the Javits Center in New York, NY. Angular 2 is a complete re-write of the popular framework AngularJS. Programming in Angular 2 is greatly simplified. Now it’s a component-based well-performing framework. The immersive one-day workshop led by Yakov Fain, a Java Champion and a co-founder of the IT consultancy Farata Systems and...
What a difference a year makes. Organizations aren’t just talking about IoT possibilities, it is now baked into their core business strategy. With IoT, billions of devices generating data from different companies on different networks around the globe need to interact. From efficiency to better customer insights to completely new business models, IoT will turn traditional business models upside down. In the new customer-centric age, the key to success is delivering critical services and apps wit...
Join us at Cloud Expo | @ThingsExpo 2016 – June 7-9 at the Javits Center in New York City and November 1-3 at the Santa Clara Convention Center in Santa Clara, CA – and deliver your unique message in a way that is striking and unforgettable by taking advantage of SYS-CON's unmatched high-impact, result-driven event / media packages.
In his keynote at 18th Cloud Expo, Andrew Keys, Co-Founder of ConsenSys Enterprise, will provide an overview of the evolution of the Internet and the Database and the future of their combination – the Blockchain. Andrew Keys is Co-Founder of ConsenSys Enterprise. He comes to ConsenSys Enterprise with capital markets, technology and entrepreneurial experience. Previously, he worked for UBS investment bank in equities analysis. Later, he was responsible for the creation and distribution of life ...
SYS-CON Events announced today that BMC Software has been named "Siver Sponsor" of SYS-CON's 18th Cloud Expo, which will take place on June 7-9, 2015 at the Javits Center in New York, New York. BMC is a global leader in innovative software solutions that help businesses transform into digital enterprises for the ultimate competitive advantage. BMC Digital Enterprise Management is a set of innovative IT solutions designed to make digital business fast, seamless, and optimized from mainframe to mo...
SYS-CON Events announced today that MobiDev will exhibit at SYS-CON's 18th International Cloud Expo®, which will take place on June 7-9, 2016, at the Javits Center in New York City, NY. MobiDev is a software company that develops and delivers turn-key mobile apps, websites, web services, and complex software systems for startups and enterprises. Since 2009 it has grown from a small group of passionate engineers and business managers to a full-scale mobile software company with over 200 develope...