Tag Archives: hadoop

New in HDP 2: More Powerful Scheduling Options in Oozie

In this post we’ll cover some new scheduling options available via Apache Oozie in HDP 2. You can try out these capabilities in HDP 2 Beta and HDP 2 Beta Sandbox. What Is Oozie Again? Apache Oozie is a workflow engine and scheduler for Hadoop. Oozie allows you […]

Cascading Architecture

Zero to Predictive Models in Minutes

This is a guest blog post from Gary Nakamura, CEO at our partner Concurrent, Inc., discussing Cascading Pattern and the new Hadoop tutorial they have written for the Hortonworks Sandbox. This is one of the first tutorials aimed at a more experienced crowd. Enjoy! Cascading Pattern: Deploy Predictive Models […]

Blending Data Helps Gain Big Data Business Insights in a Structured IT World

Loraine Lawson | Integration | 17 Sep, 2013 Combining operational data from other sources, particularly Big Data sets, is generating a lot of discussion as a “next step” for companies investing in Big Data. So it’s not surprising that Pentaho’s […]

Benchmarking Graph Databases

By Alekh Jindal, MIT CSAIL Graph data management has recently received a lot of attention, particularly with the explosion of social media and other complex, inter-dependent datasets. As a result, a number of graph data management systems have been proposed. But this brings us to the question: What […]

Laplace the Bayesianista and the Mass of Saturn

I’m reviewing Bayes’ theorem and related topics for the upcoming GDAT class. In its simplest form, Bayes’ theorem is a statement about conditional probabilities. The probability of A, given that B has occurred, is expressed as: \begin{equation} \Pr(A|B) = \dfrac{\Pr(B|A) \times \Pr(A)}{\Pr(B)} \label{eqn:bayes} \end{equation} In Bayesian language, $\Pr(A|B)$ is called the […]

Social Media Marketing: How Big Data is Changing Everything

Every second of every day, Big Data gets bigger. Social media alone generates endless streams of data, flowing in from Facebook, Twitter, Pinterest and other social sites like never before. Fortunately, sophisticated analytics platforms have arrived on the scene to help social media marketers manage, analyze and leverage large […]

Vendors Announce Search, Tools to Make Hadoop User-Friendly

Cloudera, which offers a distribution of Hadoop, announced this week a search for data stored in Hadoop Distributed File System and Apache HBase. The company says it’s the “industry’s first fully integrated search engine for interactive exploration” of data in these […]

SAP Gets Sexy with Hadoop Partnerships, Big Data Road Trip

Information Management, Analytics | What’s SAP doing at TechCrunch Disrupt? It’s not exactly an Enterprise audience. But then again, we’re living in an age of disruptive technologies: Big Data, Cloud, Social, Mobile, Geospatial visualizations and touchpoints, the Internet of Things […]

How to Plan and Configure YARN and MapReduce 2 in HDP 2.0

As part of HDP 2.0 Beta, YARN takes the resource management capabilities that were in MapReduce and packages them so they can be used by new engines. This also streamlines MapReduce to do what it does best: process data. With YARN, you can now run multiple applications in Hadoop, all […]

Apache Tez: A New Chapter in Hadoop Data Processing

In this post we introduce the motivation behind Apache Tez (http://incubator.apache.org/projects/tez.html) and provide some background around the basic design principles for the project. As Carter discussed in our previous post on Stinger progress, Apache Tez is a crucial component of phase 2 of that project. What […]