Tag Archives: hadoop

Hadoop's Second Generation Offers More To Enterprises

Hadoop is one of the most disruptive recent innovations in enterprise IT. The promise is to turn the ever-growing tide of data into profit. Even just in my own industry, telecommunications and media, […]

The Uncertainty of Predictions

Prediction vs. Confidence Intervals

There are many kinds of intervals in statistics. To name a few of the common ones: confidence intervals, prediction intervals, credible intervals, and tolerance intervals. Each is useful and serves its own purpose. I’ve recently been working on a couple of projects that involve making […]
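As a rough illustration of how the first two intervals differ, here is the standard textbook result for simple linear regression; this is an assumed setting, not necessarily the one used in the original post. Assume a line $\hat{y}_0 = b_0 + b_1 x_0$ fitted to $n$ points $(x_i, y_i)$, with residual standard error $s = \sqrt{\mathrm{SSE}/(n-2)}$ and Student's $t$ quantile $t_{1-\alpha/2,\,n-2}$:

```latex
\begin{aligned}
\text{CI for the mean response at } x_0: &\quad
  \hat{y}_0 \pm t_{1-\alpha/2,\,n-2}\, s\,
  \sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}} \\
\text{PI for a new observation at } x_0: &\quad
  \hat{y}_0 \pm t_{1-\alpha/2,\,n-2}\, s\,
  \sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}}
\end{aligned}
```

The extra $1$ under the square root accounts for the scatter of a single new observation around its mean. As a result, the prediction interval is always wider than the confidence interval, and it does not shrink toward zero width as $n$ grows.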

Splunk Serves Up Big Data Accessibility

Big Data analysis is fairly useless unless the insight garnered from such an investment can be easily consumed across the enterprise. With that goal in mind, at the Splunk Worldwide 2013 Users Conference, Splunk […]

Hunk Setup using Hortonworks Hadoop Sandbox

Hortonworks Sandbox is a personal, portable Hadoop environment that comes with a dozen interactive Hadoop examples. Recently, Hortonworks and Splunk released a tutorial and video on installing Hunk and connecting it to the Hortonworks Hadoop Sandbox. This blog post summarizes the configurations used as part of the Hunk setup. Configurations for Hadoop […]

Podcast: Big Data, Open Source and Analytics

Episode #15 of the podcast, also available on iTunes, is a talk with Stefan Groschupf. Stefan is the CEO of Datameer and talked about how the company started and where it is now. Founded in 2009 by some of the original contributors to Apache Hadoop, Datameer has […]

Why the Big in Big Data Is a Distraction

Analyzing exactly what you want to analyze

The benefit of big data tools such as Hadoop and MapReduce is that they can help you pick data up and analyze it faster. Instead of taking days, it can take minutes. Of course, this is useful, but you have to take […]

Python + Hadoop: Real Python in Pig trunk

For a long time, data scientists and engineers had to choose between leveraging the power of Hadoop and using Python’s amazing data science libraries (like NLTK, NumPy, and SciPy). It’s a painful decision, and one we thought should be eliminated. So about a year ago, we solved this problem […]

Full Metal Hadoop - Christian Prokopp | Big Data Republic

Initially, do-it-yourself distributions such as Cloudera, MapR, and Hortonworks made up a large part of the market. In recent years, following the success of Amazon Elastic MapReduce (EMR), Hadoop/data services like Qubole have become popular. Qubole in particular has highlighted its advantages over EMR. (See my From Zero to Big Data […]

Big Data Investments Currently Earn 50 Cents For Every Dollar Invested

As investments go, Big Data is currently dramatically underperforming the market. Driven in part by media hype, enterprises are investing in Big Data well before they understand how to derive value from it. Small wonder, then, that a new analyst report indicates that enterprises are deriving […]

Tomorrow’s future is here today – HDP 2.0 Beta Sandbox

Albert Einstein is credited with saying that he doesn’t worry about the future because it would arrive soon enough. We don’t worry about the future either; we focus on building it. And today, we are delighted to release the Hortonworks Data Platform 2.0 Beta Sandbox. This is the […]