Tag Archives: hadoop

Alpine Data Analytics App Works Directly Against Hadoop

The popular perception is that anything involving petabytes of data requires a lot of IT people and at least one data scientist to analyze it. In reality, however, analytics applications are scaling to the point where analysts can now analyze […]

Spatial Clustering With Equal Sizes

This is a problem I have encountered many times: the goal is to take a sample of spatial locations and apply constraints to the clustering algorithm. In addition to providing a predetermined number of K clusters, a fixed number of elements needs to be held constant within […]
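
The excerpt above is cut off before it describes a method, so the following is only a minimal sketch of one way to get K clusters of equal size, not the post's own algorithm. It assumes points are coordinate tuples and uses a greedy heuristic: rank every (point, centroid) pair by distance and assign each point to its nearest centroid that still has room. The function name `equal_size_clusters` and the `iterations` parameter are hypothetical.

```python
import math


def equal_size_clusters(points, k, iterations=10):
    """Label each point 0..k-1 with at most ceil(n / k) points per cluster."""
    n = len(points)
    capacity = math.ceil(n / k)
    # Assumption: evenly spaced input points serve as the initial centroids.
    centroids = [points[(i * n) // k] for i in range(k)]
    labels = [-1] * n
    for _ in range(iterations):
        # Rank every (point, centroid) pair by distance, closest first.
        pairs = sorted(
            (math.dist(p, c), i, j)
            for i, p in enumerate(points)
            for j, c in enumerate(centroids)
        )
        new_labels = [-1] * n
        sizes = [0] * k
        for _, i, j in pairs:
            # Take the closest pairing whose cluster still has room.
            if new_labels[i] == -1 and sizes[j] < capacity:
                new_labels[i] = j
                sizes[j] += 1
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its current members.
        for j in range(k):
            members = [points[i] for i in range(n) if labels[i] == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
labels = equal_size_clusters(points, k=3)
```

When the number of points is divisible by K, every cluster ends up with exactly n / K members; otherwise sizes differ by at most the rounding in the capacity. The greedy pass is a heuristic and does not guarantee the minimum total distance.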

MapR Moves to Secure Hadoop

When it comes to anything relating to Big Data, concerns about security are never far away. After all, concentrating massive amounts of data in one place can make for a very tempting target for hackers. At the Strata Conference […]

Stinger Initiative Brings SQL Users to Hadoop Via Apache Hive

Hadoop is big, but there’s no doubt that the game changer will be marrying SQL, the primary language used by business analysts for ad hoc analysis, with Hadoop. If you don’t want the information in Hadoop to […]

Creating a .NET-based Visual Monitoring System for Hadoop | .NET Zone

Summary: Generic Hadoop doesn’t provide any out-of-the-box visual monitoring systems that report on the status of all the nodes in a Hadoop cluster. This JNBridge Lab demonstrates how to create a .NET-based monitoring application that utilizes an existing Microsoft Windows product to provide a snapshot of the entire Hadoop […]

Using Amazon’s Elastic MapReduce to Compute Recommendations with Apache Mahout 0.8

Apache Mahout is a “scalable machine learning library” which, among other things, contains implementations of various single-node and distributed recommendation algorithms. In my last blog post, I described how to implement an online recommender system that processes data on a single node. What if the data is too large to fit […]

Hadoop 2.0: With YARN, the Game Changes

I’ve been working with Hadoop for a few years now, and the platform and ecosystem have been advancing at an amazing pace, with new features and additional capabilities appearing almost daily. Some changes are small, like better scheduling in Oozie; some are still progressing, like […]

Image Search with Splunk and Hunk

One of the sexy new features Hunk brings to the Splunk 6 smorgasbord is preprocessing data. Since Hunk is built on top of Hadoop’s MapReduce framework, we can utilize its preprocessing framework. Basically, now you can take any data, write a piece of code that turns it into text, […]

Configuring Hadoop with Guava MapSplitters

In this post, we are going to provide a new twist on passing configuration parameters to a Hadoop Mapper via the Context object. Typically, we set configuration parameters as key/value pairs on the Context object when starting a map-reduce job. Then in the Mapper, we use the key(s) […]

Enabling Interactive Data Exploration over Big Data

By Ugur Cetintemel, Brown University. Interactive Data Exploration (IDE) has been a recent focus area of the Brown Data Management Group. This is an emerging form of data-intensive analytics in which users ask questions over a data set to make sense of the data, identify interesting […]