Tag Archives: map

Spark: Low Latency, Massively Parallel Processing Framework

While Hadoop fits most batch processing workloads well and is the primary choice for big data processing today, it is not optimized for other types of workloads because of its limitations. For a more detailed discussion of Hadoop's limitations, refer to my previous post. […]

CMWire's Top 10 Hits of 2013: Big Data

Yes, Big Data was a Big Buzzword in 2013. The technology and business press — and even mainstream media — got a piece of the action, churning out article after article about what Big Data means to you. And that’s part of the problem. Big Data means lots of […]

Using Amazon’s Elastic MapReduce to Compute Recommendations with Apache Mahout 0.8

Apache Mahout is a “scalable machine learning library” which, among other things, contains implementations of various single-node and distributed recommendation algorithms. In my last blog post, I described how to implement an online recommender system that processes data on a single node. What if the data is too large to fit […]

Make Big Data Portable: the Basics

By Soam Acharya. If you’re reading this, then you probably know that we’re very much pro-Hadoop-as-a-Service. Obviously, many organizations we speak to have concerns about the logistics of transporting all their data. While at first glance this process can appear intimidating, it’s actually a lot easier than many suspect, […]

Cloudera's Enterprise Data Hub Rises to the Call of Amazon's AWS

Someone joked at Strata and Hadoop World earlier this year that Cloudera was ahead of its time when it chose its name. “You should have called it On-premise era,” said the would-be comedian, referring to the fact that Cloudera and most other enterprise-grade Hadoop distros live […]

Intel Goes Graph with Hadoop Distro

By Alex Woodie, December 17, 2013. Intel will be targeting big retail operations with a new graph database that it unveiled today as part of its Intel Distribution for Apache Hadoop version 3 […]

How Ancestry.com Manages Generations Of Big Data

By Jeff Bertolucci. Over the past year, the genealogy site’s repository of family historical data has more than doubled in size. Here’s how Ancestry managed its growth. Businesses often use, or overuse, the term “big data” to describe all sorts of data-related products and services, but the buzzword […]

Are VCs Getting Duped by Hadoop?

By Robert Mullins. While Hadoop is touted as the next big thing, the next, NEXT big thing is something called the “semantic Web,” according to Charles Silver, CEO of Algebraix, which has developed what Silver calls the first commercially available, high-performance platform for […]

Introducing CrimeMap – A Web App Powered by ShinyApps!

A few months ago I did a mini project using open crime data and R to create crime visualisations. At that time, I was already thinking about building a web app with Shiny, but I couldn’t justify the time to develop the app and then set up a server […]

Five ways to handle Big Data in R

Five strategies to tackle big data with R. Big data was one of the biggest topics at this year’s useR conference in Albacete, and it is definitely one of today’s hottest buzzwords. But what defines “Big Data”? And on the practical side: How can big data be tackled in […]