Blog

Posts in Data Engineering
Building a Real-Time Bike-Share Data Pipeline with StreamSets, Kafka and MapD

In this post, we will use the Ford GoBike Real-Time System, StreamSets Data Collector, Apache Kafka and MapD to build a real-time data pipeline that tracks bike availability across the Ford GoBike bikeshare ecosystem. We'll walk through the architecture and configuration that enable this pipeline and share a simple auto-updating dashboard in MapD Immerse.

Jathena: An Open Source Amazon Athena

Back in July, I presented at Open West the work I've been doing on an open source version of Amazon Athena. The slides and a link to the GitHub repo are at the end of this post. The repo contains all of the configuration you need, along with a walkthrough video showing how to use it. This post is a written account of my journey and some of the gotchas I ran into along the way.
