Jathena: An Open Source Amazon Athena

Back in July, I presented my work on an open source version of Amazon Athena at Open West. You can find the slides at the end of this post, along with a link to the GitHub repo, which has all of the configurations you need and a walkthrough video showing how to use it. This blog post is a written record of my journey and some of the gotchas I ran into along the way.

Partitions in Apache Spark

One of the most important things to learn about Spark is that it's not magic: the framework still adheres to the rules of computer science. In other words, you can still write plenty of unoptimized workflows and get poor performance as a result. Understanding how Spark works under the hood, even at a cursory level, can help you write better Spark applications.

Read More