Livy provides an interesting way to use Spark as a RESTful service. In my opinion, this is not an ideal way to interact with Spark, however. There is just a tad too much overhead of language interoperability to make it worth it. For starters, sending strings of Scala code over the wire doesn’t inspire a lot of confidence.
Read MoreFor the last year or so I’ve been blogging regularly about the Apache Spark platform. During that time, Spark has grown from something that people in data science and engineering have used to something that is almost ubiquitous. I’ve enjoyed working with the platform professionally, and even on a number of personal projects.
Read MoreI’ve been into this home automation thing for some time now. Any device on the market, I’ve most likely tried it already and there is an equally good chance that there is one functioning in my house. Most of the home automation products available for the mass market are still pretty user-hostile and ever so expensive.
Read MoreGraphFrames allow us to do exactly this. It’s an API for doing Graph Analytics on Spark DataFrames. This way, we can try to recreate SQL queries in Graphs and have a better grasp of the graph concepts. Not having to load the data and create the relationships makes a lot of difference in a pedagogical context (At least I’ve found).
Read MoreThe past few weeks I’ve been testing Amazon Athena as an alternative to standing up Hadoop and Spark for ad hoc analytical queries. During that research, I’ve been looking closely at file formats for the style of data stored in S3 for Athena. I have typically been happy with Apache Parquet as my go-to, because of it’s popularity and guarantees, but some research pointed me to Apache ORC and it’s advantages in this context.
Read More