Jowanza Joseph


Streaming Systems Book Review

tl;dr 

Streaming Systems is an excellent book, but it focuses on stream processing and not necessarily on end-to-end streaming systems, which some readers may find disappointing.

Authors





Tyler Akidau, Slava Chernyak, and Reuven Lax are all engineers at Google. They have worked on everything from designing and developing Apache Beam to working on the stream processing strategy at Google. Reading through their bios, I get the sense that they are not only intimately familiar with the challenges of stream processing but also have a strong understanding of the tradeoffs in each approach to it.

There is some variance in the writing style of the authors, and it shows throughout the book. I found the stylistic changes from chapter to chapter refreshing. If I found a section too pedantic, I usually got some clarification in a later chapter by another author.

Content

The book has a nice mix of written content, code samples, images, and videos (you have to use a digital source to get the videos). These formats fit together well, and you can sometimes use them interchangeably. For example, when I didn't understand a concept from the text, I could watch a video and it would make sense. Other times the video was harder to follow, and reading through a code snippet cleared things up.

An example figure from the book


One knock on the book, in my view, is that most of the code examples use the Beam SDK. The choice makes sense given the authors' backgrounds, but I think the book would have benefited from code snippets in a more widely used framework like Apache Flink.

The authors use the stream processing models of Apache Flink and Apache Spark as a contrast to the Beam or Cloud Dataflow approach, and these comparisons are some of the best moments in the book. These passages help to solidify the tradeoffs discussed throughout the chapter and provide a perspective outside of that offered by the Beam/Cloud Dataflow implementation.

One puzzling editorial decision is the inclusion of a final chapter on the evolution of large-scale data processing. Some of the context in this chapter would have been valuable earlier in the book, and other parts of the chapter felt superfluous. I understand the connection between MapReduce and stream processing, but the placement felt strange.


Pace

The pacing of this book is one of the things I like best about it. The authors break down the problem of streaming step by step, starting from the motivation and ending with how SQL on streams works from an implementation perspective. They are systematic and provide thorough coverage of complex topics like watermarks, exactly-once processing, and streaming joins within reasonable chapter sizes.

Style 

I would best classify the writing style as not for everyone. I read most of this book as part of a book club, and some of my co-workers found it wordy and academic. An example of this is how the authors define a watermark:

The watermark is a monotonically increasing timestamp of the oldest work not yet completed.

I can comprehend this with the context I have with streaming systems, but others in the group struggled with this definition and sought definitions elsewhere. Outside of rare moments like this, I find the writing style to be approachable and beginner-friendly. The authors have a sense of topics that are harder to grok, and they spend more time on those subjects, helping the reader to solidify the concepts in their minds. 
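To make the watermark definition above a little more concrete, here is a minimal sketch in Python. It is not how Beam or Cloud Dataflow implement watermarks; the `WatermarkTracker` class and its method names are hypothetical, and it assumes every piece of work carries a single numeric event timestamp. The idea is only to show the two halves of the definition: the watermark follows the oldest work that has not yet completed, and it never moves backward.

```python
class WatermarkTracker:
    """Toy watermark: oldest event time still in flight, never decreasing.

    Hypothetical illustration only; real systems usually estimate this
    value heuristically rather than tracking every element exactly.
    """

    def __init__(self):
        self._pending = {}                 # work id -> event timestamp
        self._max_seen = float("-inf")     # newest event time observed so far
        self._watermark = float("-inf")    # last watermark we reported

    def start(self, work_id, event_time):
        """Register a piece of work that has arrived but is not finished."""
        self._pending[work_id] = event_time
        self._max_seen = max(self._max_seen, event_time)

    def complete(self, work_id):
        """Mark a piece of work as finished."""
        del self._pending[work_id]

    def watermark(self):
        """Return the current watermark."""
        # Oldest unfinished work; if nothing is pending, assume we are
        # caught up to the newest event time seen so far.
        oldest = min(self._pending.values(), default=self._max_seen)
        # Monotonic: the watermark may stall, but it never goes backward.
        self._watermark = max(self._watermark, oldest)
        return self._watermark

    def is_late(self, event_time):
        """An element is late if it is older than the reported watermark."""
        return event_time < self._watermark


tracker = WatermarkTracker()
tracker.start("a", 10)
tracker.start("b", 12)
tracker.start("c", 15)
print(tracker.watermark())   # 10: "a" is the oldest unfinished work

tracker.complete("b")
print(tracker.watermark())   # 10: still held back by "a"

tracker.complete("a")
print(tracker.watermark())   # 15: only "c" remains

print(tracker.is_late(11))   # True: 11 is behind the watermark of 15
```

Read this way, the watermark is a statement about progress: once it passes a given timestamp, the system expects no more on-time data older than that timestamp, and anything that shows up afterward counts as late.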

Audience

I imagine the minimum requirement for reading this book is familiarity with large-scale data processing. For example, the early chapters cite large-scale data processing systems as motivation for the design and implementation of streaming systems. Additionally, the book is filled with code snippets (in Java), so some ability to read code is required as well.

A high-level illustration of streaming data


Conclusion

Streaming Systems is an excellent book for anyone who wants to gain a firm grasp of the challenges in stream processing systems. Despite having been entrenched for many years in the stream processing systems covered in the book, I took away something tangible from almost every chapter.

Bonus

Streaming 101

Streaming 102

4.5/5