- Akka Streams is a library toolkit that provides low-latency stream processing semantics for building highly concurrent, distributed, and fault-tolerant event-driven applications. It follows the Reactive Streams specification, is implemented internally on top of an Akka actor system, and provides a high-level API geared towards efficient processing.
- Akka is a JVM-based toolkit for building actor systems.
- Actors can be seen as dealing with streams as well: they send and receive series of messages in order to transfer data from one place to another.
Why Akka Streams?
- Akka actors communicate by sending messages back and forth asynchronously in order to transfer data. This is a low-level communication model.
- We can implement any kind of streaming semantics using raw Akka actors, but there are a few problems:
- it is fairly difficult to model high-level abstract streaming semantics;
- it requires a lot of boilerplate code;
- when streaming data, a fast sender can overflow buffers or mailboxes, leading to OutOfMemoryErrors. Another pitfall is that actor messages can be lost and must be re-transmitted in that case.
Due to these reasons, the Akka team implemented a fully functional stream processing toolkit called Akka Streams.
Reactive Streams is an initiative to provide a standard for asynchronous stream processing with non-blocking back-pressure.
Relationship with Reactive Streams
- Akka Streams is a library toolkit that provides low-latency stream processing semantics on top of Reactive Streams.
- The Akka Streams API is completely decoupled from the Reactive Streams interfaces.
- The relationship between the two is that the Akka Streams API is geared towards end users, while the Akka Streams implementation uses the Reactive Streams interfaces internally to pass data between the different operators.
- While Akka Streams focuses on the formulation of transformations on data streams, the scope of Reactive Streams is to define a common mechanism for moving data across an asynchronous boundary without losses, buffering, or resource exhaustion.
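As a small sketch of this decoupling, the Reactive Streams interfaces surface only at the edges of an Akka Streams graph, via interop operators such as `Sink.asPublisher` and `Source.fromPublisher` (the object name `ReactiveStreamsInterop` is just an illustrative choice):

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import org.reactivestreams.Publisher

object ReactiveStreamsInterop extends App {
  // Since Akka 2.6, an implicit ActorSystem provides the materializer.
  implicit val system: ActorSystem = ActorSystem("interop")

  // Expose an Akka Streams Source as a Reactive Streams Publisher...
  val publisher: Publisher[Int] =
    Source(1 to 3).runWith(Sink.asPublisher(fanout = false))

  // ...and consume any Reactive Streams Publisher back as a Source.
  Source.fromPublisher(publisher)
    .runForeach(println)
    .onComplete(_ => system.terminate())(system.dispatcher)
}
```

Inside the graph, elements flow between operators through Akka's own internal machinery; the standard interfaces only matter when crossing into or out of another Reactive Streams implementation.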
- Implemented using Akka actors.
Why back pressure?
Akka Streams implement an asynchronous non-blocking back-pressure protocol standardised by the Reactive Streams specification.
1. Slow Publisher & Fast Consumer
This is the happy case. We don’t need to slow down the publisher in this case.
2. Fast Publisher & Slow Consumer
This is the case that back-pressure comes into the picture.
Back-pressure exists to handle the fast-producer/slow-consumer problem. A fast producer can overwhelm any of your downstream systems, causing OutOfMemoryErrors or data loss.
In this scenario, the downstream internally signals demand upstream, asking the producer to slow down so it does not overwhelm the consumer with messages or data.
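A minimal sketch of this demand-driven behaviour, using `throttle` to stand in for a slow consumer (the producer range and rate are arbitrary illustration values):

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.duration._

object BackPressureDemo extends App {
  implicit val system: ActorSystem = ActorSystem("backpressure")

  // A fast producer of many elements...
  Source(1 to 100)
    // ...forced through a slow stage: throttle simulates the slow consumer.
    .throttle(10, 1.second)
    .runWith(Sink.foreach(n => println(s"consumed $n")))
  // Demand is signalled upstream element by element, so the source
  // only emits as fast as the slow stage can accept; memory stays bounded.
}
```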
Akka Stream Basic Components
Akka Streams applies the Reactive Streams principles and helps to build flexible, loosely coupled, scalable systems that communicate via asynchronous messaging.
We can easily build complex data processing pipelines with Akka Streams because it handles back-pressure, concurrency, distribution, fault tolerance, etc.
Source ~> Flow = Source
Flow ~> Sink = Sink
Flow ~> Flow = Flow
Broadcast ~> Merge = Flow
Source ~> Sink = RunnableGraph
Source ~> Flow ~> Sink = RunnableGraph
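The composition rules above map directly onto the Scala DSL; a sketch (names like `double` are illustrative):

```scala
import akka.stream.scaladsl.{Flow, RunnableGraph, Sink, Source}

object Composition {
  val source: Source[Int, _]    = Source(1 to 10)
  val double: Flow[Int, Int, _] = Flow[Int].map(_ * 2)
  val sink: Sink[Int, _]        = Sink.foreach[Int](println)

  // Source ~> Flow = Source
  val newSource: Source[Int, _] = source.via(double)
  // Flow ~> Sink = Sink
  val newSink: Sink[Int, _] = double.to(sink)
  // Flow ~> Flow = Flow
  val newFlow: Flow[Int, Int, _] = double.via(double)
  // Source ~> Flow ~> Sink = RunnableGraph
  val graph: RunnableGraph[_] = source.via(double).to(sink)
}
```

Note that nothing runs yet: a `RunnableGraph` is only a blueprint until it is materialized.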
Akka Streams and Akka Actors
- We can only interact with an actor by putting messages into its mailbox. The actor system then ensures the actor processes a single message at a time.
- Each actor has exactly one mailbox, and the mailbox delivers messages in FIFO order.
- Each of these graph stages, like an actor, processes only one message at a time, which makes it easy to model concurrency in actor systems: we don't need to worry about multiple threads closing over the same state.
- Just as multiple threads give us concurrency, multiple actors give us concurrency.
- These Source, Flow, and Sink components run on actors.
We must provide a Materializer to run these streams. An implicit ActorSystem can serve as the materializer parameter: it is resolved via the SystemMaterializer, which runs the stream.
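In code, this means an implicit `ActorSystem` in scope is enough to run a stream (no explicit `ActorMaterializer` is needed since Akka 2.6); a minimal sketch:

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}

object RunWithSystemMaterializer extends App {
  // The implicit ActorSystem is resolved to a Materializer through
  // SystemMaterializer when the stream is run.
  implicit val system: ActorSystem = ActorSystem("streams")

  Source(1 to 5)
    .map(_ * 2)
    .runWith(Sink.foreach(println))
}
```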
By default (Example 1), the source, flow, and sink actually run on the same actor. There is no back-pressure here; every element is processed sequentially.
As a consequence, a slow sink will slow down the entire stream.
To allow parallel processing, we have to insert asynchronous boundaries manually into our flows and operators by using the method `async` on Source, Flow, and Sink. An operator marked this way communicates with the downstream part of the graph in an asynchronous fashion.
According to this example, the source and flow run on one actor and the sink runs on another actor. We have used the `async` method to insert an asynchronous boundary into the graph.
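A sketch of such a boundary (the `println` labels are illustrative; with the boundary in place, the two `map` stages run on separate actors and their log lines interleave):

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}

object AsyncBoundary extends App {
  implicit val system: ActorSystem = ActorSystem("async-demo")

  Source(1 to 100)
    .map { n => println(s"upstream   $n"); n }
    .async // asynchronous boundary: stages above and below run on separate actors
    .map { n => println(s"downstream $n"); n }
    .runWith(Sink.ignore)
}
```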
In this case, the flow has received 16 elements: the default capacity of an operator's internal buffer is 16. When the flow's buffer is full, it signals back-pressure to the source. In the meantime, the flow maintains the throughput of the stream by serving elements from its buffer; that is why elements start reaching the sink only after the flow's internal buffer has filled up.
As the buffered elements drain, the flow notices that there are empty buffer slots and starts requesting more elements from the source. When the buffer is full again, it signals back-pressure to the source again.
In this example we create two regions within the flow, each executed by its own actor: the source and the adding flow run on one actor, while the multiplying flow and the sink run on a different actor. This means adding and multiplying can work in parallel.
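The two-region pipeline described above can be sketched like this (the exact operands are illustrative):

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}

object TwoRegions extends App {
  implicit val system: ActorSystem = ActorSystem("two-regions")

  Source(1 to 100)
    .map(_ + 1) // "adding" flow: fused with the source onto one actor
    .async      // boundary: everything below runs on a different actor
    .map(_ * 2) // "multiplying" flow: runs in parallel with the adding stage
    .runWith(Sink.foreach(println))
}
```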
The internal buffer has a default capacity of 16; that is why batches of 16 messages are printed in the logs when back-pressure comes into the picture.
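If the 16-element batching is undesirable, the internal buffer size can be tuned per operator via stream attributes; a sketch, assuming a buffer of one is wanted so back-pressure is signalled after every element:

```scala
import akka.actor.ActorSystem
import akka.stream.Attributes
import akka.stream.scaladsl.{Sink, Source}

object BufferSize extends App {
  implicit val system: ActorSystem = ActorSystem("buffer-demo")

  Source(1 to 100)
    .map(_ + 1)
    .async
    // Shrink the default 16-element internal buffer down to 1.
    .addAttributes(Attributes.inputBuffer(initial = 1, max = 1))
    .runWith(Sink.foreach(println))
}
```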
In this example, the source and adding flow run on one actor, while the multiplying flow and sink run on different actors. As the logs show, the multiplying flow and the sink have run in parallel.