Diagnose and decrease AWS Lambda iterator age for stream-based invocations.
AWS Lambda can use stream-based services as invocation sources, essentially making your Lambda function a consumer of streams such as Kafka, Kinesis, or DynamoDB Streams.
If you are dealing with high-volume data, you will probably have to fine-tune your Lambda function so that it processes incoming data as soon as possible and keeps the iterator age low.
What is iterator age
One of the most important metrics for the latency of your Lambda function is the iterator age. AWS defines it as “the time between when the last record in a batch was recorded and when Lambda reads the record”.
In other words, it shows whether your Lambda functions are keeping up with the records added to your streams. If the Lambda functions fall too far behind and the iterator age exceeds the retention period of your invocation stream, records expire without being processed.
As shown in the picture below, if the invocation stream of your Lambda has a retention period of 7 days and your Lambda iterator age is approaching 605M milliseconds (around 7 days), you are losing some of your data before it is processed!
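To make the arithmetic concrete, here is a minimal sketch that converts a retention period to milliseconds and reports how much of it the iterator age has already consumed. The function names and the 0.8 warning threshold are illustrative assumptions, not AWS settings.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in one day


def retention_used(iterator_age_ms: float, retention_days: float) -> float:
    """Return the fraction of the retention period consumed by the iterator age."""
    retention_ms = retention_days * MS_PER_DAY
    return iterator_age_ms / retention_ms


def is_at_risk(iterator_age_ms: float, retention_days: float,
               threshold: float = 0.8) -> bool:
    """True when the iterator age has used at least `threshold` of the retention.

    The 0.8 default is an arbitrary example value, chosen only for illustration.
    """
    return retention_used(iterator_age_ms, retention_days) >= threshold


# A 7-day retention period expressed in milliseconds.
print(7 * MS_PER_DAY)  # 604800000
```

An iterator age of 605M milliseconds against a 7-day retention gives a ratio slightly above 1, which means records are already expiring.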
Troubleshooting a high Iterator age
There are a number of reasons why your Lambda iterator age might be too high.
1. Memory issues
If your Lambda function is memory intensive and you haven’t allocated enough memory to it, the function can run slowly, which raises the IteratorAge metric.
To test whether memory is the bottleneck, increase the memory allocated to the function and check the CloudWatch graph for the Duration metric. If the duration drops when you increase the memory setting, there is a high possibility that memory (or the CPU share that Lambda allocates in proportion to it) is what is slowing down your function.
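If you prefer numbers to graphs, the comparison can also be scripted. The sketch below assumes you pass in a boto3 CloudWatch client (`boto3.client("cloudwatch")`) and reads the Duration metric through its `get_metric_statistics` call. The helper name `average_duration_ms` is made up for this example, and averaging the per-period averages is only a rough approximation.

```python
from datetime import datetime
from typing import Optional


def average_duration_ms(cloudwatch, function_name: str,
                        start: datetime, end: datetime,
                        period_seconds: int = 300) -> Optional[float]:
    """Rough average of the Lambda Duration metric (ms) between start and end.

    `cloudwatch` is expected to behave like boto3.client("cloudwatch").
    Returns None when CloudWatch has no datapoints for the window.
    """
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName="Duration",
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        StartTime=start,
        EndTime=end,
        Period=period_seconds,
        Statistics=["Average"],
    )
    datapoints = response.get("Datapoints", [])
    if not datapoints:
        return None
    # Unweighted mean of the per-period averages: good enough for a comparison.
    return sum(point["Average"] for point in datapoints) / len(datapoints)
```

Call it once for a window before the memory change and once for a window after it. A clearly lower value after the change suggests that memory was the bottleneck.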
2. Batch size issue
BatchSize controls the maximum number of records that can be sent to your function with each invoke. Lambda passes all of the records in the batch to the function in a single call, as long as the total size of the events doesn’t exceed the payload limit for synchronous invocation.
As long as the total records don’t hit the payload limit, a larger batch size can often more efficiently absorb a larger set of records, increasing your overall throughput.
As the picture below shows, after changing the batch size from 50 to 500, the iterator age dropped significantly. At the same time, the number of invocations dropped and the duration went up, because our Lambda function processes more records per invocation. Make sure the duration stays below your function’s configured timeout (for example, 5 minutes) to avoid timeout errors.
However, the actual batch size is sometimes smaller than the configured batch size, in which case simply increasing the batch size will not decrease the Lambda iterator age.
According to AWS, the actual batch size is determined by three factors:
The number of records received from the GetRecords call that Lambda makes when polling your event source.
The user-configured batch size, which is the largest number of records that are read from your stream at once. For more information, see Creating an Event Source Mapping.
The number of records that can fit within the Lambda invocation payload size limit of 6 MB. A larger record size means that fewer records can fit in the payload. For more information on limits, see AWS Lambda Limits.
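The three factors above boil down to taking the smallest of three limits. Here is a simplified sketch of that reasoning. It treats 6 MB as 6 * 1024 * 1024 bytes and ignores the event envelope and base64 encoding overhead, both of which are assumptions made to keep the example short. The function name is hypothetical.

```python
from typing import Sequence

PAYLOAD_LIMIT_BYTES = 6 * 1024 * 1024  # assumed byte value of the 6 MB limit


def effective_batch_size(record_sizes: Sequence[int],
                         configured_batch_size: int,
                         payload_limit: int = PAYLOAD_LIMIT_BYTES) -> int:
    """Estimate how many records end up in a single invocation.

    record_sizes: sizes in bytes of the records returned by one poll, in order.
    """
    total_bytes = 0
    count = 0
    # Factors 1 and 2: never more than the poll returned or the configured size.
    for size in record_sizes[:configured_batch_size]:
        # Factor 3: stop before the payload limit would be exceeded.
        if total_bytes + size > payload_limit:
            break
        total_bytes += size
        count += 1
    return count


# 1 MiB records: only 6 fit in the payload, whatever the configured batch size.
print(effective_batch_size([1024 * 1024] * 10, configured_batch_size=500))  # 6
```

So with large records, raising the batch size from 50 to 500 changes nothing, because the payload limit is reached first.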
3. Not enough Kinesis shards
The number of Kinesis shards might be the final and biggest bottleneck for your Lambda iterator age.
A Kinesis stream is composed of one or more shards. A single Lambda invocation processes records from a single shard. For example, if your stream has 10 active shards, there will be at most 10 Lambda function invocations running concurrently (with the default of one concurrent batch per shard). Increasing the number of shards directly increases the maximum number of concurrent Lambda function invocations, which can increase your Kinesis processing throughput and decrease the Lambda iterator age.
As shown in the picture below, the iterator age dropped sharply and the number of invocations increased almost fourfold after we changed the number of Kinesis shards from 4 to 16.
However, adding shards won’t help as much as expected if the partition key of your Kinesis stream is poorly chosen. Be sure to pick a partition key that spreads the incoming data evenly across the shards, so that your Lambda function won’t be blocked on one specific hot shard.
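To see why the partition key matters, the sketch below mimics how Kinesis maps a partition key to a shard: the key is hashed with MD5 into a 128-bit integer, and each shard owns a range of that hash space. The assumption here is that the shards split the hash space evenly, which a real stream may not do after resharding. The function names are illustrative.

```python
import hashlib
from collections import Counter
from typing import Iterable

HASH_SPACE = 2 ** 128  # an MD5 digest is a 128-bit integer


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Index of the shard that would receive this key, assuming even hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) * shard_count // HASH_SPACE


def records_per_shard(partition_keys: Iterable[str], shard_count: int) -> Counter:
    """Count how many records land on each shard index."""
    return Counter(shard_for_key(key, shard_count) for key in partition_keys)


# One constant key: every record lands on the same shard (a hot shard).
hot = records_per_shard(["tenant-1"] * 1000, shard_count=16)
# A high-cardinality key: records spread over many shards.
spread = records_per_shard((f"user-{i}" for i in range(1000)), shard_count=16)
print(len(hot), len(spread))
```

With the constant key, 15 of the 16 shards stay idle, so the extra shards add no concurrency at all.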
4. Function Errors
Invocation errors can make your Lambda function keep processing the same event repeatedly. Because event records are processed sequentially, your function can’t progress to later records if there is an error when processing a batch of records. In these cases, the iterator age increases almost linearly as those records age.
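One way to limit the damage from a bad record is to tell Lambda exactly where the batch failed, instead of raising and having the whole batch retried. The handler below is only a sketch. It assumes the event source mapping has the ReportBatchItemFailures response type enabled, and `process_record` is a hypothetical stand-in for your own logic.

```python
import base64
import json


def process_record(payload: dict) -> None:
    """Hypothetical business logic: rejects records without an 'id' field."""
    if "id" not in payload:
        raise ValueError("record has no 'id' field")


def handler(event, context):
    """Process Kinesis records in order and report the first failure, if any."""
    for record in event.get("Records", []):
        sequence_number = record["kinesis"]["sequenceNumber"]
        try:
            # Kinesis record data arrives base64-encoded.
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            process_record(payload)
        except Exception as error:
            print(f"Failed at sequence number {sequence_number}: {error}")
            # Stop at the first failure so Lambda retries from this record
            # onward rather than from the start of the batch.
            return {"batchItemFailures": [{"itemIdentifier": sequence_number}]}
    # An empty list tells Lambda the whole batch succeeded.
    return {"batchItemFailures": []}
```

A record that can never succeed will still block the shard, so you would typically pair this with a retry limit or an on-failure destination on the event source mapping.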
5. Concurrent batches per shard
You can also increase concurrency by processing multiple batches from each shard in parallel. Lambda can process up to 10 batches in each shard simultaneously.
In the Additional settings of the Kinesis trigger, you can find the Concurrent batches per shard setting, which you can increase up to 10.
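The same setting can be changed from code. In the API it is called ParallelizationFactor on the event source mapping. The sketch below assumes you pass in a boto3 Lambda client (`boto3.client("lambda")`) and the UUID of your mapping. Both helper names are made up for this example.

```python
def max_concurrent_invocations(shard_count: int, factor: int = 1) -> int:
    """Upper bound on concurrent invocations for a Kinesis event source."""
    return shard_count * factor


def set_concurrent_batches(lambda_client, mapping_uuid: str, factor: int):
    """Set the concurrent batches per shard on an event source mapping.

    `lambda_client` is expected to behave like boto3.client("lambda").
    """
    if not 1 <= factor <= 10:
        raise ValueError("Concurrent batches per shard must be between 1 and 10")
    return lambda_client.update_event_source_mapping(
        UUID=mapping_uuid,
        ParallelizationFactor=factor,
    )


# 16 shards with 10 concurrent batches each allow up to 160 invocations at once.
print(max_concurrent_invocations(16, 10))  # 160
```

Keep an eye on your account concurrency limit and on downstream services before raising this value, since the number of parallel invocations can grow tenfold.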
🎉 Congratulations: You have successfully decreased your Lambda iterator age!
If you have any questions, don’t hesitate to ask me 😉