Santiago Calcagno

Notes on re:Invent 2022 talks I attended

Here’s a rough dump of all the notes I took while attending AWS’ re:Invent this year. I’m focusing on Chalk Talks, which are not recorded and usually have a smaller live audience, but I also included links to the Breakout Sessions if you’re interested in watching them. I hope this is useful to someone!

Serverless stream processing with AWS Lambda, Amazon Kinesis & Kafka (SVS402)


The talk began with a discussion on a car data collection architecture using Kinesis Data Streams/Lambda/Firehose/DynamoDB/S3

Custom metric: Records/Batch

At the shard level, λ polls each shard once per second and invokes the function synchronously with a batch of messages. If the invocation succeeds it moves on to the next batch; if it errors, what happens next depends on how the event source mapping is configured:

(Same for DDB streams 👆)
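The error-handling point above can be sketched with a handler that reports partial batch failures, so a single bad record doesn't force the whole batch to be retried. This is a minimal illustration, not code from the talk; `process` is a hypothetical business-logic function, and the pattern assumes `ReportBatchItemFailures` is enabled on the event source mapping:

```python
import base64
import json

def process(payload):
    # Hypothetical business logic; raises on bad records
    if "amount" not in payload:
        raise ValueError("missing amount")

def handler(event, context):
    """Kinesis-triggered Lambda that reports partial batch failures, so only
    records from the first failed one onward are retried."""
    failures = []
    for record in event["Records"]:
        try:
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            process(payload)
        except Exception:
            # Return the lowest failed sequence number; Lambda retries from there
            failures.append({"itemIdentifier": record["kinesis"]["sequenceNumber"]})
            break
    return {"batchItemFailures": failures}
```

Without this, the default behavior is to retry the entire batch until it succeeds or the records expire, which is exactly the configurable part the speakers were pointing at.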

Some other comments made:

New stuff: event filtering

Keep Lambda concurrency limits in mind when using Kinesis on demand

Some words on the parallelization factor (note: it keeps ordering, acts as a "shard of shards", and won't help if all/most records share the same partition key)
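On the event-filtering note above: filters are defined as JSON patterns on the event source mapping (for Kinesis they apply to the decoded `data` field). The pattern below is hypothetical, and the matcher is a much-simplified local sketch of equality matching only — the real service supports richer operators (prefix, numeric ranges, anything-but, etc.):

```python
import json

# Hypothetical filter: only invoke the function for "purchase" events
FILTER_CRITERIA = {
    "Filters": [
        {"Pattern": json.dumps({"data": {"type": ["purchase"]}})}
    ]
}

def matches(pattern: dict, record: dict) -> bool:
    """Simplified equality matching: dicts recurse, lists hold allowed values."""
    for key, cond in pattern.items():
        if key not in record:
            return False
        if isinstance(cond, dict):
            if not (isinstance(record[key], dict) and matches(cond, record[key])):
                return False
        elif record[key] not in cond:
            return False
    return True
```

The appeal is that non-matching records never invoke the function at all, so you stop paying for invocations whose first line is `if event_type != "purchase": return`.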

A performance step-by-step:

  1. Ensure working system
  2. Measure performance: time to process one message, and an average batch
  3. Tune producer
  4. Tune λ: optimize the code, run Lambda Power Tuning with a representative batch
  5. Tune event source mapping: batch size/window, enhanced fan-out, parallelization factor, error handling
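The measurements from step 2 feed directly into step 5. As a back-of-the-envelope sizing (my own illustration with made-up numbers, not from the talk), Little's law gives the number of concurrent invocations needed per shard, which maps onto the parallelization factor:

```python
import math

def parallelization_factor(msgs_per_sec_per_shard: float,
                           batch_size: int,
                           seconds_per_batch: float) -> int:
    """Little's law: concurrent batches in flight = arrival rate x service
    time / batch size. Kinesis ESMs accept a factor from 1 to 10."""
    concurrent_batches = msgs_per_sec_per_shard * seconds_per_batch / batch_size
    return min(max(math.ceil(concurrent_batches), 1), 10)
```

For example, 500 msgs/sec per shard with batches of 100 taking 1.2 s each needs a factor of 6; if the result hits the cap of 10, that's a signal to reshard or tune the function instead.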

Max performance: skip error handling in the ESM and retries entirely, and log all failures to CloudWatch instead

Tune λ memory size if slow

The Kinesis Producer Library is not recommended for Lambda in the speaker's opinion (at least for Python)


Scenario: fraud detection (transactions stream)

AWS configures a "poller", which is the component that pulls messages from the stream and sends them to the function. It starts with 1 poller and 1 λ, then scales pollers vertically and horizontally as needed (up to the number of partitions)

Metrics to monitor

Multi-region: Kinesis (producer writes to 2 Kinesis streams), Kafka (has built-in replication)
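The Kinesis dual-write approach can be sketched as below. This is my own hedged sketch, not from the talk: the `clients` would be boto3 `kinesis` clients for each region, and the duck-typing here means it tests without AWS. Note there are no ordering or dedup guarantees across regions; consumers must tolerate that:

```python
def put_record_multi_region(clients, stream_name: str,
                            data: bytes, partition_key: str):
    """Write the same record to a Kinesis stream in each region.
    `clients` is a list of clients exposing put_record(StreamName=...,
    Data=..., PartitionKey=...), e.g. boto3 kinesis clients."""
    results = []
    for client in clients:
        results.append(client.put_record(StreamName=stream_name,
                                         Data=data,
                                         PartitionKey=partition_key))
    return results
```

A production version would also need to decide what happens when one region's write fails (retry, buffer, or accept divergence) — that trade-off is the real cost of this pattern versus Kafka's built-in replication.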

Migrate from a relational database to Amazon DynamoDB (DAT306)

Relational DB challenges

Related to above

Should you migrate?


Migration strategies

Architecting for resiliency and high availability (ENU303)

IIRC most of this talk’s content is already covered by preparing for the Solutions Architect exam (setting up things across AZs and regions, that kind of thing). My takeaway from it was this set of tools:

Running containerized workloads on AWS Graviton-based instances (CMP408)

Question on how to convince devs to use Graviton-based instances

Even if it's, say, 10% slower: if cost is 20% lower and the SLO is still met, that's a win
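The arithmetic behind that argument is worth making explicit. Under a deliberately simple model (my framing, not the speaker's slides) where cost scales with both instance price and time spent per request:

```python
def relative_cost_per_request(price_ratio: float, latency_ratio: float) -> float:
    """Cost per request relative to the x86 baseline, assuming cost scales
    with both the instance price and the time each request takes
    (a simplified model for illustration)."""
    return price_ratio * latency_ratio

# 20% cheaper (0.8) but 10% slower (1.1) -> 0.8 * 1.1 = 0.88,
# i.e. roughly 12% cheaper per request served
```

So even a measurable per-request slowdown can leave Graviton ahead on cost per unit of work, which is the comparison that matters once the SLO is satisfied.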

Discussion on how to build multi-arch containers with pipelines that look like this

Having multi-arch containers is good when combined with Spot -> you can use instances of any architecture for a lower price

Can define ECS Fargate task as Linux ARM64 to use Graviton
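Concretely, the opt-in is the `runtimePlatform` block on the task definition. The fragment below (as a Python dict, the way you'd pass it to boto3's `register_task_definition`) uses hypothetical names for the family and image; only `runtimePlatform` is the part that selects Graviton capacity:

```python
# Hypothetical family/image names; runtimePlatform is what places the
# Fargate task on Graviton (ARM64) capacity
TASK_DEFINITION = {
    "family": "my-arm-service",
    "requiresCompatibilities": ["FARGATE"],
    "networkMode": "awsvpc",
    "cpu": "256",
    "memory": "512",
    "runtimePlatform": {
        "operatingSystemFamily": "LINUX",
        "cpuArchitecture": "ARM64",
    },
    "containerDefinitions": [
        {
            "name": "app",
            "image": "my-registry/my-multiarch-image:latest",  # hypothetical
            "essential": True,
        }
    ],
}
```

This pairs naturally with the multi-arch image point above: the same image tag can back both an ARM64 and an X86_64 task definition.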

Is there a tool to check for Graviton readiness? No, nothing automatic

When not to use Graviton? (I wonder if most of this holds with the new instance types, WS still does of course)

Build resilient applications using Amazon RDS and Aurora PostgreSQL (DAT316)

Some more tools/configs!


Global resiliency

The JDBC driver is topology-aware -> handles DB connection failover scenarios better

Postgres connections: keep it to hundreds of connections, not thousands

Future: RDS blue-green deployments

Online talks

Lots of cool things on these ones.

Observability the open-source way (COP301)

Deploy modern and effective data models with Amazon DynamoDB (DAT320)

Best practices for advanced serverless developers (SVS401)

From RDBMS to NoSQL (sponsored by MongoDB) (PRT314)

Building observable applications with OpenTelemetry (BOA310)

Building containers for AWS (CON325)

Most of it I already knew, but I'm taking this with me: ECR pull-through cache

Introducing Trusted Language Extensions for PostgreSQL (DAT221)

Build scalable multi-tenant databases with Amazon Aurora (DAT318)

I asked the speaker why they didn't opt for DMS for the migration process; the answer was that their DBAs had (presumably much more) experience with standard MySQL tooling