Weekly Engineering Report

Summary

This week we completed the payment-service 1.14.0 rollout to 100% of production with no customer-facing incidents. p95 latency on the checkout path improved by ~18% (from 210ms to 172ms) and error rates remained below 0.2% across all regions.
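The ~18% figure follows directly from the two p95 values quoted above. As a quick sanity check, the relative improvement can be recomputed with a one-line helper. This is only a sketch; the function name pct_improvement is illustrative and not part of any tooling mentioned in this report.

```shell
# Relative improvement between a "before" and "after" latency, in percent.
# pct_improvement is a hypothetical helper name used only for this example.
pct_improvement() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# p95 values taken from the summary above (milliseconds).
pct_improvement 210 172
```

With the values from the summary this prints 18.1, consistent with the rounded ~18% reported.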

Key highlights:
- Connection pooling and timeout tuning reduced transient failures during peak load.
- Canary analysis validated stability before full rollout.
- Runbook updates shortened mean time to mitigate for payment-related alerts.

Focus areas for next week:
- Ship SLO-based alert thresholds to reduce noise.
- Address flaky refund idempotency test to unblock CI.
- Plan production upgrade for Postgres 14.8 after final staging sign-off.
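For context on the SLO-based thresholds item, one common way to express such a threshold is a burn rate: the observed error rate divided by the error budget implied by the SLO target. The sketch below is an assumption-heavy illustration, not the team's planned design: the helper name burn_rate and the 99.5% target are hypothetical, and 0.2% is used only because it is the upper bound on error rate quoted in the summary.

```shell
# Burn rate = observed error rate / error budget, where budget = 100% - SLO target.
# burn_rate is a hypothetical helper; the 99.5% target below is an assumed value,
# not an SLO stated anywhere in this report.
burn_rate() {
  awk -v err="$1" -v slo="$2" \
    'BEGIN { printf "%.2f\n", err / (100 - slo) }'
}

# 0.2% error rate (upper bound from the summary) against an assumed 99.5% SLO.
burn_rate 0.2 99.5
```

Under these assumed inputs the result is 0.40. A burn rate below 1 means the error budget is being consumed more slowly than the SLO allows, so an alert keyed on burn rate would stay quiet where a fixed error-count threshold might fire.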

Open Issues

ID        Title                                                    Severity  Status
PAY-231   Intermittent 504s on payment-service during checkout     High      Investigating
SEC-019   Update JWT dependency to 2.9.x to address CVE-2026-1234  High      Blocked
OPS-332   Flaky integration test test_refund_idempotency           Low       In Progress
INF-104   Upgrade NGINX Ingress Controller to 1.11                 Medium    Planned
DATA-088  Backfill missing payment events for March 29-31          Medium    In Review

Changelog

Changes

git rev-parse --short HEAD
kubectl rollout status deploy/payment-service -n prod
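
The two commands above can be combined into a small post-deploy check. This is a sketch only: the function name verify_rollout and the 120s default timeout are assumptions for illustration, not part of the team's runbook. The --timeout option is a standard kubectl rollout status flag, and git rev-parse reports the commit of the local checkout, which is assumed here to match what was deployed.

```shell
# Hypothetical helper: print the current checkout's short commit hash, then
# wait for the payment-service rollout in the prod namespace to complete.
# The function name and the 120s default timeout are illustrative assumptions.
verify_rollout() {
  timeout="${1:-120s}"
  echo "commit: $(git rev-parse --short HEAD)"
  if kubectl rollout status deploy/payment-service -n prod --timeout="$timeout"; then
    echo "rollout: ok"
  else
    # Non-zero exit: the rollout failed or did not finish within the timeout.
    echo "rollout: failed or timed out" >&2
    return 1
  fi
}
```

Calling verify_rollout with no argument uses the default timeout; passing a duration such as 300s overrides it. The non-zero return on failure lets a CI step or wrapper script stop on an unhealthy rollout.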