Autopilot: Workload Autoscaling at Google Scale – Google Research
DRANK

In many public and private Cloud systems, users need to specify a limit for the amount of resources (CPU cores and RAM) to provision for their workloads. A job that exceeds its limits might be throttled or killed, resulting in delaying or dropping end-user requests, so human operators naturally err on the side of caution and request a larger limit than the job needs. At scale, this results in massive aggregate resource wastage.To address this, Google uses Autopilot to configure resources automatically, adjusting both the number of concurrent tasks in a job (horizontal scaling) and the CPU/memory limits for individual tasks (vertical scaling). Autopilot walks the same fine line as human operators: its primary goal is to reduce slack – the difference between the limit and the actual resource usage – while minimizing the risk that a task is killed with an out-of-memory (OOM) error or its performance degraded because of CPU throttling. Autopilot uses machine learning algorithms applied …

research.google
Related Topics:
1 comments
  • Kubernetes Podcastの最新エピソードで紹介されていたGoogleのワークロードの自動スケーリングに関する論文。時間見つけて読むぞ~~