To manage increasingly large and complex applications on equally large and complex clouds, mobile edge clouds, and other emerging infrastructures, we increasingly rely on automation and (semi-)autonomous systems. This is done not only to improve their availability, reliability, and performance but also to reduce cost and manual labor. Operation relies largely on planning and feedback, which provide information for the many knobs used to steer the systems: auto-scalers, service differentiators, schedulers, orchestrators, power managers, etc. A valuable complement to planning, feedback, and steering is the ability to detect when things go wrong, that is, to automatically detect anomalies as well as to identify their root causes and suitable actions to rectify them. Although anomaly detection is performed in many completely different areas, the problem is still open and challenging, as its solutions require equal amounts of machine learning and domain knowledge. In this presentation we will start by setting the scene with some illustrative key management tasks and solutions before turning our attention to anomaly detection and resolution, where we will map the field and show some recent progress, in particular addressing performance, security, and functional anomalies.

Presenter: Erik Elmroth