This ‘Chaos Monkey’ was invented by Netflix. It randomly disables production instances to make sure that Netflix can survive this common type of failure without any customer impact. The Monkey runs during business hours, when engineers are standing by to address any problems, learn about the remaining weaknesses of the system and build automatic recovery mechanisms to deal with them. The engineers love it because it challenges them to the max and because they are paged less often at night and on weekends.
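To make the idea concrete, here is a minimal sketch of such a monkey in Python. It is not Netflix's implementation: the function names, the `terminate` callback and the 9-to-17 weekday window are all assumptions chosen for illustration. It shows only the two rules described above: pick a random instance, and act only while engineers are at their desks.

```python
import random
from datetime import datetime


def within_business_hours(now, start_hour=9, end_hour=17):
    """True on Monday-Friday from start_hour (inclusive) to end_hour (exclusive).

    The 9-17 window is an assumed default, not a documented Netflix setting.
    """
    return now.weekday() < 5 and start_hour <= now.hour < end_hour


def pick_victim(instances, now, rng=random):
    """Return one randomly chosen instance, or None if the monkey must not act."""
    if not instances or not within_business_hours(now):
        return None
    return rng.choice(instances)


def unleash_monkey(instances, terminate, now=None, rng=random):
    """Disable at most one instance through the caller-supplied terminate callback."""
    now = now or datetime.now()
    victim = pick_victim(instances, now, rng)
    if victim is not None:
        terminate(victim)  # hypothetical hook, e.g. a call into a cloud API
    return victim


if __name__ == "__main__":
    # Stand-in for a real fleet: terminating just records the instance id.
    disabled = []
    fleet = ["i-001", "i-002", "i-003"]
    victim = unleash_monkey(fleet, disabled.append)
    print("disabled:", victim)
```

Passing `terminate` in as a callback keeps the sketch independent of any particular cloud provider, and it lets you rehearse the monkey against a fake fleet before pointing it at real instances.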
Inspired by the success of the Chaos Monkey, Netflix introduced many other monkeys to induce different kinds of failures, like the Latency Monkey (which induces artificial delays), the Conformity Monkey (which shuts down instances that don’t adhere to best practices) and the Security Monkey (which terminates instances with security violations or vulnerabilities).
Netflix has shown that this works: after becoming immune to the Monkey, they built the Chaos Gorilla, which simulates the outage of an entire AWS Availability Zone.
Later, looking for more extreme cases of failure, they introduced Chaos Kong. It made them immune to the unavailability of an entire AWS Region.
Nowadays, the Simian Army's tools are open source, and the Chaos Monkey itself can be downloaded here: https://github.com/Netflix/chaosmonkey
Other examples that contribute to antifragility are: