Puppet has become my go-to system management tool, in no small part because it is the tool that the operations group at $DAYJOB has standardized on for our production infrastructure management. It took quite a while for me to get the hang of how Puppet does what it does, but today I’d say I’m a fairly decent Puppet programmer. Every once in a while, however, I stumble on something new and surprising.
A couple of weeks ago I got an interesting bug report from a user about
a collection of Puppet manifests I help manage. The bug was that his testing
server was pegged at 99% CPU utilization for multiple minutes during each
puppet agent run. The bug reporter did a great job of investigating and had
also found that strace showed a repetitive stream of stat() calls while the
process was hogging the CPU.
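
One way to confirm that kind of pattern in a saved strace capture is to tally the calls by name. The snippet below is only a minimal sketch and not what the bug reporter actually ran: the count_syscalls helper and the sample lines (including the /tmp/example paths) are made up for illustration, and it assumes strace's default one-call-per-line output.

```python
import re
from collections import Counter

# Matches the syscall name at the start of an strace line, allowing for the
# optional PID prefix that `strace -f` adds (e.g. "1234 stat(...)" or
# "[pid 1234] stat(...)").
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(")


def count_syscalls(lines):
    """Return a Counter mapping syscall name -> number of occurrences."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:  # signal and exit lines ("--- SIGCHLD ...") are skipped
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines in strace's output format; the paths are hypothetical.
sample = [
    'stat("/tmp/example/a", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0',
    'stat("/tmp/example/b", 0x7ffd0c0) = -1 ENOENT (No such file or directory)',
    '[pid 1234] stat("/tmp/example/a", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0',
    'open("/tmp/example/a", O_RDONLY) = 3',
    "--- SIGCHLD {si_signo=SIGCHLD} ---",
]

counts = count_syscalls(sample)
for name, n in counts.most_common():
    print(f"{n:6d} {name}")
```

For a real capture, the same function can be fed an open file object instead of the sample list. A syscall that dominates the tally by a wide margin is a good hint about where the process is spending its time.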
This also turned out to be the great kind of bug that is reproducible.
The first testing server on which I tried the steps from the bug report showed
the exact same symptoms. I grabbed some very verbose logs by turning on the
--debug logging in puppet agent and logging all of the system calls with
strace at the same time: