Thursday, September 19, 2019

Notes from the Field: Debug is for Debugging

I often hear system administrators say that logging is both the best and worst part of IT. They rely on the information provided by logged data, but dislike the overhead that typically comes as part of the package: you have to store the logs somewhere, and with enough systems and enough time, you end up with more logs than you know what to do with.

In many cases, sysadmins will approach this problem by tuning the logs. This is most often accomplished by changing the logging levels from their default (this is typically INFO) to something a little more discerning, like WARN. (If you're feeling a little lost at this point, read through this documentation on Log4j Custom Log Levels). The impact of this change is that the system generating the logs will restrict the events that it sends across the wire (in the case of a remote syslog solution) or to a local file (in the case of application logs). In both cases, you end up conserving resources, either network or disk. You'll still get diagnostic information that can help understand why a system is malfunctioning, just not at the detailed level provided by INFO.

But if you take a step in the other direction, you can inflict serious damage to your application or server's performance. How, you ask? By enabling DEBUG level logging.

Often, developers will set logging to debug when they're, you know, debugging. This is a pretty logical set up, and is almost exclusive to the world of non-production. Set to debug, test, break stuff, fix it, test, and turn off debug. It's cool, that's why debug facilities are built into application frameworks.

However, a single "log-level=debug" in a configuration file and throw your whole systems into chaos.

Take Apache Tomcat, for example. When you enable debug logging for a production application, well, just read this statement from Apache's documentation:
When enabling debug logging it is recommended that it is enabled for the narrowest possible scope as debug logging can generate large amounts of information.
No, you don't need all this information bruh.
What they don't tell you is exactly what that means. It means a single site with fewer than 100 users
can generate 4GB of log data in a matter of hours. The I/O alone required to write to a log file that garrulous can bring a server to its knees, and can slow down Apache's ability to serve even static pages. And since many applications, especially those from the open-source community built to run on Windows, log to text files, you can be sure that if you're accustomed to using notepad to review log files, you're going to have a bad time with a 4GB .txt file with constant I/O.

(Incidentally, if you need a way to pull data out of a large file like that, PowerShell is your answer. You can open up a PS shell and do a get-content -file c:\temp\bigasslogfile.txt -tail 1000 | out-file -path c:\temp\smallasslogfile.txt and you're good to go. You'll get a smaller file that notepad can handle with the 1000 most recent lines, which is typically enough to get a sense for what's going on.)

Anyway, this is a reminder that debugging is for debugging. Don't do it in production, and if you must, be very careful not to overwhelm your network or disk subsystem with excessive logging.