Here is an anecdote. I am sure some of you have had a similar experience.
The takeaway lesson of the above story is twofold: (1) logs are not just for humans to read and (2) logs change.
(1) Logs are not just for humans. As Paul Querna points out, the primary consumer of logs is shifting from humans to computers. This means log formats should have a well-defined structure that can be parsed easily and robustly.
(2) Logs change. If the logs never changed, writing a custom parser might not be too terrible. The engineer would write it once and be done. But in reality, logs change. Every time you add a feature, you start logging more data, and as you add more data, the printf-style format inevitably changes. This means the custom parser has to be updated constantly, consuming valuable development time.
Here is a suggestion: Start logging your data as JSON. JSON has a couple of advantages over other "structures".
We've already talked about Fluentd in this blog, so I won't bother you with the details. It's a logging daemon that can talk to a variety of services (e.g., MongoDB, Scribe). One of the key features of Fluentd is that everything is logged as JSON. Here is a little code snippet that logs data to Fluentd from Ruby.
require 'fluent-logger'
# some code in between
log = Fluent::Logger::FluentLogger.new(nil, :host => 'localhost', :port => 24224)
log.post('myapp.access', {"user-agent" => user_agent})
Now, suppose you wanted to start logging the referrer URL in addition to the user agent. You just need to add a key to the Ruby hash that becomes the JSON record.
require 'fluent-logger'
# some code in between
log = Fluent::Logger::FluentLogger.new(nil, :host => 'localhost', :port => 24224)
log.post('myapp.access', {"user-agent" => user_agent, "referrer" => referrer_url}) # Added a field!
That's the only change you need to make. All the existing scripts work as before, since all we did was add a new field to the existing JSON. In contrast, imagine you were logging the same data in a printf-inspired format. Your code initially looks like this:
log = CustomLogger.new
# some code in between
log.post("web.access", "user-agent: #{user_agent} blah blah")
When you decide to log the referrer URL, you update it to:
log = CustomLogger.new
# some code in between
log.post("web.access", "user-agent: #{user_agent} blah blah referrer: #{referrer_url}")
Now, most likely your old parser is broken, and you have to go and update your regex and whatnot. We are biased towards Fluentd because we wrote it ourselves. But regardless of what software/framework you choose for logging, you should start logging everything as JSON right away.
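To make the contrast concrete, here is a minimal sketch of two consumers, both written before the referrer field existed. Everything in it is an assumption for illustration: the method names, the `OLD_LAYOUT` regex, and the sample values are hypothetical, and a real parser for your printf-style format may fail in a different way (for example, by silently capturing the wrong text instead of returning nothing).

```ruby
require 'json'

# Hypothetical consumers, both written before the "referrer" field existed.

# Reads the user agent from a JSON log line by key.
def user_agent_from_json(line)
  JSON.parse(line)["user-agent"]
end

# Reads the user agent from a printf-style line with a regex anchored
# to the original layout ("user-agent: <value> blah blah").
OLD_LAYOUT = /\Auser-agent: (?<user_agent>.*) blah blah\z/

def user_agent_from_text(line)
  match = OLD_LAYOUT.match(line)
  match && match[:user_agent]
end

# Sample values are made up for illustration.
ua  = "Mozilla/5.0"
ref = "http://example.com/"

json_old = JSON.generate({"user-agent" => ua})
json_new = JSON.generate({"user-agent" => ua, "referrer" => ref})
text_old = "user-agent: #{ua} blah blah"
text_new = "user-agent: #{ua} blah blah referrer: #{ref}"

p user_agent_from_json(json_old)  # => "Mozilla/5.0"
p user_agent_from_json(json_new)  # => "Mozilla/5.0" (extra key is ignored)
p user_agent_from_text(text_old)  # => "Mozilla/5.0"
p user_agent_from_text(text_new)  # => nil (the old layout no longer matches)
```

The JSON consumer looks fields up by name, so an extra key does not affect it. The regex consumer depends on the exact shape of the line, so it has to be revisited every time the format grows.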