I used to cook, now I code
Walking the fine line between code and cooking since 2003
9 Things You Should Be Doing With Your Server, But Probably Aren't
Jul 22nd, 2010 at 01:22PM
Linux distributions today are incredibly easy to set up and get started with. Whether for a blog, web app, or any other reason, installing the necessary services and getting things running can often be accomplished in a few hours, even by an inexperienced developer. I can't praise the standardized and well-written guides from Slicehost enough for the help they provide in this regard.
But the ease of getting started unfortunately belies the ongoing maintenance that is needed to keep a system stable and in good working order for the long term. A single server can often run without human interaction for a long time. But the success of doing so is tied directly to all the other bits and pieces that must be configured ahead of time.
The worst part about every item in this list is that you can probably get away without them, maybe even for months or years. But missing one of these items can come back to haunt you at the worst time, such as during a traffic spike, hard drive crash, or hacking attempt.
Configuration Management
I start with Configuration Management because it's a bit different from the rest of the items on this list. This one is not as important for a single healthy server, but becomes critical when you have many systems. Configuration management tools, such as Puppet or Chef, allow you to write 'recipes' for how a server should be put together. These recipes are run on each server to produce a consistent and easily reproduced setup. This provides the ability to instantly boot a new copy of any system and can give enormous freedom to your setup.
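One property that makes these recipes safe to run over and over is that each step describes a desired end state rather than an action, so running it a second time changes nothing. Here is a minimal sketch of that idea in plain Python. The `ensure_line` helper is hypothetical and is not part of Puppet or Chef; those tools provide far richer versions of the same concept.

```python
from pathlib import Path


def ensure_line(path, line):
    """Make sure `line` appears in the file at `path`.

    Returns True if the file had to be changed, False if it was already
    in the desired state. Hypothetical helper, for illustration only.
    """
    path = Path(path)
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already converged, nothing to do
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True
```

Calling `ensure_line` on ten servers, or ten times on one server, leaves every file in the same state, which is what makes a setup reproducible.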
Configuration management does, however, add a significant amount of initial complexity to server setup, so it's not for the faint of heart. But even with just two or three servers, the benefits are immense.
Backups
This one is pretty obvious and most sysadmins at least make an attempt in this area. If you don't have a solid backup strategy, you need to fix it now. Waiting even a single day can be disastrous. And make sure you do them right, because backups are prone to being done incorrectly (see the JournalSpace disaster). At-home backups have made great strides with the likes of Mozy, Carbonite, Backblaze, etc., but similar Linux solutions are far behind in terms of sophistication. Rsync, tar, and similar scripted tools are still a popular and viable option, but care must be taken to accommodate special cases like MySQL databases (copying the data files of a running database can capture an inconsistent state, so dump the database first and back up the dump). Everyone's backup needs are different, so whatever option you choose, be sure to investigate its potential shortfalls. Your chosen solution should:
- Run regularly
- Keep several rounds of backups
- Automatically drop old backups
- Store the backups off-site from your actual system
- Remain as secure as your original data
- Incorporate all critical data, critical configuration files (anything you might need to get a replacement server up-and-running), and potentially recent logs
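To make the "keep several rounds" and "automatically drop old backups" points concrete, here is a rough sketch of a pruning step. The `prune_backups` function, the `backup-*.tar.gz` naming pattern, and the default of seven archives are all assumptions for illustration; adapt them to whatever your own backup script produces.

```python
from pathlib import Path


def prune_backups(backup_dir, keep=7, pattern="backup-*.tar.gz"):
    """Delete all but the `keep` newest archives and return the deleted paths.

    "Newest" is judged by file modification time. The naming pattern and
    the default of 7 are assumptions, not a recommendation.
    """
    archives = sorted(
        Path(backup_dir).glob(pattern),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    stale = archives[keep:]
    for path in stale:
        path.unlink()
    return stale
```

Run it right after each successful backup, not before, so that a failed backup never leaves you with fewer good copies than you started with.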
Testing your backups
Hot on the heels of having a backup plan is testing it. This means regularly checking that the backups are still being made, that the files produced are valid and not corrupt, and that they contain all the data you need. A good rule of thumb is that if your backups rotate out every 30 days, then you should be re-checking them just as often. Automated tools can help a little here (automatically checking that the backup files are recent, of reasonable size, and possibly valid). However, nothing is a substitute for human eyes here...otherwise those eyes will be crying when you discover you don't have the backups you thought you did.
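The automated part of that checking might look something like the sketch below. It assumes your backups are tar archives (optionally compressed); `check_backup`, its thresholds, and the `required` list are hypothetical names chosen for this example. Note that listing an archive catches unreadable or truncated files but does not prove every byte inside is intact, so an occasional real restore is still needed.

```python
import tarfile
import time
import zlib
from pathlib import Path


def check_backup(path, max_age_hours=26, min_bytes=1024, required=()):
    """Return a list of problems found with one tar archive (empty means OK).

    Checks that the file exists, is recent, is not suspiciously small,
    can be listed, and contains every member name in `required`.
    All thresholds are illustrative assumptions.
    """
    path = Path(path)
    if not path.is_file():
        return [f"missing: {path}"]
    problems = []
    stat = path.stat()
    age_hours = (time.time() - stat.st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"stale: last modified {age_hours:.0f} hours ago")
    if stat.st_size < min_bytes:
        problems.append(f"too small: {stat.st_size} bytes")
    try:
        with tarfile.open(path) as archive:
            names = set(archive.getnames())
    except (tarfile.TarError, OSError, EOFError, zlib.error) as exc:
        problems.append(f"unreadable: {exc}")
    else:
        for name in required:
            if name not in names:
                problems.append(f"missing from archive: {name}")
    return problems
```

An empty list means the archive passed these basic checks; anything else is worth an email or a page.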
Log Rotation
Ubuntu, RedHat, and the other major distributions have gotten a lot better in recent years at having logrotate running and configured for any packages they provide. So your Apache and MySQL logs are likely to be properly rotated (maybe not the way you want them, but the defaults are fairly sane). However, anything 'extra' you add, like Rails apps, needs to have its own logrotate entry set up. Missing this step has been the cause of innumerable server failures as the hard drives fill up at the most inopportune time. Of course, it's always the logs you didn't even know you had that wind up being a problem. Resource monitoring is critical for this case.
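One way to hunt for the logs you didn't know you had is to periodically scan for log files that have grown past some limit, since a log that keeps growing is usually one nobody is rotating. The sketch below is only an illustration: `oversized_logs`, the `*.log` pattern, and the 100 MB default are assumptions, and it complements logrotate rather than replacing it.

```python
from pathlib import Path


def oversized_logs(root, limit_bytes=100 * 1024 * 1024):
    """Return (path, size) pairs for *.log files under `root` that exceed
    `limit_bytes`, largest first. Pattern and limit are assumptions."""
    hits = []
    for path in Path(root).rglob("*.log"):
        try:
            if not path.is_file():
                continue
            size = path.stat().st_size
        except OSError:
            continue  # file vanished or is unreadable; skip it
        if size > limit_bytes:
            hits.append((path, size))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Pointing this at the directory where your apps are deployed will surface files like a Rails `production.log` that has been quietly growing since launch.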
Resource Monitoring
Tracking CPU, memory use, disk space, bandwidth, etc. provides extremely valuable insight into the state of your system(s). As traffic increases, you can compare it against your memory or IO usage in order to plan your scaling well ahead of time. RRDTool/Munin, ServerDensity, and Cloudkick are all great options for looking at these metrics over time. If your chosen tool includes alerting on unforeseen changes (runaway processes, full drives, etc.), then you'll be one step ahead of any potential problems.
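As a tiny illustration of the alerting idea, here is a sketch of a disk-space check using only the Python standard library. `disk_alerts` and the 90% threshold are assumptions made up for this example; the tools named above do this (and graph it over time) far more thoroughly.

```python
import shutil


def disk_alerts(mount_points, threshold=0.90, usage=shutil.disk_usage):
    """Return a warning string for every mount point at or above `threshold`.

    `usage` is injectable so the check can be exercised without a real
    full disk; by default it is shutil.disk_usage.
    """
    alerts = []
    for mount in mount_points:
        stats = usage(mount)
        fraction = stats.used / stats.total
        if fraction >= threshold:
            alerts.append(f"{mount} is {fraction:.0%} full")
    return alerts
```

Run from cron with something like `disk_alerts(["/", "/var"])` and mail yourself any non-empty result.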
Process Monitoring
Keeping your Apache, MySQL, and similar processes running is probably critical for your site. There are several great tools, such as Monit and God, that help to ensure your processes are working as they should. By checking responses, open ports, or process ids these tools can restart a dead service or even kill a runaway process before it takes down your whole system. Configuring the rules for such things is notoriously difficult, but when done properly has the potential to save a lot of 3am downtime.
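To show the shape of the "check an open port, restart if dead" rule, here is a bare-bones sketch. It is not how Monit or God are implemented or configured; `port_is_open`, `watch_once`, and the `restart` callback are hypothetical names, and how you actually restart a service is left to you.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch_once(services, restart, probe=port_is_open):
    """Probe each service once and call `restart(name)` for any that fail.

    `services` maps a service name to a (host, port) pair. Returns the
    names that were restarted. `restart` is supplied by the caller.
    """
    restarted = []
    for name, (host, port) in services.items():
        if not probe(host, port):
            restart(name)
            restarted.append(name)
    return restarted
```

A real rule set also needs back-off and a limit on restart attempts, which is a large part of why configuring these tools well is harder than it looks.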
Hardening
Hardening encompasses a lot of different actions that need to be taken to properly secure a stock system. Even many simple actions are often missed. Do you really know what every one of those processes running does? What extra ports and services are open on your system? Are the proper PAM modules loaded for secure authentication? Again, RedHat and Ubuntu have blazed the trails recently in putting out secure stock systems and ensuring that the most common packages follow proper security protocols. But that doesn't mean you can skip this critical step.
Security Updates
Security updates are very easy to perform on an apt or RPM based system. The catch, of course, is that it's difficult to know if an upgraded package will suddenly cause some sort of error in your stack. Having an identically configured staging server is really the only good way to know for sure how the updates will affect your system. Thankfully, interference from security updates is extremely rare. The risk of a little downtime while fixing an update's compatibility issue is much smaller than the risk from having a known security hole exploited on your system. So don't let "not knowing" stop you from performing the proper upgrades. Finally, not every vulnerability gets a security patch right away. Monitoring the CVE dictionary for applicable alerts allows you to be proactive in keeping your systems secure before a patch is available. This is another area where there's really no replacement for a good ol' set of eyeballs to keep everything running smoothly and up-to-date.
Log monitoring / Security Scanning / Intrusion detection
Of all the items on the list, these are probably done the least. They're easily forgotten and you won't miss them until your system has been compromised. Constant scanning for unusual activity, hacking attempts, and other foul play is incredibly important to help prevent and mitigate attacks.
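As one small example of scanning for unusual activity, the sketch below counts failed SSH password attempts per source address. It assumes OpenSSH's usual "Failed password for ... from ..." log lines and IPv4 addresses only; `failed_logins_by_ip` and the threshold of five are hypothetical, and a real intrusion detection setup covers far more than this.

```python
import re
from collections import Counter

# Matches lines such as:
#   "Failed password for invalid user admin from 203.0.113.5 port 51234 ssh2"
# IPv4 only; this pattern is an assumption about your sshd log format.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?\S+ from (\d{1,3}(?:\.\d{1,3}){3})"
)


def failed_logins_by_ip(lines, threshold=5):
    """Return {ip: count} for addresses with at least `threshold` failures."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(1)] += 1
    return {ip: n for ip, n in counts.items() if n >= threshold}
```

Feed it the lines of your auth log (often `/var/log/auth.log` or `/var/log/secure`, depending on the distribution) and review or block whatever comes back.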
Summary
This certainly isn't an exhaustive list, but it's quite extensive, and many developers, devops, and sysadmins simply do not have the time, interest, or knowledge to handle every item on it. Even worse, many development projects are turned over to a customer who has no in-house staff capable of addressing these items once the technical team moves on to another project.
Not everyone needs a service to accomplish these tasks. There are many devops and sysadmins that enjoy handling these types of tasks and have the knowledge and experience to do it themselves. But if dealing with this laundry list isn't your cup of tea, we've joined together at RoundHouse in order to provide an affordable option for server management. We do all these tasks (and more) so that our customers can focus on their end product and not on day-to-day details of keeping their systems healthy. If you'd like to learn more, please feel free to contact us.
Written by: Drew - Head Conductor