CIOs and IT managers responsible for systems should follow these best
practices as they develop and maintain them. These aren’t just rules to
follow; they should be ingrained in your daily IT life. Many in the IT field
already treat them as core concepts.



  • Always have a way to get back to where you
    started

    Whenever possible, provide a way to get back to
    the original state, whether that means imaging a failing disk before working
    on it, backing up an entire directory structure in case there are files you
    aren’t aware of that you’ll need later, or simply pulling one disk of a RAID1
    array on a physical server before you mess with the operating system.
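
    As one illustration of the directory-backup case, here is a minimal Python
    sketch that archives a whole directory tree before any work starts. The
    function names and the timestamped file name are hypothetical choices made
    for this example, and it assumes the archive is written somewhere outside
    the directory being saved.

```python
import tarfile
import time
from pathlib import Path


def snapshot_directory(source: str, dest_dir: str) -> Path:
    """Archive the whole tree under `source` into a timestamped .tar.gz in `dest_dir`."""
    src = Path(source).resolve()
    out_dir = Path(dest_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = out_dir / f"{src.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname stores paths relative to the directory name, not as absolute paths.
        tar.add(src, arcname=src.name)
    return archive


def list_snapshot(archive: Path) -> list[str]:
    """Return the member names so the backup can be checked before work begins."""
    with tarfile.open(archive, "r:gz") as tar:
        return tar.getnames()
```

    Listing the archive contents before touching the original is a cheap way
    to confirm the backup actually holds the files you expect.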


  • Document where you are and where you are
    going

    Documenting a problem and a resolution when you’re
    in the middle of a chaotic situation may not be practical. That said, always
    hold a postmortem on the problem when the dust settles and go over the steps
    taken and the path to the solution. Write it down. Keep it safe somewhere,
    preferably on a wiki hosted on your intranet — and backed up to several other
    places.

  • Do not modify the interface on a server or network device
    you’re currently connected to

    If you get something wrong, the connection you are working over will fail. While this may
    sound like a no-brainer, it’s amazing how often someone modifies the
    properties of the network interface they’re using to communicate with the
    device. Instead, connect through another device or subnet, a serial console,
    a KVM, or whatever else is available; configure a secondary IP on an
    interface if you have to. This is especially important if the device is in
    a remote office without on-site IT staff.

  • There’s no magic in IT, but there is
    luck

    As a saying often attributed to Thomas Jefferson goes, “I find that the harder I
    work, the more luck I seem to have.” The same is true in IT. The more time you
    spend researching aspects of your infrastructure, noting certain operating
    conditions of routers, switches, servers, and whatnot, the more in tune with
    your infrastructure you become. That homework allows you to sniff out problems
    in their very early stages and to move far quicker when the game’s afoot.
    Also, there are plenty of ways to manufacture luck in IT. For example, use
    tools that automate network device configuration backups; that way, when a
    switch loses its mind, you can have it back up in minutes, not hours.

  • Make a backup of every configuration file before you modify
    it

    Before you go mucking around with sensitive
    configurations, save a copy.  In a pinch, reverting to prior known-good
    status is as simple as copying the file back and restarting the service. This
    generally isn’t possible on Windows, due to the registry and Windows’
    proclivity to complicate simple concepts. Even so, you can sometimes export a
    portion of the registry before messing with it so that it can be reapplied if
    all hell breaks loose. Note: As with all matters regarding the Windows
    registry, you take the life of the server in your hands when you make
    changes.
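
    For plain-text configuration files, the save-then-revert habit can be
    sketched in a few lines of Python. This is only an illustration: the
    function names and the `.bak` naming scheme are assumptions made for the
    example, and restarting the affected service is left to whatever mechanism
    your platform uses.

```python
import shutil
import time
from pathlib import Path


def backup_config(path: str) -> Path:
    """Copy `path` to a timestamped sibling file and return the copy's path."""
    original = Path(path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = original.with_name(f"{original.name}.{stamp}.bak")
    # copy2 also tries to preserve the modification time and permission bits.
    shutil.copy2(original, backup)
    return backup


def restore_config(backup: Path, path: str) -> None:
    """Put the saved copy back over the live file."""
    shutil.copy2(backup, path)
```

    Calling `backup_config` before every edit leaves a dated trail of
    known-good versions next to the live file, so reverting is a single
    `restore_config` call followed by a service restart.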

  • Monitor, monitor, monitor

    An ounce of
    prevention is worth a month of work weekends. You should monitor every aspect
    of your operation, beginning with the temperatures of the room, the racks, and
    the servers — plus, server process checks, uptime checks, ad infinitum. You
    should also implement centralized syslogging of all network devices, as well
    as set up trending and graphing tools to monitor bandwidth utilization,
    temperatures, disk partition use, and other data points. All of these monitors
    should alert you by any means necessary when they exceed reasonable
    thresholds.
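
    The threshold idea can be shown with a small Python sketch that checks
    disk use on one filesystem. The function names and the 90 percent limit are
    assumptions chosen for illustration, and printing stands in for a real
    alerting channel; a dedicated monitoring system would normally do this job.

```python
import shutil
from typing import Optional


def disk_used_percent(path: str) -> float:
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check_threshold(name: str, value: float, limit: float) -> Optional[str]:
    """Return an alert message if `value` exceeds `limit`, otherwise None."""
    if value > limit:
        return f"ALERT: {name} at {value:.1f} exceeds threshold {limit:.1f}"
    return None


if __name__ == "__main__":
    # The 90.0 limit is an illustrative value, not a recommendation.
    alert = check_threshold("disk use % on /", disk_used_percent("/"), 90.0)
    if alert:
        print(alert)  # a real setup would page or email instead of printing
```

    The same `check_threshold` helper works for any numeric reading, such as a
    rack temperature or bandwidth figure, as long as something collects the
    value on a schedule.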
