Sunday, April 5, 2009

How to check NTP is working fine

«The Network Time Protocol (NTP) is a protocol for synchronizing the clocks of computer systems over packet-switched, variable-latency data networks» as Wikipedia says.

Recently I ran into a serious issue on a production machine: the system clock went crazy, and I saw it almost 3 hours in the past. Even one minute can matter in a production environment where you have a database and other services running. So I have collected here all the information I found, for future cases ;-)

Let's look at the default first step for checking how NTP is working. You should run ntpq -p or ntpq -pn (the -n option prints numeric addresses instead of host names).

alexey@ibm:~$ ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*212.235.102.46  129.6.15.28      2 u  109 1024  377   17.743  -40.749   0.296

Let's review each column:

  1. An asterisk * in the first column marks the reference time source which is currently preferred by the NTP daemon. The + character marks high quality candidates for the reference time, which could be used if the currently selected reference time source should become unavailable. The x indicates a falseticker: a server whose time is just plain wrong.
  2. The column remote displays the IP address or the host name of the reference time source, where LOCAL refers to the local clock.
  3. The refid shows the type of the reference clock, where e.g. LOCAL or LCL refers to the local clock again, .DCFa. refers to a standard DCF77 time source, and .PPS. indicates that the reference clock is disciplined by a hardware pulse-per-second signal. Other identifiers are possible, depending on the type of the reference clock. For a remote server, as in the example above, the refid is the address of the source that server is itself synchronized to.
  4. The column st reflects the stratum number of the reference time source. In the example above, the remote time server 212.235.102.46 has stratum 2, meaning it is synchronized to a stratum 1 server. Stratum 1 is the best you can see across the network. A locally attached reference clock, such as a radio clock, has stratum 0, while a local clock used as a fallback gets a high stratum number (for example 12) so that it is only preferred when nothing better is available.
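  5. The column t shows the type of the association: u for unicast, b for broadcast, l for local.
  6. The when column shows the time since the time source was last heard from, and poll shows the polling interval; both are in seconds. Every time a when count reaches the poll number in the same line, the NTP daemon queries the time from the corresponding time source and resets the when count to 0. The query results of each polling cycle are filtered and used as a measure for the clock's quality and reachability.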
  7. The column reach shows whether a reference time source could be reached at the last polling intervals, i.e. data could be read from the reference time source, and the reference time source was synchronized. The value must be interpreted as an 8 bit shift register whose contents are displayed as an octal value. If the NTP daemon has just started, the value is 0. Each time a query is successful, a '1' is shifted in from the right, so after the daemon has been started and every query succeeds, the sequence of reach numbers is 0, 1, 3, 7, 17, 37, 77, 177, 377. The maximum value 377 means that the last eight queries were completed successfully. The NTP daemon must have reached a reference time source several times (reach not 0) before it selects a preferred time source and puts an asterisk in the first column.
  8. The delay value is derived from the roundtrip time of the queries, in milliseconds.
  9. The offset value shows the difference between the reference time and the system clock, in milliseconds. In the example above the offset is about -40.7 ms.
  10. The jitter value indicates the magnitude of jitter between several time queries, in milliseconds.
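
To make the column descriptions above a bit more concrete, here is a minimal Python sketch that parses one data line of ntpq -pn output and decodes the reach shift register. The names Peer, parse_peer_line, reach_history and looks_synchronized are my own, not part of any NTP tool, and the sketch assumes the line is passed unstripped, so the tally character (or a space) sits in the first column.

```python
from dataclasses import dataclass

# Tally codes covered in the list above; ntpq defines a few more.
TALLY_MEANING = {
    "*": "currently preferred source",
    "+": "good candidate",
    "x": "falseticker",
    " ": "not selected",
}


@dataclass
class Peer:
    tally: str
    remote: str
    refid: str
    stratum: int
    kind: str        # the "t" column: u, b, l, ...
    when: str        # kept as text: may be "-" or carry a unit suffix
    poll: str        # kept as text for the same reason
    reach: int       # parsed from the octal value ntpq prints
    delay_ms: float
    offset_ms: float
    jitter_ms: float


def parse_peer_line(line: str) -> Peer:
    """Parse one unstripped data line of `ntpq -pn` output."""
    # Assumption: column one is always the tally character (space if none).
    tally, fields = line[0], line[1:].split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    remote, refid, st, kind, when, poll, reach, delay, offset, jitter = fields
    return Peer(tally, remote, refid, int(st), kind, when, poll,
                int(reach, 8), float(delay), float(offset), float(jitter))


def reach_history(reach: int) -> str:
    """Return the last eight polls as bits, oldest first ('1' = answered)."""
    return format(reach & 0xFF, "08b")


def looks_synchronized(peers: list[Peer]) -> bool:
    """True if some peer is preferred ('*') and was reached at least once."""
    return any(p.tally == "*" and p.reach != 0 for p in peers)


if __name__ == "__main__":
    sample = ("*212.235.102.46  129.6.15.28      2 u  109 1024  377"
              "   17.743  -40.749   0.296")
    peer = parse_peer_line(sample)
    print(TALLY_MEANING.get(peer.tally, "other"), peer.remote)
    print("reach history:", reach_history(peer.reach))
    print("offset (ms):", peer.offset_ms)
    print("synchronized:", looks_synchronized([peer]))
```

On a real machine the lines could be obtained by running ntpq -pn through subprocess.run and skipping the two header lines. How large an offset is acceptable depends on your services, so the sketch deliberately leaves that threshold to you.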
