Anyone done a NTP client on lwip?

29 Dec 2009

Hello All,

I asked this question before in one of the HTTP threads, but I thought I'd make this a topic on its own.  I think that this would be a useful tool for synchronizing the RTC in the LPC1768, but I am a rookie at this and have no knowledge on where to start. 

Features would include:

1. direct RTC update from receiving the NTP time packet.

2. optionally support more than one NTP server.

3. the IP address of the NTP server(s) stored in a local file.

4. timeout error recovery should the NTP server(s) not be available.  No RTC update should the NTP access fail.

5. allow for a UTC offset, also stored in the local file.

6. DST correction?

did I miss anything?

...kevin

31 Dec 2009 . Edited: 31 Dec 2009

Good list of requirements.

Coincidentally, I was looking for a similar set of functions, and found out that LWIP library has a contrib code for SNTP. It is a simplified NTP which is precise only within 30ms or so. It should be enough for pretty much any MBED application that I can think of.

I created a test project with LWIP code (not library, just the files) and added sntp files from contrib directory to it. With some #define's and one pointer typecast fix in the sntp source, it now works and receives NTP timestamp.

Looking through the code, it looks like your items # 1 through 4 are already there. Item 5 and 6 can be implemented in a callback function that is invoked when NTP response packet arrives. So I think this solution is the way to go, especially if the HTTPClient or HTTPServer is already included in your project.

However, there is a caveat. LWIP is compiled in MBED with NO_SYS = 1, which defines as empty timeout functions sys_timeout/sys_untimeout. I think these can be plugged in with mbed's timeout. Without it, sntp does only one NTP query, so it is not yet ready for primetime.

UPDATE:
I plugged (quick and dirty) these timeout/untimeout functions, and SNTP now runs and periodically updates the RTC. If there is an interest, I can prepare and publish the project. I am most interested in passing it to someone on the mbed team so it can be properly maintained.

01 Jan 2010 . Edited: 12 Jan 2010

Follow the progress on SNTP Client

04 Jan 2010

Wow Ilya,

This looks great.  I've complied but haven't been able to run it yet because of company internal firewalls.  Can the list of NTP severs contain a mix of named NTP servers and IP addresses?  I know IP addresses of internal NTP servers here.

Excellent work,

...kevin

04 Jan 2010

This is nice Ilya,

Mixing NTP IP addresses with named servers does work. 

I did note that only a maximum of two named server DNS requests go out and it doesn't matter where a valid or invalid IP addressed NTP server is in the file order.

...kevin

05 Jan 2010

Ilya,

After running the NTP client over night, I found that all mbed TCP/IP traffic dies, including Pings.  I had run my mbed over night with the default values and it crashed after the 70th NTP attempt (nearly 12 hours).  This morning I ran it again, but this time with 1 minute updates.  It too crashed after the 70th attempt.  When the failure mode begins, it starts looking for other NTP servers that it cannot access (because of my internal firewall). - see attached.

...kevin

05 Jan 2010 . Edited: 06 Jan 2010

Interesting.

For one, I will need to go through the sntp.c code to see what happens when DNS fails - it may have no timer use and hence an endless loop that locks everything up. If there is indeed a problem, it would be LWIP/sntp problem, and should be patched in LWIP's SVN, otherwise the branch will be really hard to maintain. I already noticed that LWIP has been updated, and sntp.c was bumped to rev 1.19.

For another, NTP breaking exactly after 70 responses seems to be too consistent - could it be an NTP server configuration? In your wireshark screenshot the last NTP packet is from the server. It would be interesting to observe that last packet - is it a KOD (kiss of death)? There is a debugging flag that can be enabled in lwipopts.h for sntp.c to dump more that should help to diagnose the problem.

06 Jan 2010

Do I turn on all of this in lwipopts.h ?

//#define LWIP_DEBUG               1
//#define LWIP_DBG_TYPES_ON     ~0x0
//#define LWIP_DBG_MIN_LEVEL       0
//#define MEM_DEBUG (LWIP_DBG_ON | LWIP_DBG_LEVEL_WARNING)
//#define TCP_INPUT_DEBUG (LWIP_DBG_ON | LWIP_DBG_LEVEL_WARNING)
//#define TCP_OUTPUT_DEBUG    (LWIP_DBG_ON | LWIP_DBG_LEVEL_WARNING)
//#define NETIF_DEBUG     (LWIP_DBG_ON | LWIP_DBG_LEVEL_WARNING)
//#define DHCP_DEBUG      (LWIP_DBG_ON | LWIP_DBG_LEVEL_WARNING)
//#define IP_DEBUG        (LWIP_DBG_ON | LWIP_DBG_LEVEL_WARNING)
//#define TCP_DEBUG       (LWIP_DBG_ON | LWIP_DBG_LEVEL_WARNING)
//#define TCP_CWND_DEBUG    (LWIP_DBG_ON | LWIP_DBG_LEVEL_WARNING)

 

I also noticed the hostname is set in lwipopts.h .  Is this actually looked at or does the hostname in main.ccp override this?

#ifndef HOSTNAME
#define HOSTNAME "mbed-c3p0"
#endif

 

06 Jan 2010

Kevin,

 

These are LWIP's debug options. Check out SNTP_DEBUG (I used LWIP_DBG_ON), but you also can add some level to it, like this:

 

 

#define SNTP_DEBUG (LWIP_DBG_ON | LWIP_DBG_LEVEL_WARNING)

 

I think HOSTNAME define is only for the default case when main.cpp does not specify it. Remember, most of the lwipopts.h file is for the LWIP. I added only a small SNTP section, since this is the file designed for customizations of LWIP and SNTP (and other LWIP parts). If you mess with non-SNTP part, you in fact go into innards of LWIP - and it may break completely.

 

--

Ilya

07 Jan 2010

Ilya,

I had enabled a few of the debug messages.  It looks like SNTP is running out of memory buffers.  I cannot attach a Wireshark capture or TeraTerm capture.  I've changed the NTP updates to 15 seconds.  The failure starts showing up at NTP update 68 and it crashed completely at 70. 

Here are the last few lines of the TeraTerm session, just before the crash:

sntp_send_request: Sending request to server

SNTP settime() #67 seconds=1262794411 TZ=-8.0 datetime=Wednesday 01/06/2010 16:13:31

sntp_process: Thu Jan 7 00:13:31 2010

sntp_recv: Scheduled next time request: 15000 ms

sntp_send_request: Sending request to server

SNTP settime() #68 seconds=1262794426 TZ=-8.0 datetime=Wednesday 01/06/2010 16:13:46

sntp_process: Thu Jan 7 00:13:46 2010

sntp_recv: Scheduled next time request: 15000 ms

mem_malloc: could not allocate 700 bytes

mem_malloc: could not allocate 700 bytes

sntp_send_request: Sending request to server

SNTP settime() #69 seconds=1262794441 TZ=-8.0 datetime=Wednesday 01/06/2010 16:14:01

sntp_process: Thu Jan 7 00:14:01 2010

sntp_recv: Scheduled next time request: 15000 ms

mem_malloc: could not allocate 700 bytes

sntp_send_request: Sending request to server

SNTP settime() #70 seconds=1262794456 TZ=-8.0 datetime=Wednesday 01/06/2010 16:14:16

sntp_process: Thu Jan 7 00:14:16 2010

sntp_recv: Scheduled next time request: 15000 ms

mem_malloc: could not allocate 700 bytes

sntp_send_request: Sending request to server

mem_malloc: could not allocate 700 bytes

mem_malloc: could not allocate 700 bytes

mem_malloc: could not allocate 700 bytes

mem_malloc: could not allocate 700 bytes

mem_malloc: could not allocate 700 bytes

mem_malloc: could not allocate 700 bytes

mem_malloc: could not allocate 700 bytes

sntp_try_next_server: Sending request to server 2

sntp_request: Waiting for server address to be resolved.

mem_malloc: could not allocate 700 bytes

mem_malloc: could not allocate 700 bytes

mem_malloc: could not allocate 700 bytes

mem_malloc: could not allocate 700 bytes

mem_malloc: could not allocate 700 bytes

mem_malloc: could not allocate 700 bytes

mem_malloc: could not allocate 700 bytes

mem_malloc: could not allocate 700 bytes

mem_malloc: could not allocate 700 bytes

mem_malloc: could not allocate 700 bytes

 

Here is what I turned on in lwipopts.h:

#ifndef HOSTNAME

#define HOSTNAME "mbed-c3p0"

#endif

#define LWIP_DEBUG 1 //*kb*

#define LWIP_DBG_TYPES_ON ~0x0 //*kb*

#define LWIP_DBG_MIN_LEVEL 0 //*kb*

#define MEM_DEBUG (LWIP_DBG_ON | LWIP_DBG_LEVEL_WARNING) //*kb*

#define TCP_INPUT_DEBUG (LWIP_DBG_ON | LWIP_DBG_LEVEL_WARNING) //*kb*

#define TCP_OUTPUT_DEBUG (LWIP_DBG_ON | LWIP_DBG_LEVEL_WARNING) //*kb*

#define NETIF_DEBUG (LWIP_DBG_ON | LWIP_DBG_LEVEL_WARNING) //*kb*

//#define DHCP_DEBUG (LWIP_DBG_ON | LWIP_DBG_LEVEL_WARNING)

//#define IP_DEBUG (LWIP_DBG_ON | LWIP_DBG_LEVEL_WARNING)

//#define TCP_DEBUG (LWIP_DBG_ON | LWIP_DBG_LEVEL_WARNING)

//#define TCP_CWND_DEBUG (LWIP_DBG_ON | LWIP_DBG_LEVEL_WARNING)

//------------------------------------------------------------------------------

// [iva2k] Defines for LWIP/sntp, using SNTPClient wrapper

......

#define SNTP_DEBUG (LWIP_DBG_ON | LWIP_DBG_LEVEL_WARNING) // *kb*

 

...kevin

07 Jan 2010 . Edited: 07 Jan 2010

Kevin,

Good debugging! Apparently sntp runs out of memory. It looks like there is a memory leak. I know that SNTPClient does not allocate any memory, so it is either LWIP or sntp. I will have to look for unfreed memory, but without real debugger it could be a challenge. I can confirm that I was able to reproduce sntp locking up on the order of 68..70th request.

I glanced through the code... error happens in mem_malloc(). I have a sense that it is pbuf_malloc() in sntp.c/sntp_send_request() that calls mem_malloc() for PBUF_RAM type. Oh my! there is no balancing pbuf_free() there!

There is another pbuf_free() which is called from sntp_recv(), but it is for incoming buffer. Fix is one liner:

 

static void
sntp_send_request(struct ip_addr *server_addr)
{
  struct pbuf* p;
  p = pbuf_alloc(PBUF_TRANSPORT, SNTP_MSG_LEN, PBUF_RAM);
  if (p != NULL) {
    struct sntp_msg *sntpmsg = (struct sntp_msg *)p->payload;
    LWIP_DEBUGF(SNTP_DEBUG_STATE, ("sntp_send_request: Sending request to server\n"));
    /* initialize request message */
    sntp_initialize_request(sntpmsg);
    /* send request */
    udp_sendto(sntp_pcb, p, server_addr, SNTP_PORT);
pbuf_free(p);    // [iva2k] fixing memory leak
    /* set up receive timeout: try next server or retry on timeout */
    sys_timeout((u32_t)SNTP_RECV_TIMEOUT, sntp_try_next_server, NULL);
#if SNTP_CHECK_RESPONSE >= 1
    /* save server address to verify it in sntp_recv */ 
    sntp_last_server_address.addr = server_addr->addr;
#endif /* SNTP_CHECK_RESPONSE >= 1 */
  } else {
    LWIP_DEBUGF(SNTP_DEBUG_SERIOUS, ("sntp_send_request: Out of memory, trying again in %"U32_F" ms\n",
      (u32_t)SNTP_RETRY_TIMEOUT));
    /* out of memory: set up a timer to send a retry */
    sys_timeout((u32_t)SNTP_RETRY_TIMEOUT, sntp_request, NULL);
  }
}

 

I will do some testing overnight.

 

--

Ilya

07 Jan 2010

Llya - thanks,

A couple more notes:

1. It seems more of a buffer freeing problem rather than a memory leak - in my hardware opinion :-)

2. When pinging and doing multiple http accesses to the mbed, the NTP number gets to about 50 before the mbed goes deaf.  I can still display the RTC after the crash, so it's not completely dead.

If there is anything else I can do to help or tests to run, let me know.

 

...kevin

07 Jan 2010

Kevin,

 

My overnight test is running ok this morning. Current request #1959 (every 15 seconds). My fix is good - Just add one line "pbuf_free(p);" to sntp_send_request() in sntp.c as I posted above. Glad it is not in my code ;-)

You can stress-test the fix too of course, if you like.

To your questions:

1. not freeing an allocated buffer is indeed called a memory leak.
http://en.wikipedia.org/wiki/Memory_leak
"memory leak or leakage in computer science is a particular type of memory consumption by a computer program where the program is unable to release memory it has acquired."

2. No surprise here. The problem you found happens inside LWIP memory manager, so any TCP/IP interaction is affected/intertwined with sntp problems. Having some traffic prior to memory exhaustion may fragment LWIP's memory thus providing smaller buffers that remain available after sntp crashes.

--

Ilya

08 Jan 2010

Hi Ilya,

Your one-liner goes appear to fix the problem.  Good job!!

One another note, there are times when the mbed starts-up, but the NTP client never starts.  Other times it does.  I'll do more testing over the weekend.

...kevin

11 Jan 2010

Here's a low priority bug report.

On my home "public" network (AP+switch to Home_Router to ATT), using NTP refresh times greater than 35 minutes, the NTP client stops updating after 2-5 times.  If less than 35 minutes, it seems to go on forever.

I haven't tried this yet in my company "internal" network.

...kevin

 

11 Jan 2010 . Edited: 11 Jan 2010

Kevin,

 

Thanks for chasing this! Do you have a debug log with the problem from sntp? I would recommend enabling all SNTP debug options and trying to capture what happens.

On a second thought, it seems strange that timeout value has something to do with it. Maybe there is an overflow in MBED/timers somewhere... quick calculation: 35 minutes is 2100,000,000 microseconds, which is hexadecimal 0x7D2B7500. Double that to get FA56EA00. Hmm... it is very close to overflowing a 32-bit variable. For determining that I would test timers without SNTP. Just run a ticker (or timeout with self-rescheduling, to be more relevant to SNTPClient) for that problematic interval of 35 minutes and see if it gets problems on its own.

 

Simon,

What is the maximum time interval for Timeout.attach() and Timeout.attach_us() ?

 

--

Ilya

12 Jan 2010

Simon, you might be on to something,

If the calculating number is a singed integer +- 2,147,483,648, then it would explain why 35 minutes (2,100,000,000) would work and 40 minutes (2,400,000,000) wouldn't since anything over 2,147,483,648 is a negative number. 

Maybe "UpdateDelay=" should be in seconds rather than in milliseconds.

...kevin

12 Jan 2010

Kevin,

I understood first that SNTPClient was doing 2-5 updates when timeout was above 35 minutes. Now I'm getting different impression that it was not doing more than one update?

Anyway, timeouts in milliseconds was not a good choice that I picked from LWIP/sntp.c. And there is no technical reason to use milliseconds. Seconds would not change the implementation much, and make much more sense. I've updated the program.

pub_iva2k_ethsntp

For more details see the notebook: SNTP Client

12 Jan 2010

Ilya,

All you did was modify sntp.c in your update, correct?

12 Jan 2010

Kevin,

No. Actually, sntp.c did not change at all in rev.3, only in rev.2. I recommend using diff tool on zipped directories (I like BeyondCompare - it allows opening two zip files and comparing all files in it). I may miss something trying to list what I modified, but will try: lwipopts.h, SNTPClient.cpp, SNTPClient.h.

--

Ilya

13 Jan 2010

Hi Ilya,

I cut and pasted the newer files from your new ZIP file and got the following compile errors:

" Undefined symbol gSntpRecvTimeout_s (referred from sntp.o). (EL6218E)" in file "/"

" Undefined symbol gSntpUpdateDelay_s (referred from sntp.o). (EL6218E)" in file "/"

...kevin

13 Jan 2010 . Edited: 13 Jan 2010

Kevin,

gSntpRecvTimeout_s and gSntpUpdateDelay_s are new variables, they are defined in SNTPClient.cpp.

Can you try importing the whole program into the compiler first? It compiles and runs for me.

--

Ilya

13 Jan 2010

Thanks Ilya,

I got it!!  I missed STNPClient.cpp and .h - my bad.  I didn't want to import the whole thing because of my major changes to main.ccp, myrpc.ccp and myrpc.h

...kevin

14 Jan 2010

A new minor bug:

When I set the update to 9000 (2.5 hrs), the actual update occurs at each 1hr 11min 35sec increment.  A value of 60 (1 minute) updates as expected, 60sec.

...kevin

14 Jan 2010 . Edited: 14 Jan 2010

I confirm that. When asked 9000 seconds my test did 1:12:25. Calculating it to seconds: 4345 seconds. Makes not much sense... but in microseconds: 4345000000 = 0x102FB7040. Are we seeing a 32-bit overflow again? If its the case, this time I bet it is in the mbed timeout library.

Also, when setting it to 5000, the test lasts 1:11:35 --> 4295 -> 0x100007FC0. It is only 32ms longer than 0x100000000 microseconds, which can easily be explained by the latency of SNTP and some code inbetween. Looks like mbed:timeout which takes float, still limits it to 32 bits of microseconds.

15 Jan 2010

To further expand on the problem, the original problem still exists.

A timeout of 2100 (35min) works.  However, 2400 (40min) stops working after 2-3 updates

15 Jan 2010 . Edited: 15 Jan 2010

It looks like any UpdateDelay value above 2147 causes mbed library to overflow and all other timeouts/tickers to freeze. There is no permanent solution yet, only to limit timeouts to 2147 and below. We had a good discussion with Simon - hopefully he will have RTCTimeout() etc. implemented soon, then I can update the code to work for all extra long timeout values.