#!/usr/local/bin/tops -p -s /usr/local/tops/sys -u /opt/mytops/usr/
{  File tops_rtc  January 2008

   Copyright (C) 2008-2010  Dale R. Williamson

   Real time collection from a remote machine

   This script runs as a daemon connected to a remote machine that is 
   collecting data.  To become connected, the daemon causes a comple-
   mentary daemon on the remote machine to connect to a server here, 
   and the connection is kept open to receive an asynchronous flow of 
   data in real time.

   To make a permanent connection to the remote machine, this script
   runs msgPutIP() to put an "RTC_CONNECT" message on the remote's in-
   terprocess communication system (file dog.v).

   Message RTC_CONNECT is the agreed-to message that initiates the con-
   nection from the complementary daemon on the remote machine to this 
   one.  The RTC_CONNECT message placed on the remote machine contains 
   this machine's IP address and this daemon's listening port within a 
   phrase that fires the remote daemon's word server_connect() when the
   remote daemon runs it.

   The remote daemon will be polling for such a message (running word 
   msgPoll(), looking for message RTC_CONNECT) and will shortly find it
   and make a connection to the server here using its word server_con-
   nect().  This makes the permanent connection through which real time
   data will flow.

   Script tops_rtcmon is an example of what the remote daemon server
   is running, and it contains word server_connect() mentioned above.

   The Appendix below shows this daemon's log file during a period of
   connection problems with the remote server.

------------------------------------------------------------------------

   Contents:

      "tops_rtc" asciiload this " inline:" grepr reach dot

   inline: CLOSE ( --- ) \ this program will close
   inline: CONN ( --- ) \ leave a message on the remote to connect here
   inline: CONN_CLS (nS --- ) \ action when connection on S has closed
   inline: CONN_RECON ( --- ) \ reconnect to collector
   inline: CONN_SET (nS --- ) \ set up for collector just connected on S
   inline: current ( --- ) \ local files brought current to remote ones
   inline: current_for (nYYYMMDD --- ) \ local files current for YYYMMDD
   inline: extract_files (hT --- ) \ extract files contained in volume T
   inline: get_archive (hFiles --- hT) \ get Files archive from remote
   inline: local_files ( --- hT) \ list of all local files
   inline: local_for (nYYYMMDD --- hT) \ list of local files for YYYMMDD
   inline: rtc_close ( --- ) \ close connection to remote
   inline: rtc_files ( --- hT) \ list of all remote files
   inline: rtc_for (nYYYMMDD --- hT) \ list of remote files for YYYMMDD

------------------------------------------------------------------------

   Interactive testing.

   If running collector 0 test on plunger, start a dummy collector job 
   to which this test will connect (Mar 2010):
      [dale@plunger] /home/dale > tops_rtcmon

   To test this file interactively, start the program with an argv for 
   a running collector, then source this file and start a SERVER on 
   PORT.  

   This starts the program with argv for collector 1:
      [dale@plunger] /home/dale > tops -collect 1
               Tops 3.0.1
      Thu Apr 10 16:00:27 PDT 2008
   This is needed if testing with argv -collect 0 on plunger (Mar 2010):
      [tops@plunger] ready > yes "TESTING" book

      [tops@plunger] ready > "tops_rtc" source

      [tops@plunger] ready > "" PORT SERVER

      [tops@plunger] ready > clients
       Server local is listening on port 9879
       No clients

   Run CONN to make the connection with remote collector 1.  

   This shows msgPutIP connecting to the HTTP server at XXX.XX.48.191,
   leaving the message server_connect('YYY.YYY.244.138', 9879) and 
   closing the connection.  

      [tops@plunger] ready > (ntrace) CONN \ use ntrace for more output
       msgPutIP: connected to XXX.XX.48.191
       msgPutIP OK: "'YYY.YYY.244.138' 9879 server_connect"
                    "RTC_CONNECT" msgPut
       msgPutIP: connection closed

   A few moments later, XXX.XX.48.191 connects here on port 9879, as
   shown by the SERVER and CONN_SET log entries at 16:01:05.  

      Thu Apr 10 16:01:05 PDT 2008 SERVER: XXX.XX.48.191 connect
       -512 bytes delta: memprobe socket 6 connect
       CONN_SET: connection on socket 6 Thu Apr 10 16:01:05 PDT 2008

   The clients list shows "S<C, XXX.XX.48.191" indicating that client
   (C) XXX.XX.48.191 has connected to the server (S) here (S<C):

      [tops@plunger] ready > clients
       Server local is listening on port 9879
       Clients:
        socket 6, port  4188, conn S<C, XXX.XX.48.191 LOGIN dale topsdog

   This multitasker task checks every 180 seconds that the connection 
   is still intact:

      [tops@plunger] ready > tasks
       Multitasker tasks:
        CONN_RECON,0:CODE__ alarm period 180 seconds; remaining 131

   Exiting closes the connection and causes CONN_CLS to run:

      [tops@plunger] ready > bye
       CONN_CLS: connection on 6 is closed
      59 keys
                Good-bye
      Thu Apr 10 16:04:03 PDT 2008
      [dale@plunger] /home/dale > 


   These lines in a script are handy to see what tops_rtcmon and
   tops_rtc jobs are running:

      #File rtc
      ps -Af --cols 512 | grep tops_rtc
      ps -Af --cols 512 | grep collect
}
\-----------------------------------------------------------------------

   CATMSG push no catmsg

\  Network setup.

   "IPlocal" "IP" macro                  \ this machine's IP address 
   def_port nextport intstr "PORT" macro \ this machine's listening port

{  Argv -collect equal to 0, 1 or 2 defines which collector will be 
   sending files and where they will be written; see usr/uboot.v for
   host-specific definitions of the following words:
      IPcol0, PORTcol0, epath0
      IPcol1, PORTcol1, epath1
      IPcol2, PORTcol2, epath2
}
   inline: collect0 ( --- ) \ set up collector 0
    \ If running on plunger, tops_rtccon will start topse to collect
    \ data like a remote collector, and this script is never run for
    \ collector 0.  

    \ But to test this word on plunger, start a dummy remote collector
    \ by running tops_rtcmon, and run yes "TESTING" book at the ready
    \ prompt before sourcing this file with "tops_rtc" source.

      "'TESTING' exists?" main 
      IF "TESTING" main ELSE no THEN "TESTING" book

      TESTING not IF host "plunger" = IF exit THEN THEN
 
      "HOME" env "tops_rtc0.log" catpath "LOG_RTC" mainbook
      "HOME" env "tops_rtc0mem.log" catpath "LOG_MEM" mainbook

    \ These macros defined when this word runs go into the main
    \ library for all to see:
      "epath0"   "DIR" macro     \ local dir receiving remote files
      "IPcol0"   "IPcol" macro   \ collector machine IP address
      "PORTcol0" "PORTcol" macro \ HTTP port on collector machine

      "RTC0_SERVER" dup msgGet drop \ remove old port number
      PORT intstr swap msgPut       \ put new port number

      TESTING \ override these macros:
      IF "IPloop" "IP"    macro \ this machine's IP address 
         "IPloop" "IPcol" macro \ collector machine IP address
      THEN
   end

   inline: collect1 ( --- ) \ set up collector 1
      "HOME" env "tops_rtc1.log" catpath "LOG_RTC" mainbook
      "HOME" env "tops_rtc1mem.log" catpath "LOG_MEM" mainbook

    \ These macros defined when this word runs go into the main 
    \ library for all to see:
      "epath1"   "DIR"     macro \ local dir receiving remote files
      "IPcol1"   "IPcol"   macro \ collector machine IP address
      "PORTcol1" "PORTcol" macro \ HTTP port on collector machine

      "RTC1_SERVER" dup msgGet drop \ remove old port number
      PORT intstr swap msgPut       \ put new port number
   end

   inline: collect2 ( --- ) \ set up collector 2
      "HOME" env "tops_rtc2.log" catpath "LOG_RTC" mainbook
      "HOME" env "tops_rtc2mem.log" catpath "LOG_MEM" mainbook

    \ These macros defined when this word runs go into the main 
    \ library for all to see:
      "epath2"   "DIR"     macro \ local dir receiving remote files
      "IPcol2"   "IPcol"   macro \ collector machine IP address
      "PORTcol2" "PORTcol" macro \ HTTP port on collector machine

      "RTC2_SERVER" dup msgGet drop \ remove old port number
      PORT intstr swap msgPut       \ put new port number
   end

   "-collect" argv chars 0= 
   IF " tops_rtc: collector must use argv -collect" . nl HALT THEN

   "-collect" argv "0" = IF collect0 THEN
   "-collect" argv "1" = IF collect1 THEN
   "-collect" argv "2" = IF collect2 THEN

\  The number for SOCK is set when the remote collector causes the
\  phrase
\     remotefd CONN_SET
\  to be run here.  See tops_rtcmon, word server_connect().
   -1 "SOCK" book \ will be valid when remote collector connects

\-----------------------------------------------------------------------

\  Words.

   "msgPut" missing IF "dog.v" source THEN

   inline: CLOSE ( --- ) \ this program will close
      nl " This program is closing " date + . nl
      remotesockets sclose
      5 "exit" ALARM
   end

   inline: CONN ( --- ) \ leave a message on the remote to connect here
{     This word, through word msgPutIP() below, makes a connection to 
      the remote's HTTP server and leaves a message for the companion 
      script to this one, running on the remote machine, to connect
      here, to IP address and listening PORT. 

      When this script is first started, this word CONN is run on an 
      ALARM that delays until DSERVER on IP:PORT is ready and listening
      for the upcoming connection that will establish socket SOCK.
}
      [ 10 (seconds) "SERVER_DELAY" book \ let DSERVER get started
        30 "TIMEOUT" book \ seconds until remote connects
      ]
      rtc_close \ make sure connection is closed

      www_open not
      IF WWW www_open not
         IF " CONN: failed to connect to Internet" . nl return THEN
      THEN
{
      Make a command string to run on the remote, for example
         "'71.107.4.6' 9886 server_connect" "RTC_CONNECT" msgPut,
      that will cause the remote machine running such a phrase to con-
      nect to listening port 9886 at IP address 71.107.4.6.
}
      "'ip' PORT server_connect" \ template; replace strings ip and PORT
      (hM) "ip" IP strp (hM)     \ replace string ip with IP address
      "PORT" PORT intstr strp    \ replace string PORT with PORT num

      (qS) "RTC_CONNECT"         \ S is an RTC_CONNECT message 
      IPcol PORTcol              \ sending to machine at IPcol:PORTcol 
      msgPutIP \ goes to remote machine's interprocess message list

    \ The remote collector should connect to the server here in a short
    \ time, and when it does CONN_SET() will be run and socket SOCK will
    \ be defined.
      TIMEOUT WAIT_ALARM \ time limit for connection
      WAIT_BEGIN         \ wait for connection through CONN_SET

    \ Turn off the WAIT_END alarm started by WAIT_ALARM
      "WAIT_END" -ALARM

    \ Set the alarm for reconnection:
      "CONN_RECON" "SEC" yank (nSec)
      (nSec) "CONN_RECON" ALARM \ set reconnection ALARM
   end

   inline: CONN_CLS (nS --- ) \ action when connection on S has closed
      " CONN_CLS: connection on " swap intstr + " is closed " + 
      date + . nl
      -1 "SOCK" mainbook \ invalidate SOCK
{
      xx \ clear the stack; there may be items from aborted connection

      NEVER CLEAR THE STACK.  I SHOULD KNOW BETTER.  

      HTTPget was returning a stack item when this word ran and cleared
      off the item (and whatever was below it: items that other words 
      may have been waiting for), causing the program to run away and
      then exit.  

      This is an event-driven system: words are recursive and can run 
      at any time, so the integrity of the stack must be maintained.

      It pays to understand a problem before attempting to fix it.

      I had a theory that the problem was due to the remote machine
      closing the socket on socket_ack, and if socket_ack was just
      discontinued, the problem would go away.  

      It has taken a long time, but by now I know it is wise to under-
      stand a problem first, and not just make blind changes and hope
      things are fixed.  

      So debug was added and after a few days the problem occurred but
      the debug was not quite informative enough and more was added.

      Finally after another day the problem occurred again, and the 
      unexpected has shown up as the culprit: the thoughtless clearing
      of the stack by "xx" above.  My shoot-from-the-hip theory about
      socket_ack was just plain wrong.

      Word CONN_CLS is called whenever the connection closes.  I guess
      that it usually runs after HTTPget has returned, because this
      error occurs infrequently.  The log below shows the case where 
      it ran at the worst time and nailed the stack:

      Here is word CONN running to make a connection to 205.134.240.76:
         Top of CONN Mon Jul 20 15:18:58 PDT 2009
         CONN message: '71.107.6.154' 9887 server_connect
         CONN RTC_CONNECT IPcol: 205.134.240.76
         CONN RTC_CONNECT PORTcol: 80
         Top of msgPutIP
         REMOTE_CONNECT: calling HTTPget: 205.134.240.76 clientIPs 
            remotefd clientindex quote 9887 CLIENT 'S' book "remotefd 
            WAIT_END" S remoterun (plunger)
         HTTPget: host 205.134.240.76
         HTTPget: connected to 205.134.240.76 on socket 2
         HTTPget: clientIPs remotefd clientindex quote 9887 CLIENT 'S' 
            book "remotefd WAIT_END" S remoterun (plunger)
         HTTPget: receiving bytes ...
         HTTPget: received 118 bytes at 1.10 Mbytes/sec
         HTTPget: closing connection

      Here is where CONN_CLS ran and cleared the stack, removing the
      stack item that HTTPget was returning:
         CONN_CLS: connection on 2 is closed Mon Jul 20 15:18:58 PDT 200
         << xx cleared the stack here >>
         End of CONN_CLS

      This is debug print placed in REMOTE_CONNECT, and it verifies that
      the stack is empty following CONN_CLS:
         REMOTE_CONNECT: stack after HTTPget:
         stack is empty 

      This is the mess that followed:
         textget: expect string or volume on stack
         dup: empty stack
         cannot pop empty stack
         gt: stack items not as expected
         over: expect two items on stack
         quote: expect string or volume on stack
         strchop: expect string on stack
         cannot pop empty stack
         faulty phrase: "*" PORT DSERVER
         runaway detected: HALT on run level 6  Mon Jul 20 15:18:58 PDT

      Here is a repeat the next day, after removing "xx" and with the 
      same debug print.  It shows three stack items, obviously crucial
      for running, that "xx" had cleared off.  Remoteprompt to the pro-
      gram on IPloop, port 9885 showed the stack was empty following all
      this, as it should be, so the machinery is working ok:
         Top of CONN Tue Jul 21 15:19:33 PDT 2009
         CONN message: '71.107.6.154' 9885 server_connect
         CONN RTC_CONNECT IPcol: 205.134.240.76
         CONN RTC_CONNECT PORTcol: 80
         Top of msgPutIP
         REMOTE_CONNECT: calling HTTPget: 205.134.240.76 clientIPs
            remotefd clientindex quote 9885 CLIENT 'S' book "remotefd 
            WAIT_END" S remoterun (plunger)
         HTTPget: host 205.134.240.76
         HTTPget: connected to 205.134.240.76 on socket 2
         HTTPget: clientIPs remotefd clientindex quote 9885 CLIENT 'S'
            book "remotefd WAIT_END" S remoterun (plunger)
         HTTPget: receiving bytes ...
         HTTPget: received 118 bytes at 1.12 Mbytes/sec
         HTTPget: closing connection
         CONN_CLS: connection on 2 is closed Tue Jul 21 15:19:33 PDT 200
         End of CONN_CLS

      This is debug print placed in REMOTE_CONNECT, showing three stack
      items that CONN_CLS had previously cleared:
         REMOTE_CONNECT: stack after HTTPget:
         stack elements:
               0 string:  REQUESTrun: clientIPs remotefd cli...  118 
                  characters
               1 string: RTC_CONNECT  11 characters
               2 string: '71.107.6.154' 9885 server_connect  34 
                  characters
         [3] ok!

        Tue Jul 21 15:19:33 PDT 2009 SERVER: 205.134.240.76 connect
         104 bytes delta: memprobe socket 2 connect
         REMOTE_CONNECT1: connected to 205.134.240.76:80
         msgPutIP socket: 2
         msgPutIP: connected to 205.134.240.76
         msgPutIP running T: "'71.107.6.154' 9885 server_connect" 
            "RTC_CONNECT" msgPut
         msgPutIP: connection closed
         CONN WAIT_BEGIN Tue Jul 21 15:19:34 PDT 2009
         CONN_SET: connection on socket 3 Tue Jul 21 15:19:34 PDT 2009
         CONN WAIT_END Tue Jul 21 15:19:34 PDT 2009
}
      "CONN_RECON" "SEC" yank (nSec) 
      (nSec) 10 / \ next reconnection sooner than SEC
      (nSec/10) "CONN_RECON" ALARM \ set reconnection ALARM
   end

   inline: CONN_RECON ( --- ) \ reconnect to collector
\     This word runs on an alarm that it continuously resets, to see if
\     it is necessary to connect again by running CONN.

      [ 180 "SEC" book  \ test for reconnect every SEC seconds
        600 "TMAX" book \ max time between received files
      ]

      LOCKED not
      IF time "extract_files" "t_extract" yank - TMAX >
         IF " CONN_RECON: too much time since last extraction" . nl
            rtc_close
         THEN

       \ July 2009: run rtc_close if SOCK is not an open client:
         "SOCK" main -1 >
         IF "SOCK" main client_open not
            IF " CONN_RECON: SOCK is not a client, closing connection" 
               . nl rtc_close 
            THEN
         THEN

         "SOCK" main 0< (f1) www_open not (f2) or  

         IF " CONN_RECON: running CONN to reconnect " date + . nl 
            CONN  
         THEN

      THEN
      SEC "CONN_RECON" ALARM \ check again in SEC 
   end
      
   inline: CONN_SET (nS --- ) \ set up for collector just connected on S
\     When the remote collector connects, it runs this word so this end
\     can be set up.
      " CONN_SET: connection on socket " over intstr + spaced 
      date + . nl
      (nS) dup "SOCK" mainbook                \ connected on SOCK
      "CONN_CLS" ptr swap (ptr nS) ptrCls_upd \ set clientclose function
   end 

   inline: current ( --- ) \ local files brought current to remote ones
      rtc_files any?
      IF (hT2) 1st word drop (hT2)
         local_files 1st word drop (hT1) 
         (hT2 hT1) nomatch1 any?
         IF (hFiles) get_archive any?
            IF (hT) extract_files THEN
         THEN
      THEN
   end

   inline: current_for (nYYYMMDD --- ) \ local files current for YYYMMDD
      "DATE" book
      DATE rtc_for any?
      IF (hT2) 1st word drop (hT2) 
         DATE local_for (hT1) 1st word drop (hT1)
         (hT2 hT1) nomatch1 any?
         IF (hFiles) get_archive any?
            IF (hT) extract_files THEN
         THEN
      THEN
   end

   inline: extract_files (hT --- ) \ extract files contained in volume T
{     Volume T on the stack is a tar file archive.  Save T to FILE and 
      then extract the files of FILE into DIR.

      When the remote machine sends file archive T to this machine, it
      follows it with a string to run this word.  

      For example, the remote machine might run remoterun2() to send T 
      from its stack to here and then run this word, extract_files(), 
      on this machine.  

      Here is a phrase run on the remote machine to do this, showing
      T on its stack ready to be sent here:

         (hT) "extract_files" S remoterun2
}
      [ INF "t_extract" book ]

      time "t_extract" book \ time for elapsed test in CONN_RECON

    \ Write a line to the log file:
      " extract_files: to " DIR + spaced that sizeof intstr + 
      " bytes " + date + . nl 

      ftempsys "FILE" book
      FILE old binary "BIN" file \ open handle to old FILE
      (hT) BIN fput              \ bytes on stack to FILE
      BIN fclose                 \ close FILE handle
      DIR FILE xtar              \ extract tar files from FILE into DIR
{
      FILE should always exist, but when using delete the following 
      was obtained once (the program recovered and continued as the 
      last line shows):

         extract_files: to /home/dale/mdat/edat1/ 4433 bytes 
            Wed May 14 18:46:48 UTC 2008
         delete: file not found: /tmp/T3494_Wm8x37
         faulty phrase: extract_files
         faulty phrase: "*" PORT DSERVER
         extract_files: to /home/dale/mdat/edat1/ 4309 bytes 
            Wed May 14 18:48:28 UTC 2008
}
    \ Switch to deleteif:
      FILE deleteif              \ delete FILE
    \ FILE delete                \ delete FILE

      "/bin/touch " DIR + shell  \ so filetime will show change
   end

   inline: get_archive (hFiles --- hT) \ get Files archive from remote
      "SOCK" main "S" book

      S -1 =
      S socket_open not or
      IF " get_archive: socket to remote is not open" . nl
         drop VOL tpurged
      ELSE
       \ Files are in DIR on remote; run word archive on the remote
       \ and have an archive of Files sent here (note that DIR on
       \ the remote is where collected files are placed; it probably
       \ is a different name than DIR here):
         (hFiles) "DIR archive (hT) remotefd remoteput" (hT2)
         (hFiles hT2) S remoterun2
         S 40 (nS nSec) BLOCK
      THEN
   end

   inline: local_files ( --- hT) \ list of all local files
\     Volume T contains a list of file names and times for remote files
\     that have been downloaded to directory DIR.
      DIR dirfiles (hNames hTimes) " %0.0f" format park
   end

   inline: local_for (nYYYMMDD --- hT) \ list of local files for YYYMMDD
\     Volume T contains a list of file names and times for files that
\     have been downloaded for YYYMMDD to directory DIR.
      DIR dirfiles (hNames hTimes) " %0.0f" format park
      dup rot intstr grepr any?
      IF reach ELSE drop VOL tpurged THEN
   end

   inline: rtc_close ( --- ) \ close connection to remote
      "SOCK" main "S" book

      S -1 >
      IF " rtc_close: closing socket " S intstr + " to collector " + 
         date + . nl 
         0 S ptrCls_upd \ essential to avoid endless loop with CONN_CLS
         "remotefd server_close" S remoterun
      THEN
      S sclose
      -1 "SOCK" mainbook
   end

   inline: rtc_files ( --- hT) \ list of all remote files
      "SOCK" main "S" book

      S -1 =
      S socket_open not or
      IF " rtc_files: socket to remote is not open" . nl 
         VOL tpurged
      ELSE
         "rtc_files remotefd remoteput" S remoterun1
      THEN
   end

   inline: rtc_for (nYYYMMDD --- hT) \ list of remote files for YYYMMDD
      "SOCK" main "S" book

      S -1 =
      S socket_open not or
      IF " rtc_for: socket to remote is not open" . nl 
         drop VOL tpurged
      ELSE
         intstr (hT1) "main (nYYMMDD) rtc_for remotefd remoteput" (hT2)
         (hT1 hT2) S remoterun2
         S 20 (nS nSec) BLOCK
      THEN
   end

   pull catmsg

   keys? IF halt THEN \ interactive testing, cannot run daemon server

\-----------------------------------------------------------------------

\  Start a multitasker job to track memory usage:
\  July 2009: Memory has looked fine for months, no leaks.  Discontinue
\  this:
\  LOG_MEM "memlog" "LOG" bank 
\  1 900 / "memlog" PLAY \ every 15 minutes

\-----------------------------------------------------------------------

\  This section makes the connection to the complementary daemon on the
\  remote machine.

\  SYSOUT must be defined for this daemon's output.  This line sets 
\  SYSOUT to the log file name defined above:
   LOG_RTC set_sysout \ SYSOUT will be LOG_RTC

\  Write the first lines in LOG_RTC file:
   "-" 72 cats nl dot nl
   "PID " getpid intstr + spaced date + dot nl
   tasks

\  Settings:
   12 new_client_timeout \ time allowed for remote to make connection

   NIST_SYNC

\  Run CONN on an ALARM that gives DSERVER time to start:
   "CONN" "SERVER_DELAY" yank "CONN" ALARM

\  Start the daemon server, running forever.  The remote collection
\  machine will connect shortly after CONN runs: 
   "*" PORT DSERVER

\-----------------------------------------------------------------------

;  Appendix

   Problem, July 2009.

      After readn1 errno, the program hung.  It remained connected
      to the collection site:
         extract_files: to /mdat/edat1/ 1519 bytes Thu Jul  2 21:02:22 
         extract_files: to /mdat/edat1/ 1513 bytes Thu Jul  2 21:05:13 
         connect_alarm: signum 14  Thu Jul  2 21:07:30 PDT 2009
         readn1: probable alarm interrupt, errno: 4

      The collection site kept delivering files to socket 2, the one
      that was hung up, but socket_ack continually failed and the site
      did nothing about it:

         Server local is listening on port 9879
         Clients:
          socket 2, port  9882, conn C>S, 71.106.247.190 dale plunger
          socket 3, port  9879, conn C>S, 64.62.148.191 dale topsdog
         send_files: sending 4008 byte archive of 4 files
         send_files: socket_ack on socket 3 ok
         send_files: to socket 3 Fri Jul  3 17:04:10 UTC 2009
         send_files: socket_ack on socket 2 failed
          Jmp table at lev 4
           lev   ret  typ  Lib:lib
            4     2    1     0:send_files
            3     2    1     0:dir_monitor
            2     2    1     0:D2
            1     1    2     0:DATA__
          send_files: to socket 2 Fri Jul  3 17:04:17 UTC 2009
          send_files end: Fri Jul  3 17:04:17 UTC 2009

      What if the collection site, on seeing socket_ack fail, simply
      closed the socket?  Would the receiver then cease to be hung up?

      Changes to tops_rtcmon and tops_rtc have been made to address 
      this problem.

Example of connecting during a noisy period, October 2008.

This shows the tops_rtc log file during a period when the remote tops_rtcmon
server had TCP/IP connection problems, making operation on this end very rocky.

Times like this are a pain, but they offer the opportunity to make changes
that improve reliability.  Lines below show program tops_rtc detecting bad
connections and continually reconnecting.

Communication to make a remote connection is through the remote's interprocess
communication system (file dog.v), and not through direct connection to server
tops_rtcmon (see documentation at the top of file tops_rtc).  This program sends 
a message to the remote's msgcomm file (see "msgPutIP OK:" below), while the 
remote daemon, tops_rtcmon, is polling for such a message.  When received, the 
remote makes a new connection to here.  This turns out to be a key feature in 
robust reconnection, in effect using a neutral or third party.  
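
The exchange described above can be sketched in a few lines of Python.  This is
only an illustration, not the Tops code: class Mailbox stands in for the
remote's message list, and the names Mailbox, build_connect_phrase and
poll_once are hypothetical.  The phrase template is the one used by word CONN,
and the sketch assumes that reading a message removes it.

```python
class Mailbox:
    """Stand-in for the remote's interprocess message list."""

    def __init__(self):
        self._messages = dict()

    def put(self, name, text):
        self._messages[name] = text

    def get(self, name):
        # Return and remove the named message, or None if there is none.
        return self._messages.pop(name, None)


def build_connect_phrase(ip, port):
    """Fill the template used by CONN with this machine's IP and port."""
    template = "'ip' PORT server_connect"
    phrase = template.replace("ip", ip, 1)
    return phrase.replace("PORT", str(port), 1)


def poll_once(mailbox):
    """What a polling remote might do: look for RTC_CONNECT and parse it."""
    phrase = mailbox.get("RTC_CONNECT")
    if phrase is None:
        return None
    ip_text, port_text, _word = phrase.split()
    return ip_text.strip("'"), int(port_text)


mailbox = Mailbox()
mailbox.put("RTC_CONNECT", build_connect_phrase("71.107.4.6", 9886))
print(poll_once(mailbox))   # the address the remote would connect to
print(poll_once(mailbox))   # nothing left to act on
```

The sender never talks to the collecting daemon directly; it only leaves the
phrase, and the daemon decides when to pick it up and open a fresh connection.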

Below is the excerpt from the tops_rtc log file, with comments inserted.

Connect to tops_rtcmon server and start receiving files:

Fri Oct 31 15:10:24 UTC 2008 SERVER: YY.XXX.ZZ.76 connect
 8 bytes delta: memprobe socket 3 connect
 msgPutIP OK: "'XXX.XX.148.191' 9879 server_connect" "RTC_CONNECT" msgPut
 msgPutIP: connection closed
 CONN_SET: connection on socket 3 Fri Oct 31 15:10:25 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 12311 bytes Fri Oct 31 15:10:41 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 2709 bytes Fri Oct 31 15:11:53 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 2794 bytes Fri Oct 31 15:13:04 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 3107 bytes Fri Oct 31 15:14:05 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 3786 bytes Fri Oct 31 15:15:58 UTC 2008

This shows the socket_ack phrase from the remote server, trying to run word remoterun
here.  But writen1 finds the socket to the remote is now closed, and word remoterun
fails.  Word CONN_CLS officially closes the socket:

 writen1: socket 3 is not open, client closed
 CONN_CLS: connection on 3 is closed Fri Oct 31 15:27:25 UTC 2008
 fault at word: remoterun
 faulty phrase: "'remoteack' 'pile_ACK' localrun" remotefd remoterun

CONN_RECON, running periodically, detects that too much time has passed and
initiates another connection:

 CONN_RECON: too much time since last extraction
 CONN_RECON: running CONN to reconnect Fri Oct 31 15:30:24 UTC 2008

The connection initiated by CONN_RECON succeeds:

Fri Oct 31 15:30:25 UTC 2008 SERVER: YY.XXX.ZZ.76 connect
 -152 bytes delta: memprobe socket 2 connect
 CONN_SET: connection on socket 2 Fri Oct 31 15:30:26 UTC 2008

but after about 20 seconds socket_ack fails again:

 writen1: socket 2 is not open, client closed
 CONN_CLS: connection on 2 is closed Fri Oct 31 15:30:46 UTC 2008
 fault at word: remoterun
 faulty phrase: "'remoteack' 'pile_ACK' localrun" remotefd remoterun

and CONN_RECON again detects that (still) too much time has passed and 
starts another connection:

 CONN_RECON: too much time since last extraction
 CONN_RECON: running CONN to reconnect Fri Oct 31 15:33:45 UTC 2008
 msgPutIP: connected to YY.XXX.ZZ.76

Connection succeeds and a couple of files are received but socket_ack to
the server again fails and CONN_RECON starts another connection:

Fri Oct 31 15:33:46 UTC 2008 SERVER: YY.XXX.ZZ.76 connect
 -32 bytes delta: memprobe socket 3 connect
 msgPutIP OK: "'XXX.XX.148.191' 9879 server_connect" "RTC_CONNECT" msgPut
 msgPutIP: connection closed
 CONN_SET: connection on socket 3 Fri Oct 31 15:33:47 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 8933 bytes Fri Oct 31 15:33:55 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 2900 bytes Fri Oct 31 15:35:07 UTC 2008
 writen1: socket 3 is not open, client closed
 CONN_CLS: connection on 3 is closed Fri Oct 31 15:37:40 UTC 2008
 fault at word: remoterun
 faulty phrase: "'remoteack' 'pile_ACK' localrun" remotefd remoterun
 CONN_RECON: running CONN to reconnect Fri Oct 31 15:40:39 UTC 2008
 msgPutIP: connected to YY.XXX.ZZ.76

Connection succeeds, and receipt of files is going more smoothly:

Fri Oct 31 15:40:40 UTC 2008 SERVER: YY.XXX.ZZ.76 connect
 -56 bytes delta: memprobe socket 3 connect
 msgPutIP OK: "'XXX.XX.148.191' 9879 server_connect" "RTC_CONNECT" msgPut
 msgPutIP: connection closed
 CONN_SET: connection on socket 3 Fri Oct 31 15:40:43 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 9132 bytes Fri Oct 31 15:40:44 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 3653 bytes Fri Oct 31 15:41:55 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 4748 bytes Fri Oct 31 15:43:28 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 3407 bytes Fri Oct 31 15:44:39 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 2913 bytes Fri Oct 31 15:45:50 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 3688 bytes Fri Oct 31 15:46:45 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 2090 bytes Fri Oct 31 15:50:23 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 4040 bytes Fri Oct 31 15:51:33 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 3700 bytes Fri Oct 31 15:52:36 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 4807 bytes Fri Oct 31 15:54:06 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 3143 bytes Fri Oct 31 15:55:43 UTC 2008
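
The reconnection decisions seen in this log can be summarized with a small
Python sketch.  It is a paraphrase of the tests in word CONN_RECON, not the
Tops code itself: the function name needs_reconnect and its arguments are
hypothetical, and TMAX is the 600 second limit set in CONN_RECON.

```python
TMAX = 600   # maximum seconds allowed between received files


def needs_reconnect(now, last_extract, sock, sock_is_open, www_is_open,
                    locked=False):
    """Return True when a check like CONN_RECON's would run CONN."""
    if locked:
        return False              # nothing is tested while locked
    if now - last_extract > TMAX:
        sock = -1                 # closing the connection invalidates SOCK
    if sock > -1 and not sock_is_open:
        sock = -1                 # SOCK is no longer an open client
    return sock < 0 or not www_is_open


# Check at 15:30:24: 866 seconds since the 15:15:58 extraction
print(needs_reconnect(866, 0, -1, False, True))
# Check at 15:40:39: only 332 seconds since 15:35:07, but SOCK is -1
print(needs_reconnect(332, 0, -1, False, True))
# Healthy connection: recent file and an open socket
print(needs_reconnect(60, 0, 3, True, True))
```

The first two calls report that a reconnect is needed and the third does not.
The second case matches the 15:40:39 entry above, where CONN runs without a
"too much time" line because the closed socket alone is enough.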
