       info-WASD Mailing List 2025 

Subject:[Info-WASD] Latency and front-ends
From:mark.daniel@wasd.vsm.com.au
Reply-to:info-wasd@vsm.com.au
Date:Wed, 24 Sep 2025 14:05:17 +0930  [24-SEP-2025 14:05]
To:info-WASD@vsm.com.au

In Three Acts  ——  not a tragedy, neither a comedy
~~~~~~~~~~~~~
Just on a year ago this collection added an article, "Latency and back-ends",
which discussed a latent bug (of misunderstanding) that, after years, suddenly
became a problem due to a change in behaviour at the far end.

  https://wasd.vsm.com.au/info-WASD/2024/0015

This article discusses another latent coding bug, at the other end of the
processing chain.

Kudos to Process Software Corporation (PSC).  Retold with permission.

Dramatis Personae
~~~~~~~~~~~~~~~~~
[redacted]        client of PSC
Hunter Goatley    EISNER-meister (and PSC Engineer)
John Reagan       mythological compiler chimaera
Richard Whalen    PSC Principal Software Engineer
narrator          vox mea propria

Act I
~~~~~
Res quaedam non recte se habet.

[redacted] is having an issue with WASD on field-test x86-64 MultiNet.

To investigate, PSC deploy a VirtualBox-hosted X86 VMS V9.2-3 with MultiNet v6,
along with a fresh WASD v12.3 source kit.  After the build, using the usual
demonstration procedure, WASD refuses to accept a request.  Any request at
all.  It just drops the connection.

Curiously, this is *not* the problem [redacted] is experiencing.

Normally at any point in problem solving the suggestion is WATCH.
But if you can't even connect?

 > You *can* WATCH Hunter, at the command-line.  It's a little messy.
 > $ SPAWN HTTPD <any-other-required-parameters> /WATCH=NOSTARTUP,-1
 > The -1 just enables all WATCH items.

https://wasd.vsm.com.au/wasd_root/wasdoc/features/#watchfacilit.
https://wasd.vsm.com.au/wasd_root/wasdoc/features/#commandlineu..

   Thanks for the console WATCH info. Rich tried it today and got this:
   8< snip 8<
   |15:06:25.58 HTTPD    3368 000014 INTERNAL   TIMER input 30 seconds|
   |15:06:25.58 TCPIP    0768 000014 NETWORK    MAXQIO 0 maxseg:1460 sndbuf:62780 rcvbuf:0 _BG864: %X00000001|
   |15:06:25.58 NETIO    0717 000014 NETWORK    READ 0/16384 bytes (non-blocking)|
   |15:06:25.58 NETIO    0865 000014 NETWORK    READ %X00000001 0 (0/16384) bytes (non-blocking)|
   8< snip 8<
   The big question is “Why is it doing 0 (zero) byte reads?”

A fair question.  My development bench doesn't.

   |21:07:18.41 WATCH    3198 000002 FILTER     CLIENT adding gort.lan,63944 on http://x86vms.lan,80 (192.168.1.86)|
   |21:07:18.41 TCPIP    0736 000002 NETWORK    SETMODE sndbuf:62780 rcvbuf:62780 %X00000001|
   |21:07:18.41 TCPIP    0768 000002 NETWORK    MAXQIO 64240 maxseg:1460 sndbuf:999999 rcvbuf:999999  %X00000001|
   |21:07:18.41 NETIO    0722 000002 NETWORK    READ 0/16384 bytes (non-blocking)|

Er, hello, it's doing zero byte $QIOs because that's how many bytes are
reported as the size of the receive buffer.

   |15:06:25.58 TCPIP    0768 000014 NETWORK    MAXQIO 0 maxseg:1460 sndbuf:62780 rcvbuf:0 _BG864: %X00000001|
                                                                                  ^^^^^^^^
WASD, performing network I/O using $QIO, attempts to optimise the number of
whole TCP segments that can be fitted into a single QIO.

   /*****************************************************************************/
   /*
   The socket MSS value has been established during connection acceptance.
   Calculate the maximum number of full segments that can be QIOed and set MaxQio.
   */
   
   int TcpIpSocketMaxQio (void *vptr)
   8< snip 8<
      {
         qios = 65535;
         if (qios > ioptr->TcpSndBuf / 2) qios = ioptr->TcpSndBuf / 2;
         if (qios > ioptr->TcpRcvBuf / 2) qios = ioptr->TcpRcvBuf / 2;
         qios = (qios / ioptr->TcpMaxSeg) * ioptr->TcpMaxSeg;
         ioptr->TcpMaxQioSet = ioptr->TcpMaxQio = qios;
      }
   8< snip 8<
   /*****************************************************************************/

In this way, data far exceeding single $QIO capacity is sent in groups of
whole segments, theoretically optimising transfer 'on the wire'.
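
The arithmetic in the excerpt can be tried in isolation.  What follows is a
minimal sketch, not WASD code: max_qio() and its parameter names are
hypothetical stand-ins for the ioptr-> fields, and the guard against a zero
segment size is added for the sketch only.

```c
/* Stand-alone version of the MaxQio arithmetic quoted from
   TcpIpSocketMaxQio().  The function and parameter names are
   hypothetical; only the arithmetic follows the excerpt. */
int max_qio(int sndbuf, int rcvbuf, int maxseg)
{
    int qios = 65535;                          /* ceiling used in the excerpt */

    if (qios > sndbuf / 2) qios = sndbuf / 2;  /* at most half the send buffer */
    if (qios > rcvbuf / 2) qios = rcvbuf / 2;  /* at most half the receive buffer */

    if (maxseg <= 0) return 0;                 /* guard added for this sketch only */

    return (qios / maxseg) * maxseg;           /* round down to whole segments */
}
```

With the figures from the WATCH output, max_qio(999999, 999999, 1460) gives
64240, matching the development bench line, while max_qio(62780, 0, 1460)
gives 0, matching the "MAXQIO 0" reported by PSC.  A zero receive buffer size
is enough on its own to produce the zero byte reads.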

Act II
~~~~~~
Fama cimex.

   Rich finally tracked down the problem. I'm sorry to report that
   it's a WASD bug!

   Once he verified that the MultiNet kernel was returning proper
   sizes in the SENSEMODE call, he started looking at the WASD code.

   In [SRC.HTTPD]TCPIP.C, these variables are declared as /ushort/,
   but they should be /int/.

       216   ushort  TcpIpMaxSegLength,
       217           TcpIpRcvBufLength,
       218           TcpIpSndBufLength;

   The documentation states that they should be /int/, and when we
   changed "ushort" to "int" and built it, the WASD demo ran just fine.

Fair enough.  This code has been running a long time across at least two
earlier CPU architectures.

Why broken on x86-64 now?  The WASD x86-64 work began in late 2020 and the
essential port was declared complete twelve months later with the release of
v12.0.  Further annual iterations through to v12.3 have consolidated both
WASD and the x86-64 versions of it.  (In fact 2025 saw my development bench
move from my 20+ year old Alpha to X86.  Even on an everyday desktop it is
blindingly fast in comparison.)

Literally millions of OWASP-ZAP generated crawl and exploit requests over
these four releases have not hinted at anything amiss.

https://wasd.vsm.com.au/wasd_root/wasdoc/config/#3.1.serverands...

My development bench is a commodity Dell, Windows 11, using VirtualBox,
hosting X86 VMS V9.2-3 and VSI TCP/IP Services V6.0.

Act III
~~~~~~~
Diabolus ex machina.

   The reason [redacted] doesn't have a problem and we did is probably
   because of different VM software and/or different CPUs and data
   alignment. Rich said that John Reagan said that prior architectures
   (Alpha, I64, and, to a lesser extent, VAX) have had significant
   penalties for bad alignment, so the compilers would longword align
   variables to minimize the impacts. 

   John said that x86_64 systems don't have the performance penalties,
   so the compilers no longer pad variables to make them longword-
   aligned. That and other things we can only speculate about explain
   the WASD problem. On our x86_64 systems running under VirtualBox,
   the /int/ values we were returning to the /ushort/ variables were
   apparently overwriting each other, resulting in the 0-length receive
   buffer size WASD was seeing.
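
The overwriting described above can be modelled at the byte level.  This is a
sketch under stated assumptions, not a reconstruction of what happened: the
three variables are assumed to sit in declaration order, the values are
assumed to be written in the order rcvbuf, sndbuf, maxseg, and the bytes are
stored little-endian as on x86-64.  The real layout and write order are not
known.  All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Slots in assumed declaration order: max segment, receive buffer,
   send buffer.  The stride is the distance in bytes between slots:
   2 models unpadded ushorts, 4 models longword-aligned ones. */
#define SLOT_MAXSEG 0
#define SLOT_RCVBUF 1
#define SLOT_SNDBUF 2

/* Store a 32-bit value as four little-endian bytes, as a callee that
   believes it was given the address of an int would. */
void store32(unsigned char *mem, size_t offset, uint32_t value)
{
    for (int i = 0; i < 4; i++)
        mem[offset + i] = (unsigned char)(value >> (8 * i));
}

/* Read back the 16 bits that a ushort at this offset occupies. */
uint16_t load16(const unsigned char *mem, size_t offset)
{
    return (uint16_t)(mem[offset] | (mem[offset + 1] << 8));
}

/* Write rcvbuf, sndbuf, then maxseg (an assumed order) as 32-bit values
   and return the receive buffer size as the ushort then reads.
   The stride must be 2 or 4. */
uint16_t rcvbuf_after_writes(size_t stride)
{
    unsigned char mem[16] = {0};    /* three slots at stride 4, plus spill */

    store32(mem, SLOT_RCVBUF * stride, 62780);
    store32(mem, SLOT_SNDBUF * stride, 62780);
    store32(mem, SLOT_MAXSEG * stride, 1460);   /* high bytes land on rcvbuf
                                                   when the stride is 2 */
    return load16(mem, SLOT_RCVBUF * stride);
}
```

Under these assumptions rcvbuf_after_writes(4) returns 62780, because each
4-byte store stays inside its own padded slot, while rcvbuf_after_writes(2)
returns 0, because the upper two (zero) bytes of the 1460 store fall on the
receive buffer slot.  A different layout or write order would clobber a
different variable, or none that is subsequently read.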

That underlying CPU implementations and virtualisation architectures can, in
combination with varying host O/S and virtualisation tools, yield differing
behaviours is a sobering thought.

There was a brief suggestion that keeping X86 VMS data alignment padded at
some level (long/quad/octa) might go some way towards mitigating these
potential alignment/size architectural issues.  The conclusion was that the
X86 instruction stream is largely a product of the LLVM Core, effectively
meaning issues such as data alignment are not under VSI control.

PS. Tired of folk salting their work with Latin phrases ad nauseam?

This item is one of a collection at
https://wasd.vsm.com.au/other/#occasional
