
Subject:  [Info-WASD] Throttle revisited plus CPU exhaustion
From:     mark.daniel@wasd.vsm.com.au
Reply-to: info-wasd@vsm.com.au
Date:     Wed, 26 Nov 2025 13:40:19 +1030
To:       info-WASD@vsm.com.au

(way)TL;DR ... remove access to the in-demand resource(s)

Earlier in the year there was a post regarding performance and process quota
management using WASD's 'throttle'

  https://wasd.vsm.com.au/info-WASD/2025/0004

This post builds on that, refining the approach based on experience, and then
proposes another tool for CPU-constrained platforms - for those having more
than a single CPU.

(With just the one, upgrade -or- severely limit connections.)

Process Exhaustion
~~~~~~~~~~~~~~~~~~
Using the 'throttle' facility, originally designed to queue 'transactions'
(requests) in resource-constrained environments (VAX and early Alpha), it is
possible to have a specified number processing while others wait.  The simple
example presented controls processing of the 'Conan' librarian script

  script+ /conan* /cgi-bin/conan* throttle=5
  script+ /help* /cgi-bin/conan* throttle=5

meaning 5 of each are processed while others are queued until processing
becomes available.  The main issue with this is that, while it can be somewhat
CPU-conservative, each queued request has already been network connected and
the request received, and it consumes memory and associated resources while
queued.  (Also, the throttle was never really designed for network traffic
control.)

With DECUServe.org, which was labouring, we constrained some paths to, say, 5
processing.  Under siege, as the server sometimes seems lately, 30, 40, 50+
queued requests were observed.  These were often abandoned by clients and
still needed to be run down, resources released, and so forth.  Generally
lots of ultimately non-productive effort.

The solution is to add another constraint to the 'throttle'.  This provides a
limit on the number allowed to be queued.  Beyond this number the response is
just a 503 'busy'.  The request still has to be run down and its resources
released, but it is a pressure release valve of sorts (continuing the throttle
metaphor :-)

  script+ /conan* /cgi-bin/conan* throttle=5,,,10
  script+ /help* /cgi-bin/conan* throttle=5,,,10

  https://wasd.vsm.com.au/wasd_root/wasdoc/config/#requestthrottl...

This did improve the situation appreciably.
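
For reference, reading the amended rule back (the annotations below are
descriptive only; the empty comma-separated fields are other throttle
parameters left at their defaults - see the documentation link above):

  # throttle=<concurrent>,,,<queue-limit>   (annotation, not actual syntax)
  #   5  requests per path processing concurrently
  #   10 further requests queued; beyond that, an immediate 503 "busy"
  script+ /conan* /cgi-bin/conan* throttle=5,,,10
  script+ /help*  /cgi-bin/conan* throttle=5,,,10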

CPU Exhaustion
~~~~~~~~~~~~~~
DECUServe.org is a 4 CPU, 2048MB emulated Alpha, hosted by PSC, with software
support from AVTware and VSI, and administered by volunteer staff.  A modest
machine by any contemporary measure.

  https://decuserve.org

With the onslaught described in the opening reference, a single CPU was often
being driven to 100%, particularly as many of the harvesters and other
bloody-minded 'bots seemed to be single-request engines.  Each request
required a complete TLS (formerly SSL) handshake to initiate, the creation and
processing of the request and response, and then, rather than persist and use
the connection again for another request, the connection would be torn down.
Over and over and over again.  Often dozens of them simultaneously.

WASD is not intrinsically multi-threaded.  It has an event-driven design.
CPU concurrency is effected through multiple 'instances' of WASD concurrently
executing, comparable in many ways to VMS clustering.

A cluster is often described as a loosely-coupled, distributed operating
environment where autonomous processors can join, process and leave (even
fail) independently, participating in a single management domain and
communicating with one another for the purposes of resource sharing and high
availability.

Similarly WASD instances run in autonomous, detached processes (across one or
more systems in a cluster) using a common configuration and management
interface, aware of the presence and activity of other instances (via the
Distributed Lock Manager and shared memory), sharing processing load and
providing rolling restart and automatic 'fail-through' as required.

  [unpaid advertisement concluded]

So DECUServe.org was made into a single-system, multi-instance WASD server.

  https://wasd.vsm.com.au/wasd_root/wasdoc/features/#serverinstan...

HOW?
~~~~
By editing the WASD_CONFIG_GLOBAL directive [InstanceMax] (0 becomes 2 in
this case) and restarting.  The current server shuts down and restarts,
noting the changed directive.  A few seconds later another WASD process
starts.  We now have two instances on the system.

  000059BF WASD1:80        CUR   0  6 31191178   1 01:02:35.92      9557   9502
  000059CB WASD2:80        CUR   1  6 31113178   1 01:02:28.20      9839   9719
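
In DCL terms the change amounts to something like the following (a sketch
only; the foreign command definition and the exact /DO= control verb may vary
with the local installation):

  $ EDIT WASD_CONFIG_GLOBAL           ! set [InstanceMax] 2 (was 0)
  $ HTTPD == "$WASD_EXE:HTTPD.EXE"    ! assumes the usual foreign command
  $ HTTPD /DO=RESTART                 ! server restarts, noting the change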

The global common is shared using multiple mutexes to coordinate access, and
incoming network connections are distributed between the instances using a
round-robin algorithm.

The HTTPDMON utility cycles between the instances, with a status summary of
all instances shown below the most recent request line.

┊ Process: WASD2:80  PID: 000059CB  User: HTTP$SERVER  Version: 12.3.6
┊      Up: 2 00:13:43.09  CPU: 1 01:03:47.18  Startup: 2
┊ Pg.Flts: 9839  Pg.Used: 17%  WsSize: 208096  WsPeak: 155504
┊     AST: 3916/4000  BIO: 3928/4000  BYT: 4167296/4221504  DIO: 1991/2000
┊     ENQ:  937/1000  FIL:  384/400   PRC:     100/100       TQ:  498/500
┊
┊ Request: 992532  Current: 98/67/0/0  Throttle: 50/14/3%  Peak: 120/106

Note that the server displayed (WASD2:80) has been up just on 48 hours, has
used roughly 25 hours of CPU (!), and the consolidated server (WASD1:80 plus
WASD2:80) processed just under one million requests, with 120 peak
connections and 106 peak concurrent requests.

┊ Request: GET /anon/htnotes/xtract?f1=HARDWARE_HELP&f2=15.3
┊ 
┊   Instance          Ago   Up Count Exit Status     Version /Min  /Hour
┊   EISNER::WASD1:80  15s   2d     1                  12.3.6  169  10989
┊   EISNER::WASD2:80  14s   2d     1                  12.3.6  152  11091

  https://wasd.vsm.com.au/wasd_root/wasdoc/features/#status
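
(HTTPDMON is supplied with the WASD kit; from DCL its invocation might look
something like this - the image location and foreign command are assumptions
based on the usual installation layout.)

  $ HTTPDMON == "$WASD_EXE:HTTPDMON.EXE"   ! assumed standard kit location
  $ HTTPDMON                               ! live display, cycling instances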

Works in Practice
~~~~~~~~~~~~~~~~~
DECUServe.org seems much more stable and responsive under more extreme loads
with this 'simple' adaptation.  As described above, each instance manages its
own scripts, RTEs and the like, further distributing the load across the
system.  As the somewhat elementary measures of MONITOR PROCESS /TOPCPU and
MONITOR SYSTEM illustrate, CPU consumption is shared between processes WASD1
and WASD2, along with allied processes, as are other resources consumed.
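
The displays below come from those standard OpenVMS commands:

  $ MONITOR PROCESSES /TOPCPU    ! top CPU-time processes
  $ MONITOR SYSTEM               ! system-wide CPU, memory and I/O summary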
 
┊                            OpenVMS Monitor Utility
┊                             TOP CPU TIME PROCESSES
┊                                 on node EISNER
┊                             7-NOV-2025 21:35:15.03
┊
┊                                     0         25        50        75       100
┊                                     + - - - - + - - - - + - - - - + - - - - +
┊ 000088C9  WASD2:80               98  ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
┊ 00008CC1  WASD1:80               96  ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
┊ 00008846  BATCH_533              19  ▒▒▒▒▒▒▒
┊ 00008CCD  /help+8CCD              6  ▒▒
┊ 000088C7  WASD1:80_88C7           4  ▒
┊ 000088CB  WASD2:80_88CB           4  ▒
┊ 00008CC3  /help+8CC3              3  ▒
┊ 000090B4  /htnotes_90B4           3  ▒
┊ 00007C5A  WASD1:80_7C5A           3  ▒
┊ 00009088  /htnotes_9088           1
┊ 0000041C  MULTINET_SERVER         1
┊ 00009002  NOTES$0216_1            1
┊ 000088CE  /conan+88CE             1
┊ 00000419  NETACP                  1
┊ 00009208  NOTES$0010_1
┊                                     + - - - - + - - - - + - - - - + - - - - +

┊ Node: EISNER                OpenVMS Monitor Utility      7-NOV-2025 21:35:14
┊ Statistic: CURRENT             SYSTEM STATISTICS
┊                                                      Process States
┊           ┌ CPU Busy (224)          ─┐         LEF:      65    LEFO:       0
┊           │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒          │         HIB:      40    HIBO:       0
┊ CPU     0 ├──────────────────────────┤ 400     COM:       0    COMO:       0
┊           │▒▒▒▒▒▒▒                   │         PFW:       0    CUR:        2
┊           └──────────────────────────┘         MWAIT:     1    Other:      0
┊           Cur Top: WASD2:80 (98)                         Total: 108
┊ 
┊           ┌ Page Fault Rate (562)   ─┐         ┌ Free List Size (35114)   ┐
┊           │▒▒▒▒|▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒│         │▒▒▒                       │ 256K
┊ MEMORY  0 ├──────────────────────────┤ 500   0 ├──────────────────────────┤
┊           │▒▒▒▒▒▒▒                   │         │▒▒▒▒▒▒▒▒▒▒                │ 26K
┊           └──────────────────────────┘         └ Mod List Size (10448)    ┘
┊           Cur Top: WASD2:80_29DA (152)
┊ 
┊           ┌ Direct I/O Rate (306)   ─┐         ┌ Buffered I/O Rate (1405)─┐
┊           │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒           │         │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒│
┊ I/O     0 ├──────────────────────────┤ 500   0 ├──────────────────────────┤ 500
┊           │▒▒▒▒                      │         │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒          │
┊           └──────────────────────────┘         └──────────────────────────┘
┊           Cur Top: MULTINET_SERVER (86)        Cur Top: WASD2:80 (312)

These snaps were taken during a busy period.  Is yet a third instance needed?
Let's see.  Edit [InstanceMax] from 2 to 3 and restart.

┊                                     0         25        50        75       100
┊                                     + - - - - + - - - - + - - - - + - - - - +
┊ 0008F208  WASD3:80               86  ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
┊ 000454B5  WASD2:80               84  ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
┊ 0004C880  WASD1:80               80  ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
┊ 000960C1  /conan+60C1            22  ▒▒▒▒▒▒▒▒
┊ 000990C2  /help+90C2             11  ▒▒▒▒
┊ 0009A20C  SERVER_000C             2
┊ 0000041C  MULTINET_SERVER         1
┊ 0008CCCF  /conan+CCCF             1
┊ 00005458  /help+5458
8< snip 8<
┊                                     + - - - - + - - - - + - - - - + - - - - +

Accessing DECUServe.org *feels* much more responsive.  Were it a real,
in-production server, it would benefit from an underlying hardware upgrade.

By way of contrast, here it is under more typical load.

┊                                     0         25        50        75       100
┊                                     + - - - - + - - - - + - - - - + - - - - +
┊ 000454B5  WASD2:80               46  ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
┊ 0004C880  WASD1:80               44  ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
┊ 0008F208  WASD3:80               34  ▒▒▒▒▒▒▒▒▒▒▒▒▒
┊ 000B0154  /conan+0154             7  ▒▒
┊ 000AFD46  /conan+FD46             6  ▒▒
┊ 000095F8  PTSMTP    0001i         3  ▒
┊ 0000041C  MULTINET_SERVER         2
┊ 0008C4E5  <SMTP-04>               1
┊ 00005458  /help+5458              1
┊ 000B1896  SSHD 0000
┊ 00000423  NTP_SERVER
8< snip 8<
┊                                     + - - - - + - - - - + - - - - + - - - - +

Ad Hoc Adjustment
~~~~~~~~~~~~~~~~~
Should further CPU be required, perhaps due to a spike in processing, and
further physical CPUs be available to service that demand, instances may
simply be added by selecting the appropriate Server Admin "Instance [Max]
[CPU] [1] [2] [3] ... [8]" button and then [Restart], resulting in a rolling
restart up to the new number of instances.  The converse resumes the original
processing.

OR YOU CAN JUST TRY  removing access to the in-demand resource(s)
~~~~~~~~~~~~~~~~~~~  (at least to see if that materially improves performance)
                     https://decuserve.org/help
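
In WASD mapping terms that can be as simple as a rule or two ahead of the
script+ entries (a sketch only; the paths and status text are illustrative,
and assume the usual pass-rule behaviour of returning the quoted status):

  # temporarily take the in-demand paths out of service
  pass /help*  "503 help temporarily offline"
  pass /conan* "503 conan temporarily offline"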

┊                                     0         25        50        75       100
┊                                     + - - - - + - - - - + - - - - + - - - - +
┊ 00000443  WASD1:80               15  ▒▒▒▒▒▒
┊ 00000472  WASD3:80               11  ▒▒▒▒
┊ 0000045C  WASD2:80                7  ▒▒
┊ 00000401  SWAPPER                 1
┊ 0000DD4F  <SMTP-02>
┊ 00000454  <SMTP-03>
┊ 0000044F  <POP3-01>
┊ 00000422  MULTINET_SERVER
┊ 00000411  SECURITY_SERVER
8< snip 8<
┊                                     + - - - - + - - - - + - - - - + - - - - +

PS. As described in the earlier posting
       https://wasd.vsm.com.au/info-WASD/2025/0009
       server activity can always be checked ...
    https://decuserve.org/httpd/-/admin/report/activity
    https://decuserve.org/httpd/-/admin/report/activity?of=4
    https://decuserve.org/httpd/-/admin/report/activity?of=24

PPS. DECUServe WASD performance data is always available using:
     https://decuserve.org/cgiplus-bin/alamode
     https://decuserve.org/cgiplus-bin/mondesi

This item is one of a collection at
https://wasd.vsm.com.au/other/#occasional
