(way)TL;DR ... remove access to the in-demand resource(s)
Earlier in the year there was a post regarding performance and process quota
management using WASD's 'throttle'
https://wasd.vsm.com.au/info-WASD/2025/0004
This post builds on that, refining the approach based on experience, and then
proposes another tool for use on CPU-constrained platforms - for those having
more than a single CPU.
(With just the one, upgrade -or- severely limit connections.)
Process Exhaustion
~~~~~~~~~~~~~~~~~~
Using the 'throttle' facility, originally designed to queue 'transactions'
(requests) in resource-constrained environments (VAX and early Alpha), it is
possible to have a specified number processing while others wait. The simple
example presented controls processing of the 'Conan' librarian script
script+ /conan* /cgi-bin/conan* throttle=5
script+ /help* /cgi-bin/conan* throttle=5
meaning 5 of each are processed concurrently while others are queued until a
processing slot becomes available. The main issue is that, while this can be
somewhat CPU conservative, each queued request has already been network
connected and had its request read, and continues to consume memory and
associated resources while queued.
(Throttling was also never really designed for network traffic control.)
On DECUServe.org, which was labouring, some paths were constrained to, say, 5
processing. Under siege, as the server sometimes seems lately, 30, 40, 50+
queued requests were observed. These were often abandoned by clients yet
still needed to be run down, resources released, and so forth. Generally a
lot of ultimately non-productive effort.
The solution is to add another constraint to the 'throttle'. This provides a
limit on the number allowed to be queued. Beyond that number the response is
just a 503 'busy'. The request still has to be run down and its resources
released, but it is a pressure release valve of sorts (continuing the
throttle metaphor :-)
script+ /conan* /cgi-bin/conan* throttle=5,,,10
script+ /help* /cgi-bin/conan* throttle=5,,,10
https://wasd.vsm.com.au/wasd_root/wasdoc/config/#requestthrottl...
This did improve the situation appreciably.
CPU Exhaustion
~~~~~~~~~~~~~~
DECUServe.org is a 4 CPU, 2048MB emulated Alpha, hosted by PSC, with software
support from AVTware and VSI, using voluntary admin staff. A modest machine
by any contemporary measure.
https://decuserve.org
With the onslaught described in the opening reference, a single CPU was often
being driven to 100%, particularly as many of the harvesters and other
bloody-minded 'bots seemed to be single-request engines. Each request
required a complete TLS (Secure Sockets) handshake to initiate, the creation
and processing of the request and response, and then, rather than persist and
reuse the connection for another request, the connection would be torn down.
Over and over and over again. Often dozens of them simultaneously.
WASD is not intrinsically multi-threaded. It has an event-driven design.
CPU concurrency is effected through multiple 'instances' of WASD concurrently
executing, comparable in many ways to VMS clustering.
A cluster is often described as a loosely-coupled, distributed operating
environment where autonomous processors can join, process and leave (even
fail) independently, participating in a single management domain and
communicating with one another for the purposes of resource sharing and high
availability.
Similarly WASD instances run in autonomous, detached processes (across one or
more systems in a cluster) using a common configuration and management
interface, aware of the presence and activity of other instances (via the
Distributed Lock Manager and shared memory), sharing processing load and
providing rolling restart and automatic 'fail-through' as required.
[unpaid advertisement concluded]
So DECUServe.org was made into a single-system, multi-instance WASD server.
https://wasd.vsm.com.au/wasd_root/wasdoc/features/#serverinstan...
HOW?
~~~~
By editing the WASD_CONFIG_GLOBAL directive [InstanceMax] (0 becomes 2 in
this case) and restarting. The current server shuts down and restarts,
noting the changed directive. A few seconds later another WASD process
starts. We now have two instances on the system.
000059BF WASD1:80 CUR 0 6 31191178 1 01:02:35.92 9557 9502
000059CB WASD2:80 CUR 1 6 31113178 1 01:02:28.20 9839 9719
The global common is shared using multiple mutexes to coordinate access and
incoming network connections are distributed between the instances using a
round-robin algorithm.
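For the record, the change itself is a one-liner plus a restart. A minimal
sketch, assuming a typical installation (the configuration file layout and
whether the HTTPD command-line control symbol is defined will vary by site):
# WASD_CONFIG_GLOBAL (excerpt)
[InstanceMax]  2
$ HTTPD /DO=RESTART    ! or use the Server Admin [Restart] button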
The HTTPDMON utility cycles between the instances, with a status summary of
all instances shown below the most recent request line.
┊ Process: WASD2:80 PID: 000059CB User: HTTP$SERVER Version: 12.3.6
┊ Up: 2 00:13:43.09 CPU: 1 01:03:47.18 Startup: 2
┊ Pg.Flts: 9839 Pg.Used: 17% WsSize: 208096 WsPeak: 155504
┊ AST: 3916/4000 BIO: 3928/4000 BYT: 4167296/4221504 DIO: 1991/2000
┊ ENQ: 937/1000 FIL: 384/400 PRC: 100/100 TQ: 498/500
┊
┊ Request: 992532 Current: 98/67/0/0 Throttle: 50/14/3% Peak: 120/106
Note that the server displayed (WASD2:80) has been up just on 48 hours, has
used roughly 25 hours of CPU (!), and the consolidated server (WASD1:80 plus
WASD2:80) processed just under one million requests, with 120 peak
connections and 106 peak concurrent requests.
┊ Request: GET /anon/htnotes/xtract?f1=HARDWARE_HELP&f2=15.3
┊
┊ Instance Ago Up Count Exit Status Version /Min /Hour
┊ EISNER::WASD1:80 15s 2d 1 12.3.6 169 10989
┊ EISNER::WASD2:80 14s 2d 1 12.3.6 152 11091
https://wasd.vsm.com.au/wasd_root/wasdoc/features/#status
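(The HTTPDMON display above is just run from DCL; the foreign command
definition below is an assumption about a typical WASD installation - adjust
for your own startup procedures.)
$ HTTPDMON == "$WASD_EXE:HTTPDMON"
$ HTTPDMON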
Works in Practice
~~~~~~~~~~~~~~~~~
DECUServe.org seems much more stable and responsive under more extreme loads
with this 'simple' adaptation. As described above, each instance manages its
own scripts, RTEs and the like, further distributing the load across the
system. As the somewhat elementary measures of MONITOR PROCESSES /TOPCPU and
MONITOR SYSTEM illustrate, CPU consumption is shared between processes WASD1
and WASD2, along with allied processes, as are other resources consumed.
┊ OpenVMS Monitor Utility
┊ TOP CPU TIME PROCESSES
┊ on node EISNER
┊ 7-NOV-2025 21:35:15.03
┊
┊ 0 25 50 75 100
┊ + - - - - + - - - - + - - - - + - - - - +
┊ 000088C9 WASD2:80 98 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
┊ 00008CC1 WASD1:80 96 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
┊ 00008846 BATCH_533 19 ▒▒▒▒▒▒▒
┊ 00008CCD /help+8CCD 6 ▒▒
┊ 000088C7 WASD1:80_88C7 4 ▒
┊ 000088CB WASD2:80_88CB 4 ▒
┊ 00008CC3 /help+8CC3 3 ▒
┊ 000090B4 /htnotes_90B4 3 ▒
┊ 00007C5A WASD1:80_7C5A 3 ▒
┊ 00009088 /htnotes_9088 1
┊ 0000041C MULTINET_SERVER 1
┊ 00009002 NOTES$0216_1 1
┊ 000088CE /conan+88CE 1
┊ 00000419 NETACP 1
┊ 00009208 NOTES$0010_1
┊ + - - - - + - - - - + - - - - + - - - - +
┊ Node: EISNER OpenVMS Monitor Utility 7-NOV-2025 21:35:14
┊ Statistic: CURRENT SYSTEM STATISTICS
┊ Process States
┊ ┌ CPU Busy (224) ─┐ LEF: 65 LEFO: 0
┊ │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ │ HIB: 40 HIBO: 0
┊ CPU 0 ├──────────────────────────┤ 400 COM: 0 COMO: 0
┊ │▒▒▒▒▒▒▒ │ PFW: 0 CUR: 2
┊ └──────────────────────────┘ MWAIT: 1 Other: 0
┊ Cur Top: WASD2:80 (98) Total: 108
┊
┊ ┌ Page Fault Rate (562) ─┐ ┌ Free List Size (35114) ┐
┊ │▒▒▒▒|▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒│ │▒▒▒ │ 256K
┊ MEMORY 0 ├──────────────────────────┤ 500 0 ├──────────────────────────┤
┊ │▒▒▒▒▒▒▒ │ │▒▒▒▒▒▒▒▒▒▒ │ 26K
┊ └──────────────────────────┘ └ Mod List Size (10448) ┘
┊ Cur Top: WASD2:80_29DA (152)
┊
┊ ┌ Direct I/O Rate (306) ─┐ ┌ Buffered I/O Rate (1405)─┐
┊ │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ │ │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒│
┊ I/O 0 ├──────────────────────────┤ 500 0 ├──────────────────────────┤ 500
┊ │▒▒▒▒ │ │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ │
┊ └──────────────────────────┘ └──────────────────────────┘
┊ Cur Top: MULTINET_SERVER (86) Cur Top: WASD2:80 (312)
These snaps were taken during a busy period. Is yet a third instance needed?
Let's see. [InstanceMax] was edited from 2 to 3 and the server restarted.
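(Again, just the one directive changed - same caveats about site-specific
file layout as earlier.)
# WASD_CONFIG_GLOBAL (excerpt)
[InstanceMax]  3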
┊ 0 25 50 75 100
┊ + - - - - + - - - - + - - - - + - - - - +
┊ 0008F208 WASD3:80 86 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
┊ 000454B5 WASD2:80 84 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
┊ 0004C880 WASD1:80 80 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
┊ 000960C1 /conan+60C1 22 ▒▒▒▒▒▒▒▒
┊ 000990C2 /help+90C2 11 ▒▒▒▒
┊ 0009A20C SERVER_000C 2
┊ 0000041C MULTINET_SERVER 1
┊ 0008CCCF /conan+CCCF 1
┊ 00005458 /help+5458
8< snip 8<
┊ + - - - - + - - - - + - - - - + - - - - +
Accessing DECUServe.org *feels* much more responsive. If this were a real,
in-production server, it would benefit from an underlying hardware upgrade.
By way of contrast, here it is under a more typical load.
┊ 0 25 50 75 100
┊ + - - - - + - - - - + - - - - + - - - - +
┊ 000454B5 WASD2:80 46 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
┊ 0004C880 WASD1:80 44 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
┊ 0008F208 WASD3:80 34 ▒▒▒▒▒▒▒▒▒▒▒▒▒
┊ 000B0154 /conan+0154 7 ▒▒
┊ 000AFD46 /conan+FD46 6 ▒▒
┊ 000095F8 PTSMTP 0001i 3 ▒
┊ 0000041C MULTINET_SERVER 2
┊ 0008C4E5 <SMTP-04> 1
┊ 00005458 /help+5458 1
┊ 000B1896 SSHD 0000
┊ 00000423 NTP_SERVER
8< snip 8<
┊ + - - - - + - - - - + - - - - + - - - - +
Ad Hoc Adjustment
~~~~~~~~~~~~~~~~~
Should further CPU be required, perhaps due to a spike in processing, and
further physical CPUs be available to service that demand, more CPU(s) may
simply be brought into play by setting the appropriate Server Admin
"Instance [Max] [CPU] [1] [2] [3] ... [8]" button and performing a [Restart],
resulting in a rolling restart up to the new number of instances. The
converse resumes the original processing.
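If memory serves, the configuration-file equivalent of the [CPU] button is
the keyword CPU rather than an integer, sizing the instance count to the
available CPUs - treat this as an assumption and check the documentation
linked earlier:
# WASD_CONFIG_GLOBAL (excerpt)
[InstanceMax]  CPU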
OR YOU CAN JUST TRY removing access to the in-demand resource(s)
~~~~~~~~~~~~~~~~~~~ (at least to see if that materially improves performance)
https://decuserve.org/help
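(One possible sketch of 'removing access' using WASD_CONFIG_MAP mapping
rules - the paths and status text here are illustrative assumptions, not
necessarily what was actually done on DECUServe:)
# temporarily take the hammered paths out of service
pass /help* "503 temporarily offline"
pass /anon/htnotes/* "503 temporarily offline"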
┊ 0 25 50 75 100
┊ + - - - - + - - - - + - - - - + - - - - +
┊ 00000443 WASD1:80 15 ▒▒▒▒▒▒
┊ 00000472 WASD3:80 11 ▒▒▒▒
┊ 0000045C WASD2:80 7 ▒▒
┊ 00000401 SWAPPER 1
┊ 0000DD4F <SMTP-02>
┊ 00000454 <SMTP-03>
┊ 0000044F <POP3-01>
┊ 00000422 MULTINET_SERVER
┊ 00000411 SECURITY_SERVER
8< snip 8<
┊ + - - - - + - - - - + - - - - + - - - - +
PS. As described in the earlier posting
https://wasd.vsm.com.au/info-WASD/2025/0009
server activity can always be checked ...
https://decuserve.org/httpd/-/admin/report/activity
https://decuserve.org/httpd/-/admin/report/activity?of=4
https://decuserve.org/httpd/-/admin/report/activity?of=24
PPS. DECUServe WASD performance data is always available using:
https://decuserve.org/cgiplus-bin/alamode
https://decuserve.org/cgiplus-bin/mondesi
This item is one of a collection at
https://wasd.vsm.com.au/other/#occasional