TL;DR "For the Snark was a Boojum, you see." As described in the June 'occasional thought' "Have been busy reviewing OpenSSL 3.0 cipher suite specification(s) for TLSv3.0 [sic], as prompted by a private comment from a WASD user. The changes were really quite straight-forward but of course now I cannot get a (possibly related) function to work at all!! All the way back to 11.5.1 under WASD SSL 1.1.1k it does not work. Obviously something stoopid in my development system(s) setup but after working two weeks (with close assistance from a site running the same development code without issue) banging my head against a concrete desktop, have taken some much needed respite." https://wasd.vsm.com.au/info-WASD/2023/0087 I was verifying X509 authentication after making changes to the TLS/SSL code. Or at least trying to. Failure at a consistent WATCH [x]SSL report item |13:54:52.10 SESOLA 3768 000002 SSL SSL routines internal error (s3_enc.c:355)| |13:54:52.10 SESOLACL 1281 000002 SSL STATE 23 "SSLERR" error cert:FALSE| I tried to copy all of the various SSL related settings relayed from another site field testing the same code without issue; protocol versions, cipher lists, peer verification depths, peer verification CA list, X509 authentication rules, etc. No joy! After a two week breather I took up the offer of a ZIP file containing the site's [LOCAL] and [STARTUP] contents (no real security issue with this as it is a build and test setup of WASD for a separate, third-party commercial site). The idea being to emulate the working field test site as completely as possible to establish whether 'something else' was to blame. $ SET DEFAULT WASD_ROOT:[000000] $ RENAME LOCAL.DIR LOCAL_.DIR $ RENAME STARTUP.DIR STARTUP_.DIR $ UNZIP .ZIP ... $ @[STARTUP]STARTUP Hmmm. Nothing. KLAATU$ TYPE WASD_SERVER_LOGS:KLAATU_20230710100022.LOG;1 Error opening primary input file SYS$INPUT Insufficient privilege or file protection violation Easiest way to investigate such issues $ REPLY /ENABLE $ SET AUDIT /ALARM /ENABLE=ACCESS=FAIL=ALL then $ @[STARTUP]STARTUP Understandable failure accessing the (just UNZIPed) startup procedure(s). $ SECHAN /ASIF=STARTUP [000000]STARTUP.DIR $ SECHAN /ASIF=STARTUP [STARTUP]*.* $ @[STARTUP]STARTUP At least the server started. Process WASD:888 now instantiated with no unexpected failure alarms. $ SET AUDIT /ALARM /DISABLE=ACCESS=FAIL=ALL Access the server Admin Menu. 403 reported. $ HTTPD /DO=AUTH=SKELKEY=_: Access the server Admin Menu and have look around. There were mapping and authorisation rules that required some adjustment. Then restart the server. Access WATCH using the default items plus [x]Authorisation and [xlSSL and then exercising the failing $ openssl s_client -cert client.pem -tls1_2 -state -nbio -connect host.name:443 |Enter pass phrase for client.pem: |CONNECTED(00000003) |Turned on non blocking io | |SSL_connect:before SSL initialization |SSL_connect:SSLv3/TLS write client hello 8< snip 8< |New, TLSv1.2, Cipher is ECDHE-RSA-AES128-SHA256 |Server public key is 2048 bit |Secure Renegotiation IS supported 8< snip 8< and giving it the test request |GET /x509/cgi-bin/cgi_symbols HTTP/1.0 | | |SSL_connect:SSL negotiation finished successfully |SSL_connect:SSL negotiation finished successfully |SSL_connect:SSLv3/TLS read hello request |SSL_connect:SSLv3/TLS write client hello 8< snip 8< |SSL_connect:SSLv3/TLS write client certificate |SSL_connect:SSLv3/TLS write client key exchange |SSL_connect:SSLv3/TLS write certificate verify |SSL_connect:SSLv3/TLS write change cipher spec |SSL_connect:SSLv3/TLS write finished 8< snip 8< |----------- |CGI symbols |----------- | | WWW_AUTH_ACCESS == "READ+WRITE" | WWW_AUTH_REALM == "X509" | WWW_AUTH_REALM_DESCRIPTION == "X509 Client Certs" | WWW_AUTH_REMOTE_USER == "237B0FE81A14217A8BDD4378FC66BD3E" | WWW_AUTH_TYPE == "X509" | WWW_AUTH_USER == "MGD" | WWW_AUTH_X509_CIPHER == "ECDHE-RSA-AES128-SHA256" | WWW_AUTH_X509_FINGERPRINT == "23:7B:0F:E8:1A:14:21:7A:8B:DD:43:78:FC:66:BD:3E" | WWW_AUTH_X509_ISSUER == "/C=AU/ST=SA/L=Adelaide/O=WASD CA Cert/OU=OpenSSL 1.0.2o Testing Only/CN=WASD VMS Web Services/emailAddress=Mark.Daniel@wasd.vsm.com.au" | WWW_AUTH_X509_KEYSIZE == "128" | WWW_AUTH_X509_SUBJECT == "/C=AU/ST=SA/L=Adelaide/O=WASD/OU=OpenSSL 1.0.2o Testing Only/CN=mark.daniel@wasd.vsm.com.au/emailAddress=Mark.Daniel@wasd.vsm.com.au" 8< snip 8< *** SUCCESS *** Hurrump! What then? Nothing intrinsic to the test-bench. W-h-a-t? Swapping the [LOCAL] and [STARTUP] directories back again and restarting the server. Exactly the same executable. Using the same s_client setup. $ openssl s_client -cert client.pem -tls1_2 -state -nbio -connect host.name:443 |Enter pass phrase for client.pem: |CONNECTED(00000003) 8< snip 8< |GET /x509/cgi-bin/cgi_symbols HTTP/1.0 | | |SSL_connect:SSL negotiation finished successfully |SSL_connect:SSL negotiation finished successfully |SSL_connect:SSLv3/TLS read hello request |SSL_connect:SSLv3/TLS write client hello |SSL_connect:error in SSLv3/TLS write client hello |read R BLOCK |SSL_connect:SSLv3/TLS write client hello |SSL_connect:error in SSLv3/TLS read server hello |read R BLOCK |SSL3 alert write:fatal:decode error |SSL_connect:error in error |80CEBC7B00000000:error:0A000126:SSL routines:(unknown function):unexpected eof while reading:rec_layer_s3.c;1:321: and the server WATCH report again showing the telltale |13:54:52.10 SESOLA 3768 000002 SSL SSL routines internal error (s3_enc.c:355)| |13:54:52.10 SESOLACL 1281 000002 SSL STATE 23 "SSLERR" error cert:FALSE| *** FAILURE *** So the external site works and the test-bench doesn't, with only the config/startup directories changed. Routine examination showed nothing dissimilar in the startup procedures. Comparing the config files was a bit more involved but could not find any substantial delta in the X509 setup. The *only* other difference was the server certificate. Surely not! So the test-bench was pointed at the renamed out-of-the-way external site's cert. [ServiceSSLcert] WASD_ROOT:[LOCAL__]WASD_WEB01.pem; And restarted. SUCCESS! Reverted the [ServiceSSLcert] and restart. FAILURE. And so on and so forth. Something about the server certificate. My development site used a recently renewed Let's Encrypt cert. The external site a self-signed. Hmmm. So 'something else' *was* to blame! Time to get OpenSSL.org involved. https://github.com/openssl/openssl/issues/21422 And 24 hours later, as the advert promises, "problem solvered" ... |diff --git a/ssl/statem/statem_lib.c b/ssl/statem/statem_lib.c |index 0268ffa144..0fd71f3d4b 100644 |--- a/ssl/statem/statem_lib.c |+++ b/ssl/statem/statem_lib.c |@@ -49,7 +49,7 @@ int ssl3_do_write(SSL *s, int type) | s->init_num, &written); | if (ret < 0) | return -1; |- if (type == SSL3_RT_HANDSHAKE) |+ if (type == SSL3_RT_HANDSHAKE && written > 0) | /* | * should not be done for 'Hello Request's, but in that case we'll | * ignore the result anyway with a slightly later wrap-up from OpenSSL |In the end I went with a slightly different fix to the one I originally |proposed above. Apparently the LE cert had grown over past re-issues until it exceeded some threshold within renegotiation resulting in the observed behaviour. As explained by Matt Caswell: o TLSv1.2 or lower o The write BIO returns a "retry" response early in the handshake o A large certificate message is sent (larger than the 4k handshake buffer that is being used in the libssl) o SSL_VERIFY_PEER is set (non default for servers) o The BIO returns 0 on retry not -1. BIO_write() is documented to return -1 on retry - but there are definitely custom BIOs that don't do this (maybe some OpenSSL ones?) And the reference to the poem? "They hunted till darkness came on, but they found Not a button, or feather, or mark, By which they could tell that they stood on the ground Where the Baker had met with the Snark. In the midst of the word he was trying to say, In the midst of his laughter and glee, He had softly and suddenly vanished away-— For the Snark was a Boojum, you see." With apologies to Charles Lutwidge Dodgson. POSTSCRIPT: In re-reading this before publication I think it can and should be asked, with the WATCH report showing the distinctive OpenSSL message "internal error (s3_enc.c:355)", why not engage OpenSSL.org far earlier? Of course, hindsight is 20/20 foresight. Probably would have arrived at the same solution much earlier. However, after a few days trying to understand this behaviour, as I requested the collaboration of a site I knew had extensive experience with X509 authentication, I also wrote that if it failed there then it'd be thrown straight to OpenSSL. Of course, it didn't, and wasn't, not until the Snark was finally spied some three weeks later. This item is one of a collection at https://wasd.vsm.com.au/other/#occasional