TL;DR "For the Snark was a Boojum, you see."
As described in the June 'occasional thought'
"Have been busy reviewing OpenSSL 3.0 cipher suite specification(s)
for TLSv3.0 [sic], as prompted by a private comment from a WASD user.
The changes were really quite straight-forward but of course now I
cannot get a (possibly related) function to work at all!! All the
way back to 11.5.1 under WASD SSL 1.1.1k it does not work. Obviously
something stoopid in my development system(s) setup but after working
two weeks (with close assistance from a site running the same
development code without issue) banging my head against a concrete
desktop, have taken some much needed respite."
https://wasd.vsm.com.au/info-WASD/2023/0087
I was verifying X509 authentication after making changes to the TLS/SSL code.
Or at least trying to. Failure at a consistent WATCH [x]SSL report item
|13:54:52.10 SESOLA 3768 000002 SSL SSL routines internal error (s3_enc.c:355)|
|13:54:52.10 SESOLACL 1281 000002 SSL STATE 23 "SSLERR" error cert:FALSE|
I tried to copy all of the various SSL related settings relayed from another
site field testing the same code without issue; protocol versions, cipher
lists, peer verification depths, peer verification CA list, X509
authentication rules, etc. No joy!
After a two week breather I took up the offer of a ZIP file containing the
site's [LOCAL] and [STARTUP] contents (no real security issue with this as it
is a build and test setup of WASD for a separate, third-party commercial
site). The idea being to emulate the working field test site as completely
as possible to establish whether 'something else' was to blame.
$ SET DEFAULT WASD_ROOT:[000000]
$ RENAME LOCAL.DIR LOCAL_.DIR
$ RENAME STARTUP.DIR STARTUP_.DIR
$ UNZIP <location><name>.ZIP
...
$ @[STARTUP]STARTUP
Hmmm. Nothing.
KLAATU$ TYPE WASD_SERVER_LOGS:KLAATU_20230710100022.LOG;1
Error opening primary input file SYS$INPUT
Insufficient privilege or file protection violation
Easiest way to investigate such issues
$ REPLY /ENABLE
$ SET AUDIT /ALARM /ENABLE=ACCESS=FAIL=ALL
then
$ @[STARTUP]STARTUP
Understandable failure accessing the (just UNZIPed) startup procedure(s).
$ SECHAN /ASIF=STARTUP [000000]STARTUP.DIR
$ SECHAN /ASIF=STARTUP [STARTUP]*.*
$ @[STARTUP]STARTUP
At least the server started.
Process WASD:888 now instantiated with no unexpected failure alarms.
$ SET AUDIT /ALARM /DISABLE=ACCESS=FAIL=ALL
Access the server Admin Menu. 403 reported.
$ HTTPD /DO=AUTH=SKELKEY=_<username>:<password>
Access the server Admin Menu and have look around. There were mapping and
authorisation rules that required some adjustment. Then restart the server.
Access WATCH using the default items plus [x]Authorisation and [xlSSL and
then exercising the failing
$ openssl s_client -cert client.pem -tls1_2 -state -nbio -connect host.name:443
|Enter pass phrase for client.pem:
|CONNECTED(00000003)
|Turned on non blocking io
|
|SSL_connect:before SSL initialization
|SSL_connect:SSLv3/TLS write client hello
8< snip 8<
|New, TLSv1.2, Cipher is ECDHE-RSA-AES128-SHA256
|Server public key is 2048 bit
|Secure Renegotiation IS supported
8< snip 8<
and giving it the test request
|GET /x509/cgi-bin/cgi_symbols HTTP/1.0<return>
|<return>
|
|SSL_connect:SSL negotiation finished successfully
|SSL_connect:SSL negotiation finished successfully
|SSL_connect:SSLv3/TLS read hello request
|SSL_connect:SSLv3/TLS write client hello
8< snip 8<
|SSL_connect:SSLv3/TLS write client certificate
|SSL_connect:SSLv3/TLS write client key exchange
|SSL_connect:SSLv3/TLS write certificate verify
|SSL_connect:SSLv3/TLS write change cipher spec
|SSL_connect:SSLv3/TLS write finished
8< snip 8<
|-----------
|CGI symbols
|-----------
|
| WWW_AUTH_ACCESS == "READ+WRITE"
| WWW_AUTH_REALM == "X509"
| WWW_AUTH_REALM_DESCRIPTION == "X509 Client Certs"
| WWW_AUTH_REMOTE_USER == "237B0FE81A14217A8BDD4378FC66BD3E"
| WWW_AUTH_TYPE == "X509"
| WWW_AUTH_USER == "MGD"
| WWW_AUTH_X509_CIPHER == "ECDHE-RSA-AES128-SHA256"
| WWW_AUTH_X509_FINGERPRINT == "23:7B:0F:E8:1A:14:21:7A:8B:DD:43:78:FC:66:BD:3E"
| WWW_AUTH_X509_ISSUER == "/C=AU/ST=SA/L=Adelaide/O=WASD CA Cert/OU=OpenSSL 1.0.2o
Testing Only/CN=WASD VMS Web Services/emailAddress=Mark.Daniel@wasd.vsm.com.au"
| WWW_AUTH_X509_KEYSIZE == "128"
| WWW_AUTH_X509_SUBJECT == "/C=AU/ST=SA/L=Adelaide/O=WASD/OU=OpenSSL 1.0.2o Testing
Only/CN=mark.daniel@wasd.vsm.com.au/emailAddress=Mark.Daniel@wasd.vsm.com.au"
8< snip 8<
*** SUCCESS ***
Hurrump! What then? Nothing intrinsic to the test-bench. W-h-a-t?
Swapping the [LOCAL] and [STARTUP] directories back again and restarting the
server. Exactly the same executable. Using the same s_client setup.
$ openssl s_client -cert client.pem -tls1_2 -state -nbio -connect host.name:443
|Enter pass phrase for client.pem:
|CONNECTED(00000003)
8< snip 8<
|GET /x509/cgi-bin/cgi_symbols HTTP/1.0<return>
|<return>
|
|SSL_connect:SSL negotiation finished successfully
|SSL_connect:SSL negotiation finished successfully
|SSL_connect:SSLv3/TLS read hello request
|SSL_connect:SSLv3/TLS write client hello
|SSL_connect:error in SSLv3/TLS write client hello
|read R BLOCK
|SSL_connect:SSLv3/TLS write client hello
|SSL_connect:error in SSLv3/TLS read server hello
|read R BLOCK
|SSL3 alert write:fatal:decode error
|SSL_connect:error in error
|80CEBC7B00000000:error:0A000126:SSL routines:(unknown function):unexpected
eof while reading:rec_layer_s3.c;1:321:
and the server WATCH report again showing the telltale
|13:54:52.10 SESOLA 3768 000002 SSL SSL routines internal error (s3_enc.c:355)|
|13:54:52.10 SESOLACL 1281 000002 SSL STATE 23 "SSLERR" error cert:FALSE|
*** FAILURE ***
So the external site works and the test-bench doesn't, with only the
config/startup directories changed. Routine examination showed nothing
dissimilar in the startup procedures. Comparing the config files was a bit
more involved but could not find any substantial delta in the X509 setup.
The *only* other difference was the server certificate. Surely not! So the
test-bench was pointed at the renamed out-of-the-way external site's cert.
[ServiceSSLcert] WASD_ROOT:[LOCAL__]WASD_WEB01.pem;
And restarted. SUCCESS! Reverted the [ServiceSSLcert] and restart.
FAILURE. And so on and so forth. Something about the server certificate.
My development site used a recently renewed Let's Encrypt cert. The external
site a self-signed.
Hmmm. So 'something else' *was* to blame!
Time to get OpenSSL.org involved.
https://github.com/openssl/openssl/issues/21422
And 24 hours later, as the advert promises, "problem solvered" ...
|diff --git a/ssl/statem/statem_lib.c b/ssl/statem/statem_lib.c
|index 0268ffa144..0fd71f3d4b 100644
|--- a/ssl/statem/statem_lib.c
|+++ b/ssl/statem/statem_lib.c
|@@ -49,7 +49,7 @@ int ssl3_do_write(SSL *s, int type)
| s->init_num, &written);
| if (ret < 0)
| return -1;
|- if (type == SSL3_RT_HANDSHAKE)
|+ if (type == SSL3_RT_HANDSHAKE && written > 0)
| /*
| * should not be done for 'Hello Request's, but in that case we'll
| * ignore the result anyway
with a slightly later wrap-up from OpenSSL
|In the end I went with a slightly different fix to the one I originally
|proposed above.
Apparently the LE cert had grown over past re-issues until it exceeded some
threshold within renegotiation resulting in the observed behaviour.
As explained by Matt Caswell:
o TLSv1.2 or lower
o The write BIO returns a "retry" response early in the handshake
o A large certificate message is sent (larger than the 4k handshake
buffer that is being used in the libssl)
o SSL_VERIFY_PEER is set (non default for servers)
o The BIO returns 0 on retry not -1. BIO_write() is documented to
return -1 on retry - but there are definitely custom BIOs that don't
do this (maybe some OpenSSL ones?)
And the reference to the poem?
"They hunted till darkness came on, but they found
Not a button, or feather, or mark,
By which they could tell that they stood on the ground
Where the Baker had met with the Snark.
In the midst of the word he was trying to say,
In the midst of his laughter and glee,
He had softly and suddenly vanished away-—
For the Snark was a Boojum, you see."
With apologies to Charles Lutwidge Dodgson.
POSTSCRIPT: In re-reading this before publication I think it can and should
be asked, with the WATCH report showing the distinctive OpenSSL message
"internal error (s3_enc.c:355)", why not engage OpenSSL.org far earlier?
Of course, hindsight is 20/20 foresight. Probably would have arrived at the
same solution much earlier. However, after a few days trying to understand
this behaviour, as I requested the collaboration of a site I knew had
extensive experience with X509 authentication, I also wrote that if it failed
there then it'd be thrown straight to OpenSSL. Of course, it didn't, and
wasn't, not until the Snark was finally spied some three weeks later.
This item is one of a collection at
https://wasd.vsm.com.au/other/#occasional
|