Recently, a large Skype for Business customer experienced a critical outage across their entire infrastructure. In both of their paired pools, users started complaining that they could not join meetings, and some could not sign in at all. Users who were already signed in experienced presence issues and could not send IMs to some other users.
The event logs were filled with the following errors occurring on all Skype Front End Servers:
Alert: [Skype] A server did not respond to HTTP request
Source: Certificate Provisioning Component [SfBFE03.contoso.com]
Path: SkypeFEServer03.contoso.com
Last modified by: System
Last modified time: 3/15/2017 9:57:19 PM
Alert description: A server did not respond to HTTP request
Server SkypeFEServer02.contoso.com did not respond to HTTP request GetPublishedCertRequest targeted at https://SkypeFEServer02.contoso.com:444/LiveServer/UserPinService.
Cause: Server might be down or the network path between servers might not be properly configured.
Resolution:
Please ensure that the server can be connected on the target port using telnet and then re-try.
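The Telnet client is not installed by default on Windows Server, so as a stand-in for the telnet check this alert suggests, here is a minimal PowerShell sketch that only tests whether a TCP connection to the target port can be opened. The function name Test-TcpPort is hypothetical. On Windows Server 2012 R2 and later, the built-in Test-NetConnection -ComputerName SkypeFEServer02.contoso.com -Port 444 does the same job.

```powershell
# Hypothetical helper: returns $true if a TCP connection to ComputerName:Port
# can be opened within TimeoutMs milliseconds, otherwise $false.
function Test-TcpPort {
    param(
        [Parameter(Mandatory)][string]$ComputerName,
        [Parameter(Mandatory)][int]$Port,
        [int]$TimeoutMs = 3000
    )
    $client = New-Object System.Net.Sockets.TcpClient
    try {
        # ConnectAsync plus Wait gives a timeout that the blocking Connect() lacks.
        $task = $client.ConnectAsync($ComputerName, $Port)
        return ($task.Wait($TimeoutMs) -and $client.Connected)
    }
    catch {
        # Refused connections and DNS failures surface as exceptions from Wait().
        return $false
    }
    finally {
        # Close() releases the underlying socket.
        $client.Close()
    }
}
```

For example, Test-TcpPort -ComputerName SkypeFEServer02.contoso.com -Port 444 run from another Front End Server. A successful connection only proves the port is reachable; it says nothing about whether the TLS handshake on that port will succeed.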
We also saw the following errors:
Alert: [Skype] Conferencing Attendant lost connection with the Skype for Business Server 2015 Front End.
Source: Conferencing Auto Attendant Component [SkypeFEServer01.contoso.com]
Path: SkypeFEServer01.contoso.com
Last modified by: System
Last modified time: 3/15/2017 10:35:37 PM
Alert description: Conferencing Attendant lost connection with the Skype for Business Server 2015 Front End.
Front End=SkypePOOL01.contoso.com.
Cause: This issue may occur due to network connectivity, DNS lookup failure, TCP failure or issues on the remote server.
Resolution:
Resolve network connectivity issues between Conferencing Attendant and the Skype for Business Server 2015 Front End.
Please see the 'Product Knowledge' and the 'Alert Context' tab on Alert Properties view for more information.
And finally, this error complaining about the pool certificate:
Invalid incoming HTTPS certificate.
Subject Name: SkypePool01.contoso.com Issuer: RapidSSL SHA256 CA
Cause: This can happen if the HTTPS certificate has expired, or is untrusted. The certificate serial number is attached for reference.
Resolution:
Please check the remote server and ensure that the certificate is valid. Also ensure that the full certificate chain of the Issuer is present in the local machine.
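To check the second half of that resolution (whether the full issuer chain can actually be built on the local machine), here is a minimal sketch using the .NET X509Chain class. Test-CertificateChain is a hypothetical helper name. It builds the chain in the machine context and lists each chain element from the leaf certificate up to the root, along with any chain status flags, so a missing intermediate or an untrusted root shows up in the output.

```powershell
# Hypothetical helper: builds the certificate chain for one certificate using
# the LocalMachine stores and reports the result.
function Test-CertificateChain {
    param(
        [Parameter(Mandatory)]
        [System.Security.Cryptography.X509Certificates.X509Certificate2]$Certificate
    )
    # $true = use the machine context (LocalMachine stores), as server services do.
    $chain = New-Object -TypeName System.Security.Cryptography.X509Certificates.X509Chain -ArgumentList $true
    $isValid = $chain.Build($Certificate)
    [pscustomobject]@{
        Subject = $Certificate.Subject
        IsValid = $isValid
        # Element 0 is the leaf certificate; the last element is the root.
        Chain   = @($chain.ChainElements | ForEach-Object { $_.Certificate.Subject })
        # Empty when the chain is fully valid, e.g. UntrustedRoot or PartialChain otherwise.
        Status  = @($chain.ChainStatus | ForEach-Object { $_.Status.ToString() })
    }
}
```

For example, to check the pool certificate named in the alert above: Get-ChildItem Cert:\LocalMachine\My | Where-Object { $_.Subject -like '*SkypePool01.contoso.com*' } | ForEach-Object { Test-CertificateChain -Certificate $_ }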
That certificate error was the first real clue to the root of the issue. All of the Skype-related certificates were valid and placed in the correct certificate stores. Given this was a large enterprise, Microsoft PSS was engaged early on, and luckily they had already seen a recently logged issue in which a McAfee Antivirus update placed two certificates in the Trusted Root Certification Authorities store. This customer was running McAfee AV on their Skype servers. PSS indicated that they had seen a few recently logged cases where these improperly placed certificates caused Skype for Business issues. The resolution was to move the certificates to the appropriate store, which in this case was the Intermediate Certification Authorities store. This had to be done on all Front End and Edge Servers. Once it was completed, the errors ceased and users could resume normal functionality.
McAfee has since published a KB article on this: https://kc.mcafee.com/corporate/index?page=content&id=KB87705
We used the following PowerShell to locate the improperly placed certificates:
Get-Childitem cert:\LocalMachine\root -Recurse | Where-Object {$_.Issuer -ne $_.Subject} | Format-List * | Out-File "c:\computer_filtered.txt"
This outputs every certificate in the Trusted Root Certification Authorities store whose Issuer differs from its Subject. A genuine root CA certificate is self-signed, so its Issuer and Subject match; anything else doesn't belong in that store. I would recommend running this from time to time to ensure you haven't inadvertently placed a certificate in this store that doesn't belong there. If you find any, you can manually move them to the correct location, typically the Intermediate Certification Authorities store.
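If you would rather script the manual move, here is a minimal sketch that uses the .NET X509Store class and the same Issuer-versus-Subject test as the one-liner above. The function names Get-MisplacedRootCertificate and Move-MisplacedRootCertificate are hypothetical, and the sketch assumes an elevated PowerShell session on the server being cleaned up. It moves every non-self-signed certificate from the LocalMachine Root store to the LocalMachine CA (Intermediate Certification Authorities) store.

```powershell
# Hypothetical helper: passes through only certificates whose Issuer differs
# from their Subject, i.e. certificates that are not self-signed roots.
function Get-MisplacedRootCertificate {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object[]]$Certificate
    )
    process {
        $Certificate | Where-Object { $_.Issuer -ne $_.Subject }
    }
}

# Hypothetical helper: moves misplaced certificates from LocalMachine\Root to
# LocalMachine\CA. Supports -WhatIf and -Confirm. Requires elevation.
function Move-MisplacedRootCertificate {
    [CmdletBinding(SupportsShouldProcess)]
    param()
    $location  = [System.Security.Cryptography.X509Certificates.StoreLocation]::LocalMachine
    $readWrite = [System.Security.Cryptography.X509Certificates.OpenFlags]::ReadWrite
    $root = New-Object System.Security.Cryptography.X509Certificates.X509Store(
        [System.Security.Cryptography.X509Certificates.StoreName]::Root, $location)
    $ca = New-Object System.Security.Cryptography.X509Certificates.X509Store(
        [System.Security.Cryptography.X509Certificates.StoreName]::CertificateAuthority, $location)
    $root.Open($readWrite)
    $ca.Open($readWrite)
    try {
        # Collect first so the Root store is not modified while it is being enumerated.
        $misplaced = @($root.Certificates | Get-MisplacedRootCertificate)
        foreach ($cert in $misplaced) {
            $target = "$($cert.Subject) [$($cert.Thumbprint)]"
            if ($PSCmdlet.ShouldProcess($target, 'Move from Root to Intermediate Certification Authorities')) {
                # Add to CA before removing from Root, so the certificate is
                # never missing from both stores.
                $ca.Add($cert)
                $root.Remove($cert)
                Write-Output $cert
            }
        }
    }
    finally {
        $root.Close()
        $ca.Close()
    }
}
```

Running Move-MisplacedRootCertificate -WhatIf first lists what would be moved without changing anything; review that list before running it again without -WhatIf on each Front End and Edge Server.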
And right on cue, fellow Skype for Business MVP and PowerShell Ninja Pat Richard has just released a new script that automates both finding improperly placed certificates and moving them to their correct location. You can find out more about the script and download it over at UC Unleashed: “Function: Test-InvalidCerts – Ensuring Certificates Are In The Correct Certificate Store”