One of my clients recently went through a very painful troubleshooting process with OCS 2007 R2 Group Chat, in which the Channel Service would stop after minimal load was placed on it. We would get the following error in the Group Chat logs:
System.NullReferenceException: Object reference not set to an instance of an object.
at Microsoft.Rtc.Internal.Chat.Server.ServerCommon.User.ChannelParticipantsMap.YieldForEachNativeEndpoint(IServerChannel channel, PermissionType permType, Predicate`1 predicate)
at Microsoft.Rtc.Internal.Chat.Server.Channel.MessageProcessors.GroupChatProcessor.Process(GroupChatMessage command, ProcessStatus status)
at Microsoft.Rtc.Internal.Chat.Server.Channel.MessageProcessors.MessageProcessor`2.Microsoft.Rtc.Internal.Chat.Server.Channel.MessageProcessors.IMessageProcessor.ProcessInSafeMode(ITransportMessage transportMessage, ProcessStatus status)
After weeks of troubleshooting with Microsoft, it turned out that the root cause was the Group Chat logging level being set to DEBUG. The default is INFO, but we had turned up the logging level during the initial burn-in period to troubleshoot another issue and had left it that way. In fact, Microsoft had asked us to turn it up to DEBUG as well to troubleshoot this issue, as Group Chat doesn’t really log much relevant information to the Application Event Log.
The issue was escalated to Product Development, who were able to reproduce it and reported that the logging level itself creates a situation in which the Channel Service terminates under heavy load. Apparently it is related to Group Chat functions that attempt to determine whether a client is native or remote. If the result is NULL while logging is turned up to DEBUG level, the Channel Service is affected and a QUIT PROCESS is issued to begin a graceful shutdown.
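To make the failure pattern concrete, here is a minimal Python sketch of how a code path that only runs at DEBUG level can crash on a missing value that INFO-level logging never touches. It is purely illustrative and is not the Group Chat code: `Endpoint`, `describe_endpoint`, `deliver`, and `try_deliver` are hypothetical names, and Python's `AttributeError` on `None` stands in for the .NET `NullReferenceException` in the stack trace above.

```python
import logging


class Endpoint:
    """Hypothetical stand-in for a client endpoint (not a real Group Chat type)."""

    def __init__(self, is_native):
        self.is_native = is_native


def describe_endpoint(endpoint):
    # Dereferences the endpoint; raises AttributeError when endpoint is None,
    # the Python analogue of a NullReferenceException.
    return "native" if endpoint.is_native else "remote"


def deliver(logger, endpoint, message):
    # Assumed pattern: the native/remote check is only evaluated when
    # DEBUG logging is enabled, so at INFO the None is never touched.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("delivering %r to %s endpoint",
                     message, describe_endpoint(endpoint))
    return True


def try_deliver(level, endpoint):
    # Returns True if delivery succeeded, False if the logging path crashed.
    logger = logging.getLogger("groupchat_sketch")
    logger.setLevel(level)
    try:
        return deliver(logger, endpoint, "hello")
    except AttributeError:
        return False
```

Calling `try_deliver(logging.INFO, None)` succeeds, while `try_deliver(logging.DEBUG, None)` fails, which matches the observed behaviour that dropping the level back to INFO avoids the crash.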
The immediate workaround is to lower the logging level for the Channel Service to INFO, which can be done in the Group Chat Server Configuration Tool as shown below.
Microsoft said that this issue will be fixed in CU11, which is slated for sometime in early 2012.
I haven’t confirmed whether this also affects Lync 2010 Group Chat, but I suspect it does given how few differences there are between the versions.