Failover Cluster: Generic Applications Fail with OutOfMemoryException

Published on Thursday, August 14, 2014 in , , , , ,

Recently I helped a customer which was having troubles migrating from a Windows 2003 cluster to a Windows 2012 cluster. The resources they were running on the cluster consisted of many in house developed applications. There were about 80 of them and they were running as generic applications.

Due to Windows 2003 being end of life they started a phased migration towards Windows 2012 (in a test environment). At first the migration seemed to go smooth, but at a given moment they were only able to start a limited amount of applications. The applications that failed gave an Out Of Memory exception (OutOfMemoryException). Typically they could start about 25 applications, and from then on they weren’t able to start more. This number wasn’t exact, sometimes it was more, sometimes it was less.

As I suspected that this wasn’t really a failover clustering problem but more a Windows problem I googled for “windows 2012 running many applications out of memory exception”. I found several links:

HP: Unable to Create More than 140 Generic Application Cluster Resources

IBM: Configuring the Windows registry: Increasing the noninteractive desktop heap size

If the parallel engine is installed on a computer that runs Microsoft Windows Server, Standard or Enterprise edition, increase the noninteractive desktop heap size to ensure that a sufficient number of processes can be created and run concurrently

So it seems you can tweak the desktop heap size in the registry. Here is some background regarding the modification we did to the registry.

The key: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows

SharedSection has three values: (KB184802: User32.dll or Kernel32.dll fails to initialize)

  • The first SharedSection value (1024) is the shared heap size common to all desktops. This includes the global handle table, which holds handles to windows, menus, icons, cursors, and so forth, and shared system settings. It is unlikely that you would ever need to change this value.
  • The second SharedSection value (3072) is the size of the desktop heap for each desktop that is associated with the "interactive" window station WinSta0. User objects like hooks, menus, strings, and windows consume memory in this desktop heap. It is unlikely that you would ever need to change this second SharedSection value.
  • The third SharedSection value (512) is the size of the desktop heap for each desktop that is associated with a "noninteractive" window station. If this value is not present, the size of the desktop heap for noninteractive window stations will be same as the size specified for interactive window stations (the second SharedSection value).

Default on Windows 2012 seems to be 768

Raising it to 2048 seems to be a workaround/solution. A reboot is required! After this we were able to start up to 200 generic applications (we didn’t test more). However after a while there were some failures, but at first sight quite limited. This might be due to the actual memory being exhausted. Either way, we definitely saw a huge improvement.

Disclaimer: ASKPERF: Sessions, desktops and windows stations

Please do not modify these values on a whim. Changing the second or third value too high can put you in a no-boot situation due to the kernel not being able to allocate memory properly to even get Session 0 set up

Bonus info: why didn’t the customer didn’t have any issues running the same workload on Windows 2003? They configured the generic applications with “allow desktop interaction”. Something which was removed from the generic applications in Windows 2008. Because they had “allow desktop interaction” configured, generic applications were running in an interaction session and thus were not limited by the much smaller non-interactive desktop heap size.

Related Posts

2 Response to Failover Cluster: Generic Applications Fail with OutOfMemoryException

26 August, 2014 17:13

I have a similar scenario occurring on your production Server 2012 machine. Currently we are having to reboot the server every two-three days. We have noticed event id 243 in the system event log around the time the reboot is needed. What would be your recommendation?

20 September, 2014 09:19

I'm sorry, but without more information It's hard to provide advice. I would advice you to ask your question on the TechNet Forums, where you will get better assistance than here.

Add Your Comment