Monday, May 11, 2009

ALUI Webcenter Grid Search: Maintaining the search cluster repository without loss of service

Sometimes, for maintenance reasons, the search cluster repository has to be unavailable for a short time (e.g. the NAS hosting it is being patched and needs to be restarted, or the central search cluster repository needs to be moved from one server to another).

The main problem is that search relies heavily on the availability of the search cluster repository for the portal content to be properly indexed. For example, the cluster registers every new index delta and ensures that these deltas are redistributed to each node’s local index, which keeps the search nodes in sync with one another. Unfortunately, I’ve noticed many times that when the cluster repository becomes unavailable, even for just a couple of seconds, the search infrastructure does NOT handle it gracefully…

At best, all the nodes automatically go into “read-only” mode (meaning they act only as a query service instead of a query+index service), which most of the time corrupts the process handles of the node that was the “indexer” at the time of disconnection (you’ll see “invalid handles” errors in the search status screen, for example). At worst, all the nodes shut down with a big “out of memory” error. Neither scenario is a good one: the nodes will not go back to run mode automatically once the disruption is over, and you will probably need to restart all of them.

If you need to do this in PROD, where uptime is usually a strong requirement, the idea is to perform the operation without jeopardizing the search capabilities of your portal site(s). Search is so central to the portal that when it is down, many portlets and components relying on it go down too, and the overall service is badly degraded.

Fortunately, search comes with a powerful admin utility: cadmin.exe. You can find it on any of your search nodes, usually at the following path:

<pt_search_home>/bin/native/cadmin.exe

Using the tool, you can gracefully put all the search nodes in “read-only” mode before the maintenance operation. When the nodes are in read-only mode, each node acts as a “disconnected” query service, providing search results based solely on its local index. While in that state, the search cluster repository can be fully unavailable… and apart from new content not being indexed, end users will not see any search disruption.

So here are the commands you would want to run, either manually or in a batch script:

cadmin runlevel readonly --> puts all the nodes of the cluster in read-only mode
cadmin status --verbose --> gives you the status of the cluster; useful to make sure the previous operation worked as expected

…perform your maintenance operation…

cadmin runlevel run --> puts all the nodes of the cluster back in run mode
cadmin status --verbose --> sanity check…
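
If you would rather script the sequence than type it, here is a minimal Python sketch of the same steps. It only uses the cadmin commands shown above. Everything else is an assumption made for illustration: the names `CADMIN`, `run_cadmin` and `run_with_read_only` are hypothetical, cadmin is assumed to be on the PATH, and cadmin is assumed to return a non-zero exit code on failure (I have not verified that). The status output is only printed for a human to read, not parsed, since its format is not described here.

```python
import subprocess

# Assumption: cadmin is on the PATH. Otherwise set this to the full path,
# e.g. <pt_search_home>/bin/native/cadmin.exe on one of the search nodes.
CADMIN = "cadmin"


def run_cadmin(*args, runner=subprocess.run):
    """Run one cadmin command and return its captured standard output."""
    # Assumption: cadmin exits with a non-zero code on failure;
    # check=True turns that into a CalledProcessError.
    result = runner([CADMIN, *args], capture_output=True, text=True, check=True)
    return result.stdout


def run_with_read_only(maintenance, runner=subprocess.run):
    """Switch the cluster to read-only, run maintenance(), then switch back.

    If maintenance() raises, the nodes are deliberately left in read-only
    mode, because the repository may still be unavailable at that point.
    """
    run_cadmin("runlevel", "readonly", runner=runner)
    # Printed so an operator can confirm the nodes really are read-only.
    print(run_cadmin("status", "--verbose", runner=runner))

    maintenance()  # hypothetical callable supplied by the caller

    run_cadmin("runlevel", "run", runner=runner)
    print(run_cadmin("status", "--verbose", runner=runner))
```

You would call it as `run_with_read_only(my_maintenance_step)`, where `my_maintenance_step` is whatever function performs (or waits for) the repository work. The `runner` parameter exists only so the sequence can be dry-run with a stand-in for `subprocess.run`.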

Before any search maintenance, you should also create a search checkpoint (in other words, a search backup). There are two options: do it manually in the Admin UI, or use the cadmin tool as follows:

cadmin checkpoint --create
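
In a scripted run, it makes sense to stop before touching the run level if the checkpoint could not be created. A minimal Python sketch of that guard follows. The function name `checkpoint_or_abort` and the `CADMIN` constant are hypothetical, and the sketch assumes cadmin reports failure through a non-zero exit code, which I have not verified.

```python
import subprocess
import sys

# Assumption: cadmin is on the PATH (otherwise use its full path).
CADMIN = "cadmin"


def checkpoint_or_abort(runner=subprocess.run):
    """Create a search checkpoint; return True only if cadmin exited with 0."""
    result = runner([CADMIN, "checkpoint", "--create"],
                    capture_output=True, text=True)
    # Assumption: a non-zero exit code means the checkpoint was not created.
    if result.returncode != 0:
        print(result.stderr, file=sys.stderr)
        return False
    print(result.stdout)
    return True
```

A caller would then only proceed to the read-only step when `checkpoint_or_abort()` returns `True`.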

As an extension of this, you can check out the other operations available in the cadmin tool (pretty much everything you can do with the search cluster admin UI, and more) by entering cadmin --help

Hope that helps. Until next time, take care!

1 comment:

  1. Hi Fabien, I am consulting in Portal Solutions in Colombia - South America. In this moment my company needs a training about WebCenter Interaction (ALUI).
    Do you provide training services?
    We are searching training from someone that has experience in real projects with ALUI.
    My mail is jvasquez@softbolivar.com
    Thanks.
