Monday, June 30, 2008

ALUI Publisher - Part 2: Increase Performance by enabling REAL Caching

In my previous post (http://fsanglier.blogspot.com/2008/02/alui-publisher-increase-performance.html; man, that was already a couple of months ago... I know, I've been sucked into a black hole since then :) ) I talked about how best to design a scalable and redundant ALUI Publisher architecture. What I did not push far enough in that article was performance.

In my opinion, Publisher's standard behavior for serving content has a caching-related performance flaw (at least in ALUI 6.1 with Publisher 6.4). Since there are ways around it (that's what we do, right?), I think you can benefit a lot from this: in my last implementation, the change explained below took performance under load to completely new levels, while also reducing the number of DB requests and the CPU utilization of both the DB and the Publisher redirector. So here it goes...

As I explained in my previous post, each published content portlet on a portal page makes a request to the Publish Content Redirector component, specifically to the JSP page in charge of performing the redirect: published_content_redirect.jsp (you can see it defined in the Published Content portal web service object). As the name says, this JavaServer Page (JSP) performs a 302 redirect to the published content item (as seen in the previous post, this location should be served by your favorite web server, Apache or IIS for example). But BEFORE making this redirect, it must know where to redirect to, so the code queries the Publisher database for the browsing path of the published content item (passing the Publisher content item ID that was saved when you created your published content portlet).
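To make the per-request cost concrete, here is a hypothetical, much-simplified Java model of the flow just described (not the actual ALUI code): an in-memory map stands in for the Publisher database, and every call performs one lookup before answering with a 302 and a Location header. The class, method and field names, the example URLs, and the 404 returned for an unknown ID are all assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the published_content_redirect.jsp flow:
// every request looks up the item's browsing path, then answers with a 302.
class RedirectModel {
    static final class Response {
        final int status;
        final String location; // value of the Location header, or null

        Response(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    // Stand-in for the Publisher database (content item ID -> browsing path).
    private final Map<String, String> publisherDb = new HashMap<>();
    private int dbLookups = 0;

    void publish(String contentItemId, String browsingPath) {
        publisherDb.put(contentItemId, browsingPath);
    }

    // One call per portlet per page render, and one "DB" lookup per call.
    Response handle(String contentItemId) {
        dbLookups++;
        String path = publisherDb.get(contentItemId);
        return path == null ? new Response(404, null) : new Response(302, path);
    }

    int dbLookups() {
        return dbLookups;
    }
}
```

In this model, rendering a page that holds 5 such portlets twice costs 10 lookups: nothing is remembered between calls.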

So here is the first flaw I was talking about: from the explanation above, you can see that displaying content through Publisher portlets on a portal page triggers multiple calls to the Publisher DB. Say we have 5 Publisher portlets on the page (not uncommon): every page rendering for one user makes 5 calls to the Publisher DB (in addition to multiple other DB calls for the portal and Analytics). With thousands of users, that is far too many DB calls simply to show content that does not change often. This puts unnecessary load on the DB tier of your infrastructure and consequently increases page load time, since DB calls are inherently slower than simple content rendering...
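As worked arithmetic for this scenario: let p be the number of Publisher portlets on the page and R the number of page renders. Without effective caching, every portlet in every render costs one Publisher DB query, so the Publisher DB receives Q queries. The figure of 1,000 users each loading the page once is only an illustrative assumption.

```latex
Q = p \times R
\qquad
p = 5,\; R = 1000 \;\Longrightarrow\; Q = 5 \times 1000 = 5000
```

The count grows linearly with traffic: ten page views per user at the same scale would mean 50,000 Publisher DB queries.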

While reading this, some of you are already thinking, for good reason: CACHING! Indeed, caching is the secret to scaling and performance (it is not always necessary to spend big bucks supercharging the DB infrastructure even more). And luckily for us, the portal offers a great out-of-the-box caching capability within the web service object: simply set the minimum caching time to 2 hours and the maximum to 20 days, and you would normally expect the portal to cache the published content for that amount of time, removing the need to call published_content_redirect.jsp altogether, and thus the need for a DB call!! ...But it does not happen this way. You don't believe me? Enable access logging on the Publish Content Redirector components (uncomment the "Access logger" section in <ALUI HOME>\ptcs\6.4\container\deploy\jbossweb-tomcat50.sar\server.xml) and you will clearly see that, even though your page contains only published content portlets that should be cached, published_content_redirect.jsp is still called constantly... so caching is not really... caching.
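If you want to quantify what the access log shows rather than eyeball it, a small Java sketch like the following can tally hits per JSP and status code. It assumes the log lines are in the Common Log Format (host, ident, user, [time], "request", status, bytes); the format actually configured in your server.xml may differ, in which case the pattern needs adjusting. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AccessLogCounter {
    // Common Log Format: host ident user [time] "METHOD path PROTOCOL" status bytes
    // Group 1 captures the request path, group 2 the status code.
    private static final Pattern CLF = Pattern.compile(
        "^\\S+ \\S+ \\S+ \\[[^\\]]+\\] \"\\S+ (\\S+)[^\"]*\" (\\d{3}) \\S+");

    // Returns "page status" -> hit count, for requests whose path ends in a .jsp page.
    static Map<String, Integer> countJspHits(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = CLF.matcher(line);
            if (!m.find()) {
                continue; // skip lines that are not in Common Log Format
            }
            String path = m.group(1);
            int query = path.indexOf('?');
            if (query >= 0) {
                path = path.substring(0, query); // drop the query string
            }
            String page = path.substring(path.lastIndexOf('/') + 1);
            if (!page.endsWith(".jsp")) {
                continue;
            }
            counts.merge(page + " " + m.group(2), 1, Integer::sum);
        }
        return counts;
    }
}
```

For a real log you could pass it `java.nio.file.Files.readAllLines(path)`; the log file's name and location depend on your access logger configuration. With the redirect JSP in place, expect the `published_content_redirect.jsp 302` count to keep climbing with every page view.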

Why does this happen? Because of the redirect mechanism used to serve content. The HTTP/1.1 specification on www.w3.org explains 302 this way: "The requested resource resides temporarily under a different URI. Since the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests. This response is only cacheable if indicated by a Cache-Control or Expires header field." So the portal (which acts as the client here) is doing its job perfectly, and it will never cache a response with a 302 temporary redirect status code.
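The quoted rule can be expressed as a small predicate. This is a simplified sketch of the HTTP/1.1 (RFC 2616) caching rules, not how the portal is implemented: it assumes header names are already lower-cased, treats `no-cache` like `no-store`, and ignores most other directives. As described above, the portal goes further and does not cache the 302 from published_content_redirect.jsp at all.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Simplified reading of the RFC 2616 caching rules (sections 10.3.3 and 13.4).
class CacheRule {
    // Status codes a cache MAY store by default (RFC 2616, section 13.4).
    private static final Set<Integer> CACHEABLE_BY_DEFAULT =
        new HashSet<>(Arrays.asList(200, 203, 206, 300, 301, 410));

    // headers: names assumed lower-cased, e.g. "cache-control", "expires".
    static boolean mayCache(int status, Map<String, String> headers) {
        String cc = headers.getOrDefault("cache-control", "").toLowerCase(Locale.ROOT);
        // Simplification: treat no-cache like no-store and never store the response.
        if (cc.contains("no-store") || cc.contains("no-cache")) {
            return false;
        }
        if (CACHEABLE_BY_DEFAULT.contains(status)) {
            return true;
        }
        // Other codes (302 included) only when headers explicitly allow caching.
        return headers.containsKey("expires")
            || cc.contains("max-age") || cc.contains("s-maxage") || cc.contains("public");
    }
}
```

A plain 200 with no headers is cacheable, while a plain 302 is not; that difference is exactly what the noredirect approach below exploits.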

So what can make it better? Changing the redirect mechanism to something that does not redirect... namely another JSP that is already present in ALUI 6.4 (I have not checked whether it exists in earlier versions of Publisher): published_content_noredirect.jsp. Instead of issuing a redirect to the published content, this JSP performs an HTTP GET request for it and writes the content to the JSP output stream. The response code is now a simple 200 OK ("The request has succeeded" - www.w3.org), which the portal can cache. To enable this, simply change the HTTP URL of the Published Content web service object from published_content_redirect.jsp to published_content_noredirect.jsp. Of course, check the Publisher redirector access logs to see the dramatic difference: under load, you will initially see a bunch of requests to published_content_noredirect.jsp, but very quickly the access log goes silent, because all the content is now really cached by the portal...
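A rough Java sketch of the noredirect idea: fetch the published item with an HTTP GET and copy its body to the response writer, so the portal receives a 200 instead of a 302. This is not the code of published_content_noredirect.jsp; the class name, the UTF-8 assumption and the choice to fail on any non-200 status are assumptions for illustration. The copy loop reuses one buffer and writes only the characters actually read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.Writer;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of "fetch and stream" instead of "redirect".
class NoRedirectProxy {

    // Copies every character from in to out with one reusable buffer;
    // returns the number of characters copied.
    static long copy(Reader in, Writer out) throws IOException {
        char[] buffer = new char[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // write only what was actually read
            total += n;
        }
        return total;
    }

    // Fetches the published item and writes its body to out.
    // Assumes the item is UTF-8 encoded; any status other than 200 is treated as an error.
    static void streamItem(String itemUrl, Writer out) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(itemUrl).toURL().openConnection();
        try {
            conn.setRequestMethod("GET");
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status + " for " + itemUrl);
            }
            try (Reader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                copy(in, out);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Inside a JSP, the implicit `out` object is a `JspWriter`, which extends `java.io.Writer`, so it could be passed directly as the writer.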

The result? You can increase the load even more and page response time stays the same (or even improves), while DB utilization is unaffected by that load: you simply have a much more performant site. Our initial results, comparing constant intensive load with and without the change, showed that with the change the Publisher infrastructure no longer crashed, the CPU usage of both the DB and the Publisher redirectors dropped considerably, the number of DB requests fell, and the load could be pushed to new levels without loss of performance or functionality, thanks to caching.

Since this post is already long, I am going to stop here for now... Please read the next post (http://fsanglier.blogspot.com/2008/06/alui-publisher-part-2-increase_30.html) to learn about the second flaw: published_content_noredirect.jsp has a truncation bug... which is fixable, of course :)

So long!

14 comments:

  1. Geoff,

    In your last comment, you questioned whether such a simple setup could work, and especially why BEA would not have planned for this...

    Basically, the only bug I saw is related to relative URLs within Publisher content: these relative URLs cannot be resolved correctly because the item is now displayed through a JSP (so the relative link points to the wrong location). I will update my post to talk more about this and how to fix it.

  2. Fabien,

    We are experiencing redirector issues. Using the noredirect doesn't work, because our content is served from a different server. Is this the same "bug" as the relative URL issue you mention in a comment reply to geoff? If so, do you have a fix for this?

  3. Hey! Sorry I deleted my comment; it's been a while now and I have no idea what I had written or why I deleted it!

  4. Daryl,


    The fact that the content is served from another server does not matter for the noredirect method.
    Basically, what noredirect does is perform an HTTP request for your content, stream it back, and write it to the output of noredirect.jsp.

    It should work for you.

    The bug I mentioned in my previous comment concerns relative links from one Publisher item to another... those are the links that will not work with noredirect. I have not fixed that yet because I don't have such links :)

  5. Fabien,

    The issue is that the images within our postings are referenced relative to the content item HTML, so the browser resolves them relative to the web service HTTP path, which is wrong. The path needs to be relative to the redirect path.

    Thanks for your replies.

  6. Hi Fabien,
    I am a newbie and I'm trying to migrate web services published on ALUI 6.1 to ALUI 6.5, but I am having problems doing this... I published them using the 6.5 DLLs, but I get a Configuration Error saying that the following line of web.config has an error:
    sectionGroup name="system.web.extensions" type="System.Web.Configuration.SystemWebExtensionsSectionGroup, System.Web.Extensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35"
    Please help me asap...thanks a lot.

  7. Hey Nisha,

    The System.Web.Extensions library in the version referenced above (3.5.0.0) is only for .NET Framework 3.5... What is probably happening is that you are not running your portlet app on the correct framework (most likely .NET 2.0), hence the error. Either remove this line from your web.config and run on .NET 2.0, or change your portlet app to use .NET 3.5. Hope that helps.

  8. Fabien,
    What about adding Expires and Cache-Control headers to published_content_redirect.jsp? Does that work, or do the headers get lost when the redirect happens?

  9. To answer my own question: no, adding Cache-Control or Expires headers does not cause the portal to cache the 302 response from published_content_redirect.jsp. And if you change it to a 301 (permanent redirect) response, you get an error that the portlet has redirected outside the gateway space. (I spent a lot of time last night trying these things.)

    So, back to published_content_noredirect.jsp. There's no need to create new char[] or String objects each time through the loop, so I made these changes:

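    // Allocate the buffer once, outside the loop
    char[] content = new char[buffersize];
    ...
    do {
        charread = bisr.read(content);  // returns -1 at end of stream
        if (charread > 0)
            out.write(content, 0, charread);  // write only the chars actually read
    } while (charread > -1);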

  10. Hey Steve,

    Sorry I did not get a chance to reply to your first question... but you might not have trusted me :) so it is good that your own testing confirmed that caching does not work with the plain redirect JSP.

    As for the code, my bad... I actually improved it myself after my last post, but never reposted it.
    This is what I have now (no char array at all):

    BufferedReader bisr = null;
    try {
        // Read the fetched item line by line (the encoding is hardcoded to UTF-8)
        bisr = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
        String line;
        while ((line = bisr.readLine()) != null) {
            out.println(line);  // readLine() strips line terminators, so println adds one back
        }
    }
    catch (Exception exc) {
        throw exc; // to be caught by the global try/catch
    } finally {
        if (bisr != null)
            bisr.close();
        bisr = null;
    }
    return;

  11. What causes the "cannot be displayed because the portlet redirected outside of the gateway space" error, and how do we fix it?

    We have Publisher items that were originally created in Publisher 6.2 on a different portal.

    We migrated the data to our new portal with Publisher 6.4.

    Our web service is set to use the default redirect.jsp.

    Our publishing target matches our gateway prefix, and checks out OK.

    When we attempt to view the portlet we get the error about going outside the gateway space.

    daorr@deloitte.com

  12. Addendum: as others have stated, the content.html files reference their images with relative URLs.

    e.g. img src="images/mypic.jpg"

    On our clean install (no migrated publisher items), the announcements, and their images show up fine.

  13. We republished, but since it was a production upgrade, the URL still uses the same DNS alias, i.e.:

    http://serverX.mil/publishedcontent/publish/portletname/portlet.htm

    If I use the default '~redirect.jsp', I get the "out of gateway" error. If I change it to '~noredirect.jsp', the .htm portion of the content displays, but the image fails because its relative URL is not resolved against the .htm file (my understanding is that it is resolved relative to the servlet?).

    The URL becomes https://filedisplayurl/http://serverX.mil/ptcs/images/image.jpg

    which is missing the full path.

    Seems like there should be an easy fix.

    Have tried republishing.

  14. Great! Glad you found it. Yes, this type of "non-gateway" problem is almost always due to a mismatch between the redirect URL and the URL prefix in your web service object.
    The portal performs the URL "gatewaying" transformation strictly based on the text entered as the "gateway prefix" (plain text matching, nothing more clever than that).

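Several comments above run into the same relative-link problem with noredirect, and none of them posts a fix. One possible approach, sketched here under assumptions and not taken from the post, is to rewrite relative src/href values in the fetched HTML into absolute URLs resolved against the published item's own URL before the JSP writes them out. The class and method names are hypothetical, only double-quoted attributes are handled (a regex is a simplification, not an HTML parser), and the JSP location /ptcs/published_content_noredirect.jsp used in the test is an assumption that matches the broken image URL reported in comment 13.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RelativeLinkFix {
    // Double-quoted src/href attributes only; single-quoted and unquoted values are not handled.
    private static final Pattern ATTR =
        Pattern.compile("\\b(src|href)=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Rewrites relative src/href values so they resolve against the item's own URL
    // rather than against the URL of the JSP that streamed the content.
    static String absolutize(String html, URI itemUrl) {
        Matcher m = ATTR.matcher(html);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = m.group(2);
            String rewritten = value;
            if (!value.isEmpty() && !value.startsWith("#")) { // leave in-page anchors alone
                try {
                    // URI.resolve returns an already-absolute value unchanged
                    rewritten = itemUrl.resolve(value).toString();
                } catch (IllegalArgumentException e) {
                    // not a valid URI (e.g. contains spaces): keep the original value
                }
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1) + "=\"" + rewritten + "\""));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

Rewriting requires buffering the whole item before writing it, rather than streaming it line by line, which is a trade-off for large items. The absolute URLs produced this way would also need to fall under the web service's gateway prefix if the portal is to gateway them.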