Monday, June 30, 2008

ALUI Publisher - Part 2: Increase Performance by enabling REAL Caching - No Redirect Bug Fix

Previously (http://fsanglier.blogspot.com/2008/06/alui-publisher-part-2-increase.html), I talked about (and, I hope, proved) why using a "no redirect" mechanism to serve published content from Publisher is the best option for enabling portal caching. Publisher 6.4 already offers such a possibility, although it is not much publicized. In the portal's Published Content Web Service object, use published_content_noredirect.jsp instead of the standard published_content_redirect.jsp.

Unfortunately, once you start using it, you will notice a weird behavior: the published content gets truncated in some special cases... and this is due to the way the JSP has been coded. You have several options:
- Wait for BEA to issue you a Critical Fix (I am not aware of one yet).
- Upgrade to ALUI 6.5 (I hear this has been fixed in 6.5... I have not verified it, though).
- Simply do it yourself, since this is a simple fix to implement. Ultimately, I imagine a CF would contain the same type of code.

Look at the JSP inside the Publisher web application archive (ptcs.war; explode the WAR with the jar command). You can see what's wrong and why the content is truncated in some cases:

HttpURLConnection conn = (HttpURLConnection) url.openConnection();

// make the request
conn.connect();

// read the content length
int contentLength = conn.getContentLength();

// if there is content, forward to the requesting client
if (contentLength > 0) {
    // UTF-8 is necessary
    InputStreamReader isr = new InputStreamReader(conn.getInputStream(), "UTF-8");
    char[] content = new char[contentLength];
    isr.read(content);
    isr.close();
    out.write(content);
}


As you can see, the code makes an HTTP GET request and gets the length of the response from getContentLength(). That method does not count the bytes actually contained in the response body. It simply returns the value of the Content-Length response header, or -1 if the header is absent, in which case nothing is written at all.

The code then sizes its char array from this number before writing to the JSP output stream. So the content will indeed be truncated whenever that number is not correct...

Note also two more problems with this code:
- A single isr.read(content) call is not guaranteed to fill the array. Reader.read(char[]) may return fewer characters than requested.
- Content-Length counts bytes, while the array counts decoded UTF-8 characters.
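To see this failure mode outside of Publisher, here is a minimal, self-contained sketch. The class and method names are hypothetical and do not come from ptcs.war. The body string and the under-reported length of 10 are made-up values.
- copyTrustingHeader mimics the original logic: it sizes a buffer from the header value and calls read() once.
- copyToEnd keeps reading until read() reports the end of the stream.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class: not part of the Publisher code base.
public class ContentLengthDemo {

    // Like the original JSP: buffer sized from the header value, a single read() call.
    static String copyTrustingHeader(Reader in, int contentLength) {
        if (contentLength <= 0) {
            return ""; // the JSP writes nothing when the header is missing (-1)
        }
        try {
            char[] content = new char[contentLength];
            int n = in.read(content); // may return fewer chars than requested, or -1
            return n > 0 ? new String(content, 0, n) : "";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Robust approach: keep reading until read() returns -1 (end of stream).
    static String copyToEnd(Reader in) {
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[2000];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String body = "<p>Published content item</p>";
        // Made-up case: the header under-reports the body length as 10.
        System.out.println(copyTrustingHeader(new StringReader(body), 10)); // <p>Publish
        System.out.println(copyToEnd(new StringReader(body)));              // full body
    }
}
```

The first call stops after exactly as many characters as the header claims. The second call never looks at the header.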



A simple correction (and more robust code) is to make sure ALL the content is pushed to the output stream, regardless of the content length returned in the response header. Here is my code that fixes the issue. It also increases performance by using the preferred BufferedReader wrapper class instead of the bare InputStreamReader:




------EDITED 3/12/2009--------
BufferedReader bisr = null;
try {
    bisr = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
    String line;
    while ((line = bisr.readLine()) != null) {
        out.println(line);
    }
}
catch (Exception exc) {
    throw exc; // to be caught by the global try catch
} finally {
    if (bisr != null) {
        bisr.close();
    }
    bisr = null;
}
return;
------END EDITED 3/12/2009--------

Basically, the code reads the entire content line by line with BufferedReader.readLine() until the end of the stream, where readLine() returns null. It writes everything to the JSP output stream. It does not rely on the content length at all, and is thus more reliable and robust.
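One side effect of the line-based loop: readLine() strips each line terminator, and println() appends the writer's own line separator. As a result, \r\n in the published file may come out as a different terminator.

If you want the content copied character for character, a fixed-size buffer loop avoids this. Below is a sketch, not the author's code:
- StreamCopy and copyAll are hypothetical names.
- The 2000-char buffer is an arbitrary chunk size.
- roundTrip only exists for the demo.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical helper, not part of ptcs.war.
public class StreamCopy {

    // Copies every character from in to out in fixed-size chunks, without
    // consulting any Content-Length header. Returns the number of chars copied.
    static long copyAll(Reader in, Writer out) throws IOException {
        char[] buf = new char[2000]; // arbitrary chunk size, not tuned
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    // Demo wrapper: pushes a String through copyAll and returns the result.
    static String roundTrip(String s) {
        StringWriter sw = new StringWriter();
        try (Reader in = new StringReader(s)) {
            copyAll(in, sw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sw.toString();
    }

    public static void main(String[] args) {
        // CRLF line endings survive unchanged, unlike readLine() + println().
        String page = "<html>\r\n<body>Hello</body>\r\n</html>";
        System.out.println(roundTrip(page).equals(page)); // true
    }
}
```

Inside the JSP, this would presumably be wired as copyAll(new InputStreamReader(conn.getInputStream(), "UTF-8"), out), since JspWriter is a Writer. Treat that as an untested assumption for your Publisher version.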



After changing published_content_noredirect.jsp as above, repackage ptcs.war with the corrected JSP: from the root of the previously extracted ptcs.war folder, run the jar -cvf ptcs.war * command. Then redeploy it to ALL redirector and publisher instances...



Voila, you have your perfect solution for ALUI 6.1 and Publisher 6.4 (and previous versions too).

ALUI Publisher - Part 2: Increase Performance by enabling REAL Caching

In my previous post (http://fsanglier.blogspot.com/2008/02/alui-publisher-increase-performance.html) (man, already a couple of months ago... I know I've been sucked into a black hole since then :) ), I talked about how to best design a scalable and redundant ALUI Publisher architecture. But what I had not pushed enough in that article was performance.

In my opinion, the standard Publisher behavior for serving content has some performance flaws related to caching, at least in ALUI 6.1 / Publisher 6.4... Since there are ways around this (that's what we do, right?), I think you might benefit from this a lot. In my last implementation, the change explained below took performance under load to completely new levels. It also reduced DB requests and the CPU utilization of both the DB and the Publisher redirectors... So here it goes...

As I explained in the previous post, each published content portlet on a portal page makes a request to the Publish Content Redirector component. Specifically, it calls the JSP page in charge of performing the redirect: published_content_redirect.jsp (you can see it defined in the Published Content portal web service object).

As the name says, this Java Server Page (JSP) performs a 302 redirect to the published content item. As seen in the previous post, this location should be served by your favorite web server, Apache or IIS for example... But BEFORE redirecting, it must know where to redirect to... So the code queries the Publisher database to get the browsing path to the published content item. It passes the Publisher content item ID that was saved when you created your published content portlet.

OK, so here is the first flaw I was talking about in the previous paragraph. From the above explanation, you can see that displaying content through the Publisher portlets on a portal page requires multiple calls to the Publisher DB. Say we have 5 Publisher portlets on the page (not uncommon). Each page rendering for 1 user then makes 5 calls to the Publisher DB, in addition to multiple other DB calls for portal and analytics.

With thousands of users, that is far too many DB calls simply to display content that does not change often. This puts unnecessary load on your DB infrastructure. It also increases page load time, since DB calls are inherently slower than simple content rendering...
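A back-of-the-envelope model makes the difference concrete. This is an idealized sketch, not a measurement:
- It assumes one Publisher DB lookup per portlet fetch.
- It assumes a cached copy is shared by all users of a portal server.
- It ignores evictions and content updates.
- The 10,000 page views and 2 portal servers are made-up numbers; the 5 portlets and the 2-hour window come from the text.

```java
// Hypothetical model class; numbers are illustrative, not measured.
public class PublisherDbLoad {

    // Without effective caching: each page render triggers one Publisher DB
    // lookup per published content portlet on the page.
    static long lookupsWithoutCache(int portletsPerPage, long pageViews) {
        return (long) portletsPerPage * pageViews;
    }

    // Idealized cached case: each portal server fetches each portlet once per
    // cache window, and that copy serves every user in the window.
    static long lookupsWithCache(int portletsPerPage, int portalServers, long cacheWindows) {
        return (long) portletsPerPage * portalServers * cacheWindows;
    }

    public static void main(String[] args) {
        // 5 portlets, 10,000 page views within one 2-hour cache window, 2 portal servers.
        System.out.println(lookupsWithoutCache(5, 10_000)); // 50000
        System.out.println(lookupsWithCache(5, 2, 1));      // 10
    }
}
```

The key point is the shape of the formulas. Without caching, DB load grows with page views. With working caching, it grows only with the number of portlets, servers and cache windows.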

While reading this, some of you are already thinking, for good reason: CACHING! Yeah, indeed, caching is the secret to scaling and performance. There is no need to always take out the big bucks and supercharge the DB infrastructure even more.

And great for us, the portal offers a great out-of-the-box caching capability within the web service object. Simply set the minimum caching to 2 hours and the maximum to 20 days. You would normally think the portal should then simply cache the published content for that amount of time... That would remove the need to call published_content_redirect.jsp altogether, and thus the need for a DB call!! ...But it does not happen this way.

You don't believe me? Enable access logging on the Publish Content Redirector components: un-comment the "Access logger" section within <ALUI HOME>\ptcs\6.4\container\deploy\jbossweb-tomcat50.sar\server.xml. You will clearly see that published_content_redirect.jsp is still constantly called, even though your page contains only published content portlets that should be cached... and thus caching is not really... caching.
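To quantify what the access log shows, a small counter over the log lines can help. This sketch makes three assumptions:
- The logger writes the common log format ("%h %l %u %t \"%r\" %s %b"); adjust the regex if your pattern differs.
- The sample lines are made up.
- The "/context/" path and the "id" parameter are placeholders, not the real redirector URLs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log-analysis helper.
public class AccessLogCount {

    // Captures the path from the quoted request line of a common-log-format
    // entry, e.g. "GET /context/page.jsp?id=1 HTTP/1.1".
    private static final Pattern REQUEST = Pattern.compile("\"[A-Z]+ (\\S+)[^\"]*\"");

    // Counts requests per JSP file name (directories and query string removed).
    static Map<String, Integer> countJspHits(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = REQUEST.matcher(line);
            if (!m.find()) {
                continue; // skip lines without a recognizable request
            }
            String path = m.group(1);
            int query = path.indexOf('?');
            if (query >= 0) {
                path = path.substring(0, query);
            }
            String file = path.substring(path.lastIndexOf('/') + 1);
            if (file.endsWith(".jsp")) {
                counts.merge(file, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "10.0.0.5 - - [30/Jun/2008:10:15:32 -0500] \"GET /context/published_content_redirect.jsp?id=42 HTTP/1.1\" 302 -",
            "10.0.0.5 - - [30/Jun/2008:10:15:33 -0500] \"GET /context/published_content_redirect.jsp?id=42 HTTP/1.1\" 302 -",
            "10.0.0.6 - - [30/Jun/2008:10:15:34 -0500] \"GET /images/logo.gif HTTP/1.1\" 200 1024");
        System.out.println(countJspHits(sample)); // {published_content_redirect.jsp=2}
    }
}
```

Running it over logs taken before and after a change gives a simple per-JSP hit count to compare.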

Why does this happen? Because of the redirect mechanism used to serve content... From www.w3.org, 302 is explained this way: "The requested resource resides temporarily under a different URI. Since the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests. This response is only cacheable if indicated by a Cache-Control or Expires header field." Thus, the portal (which acts as the client here) is doing its job perfectly and does not cache the 302 (temporary redirect) responses it receives.
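The quoted rule can be written as a tiny predicate. This is a simplified sketch of the HTTP/1.1 (RFC 2616) cacheability rules for just the two status codes discussed here. It is not the portal's actual caching logic, and the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the 200 vs 302 caching rule.
public class RedirectCacheability {

    // Simplified reading of RFC 2616 sections 10.3.3 and 13.4: a 200 response
    // is cacheable by default, while a 302 is cacheable only if Expires or
    // Cache-Control explicitly allows it. Real caches apply many more rules.
    static boolean isCacheable(int status, Map<String, String> headers) {
        // HTTP header names are case-insensitive.
        Map<String, String> h = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        h.putAll(headers);
        String cc = h.getOrDefault("Cache-Control", "").toLowerCase(Locale.ROOT);
        if (cc.contains("no-store")) {
            return false;
        }
        switch (status) {
            case 200:
                return true;
            case 302:
                return h.containsKey("Expires") || cc.contains("max-age") || cc.contains("public");
            default:
                return false; // other status codes are outside this sketch
        }
    }

    public static void main(String[] args) {
        Map<String, String> none = Collections.emptyMap();
        System.out.println(isCacheable(302, none)); // false: 302 without Expires/Cache-Control
        System.out.println(isCacheable(200, none)); // true: plain 200 OK
    }
}
```

Under this rule, the portal has nothing it may cache when the redirect JSP answers with a bare 302. A plain 200 is fair game.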

OK, so what can make it better? Changing the redirect mechanism to something that does not redirect... namely, another JSP already present and developed within Publisher 6.4: published_content_noredirect.jsp. (I have not verified whether it exists in earlier versions of Publisher.)

Instead of issuing a redirect to the published content, this JSP performs an HTTP GET request to it and writes its content to the JSP output stream. The response code is now a simple 200 OK ("The request has succeeded" - www.w3.org), which the portal can cache. To enable this, simply change the HTTP URL of the Published Content Web Service object to published_content_noredirect.jsp instead of published_content_redirect.jsp.

Of course, check the Publisher redirector access logs to see the dramatic difference... Under load, you will initially see a bunch of requests to published_content_noredirect.jsp. But very, very quickly the access log goes silent, as all the content is now really cached by the portal...

Result?? You can increase the load even more, and page response time stays the same (or even improves), while DB utilization is not affected by that load... You simply have a much better-performing site.

Our initial results compared constant intensive load with and without the change. With the change:
- The Publisher infrastructure no longer crashed.
- The CPU usage of both the DB and the Publisher redirectors was considerably reduced.
- The number of DB requests dropped.
- The load could be increased to new levels without loss of performance or functionality, thanks to caching.

Since this post is already long, I am going to stop here for now... Please read the next post (http://fsanglier.blogspot.com/2008/06/alui-publisher-part-2-increase_30.html) to understand the second flaw: published_content_noredirect.jsp has a truncation bug... which is fixable, of course :)

So long!