Tuesday, May 06, 2014

Using camel FOP to convert text documents to PDF

Camel is awesome. No doubt about it. While working with Camel I ran into a requirement to convert text files to PDF. Looking through the Camel components, I found "FOP" (http://camel.apache.org/fop.html). You just need to add the dependency and create the required route. It seems simple at first, judging by the few snippets on the website. But to convert a text file to PDF you need to generate XSL-FO, which carries both the formatting instructions (usually produced by an XSLT) and the real data (usually XML). That works well when you know in advance which documents you need to generate, but it is not obvious how to convert an arbitrary text file to PDF. Fear not, there is a way.
Create a route that reads text documents from a directory, sends each one through a processor that wraps the document in XSL-FO, sets the file name and attributes such as the author, forwards it to the FOP component, and then writes the result to the file system. The code is shown below, with comments where necessary.
package com.bitourea.camel.routes;
import com.bitourea.camel.util.AppUtil;
import org.apache.camel.Exchange;
import org.apache.camel.Processor;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.component.fop.FopConstants;

public class TextToPdf extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        String readDir = "c:/Temp/camel/textToPdf";

        // Read from the directory and keep only text files. The include option is a
        // regular expression, so the dot is written as [.] to match a literal dot.
        from("file://" + readDir + "?noop=true&include=[a-zA-Z0-9]*[.]txt")
            .routeId("textToPdf")
            .process(new Processor() {
                @Override
                public void process(Exchange exchange) throws Exception {
                    final String body = exchange.getIn().getBody(String.class);
                    final String fileNameWithoutExtension = AppUtil.getFileNameWithoutExtension(exchange);
                    // Wrap the plain text in XSL-FO so the FOP component can render it
                    final String convertToXSLFOBody = AppUtil.getFilledXSLFO(body);
                    exchange.getIn().setBody(convertToXSLFOBody);
                    // Write the result next to the source file, with a .pdf extension
                    exchange.getIn().setHeader(Exchange.FILE_NAME, fileNameWithoutExtension + ".pdf");
                    // Set the author attribute of the generated PDF
                    exchange.getIn().setHeader(FopConstants.CAMEL_FOP_RENDER + "author", "Shreyas Purohit");
                }
            })
            .to("fop:application/pdf")
            .to("file://" + readDir);
    }
}

The AppUtil used in the above code is shown below.

package com.bitourea.camel.util;
import org.apache.camel.Exchange;

public class AppUtil {
    public static final String EXT_DELIM = ".";
    private static final String fopMainTemplate = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
            "\n" +
            "<fo:root xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">\n" +
            "\n" +
            "<fo:layout-master-set>\n" +
            " <fo:simple-page-master master-name=\"A4\">\n" +
            " <fo:region-body margin=\"25pt\"/>\n" +
            " </fo:simple-page-master>\n" +
            "</fo:layout-master-set>\n" +
            "\n" +
            "<fo:page-sequence master-reference=\"A4\">\n" +
            " <fo:flow flow-name=\"xsl-region-body\">\n" +
            "#BLOCK_CONTENT" +
            " </fo:flow>\n" +
            "</fo:page-sequence>\n" +
            "\n" +
            "</fo:root>";
    private static final String fopBlockTemplate = " <fo:block font-family=\"Courier\" font-weight=\"normal\" " +
            "font-style=\"normal\" score-spaces=\"true\" white-space=\"pre\" linefeed-treatment=\"preserve\" " +
            "white-space-collapse=\"false\" white-space-treatment=\"preserve\" font-size=\"10pt\">#CONTENT</fo:block>\n";

    // Strip only the last extension, and tolerate file names that have none.
    public static String getFileNameWithoutExtension(Exchange exchange) {
        String fileName = exchange.getIn().getHeader(Exchange.FILE_NAME, String.class);
        int dot = fileName.lastIndexOf(EXT_DELIM);
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    public static String getFilledXSLFO(String content) {
        // replace() substitutes literally; replaceAll() would interpret '$' and '\'
        // in the document text as regex replacement syntax.
        return fopMainTemplate.replace("#BLOCK_CONTENT", getXSLFOBlock(content));
    }

    private static String getXSLFOBlock(String text) {
        return fopBlockTemplate.replace("#CONTENT", escapeXml(text));
    }

    // The text ends up as XML character data, so markup characters must be escaped.
    private static String escapeXml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
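
Since the template is filled by plain string substitution, two things are worth checking with any arbitrary text file: characters such as < and & have to be escaped before they land inside the XSL-FO, and the substitution itself should be literal, because String.replaceAll treats $ and \ in the replacement specially (a stray $1 in a document typically ends in an IndexOutOfBoundsException). Below is a minimal standalone sketch of that idea. The class name TemplateFillSketch, the shortened BLOCK template and the helper names are made up for illustration and are not part of the route above.

```java
class TemplateFillSketch {
    // Shortened stand-in for the fo:block template (illustrative only).
    static final String BLOCK = "<fo:block>#CONTENT</fo:block>";

    // Escape the characters that count as markup inside XML text content.
    // The ampersand goes first so already-escaped entities are not escaped twice.
    static String escapeXml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    // String.replace substitutes literally, so '$' and '\' in the content are left alone.
    static String fill(String content) {
        return BLOCK.replace("#CONTENT", escapeXml(content));
    }

    public static void main(String[] args) {
        System.out.println(fill("if (a < b && cost == $1) path = C:\\Temp"));
    }
}
```

The same two steps (escape, then literal replace) can be applied to the #CONTENT and #BLOCK_CONTENT placeholders of the real template.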

If you want, you can externalize the template into an XSLT and create the XML it needs to generate the right XSL-FO. I find that overkill when all I want is to concatenate a few strings and forward every incoming document to the next step of the route.

Also note that the template uses the monospace font Courier, since it preserves the formatting of the text best and I don't have to supply additional fonts. It is one of the 13 font family names that are directly available with PDF readers: Helvetica, sans-serif, SansSerif, Times, Times Roman, Times-Roman, serif, any, Courier, monospace, Monospaced, Symbol and ZapfDingbats. The FOP component also supports an XML configuration file, which can be used to define and load other fonts from the file system if necessary.

Change 'margin' on 'region-body' if you wish to increase or decrease the PDF page margin. For the text to be embedded as is in the PDF, the following attributes are set on the block: white-space=pre, linefeed-treatment=preserve, white-space-collapse=false, white-space-treatment=preserve.
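
For comparison, here is roughly what the XSLT alternative could look like using only the JDK's javax.xml.transform API. This is a sketch under assumptions, not the code used in the route: the class name XsltFoSketch, the wrapper element name document and the inline stylesheet are invented for the example, and in practice the stylesheet would live in its own .xsl file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class XsltFoSketch {
    // Hypothetical stylesheet that turns <document>text</document> into XSL-FO.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
      + " xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
      + "<xsl:template match=\"/document\">"
      + "<fo:root>"
      + "<fo:layout-master-set>"
      + "<fo:simple-page-master master-name=\"A4\">"
      + "<fo:region-body margin=\"25pt\"/>"
      + "</fo:simple-page-master>"
      + "</fo:layout-master-set>"
      + "<fo:page-sequence master-reference=\"A4\">"
      + "<fo:flow flow-name=\"xsl-region-body\">"
      + "<fo:block font-family=\"Courier\" font-size=\"10pt\" white-space=\"pre\">"
      + "<xsl:value-of select=\".\"/>"
      + "</fo:block>"
      + "</fo:flow>"
      + "</fo:page-sequence>"
      + "</fo:root>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String toXslFo(String text) throws Exception {
        // Build <document>text</document> through the DOM API, so the text stays
        // data and the serializer escapes characters such as < and &.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("document");
        root.setTextContent(text);
        doc.appendChild(root);

        // Compile the stylesheet and apply it to the in-memory document.
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXslFo("total < 10 & price = $5"));
    }
}
```

The resulting string could be set as the message body in place of the string-template output. The trade-off is more moving parts in exchange for escaping and layout being handled by the XML tooling.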

At the time of writing, the FOP component supports loading the configuration file from the classpath, but it does not allow custom UriResolvers to be plugged in, nor does it provide a classpath resolver. In effect, fonts defined in the FOP XML configuration cannot be loaded from the classpath; they must be present on the file system.

This is one of the easiest ways to convert text files to PDF documents.
