Write your own Pipeline

Pipelines were introduced to separate Tag processing from the content received from the XMLWorker. Different pipelines result in different actions. Creating a PDF is only one of many possible actions.

If you need functionality that goes beyond HTML to PDF rendering, you need to implement the Pipeline interface.

public interface Pipeline<T extends CustomContext> {

	Pipeline init(final WorkerContext context) throws PipelineException;
	Pipeline open(WorkerContext context, Tag t, ProcessObject po) throws PipelineException;
	Pipeline content(WorkerContext context, Tag t, byte[] content, ProcessObject po) throws PipelineException;
	Pipeline close(WorkerContext context, Tag t, ProcessObject po) throws PipelineException;
	Pipeline getNext();

}

For your convenience, AbstractPipeline already implements all of these methods. It's always a good idea to write a subclass of it. This allows you to inherit all the default behavior, so that you only have to override the open(), content(), and close() methods. These methods are called when XMLWorker detects that a tag is opened, when it detects content inside a tag, and when it detects that a tag is closed.
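The open/content/close lifecycle described above can be sketched as follows. To keep the example runnable without iText on the classpath, Tag, ProcessObject, WorkerContext, PipelineException, and AbstractPipeline are simplified stand-ins defined inside the example, not the real XMLWorker classes, and TagLoggingPipeline is a hypothetical name. The stand-in base class assumes that the default behavior is to hand over to the next pipeline.

```java
import java.nio.charset.StandardCharsets;

class PipelineSketch {
    // Simplified stand-ins for the XMLWorker types (assumptions, not the real classes).
    static class WorkerContext {}
    static class ProcessObject {}
    static class PipelineException extends Exception {}

    static class Tag {
        private final String name;
        Tag(String name) { this.name = name; }
        String getName() { return name; }
    }

    // Stand-in base class: each callback simply hands over to the next pipeline.
    static abstract class AbstractPipeline {
        private final AbstractPipeline next;
        AbstractPipeline(AbstractPipeline next) { this.next = next; }
        AbstractPipeline getNext() { return next; }
        AbstractPipeline open(WorkerContext ctx, Tag t, ProcessObject po)
                throws PipelineException { return getNext(); }
        AbstractPipeline content(WorkerContext ctx, Tag t, byte[] content, ProcessObject po)
                throws PipelineException { return getNext(); }
        AbstractPipeline close(WorkerContext ctx, Tag t, ProcessObject po)
                throws PipelineException { return getNext(); }
    }

    // Hypothetical subclass that overrides only the three lifecycle methods.
    static class TagLoggingPipeline extends AbstractPipeline {
        final StringBuilder log = new StringBuilder();

        TagLoggingPipeline(AbstractPipeline next) { super(next); }

        @Override
        AbstractPipeline open(WorkerContext ctx, Tag t, ProcessObject po)
                throws PipelineException {
            log.append('<').append(t.getName()).append('>');
            return getNext();
        }

        @Override
        AbstractPipeline content(WorkerContext ctx, Tag t, byte[] content, ProcessObject po)
                throws PipelineException {
            // The bytes are whatever was found in-between tags; UTF-8 is assumed here.
            log.append(new String(content, StandardCharsets.UTF_8));
            return getNext();
        }

        @Override
        AbstractPipeline close(WorkerContext ctx, Tag t, ProcessObject po)
                throws PipelineException {
            log.append("</").append(t.getName()).append('>');
            return getNext();
        }
    }

    // Plays the role of XMLWorker for a single <p>Hello</p> element.
    static String demo() {
        TagLoggingPipeline pipeline = new TagLoggingPipeline(null);
        WorkerContext ctx = new WorkerContext();
        ProcessObject po = new ProcessObject();
        Tag p = new Tag("p");
        try {
            pipeline.open(ctx, p, po);
            pipeline.content(ctx, p, "Hello".getBytes(StandardCharsets.UTF_8), po);
            pipeline.close(ctx, p, po);
        } catch (PipelineException e) {
            throw new IllegalStateException(e);
        }
        return pipeline.log.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints <p>Hello</p>
    }
}
```

With the real library you would extend the real AbstractPipeline and keep the signatures of the Pipeline interface shown earlier; only the bodies of the three overridden methods would carry your own logic.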

XMLWorker passes a Tag object (containing the name of the tag, attributes, styles, its parent, and its children) as well as a ProcessObject to these methods. In the case of the content() method, you also get a byte array with whatever was found in-between tags. The lifecycle of a ProcessObject starts in the first pipeline that is encountered and ends in the final pipeline. It contains a list of Writable objects. Writable is used as a marker interface, allowing you to pass virtually anything from one pipeline to another. For instance, the PdfWriterPipeline expects the ProcessObject to contain lists of WritableElements. These contain lists of Element objects that can be added to a document. In the HTML to PDF implementation, the HtmlPipeline adds Element objects to a WritableElement and puts them in the ProcessObject that is passed on to the PdfWriterPipeline.
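The hand-off between pipelines through a ProcessObject can be illustrated with a small sketch. Writable and ProcessObject below are simplified stand-ins written for this example (the method names add, poll, and containsWritable are assumptions, so check the real class before relying on them), and WritableText is a hypothetical payload playing the role that WritableElement plays for PDF output.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class ProcessObjectSketch {
    // Stand-in marker interface: no methods, it only tags what may travel between pipelines.
    interface Writable {}

    // Stand-in ProcessObject: a simple queue of Writable objects (assumed API).
    static class ProcessObject {
        private final Queue<Writable> queue = new ArrayDeque<>();
        void add(Writable w) { queue.add(w); }
        Writable poll() { return queue.poll(); }
        boolean containsWritable() { return !queue.isEmpty(); }
    }

    // Hypothetical payload carrying plain text lines instead of PDF elements.
    static class WritableText implements Writable {
        final List<String> lines = new ArrayList<>();
    }

    // Producer side: an earlier pipeline wraps its result and puts it in the ProcessObject.
    static void produce(ProcessObject po, String text) {
        WritableText w = new WritableText();
        w.lines.add(text);
        po.add(w);
    }

    // Consumer side: a later pipeline drains the ProcessObject and keeps what it understands.
    static List<String> consume(ProcessObject po) {
        List<String> out = new ArrayList<>();
        while (po.containsWritable()) {
            Writable w = po.poll();
            if (w instanceof WritableText) {
                out.addAll(((WritableText) w).lines);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ProcessObject po = new ProcessObject();
        produce(po, "first");
        produce(po, "second");
        System.out.println(consume(po)); // prints [first, second]
    }
}
```

Because the marker interface declares no methods, the consuming pipeline has to check the concrete type of each Writable itself, which is why the sketch uses instanceof before casting.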

The WorkerContext lives as long as the parsing is going on. The context can be filled with CustomContext implementations used by pipelines. This way, a pipeline can access the CustomContext of another pipeline. In the existing pipelines, the CustomContext is registered in the init() method, which is called by the XMLParser's various parse methods.
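One way to picture the shared context is as a map from a key to a CustomContext, filled during init() and read later by any pipeline. The sketch below works under that assumption: WorkerContext and CustomContext are simplified stand-ins, the put/get methods and the use of the class name as key are assumptions for illustration, and CounterContext, CountingPipeline, and ReportingPipeline are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class ContextSketch {
    // Stand-in for the CustomContext type; treated here as a plain marker.
    interface CustomContext {}

    // Stand-in WorkerContext: a keyed store that lives for one parse run (assumed API).
    static class WorkerContext {
        private final Map<String, CustomContext> contexts = new HashMap<>();
        void put(String key, CustomContext c) { contexts.put(key, c); }
        CustomContext get(String key) {
            CustomContext c = contexts.get(key);
            if (c == null) {
                throw new IllegalStateException("No context registered for " + key);
            }
            return c;
        }
    }

    // Hypothetical context owned by the first pipeline.
    static class CounterContext implements CustomContext {
        int tagsSeen;
    }

    // Registers its context in init() and updates it on every opened tag.
    static class CountingPipeline {
        static final String KEY = CountingPipeline.class.getName();
        void init(WorkerContext ctx) { ctx.put(KEY, new CounterContext()); }
        void open(WorkerContext ctx) { ((CounterContext) ctx.get(KEY)).tagsSeen++; }
    }

    // A different pipeline that reads the first pipeline's context through the WorkerContext.
    static class ReportingPipeline {
        int report(WorkerContext ctx) {
            return ((CounterContext) ctx.get(CountingPipeline.KEY)).tagsSeen;
        }
    }

    static int demo() {
        WorkerContext ctx = new WorkerContext();
        CountingPipeline counting = new CountingPipeline();
        ReportingPipeline reporting = new ReportingPipeline();
        counting.init(ctx);   // done once, before parsing starts
        counting.open(ctx);   // first tag
        counting.open(ctx);   // second tag
        return reporting.report(ctx);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 2
    }
}
```

Registering the context in init() means it exists before the first tag arrives, so later pipelines can look it up without checking whether the owning pipeline has already run.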

Please consult the source code of the existing pipelines for inspiration when writing your own Pipeline implementation.