How I Designed My Latest BBCode Library

I have written four or five BBCode libraries over my programming career, each one improving on the last in some way. I expected my most recent one (currently used on Lowter) to last longer without hitting any problem serious enough to force a rewrite. Of course, I was wrong, and I have spent the past few days working on a new BBCode library to replace Lowter's current one.

The old BBCode library faced a few problems:

  • The code wasn't easily expandable.
  • Acronyms, link indexing, and table-of-contents generation were handled by separate libraries, which therefore had to work from the HTML that the BBCode library output. This caused various errors, such as <abbr> tags landing in the middle of an image's alt text. Those external libraries could probably have been improved, but it seemed more logical to move the indexing into the BBCode library itself, where text, links, and headers already pass through methods that can index them while generating the HTML.
  • Various block elements (quotations, code, headers, etc.) caused problems if their start and end tags weren't placed on separate lines from the block's contents. The main consequence was that comments on blog entries were not parsed properly, because the BBCode wasn't flexible enough for public use.
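
As a rough illustration of the <abbr> problem in the second point, here is a minimal PHP sketch of an acronym pass that works on finished HTML. The function name, the acronym, and the sample markup are all made up for this example; the real external libraries are not shown in this post.

```php
<?php
// Hypothetical post-processor (not the real library): it wraps a known
// acronym in <abbr> by searching the finished HTML as plain text.
function markAcronyms(string $html): string
{
    return str_replace(
        'CSS',
        '<abbr title="Cascading Style Sheets">CSS</abbr>',
        $html
    );
}

$html = '<p>CSS tips <img src="logo.png" alt="CSS logo"></p>';

// The replacement cannot tell body text from attribute values, so the
// alt attribute ends up containing an <abbr> element as well.
echo markAcronyms($html), "\n";
```

Because the pass only sees a flat string, it has no way of knowing that the second "CSS" sits inside an attribute. A parser that handles acronyms while it is still looking at the BBCode text does not have that blind spot.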

So the main task in restructuring the BBCode library was a better, expandable system for handling block elements (those that should not sit inside a paragraph), such as quotations, code, tables, headers, and lists. The previous library already had an interesting way of keeping code blocks out of normal BBCode parsing. Code blocks were identified, processed by the library, and stored in an array, and a simple reference tag was left in each block's place in the original text. Once normal BBCode parsing was complete, the library went back through and replaced every reference tag with its corresponding code block. There might be a better way to do this, but the process was simple and it worked, so I decided to apply the same method to all of the block elements.
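
The stash-and-restore idea for code blocks can be sketched in a few lines of PHP. This is only an approximation under assumptions: the function name, the [code] and [b] tag syntax, the generated HTML, and the {reference=key} tag format used for the old library are illustrative, not the old library's actual source.

```php
<?php
// Sketch only: names, tag syntax, and output HTML are assumptions.
function parseWithCodeBlocks(string $text): string
{
    $codeBlocks = [];

    // 1. Pull each [code]...[/code] block out, process it, store it,
    //    and leave a reference tag in its place.
    $text = preg_replace_callback(
        '#\[code\](.*?)\[/code\]#s',
        function (array $m) use (&$codeBlocks): string {
            $key = count($codeBlocks);
            $codeBlocks[$key] = '<pre><code>'
                . htmlspecialchars(trim($m[1]))
                . '</code></pre>';
            return '{reference=' . $key . '}';
        },
        $text
    );

    // 2. Normal inline parsing. The code contents are out of reach here.
    $text = preg_replace('#\[b\](.*?)\[/b\]#s', '<strong>$1</strong>', $text);

    // 3. Swap every reference tag back for its stored code block.
    return preg_replace_callback(
        '#\{reference=(\d+)\}#',
        function (array $m) use ($codeBlocks): string {
            return $codeBlocks[(int) $m[1]] ?? $m[0];
        },
        $text
    );
}

echo parseWithCodeBlocks('[b]Note:[/b] [code][b]literal[/b][/code]'), "\n";
// prints: <strong>Note:</strong> <pre><code>[b]literal[/b]</code></pre>
```

The [b] tag inside the code block survives untouched because it was already sitting in the array when the inline pass ran.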

First, using preg_replace(), all block elements are processed through the extract() method, which decides which specific method to run the text through and then replaces the block with a reference tag of the form {reference=key}. The processed data is saved in the $references array under the same key that appears in the reference tag. Each type of block element has its own method within the library, so quotations are processed by quotation() and tables by table(). After that, only paragraph text with inline BBCode tags (bold, italics, links, etc.) is left. This remaining text is parsed, and then the library replaces all the reference tags with the corresponding elements.
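
A stripped-down sketch of that flow might look like the following. The method names extract() and quotation(), the $references array, and the {reference=key} format come from the description above; everything else is an assumption made for the example, including the regular expressions, the [quote] and [code] tag syntax, the code() method, and the generated HTML. The sketch also uses preg_replace_callback() rather than preg_replace(), because the /e modifier that let preg_replace() call code was removed in PHP 7.

```php
<?php
// Sketch under assumptions: regexes, tag names, and HTML are illustrative.
class BBCode
{
    private $references = [];

    public function parse(string $text): string
    {
        $this->references = [];

        // Block elements first: every match is handed to extract().
        $text = preg_replace_callback(
            '#\[(quote|code)\](.*?)\[/\1\]#s',
            [$this, 'extract'],
            $text
        );

        // Only paragraph text with inline tags is left at this point.
        $text = preg_replace('#\[b\](.*?)\[/b\]#s', '<strong>$1</strong>', $text);
        $text = preg_replace('#\[i\](.*?)\[/i\]#s', '<em>$1</em>', $text);

        // Finally, put the processed blocks back in place of their tags.
        return preg_replace_callback(
            '#\{reference=(\d+)\}#',
            function (array $m): string {
                return $this->references[(int) $m[1]] ?? $m[0];
            },
            $text
        );
    }

    // Chooses the method for this block type, stores the result, and
    // returns the reference tag that replaces the block in the text.
    public function extract(array $match): string
    {
        $methods = ['quote' => 'quotation', 'code' => 'code'];
        $method = $methods[$match[1]];

        $key = count($this->references);
        $this->references[$key] = $this->$method(trim($match[2]));

        return '{reference=' . $key . '}';
    }

    private function quotation(string $content): string
    {
        return '<blockquote>' . htmlspecialchars($content) . '</blockquote>';
    }

    private function code(string $content): string
    {
        return '<pre><code>' . htmlspecialchars($content) . '</code></pre>';
    }
}

$bbcode = new BBCode();
echo $bbcode->parse("[b]Hi[/b]\n[quote]A & B[/quote]"), "\n";
```

In this sketch, supporting another block type means one new method, one new entry in the method map, and one more name in the regex alternation.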

The new system for handling block elements is also easily expandable: you only need to add a new method to the BBCode class and a small extra bit to the regular expression. The other issues were tackled with better regex code and some extra code to index various elements as they pass through the BBCode parser. A few other nice features were added as well, such as nested lists and smiley support.
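
To give a feel for indexing elements as they pass through the parser, here is a small PHP sketch that records each header for a table of contents at the same moment its HTML is generated. The [h] tag, the section-N id scheme, and the function name are hypothetical and chosen only for the example.

```php
<?php
// Sketch only: the [h] tag, id scheme, and function name are hypothetical.
function parseHeaders(string $text, array &$toc): string
{
    return preg_replace_callback(
        '#\[h\](.*?)\[/h\]#s',
        function (array $m) use (&$toc): string {
            $title = trim($m[1]);
            $id = 'section-' . (count($toc) + 1);

            // Index the header while its HTML is being built, instead of
            // scanning the finished HTML for <h2> tags afterwards.
            $toc[] = ['id' => $id, 'title' => $title];

            return '<h2 id="' . $id . '">' . htmlspecialchars($title) . '</h2>';
        },
        $text
    );
}

$toc = [];
$html = parseHeaders("[h]Setup[/h]\nText\n[h]Usage[/h]", $toc);

echo $html, "\n";
foreach ($toc as $entry) {
    echo $entry['id'], ': ', $entry['title'], "\n";
}
```

The table of contents falls out as a side effect of parsing, so no second pass over the HTML is needed.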

Look for the newly designed BBCode library live on Lowter next month! Eventually, the BBCode library will also be available for download as part of Olympiad.