As you know, I use Jekyll to generate this blog. While setting mine up, I read Benjamin Thomas's post, in which he describes reading how Jack Moffitt set up his Jekyll installation and liking the idea of an html_truncate filter. I, in turn, thought his idea of ‘an even smarter html_truncate tag’ was pretty cool.

To recap, the truncate and truncatewords filters in Liquid (the templating language Jekyll uses) work great on plain text, but don't work well with HTML. For example, <b>Hello World!</b> naively truncated to one word results in <b>Hello with no closing </b>, unexpectedly causing the rest of the page to be bolded!
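To make the failure concrete, here is a tiny Ruby sketch. The naive_truncate_words helper is a hypothetical stand-in for any text-only word truncation; it is not Liquid's actual implementation (which also appends an ellipsis by default), but it cuts in the same tag-unaware way.

```ruby
# Hypothetical helper: keeps the first `count` whitespace-separated words
# and knows nothing about HTML tags.
def naive_truncate_words(text, count)
  words = text.split(' ')
  return text if words.length <= count

  words.first(count).join(' ')
end

puts naive_truncate_words('<b>Hello World!</b>', 1)
# => <b>Hello
```

The opening <b> travels with the first word, while the closing </b> is attached to the word that gets dropped.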

The Jekyll theme I'm using works around this issue by simply stripping all HTML out of the text before using truncatewords, but this of course means that all the formatting will be lost. Benjamin Thomas provides a script which works around this problem, parsing the HTML content to correctly close HTML tags when truncating, preserving formatting.
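The tag-closing idea can be sketched in a few lines of Ruby. This is not Benjamin Thomas's script, just a minimal illustration under some loud assumptions: the input is well-formed, a regex tokenizer stands in for a real HTML parser, and the names truncate_html_words and VOID_TAGS are made up for this example.

```ruby
# Elements that never take a closing tag, so they are not tracked on the stack.
VOID_TAGS = %w[area br col embed hr img input link meta source wbr].freeze

# Hypothetical helper: keep at most `limit` words (limit >= 0) of text,
# then close whatever tags are still open.
def truncate_html_words(html, limit)
  out = +''
  open_tags = []
  remaining = limit

  # Naive tokenizer (an assumption): each token is one tag or one run of text.
  html.scan(/<[^>]+>|[^<]+/).each do |token|
    if token.start_with?('</')
      open_tags.pop
      out << token
    elsif token.start_with?('<')
      name = token[/\A<([A-Za-z][A-Za-z0-9]*)/, 1]
      # Comments and doctypes have no tag name and pass through untracked.
      if name && !VOID_TAGS.include?(name.downcase) && !token.end_with?('/>')
        open_tags.push(name.downcase)
      end
      out << token
    else
      words = token.split(' ')
      if words.length > remaining
        # Word budget exhausted inside this text run: cut here.
        kept = words.first(remaining).join(' ')
        out << token[/\A\s*/] << kept unless kept.empty?
        out << '...'
        break
      end
      out << token
      remaining -= words.length
    end
  end

  # Close the tags that were still open at the cut, innermost first.
  open_tags.reverse_each { |name| out << "</#{name}>" }
  out
end

puts truncate_html_words('<b>Hello World!</b>', 1)
# => <b>Hello...</b>
```

Keeping a stack of open tags is what lets the formatting survive: the cut still happens mid-text, but every tag opened before it gets closed afterwards.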

There is, however, one final problem I encountered. Some HTML elements, most notably table/tr, but also others like ruby and script, really shouldn't be cut off halfway. For example, a table should truncate at the end of a row, not in the middle of one.

To solve this, I present my even smarterer version of html_truncate.

The script recurses through the document, counting words as it goes. In ordinary text, it truncates the input in an HTML-aware manner once the required word count is reached, printing an ellipsis if necessary. If the limit is reached inside a table, however, it continues to output the rest of the current row before stopping. Check out the home page of this blog to see it in action!
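For the curious, here is a rough Ruby sketch of that behaviour. It is not the actual script: it assumes well-formed input, uses a flat regex tokenizer with a tag stack instead of recursing over a parsed document, and the names smart_truncate_sketch, VOID_ELEMENTS and ATOMIC_TAGS are invented for this example. A script body containing a < character would also need a real parser.

```ruby
# Elements without a closing tag; they are never pushed on the stack.
VOID_ELEMENTS = %w[area br col embed hr img input link meta source wbr].freeze
# Elements that should not be cut halfway (list assumed from the prose above).
ATOMIC_TAGS = %w[tr ruby script].freeze

# Hypothetical helper: truncate to about `limit` words (limit >= 0),
# but always finish an atomic element once it has been entered.
def smart_truncate_sketch(html, limit)
  out = +''
  open_tags = []
  remaining = limit

  html.scan(/<[^>]+>|[^<]+/).each do |token|
    if token.start_with?('</')
      closed = open_tags.pop
      out << token
      # Budget spent and the outermost atomic element just ended: stop here.
      if ATOMIC_TAGS.include?(closed) && remaining <= 0 &&
         (open_tags & ATOMIC_TAGS).empty?
        break
      end
    elsif token.start_with?('<')
      name = token[/\A<([A-Za-z][A-Za-z0-9]*)/, 1]
      # Comments and doctypes have no tag name and pass through untracked.
      if name && !VOID_ELEMENTS.include?(name.downcase) && !token.end_with?('/>')
        open_tags.push(name.downcase)
      end
      out << token
    else
      words = token.split(' ')
      if (open_tags & ATOMIC_TAGS).any?
        # Inside a row (or similar): emit everything, even past the limit.
        out << token
        remaining -= words.length
      elsif words.length > remaining
        # Ordinary text: cut at the word limit and add an ellipsis.
        kept = words.first([remaining, 0].max).join(' ')
        out << token[/\A\s*/] << kept unless kept.empty?
        out << '...'
        break
      else
        out << token
        remaining -= words.length
      end
    end
  end

  # Close the tags that were still open at the cut, innermost first.
  open_tags.reverse_each { |name| out << "</#{name}>" }
  out
end

table = '<table><tr><td>one two</td><td>three four</td></tr>' \
        '<tr><td>five</td></tr></table>'
puts smart_truncate_sketch(table, 3)
# => <table><tr><td>one two</td><td>three four</td></tr></table>
```

With a limit of three words, the cut would land in the middle of the second cell. The sketch lets the word count overshoot until the row closes, then drops the second row and closes the table.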

smart_truncate on GitHub