Custom writer for TASVideos forum markup (BBCode-like) #11434

refashioned · 2026-02-01T19:02:40Z


I've made a custom writer for the markup format used on the TASVideos forum, which is based on BBCode. The format is defined by the upstream parser, which is BbParser.cs.

$ echo '*hello **world*** <https://lua.org/>' | pandoc -f markdown -t tasvideos_forum.lua
[i]hello [b]world[/b][/i] [url]https://lua.org/[/url]

The most notable aspect of the writer is probably how it does escaping in order to prevent accidental syntax in the input from being interpreted as BBCode in the output. For example, this Markdown input:

The *i*th element of the array is a[i].

* <https://link.example/>
* https://notalink.example/

Becomes this forum markup output:

The [i]i[/i]th element of the array is a[noparse][[/noparse]i].

[list]
[*][url]https://link.example/[/url]
[*]https[noparse]://[/noparse]notalink.example/
[/list]

The [i] in the input, if copied verbatim to the output, would be interpreted as a start tag for italics. The writer prevents that misinterpretation by wrapping the left square bracket in a [noparse]...[/noparse] pair.

The TASVideos parser autolinks anything that looks like a URL. The second URL in the input is not actually a link, so it should not become a link in the output. The writer prevents autolinking of URL-like strings by wrapping :// in a [noparse]...[/noparse] pair.
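
As a rough illustration of these two rules (a minimal sketch, not the writer's actual code; the function name escape_text is made up here), the escaping can be done in two passes over the text:

```lua
-- Hypothetical helper: escape plain text for a context where nested tags are allowed.
local function escape_text(text)
    -- Pass 1: wrap every "[" so it cannot start a tag. The outer parentheses
    -- drop gsub's second return value (the match count).
    local escaped = (text:gsub("%[", "[noparse][[/noparse]"))
    -- Pass 2: wrap every "://" so URL-like strings are not autolinked.
    -- The markup added by pass 1 contains no "://", so it is left alone.
    return (escaped:gsub("://", "[noparse]://[/noparse]"))
end

print(escape_text("a[i]"))                      -- a[noparse][[/noparse]i]
print(escape_text("https://notalink.example/")) -- https[noparse]://[/noparse]notalink.example/
```

The order matters: doing the bracket pass second would re-escape the brackets of the [noparse] tags inserted by the URL pass.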

This is my first time making a custom writer. I wonder if someone more experienced can comment on the implementation and whether it fits with typical practice. I have these specific questions:

- Is there a general best practice for escaping text output, when a pattern that needs to be escaped may be split over two or more AST nodes? I could not think of a way to deal with this problem using pandoc.scaffolding.Writer and instead invented a custom abstraction.
- If a custom writer finds that a given input document is impossible to represent, what should happen? This one exits with an error, by calling the Lua error or assert functions.
- What's a good way to select a single programming language name class from el.attr.classes, in order to render a code block?

Escaping when a pattern spans AST nodes

I started off trying to use pandoc.scaffolding.Writer, where you provide a bunch of type-specific callback functions for AST nodes, each returning a fragment of the complete output in the form of a string or a Doc. But I had to give up that approach. The problem is that the rules for escaping text differ depending on what BBCode tags are currently open (in particular, whether the most recently opened tag allows nested child tags or not). I didn't see a way to pass that necessary information into the pandoc.scaffolding.Writer callbacks. Instead, I implemented a similar framework of callback functions, but with every callback taking an additional stack parameter. Each callback function did appropriate escaping on its fragment of the output, depending on the state of the stack:

function Inlines.Span(el, stack, opts)
    return render_inlines(el.content, stack, opts)
end

function Inlines.Str(el, stack, opts)
    return escape(el.text, stack)
end
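
To make the role of the stack concrete, here is a sketch of what a stack-aware escape function could look like. The NO_NESTING table and the function body are assumptions for illustration (the tag names are the ones this post later says forbid nesting); the real rules are more involved, and this sketch ignores the end-tag problem discussed further down.

```lua
-- Assumed set of tags whose content may not contain nested tags.
local NO_NESTING = {code = true, img = true, url = true}

-- Hypothetical escape: `stack` is an array of currently open tag names,
-- with the most recently opened tag last.
local function escape(text, stack)
    local top = stack[#stack]
    if top and NO_NESTING[top] then
        -- Inside a no-nesting tag, [noparse] is not available and
        -- square brackets are copied verbatim.
        return text
    end
    -- Otherwise wrap "[" and "://" in [noparse] tags.
    return (text:gsub("%[", "[noparse][[/noparse]"):gsub("://", "[noparse]://[/noparse]"))
end

print(escape("a[i]", {"b"}))     -- a[noparse][[/noparse]i]
print(escape("a[i]", {"code"}))  -- a[i]
```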

But even this I had to change. The reason has to do with escaping URLs to prevent autolinking. When there's a URL-like string in the input that is not actually a link, we want to escape the string in the output so it remains plain text and does not get autolinked. We do that by searching for the substring :// in text and converting it to [noparse]://[/noparse]. With the scaffolding-like approach, every callback function independently escapes the output fragment that it returns. The problem arises in an input like this HTML:

<p><em>http://example.com/</em></p>
<p><em><span>http:</span><span>//example.com/</span></em></p>

Because the TASVideos forum markup does not have a special representation for pandoc.Span, the Inlines.Span callback just returned its contents without any surrounding markup. The fact that the second URL is broken up by spans means that the http: and //example.com/ parts of the URL were handled by two different calls to Inlines.Str. Because the pattern :// doesn't appear in either part in its entirety, it didn't get escaped by either call. The second URL was output in a way that would, incorrectly, cause it to be autolinked:

[i]http[noparse]://[/noparse]example.com/[/i]

[i]http://example.com/[/i]

In short, here we have a situation where the "escape then concatenate" paradigm of pandoc.scaffolding.Writer gives different and incorrect results, in comparison with "concatenate then escape".
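
The difference between the two orders can be reproduced in a few lines of plain Lua. This is only a demonstration of the effect; escape_url is a stand-in name, not a function from the writer.

```lua
-- Stand-in for the URL part of the escaping.
local function escape_url(text)
    return (text:gsub("://", "[noparse]://[/noparse]"))
end

-- The two fragments that the spans split the URL into.
local parts = {"http:", "//example.com/"}

-- Escape then concatenate: neither fragment contains "://" on its own.
local escaped_parts = {}
for i, part in ipairs(parts) do
    escaped_parts[i] = escape_url(part)
end
local per_fragment = table.concat(escaped_parts)

-- Concatenate then escape: the pattern is visible in the joined string.
local whole = escape_url(table.concat(parts))

print(per_fragment)  -- http://example.com/
print(whole)         -- http[noparse]://[/noparse]example.com/
```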

To fix this problem I modified the framework further. Now, instead of callbacks returning a pre-escaped fragment of the output document, they yield a sequence of typed "tokens". A token may be something like start_tag, end_tag, or text. A top-level consolidate_tokens function iterates over the preliminary sequence of tokens produced by the callbacks and merges adjacent text tokens before they are escaped.

With the above example, the preliminary sequence of tokens is as follows:

{type = "start_tag", tag = "i"}
{type = "text", text = "http://example.com/"}
{type = "end_tag", tag = "i"}
{type = "blankline"}
{type = "start_tag", tag = "i"}
{type = "text", text = "http:"}
{type = "text", text = "//example.com/"}
{type = "end_tag", tag = "i"}

After merging adjacent text tokens, the sequence becomes:

{type = "start_tag", tag = "i"}
{type = "text", text = "http://example.com/"}
{type = "end_tag", tag = "i"}
{type = "blankline"}
{type = "start_tag", tag = "i"}
{type = "text", text = "http://example.com/"}
{type = "end_tag", tag = "i"}
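
The merging step can be sketched as follows. This is a simplified reconstruction that assumes tokens are plain tables shaped like the ones shown above; it is not the code from the writer itself.

```lua
-- Merge runs of adjacent text tokens into a single text token.
-- Returns a new array; the input tokens are not modified.
local function consolidate_tokens(tokens)
    local result = {}
    for _, token in ipairs(tokens) do
        local last = result[#result]
        if token.type == "text" and last and last.type == "text" then
            -- Replace the previous text token with a merged copy.
            result[#result] = {type = "text", text = last.text .. token.text}
        else
            result[#result + 1] = token
        end
    end
    return result
end

local tokens = {
    {type = "start_tag", tag = "i"},
    {type = "text", text = "http:"},
    {type = "text", text = "//example.com/"},
    {type = "end_tag", tag = "i"},
}
local merged = consolidate_tokens(tokens)
print(#merged)         -- 3
print(merged[2].text)  -- http://example.com/
```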

The top-level render_tokens function iterates over the consolidated token sequence and emits BBCode tags, escaped text, etc., as appropriate for each token. Following this approach, the output has both URLs properly escaped:

[i]http[noparse]://[/noparse]example.com/[/i]

[i]http[noparse]://[/noparse]example.com/[/i]
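
A stripped-down version of the rendering step might look like this. It is a sketch under simplifying assumptions: it handles only the four token types shown above, has no tag parameters, and escapes text without looking at which tags are open.

```lua
-- Simplified text escaping (ignores the open-tag stack).
local function escape(text)
    return (text:gsub("%[", "[noparse][[/noparse]"):gsub("://", "[noparse]://[/noparse]"))
end

-- Turn a consolidated token sequence into a string of forum markup.
local function render_tokens(tokens)
    local out = {}
    for _, token in ipairs(tokens) do
        if token.type == "start_tag" then
            out[#out + 1] = "[" .. token.tag .. "]"
        elseif token.type == "end_tag" then
            out[#out + 1] = "[/" .. token.tag .. "]"
        elseif token.type == "text" then
            -- Text is escaped only here, after adjacent tokens were merged.
            out[#out + 1] = escape(token.text)
        elseif token.type == "blankline" then
            out[#out + 1] = "\n\n"
        else
            error("unknown token type: " .. tostring(token.type))
        end
    end
    return table.concat(out)
end

print(render_tokens({
    {type = "start_tag", tag = "i"},
    {type = "text", text = "http://example.com/"},
    {type = "end_tag", tag = "i"},
}))  -- [i]http[noparse]://[/noparse]example.com/[/i]
```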

This technique is effective, but it makes me wonder if I'm overlooking something. I may be reinventing a wheel. It seems like a general problem, the need to look at the content of text nodes in different places in the AST that become adjacent in the output. Is there some existing function of Doc, for example, that does what I'm trying to do, marking certain text as needing to be merged and escaped before finally being output? I notice that Text.DocLayout does something similar internally with FlatDoc.

Dealing with impossible-to-represent inputs

There are two cases where an input document may be impossible to represent in this BBCode variant.

1. When we're inside a tag, such as [code], that does not allow nested child tags, and we're asked to output text that would look like its corresponding end tag ([/code]). For example, consider this Markdown input:

   ````
   ```lua
   local end_tag = "[/code]"
   ```
   ````

   If it were converted naively, there would be no way to prevent the inner [/code] from prematurely ending the code block:

   ```
   [code=lua]local end_tag = "[/code]"[/code]
   ```

   Not even [noparse] is allowed inside tags that forbid nesting, so the usual way of escaping cannot be used. Inside such a tag, square brackets are copied to the output verbatim, so the lack of escaping is not actually a problem, except in the particular case of an end tag for the most recently opened tag.

2. When we're asked to output a BBCode start tag parameter (the param in [tag=param]) that contains unbalanced [ and ] characters. The param part of a start tag is terminated by the first ] character that is not balanced by an earlier [ character. Consider this Markdown input:

   ````
   ```ab]cd
   test
   ```
   ````

   If such a parameter with unbalanced brackets were rendered naively, it would make the parser see the content of the parameter as being shorter or longer than it should be:

   ```
   [code=ab]cd]test[/code]
   ```

Both cases are narrow, but they can happen. It's really only with the [code] tag that they can happen, because [code] is the only tag used by the writer that both doesn't allow nesting and contains something other than a URL. The only other non-void tags that don't allow nesting, [img] and [url], both contain a URL, and URL percent escaping changes [ and ] into %5b and %5d, which avoids the problem.
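
Both conditions are cheap to detect. The following is a sketch with hypothetical function names; it assumes an exact, case-sensitive match on the end tag, which may be too strict or too lax depending on how the forum parser treats tag case.

```lua
-- Case 1: does `text` contain the end tag for `tag`?
-- The fourth argument `true` makes find do a plain substring search,
-- so "[" and "]" are not treated as pattern characters.
local function contains_end_tag(text, tag)
    return text:find("[/" .. tag .. "]", 1, true) ~= nil
end

-- Case 2: are the square brackets in `param` balanced?
local function brackets_balanced(param)
    local depth = 0
    for ch in param:gmatch("[%[%]]") do
        if ch == "[" then
            depth = depth + 1
        else
            depth = depth - 1
            -- A "]" with no earlier unmatched "[" would end the parameter early.
            if depth < 0 then return false end
        end
    end
    -- A leftover "[" would make the parameter run past its intended end.
    return depth == 0
end

print(contains_end_tag('local end_tag = "[/code]"', "code"))  -- true
print(brackets_balanced("ab]cd"))                             -- false
print(brackets_balanced("a[b]c"))                             -- true
```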

When one of these situations occurs, I have the writer exit with an error: case (1), case (2).

$ pandoc -f markdown -t tasvideos_forum.lua <<'EOF'
```lua
local end_tag = "[/code]"
```
EOF
Error running Lua:
tasvideos_forum.lua:829: cannot escape [/code] in "local end_tag = \"[/code]\""

$ pandoc -f markdown -t tasvideos_forum.lua <<'EOF'
```ab]cd
test
```
EOF
Error running Lua:
tasvideos_forum.lua:863: cannot escape param: "ab]cd"

Is it expected for a custom writer to behave like this, exiting with an error when it hits an impossible representation? Or should it rather attempt to soldier on no matter what, even if it would result in an output that a parser will misinterpret?

Language class in code blocks

When a fenced code block has a language class, the writer includes it as a parameter on the BBCode [code] tag:

```lua
print("hello world")
```

becomes

[code=lua]print("hello world")[/code]

The problem is when there is more than one class set on the code block:

``` {.lua .numberLines .customClass}
print("hello world")
```

All the classes end up together in the attr.classes of a pandoc.CodeBlock. The TASVideos forum markup allows at most one parameter on a code block, so which do we choose? What I'm doing is taking the first class that does not have a known special interpretation, like numberLines and sourceCode. Is there a better way?
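
In code, the current selection rule amounts to roughly the following (a sketch; the table lists only the two special classes named above, and the writer's actual list may be longer):

```lua
-- Classes that have a special meaning and are not language names.
local SPECIAL_CLASSES = {numberLines = true, sourceCode = true}

-- Return the first class without a known special interpretation, or nil.
local function pick_language(classes)
    for _, class in ipairs(classes) do
        if not SPECIAL_CLASSES[class] then
            return class
        end
    end
    return nil
end

print(pick_language({"lua", "numberLines", "customClass"}))  -- lua
print(pick_language({"numberLines", "customClass"}))         -- customClass
```

The second call shows the weakness of this rule: an arbitrary custom class can be mistaken for a language name.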

Embedding resources

I like the --embed-resources feature of Pandoc, and I wanted to support something similar to embed image data without an external link. But the --embed-resources command-line option is not exposed to custom writers. So I implemented it as a format-specific extension, embed_resources. Can you suggest an alternative way of controlling an option like this one?

$ echo '![](test.png)' | pandoc -f markdown -t tasvideos_forum.lua
[img]test.png[/img]

$ echo '![](test.png)' | pandoc -f markdown -t tasvideos_forum.lua+embed_resources
[img]data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAIAAACQkWg2AAAAFUlEQVR4AWP4f3EnSWgYaxjVMKoBAEtOiR9xMI4nAAAAAElFTkSuQmCC[/img]

TeX math inlines

The custom writer uses the hack for math inlines from #11399. If a math formula is simple enough, it can be rendered with BBCode syntax. Otherwise the TeX syntax is output in a [tt] or [code] tag (depending on whether the math is inline or display).

Simple math: $a^2 + b^2 = c^2$

Complicated math: $\frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$

The above converts to:

Simple math: [i]a[/i][sup]2[/sup] + [i]b[/i][sup]2[/sup] = [i]c[/i][sup]2[/sup]

Complicated math: [tt]$\frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$[/tt]

jgm · 2026-02-01T23:50:26Z

Maintainer

Impossible-to-represent inputs: since it's a custom writer, you can do as you like. But our approach in normal pandoc writers is not to raise an error, but to skip the content and emit a log message (SkippedContent SourcePos). I think that there is a function in the Lua API that will allow you to emit such a message.

Language class: in Text.Pandoc.Highlighting, we do this:

case msum (map (`lookupSyntax` syntaxmap) classes) of

which means that we take the first class that has a defined syntax. But I don't think this is exposed in the Lua API. You could just grab a list of supported languages from pandoc --list-highlight-languages.

Embedding resources: other ways of triggering options that would be accessible to the custom writer are environment variables and metadata fields.
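
A sketch of how both mechanisms could be checked from Lua. The environment variable name TASVIDEOS_EMBED_RESOURCES and the metadata field embed-resources are invented for this example, and the getenv parameter exists only so the function can be tried outside pandoc.

```lua
-- Decide whether to embed resources.
-- `meta` is the document metadata table (doc.meta in a custom writer);
-- `getenv` defaults to os.getenv.
local function want_embed(meta, getenv)
    getenv = getenv or os.getenv
    -- Hypothetical environment variable; when set, it takes priority.
    local env = getenv("TASVIDEOS_EMBED_RESOURCES")
    if env ~= nil and env ~= "" then
        return env ~= "0"
    end
    -- Hypothetical metadata field, e.g. set with `-M embed-resources=true`.
    -- Boolean metadata values arrive in Lua as plain booleans.
    local field = meta and meta["embed-resources"]
    return field == true
end

print(want_embed({["embed-resources"] = true}, function() return nil end))  -- true
print(want_embed({}, function() return nil end))                            -- false
```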


refashioned Feb 3, 2026
Author

> But our approach in normal pandoc writers is not to raise an error, but to skip the content and emit a log message (SkippedContent SourcePos). I think that there is a function in the Lua API that will allow you to emit such a message.

I found pandoc.log, and I'm using pandoc.log.warn for RawBlock and RawInline when the format is not recognized. I didn't find a way, from Lua, to do SkippedContent or other typed log messages.
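
For the impossible-to-represent cases, a warn-and-skip variant could look something like this. It is a sketch: render_code_or_skip is a hypothetical function, and the stderr fallback is there only so the snippet also runs outside pandoc, where the pandoc global is not defined.

```lua
-- Use pandoc.log.warn when running inside pandoc; otherwise write to stderr.
local warn = (pandoc and pandoc.log and pandoc.log.warn)
    or function(msg) io.stderr:write("[WARNING] " .. msg .. "\n") end

-- Render a code block, or skip it with a warning when its text
-- contains the [/code] end tag and therefore cannot be represented.
local function render_code_or_skip(text)
    if text:find("[/code]", 1, true) then
        warn("skipping code block that contains [/code]")
        return ""
    end
    return "[code]" .. text .. "[/code]"
end

print(render_code_or_skip('print("hello world")'))  -- [code]print("hello world")[/code]
```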

> we take the first class that has a defined syntax

That makes sense. Since the forum uses Prism for syntax highlighting, I suppose I should filter on the list of Prism-supported languages.
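
Something along these lines, where KNOWN_LANGUAGES is a tiny illustrative subset and would need to be filled in from Prism's actual list of supported languages:

```lua
-- Illustrative subset only; the real table would hold every language
-- identifier that the forum's highlighter accepts.
local KNOWN_LANGUAGES = {c = true, javascript = true, lua = true, python = true}

-- Return the first class that names a known language, or nil if none does.
local function first_known_language(classes)
    for _, class in ipairs(classes) do
        if KNOWN_LANGUAGES[class] then
            return class
        end
    end
    return nil
end

print(first_known_language({"numberLines", "customClass", "lua"}))  -- lua
print(first_known_language({"customClass"}))                        -- nil
```

When nothing matches, the writer could fall back to a plain [code] tag without a parameter.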

refashioned · 2026-02-03T20:31:13Z

Author

> Because the pattern :// doesn't appear in either part in its entirety, it didn't get escaped by either call. The second URL was output in a way that would, incorrectly, cause it to be autolinked:

Something similar may be possible with built-in writers, under some circumstances. Here is one I found. If you read an HTML input that has --- at the beginning of a line, the Markdown writer escapes it in a way that prevents it from being interpreted as a horizontal rule:

$ echo '<p>---</p>' | pandoc -f html -t markdown
\-\--
$ echo '<p>---</p>' | pandoc -f html -t markdown | pandoc -f markdown -t native
[ Para [ Str "---" ] ]

But with a certain arrangement of spans to break up the ---, it doesn't get escaped and it does become a Markdown horizontal rule:

$ echo '<p><span>-</span><span>-</span>-</p>' | pandoc -f html -t markdown
---
$ echo '<p><span>-</span><span>-</span>-</p>' | pandoc -f html -t markdown | pandoc -f markdown -t native
[ HorizontalRule ]

But I don't understand entirely what's going on, because in this case there seems to be some interference by the smart extension also potentially wanting to recognize ---. When I write to markdown-smart, I get a horizontal rule, even without any spans to break up the input:

$ echo '<p>---</p>' | pandoc -f html -t markdown-smart
---
$ echo '<p>---</p>' | pandoc -f html -t markdown-smart | pandoc -f markdown -t native
[ HorizontalRule ]
