Input data is parsed via nested chains of rules. There are 3 nested chains - core, block and inline:

    core
        core.rule1 (normalize)
        ...
        core.ruleX

        block
            block.rule1 (blockquote)
            ...
            block.ruleX

        core.ruleX1 (intermediate rule that applies on block tokens, nothing yet)
        ...
        core.ruleXX

        inline (applied to each block token with "inline" type)
            inline.rule1 (text)
            ...
            inline.ruleX

        core.ruleYY (applies to all tokens)
        ... (abbreviation, footnote, typographer, linkifier)
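To see the chains of a concrete parser, the rule names can be listed per chain. This is a minimal sketch, assuming markdown-it-py is installed and using its `get_all_rules()` helper; the exact rule names depend on the installed version.

```python
from markdown_it import MarkdownIt

md = MarkdownIt("commonmark")

# get_all_rules() maps each chain name to the ordered list of its rule names
all_rules = md.get_all_rules()
for chain, rule_names in all_rules.items():
    print(f"{chain}: {', '.join(rule_names)}")
```

The "commonmark" preset only enables some of these rules; `md.get_active_rules()` returns the enabled subset in the same shape.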
The result of the parsing is a list of tokens that will be passed to the
renderer to generate the HTML content.
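As a small illustration, the token list can be inspected by calling the parser without the renderer. The input string is an arbitrary example.

```python
from markdown_it import MarkdownIt

md = MarkdownIt("commonmark")

# parse() runs the rule chains and returns the flat token list, without rendering
tokens = md.parse("# Title\n\nSome *text*")
token_types = [token.type for token in tokens]
print(token_types)
```

Block containers show up as paired `*_open` / `*_close` tokens, with an `inline` token between them holding the text content.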
These tokens can be themselves parsed again to generate more tokens (ex: a
list token can be divided into multiple
An env sandbox can be used alongside tokens to inject external variables for your parsers and renderers.
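A minimal sketch of passing an external variable through env to a renderer rule. The `asset_base` key and the `render_image_with_base` function are hypothetical names chosen for this example, not part of the library.

```python
from markdown_it import MarkdownIt

def render_image_with_base(self, tokens, idx, options, env):
    token = tokens[idx]
    # "asset_base" is a hypothetical key supplied by the caller through env
    base = env.get("asset_base", "")
    token.attrSet("src", base + str(token.attrGet("src")))
    # delegate the actual HTML generation to the default image rule
    return self.image(tokens, idx, options, env)

md = MarkdownIt("commonmark")
md.add_render_rule("image", render_image_with_base)

# the second argument of render() is the env mapping shared by parser and renderer
html = md.render("![logo](logo.png)", {"asset_base": "https://cdn.example/"})
print(html)
```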
Each chain (core / block / inline) uses an independent
state object when parsing data, so that each parsing operation is independent and can be disabled on the fly.
Instead of a traditional AST, we use a lower-level data representation: tokens. The difference is simple:
See token class for details about each token content.
In total, a token stream is:
Each inline token has a .children property with a nested token stream for inline content:
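The nested stream can be inspected directly. A short sketch with an arbitrary input string:

```python
from markdown_it import MarkdownIt

md = MarkdownIt("commonmark")
tokens = md.parse("Some *emphasised* text")

# tokens[0] is paragraph_open, tokens[1] is the inline token, tokens[2] is paragraph_close
inline_token = tokens[1]
child_types = [child.type for child in inline_token.children]
print(inline_token.type, child_types)
```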
Why not an AST? Because it's not needed for our tasks. We follow the KISS principle. If you wish, you can call a parser without a renderer and convert the token stream to an AST.
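One possible way to do that conversion is the `SyntaxTreeNode` helper shipped in `markdown_it.tree`. The sketch below assumes a markdown-it-py version that includes this module; `show` is a hypothetical helper written for this example.

```python
from markdown_it import MarkdownIt
from markdown_it.tree import SyntaxTreeNode

md = MarkdownIt("commonmark")
tokens = md.parse("# Title\n\nSome *text*")

# Paired *_open / *_close tokens are collapsed into single nodes with children
root = SyntaxTreeNode(tokens)

def show(node, depth=0):
    """Print the node types as an indented tree (hypothetical helper)."""
    print("  " * depth + node.type)
    for child in node.children:
        show(child, depth + 1)

show(root)
node_types = [child.type for child in root.children]
```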
More details about tokens:
Rules are functions that do “magic” with parser
state objects. A rule is associated with one or more chains and is unique. For instance, a
blockquote token is associated with
Note that some rules have a
validation mode. In this mode, rules do not modify the token stream and only look ahead for the end of a token. This reflects an important design principle: a token stream is “write only” during the block and inline parse stages.
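To make the validation mode concrete, here is a sketch of a small inline rule that honours it. The `mention` token type, the `@name` syntax, and both function names are hypothetical and invented for this example; only the `(state, silent)` rule shape and the ruler/renderer calls come from the library.

```python
import re

from markdown_it import MarkdownIt
from markdown_it.common.utils import escapeHtml

MENTION_RE = re.compile(r"@(\w+)")

def mention_rule(state, silent):
    # Try to match "@name" at the current position, never past state.posMax
    match = MENTION_RE.match(state.src, state.pos, state.posMax)
    if not match:
        return False
    if not silent:
        # Normal mode: append a token to the stream
        token = state.push("mention", "", 0)
        token.content = match.group(1)
    # In both modes, report where the construct ends
    state.pos = match.end()
    return True

def render_mention(self, tokens, idx, options, env):
    return '<span class="mention">@' + escapeHtml(tokens[idx].content) + "</span>"

md = MarkdownIt("commonmark")
md.inline.ruler.before("text", "mention", mention_rule)
md.add_render_rule("mention", render_mention)

html = md.render("hi @bob")
print(html)
```

When `silent` is true the rule only advances `state.pos`, so the caller learns where the construct ends while the token stream stays untouched.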
Parsers are designed to keep rules independent of each other. You can safely enable or disable them, or add new ones. There are no universal recipes for creating new rules: designing distributed state machines with good data isolation is a tricky business. But you can study existing rules and plugins to see possible approaches.
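For example, a single rule can be switched off and on again without affecting the others. A minimal sketch using the `emphasis` rule:

```python
from markdown_it import MarkdownIt

md = MarkdownIt("commonmark")
html_default = md.render("*hello*")

# Disable the emphasis rule: asterisks are now left as plain text
md.disable("emphasis")
html_disabled = md.render("*hello*")

# Re-enable it to restore the original behaviour
md.enable("emphasis")
html_reenabled = md.render("*hello*")

print(html_default, html_disabled, html_reenabled, sep="")
```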
In complex cases you can also ask for help in the tracker. The condition is very simple: it should be clear from your ticket that you have studied the docs and sources, and tried to do something yourself. We never refuse to help real developers.
After the token stream is generated, it's passed to a renderer. The renderer then plays all the tokens, passing each one to a rule with the same name as the token type.
Renderer rules are located in
md.renderer.rules[name] and are simple functions with the same signature:

    def function(renderer, tokens, idx, options, env):
        return htmlResult

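As a quick check, the sketch below lists the token types that have a dedicated rule in the default HTML renderer. The exact set may vary between versions.

```python
from markdown_it import MarkdownIt

md = MarkdownIt("commonmark")

# md.renderer.rules maps a token type to the function that renders it
rule_names = sorted(md.renderer.rules)
print(rule_names)
```

Token types without an entry here (for instance `paragraph_open`) are handled by the renderer's generic `renderToken` method.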
In many cases, overriding a renderer rule allows you to change the output easily, without touching the parser. For example, let's replace images that point to Vimeo links with the player's iframe:

    import re

    from markdown_it import MarkdownIt

    # Match Vimeo video URLs; group 2 captures the numeric video id.
    vimeoRE = re.compile(r'^https?://(www\.)?vimeo\.com/(\d+)($|/)')

    def render_vimeo(self, tokens, idx, options, env):
        token = tokens[idx]
        match = vimeoRE.match(token.attrGet('src') or '')
        if match:
            ident = match.group(2)
            return ('<div class="embed-responsive embed-responsive-16by9">\n' +
                    '  <iframe class="embed-responsive-item" src="//player.vimeo.com/video/' +
                    ident + '"></iframe>\n' +
                    '</div>\n')
        # Not a Vimeo URL: fall back to the default image renderer.
        return self.image(tokens, idx, options, env)

    md = MarkdownIt("commonmark")
    md.add_render_rule("image", render_vimeo)
    print(md.render("![](https://www.vimeo.com/123)"))

Here is another example, showing how to add
target="_blank" to all links:

    from markdown_it import MarkdownIt

    def render_blank_link(self, tokens, idx, options, env):
        # attrSet adds the attribute, or replaces its value if it already exists
        tokens[idx].attrSet('target', '_blank')
        # pass token to default renderer.
        return self.renderToken(tokens, idx, options, env)

    md = MarkdownIt("commonmark")
    md.add_render_rule("link_open", render_blank_link)
    print(md.render("[a]\n\n[a]: b"))

Note that if you only need to add attributes, you can do so without overriding the renderer. For example, you can update tokens in the
core chain. That is slower than a direct renderer override, but can be simpler.
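A sketch of the core-chain approach, assuming the rule is appended to the end of the core chain so that inline tokens already have their children. The rule name `add_target_blank` is hypothetical.

```python
from markdown_it import MarkdownIt

def add_target_blank(state):
    # Core rules receive the core state and can modify state.tokens in place
    for token in state.tokens:
        if token.type != "inline":
            continue
        for child in token.children or []:
            if child.type == "link_open":
                child.attrSet("target", "_blank")

md = MarkdownIt("commonmark")
# push() appends the rule to the end of the core chain, after inline parsing
md.core.ruler.push("add_target_blank", add_target_blank)

html = md.render("[a](b)")
print(html)
```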
You can also write your own renderer to generate formats other than HTML, such as JSON or XML. You can even use it to generate an AST.
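A minimal sketch of such a renderer, which dumps the token stream as JSON. `JSONRenderer` is a hypothetical class written for this example; it assumes the `renderer_cls` argument of `MarkdownIt` and the `Token.as_dict()` method.

```python
import json

from markdown_it import MarkdownIt

class JSONRenderer:
    """Hypothetical renderer that serialises the token stream as JSON."""

    # Output format name, checked by add_render_rule()
    __output__ = "json"

    def __init__(self, parser=None):
        self.rules = {}

    def render(self, tokens, options, env):
        # as_dict() also converts nested children to plain dictionaries
        return json.dumps([token.as_dict() for token in tokens], indent=2)

md = MarkdownIt("commonmark", renderer_cls=JSONRenderer)
output = md.render("# Title")
parsed = json.loads(output)
print(output)
```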
This was mentioned in Data flow, but let's repeat the sequence again:
You can also change the output directly in the renderer for many simple cases.