Tab Completion

I'm Tab Atkins Jr, and I wear many hats. I work for Google on the Chrome browser as a Web Standards Hacker. I'm also a member of the CSS Working Group, and a member of or contributor to several other working groups in the W3C. You can contact me here.

A Standards-Compliant CSS Parser

Let's open with the pitch:

Do you want to help the CSSWG, but are better at JS than CSS? You're in luck!

I'm writing the CSS Syntax module, and as part of this, I needed to implement the parsing algorithm in JavaScript (so I can actually test things). I've now done so, and hosted it on GitHub.

I could use your help! I think I've gotten things right, but I'm not certain. I, and the CSSWG, would greatly appreciate you playing around with it and helping to suss out any bugs. If it does anything you don't expect, let me know! Report the bug on GitHub or Twitter or shoot me an email.

I'm also not very up on the modern best practices for modularizing JS, so if someone wants to fix up the code and send me a pull request, I'd be grateful.

Why Do This?

I'm writing the Syntax module because CSS's grammar isn't good enough.

For starters, grammars are hard to read when they get non-trivial. They get extra hard when they try to be "total" - that is, attempt to match every possible input. CSS property grammars are usually okay, because "you don't match, gtfo" is a valid and useful answer (it means you drop the property and soldier on). This doesn't work for stylesheets, though - even if the stylesheet is invalid, you need to produce a stylesheet object somehow. So, the CSS Core Grammar tries to match anything and then do error-handling on that, and it gets confusing to read and understand.
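To make the "drop the property and soldier on" behavior concrete, here's a minimal sketch. It is not taken from the parser on GitHub; the `validators` map, its two toy property grammars, and the `applyDeclarations` function are all hypothetical names made up for illustration.

```javascript
// Hypothetical per-property "grammars": each one either matches a value or doesn't.
// These are toy patterns, far looser and smaller than the real property grammars.
const validators = new Map([
  ["color", (v) => /^(red|green|blue|#[0-9a-f]{3}|#[0-9a-f]{6})$/i.test(v)],
  ["width", (v) => /^(auto|\d+(\.\d+)?(px|em|%))$/.test(v)],
]);

// Build a style object from a list of { name, value } declarations.
// A declaration that fails its grammar (or has no grammar) is simply dropped.
function applyDeclarations(declarations) {
  const style = {};
  for (const { name, value } of declarations) {
    const isValid = validators.get(name);
    if (isValid !== undefined && isValid(value)) {
      style[name] = value;
    }
    // Otherwise: no match, so skip this declaration and keep going.
  }
  return style;
}

const result = applyDeclarations([
  { name: "color", value: "red" },
  { name: "width", value: "banana" }, // invalid value: dropped
  { name: "wdith", value: "10px" },   // unknown property: dropped
  { name: "width", value: "10px" },
]);
console.log(result);
```

Rejecting a single declaration is harmless because the rest of the list is unaffected. There's no equivalent move at the stylesheet level: you can't just drop the whole stylesheet.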

Worse, it fails at that! There are still plenty of possible documents you can feed to the CSS parser that don't match the Grammar, and so their handling is totally undefined. Browsers obviously keep going and do some kind of error-handling, but the spec doesn't explain how they're supposed to do this.

So, the Syntax spec rewrites the whole Grammar as an explicit state-machine Parser instead. This doesn't really make it easier to read, but it does make it easier to think about, edit, and implement. It also makes the handling of invalid stuff a lot simpler, which is a big win.
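Here's a toy illustration of why a state machine makes invalid input easier to handle. This is not the algorithm from the Syntax spec and not code from the GitHub repo; the `tokenize` function, its three states, and its token types are assumptions invented for this sketch. The point is only that every state says what to do with every possible character, including end-of-input, so there's no such thing as an input that "doesn't match".

```javascript
// Toy state-machine tokenizer. The states ("data", "ident", "string") and the
// token types are made up for this sketch; the real spec has many more of both.
function tokenize(input) {
  const tokens = [];
  let state = "data";
  let buffer = "";
  let i = 0;

  // Run one step past the last character so every state gets to see end-of-input.
  while (i <= input.length) {
    const ch = i < input.length ? input[i] : undefined; // undefined means EOF

    switch (state) {
      case "data":
        if (ch === undefined) {
          i++; // nothing pending, finish
        } else if (/\s/.test(ch)) {
          i++; // skip whitespace
        } else if (/[A-Za-z_-]/.test(ch)) {
          state = "ident"; // don't advance: reconsume ch in the ident state
        } else if (ch === '"') {
          state = "string";
          i++;
        } else {
          tokens.push({ type: "delim", value: ch });
          i++;
        }
        break;

      case "ident":
        if (ch !== undefined && /[A-Za-z0-9_-]/.test(ch)) {
          buffer += ch;
          i++;
        } else {
          tokens.push({ type: "ident", value: buffer });
          buffer = "";
          state = "data"; // reconsume ch (or EOF) in the data state
        }
        break;

      case "string":
        if (ch === undefined || ch === "\n") {
          // Error handling is just another transition: emit a bad-string token.
          tokens.push({ type: "bad-string", value: buffer });
          buffer = "";
          state = "data";
          if (ch === undefined) i++; // a newline is reconsumed as whitespace
        } else if (ch === '"') {
          tokens.push({ type: "string", value: buffer });
          buffer = "";
          state = "data";
          i++;
        } else {
          buffer += ch;
          i++;
        }
        break;
    }
  }
  return tokens;
}

// An unterminated string still produces a well-defined token stream.
console.log(tokenize('color: "red'));
```

In a grammar, the unterminated string would need its own production (or the whole input would fail to match). Here it's one extra branch in the state that was already reading the string.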

A caveat: my JS implementation is not meant to be fast or even particularly well-written. Instead, it's meant to match the spec as closely as possible, so I can easily translate between the two, and fix bugs in both at the same time.

That said, it might still be useful for other people. At minimum, it will end up being guaranteed standards-compliant, because it's what I'll use to test the standard, so it can serve as a reference for testing other parsers that are intended to be faster and better. If speed doesn't matter overly much, though, it can also be useful on its own, as a really complete parser.
