Tab Completion

I'm Tab Atkins Jr, and I wear many hats. I work for Google on the Chrome browser as a Web Standards Hacker. I'm also a member of the CSS Working Group, and am either a member or contributor to several other working groups in the W3C.

Depixelating Pixel Art


For my future use, I distilled the algorithm from the "Depixelizing Pixel Art" paper (Kopf and Lischinski, SIGGRAPH 2011) into its component steps, minus the explanations and figures.

I've also copied this over into a GitHub project, which will host the code when I write it.

General Algorithm

  1. Construct a graph where each pixel is a node, connected to its 8 neighbors.

  2. Remove all edges between nodes with "dissimilar" colors. (In YUV, colors are dissimilar if dY>48, dU>7, or dV>6; a sketch of this check appears after the list.)

  3. Examine all four-squares (2×2 blocks) of nodes:

    1. If all four are fully connected, remove the diagonals. (It's a block of the same color.)
    2. If they're only connected via the diagonals, use feature detection to tell which diagonal to keep:
      1. Compare the length of the curves (runs of valence-2 nodes) that each diagonal is a part of. Cast a vote in favor of the longer one, with weight equal to the difference in length.
      2. Compare the size of the connected components each diagonal is a part of, within an 8x8 window. Cast a vote in favor of the smaller component, with weight equal to the difference in size.
      3. If either diagonal connects directly to a lone pixel (valence-1 node), cast a vote in favor of it, with weight 5 (empirically determined).
  4. You now have a fully planar set of disconnected graphs.

  5. Chop each connection in half, associating each half with its closer endpoint. Construct a Voronoi diagram from these points + half-connections. Quantize the Voronoi diagram to a quarter-pixel grid.

  6. Simplify the Voronoi diagram by collapsing valence-2 nodes into lines that connect their higher/lower-connected endpoints.

  7. You now have the basic depixelated shape.

  8. Walk the graph of cell points (nodes). Ignoring junctions separating similar (connected) cells, find all runs of valence-2 cell points and turn them into quadratic B-splines, with control points set to the nodes.

  9. Simplify all 3-spline junctions.

    1. If one of the splines is a "shading spline" (separating two areas with YUV distance <= 100) and the other two are "contour splines" (YUV distance > 100), connect the two contour splines into one.
    2. Otherwise, find the two splines which connect with an angle closest to 180deg, and join them together.
    3. Keep in mind that the end-point of the non-connected spline of each triad will need to be adjusted to lie exactly on the curve defined by the connected splines.
  10. You now have a relatively smooth depixelated shape.

  11. Detect "corners" that shouldn't be smoothed away. Corners are all runs of 4-5 points that are related by the following 5 relations (plus rotations/mirrors):

    1. (0,0) (.25, .75) (.75, .75) (1, 0)
    2. (-.25, .25) (.25, .75) (.75, .75) (1, 0)
    3. (-.25, .25) (.25, .75) (.75, .75) (1.25, .25)
    4. (0,0) (0,1) (1,1) (1,0)
    5. (.75, -.25) (.25, .25) (.25, .75) (.75, .75) (1.25, .25)
  12. Optimize all remaining nodes by minimizing an energy function over them. Each node contributes the sum of:

    1. Curvature. Integrate the curvature function over the segment influenced by the node. Can use simple numerical sampling here.
    2. Position deviation. Take the fourth power (empirically determined) of the distance between its current location and its original location.

The energy function is non-linear but smooth, so you can do local modifications by randomly selecting nodes, randomly jiggling them to find a lower-energy state, and repeating until satisfied. The paper doesn't specify how many iterations they used. (A rough sketch of this loop appears after the list.)

  13. Rejigger the "corner" nodes to match up better with the energy-minimized nodes and minimize cell-shape distortion. The paper is unclear here: use harmonic maps; it's solving a simple sparse linear system; [Hormann 2001] has a full explanation.
  14. Remember to rejigger the endpoint of the non-connected spline in each triad connection!
  15. Render the graph accordingly. [Nehab and Hoppe 2008] seems to have each reshaped cell diffuse its color from its centroid, constrained by the spline boundaries. It's simpler if you assume that all color differences are significant; you get more contours, but they're all solid-color and look pretty good. (And can actually be done in SVG, which doesn't have diffusion-based paint servers yet.)

DONE.
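
As referenced in step 2, here's a minimal Python sketch of the similarity-graph construction from steps 1-2. It's my own illustration, not code from the paper; it assumes the input is a 2D list of (Y, U, V) tuples and uses the thresholds quoted above.

# Minimal sketch of steps 1-2 (my own illustration, not the paper's code).
# `pixels` is assumed to be a 2D list of (Y, U, V) tuples, one per pixel.

def similar(c1, c2):
    """Colors are "similar" unless they differ by more than dY=48, dU=7, or dV=6."""
    dy, du, dv = (abs(a - b) for a, b in zip(c1, c2))
    return dy <= 48 and du <= 7 and dv <= 6

def build_similarity_graph(pixels):
    """Connect each pixel to its 8 neighbors, keeping only edges between similar colors."""
    height, width = len(pixels), len(pixels[0])
    edges = set()
    for y in range(height):
        for x in range(width):
            for oy in (-1, 0, 1):
                for ox in (-1, 0, 1):
                    ny, nx = y + oy, x + ox
                    if (oy, ox) == (0, 0) or not (0 <= ny < height and 0 <= nx < width):
                        continue
                    if similar(pixels[y][x], pixels[ny][nx]):
                        edges.add(frozenset({(y, x), (ny, nx)}))
    return edges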
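
And here's a rough sketch of the random-jiggle relaxation described under step 12. The energy() callable is just a stand-in for the curvature + position-deviation terms, and the step size and iteration count are my own guesses, since the paper doesn't give them.

import random

def relax(nodes, energy, iterations=10000, step=0.05):
    """nodes: dict of id -> (x, y); energy(nodes, node_id) -> that node's energy contribution."""
    for _ in range(iterations):
        node_id = random.choice(list(nodes))
        x, y = nodes[node_id]
        before = energy(nodes, node_id)
        # Randomly jiggle the node, and keep the move only if it lowers the energy.
        nodes[node_id] = (x + random.uniform(-step, step), y + random.uniform(-step, step))
        if energy(nodes, node_id) > before:
            nodes[node_id] = (x, y)  # revert
    return nodes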

Future work

  1. Detect dithering patterns used in color-constrained graphics, and consider them part of a single solid-color region.
  2. Detect "corners" (junctions where long straight lines meet) and increase the multiplicity of the knot vector to get a sharper corner.

Refs

NEHAB, D., AND HOPPE, H. 2008. Random-access rendering of general vector graphics. ACM Trans. Graph. 27, 5, 135:1–135:10.

HORMANN, K. 2001. Theory and Applications of Parameterizing Triangulations. PhD thesis, University of Erlangen.

Switching Your GitHub Repo To Use Only The gh-pages Branch


(This is a simplified version of the advice on Oli Studholme's blog, cut down to just the parts that I've actually needed. I do this often enough, and always forget the steps, so having it put down here is useful for me, and maybe for someone else.)

So you want to serve your repo from the http://user.github.io/repo URL, and don't have any reason to separate a "code" branch from a "serve" branch. In other words, you want your GitHub repo to only use the gh-pages branch, not master. Here's an easy guide to making that happen.

First, if you already have a gh-pages branch, make sure it's synced up to master, showing exactly what you want. If you don't, type:

git checkout -b gh-pages
git push origin gh-pages

Second, go to GitHub, settings for your repo, and switch “Default Branch” to gh-pages.

Third, delete the local master branch with git branch -d master.

Finally, delete the remote master branch with git push origin :master.

Done. You now have only the gh-pages branch, and it's the default one that git will use when talking with GitHub.

A Quick Primer on P and NP


(This is a quick gloss over the problem, intended to give someone a good idea of what's going on without bogging down too much in details. Pretty sure I've gotten it right, but if I'm slightly off about something, don't worry about it.)

So, a lot of problems sit in the P or NP complexity classes. These classify a problem based on how quickly the time-to-solve increases as the size of the problem increases. Adding two numbers is linear - double the size of the numbers, you double the running time. Multiplying two numbers is (roughly) square - double the size of the numbers, you quadruple the running time. Bruteforcing a password is exponential - double the size of the password, and you multiply the running time by another factor of ~60^n (where n is the original size).

Linear and square are both "polynomial", or in P, because the relationship between problem size and running time can be written as a polynomial equation where the problem size is the base and the exponent is a fixed number. Adding is "size^1 = time", multiplying is "size^2 = time". Password cracking is not polynomial, because its equation is "60^size = time" - the size is in the exponent, not the base.

NP doesn't quite stand for "non-polynomial", but it's often described as such, and that's pretty close to what it means. There's an extra wrinkle, though. NP actually stands for "non-deterministic polynomial", which means that if you had a magic 8-ball that you could ask yes/no questions of and it always gave the right answer, you could solve the problem in polynomial time. Bruteforcing passwords is like that - if you can just ask the oracle "is the first letter A? is it B?" etc, you can figure it out in linear time (roughly 60*N, rather than 60^N). A lot of exponential problems are like this - there's not actually that much work that goes into solving it, it's just that you don't know where to do the work, so you have to try every possible combination. The oracle fixes that.

An alternate definition of NP is that a proposed solution can be verified in polynomial time, even if finding one in the first place might take much longer. Again, passwords are like this - if someone claims they know the password, you can tell if they're right by just checking it against the real password, which is linear time (checking each letter).
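
Here's a toy Python sketch of that asymmetry - my own example, not part of the original post. Verifying a claimed password is linear in its length, while brute-forcing one means trying every combination, which is exponential.

import itertools
import string

ALPHABET = string.ascii_letters + string.digits  # roughly 60 symbols, as in the examples above

def verify(claimed, actual):
    """Polynomial (linear) time: just compare letter by letter."""
    return len(claimed) == len(actual) and all(a == b for a, b in zip(claimed, actual))

def brute_force(actual):
    """Exponential time: ~60^n candidates for a length-n password."""
    for candidate in itertools.product(ALPHABET, repeat=len(actual)):
        guess = "".join(candidate)
        if verify(guess, actual):
            return guess

print(brute_force("ab"))  # fine for length 2; hopeless for length 20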

The definition of NP technically covers everything in P as well. There's another category, NP-complete, which means "the hard part of NP". Most of the time, when people say "NP" they actually mean "NP-complete".

Quick diversion - it's still an open question whether P and NP are equal to each other. It might be that, with very clever algorithms, password cracking can be done in polynomial time. Most computer scientists don't think so, but we haven't yet been able to prove that it can't be done. (We can prove that some complexity classes are definitely different, but no one has managed that for P and NP yet.)

So, we don't yet know whether the NP-complete category is part of P (if P=NP) or is a hard chunk separate from P (if P!=NP). We do know, though, that some problems are definitely NP-complete - they're the "hardest possible" problems in NP (though, again, they might still turn out to be in P as well). We started out with 21 examples of NP-complete problems, and now we mostly add new ones by proving that you can transform the new problem into a known NP-complete one and vice-versa (in polynomial time).

The NP-complete category is useful because it's big and connected, so that if anyone can ever prove that any single NP-complete problem is in P, they prove that they're all in P automatically.

Brad Wardell Is A Douchebag, And I'm Never Buying Anything From StarDock Again


Heya everyone! I'm pretty insulting in this blog post, but that's as far as it goes - just bad names, nothing threatening. If you threaten anyone, or even worse, threaten their family, regardless of what side you think they're on or how bad you think they are, you are literal human trash and don't deserve to be on this planet. Don't do it.

I rarely participate in boycotts. They're terrible coordination problems, and almost never accomplish anything. But every once in a while something is so heinous that I can't morally support a company with my dollars anyway. Today, that company is StarDock.

I love StarDock's games - I've sunk several hundred hours into Galactic Civilizations 2, and was eagerly anticipating doing the same for their new Gal Civ 3. But today I saw several tweets from their CEO, and was blown away by how much of a sexist fucking jackass he is.

It started with an offensive comic drawn about Zoe Quinn, repeating the lie-that-won't-die about her sleeping with game reviewers to get them to give her games good review scores:

(The artist took down their comic, so I'll honor their intent and not show it either.)

Sometime a little later, the comic artist mentioned how much they loved doing comics for GG, which prompted Brad Wardell, StarDock CEO, to respond by encouraging them to apply to StarDock as an artist.

[Screenshot: Brad Wardell inviting the artist to apply]

This is some bullshit by itself - the person's a half-decent artist, but they're being explicitly courted due to their contribution to a harassment campaign.

But it got really bad afterwards. Zoe complained on Twitter about this behavior, and Brad went off the fucking deep end calling her a "liar" and explaining why she is so "reviled".

[Screenshot: Brad Wardell being a dick]

[Screenshot: More of Brad Wardell being a dick]

This seemed astonishing to me - a highly visible gaming CEO, harassing a woman and lying about his own actions? So I checked the rest of his Twitter feed, and lo and behold, he's a GamerGater and a complete dick.

Wow. I can't support someone who's that much of a jerk, willing to harass and support the harassment of innocent people, and then lie and misrepresent his own actions. I really wanted to play more of his company's games, but I've already got plenty of backlog to burn through, and I don't need to throw money at that trash. I'm boycotting StarDock for the foreseeable future.

This is almost literally the tiniest sacrifice I can make, costing me almost nothing at all, but every little bit of support helps for those people dealing with this harassment every day. I support Zoe Quinn and all like her who are being abused by the misogynistic hate group called GamerGate, and I encourage all of you to do so too.


Updated Dec 21 to remove comic, fix some pronouns, and add a disclaimer, after having a polite conversation with Brad.

Why We Can't Have Nice Things: Metric Time


I was recently chatting with my friend Brenna O'Brien about datetime functions, and she was sad about how complicated time was. We joked a bit about metric vs imperial time (just use kiloseconds, it's about 15 minutes!), which reminded me that I'd thought about this in the past.

Unfortunately, metric time can't ever happen. Even if we relax our requirements to just be uniform time (define "uniform" however you like) that's easy to work with at both large and small timescales, we're still screwed.

The fundamental issue is that each metric quantity has a single notion of what the "significant" unit is. It's often arbitrary, but it's important that it doesn't vary. For example, all temperature is measured in kelvins, and using kelvins (or a metric-prefixed multiple) is valid at all temperatures.

Time is different. We have two or three "significant" units for time, and extending any of the units too far, while useful in some narrow contexts, is worthless for most purposes.

Our notion of "time" is fundamentally built on the motion of our planet in spaaaaaace. We have two relevant timescales: the "day", and the "year". Both of these are extremely important: our body has a bunch of cycles based around the "day" (actually, around a 26-hour period, which is corrected every day by sunlight levels; the body is weird), while the world around us has cycles based around the "year".

Say we tried to rely solely on days, throwing around terms like "kiloday" and the like. If we kept the same zero point to our numbering (the start of 1AD), then today would be day 735,569 (at least according to this website; I haven't tried to verify its exact values). We'd probably refer to this as day 569 of kiloday 735. However, there's no way to know what the weather is like on this day. The kiloday has no relation to our solar orbit, lasting about 2¾ years, so day 569 of one kiloday is completely different from day 569 of another kiloday.

We run into similar problems if we try to rely on the year as our base unit. Milliyears sound half-reasonable, as they're about a third of a day (8.77 hours). It's that "about" that kills you. Using the conversion ratio of 3 milliyears = 1 day, after just 5 "days" (15 milliyears) you're already a half-day off - if you started counting at midnight, you end at nearly noon. So telling someone that something is happening at 100my this year doesn't give them any information about what time of day it is, which is usually fairly important.

(Earlier I said there were "two or three" significant units. The third is the second, or some multiple of it. Days and years are both based on the movement of the earth, and they drift over time - the day gets longer due to drag from the moon, and the year appears to get shorter as fewer of the longer days fit in one orbit. We need a timekeeping unit that doesn't vary over time as well, and the metric second does a good enough job at that.)

So, we're screwed. Too many cycles in the natural world are based around one or both of the two "significant" durations, and those two durations aren't tied to each other in any way. So we can't have metric dates. ;_;

What Can We Have?

We can approach this question in two ways.

One is that when we move to spaaaaaace, we'll no longer have the year as a significant marker. Growing plants and whatnot will happen constantly in hydroponics and similar, so there are no significant cycles based around a "year". We could switch to a day-based system at that point, as our talk about kilodays would no longer conflict with any other important information. I'd be 10.6 kilodays old, for example.

(There'd be only a third as many birthdays then. Maybe we could celebrate half-birthdays more commonly?)

The length of the day isn't sacred anymore either, for that matter. Our body cycles are still important, but perhaps we'd switch over to a "day" closer to our body's actual cycle, about 26 hours.

Since the day isn't tied to the spinning of a rock either, we can define it precisely, so it doesn't change over time. This allows us to replace the "second" as the stable time unit as well, with metric fractional days. The milliday is a fairly useful duration, covering about a minute and a half. Microdays would be a little under a tenth of a second, which is still about the right duration to be useful.

In fact, if we lengthen the day to about 27¾ hours (within the reasonable range for our body cycles), we can make 10 microdays exactly 1 second, allowing us to treat "second" as one of those weird names science has for certain units, like how "angstrom" is just an alternate name for a tenth of a nanometer. Bam, two reference points unified together, and all we need are spaaaaaace colonies!
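
The arithmetic behind that "27¾ hours" (my own back-of-the-envelope check, not from the original post): if 10 microdays are to equal exactly 1 second, the day has to be 100,000 SI seconds long.

seconds_per_microday = 0.1                  # we want 10 microdays == 1 second
day_in_seconds = seconds_per_microday * 1_000_000
print(day_in_seconds / 3600)                # 27.77... hours, i.e. roughly 27¾ hours
print(day_in_seconds / 1000)                # 1 milliday = 100 seconds on this lengthened day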

Okay, But Forget Spaaaaaace

If we're stuck on Earth, we remain screwed with three irreconcilable bases. But we can at least get something a little more sensible.

Right now we group together days into "weeks" and "months", to get some useful durations less than a year. Unfortunately, neither of them ties in cleanly either - a week is 7 days, and months are kinda sorta a twelfth of a year, except rounded to the nearest day and also stupid.

We can make this work better. My friend Tantek Çelik wrote up something he called NewCalendar some time ago. NewCalendar maintains the 12-month system, and even maintains dates really well - every date in NewCalendar is at most 4 days off from the corresponding date in our current calendar. (It's worst in March/April, due to February.)

The core of this system is making each month exactly 30 days, divided into six 5-day weeks. Obviously, there are 5-6 days left over, depending on the year: these are distributed between the months, one every two months, going to February, April, June, August, October, and, on leap years, December. They're called Sundays, just for fun. The Sundays can be thought of as the 31st day of every second month, making the months alternate between 30 and 31 days, except that they don't throw off the weekday numbering; each Sunday sits outside the normal week schedule.
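
Here's a tiny Python sketch of how I read those month lengths - my own code, not Tantek's. Every month gets 30 days, plus a Sunday as a 31st day on February, April, June, August, and October, and on December too in leap years.

def newcal_month_lengths(leap=False):
    lengths = [30] * 12
    for extra in (1, 3, 5, 7, 9):   # February, April, June, August, October (0-indexed)
        lengths[extra] = 31
    if leap:
        lengths[11] = 31            # leap years hang the sixth Sunday off December
    return lengths                  # sums to 365, or 366 in leap years

def newcal_date(day_of_year, leap=False):
    """Convert a 1-based ordinal day of the year into a (month, day) pair in NewCal."""
    for month, length in enumerate(newcal_month_lengths(leap), start=1):
        if day_of_year <= length:
            return (month, day_of_year)
        day_of_year -= length
    raise ValueError("day_of_year out of range")

print(newcal_date(61))  # (2, 31): the Sunday tacked onto the end of February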

NewCal is meticulously thought out, with his reasoning laid out plainly on the page linked above. For example, the five-day week works much better with our decimal numbering system, as we can mentally multiply or divide by 5 much faster than we can by 7. (Quick, if today is Tuesday, what day will it be 40 days from now? In NewCal, the answer is obviously "Tuesday".) Same with six weeks per month: six is also easy to work with, it makes things that alternate weeks always land on the same day of the month, it gives us 30-day months, which is another number we seem to like, and it maintains the legacy 12 months per year.

I also like his reasoning for assigning the day names:

  • New Monday - mon resembles the prefix "mono-" meaning one, providing an easy mnemonic for the first day of the week.
  • New Tuesday - sounds like "twos-day", again providing an easy mnemonic for the second day of the week
  • New Wednesday - keeps the traditional naming of the middle of week, which in some languages is the actual name of the day (e.g. German - Mittwoch)
  • New Friday - people like Fridays, and the prefix "Fri" has similar consonant sounds to the word "four", another mnemonic for remembering the fourth day of the week
  • New Saturday - people like Saturdays.

(Though for serious, the third day should be Thursday, due to the close resemblance to "third", like the reasoning he uses for Friday and "fourth".)

The only thing I disagree with is his placement of Sundays, and even then I'm torn. His reasoning tying them to major Western holidays is cute, and even without that, spreading them out in the year is reasonable - it keeps business quarters almost even, for example.

On the other hand, it messes up the week math; you can't tell whether "Monday next week" is 5 or 6 days from now without knowing which Monday it is, because there might be a Sunday in that interval (and the same goes for months). An alternative would be to place all the extra days in a bonus week at the end of the year, after December. Leap years would add the bonus day after that week. This lets us have a more consistent "next week/month" length (5 and 30 days, respectively). It's not perfect, as you still need to know if a "next month" period includes the New Year's Week (in which case it's 35/36 days away), but weeks are perfect on non-leap years; on leap years you just need to know if the Leap Day is included.

Your datetime libraries thus still need to talk about months and weeks as first-class objects in the system (rather than translating them to days or something), but it's still a much easier system overall.

Finally: fuck timezones.