Regular expressions you can read: A new visual syntax (and UI)

Much has been written about the problems with regular expressions: they are hard to learn, hard to debug, and so on.

I propose a more visual syntax and a keyboard-usable UI for generating regular expressions.

The UI/syntax proposed here helps address issues related to readability, learnability, and memorability. Those who readily understand regex will find that this visual syntax does not slow them down. It makes existing regexes easier to read for both novices and true regex superheroes.

Simplified email matching in new visual regex syntax (not for production use)

You write regexes just like you always have, with an optional ctrl+space popup menu for command completion or insertion. Part of the UI concept is also being able to import existing regexes for editing, then export them in your chosen dialect.

This dialect-agnostic visual syntax seeks a balance between two ends of a continuum:

Traditional regexes are so terse that it is hard to tell elements and their meanings apart. Literals, syntax, wildcards, variable placeholders, etc. are all mashed together:
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
Some editors already visualize regexes as charts. These charts are not directly editable, particularly not with a keyboard, and they are typically so verbose that they are slow to scan.

An example from regexper.com:

A regexper.com sample output for the above email example
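To make the "mashed together" problem concrete, here is a minimal Python sketch that spells out the simplified email pattern element by element using the standard re.VERBOSE flag. It is only a text-mode approximation of the idea, not the proposed visual syntax, and the sample address is made up for illustration.

```python
import re

# The terse pattern from the text (simplified, not for production use).
TERSE = r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"

# The same pattern spread out with re.VERBOSE so each element gets a label.
VERBOSE = r"""
    \b                  # word boundary
    [A-Z0-9._%+-]+      # local part: one or more allowed characters
    @                   # literal at sign
    [A-Z0-9.-]+         # domain name: letters, digits, dots, hyphens
    \.                  # literal dot
    [A-Z]{2,4}          # top-level domain: 2 to 4 letters
    \b                  # word boundary
"""

# The character classes only list uppercase letters, so match case-insensitively.
terse_re = re.compile(TERSE, re.IGNORECASE)
verbose_re = re.compile(VERBOSE, re.IGNORECASE | re.VERBOSE)

sample = "Write to jane.doe@example.com today."  # made-up sample text
terse_hit = terse_re.search(sample).group()
verbose_hit = verbose_re.search(sample).group()
print(terse_hit)
print(verbose_hit)
```

Both forms find the same address. The commented form is easier to read but takes nine lines, which is roughly the trade-off the visual syntax tries to avoid.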

The real power of the visual syntax comes to life with the suggested UI. The UI will particularly help those who find the traditional syntax hard to remember.

You write a regex as you normally would. The UI visualizes the structure on the fly. When you can’t remember a command, you can press ctrl+space to summon a search menu. This menu contains all regex commands and their descriptions: you can search either by command (to confirm that you remember its meaning correctly) or by description (to recall which command fits a given task).
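The two-way lookup behind such a menu could be sketched roughly as follows. This is a minimal illustration, not a design for the real UI: the COMMANDS catalog and the search_commands function are hypothetical names, and the catalog holds only a handful of common regex elements.

```python
# Hypothetical catalog: a tiny subset of regex commands with descriptions.
COMMANDS = {
    r"\b": "word boundary",
    r"\d": "any digit",
    r"\s": "any whitespace character",
    r"+": "one or more of the preceding element",
    r"?": "zero or one of the preceding element",
    r"{m,n}": "between m and n of the preceding element",
}


def search_commands(query):
    """Return (command, description) pairs whose command or description contains the query."""
    needle = query.lower()
    return [
        (cmd, desc)
        for cmd, desc in COMMANDS.items()
        if needle in cmd.lower() or needle in desc.lower()
    ]


# Search by command, to confirm what it means.
by_command = search_commands(r"\b")
# Search by description, to recall which command does the job.
by_description = search_commands("digit")
print(by_command)
print(by_description)
```

A single substring search over both fields keeps the menu to one input box; a real implementation would probably want ranking and fuzzy matching on top.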

Supporting user memory

Regular expressions have a hard-to-memorize syntax. This is a particularly serious issue considering that most of us do not write regexes for a living.

Regexes are typically a tool that gets summoned, say, a couple of times a year. When we come back to them, previous learning has faded, and we might need hours just to get up to speed with the syntax again.

To solve this, we will augment the above visual syntax with a UI that enables learning. This means three things:

As mentioned above, the new visual language is dialect-agnostic. Generate any dialect from your expression; the engine behind the syntax takes care of the actual generation.
Progressive disclosure for learning special element meanings. The general aim is to make elements self-explanatory. To remain terse, though, not all meanings are readily visible. If you forget the meaning of a symbol, you can just hover over or click on an element to get an explanation of what it does.
In the visual syntax, a symbol means the same thing no matter where in the expression it appears. The traditional regex language is context-modal: the same character can mean different things in different situations, with different escaping rules. This is particularly true inside and outside character classes [ ]. These inconsistencies are particularly difficult to remember between uses.
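The context-modal behavior of the traditional syntax is easy to demonstrate. The sketch below uses Python's standard re module, and the matches helper is a name introduced here only for illustration. It checks three characters whose meaning changes between the inside and the outside of a character class.

```python
import re


def matches(pattern, text):
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# "." is a wildcard outside a character class, a literal dot inside one.
dot_outside = matches(r".", "x")       # wildcard accepts any character
dot_inside = matches(r"[.]", "x")      # literal dot does not accept "x"

# "^" anchors outside a class, negates when it comes first inside one.
caret_outside = matches(r"^a", "a")    # anchor, then a literal "a"
caret_inside = matches(r"[^a]", "a")   # "anything but a" rejects "a"

# "-" is a literal outside a class, a range operator between class members.
dash_outside = matches(r"a-c", "b")    # only the text "a-c" would match
dash_inside = matches(r"[a-c]", "b")   # range a..c includes "b"

print(dot_outside, dot_inside)
print(caret_outside, caret_inside)
print(dash_outside, dash_inside)
```

In each pair the same character gives opposite results for the same input, which is exactly the kind of rule that fades from memory between uses.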

Implementation

This is a concept design. The idea is that the visual syntax will generate traditional regexes. You could see it as a visual DSL that generates (only barely human-readable) traditional regular expressions. Ideally, IDEs would support this visual syntax so that you could switch between the traditional syntax and the visual one.
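One way to picture the engine behind such a DSL is a small tree of elements, each of which knows how to emit itself as traditional regex text for a given dialect. The sketch below is only an assumption about how this could work, not an existing implementation: the class names (Literal, CharClass, Repeat, Named, Seq) and the emit method are all hypothetical. The single dialect difference it handles is a real one, though: Python writes named groups as (?P<name>...), while JavaScript writes them as (?<name>...).

```python
import re
from dataclasses import dataclass
from typing import Optional

# Characters that need a backslash when used as literals outside a class.
SPECIAL = set(r"\.^$*+?()[]{}|")


@dataclass
class Literal:
    text: str

    def emit(self, dialect):
        return "".join("\\" + ch if ch in SPECIAL else ch for ch in self.text)


@dataclass
class CharClass:
    body: str  # raw class body, assumed to be valid already

    def emit(self, dialect):
        return "[" + self.body + "]"


@dataclass
class Repeat:
    node: object
    low: int
    high: Optional[int] = None  # None means "no upper bound"

    def emit(self, dialect):
        inner = self.node.emit(dialect)
        if not isinstance(self.node, CharClass):
            inner = "(?:" + inner + ")"  # group so the quantifier covers it all
        if (self.low, self.high) == (0, None):
            return inner + "*"
        if (self.low, self.high) == (1, None):
            return inner + "+"
        if (self.low, self.high) == (0, 1):
            return inner + "?"
        if self.high is None:
            return inner + "{%d,}" % self.low
        if self.low == self.high:
            return inner + "{%d}" % self.low
        return inner + "{%d,%d}" % (self.low, self.high)


@dataclass
class Named:
    name: str
    node: object

    def emit(self, dialect):
        # Named-group syntax is the one dialect difference handled here.
        prefix = "(?P<" if dialect == "python" else "(?<"
        return prefix + self.name + ">" + self.node.emit(dialect) + ")"


@dataclass
class Seq:
    parts: list

    def emit(self, dialect):
        return "".join(part.emit(dialect) for part in self.parts)


# The simplified email pattern, built as a tree instead of typed as text.
email = Seq([
    Named("user", Repeat(CharClass("A-Z0-9._%+-"), 1)),
    Literal("@"),
    Named("domain", Seq([
        Repeat(CharClass("A-Z0-9.-"), 1),
        Literal("."),
        Repeat(CharClass("A-Z"), 2, 4),
    ])),
])

python_pattern = email.emit("python")
js_pattern = email.emit("javascript")
print(python_pattern)
print(js_pattern)

match = re.search(python_pattern, "jane.doe@example.com", re.IGNORECASE)
print(match.group("user"), match.group("domain"))
```

Because the tree stores meaning rather than text, importing an existing regex would be the reverse step: parse one dialect into the tree, edit it, then emit whichever dialect you need.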

@TODO

Even this syntax can get unwieldy if the expression is complex enough.

Also, this does not solve all issues with regular expressions. In particular, it does not solve a core issue intrinsic to regexes: how do I make sure that my regex matches exactly the strings I want it to and none of the ones I don’t? There are debugging tools for regexes that let you find what you want by trial and error, but that’s a topic for another post.

I’d love to hear your comments about how this could be implemented. Some years ago I tried building a demo in HTML5 and JavaScript, but it turned out that contentEditable was hard to use. In any case, if you’re thinking of creating something like this, I’d love to join you. An open source project, perhaps?

Contact me via this snappy form or on Twitter.

Thanks to Bret Victor for providing inspiration for much of this.