Did you know that Google Chrome and other environments such as Node.js now support named capture groups in regular expressions? That's right, now we can use regular expressions such as the following:
/(?<year>\d{4})(?<delim>[\-\/\.])(?<month>\d\d)\k<delim>(?<day>\d\d)/
What are capture groups? That might be your first question. First of all, it is important to remember that capture groups (or capturing groups) are basically a way of keeping track of a sequence of characters that were matched by part of your regular expression. Let's say that we have a variable which contains the string "2019-12-31". We can use a regular expression to pull out the year, the month and the day of the month:
var result = /^(\d{4})-(\d\d)-(\d\d)$/.exec("2019-12-31");
Running the above code will assign an augmented array object to result where the first item (result[0]) will be the entire match. The second item (result[1]) will be the year, which in this case is "2019". The third item (result[2]) will be the month, which in this case is "12". The fourth item (result[3]) will be the day, which in this case is "31". Each capture group is represented in our regular expression by simply wrapping the desired pattern in parentheses.
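To make the indices concrete, here is a minimal sketch that unpacks the numbered groups with array destructuring. The variable names (dateMatch, fullMatch and so on) are only illustrative and are not part of any API.

```javascript
// Match an ISO-style date and capture year, month and day by position.
const dateMatch = /^(\d{4})-(\d\d)-(\d\d)$/.exec("2019-12-31");

// exec() returns null when nothing matches, so guard before unpacking.
if (dateMatch !== null) {
  // Index 0 is the entire match; indices 1 to 3 are the capture groups.
  const [fullMatch, year, month, day] = dateMatch;
  console.log(fullMatch);        // 2019-12-31
  console.log(year, month, day); // 2019 12 31
}
```

Notice that nothing in the pattern itself says which group is the year and which is the day; the meaning lives entirely in the order of the parentheses.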
Even though this works just fine, we want to be able to reference our capture groups by name. That is why named capture groups (or named capturing groups) were added to ECMAScript (AKA JavaScript). They are capture groups which can be referenced by name (as you most likely guessed).
Here is an example. Let's say that we want to pull the year, the month and the day of the month from the string "2019-12-31". We can do this with the following regular expression, which contains named capture groups:
var result = /^(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)$/.exec("2019-12-31");
Running the above code will assign an augmented array object to result. This would be more or less equivalent to the following array:
[
"2019-12-31",
"2019",
"12",
"31",
index: 0,
length: 4,
input: "2019-12-31",
groups: {
day: "31",
month: "12",
year: "2019"
}
]
Of course, if you try to run the above representation in the console it will fail, but more or less that would be the structure of result. As you can see, result[0] to result[3] are the normal values you would get with regular capture groups. We also still get access to result.index and result.input. What is new is result.groups. This object contains a key for each named capture group that we defined in our regular expression.
To name a capture group, start the group with ?<name_of_group> immediately after the opening parenthesis (of course replacing name_of_group with the desired name of the capture group).
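As a quick sketch of how the groups object can be used in practice, the snippet below pulls the named values out with object destructuring. The variable name namedMatch is just an illustrative choice.

```javascript
// Same date pattern as before, but every group now has a name.
const namedMatch = /^(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)$/.exec("2019-12-31");

// exec() returns null on failure, so check before touching .groups.
if (namedMatch !== null) {
  // Pull the values out by name instead of by position.
  const { year, month, day } = namedMatch.groups;
  console.log(year);  // 2019
  console.log(month); // 12
  console.log(day);   // 31

  // The numbered slots are still filled in as well.
  console.log(namedMatch[1] === year); // true
}
```

Because the values are looked up by name, reordering the groups in the pattern would not break the destructuring.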
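Capture groups can also be referenced from inside the regular expression itself by using a backreference. Take the following example, which uses a regular (numbered) capture group:
var result = /(.).*?\1/.exec("Where in the world is Carmen Sandiego?");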
The value of result will be something like the following:
[
"here in th",
"h",
index: 1,
length: 2,
input: "Where in the world is Carmen Sandiego?"
]
This representation is once again simply to describe the structure and is not properly formed JavaScript. The first item is the substring that was matched by the regular expression. The second item is the value of the capture group. The purpose of the regular expression is to find the first character in the given string that is repeated later on. We indicate that we want a character that repeats by using the \1 backreference, which references the first capture group found in the regular expression. The .*? portion of /(.).*?\1/ lets any other characters sit between the captured character and its repetition, so they become part of the overall match. In this case the first character that is repeated later on is the "h". For that reason the actual match is "here in th".
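The same idea can be wrapped in a small function. This is only a sketch: firstRepeatedChar is a hypothetical helper name, and since . does not match line breaks it only looks for repeats within a single line.

```javascript
// Hypothetical helper: returns the first character that occurs again
// later in the text, or null when no character repeats.
function firstRepeatedChar(text) {
  // (.) captures one character, .*? lazily skips ahead,
  // and \1 requires the captured character to appear again.
  const match = /(.).*?\1/.exec(text);
  return match === null ? null : match[1];
}

console.log(firstRepeatedChar("Where in the world is Carmen Sandiego?")); // h
console.log(firstRepeatedChar("abc")); // null
```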
How do we backreference a named capture group? Here is an example similar to what we used before but using a named capture group:
var result = /(?<repeat>.).*?\k<repeat>/.exec("Where in the world is Carmen Sandiego?");
The value of result will be something like the following:
[
"here in th",
"h",
index: 1,
input: "Where in the world is Carmen Sandiego?",
length: 2,
groups: {
repeat: "h"
}
]
Again, the above is a representation of the result array. The main difference that you will notice is that the groups property is now defined as an object with repeat as its only key-value pair. As far as the structure of the regular expression is concerned, the main difference is that we are now using a named capture group and we are using the named backreference syntax to reference that named capture group. The one thing that we want to take away from this example is that in order to use a named backreference we need to use the syntax \k<name_of_group> (of course replacing name_of_group with the actual name of the desired capture group).
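Named backreferences are also what make the date pattern from the start of this article work. The sketch below uses that same pattern to require that both delimiters in a date are identical; the variable name datePattern and the sample strings are only illustrative.

```javascript
// The pattern from the top of the article: \k<delim> must match whatever
// the "delim" group captured, so both separators have to be the same.
const datePattern = /(?<year>\d{4})(?<delim>[\-\/\.])(?<month>\d\d)\k<delim>(?<day>\d\d)/;

console.log(datePattern.test("2019-12-31")); // true
console.log(datePattern.test("2019/12/31")); // true
console.log(datePattern.test("2019-12/31")); // false (mixed delimiters)

// The delimiter is available by name like any other group.
const { year, delim, day } = datePattern.exec("2019.12.31").groups;
console.log(year, delim, day); // 2019 . 31
```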
One thing that I do want to mention is that you can still reference named capture groups by number. For example, /(?<repeat>.).*?\1/ will work the same as /(?<repeat>.).*?\k<repeat>/.
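A quick way to convince yourself of this is to run both forms against the same string and compare the results. The variable names below are only illustrative.

```javascript
const sentence = "Where in the world is Carmen Sandiego?";

// Same named group, referenced once by number and once by name.
const byNumber = /(?<repeat>.).*?\1/.exec(sentence);
const byName = /(?<repeat>.).*?\k<repeat>/.exec(sentence);

console.log(byNumber[0] === byName[0]);                       // true
console.log(byNumber.groups.repeat === byName.groups.repeat); // true
```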
When I published this article, Google Chrome was the only major web browser that supported named capture groups. The good news, though, is that thanks to RunKit we can play around with this new feature.
There is always more that we can learn, especially when it comes to JavaScript these days. I do have to admit that as I started diving into this new feature the following pages helped a lot:
A great resource for all JavaScripters is MDN. I'm sure there will be more information out there as time goes on. One thing that I hope to write about in the future is using String.prototype.replace(…) in conjunction with a regular expression with named capture groups.
In conclusion I have to say the best way to get to know more is to keep writing code and try to keep up with the advances in ECMAScript (AKA JavaScript). Happy coding!!! 😎