

Capture groups and back-references are some of the more fun features of regular expressions. You place a sub-expression in parentheses, you access the capture with \1 or $1… What could be easier?

For instance, the regex \b(\w+)\b\s+\1\b matches repeated words, such as regex regex, because the parentheses in (\w+) capture a word to Group 1, then the back-reference \1 tells the engine to match the characters that were captured by Group 1.

Yes, capture groups and back-references are easy and fun. But when it comes to numbering and naming, there are a few details you need to know, otherwise you will sooner or later run into situations where capture groups seem to behave oddly.

For easy navigation, here are some jumping points to various sections of the page:

✽ How do Capture Groups Beyond \9 get Referenced?
✽ Naming Groups, and referring back to them
✽ Generating New Capture Groups Automatically (You Can't!)
✽ Resetting Capture Groups like Variables (You Can't!)
✽ Relative Back-References and Forward-References

How do Capture Groups Beyond \9 get Referenced?

Normally, within a pattern, you create a back-reference to the content a capture group previously matched by using a backslash followed by the group number, for instance \1 for Group 1. In practice, you rarely need to create back-references to groups with numbers above 3 or 4, because when you need to juggle many groups you tend to create named capture groups. However, if you spend time in the smoky corridors of regex, at one time or another you're sure to wonder what is the correct syntax to create back-references to Groups 10 and higher.

So in a regular expression, what does \10 mean? It looks ambiguous: on the face of it, that could refer either to Group 10, or to Group 1 followed by a zero. In fact, the meaning does depend on the regex engine. If Group 10 has been set, all major engines treat \10 as a back-reference to Group 10. If there is no Group 10, however, Java translates \10 as a back-reference to Group 1, followed by a literal 0; Python understands it as a back-reference to Group 10 (which will fail); and C#, PCRE, JavaScript, Perl and Ruby understand it as an instruction to match "the backspace character" (whatever that is)… because 10 is the octal code for the backspace character in the ASCII table!

To avoid this kind of ambiguity, here is the proper syntax to create a back-reference to Group 10.

✽ Java, JavaScript, Python: no special syntax (use \10, knowing that if Group 10 is not set Java will treat this as Group 1 then a literal 0, while JavaScript will treat it as the elusive "backspace character")

As you probably know, there is no standard across engines to insert capture groups into replacements.
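To make the repeated-word pattern \b(\w+)\b\s+\1\b from earlier in the text concrete, here is a minimal sketch using Python's standard re module. The helper name find_repeated_words and the sample sentence are illustrative assumptions, not part of the original article.

```python
import re

# The repeated-word pattern from the text: (\w+) captures a word into Group 1,
# and \1 must then match exactly the same characters again.
REPEATED_WORD = re.compile(r"\b(\w+)\b\s+\1\b")


def find_repeated_words(text):
    """Return the word captured by Group 1 for each repeated pair found.

    Hypothetical helper, written only to illustrate the pattern.
    """
    return [m.group(1) for m in REPEATED_WORD.finditer(text)]


sample = "This is a regex regex demo, and it is is short."
print(find_repeated_words(sample))  # ['regex', 'is']
```

The trailing \b matters: without it, a phrase such as "the then" would count as a repeat, because \1 would happily match the first three letters of "then".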

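The engine-dependent meaning of \10 described in the text can be checked for one engine at a time. The sketch below covers only Python's standard re module, and its variable names are made up for illustration. Other engines would need their own test, since the text says they behave differently when Group 10 is not set.

```python
import re

# Ten capture groups, each grabbing one letter, followed by \10.
ten_groups = re.compile(r"(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)\10")

# Group 10 captured "j", so \10 requires a second "j".
matches_group_10 = ten_groups.fullmatch("abcdefghijj") is not None

# If \10 meant "Group 1, then a literal 0", this string would match instead.
matches_group_1_then_zero = ten_groups.fullmatch("abcdefghija0") is not None

# With only one group defined, Python rejects \10 at compile time.
try:
    re.compile(r"(a)\10")
    compiles_without_group_10 = True
except re.error:
    compiles_without_group_10 = False

print(matches_group_10, matches_group_1_then_zero, compiles_without_group_10)
# True False False
```

In Python the failure shows up when the pattern is compiled rather than when it is matched, so a stray \10 is caught early instead of silently matching something unexpected.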