Nested Backreferences #xe5 tests greediness

Hello,
The last stage of the backreferences extension (#xe5) has a case that test greediness:

echo -n "abc-def is abc-def, not efg, abc, or def" | ./your_program.sh -E "(([abc]+)-([def]+)) is \1, not ([^xyz]+), \2, or \3"

([^xyz]+) matches the rest of the string efg.... To pass the test, you need to take into account greediness and backtracking. This behavior is never tested before. I think this is not a good way to introduce it and would be better in a separated extension.

2 Likes

Thanks @Gamma120 for highlighting this! I’ve shared your feedback with our team, and added an internal +1 for your vote on a new extension.

In the meantime, feel free to personally vote on a new extension.

1 Like

I wound up hitting this particular challenge and have got pretty stuck :confounded_face: I have a horrible feeling how I’ve implemented things just won’t handle this nested case of greediness/backtracking :see_no_evil_monkey:

Before starting this overall challenge, I hadn’t really read up on how regex parsers are implemented ‘properly’ :sweat_smile: For this problem is there any recommended reading to help better understand the optimal approach?

The problem I have is the nested pattern will match greedily, but isn’t able to backtrack before updating the matched capture group, so with this case…

text: not efg, abc, or def

pattern: not ([^xyz]+),

My implementation matches efg, abc, or def, instead of efg, so the overall match fails instead of succeeding.