Hello,
The last stage of the backreferences extension (#xe5) has a case that test greediness:
echo -n "abc-def is abc-def, not efg, abc, or def" | ./your_program.sh -E "(([abc]+)-([def]+)) is \1, not ([^xyz]+), \2, or \3"
([^xyz]+) matches the rest of the string efg.... To pass the test, you need to take into account greediness and backtracking. This behavior is never tested before. I think this is not a good way to introduce it and would be better in a separated extension.
I wound up hitting this particular challenge and have got pretty stuck I have a horrible feeling how I’ve implemented things just won’t handle this nested case of greediness/backtracking
Before starting this overall challenge, I hadn’t really read up on how regex parsers are implemented ‘properly’ For this problem is there any recommended reading to help better understand the optimal approach?
The problem I have is the nested pattern will match greedily, but isn’t able to backtrack before updating the matched capture group, so with this case…
text: not efg, abc, or def
pattern: not ([^xyz]+),
My implementation matches efg, abc, or def, instead of efg, so the overall match fails instead of succeeding.