Small regexp tip

To find the shortest match using regular expressions, you can use a “reluctant” quantifier such as “.+?” instead of the plain “.+” (a greedy quantifier), which finds the longest match from the same starting position.
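
As a quick illustration of the difference, here is a minimal Java sketch. The class name `GreedyVsReluctant`, the helper `firstMatch` and the sample input are made up for this example; only `java.util.regex.Pattern` and `Matcher` come from the standard library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {

  // Hypothetical helper: returns the first match of regex in input, or null if there is none.
  static String firstMatch(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    return m.find() ? m.group() : null;
  }

  public static void main(String[] args) {
    String input = "<b>one</b> and <b>two</b>";

    // Greedy: ".+" grabs as much as it can, so the match runs to the last "</b>".
    System.out.println(firstMatch("<b>.+</b>", input));   // <b>one</b> and <b>two</b>

    // Reluctant: ".+?" grabs as little as it can, so the match stops at the first "</b>".
    System.out.println(firstMatch("<b>.+?</b>", input));  // <b>one</b>
  }
}
```

Both patterns start matching at the same position; the only difference is how far the quantifier expands before the rest of the pattern is tried.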

Wow – did that make my day!! 🙂

4 thoughts on “Small regexp tip”

  1. Hi Mikkel,

    you possibly mean .* instead of .+

    .* : 0, 1 or many (as much as possible)
    .+ : 1 or many
    .+? : 0, 1 or many (see it as (.+)? : (1 or many) or 0)

    Ciao
    Thomas

  2. No, I actually mean it as I wrote it – consider some input such as:

    Section
      Contents1
    End Section
    Section
      Contents2
    End Section
    

    If I wanted to use a regexp to split the input into two parts with one section in each, I would need to use “.+?” and not “.+” to match the actual contents between the “Section” and “End Section” parts. If I do, the result is what you would expect. If I don’t, the result is only a single match containing the entire input.

    Try running the below code:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    
    public class Lekkim {
    
      public static void main(String[] args) {
        // Two sections, separated by newlines.
        String input = "Section\n  Contents1\nEnd Section\n" +
           "\nSection\n  Contents2\nEnd Section\n";
    
        // DOTALL lets "." match line terminators as well.
        // p1 is greedy (.+), p2 is reluctant (.+?).
        Pattern p1 = Pattern.compile("(Section.+End Section)", Pattern.DOTALL);
        Pattern p2 = Pattern.compile("(Section.+?End Section)", Pattern.DOTALL);
    
        System.out.println("Input is:\n" + input);
        System.out.println("*** *** *** *** ***");
    
        // Greedy: a single match spanning the entire input.
        Matcher m = p1.matcher(input);
        while (m.find()) {
          System.out.println("Match:\n" + m.group());
        }
    
        System.out.println("*** *** *** *** ***");
    
        // Reluctant: one match per section.
        m = p2.matcher(input);
        while (m.find()) {
          System.out.println("Match:\n" + m.group());
        }
    
      }
    }
    

  3. I ran into problems with reluctant quantifiers recently – it seems as if older versions of Python did not implement them properly. It nearly drove me nuts trying to figure out what was going on. That being said, once I found out about them I went about happily revising old code…
