Small regexp tip

To find the shortest match using regular expressions, you can use a "reluctant" quantifier such as ".+?" instead of the plain ".+" (a greedy quantifier), which finds the longest match.
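As a rough illustration of the difference, here is a minimal sketch. The sample text, the class name and the firstMatch helper are made up for this example and are not from any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {

  // Hypothetical helper: returns the first match of regex in input, or null if there is none.
  static String firstMatch(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    return m.find() ? m.group() : null;
  }

  public static void main(String[] args) {
    String input = "<b>one</b> and <b>two</b>"; // made-up sample text

    // Greedy: .+ grabs as much as it can, then backs off to the last '>'.
    System.out.println(firstMatch("<.+>", input));   // <b>one</b> and <b>two</b>

    // Reluctant: .+? grows one character at a time and stops at the first '>'.
    System.out.println(firstMatch("<.+?>", input));  // <b>
  }
}
```

Both patterns start matching at the same position; only how far the quantifier extends differs.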

Wow - did that make my day!! :-)


Avatar: Thomas Bahn

Re: Small regexp tip

Hi Mikkel, you possibly mean .* instead of .+

.*  : 0, 1 or many (as much as possible)
.+  : 1 or many
.+? : 0, 1 or many (see it as (.+)? : (1 or many) or 0)

Ciao Thomas

Avatar: Mikkel Heisterberg

Re: Small regexp tip

No, I actually mean it as I wrote it. Consider some input such as:

Section
  Contents1
End Section
Section
  Contents2
End Section

If I wanted to use a regexp to split the input into two parts with one section in each, I would need to use ".+?" and not ".+" to match the contents between "Section" and "End Section". If I do, the result is what you would expect: two matches. If I don't, the result is a single match spanning the entire input.

Try running the below code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Lekkim {

  public static void main(String[] args) {
    String input = "Section\n  Contents1\nEnd Section\n" + 
       "\nSection\n  Contents2\nEnd Section\n";
    
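    // DOTALL lets "." match line terminators, so a match can span several lines.
    // p1 is greedy: it runs from the first "Section" to the last "End Section".
    Pattern p1 = Pattern.compile("(Section.+End Section)", Pattern.DOTALL);
    // p2 is reluctant: it stops at the nearest "End Section".
    Pattern p2 = Pattern.compile("(Section.+?End Section)", Pattern.DOTALL);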
    
    System.out.println("Input is:\n" + input);
    System.out.println("*** *** *** *** ***");
    
    Matcher m = p1.matcher(input);
    while (m.find()) {
      System.out.println("Match:\n" + m.group());
    }
    
    System.out.println("*** *** *** *** ***");
    
    m = p2.matcher(input);
    while (m.find()) {
      System.out.println("Match:\n" + m.group());
    }

  }
}
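
On the question of whether ".+?" can match nothing at all, a small separate check may help. This is only a sketch: the class name, the matches helper and the test string "ab" are invented for the example. As far as java.util.regex is concerned, the "?" after ".+" makes the quantifier reluctant but does not make it optional, so at least one character is still required.

```java
import java.util.regex.Pattern;

class ReluctantNeedsOne {

  // Hypothetical helper: true if the whole input matches the regex.
  static boolean matches(String regex, String input) {
    return Pattern.compile(regex).matcher(input).matches();
  }

  public static void main(String[] args) {
    // "ab" has nothing between 'a' and 'b'.
    System.out.println(matches("a.+?b", "ab"));   // false: .+? still needs one character
    System.out.println(matches("a.*?b", "ab"));   // true:  .*? may match nothing
    System.out.println(matches("a(.+)?b", "ab")); // true:  the optional group may be skipped
  }
}
```

So ".+?" and "(.+)?" are different patterns: the first is "one or more, as few as possible", the second is "one or more, or nothing".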

Avatar: Bernd Hort

Re: Small regexp tip

Thanks! I had problems with greedy regex for quite some time. Never imagined that it would be so easy. :-)

Re: Small regexp tip

I ran into problems with reluctant quantifiers recently - it seems as if older versions of Python did not implement regex properly. It nearly drove me nuts trying to figure out what was going on. That being said, once I found out about them I went about happily revising old code...
