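Regular Expressions (Java Edition)

Author: Internet

Regular Expressions

Regular expressions are a powerful and flexible text-processing tool. They let you programmatically construct complex text patterns and search an input string for them; once a part matching those patterns is found, you can process it however you like. The syntax is a hurdle for beginners, but it is a compact and dynamic language. Regular expressions offer a fully general way to solve all sorts of string-processing problems: matching, selection, editing, and validation.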

Using the built-in functionality of the String class
Example:

public class Main {
    public static void main(String[] args) {
        System.out.println("-1234".matches("-?\\d+"));
        System.out.println("5678".matches("-?\\d+"));
        System.out.println("+911".matches("-?\\d+"));
        System.out.println("+911".matches("(-|\\+)?\\d+"));
    }
}

output:

true
true
false
true

String's built-in split() method
Example:

import java.util.Arrays;

public class Main {
    public static String knights="Then,when you have found the shrubbery,you must "+
            "cut down the mightiest tree in the forest... "+
            "with... a herring!";
    public static void split(String regex)
    {
        System.out.println(Arrays.toString(knights.split(regex)));
    }
    public static void main(String[] args) {
        split(" ");
        split("\\W+");
        split("n\\W+");
    }
}

output:

[Then,when, you, have, found, the, shrubbery,you, must, cut, down, the, mightiest, tree, in, the, forest..., with..., a, herring!]
[Then, when, you, have, found, the, shrubbery, you, must, cut, down, the, mightiest, tree, in, the, forest, with, a, herring]
[The, whe, you have found the shrubbery,you must cut dow, the mightiest tree i, the forest... with... a herring!]

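Common regular expression constructs (Java)

Characters
B         The specific character B
\xhh      The character with hexadecimal value 0xhh
\uhhhh    The Unicode character with hexadecimal representation 0xhhhh
\t        Tab
\n        Newline
\r        Carriage return
\f        Form feed
\e        Escape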
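
A minimal sketch of how the escapes in this table look inside Java string literals. The class and method names (EscapeDemo, matches) are illustrative only. Because the Java compiler handles backslashes in string literals first, each backslash meant for the regex engine has to be doubled.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // True when the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Hex escape: 0x41 is the letter A.
        System.out.println(matches("\\x41", "A"));      // true
        // Four-digit Unicode escape, interpreted by the regex engine.
        System.out.println(matches("\\u0041", "A"));    // true
        // Regex tab escape against a real tab character.
        System.out.println(matches("a\\tb", "a\tb"));   // true
        // Regex escape-character escape against U+001B.
        System.out.println(matches("\\e", "\u001B"));   // true
    }
}
```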

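Character classes
.              Any character (line terminators only in DOTALL mode)
[abc]          Any of the characters a, b, or c (same as a|b|c)
[^abc]         Any character except a, b, and c (negation)
[a-zA-Z]       Any character from a to z or from A to Z (range)
[abc[hij]]     Any of a, b, c, h, i, j (same as a|b|c|h|i|j) (union)
[a-z&&[hij]]   Any of h, i, or j (intersection)
\s             A whitespace character (space, tab, newline, form feed, carriage return)
\S             A non-whitespace character ([^\s])
\d             A digit [0-9]
\D             A non-digit [^0-9]
\w             A word character [a-zA-Z_0-9]
\W             A non-word character [^\w]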
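
The union and intersection forms are easy to misread, so here is a small sketch that runs a few of these classes against one sample string. The class CharClassDemo, the helper findAll, and the sample input are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    // Collects every substring of input matched by regex, in order.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String input = "a1_h-Z j";
        System.out.println(findAll("[abc[hij]]", input));   // union: [a, h, j]
        System.out.println(findAll("[a-z&&[hij]]", input)); // intersection: [h, j]
        System.out.println(findAll("\\w", input));          // word characters: [a, 1, _, h, Z, j]
        System.out.println(findAll("[^\\w\\s]", input));    // neither word nor whitespace: [-]
    }
}
```

Note that the underscore counts as a word character, which is why it shows up in the third result.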

Logical operators
XY     X followed by Y
X|Y    X or Y
(X)    A capturing group; \i refers back to the i-th captured group later in the expression
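
A short sketch of a back-reference in practice: detecting a word that is accidentally typed twice in a row. The class name BackrefDemo and the method hasRepeatedWord are hypothetical, and the pattern is only one possible way to write this check.

```java
import java.util.regex.Pattern;

class BackrefDemo {
    // True if the input contains the same word twice in a row, e.g. "the the".
    static boolean hasRepeatedWord(String input) {
        // (\w+) captures a word; \1 must then match exactly the same text again.
        return Pattern.compile("\\b(\\w+)\\s+\\1\\b").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(hasRepeatedWord("cut down the the tree")); // true
        System.out.println(hasRepeatedWord("cut down the tree"));     // false
        // Alternation: the whole input must be either "cat" or "dog".
        System.out.println("cat".matches("cat|dog"));                 // true
    }
}
```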

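Boundary matchers
^     Beginning of a line (of the whole input unless MULTILINE mode is on)
$     End of a line (of the whole input unless MULTILINE mode is on)
\b    Word boundary
\B    Non-word boundary
\G    End of the previous match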
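
The boundary matchers consume no characters, which makes them hard to picture. The sketch below counts matches for a few of them; BoundaryDemo, count, and the sample inputs are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Counts how many times regex matches somewhere in input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Whole-word "cat" only: the "cat" inside "concat" is skipped.
        System.out.println(count("\\bcat\\b", "cat concat cat.")); // 2
        // "cat" that does NOT start at a word boundary: only the one in "concat".
        System.out.println(count("\\Bcat", "cat concat cat."));    // 1
        // Without MULTILINE, ^ matches only at the start of the input.
        System.out.println(count("^\\w+", "one\ntwo"));            // 1
        System.out.println(count("(?m)^\\w+", "one\ntwo"));        // 2
        // \G forces each match to start where the previous one ended,
        // so scanning stops at the first non-digit.
        System.out.println(count("\\G\\d", "123a45"));             // 3
    }
}
```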

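Quantifiers

Quantifiers describe the way a pattern absorbs input text:

Greedy: Quantifiers are greedy unless otherwise altered. A greedy expression matches as much of the input as the pattern allows. A typical source of bugs is to assume the pattern will match only the first possible group of characters, when it is actually greedy and keeps going until it has matched the longest possible string.

Reluctant: Specified with a question mark, this quantifier matches the minimum number of characters needed to satisfy the pattern. It is also called lazy, minimal-matching, non-greedy, or ungreedy.

Possessive: Specified with a plus sign after the quantifier. This type is available in Java but not in every regex engine. As a regular expression is applied to a string, it generates many states so that it can backtrack if the match fails. Possessive quantifiers do not keep those intermediate states, and so they prevent backtracking. They are often used to keep a regular expression from running away, which can also make it execute more efficiently.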


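Greedy    Reluctant   Possessive   Matches
X?        X??         X?+          X, one or none
X*        X*?         X*+          X, zero or more
X+        X+?         X++          X, one or more
X{n}      X{n}?       X{n}+        X, exactly n times
X{n,}     X{n,}?      X{n,}+       X, at least n times
X{n,m}    X{n,m}?     X{n,m}+      X, at least n but not more than m times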
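
To make the three quantifier types concrete, here is a minimal sketch that applies the same pattern in its greedy, reluctant, and possessive forms to one input. QuantifierDemo and firstMatch are hypothetical names chosen for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first substring matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<b>bold</b>";
        // Greedy: .+ grabs as much as it can, then backtracks to the last '>'.
        System.out.println(firstMatch("<.+>", input));   // <b>bold</b>
        // Reluctant: .+? stops at the first '>' that completes the match.
        System.out.println(firstMatch("<.+?>", input));  // <b>
        // Possessive: .++ swallows the rest of the input and never gives
        // anything back, so the final '>' can no longer match.
        System.out.println(firstMatch("<.++>", input));  // null
    }
}
```

The possessive case failing is the point: no backtracking means less work, but the pattern has to be written so that the greedy part cannot eat characters a later part needs.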

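Pattern and Matcher

Instead of relying on the limited functionality of the String class, you can construct more powerful regular expression objects: import the java.util.regex package, then compile a regular expression with the static Pattern.compile() method. This produces a Pattern object, and calling its matcher() method with the input string produces a Matcher.
Example: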

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {

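        Matcher matcher=Pattern.compile("\\w").matcher("Evening is full of the linnet's wings");
        // find() with no argument steps forward through the input, one match per call
        while (matcher.find())
            System.out.println(matcher.group());
        int i=0;
        // find(i) resets the matcher and searches again starting at index i
        while (matcher.find(i))
        {
            System.out.println(matcher.group());
            i++;
        }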
    }
}
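
Since no output is listed for this program, the following sketch reproduces the two loops on a much shorter input so the difference between find() and find(int) is visible. The names FindDemo, scan, and scanFromEachIndex are hypothetical, and the pattern \w+ is used here instead of \w to keep the results short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindDemo {
    // Plain find(): each call resumes where the previous match ended.
    static List<String> scan(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    // find(int): resets the matcher and searches from index i on every call.
    static List<String> scanFromEachIndex(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int i = 0;
        // The bounds check guards against IndexOutOfBoundsException,
        // which find(int) throws for an index past the end of the input.
        while (i <= input.length() && m.find(i)) {
            hits.add(m.group());
            i++;
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(scan("\\w+", "ab cd"));              // [ab, cd]
        System.out.println(scanFromEachIndex("\\w+", "ab cd")); // [ab, b, cd, cd, d]
    }
}
```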

Example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    static public final String POEM="Twas brillig,and the slithy toves\n"+
            "Did gyre and gimble in the wabe.\n"+
            "All mimsy were the borogoves,\n"+
            "And the mome raths outgrabe.\n\n"+
            "Beware the Jabberwock, my son.\n"+
            "The jaws that bite,the claws that catch.\n"+
            "Beware the Jubjub bird,and shun\n"+
            "The frumius Bandersnatch.";
    public static void main(String[] args) {
        Matcher matcher=Pattern.compile("(?m)(\\S+)\\s+((\\S+)\\s+(\\S+))$").matcher(POEM);
        while (matcher.find())
        {
            // group(0) is the whole match; group(j) is the j-th capturing group
            for (int j=0;j<= matcher.groupCount();j++)
                System.out.println("["+ matcher.group(j)+"]");
        }
    }
}

output:

[the slithy toves]
[the]
[slithy toves]
[slithy]
[toves]
[in the wabe.]
[in]
[the wabe.]
[the]
[wabe.]
[were the borogoves,]
[were]
[the borogoves,]
[the]
[borogoves,]
[mome raths outgrabe.]
[mome]
[raths outgrabe.]
[raths]
[outgrabe.]
[Jabberwock, my son.]
[Jabberwock,]
[my son.]
[my]
[son.]
[claws that catch.]
[claws]
[that catch.]
[that]
[catch.]
[Jubjub bird,and shun]
[Jubjub]
[bird,and shun]
[bird,and]
[shun]
[The frumius Bandersnatch.]
[The]
[frumius Bandersnatch.]
[frumius]
[Bandersnatch.]

Pattern

Pattern Pattern.compile(String regex,int flag)

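The flag argument is one of the following constants, or several of them combined with |:


Compile flag (embedded form)      Effect
Pattern.CANON_EQ                  Two characters are considered to match if and only if their full canonical decompositions match.
Pattern.CASE_INSENSITIVE (?i)     By default, case-insensitive matching assumes that only US-ASCII characters are being matched. This flag lets the pattern match without regard to case; combine it with UNICODE_CASE for Unicode-aware case-insensitive matching.
Pattern.COMMENTS (?x)             Whitespace in the pattern is ignored, and comments starting with # are ignored up to the end of the line. This mode can also be enabled with the embedded flag expression.
Pattern.DOTALL (?s)               The expression "." matches any character, including line terminators. By default "." does not match line terminators.
Pattern.MULTILINE (?m)            The expressions ^ and $ match at the beginning and end of each line; ^ also matches the beginning of the input and $ also matches its end. By default they match only at the beginning and end of the entire input.
Pattern.UNICODE_CASE (?u)         When combined with CASE_INSENSITIVE, case-insensitive matching follows the Unicode standard instead of US-ASCII only.
Pattern.UNIX_LINES (?d)           Only the \n line terminator is recognized in the behavior of ., ^, and $.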
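
A brief sketch of several of these flags, including the equivalence between a flag constant and its embedded form. FlagDemo and fullMatch are hypothetical names, and the sample inputs are made up for illustration.

```java
import java.util.regex.Pattern;

class FlagDemo {
    // True when the whole input matches regex compiled with the given flags (0 = none).
    static boolean fullMatch(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // The flag constant and the embedded (?i) form behave the same way.
        System.out.println(fullMatch("java", Pattern.CASE_INSENSITIVE, "JAVA")); // true
        System.out.println(fullMatch("(?i)java", 0, "JAVA"));                    // true

        // "." skips line terminators unless DOTALL is set.
        System.out.println(fullMatch("a.b", 0, "a\nb"));                         // false
        System.out.println(fullMatch("a.b", Pattern.DOTALL, "a\nb"));            // true

        // COMMENTS lets the pattern carry whitespace and a trailing # comment.
        System.out.println(
            fullMatch("\\d+   # one or more digits", Pattern.COMMENTS, "2024")); // true

        // Non-ASCII letters fold case only when UNICODE_CASE is added.
        String lower = "\u00e9"; // e with acute accent
        String upper = "\u00c9"; // E with acute accent
        System.out.println(fullMatch(lower, Pattern.CASE_INSENSITIVE, upper));   // false
        System.out.println(fullMatch(lower,
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE, upper));            // true
    }
}
```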

Example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        Pattern pattern=Pattern.compile("^java",Pattern.CASE_INSENSITIVE|Pattern.MULTILINE);
        Matcher matcher=pattern.matcher("java has regex\nJava has regex\n"
                +"JAVA has pretty good regular expressions\n" +
                "Regular expressions are in Java");
        while (matcher.find())
        {
            System.out.println(matcher.group());
        }
    }
}

output:

java
Java
JAVA

split()

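Pattern.split() splits the input string around matches of the pattern; an optional second argument limits the number of pieces produced.
Example: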

import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        String input="This!!unusual use!!of exclamation!!points";
        System.out.println(Arrays.toString(Pattern.compile("!!").split(input)));
        System.out.println(Arrays.toString(Pattern.compile("!!").split(input,3)));
    }
}

output:

[This, unusual use, of exclamation, points]
[This, unusual use, of exclamation!!points]

reset()

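The reset() method applies an existing Matcher object to a new character sequence. Called with no arguments, it sets the Matcher back to the beginning of the current sequence.
Example: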

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        Matcher matcher=Pattern.compile("[frb][aiu][gx]").matcher("fix the rug with bags");
        while (matcher.find())
        {
            System.out.println(matcher.group());
        }
        System.out.println("");
        matcher.reset("fix the rig with rags");
        while (matcher.find())
        {
            System.out.println(matcher.group());
        }
    }
}

output:

fix
rug
bag

fix
rig
rag

Source: https://blog.csdn.net/weixin_41489136/article/details/123624520