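Regular Expressions

First, what is a regular expression?

A regular expression is a powerful text-processing tool for matching, searching, replacing, and extracting strings.

Here are the main reasons to use regular expressions:

Matching and validating text: regular expressions can validate and match text such as email addresses, phone numbers, and URLs, quickly and accurately confirming whether a string follows a required format.
Searching and replacing text: regular expressions can find specific patterns in text, for example text containing certain keywords, and replace them with something else.
Extracting data: regular expressions can pull specific data out of text, such as email addresses or phone numbers from a web page, quickly and accurately.
Automating processing: regular expressions help automate text processing, such as generating code, batch-renaming files, and batch-processing data.

In short, regular expressions are a very powerful and flexible text-processing tool that can greatly improve the efficiency and accuracy of working with text.

An example:
suppose we want to check whether a phone number entered by a user is valid: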

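public class PhoneCheck {
    public static void main(String[] args) {
        // An arbitrary phone number (12 digits, so it should be rejected)
        String tel = "199999999999";
        int len = tel.length();
        if (len != 11) {
            System.out.println("Wrong length");
            return;
        }
        for (int i = 0; i < len; i++) {
            // Look at each character of the number
            char cur = tel.charAt(i);
            // Reject anything outside '0'..'9' (both ends inclusive)
            if (cur < '0' || cur > '9') {
                System.out.println("Invalid input");
                break;
            }
        }
    }
}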

With a regular expression, the same check is just:

String tel = "199999999999";
System.out.println(tel.matches("\\d{11}")); // false: this string has 12 digits

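Of course, a real phone number has more rules than this; the example only shows how much a regular expression can simplify the check.

How to use it

To check whether a string matches a regular expression, call the matches method of String.
matches(String regex): the argument is the regular expression, and the method returns true only if the entire string matches it.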
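To make the "entire string" point concrete, here is a minimal sketch contrasting String.matches() with Matcher.find(), which only needs some substring to match. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    // String.matches(): true only if the WHOLE input matches the regex
    static boolean wholeMatch(String input) {
        return input.matches("\\d+");
    }

    // Matcher.find(): true if SOME substring of the input matches
    static boolean containsMatch(String input) {
        return Pattern.compile("\\d+").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("123"));       // true
        System.out.println(wholeMatch("abc123"));    // false
        System.out.println(containsMatch("abc123")); // true
    }
}
```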

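In detail:

Since a regular expression is just a string, we can simply write "123" or "abc"; these match exactly the strings "123" and "abc".

Exact matching is boring! I might as well use equals! You're right ===> which is why regular expressions are usually used for fuzzy (pattern) matching.

How does fuzzy matching work?

As we all know, "\" is the escape character, and it is the key to regular expressions: combining "\" with letters such as d, D, w, and W gives them special meanings.

For example, \d matches one digit 0-9, and \w matches one letter, digit, or underscore; the uppercase forms \D and \W match any character that is not a digit and not a word character, respectively.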
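A quick sketch of the shorthand classes and their uppercase negations, including the whitespace pair \s/\S that shows up later when splitting strings; the class and the m helper are hypothetical:

```java
class ShorthandClasses {
    // Small helper so each line below reads "input, regex"
    static boolean m(String input, String regex) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(m("7", "\\d"));  // true:  \d is a digit
        System.out.println(m("a", "\\D"));  // true:  \D is any non-digit
        System.out.println(m("_", "\\w"));  // true:  \w is [a-zA-Z_0-9]
        System.out.println(m("$", "\\W"));  // true:  \W is any non-word character
        System.out.println(m("7", "\\W"));  // false: a digit is a word character
        System.out.println(m(" ", "\\s"));  // true:  \s is whitespace
        System.out.println(m("x", "\\S"));  // true:  \S is any non-whitespace
    }
}
```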

举个比方:

System.out.println("123".matches("\\d\\d\\d")); // true
System.out.println("a_bc".matches("\\w\\w\\w\\w")); // true
System.out.println("a_b$".matches("\\w\\w\\w\\w")); // false

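Why write "\\"? In a Java string literal, a single \ starts an escape sequence, and "\\" stands for one ordinary backslash. So the Java literal "\\d" is the regex \d.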
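A small sketch of the double escaping: what the Java literal actually contains, and how many backslashes it takes to match one literal backslash. The class name and the sample path are only illustrative.

```java
class BackslashEscaping {
    // The Java literal "\\d" is the 2-character string \d, i.e. the regex \d
    static final String DIGIT = "\\d";
    // The Java literal "C:\\temp" is the text C:\temp (one backslash)
    static final String PATH = "C:\\temp";

    public static void main(String[] args) {
        System.out.println(DIGIT.length());      // 2
        System.out.println("5".matches(DIGIT));  // true
        // One literal backslash needs the regex \\ , written "\\\\" in Java source
        System.out.println(PATH.matches("C:\\\\temp")); // true
    }
}
```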

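Why write \\d and \\w over and over? Because one \d matches exactly one digit, and one \w matches exactly one letter (or underscore or digit).

Writing one pattern per character like this is painfully slow, isn't it????

Java surely has a solution ===> quantifiers (repetition operators)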

* means zero or more times (>= 0):
System.out.println("123".matches("\\d*")); // true
+ means one or more times (>= 1):
System.out.println("a12".matches("\\w+")); // true
System.out.println("ab12".matches("\\w+")); // true
? means zero or one time (0 or 1):
System.out.println("12".matches("\\w?12")); // true
System.out.println("a12".matches("\\w?12")); // true

What if I need a specific number of repetitions??? Use {n} to match exactly n times, or {n,m} to match between n and m times.

System.out.println("12".matches("\\d{2}"));  // true
System.out.println("12".matches("\\d{3}"));  // false
System.out.println("a12".matches("\\w{1,3}")); // true
System.out.println("a123".matches("\\w{1,3}")); // false
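Besides {n} and {n,m}, Java also accepts {n,}, meaning "at least n times, no upper bound". A minimal sketch; the class name is hypothetical:

```java
class Repetition {
    public static void main(String[] args) {
        System.out.println("2024".matches("\\d{4}"));    // true:  exactly 4
        System.out.println("ab".matches("\\w{2,4}"));    // true:  2 to 4
        System.out.println("12345".matches("\\d{3,}"));  // true:  at least 3
        System.out.println("12".matches("\\d{3,}"));     // false: fewer than 3
    }
}
```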

Summary of basic usage

(From Liao Xuefeng's blog)

Tip

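To match a special character literally, escape it with \ (written "\\" inside a Java string literal); for example, \. matches a literal dot rather than any character. The characters ^ $ . | ? * + ( ) { } [ ] and \ itself have special meanings and must be escaped with a backslash.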
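A minimal sketch of why escaping matters; Pattern.quote is a standard java.util.regex helper that escapes an entire string for you. The class name is hypothetical.

```java
import java.util.regex.Pattern;

class EscapeSpecialChars {
    public static void main(String[] args) {
        // Unescaped '.' matches ANY character, so "1x5" is accepted
        System.out.println("1x5".matches("1.5"));        // true
        // "\\." in Java source is the regex \. , which matches only a literal dot
        System.out.println("1x5".matches("1\\.5"));      // false
        System.out.println("1.5".matches("1\\.5"));      // true
        // '+' is a quantifier, so it must be escaped to match a plus sign
        System.out.println("1+1=2".matches("1\\+1=2"));  // true
        // Pattern.quote wraps the text in \Q...\E so every character is literal
        System.out.println("a*b?".matches(Pattern.quote("a*b?"))); // true
    }
}
```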

That completes basic matching.

Advanced matching

Start and end of a string

How do we match at the start or the end of a string?

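In a regular expression, ^ matches the start of the string and $ matches the end. Used together, ^ and $ make the pattern accept only a string that satisfies it completely, not just some substring of it. Note that String.matches() already requires the whole string to match, so in Java the anchors make a visible difference mainly when searching with Matcher.find().

For example, to check whether a string starts with a digit, you can use the following regex: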

String regex = "^\\d.*";

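In this regex, ^ matches the start of the string, \d matches one digit, and .* matches any number of characters (by default . does not match line terminators). So the pattern matches any string that starts with a digit, but not one that merely contains a digit somewhere later.

Similarly, to check whether a string ends with a letter, you can use: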

String regex = ".*[a-zA-Z]$";

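In this regex, $ matches the end of the string, [a-zA-Z] matches one letter, and .* matches any number of characters. So the pattern matches any string that ends with a letter, but not one that merely contains a letter somewhere earlier.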
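Because String.matches() must consume the whole string anyway, the effect of ^ and $ is easiest to see with Matcher.find(). A minimal sketch; the class and the found helper are hypothetical:

```java
import java.util.regex.Pattern;

class AnchorsWithFind {
    // True if find() locates the regex anywhere in the input
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("\\d", "abc1"));        // true: a digit somewhere
        System.out.println(found("^\\d", "abc1"));       // false: no digit at the start
        System.out.println(found("^\\d", "1abc"));       // true
        System.out.println(found("[a-zA-Z]$", "1abc"));  // true: ends with a letter
        System.out.println(found("[a-zA-Z]$", "abc1"));  // false
    }
}
```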

Restricting the range

When we match with \d, any digit from 0 to 9 is accepted. What if I only want 3 to 5?
The answer is square brackets [].
Usage:

// List the allowed characters directly inside the brackets
System.out.println("a34".matches("[abc][345][345]")); // true
// Use - to denote a range "from ... to ..."
System.out.println("a34".matches("[a-c][3-5][3-5]")); // true
// Brackets can be combined with quantifiers
System.out.println("a34".matches("[a-c][3-5]+"));     // true

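What if I want any character except a and c? Listing every other character is not an option; instead, a ^ at the start of the brackets means "none of these":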

System.out.println("a34".matches("[^ac]34")); // false
System.out.println("z34".matches("[^ac]34")); // true
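A sketch combining ranges and negation; isHex and digitsOnly are hypothetical helpers, and digitsOnly borrows String.replaceAll, which is covered later in this article.

```java
class CharClassDemo {
    // Hypothetical helper: true if s is a non-empty hexadecimal string
    static boolean isHex(String s) {
        return s.matches("[0-9a-fA-F]+");
    }

    // Hypothetical helper: delete every character that is NOT a digit
    static String digitsOnly(String s) {
        return s.replaceAll("[^0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(isHex("1aF9"));                     // true
        System.out.println(isHex("1g"));                       // false
        System.out.println(digitsOnly("tel: 199-9999-9999"));  // 19999999999
    }
}
```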

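Alternation (OR matching)

Problem: how do I match a phrase whose last word can vary, such as i love you or i love dog? You might write s.matches("i love you") || s.matches("i love dog"), or just use equals, right? That's far too clumsy.

In Java regex, | expresses an OR relationship: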

System.out.println("i love you".matches("i love (you|dog)")); // true
System.out.println("i love dog".matches("i love (you|dog)")); // true
// Matching is case-sensitive, so "Dog" does not match "dog"
System.out.println("i love dog".matches("i love (you|Dog)")); // false
// The parentheses are required; without them the choice is between "i love you" and "dog"
System.out.println("i love dog".matches("i love you|dog")); // false
System.out.println("dog".matches("i love you|dog")); // true
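Alternation is often combined with other constructs. A minimal sketch, assuming we want to accept file names ending in .jpg, .png or .gif (the helper name and the extension list are made up):

```java
class AlternationDemo {
    // Hypothetical helper: at least one character, a literal dot, then one of three extensions
    static boolean isImage(String name) {
        return name.matches(".+\\.(jpg|png|gif)");
    }

    public static void main(String[] args) {
        System.out.println(isImage("cat.png"));  // true
        System.out.println(isImage("cat.txt"));  // false
        System.out.println(isImage(".png"));     // false: nothing before the dot
    }
}
```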

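Group matching

Suppose a string is not a single uniform run but has a prefix, a middle part, and a suffix, and we need to extract one of those parts, for example "110-1340-220". What then?

For this, regex provides grouping with parentheses (); in Java you read the groups back through Pattern and Matcher.

Let's see how it is used first: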

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("(\\d{3})-(\\d{4})-(\\d{3})");
        Matcher m = p.matcher("110-1340-220");
        if (m.matches()) {
            String g1 = m.group(1);
            String g2 = m.group(2);
            String g3 = m.group(3);
            System.out.println(g1);
            System.out.println(g2);
            System.out.println(g3);
        } else {
            System.out.println("Match failed!");
        }
    }
}
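Besides groups 1 to 3, a Matcher also exposes group 0 (the whole match), the group count, and each group's position. A short sketch reusing the same pattern; the class name is hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupZeroDemo {
    static final Pattern P = Pattern.compile("(\\d{3})-(\\d{4})-(\\d{3})");

    public static void main(String[] args) {
        Matcher m = P.matcher("110-1340-220");
        if (m.matches()) {
            // group(0), same as group(), is the whole match
            System.out.println(m.group(0));                  // 110-1340-220
            // groupCount() counts capturing groups only, not group 0
            System.out.println(m.groupCount());              // 3
            // start(n) is inclusive, end(n) is exclusive
            System.out.println(m.start(2) + "-" + m.end(2)); // 4-8
        }
    }
}
```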

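The following explains why the calls look like this; feel free to skip it.

Calling Pattern.compile builds a Pattern object. The Pattern class keeps the regex text in a String field named pattern, which is assigned when compile creates the object. Calling matcher(...) then creates a Matcher object: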

public Matcher matcher(CharSequence input) {
    if (!compiled) {
        synchronized(this) {
            if (!compiled)
                compile();
        }
    }
    // I can't follow the rest, but this part is clear:
    // the current Pattern instance is passed into the Matcher
    Matcher m = new Matcher(this, input);
    return m;
}

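// When the Matcher is created, it also reads the number of capturing groups and initializes basic fields, such as the groups array, which holds a start and an end index for each group of your regex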

Matcher(Pattern parent, CharSequence text) {
    this.parentPattern = parent;
    this.text = text;
    // Allocate state storage
    int parentGroupCount = Math.max(parent.capturingGroupCount, 10);
    groups = new int[parentGroupCount * 2];
    locals = new int[parent.localCount];
    localsPos = new IntHashSet[parent.localTCNCount];
    // Put fields into initial states
    reset();
}

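Then the Matcher's matches method calls match, which fills in the groups array. From then on the Matcher holds the group information, and group(...) reads it back from that array. I can follow the idea, but honestly not the details of the implementation.

So where does the String.matches() we have been using come from? The String source shows that it simply creates a Pattern and then a Matcher to do the match. Every call to String.matches therefore compiles a new Pattern and creates a new Matcher, which is unnecessary work when the same regex is used again and again.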

public boolean matches(String regex) {
    return Pattern.matches(regex, this);
}

public static boolean matches(String regex, CharSequence input) {
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(input);
    return m.matches();
}

If we keep matching the same fixed format, there is no need to call String.matches() each time; we can compile a Pattern once and reuse it.

Pattern p = Pattern.compile("(\\d{3})-(\\d{4})-(\\d{3})");
System.out.println(p.matcher("110-1340-220").matches()); // true
System.out.println(p.matcher("110-150-220").matches()); // false
System.out.println(p.matcher("1120-1340-220").matches()); // false
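A common idiom is to keep the compiled Pattern in a static final field. According to the Pattern Javadoc, Pattern instances are immutable and safe to share between threads while Matcher instances are not, so each call creates its own lightweight Matcher. A minimal sketch with hypothetical names:

```java
import java.util.regex.Pattern;

class PhoneFormat {
    // Compiled once when the class is initialized, then shared by every call
    static final Pattern PHONE = Pattern.compile("(\\d{3})-(\\d{4})-(\\d{3})");

    static boolean isValid(String s) {
        // Only a new Matcher is created per call; the Pattern is reused
        return PHONE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("110-1340-220")); // true
        System.out.println(isValid("110-150-220"));  // false
    }
}
```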

I also noticed a reset() method, so even the Matcher object can be reused:

Pattern p = Pattern.compile("(\\d{3})-(\\d{4})-(\\d{3})");
Matcher matcher = p.matcher("110-1340-220");
System.out.println(matcher.matches()); // true
// Reset the internal state and point the same Matcher at a new input
matcher.reset("115-1240-33");
System.out.println(matcher.matches()); // false

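The above is just for background.

Non-greedy matching

Quantifiers are greedy by default: \d+, for example, matches as many digits as it can.
Take the example from Liao Xuefeng's blog: capture all the trailing 0s.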

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(\\d+)(0*)");
        Matcher matcher = pattern.matcher("1230000");
        if (matcher.matches()) {
            System.out.println("group1=" + matcher.group(1)); // "1230000"
            System.out.println("group2=" + matcher.group(2)); // ""
        }
    }
}

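Running this, group1 captures all the digits: \d+ is greedy and takes everything from 1 to the end, leaving nothing for 0*. That is clearly not what we want.
To fix this, add ? after the quantifier to make it non-greedy (lazy).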

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(\\d+?)(0*)");
        Matcher matcher = pattern.matcher("1230000");
        if (matcher.matches()) {
            System.out.println("group1=" + matcher.group(1)); // "123"
            System.out.println("group2=" + matcher.group(2)); // "0000"
        }
    }
}

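With the non-greedy marker, \d+? no longer grabs the trailing 0s; it leaves them to the following 0* and matches only as much as the rest of the pattern needs in order to succeed.

Another example: \d{2,8}? matches as few digits as possible (at least 2), while plain \d{2,8} matches as many as it can (up to 8).

Without ?: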

Pattern p = Pattern.compile("(\\d{2,8})(0*)");
Matcher matcher = p.matcher("2220000000");
boolean matches = matcher.matches();
if (matches) {
    System.out.println(matcher.group(1)); // 22200000
    System.out.println(matcher.group(2)); // 00
}

With ?:

Pattern p = Pattern.compile("(\\d{2,8}?)(0*)");
Matcher matcher = p.matcher("2220000000");
boolean matches = matcher.matches();
if (matches) {
    System.out.println(matcher.group(1)); // 222
    System.out.println(matcher.group(2)); // 0000000
}

The difference is obvious!!
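Non-greedy quantifiers matter even more with find(). A sketch comparing greedy .+ and lazy .+? on a made-up HTML snippet; the class and the findAll helper are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyTags {
    // Collect every substring that the regex matches in the input
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b> and <i>italic</i>";
        // Greedy: a single match from the first '<' to the LAST '>'
        System.out.println(findAll("<.+>", html));  // [<b>bold</b> and <i>italic</i>]
        // Lazy: each match stops at the nearest '>'
        System.out.println(findAll("<.+?>", html)); // [<b>, </b>, <i>, </i>]
    }
}
```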

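Search and replace

Splitting strings

String.split() takes a regular expression as its argument, so a regex lets you split cleanly even when the separators are messy and inconsistent.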

"a b c".split("\\s"); // { "a", "b", "c" }
"a b  c".split("[\\s]+"); // { "a", "b", "c" }
"a, b ;; c".split("[\\,\\;\\s]+"); // { "a", "b", "c" }

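Two behaviors of split() are worth knowing: a separator at the very start yields an empty first element, and the one-argument form drops trailing empty strings unless you pass a negative limit. A minimal sketch; the class name is hypothetical:

```java
import java.util.Arrays;

class SplitEdgeCases {
    public static void main(String[] args) {
        // A leading separator produces an empty first element
        System.out.println(Arrays.toString(" a b".split("\\s+")));    // [, a, b]
        // The one-argument split drops trailing empty strings
        System.out.println(Arrays.toString("a,b,,".split(",")));      // [a, b]
        // A negative limit keeps them
        System.out.println(Arrays.toString("a,b,,".split(",", -1)));  // [a, b, , ]
    }
}
```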
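Searching strings

You might think of String.indexOf(), which is very common but not flexible enough:
what if we want words shaped like xox, doc, or wop (an o in the middle with a word character on each side)? We can use: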

Pattern p = Pattern.compile("\\wo\\w");
Matcher matcher = p.matcher("i dog fox wo od hhh opp ppo and");
while (matcher.find()) {
    System.out.println(matcher.group()); // prints dog, then fox
}
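If you also need to know where each match is, Matcher.start() and end() return its position, with end() exclusive. A sketch; the positions helper and its "match@start-end" output format are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindPositions {
    // Returns "match@start-end" for every match; start is inclusive, end exclusive
    static List<String> positions(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(positions("\\wo\\w", "i dog fox wo od hhh opp ppo and"));
        // [dog@2-5, fox@6-9]
    }
}
```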

Backreferences

Now consider String.replaceAll:

String s = "the quick brown fox jumps over the lazy dog.";
String r = s.replaceAll("\\s([a-z]{4})\\s", " <b>$1</b> ");
System.out.println(r); // the quick brown fox jumps <b>over</b> the <b>lazy</b> dog.

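The expression above wraps each matched four-letter word in <b></b>.

What is that $1, and why is it written that way?

Looking at the surface alone doesn't deepen my understanding, so let's analyze the source: String.replaceAll simply calls the replaceAll method of a Matcher, and every call creates new Pattern and Matcher objects.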

public String replaceAll(String regex, String replacement) {
    return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}

Step into Matcher's replaceAll method:

public String replaceAll(String replacement) {
    // Reset the matcher's state
    reset();
    // Call find() to look for the first matching substring
    boolean result = find();
    if (result) {
        // A match was found, so build the result
        StringBuilder sb = new StringBuilder();
        do {
            appendReplacement(sb, replacement);
            result = find();
            // Repeat until every match has been replaced
        } while (result);
        appendTail(sb);
        return sb.toString();
    }
    return text.toString();
}
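The find / appendReplacement / appendTail loop above can also be written by hand when each replacement has to be computed in code, here uppercasing every four-letter word. A sketch assuming Java 9 or later (for the StringBuilder overloads of appendReplacement and appendTail); the class and method names are hypothetical:

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ManualReplace {
    // Uppercase every standalone 4-letter lowercase word
    static String upperFourLetterWords(String s) {
        Matcher m = Pattern.compile("\\b[a-z]{4}\\b").matcher(s);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // Appends the text since the previous match, then the replacement.
            // The replacement is still scanned for $ and \, which is safe here (letters only).
            m.appendReplacement(sb, m.group().toUpperCase(Locale.ROOT));
        }
        // Appends whatever follows the last match
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperFourLetterWords("the quick brown fox jumps over the lazy dog."));
        // the quick brown fox jumps OVER the LAZY dog.
    }
}
```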

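Clicking through layer by layer, I finally reached a method called appendExpandedReplacement, which spells out how the replacement string is processed: when it reads a \, it appends the next character literally; when it reads a $, it checks which kind of group reference follows. A { starts a named-group reference (by the name given when the group was defined, covered below), while a digit starts a numbered-group reference (by the position of the parentheses). The referenced group's text is appended, and processing continues character by character.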
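Because $ and \ are special in the replacement string, inserting text that contains them literally needs escaping; Matcher.quoteReplacement does exactly that. A sketch with a hypothetical template and price:

```java
import java.util.regex.Matcher;

class LiteralDollar {
    // Hypothetical helper: replace the placeholder X with a price such as "$5"
    static String fillPrice(String template, String price) {
        // Without quoteReplacement, "$5" would be read as a reference to group 5
        // and replaceAll would throw, because the regex "X" has no such group
        return template.replaceAll("X", Matcher.quoteReplacement(price));
    }

    public static void main(String[] args) {
        System.out.println(fillPrice("cost: X", "$5")); // cost: $5
    }
}
```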

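Named capturing groups:

They look like (?<year>\\d{4}): the group is wrapped in parentheses, and ?<name> right after the opening parenthesis gives it a name.

An example:
I name the group in parentheses four and refer to it outside the pattern with ${name}, here ${four}: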

String s = "the quick brown fox jumps over the lazy dog.";
String r = s.replaceAll("\\s(?<four>[a-z]{4})\\s", " <b>${four}</b> ");
System.out.println(r); // the quick brown fox jumps <b>over</b> the <b>lazy</b> dog.
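Named groups can also be read back from a Matcher by name, and they still get a number as well. A sketch using a made-up date format; the class name is hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroups {
    static final Pattern DATE = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    public static void main(String[] args) {
        Matcher m = DATE.matcher("2023-07-15");
        if (m.matches()) {
            System.out.println(m.group("year"));  // 2023
            System.out.println(m.group("month")); // 07
            // Named groups are numbered too, so group 3 is "day"
            System.out.println(m.group(3));       // 15
        }
        // ${name} refers to a named group inside a replacement string
        System.out.println(DATE.matcher("2023-07-15").replaceAll("${day}/${month}/${year}")); // 15/07/2023
    }
}
```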

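Numbered capturing groups:

They are referenced with $ followed by a number; groups are numbered by the position of their opening parentheses, counted from the left.

Example: capture groups 1 and 2 with parentheses and swap them (one pair matched):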

String input = "Hello, world! How are you?";
Pattern pattern = Pattern.compile("(\\w+),\\s+(\\w+)!");
Matcher matcher = pattern.matcher(input);
String output = matcher.replaceAll("$2, $1!");
System.out.println(output); // "world, Hello! How are you?"

Capture groups 1 and 2 and swap them (two pairs matched):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String input = "Hello, world! How, are you?";
        Pattern pattern = Pattern.compile("(\\w+),\\s+(\\w+)");
        Matcher matcher = pattern.matcher(input);
        String output = matcher.replaceAll("$2, $1");
        System.out.println(output); // world, Hello! are, How you?
    }
}

Using a capture group to wrap every matched word in <b></b>:

String s = "the quick brown fox jumps over the lazy dog.";
String r = s.replaceAll("\\s([a-z]{4})\\s", " <b>$1</b> ");
System.out.println(r); // the quick brown fox jumps <b>over</b> the <b>lazy</b> dog.
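Strictly speaking, $1 in a replacement string is a group reference; a backreference proper is \1 written inside the pattern itself, and it must match the same text that group 1 matched. A minimal sketch that collapses doubled words; the dedupe helper is hypothetical:

```java
class RepeatedWords {
    // \1 inside the pattern must repeat exactly what group 1 captured;
    // the whole "word word" match is then replaced by a single copy ($1)
    static String dedupe(String s) {
        return s.replaceAll("\\b(\\w+)\\s+\\1\\b", "$1");
    }

    public static void main(String[] args) {
        System.out.println(dedupe("this is is a test test")); // this is a test
    }
}
```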

That's about it. Bye!

[Complete Guide] Java Regular Expressions (this one article is all you need)
