当前位置: 首页 > 编程日记 > 正文

翻译:java.util.regex.Pattern

java.util.regex.Pattern

A compiled representation of a regular expression.

A regular expression(正则表达式), specified as a string, must first be compiled into an instance of this class首先编译成Pattern对象). The resulting pattern can then be used to create a Matcher objectPattern对象可以创建Matcher对象) that can match arbitrary character sequences和任意的字符串进行匹配) against the regular expression. All of the state involved in(涉及) performing a match resides in(存在,属于) the matcher, so many matchers can share(共享) the same pattern.

A typical invocation(调用) sequence is thus

 Pattern p = Pattern.compile("a*b");Matcher m = p.matcher("aaaaab");boolean b = m.matches();

A matches method is defined by this class as a convenience for(为方便) when a regular expression is used just once(只使用一次). This method compiles an expression and matches an input sequence(编译正则表达式、匹配输入的字符串) against it in a single invocation. The statement

 boolean b = Pattern.matches("a*b", "aaaaab");

is equivalent to the three statements above(这一条语句等价于上面的三条语句), though for(虽然) repeated matches it is less efficient(效率较低) since it does not allow the compiled pattern to be reused.

Instances of this class are immutable(不变的) and are safe for use by multiple concurrent threads(多个并发线程使用是安全的). Instances of the Matcher class are not safe for such use(Matcher对象不是线程安全的).

Summary of regular-expression constructs(正则表达式构造)

ConstructMatches
 
Characters
xThe character x
\\The backslash character(反斜杠)
\0nThe character with octal(八进制) value 0n (0 <= n <= 7)
\0nnThe character with octal value 0nn (0 <= n <= 7)
\0mnnThe character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhhThe character with hexadecimal(十六进制) value 0xhh
\uhhhhThe character with hexadecimal value 0xhhhh
\tThe tab character ('\u0009')
\nThe newline (line feed)(新的一行) character ('\u000A')
\rThe carriage-return character(回车换行) ('\u000D')
\fThe form-feed character ('\u000C')(制表字符)
\aThe alert (bell) character ('\u0007')(报警字符)
\eThe escape character ('\u001B')(转义字符)
\cxThe control character corresponding to x(控制字符)
 
Character classes
[abc]a, b, or c (simple class)(单一字符a或b或c)
[^abc]Any character except a, b, or c (negation)(除abc以外的字符)
[a-zA-Z]a through z or A through Z, inclusive(包含在内) (range)(单一字符a到z或者A到Z)
[a-d[m-p]]a through d, or m through p: [a-dm-p] (union)(并集)
[a-z&&[def]]d, e, or f (intersection)(交集)
[a-z&&[^bc]]a through z, except for b and c: [ad-z] (subtraction)(减去)
[a-z&&[^m-p]]a through z, and not m through p: [a-lq-z](subtraction)
 
Predefined character classes
.Any character (may or may not match line terminators)(任何字符)
\dA digit: [0-9](0到9的数字)
\DA non-digit: [^0-9](非数字)
\sA whitespace character: [ \t\n\x0B\f\r](空白字符)
\SA non-whitespace character: [^\s](非空白字符)
\wA word character: [a-zA-Z_0-9](字母下划线数字)
\WA non-word character: [^\w](非字母下划线数字)
 
POSIX character classes (US-ASCII only)
\p{Lower}A lower-case alphabetic character: [a-z]
\p{Upper}An upper-case alphabetic character:[A-Z]
\p{ASCII}All ASCII:[\x00-\x7F]
\p{Alpha}An alphabetic character:[\p{Lower}\p{Upper}]
\p{Digit}A decimal digit: [0-9]
\p{Alnum}An alphanumeric character:[\p{Alpha}\p{Digit}]
\p{Punct}Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph}A visible character: [\p{Alnum}\p{Punct}]
\p{Print}A printable character: [\p{Graph}\x20]
\p{Blank}A space or a tab: [ \t]
\p{Cntrl}A control character: [\x00-\x1F\x7F]
\p{XDigit}A hexadecimal digit: [0-9a-fA-F]
\p{Space}A whitespace character: [ \t\n\x0B\f\r]
 
java.lang.Character classes (simple java character type)
\p{javaLowerCase}Equivalent to java.lang.Character.isLowerCase()
\p{javaUpperCase}Equivalent to java.lang.Character.isUpperCase()
\p{javaWhitespace}Equivalent to java.lang.Character.isWhitespace()
\p{javaMirrored}Equivalent to java.lang.Character.isMirrored()
 
Classes for Unicode blocks and categories
\p{InGreek}A character in the Greek block (simple block)
\p{Lu}An uppercase letter (simple category)
\p{Sc}A currency symbol
\P{InGreek}Any character except one in the Greek block (negation)
[\p{L}&&[^\p{Lu}]] Any letter except an uppercase letter (subtraction)
 
Boundary matchers
^The beginning of a line(标记开始)
$The end of a line(标记结束)
\bA word boundary
\BA non-word boundary
\AThe beginning of the input
\GThe end of the previous match
\ZThe end of the input but for the final terminator, if any
\zThe end of the input
 
Greedy quantifiers
X?X, once or not at all
X*X, zero or more times
X+X, one or more times
X{n}X, exactly n times
X{n,}X, at least n times
X{n,m}X, at least n but not more than m times
 
Reluctant quantifiers
X??X, once or not at all
X*?X, zero or more times
X+?X, one or more times
X{n}?X, exactly n times
X{n,}?X, at least n times
X{n,m}?X, at least n but not more than m times
 
Possessive quantifiers
X?+X, once or not at all
X*+X, zero or more times
X++X, one or more times
X{n}+X, exactly n times
X{n,}+X, at least n times
X{n,m}+X, at least n but not more than m times
 
Logical operators
XYX followed by Y
X|YEither X or Y
(X)X, as a capturing group
 
Back references
\nWhatever the nthcapturing group matched
 
Quotation
\Nothing, but quotes the following character
\QNothing, but quotes all characters until \E
\ENothing, but ends quoting started by \Q
 
Special constructs (non-capturing)
(?:X)X, as a non-capturing group
(?idmsux-idmsux) Nothing, but turns match flags i d m s u x on - off
(?idmsux-idmsux:X) X, as a non-capturing group with the given flags i d m s u x on - off
(?=X)X, via zero-width positive lookahead
(?!X)X, via zero-width negative lookahead
(?<=X)X, via zero-width positive lookbehind
(?<!X)X, via zero-width negative lookbehind
(?>X)X, as an independent, non-capturing group


Backslashes, escapes, and quoting

The backslash character ('\') serves to introduce escaped constructs, as defined in the table above, as well as to quote characters that otherwise would be interpreted as unescaped constructs. Thus the expression \\ matches a single backslash and \{ matches a left brace.

It is an error to use a backslash prior to any alphabetic character that does not denote an escaped construct; these are reserved for future extensions to the regular-expression language. A backslash may be used prior to a non-alphabetic character regardless of whether that character is part of an unescaped construct.

Backslashes within string literals in Java source code are interpreted as required by the

Java Language Specification as either Unicode escapes or other character escapes. It is therefore necessary to double backslashes in string literals that represent regular expressions to protect them from interpretation by the Java bytecode compiler. The string literal "\b", for example, matches a single backspace character when interpreted as a regular expression, while "\\b" matches a word boundary. The string literal "\(hello\)" is illegal and leads to a compile-time error; in order to match the string (hello) the string literal "\\(hello\\)" must be used.

Character Classes

Character classes may appear within other character classes, and may be composed by the union operator (implicit) and the intersection operator (&&). The union operator denotes a class that contains every character that is in at least one of its operand classes. The intersection operator denotes a class that contains every character that is in both of its operand classes.

The precedence of character-class operators is as follows, from highest to lowest:

1Literal escape\x
2Grouping[...]
3Rangea-z
4Union[a-e][i-u]
5Intersection[a-z&&[aeiou]]

Note that a different set of metacharacters are in effect inside a character class than outside a character class. For instance, the regular expression . loses its special meaning inside a character class, while the expression - becomes a range forming metacharacter.

Line terminators

A line terminator is a one- or two-character sequence that marks the end of a line of the input character sequence. The following are recognized as line terminators:

  • A newline (line feed) character ('\n'),
  • A carriage-return character followed immediately by a newline character ("\r\n"),
  • A standalone carriage-return character ('\r'),
  • A next-line character ('\u0085'),
  • A line-separator character ('\u2028'), or
  • A paragraph-separator character ('\u2029).

If

UNIX_LINES mode is activated, then the only line terminators recognized are newline characters.

The regular expression . matches any character except a line terminator unless the DOTALL flag is specified.

By default, the regular expressions ^ and $ ignore line terminators and only match at the beginning and the end, respectively, of the entire input sequence. If MULTILINE mode is activated then ^ matches at the beginning of input and after any line terminator except at the end of input. When in MULTILINE mode $ matches just before a line terminator or the end of the input sequence.

Groups and capturing

Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:

1((A)(B(C)))
2(A)
3(B(C))
4(C)

Group zero always stands for the entire expression.

Capturing groups are so named because, during a match, each subsequence of the input sequence that matches such a group is saved. The captured subsequence may be used later in the expression, via a back reference, and may also be retrieved from the matcher once the match operation is complete.

The captured input associated with a group is always the subsequence that the group most recently matched. If a group is evaluated a second time because of quantification then its previously-captured value, if any, will be retained if the second evaluation fails. Matching the string "aba" against the expression (a(b)?)+, for example, leaves group two set to "b". All captured input is discarded at the beginning of each match.

Groups beginning with (? are pure, non-capturing groups that do not capture text and do not count towards the group total.

Unicode support

This class is in conformance with Level 1 of

Unicode Technical Standard #18: Unicode Regular Expression Guidelines, plus RL2.1 Canonical Equivalents.

Unicode escape sequences such as \u2014 in Java source code are processed as described in �3.3 of the Java Language Specification. Such escape sequences are also implemented directly by the regular-expression parser so that Unicode escapes can be used in expressions that are read from files or from the keyboard. Thus the strings "\u2014" and "\\u2014", while not equal, compile into the same pattern, which matches the character with hexadecimal value 0x2014.

Unicode blocks and categories are written with the \p and \P constructs as in Perl. \p{prop} matches if the input has the property prop, while \P{prop} does not match if the input has that property. Blocks are specified with the prefix In, as in InMongolian. Categories may be specified with the optional prefix Is: Both \p{L} and \p{IsL} denote the category of Unicode letters. Blocks and categories can be used both inside and outside of a character class.

The supported categories are those of

The Unicode Standard in the version specified by the Character class. The category names are those defined in the Standard, both normative and informative. The block names supported by Pattern are the valid block names accepted and defined by UnicodeBlock.forName.

Categories that behave like the java.lang.Character boolean ismethodname methods (except for the deprecated ones) are available through the same \p{prop} syntax where the specified property has the name javamethodname.

Comparison to Perl 5

The Pattern engine performs traditional NFA-based matching with ordered alternation as occurs in Perl 5.

Perl constructs not supported by this class:

  • The conditional constructs (?{X}) and (?(condition)X|Y),

  • The embedded code constructs (?{code}) and (??{code}),

  • The embedded comment syntax (?#comment), and

  • The preprocessing operations \l \u, \L, and \U.

Constructs supported by this class but not by Perl:

  • Possessive quantifiers, which greedily match as much as they can and do not back off, even when doing so would allow the overall match to succeed.

  • Character-class union and intersection as described

  • above.

Notable differences from Perl:

  • In Perl, \1 through \9 are always interpreted as back references; a backslash-escaped number greater than 9 is treated as a back reference if at least that many subexpressions exist, otherwise it is interpreted, if possible, as an octal escape. In this class octal escapes must always begin with a zero. In this class, \1 through \9 are always interpreted as back references, and a larger number is accepted as a back reference if at least that many subexpressions exist at that point in the regular expression, otherwise the parser will drop digits until the number is smaller or equal to the existing number of groups or it is one digit.

  • Perl uses the g flag to request a match that resumes where the last match left off. This functionality is provided implicitly by the Matcher class: Repeated invocations of the find method will resume where the last match left off, unless the matcher is reset.

  • In Perl, embedded flags at the top level of an expression affect the whole expression. In this class, embedded flags always take effect at the point at which they appear, whether they are at the top level or within a group; in the latter case, flags are restored at the end of the group just as in Perl.

  • Perl is forgiving about malformed matching constructs, as in the expression *a, as well as dangling brackets, as in the expression abc], and treats them as literals. This class also accepts dangling brackets but is strict about dangling metacharacters like +, ? and *, and will throw a PatternSyntaxException if it encounters them.

For a more precise description of the behavior of regular expression constructs, please see Mastering Regular Expressions, 3nd Edition, Jeffrey E. F. Friedl, O'Reilly and Associates, 2006.

Since:
1.4
See Also:
String.split(String, int), String.split(String), Serialized Form

相关文章:

学习笔记53—Wilcoxon检验和Mann-whitney检验的区别

Wilcoxon signed-rank test应用于两个related samples Mann–Whitney U test也叫Wilcoxon rank-sum test&#xff0c;应用于两个independent samples的情samples size小的时候&#xff0c;是有列表的&#xff0c;sample size大到20左右时&#xff0c;就可以使用正态分布来近似…

s-systemtap工具使用图谱(持续更新)

整体的学习思维导图如下&#xff0c;后续持续更新完善 文章目录安装简介执行流程执行方式stap脚本语法探针语法API函数探针举例变量使用基本应用1. 定位函数位置2. 查看文件能够添加探针的位置3. 打印函数参数&#xff08;结构体&#xff09;4. 打印函数局部变量5. 修改函数局…

2014 Super Training #8 C An Easy Game --DP

原题&#xff1a;ZOJ 3791 http://acm.zju.edu.cn/onlinejudge/showProblem.do?problemCode3791 题意&#xff1a;给定两个0-1序列s1, s2&#xff0c;操作t次&#xff0c;每次改变m个位置&#xff0c;求把s1改变为s2的方法总数。解法&#xff1a; DP&#xff0c;s1和s2哪些位置…

qq分享组件 android,移动端,分享插件

移动端&#xff0c;分享插件发布时间&#xff1a;2018-06-26 10:03,浏览次数&#xff1a;762最近有一个活动页需要在移动端浏览器分享网页到微信&#xff0c;QQ。虽然每一个浏览器都有分享到微信的能力&#xff0c;但不是每个都提供接口供网页来调用。及时有提供&#xff0c;浏…

MySQL中更改表操作

2019独角兽企业重金招聘Python工程师标准>>> 添加一列&#xff1a; alter table t_stu add tel char(20); 删除一个列&#xff1a; alter table t_stu drop column tel; 添加唯一约束&#xff1a; alter table t_stu add constraint uk_srname unique(scode); 添加主…

Maven-环境配置

二更 可算搞好了&#xff0c;除了下面外&#xff0c;我找到了setting.xml里面的东西&#xff0c;给出来就好了&#xff0c;简单就是mvn -v弄好后&#xff0c;setting.xml里面写好的话&#xff0c;直接加入&#xff0c;然后让eclipse下载jar包 然后就可以运行网上的基本项目了 1…

分布式存储(ceph)技能图谱(持续更新)

一下为个人结合其他人对分布式存储 所需的技能进行总结&#xff0c;绘制成如下图谱&#xff0c;方便针对性学习。 这里对分布式存储系统接触较多的是ceph,所以在分布式存储系统分支上偏向ceph的学习。 如有分类有问题或者分支不合理&#xff0c;欢迎大家批评指正&#xff0c;目…

android如何查看方法属于哪个类,Android Studio查看类中所有方法和属性

css-关于位置当你设置一个你想要相对的模块为relative 你这个模块为absolute 则你的这个absolute会相对relative的那个模块进行移动.微信公众平台自动回复wechatlib&period;jar的生成及wechatlib解析微信公众平台出来有一段时日了,官方提供的自动回复的接口调用大致是这么些…

读javascript高级程序设计03-函数表达式、闭包、私有变量

一、函数声明和函数表达式 定义函数有两种方式&#xff1a;函数声明和函数表达式。它们之间一个重要的区别是函数提升。 1.函数声明会进行函数提升&#xff0c;所以函数调用在函数声明之前也不会报错&#xff1a; test(); function test(){ alert(1); } 2.函数表达式不会进行函…

集成公司内部的多个子系统(兼容B/S和C/S),实现单点登录功能的多系统的统一入口功能...

有一句话也挺有意思的&#xff0c;一直在模仿但从未超越过&#xff0c;文章里的技术也都是相对简单的技术&#xff0c;但是实实在在能解决问题&#xff0c;提高效率。现在人都懒得瞎折腾&#xff0c;能多简单就多简单&#xff0c;谁都不希望总是做一些重复的工作&#xff0c;我…

mysql无法远程连接

在mysql的mysql数据库下&#xff1a; select user,host from user;(查看&#xff0c;没有本机的访问权限) grant all privileges on *.* to root"xxx.xxx.xxx.xxx" identified by "密码";(xx为本机ip,%为所有IP) flush privileges; select user,host from …

哈希表的分类,创建,查找 以及相关问题解决

总体的hash学习导图如下&#xff1a; 文章目录定义分类字符hash排序hash链式hash&#xff08;解决hash冲突&#xff09;创建链式hash查找指定数值STL map(hash)哈希分类 完整测试代码应用&#xff08;常见题目&#xff09;1. 回文字符串&#xff08;Longest Palindrome&#x…

android 自定义音乐圆形进度条,Android自定义View实现音频播放圆形进度条

本篇文章介绍自定义View配合属性动画来实现如下的效果实现思路如下&#xff1a;根据播放按钮的图片大小计算出圆形进度条的大小根据音频的时间长度计算出圆形进度条绘制的弧度通过Handler刷新界面来更新圆形进度条的进度具体实现过程分析&#xff1a;首先来看看自定义View中定义…

jsp error-page没有生效

1、首页检查web.xml中的配置&#xff0c;确保路径是正确的 <error-page> <error-code>404</error-code> <location>/error.jsp</location> </error-page> 2、然后再检查error.jsp文件内容是否有问题&#xff0c;比如只有<head>&…

CTO(首席技术官)

CTO&#xff08;首席技术官&#xff09;英文Chief Technology Officer&#xff0c;即企业内负责技术的最高负责人。这个名称在1980年代从美国开始时兴。起于做很多研究的大公司&#xff0c;如General Electric&#xff0c;AT&T&#xff0c;ALCOA&#xff0c;主要责任是将科…

把数组排成最小的数

题目 输入一个正整数数组&#xff0c;把数组里所有数字拼接起来排成一个数&#xff0c;打印能拼接出的所有数字中最小的一个。例如输入数组{3&#xff0c;32&#xff0c;321}&#xff0c;则打印出这三个数字能排成的最小数字为321323。 思路 一  需要找到字典序最小的哪个排列…

shell脚本自动执行,top命令无输出

shell脚本在系统启动时推后台自动执行&#xff0c;发现其中/usr/bin/top -n 1 -c -b -u ceph 命令并无输出 但是系统启动之后手动执行脚本&#xff0c;&推后台脚本中的top仍然能够正常输出&#xff0c;仅仅是系统发生重启&#xff0c;该功能就不生效了 stackoverflow 推荐…

0709 C语言常见误区----------函数指针问题

1.函数指针的定义 对于函数 void test(int a, int b){ // } 其函数指针类型是void (* ) (int , int)&#xff0c; 注意这里第一个括号不能少&#xff0c; 定义一个函数指针&#xff0c;void (* pfunc)(int , int) ,其中pfunc就是函数指针类型&#xff0c; 它指向的函数类型必须…

android 定时换图片,android 视频和图片切换并进行自动轮播

刚入android没多久&#xff0c;遇到的比较郁闷的问题就是子线程主线程的问题&#xff0c;更改UI界面&#xff0c;本想做一个图片的轮播但是比较简单&#xff0c;然后就试试实现视频跟图片切换播放进行不停的循环播放。也用过不少控件&#xff0c;但是知其然不知其所以然&#x…

Win8:Snap 实现

Win8允许分屏的操作&#xff0c;所以我们必须让我们App能对Snap模式视图做出反应&#xff0c;这样也便于通过Store的审核。如果项目中在Snap展现的功能不大&#xff0c;我们可以仅用一张logo代替&#xff0c;类似系统的商店应用。 我的项目实现效果&#xff1a; 实现思路是在你…

ping命令使用及其常用参数

PING (Packet Internet Groper)&#xff0c;因特网包探索器&#xff0c;用于测试网络连接量检查网络是否连通&#xff0c;可以很好地帮助我们分析和判定网络故障。Ping发送一个ICMP(Internet Control Messages Protocol&#xff09;即因特网信报控制协议&#xff1b;回声请求消…

g-gdb工具使用图谱(持续更新)

如下为整个GDB的学习导图

android bitmap 转drawable,android Drawable转换成Bitmap失败

错误代码&#xff1a;08-07 06:42:30.482 28497-28497/app.tianxiayou E/AndroidRuntime﹕ FATAL EXCEPTION: mainProcess: app.tianxiayou, PID: 28497java.lang.RuntimeException: Unable to start activity ComponentInfo{app.tianxiayou/app.tianxiayou.AppInfoActivity}: …

微软职位内部推荐-Software Development Engineer II

微软近期Open的职位:Job Title:Software Development EngineerIIDivision: Server & Tools Business - Commerce Platform GroupWork Location: Shanghai, ChinaAre you looking for a high impact project that involves processing of billions of dollars, hundreds of …

Lync与Exchange 2013 UM集成:Exchange 配置

有点长时间没有更新文章了&#xff0c;也是由于工作的原因确实忙不过来&#xff0c;好在博客上还有这么多朋友支持&#xff0c;非常的欣慰啊。很久没有给大家带来新东西&#xff0c;真的非常抱歉&#xff0c;今天跟大家带来的是Lync Server 2013与Exchange Server 2013统一消息…

「Java基本功」一文读懂Java内部类的用法和原理

内部类初探 一、什么是内部类&#xff1f; 内部类是指在一个外部类的内部再定义一个类。内部类作为外部类的一个成员&#xff0c;并且依附于外部类而存在的。内部类可为静态&#xff0c;可用protected和private修饰&#xff08;而外部类只能使用public和缺省的包访问权限&#…

从一致性hash到ceph crush算法演进图谱(持续更新)

参考文档&#xff1a; https://ceph.com/wp-content/uploads/2016/08/weil-crush-sc06.pdf Ceph剖析&#xff1a;数据分布之CRUSH算法与一致性Hash

[原]unity3d之http多线程异步资源下载

郑重声明&#xff1a;转载请注明出处 U_探索 本文诞生于乐元素面试过程&#xff0c;被面试官问到AssetBundle多线程异步下载时&#xff0c;愣了半天&#xff0c;同样也被深深的鄙视一回&#xff08;做了3年多u3d 这个都没用过&#xff09;&#xff0c;所以发誓要实现出来填补一…

android首页图片轮播效果,Android_Android自动播放Banner图片轮播效果,先看一下效果图支持本地图 - phpStudy...

Android自动播放Banner图片轮播效果先看一下效果图支持本地图片以及网络图片or本地网络混合。使用方式&#xff1a;android:id"id/banner"android:layout_width"match_parent"android:layout_height"230dip">核心代码&#xff1a;int length …

mongodb 入门

在NOSQL的多个数据库版本中,mongodb相对比较成熟,把学mongodb笔记整理在这&#xff0c;方便以后回顾。这笔记预计分三部分&#xff1a; 一&#xff0c;基础操作&#xff0c;二、增删改查详细操作&#xff0c;三、高级应用。一、在linux在安装mongodb&#xff0c;在linux下安装m…