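A Collection of Regular Expression Syntax Rules

turnmissile's Blog http://blog.csdn.net/turnmissile/

Microsoft has already documented the regular expression rules in MSDN, and interested readers can study them there (ms-help://MS.MSDNQTR.2003OCT.1033/cpgenref/html/cpconRegularExpressionsLanguageElements.htm). Listed here are the tables of syntax elements I found; work through them at your own pace.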

Character Escapes

Escaped character

Description

ordinary characters

Characters other than . $ ^ { [ ( | ) * + ? \ match themselves.

\a

Matches a bell (alarm) \u0007.

\b

Matches a backspace \u0008 if in a [] character class; otherwise, see the note following this table.

\t

Matches a tab \u0009.

\r

Matches a carriage return \u000D.

\v

Matches a vertical tab \u000B.

\f

Matches a form feed \u000C.

\n

Matches a new line \u000A.

\e

Matches an escape \u001B.

\040

Matches an ASCII character as octal (up to three digits); numbers with no leading zero are backreferences if they have only one digit or if they correspond to a capturing group number. (For more information, see Backreferences.) For example, the character \040 represents a space.

\x20

Matches an ASCII character using hexadecimal representation (exactly two digits).

\cC

Matches an ASCII control character; for example, \cC is control-C.

\u0020

Matches a Unicode character using hexadecimal representation (exactly four digits).

\

When followed by a character that is not recognized as an escaped character, matches that character. For example, \* is the same as \x2A.

Note: The escaped character \b is a special case. In a regular expression, \b denotes a word boundary (between \w and \W characters) except within a [] character class, where \b refers to the backspace character. In a replacement pattern, \b always denotes a backspace.
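The special case of \b is easy to check by experiment. The sketch below uses Python's standard `re` module as a stand-in for the .NET engine (an assumption of this example, not something the MSDN tables cover); the three escapes it exercises happen to behave the same way in both. The sample strings are made up for illustration.

```python
import re

# Outside a character class, \b is a zero-width word boundary.
word_hits = re.findall(r"\bcat\b", "cat concat cat's")

# Inside a character class, [\b] is the backspace character U+0008.
backspace_hits = re.findall(r"[\b]", "a\x08b\x08c")

# Octal, hexadecimal, and Unicode escapes can all name the same character
# (here the space, U+0020).
space_forms = [
    re.fullmatch(pattern, " ") is not None
    for pattern in (r"\040", r"\x20", r"\u0020")
]

print(word_hits)
print(backspace_hits)
print(space_forms)
```

Escapes such as \e and \cC are not accepted by Python's `re`, so they are left out of the sketch.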

Character Classes

A character class is a set of characters that will find a match if any one of the characters included in the set matches. The following table summarizes character matching syntax.

Character class

Description

.

Matches any character except \n. If modified by the Singleline option, a period character matches any character. For more information, see Regular Expression Options.

[aeiou]

Matches any single character included in the specified set of characters.

[^aeiou]

Matches any single character not in the specified set of characters.

[0-9a-fA-F]

Use of a hyphen (-) allows specification of contiguous character ranges.

\p{name}

Matches any character in the named character class specified by {name}. Supported names are Unicode groups and block ranges. For example, Ll, Nd, Z, IsGreek, IsBoxDrawing.

\P{name}

Matches text not included in groups and block ranges specified in {name}.

\w

Matches any word character. Equivalent to the Unicode character categories [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \w is equivalent to [a-zA-Z_0-9].

\W

Matches any nonword character. Equivalent to the Unicode categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \W is equivalent to [^a-zA-Z_0-9].

\s

Matches any white-space character. Equivalent to the Unicode character categories [\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \s is equivalent to [ \f\n\r\t\v].

\S

Matches any non-white-space character. Equivalent to the Unicode character categories [^\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \S is equivalent to [^ \f\n\r\t\v].

\d

Matches any decimal digit. Equivalent to \p{Nd} for Unicode and [0-9] for non-Unicode, ECMAScript behavior.

\D

Matches any nondigit. Equivalent to \P{Nd} for Unicode and [^0-9] for non-Unicode, ECMAScript behavior.

You can find the Unicode category a character belongs to with the method
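As a rough illustration of the character classes in the table, here is a small sketch in Python's `re` module rather than .NET (an assumption of this example). Python's `re` does not support \p{name}, and its `re.ASCII` flag is used here only as an approximate analogue of the ECMAScript behavior described above; the sample strings are invented.

```python
import re

# Positive set, negated set, and ranges.
vowels = re.findall(r"[aeiou]", "regular")
non_vowels = re.findall(r"[^aeiou]", "regular")
hex_runs = re.findall(r"[0-9a-fA-F]+", "0x1F")

# By default \d and \w are Unicode-aware; re.ASCII narrows them to
# [0-9] and [a-zA-Z0-9_], similar in spirit to the ECMAScript option.
unicode_digits = re.findall(r"\d", "7\u0663")          # U+0663 is an Arabic-Indic digit
ascii_digits = re.findall(r"\d", "7\u0663", re.ASCII)
unicode_word = re.findall(r"\w+", "caf\u00e9")         # U+00E9 is e with acute accent
ascii_word = re.findall(r"\w+", "caf\u00e9", re.ASCII)

# \s covers space, tab, and newline among others.
spaces = re.findall(r"\s", "a b\tc\n")
```

The exact Unicode category lists behind \w and \s differ slightly between engines, so treat the Unicode results as indicative only.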

Regular Expression Options

and ECMAScript are not allowed inline.

RegexOption member

Inline character

Description

None

N/A

Specifies that no options are set.

IgnoreCase

i

Specifies case-insensitive matching.

Multiline

m

Specifies multiline mode. Changes the meaning of ^ and $ so that they match at the beginning and end, respectively, of any line, not just the beginning and end of the whole string.

ExplicitCapture

n

Specifies that the only valid captures are explicitly named or numbered groups of the form (?<name>…). This allows parentheses to act as noncapturing groups without the syntactic clumsiness of (?:…).

Compiled

N/A

Specifies that the regular expression will be compiled to an assembly. Generates Microsoft intermediate language (MSIL) code for the regular expression; yields faster execution at the expense of startup time.

Singleline

s

Specifies single-line mode. Changes the meaning of the period character (.) so that it matches every character (instead of every character except \n).

IgnorePatternWhitespace

x

Specifies that unescaped white space is excluded from the pattern and enables comments following a number sign (#). (For a list of escaped white-space characters, see Character Escapes.) Note that white space is never eliminated from within a character class.

RightToLeft

N/A

Specifies that the search moves from right to left instead of from left to right. A regular expression with this option moves to the left of the starting position instead of to the right. (Therefore, the starting position should be specified as the end of the string instead of the beginning.) This option cannot be specified in midstream, to prevent the possibility of crafting regular expressions with infinite loops. However, the (?<) lookbehind constructs provide something similar that can be used as a subexpression.

RightToLeft changes the search direction only. It does not reverse the substring that is searched for. The lookahead and lookbehind assertions do not change: lookahead looks to the right; lookbehind looks to the left.

ECMAScript

N/A

Specifies that ECMAScript-compliant behavior is enabled for the expression. This option can be used only in conjunction with the IgnoreCase and Multiline flags. Use of ECMAScript with any other flags results in an exception.

CultureInvariant

N/A

Specifies that cultural differences in language are ignored. See Performing Culture-Insensitive Operations in the RegularExpressions Namespace for more information.
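Three of these options have close counterparts in Python's `re` module, which makes their effect easy to see. The sketch below assumes Python as a stand-in: `re.IGNORECASE`, `re.MULTILINE`, and `re.DOTALL` play the roles of IgnoreCase (i), Multiline (m), and Singleline (s). Options such as ExplicitCapture, RightToLeft, and CultureInvariant have no direct equivalent there and are not shown.

```python
import re

text = "First line\nsecond LINE"

# IgnoreCase (inline character i).
ignore_case = re.findall(r"line", text, re.IGNORECASE)

# Multiline (m): ^ matches at the start of every line, not only the string.
first_words_default = re.findall(r"^\w+", text)
first_words_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# Singleline (s): the period also matches \n.
dot_default = re.match(r".+", text).group()
dot_singleline = re.match(r".+", text, re.DOTALL).group()

# The same option given inline at the start of the pattern.
inline_ignore_case = re.findall(r"(?i)line", text)
```

Without Singleline the period stops at the first newline, so `dot_default` holds only the first line while `dot_singleline` holds the whole string.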

Atomic Zero-Width Assertions

Assertion

Description

^

Specifies that the match must occur at the beginning of the string or the beginning of the line. For more information, see the Multiline option in Regular Expression Options.

$

Specifies that the match must occur at the end of the string, before \n at the end of the string, or at the end of the line. For more information, see the Multiline option in Regular Expression Options.

\A

Specifies that the match must occur at the beginning of the string (ignores the Multiline option).

\Z

Specifies that the match must occur at the end of the string or before \n at the end of the string (ignores the Multiline option).

\z

Specifies that the match must occur at the end of the string (ignores the Multiline option).

\G

Specifies that the match must occur at the point where the previous match ended. When used with Match.NextMatch(), this ensures that matches are all contiguous.

\b

Specifies that the match must occur on a boundary between \w (alphanumeric) and \W (nonalphanumeric) characters. The match must occur on word boundaries, that is, at the first or last characters in words separated by any nonalphanumeric characters.

\B

Specifies that the match must not occur on a \b boundary.
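A short sketch of the anchors, again using Python's `re` module as an assumed stand-in for .NET. Only ^, $, \A, \b, and \B are exercised, because they behave alike in both engines. Python's own \Z matches only at the very end of the string (like the \z row above), and \G is not available in `re`, so both are deliberately left out.

```python
import re

text = "one two\nthree four\n"

# ^ without and with Multiline.
caret_default = re.findall(r"^\w+", text)
caret_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# $ matches before a trailing \n; with Multiline it matches at every line end.
dollar_default = re.findall(r"\w+$", text)
dollar_multiline = re.findall(r"\w+$", text, re.MULTILINE)

# \A ignores Multiline and only matches at the start of the string.
start_only = re.findall(r"\A\w+", text, re.MULTILINE)

# \b requires a word boundary, \B forbids one.
whole_word_at = re.findall(r"\bat\b", "cat at bat")
suffix_at = re.findall(r"\Bat\b", "cat at bat")
```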

Quantifiers

Quantifier

Description

*

Specifies zero or more matches; for example, \w* or (abc)*. Equivalent to {0,}.

+

Specifies one or more matches; for example, \w+ or (abc)+. Equivalent to {1,}.

?

Specifies zero or one matches; for example, \w? or (abc)?. Equivalent to {0,1}.

{n}

Specifies exactly n matches; for example, (pizza){2}.

{n,}

Specifies at least n matches; for example, (abc){2,}.

{n,m}

Specifies at least n, but no more than m, matches.

*?

Specifies the first match that consumes as few repeats as possible (equivalent to lazy *).

+?

Specifies as few repeats as possible, but at least one (equivalent to lazy +).

??

Specifies zero repeats if possible, or one (lazy ?).

{n}?

Equivalent to {n} (lazy {n}).

{n,}?

Specifies as few repeats as possible, but at least n (lazy {n,}).

{n,m}?

Specifies as few repeats as possible between n and m (lazy {n,m}).
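The difference between the greedy and lazy forms is clearest on a concrete string. The following is a minimal sketch in Python's `re` module (assumed here as a stand-in; these quantifiers are written the same way in .NET), with made-up input.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy + takes as much as possible: one match spanning the whole string.
greedy = re.findall(r"<.+>", html)

# Lazy +? takes as little as possible: each tag is matched separately.
lazy = re.findall(r"<.+?>", html)

# {n} requires exactly n repeats of the group.
exactly_two = re.fullmatch(r"(pizza){2}", "pizzapizza") is not None
not_three = re.fullmatch(r"(pizza){2}", "pizzapizzapizza") is None

# {n,m} is greedy, {n,m}? stops at the minimum that still matches.
bounded_greedy = re.match(r"\d{2,4}", "123456").group()
bounded_lazy = re.match(r"\d{2,4}?", "123456").group()

# ?? prefers zero repeats when the rest of the pattern allows it.
optional_lazy = re.match(r"a??", "a").group()
```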

Grouping Constructs

Grouping constructs allow you to capture groups of subexpressions and to increase the efficiency of regular expressions with noncapturing lookahead and lookbehind modifiers. The following table describes the Regular Expression Grouping Constructs.

Grouping construct

Description

(   )

Captures the matched substring (or noncapturing group; for more information, see the ExplicitCapture option in Regular Expression Options). Captures using () are numbered automatically based on the order of the opening parenthesis, starting from one. The first capture, capture element number zero, is the text matched by the whole regular expression pattern.

(?<name>   )

Captures the matched substring into a group name or number name. The string used for name must not contain any punctuation and it cannot begin with a number. You can use single quotes instead of angle brackets; for example, (?'name').

(?<name1-name2> )

Balancing group definition. Deletes the definition of the previously defined group name2 and stores in group name1 the interval between the previously defined name2 group and the current group. If no group name2 is defined, the match backtracks. Because deleting the last definition of name2 reveals the previous definition of name2, this construct allows the stack of captures for group name2 to be used as a counter for keeping track of nested constructs such as parentheses. In this construct, name1 is optional. You can use single quotes instead of angle brackets; for example, (?'name1-name2').

(?:   )

Noncapturing group.

(?imnsx-imnsx:   )

Applies or disables the specified options within the subexpression. For example, (?i-s: ) turns on case insensitivity and disables single-line mode. For more information, see Regular Expression Options.

(?=   )

Zero-width positive lookahead assertion. Continues match only if the subexpression matches at this position on the right. For example, \w+(?=\d) matches a word followed by a digit, without matching the digit. This construct does not backtrack.

(?!   )

Zero-width negative lookahead assertion. Continues match only if the subexpression does not match at this position on the right. For example, \b(?!un)\w+\b matches words that do not begin with un.

(?<=   )

Zero-width positive lookbehind assertion. Continues match only if the subexpression matches at this position on the left. For example, (?<=19)99 matches instances of 99 that follow 19. This construct does not backtrack.

(?<!   )

Zero-width negative lookbehind assertion. Continues match only if the subexpression does not match at the position on the left.

(?>   )

Nonbacktracking subexpression (also known as a "greedy" subexpression). The subexpression is fully matched once, and then does not participate piecemeal in backtracking. (That is, the subexpression matches only strings that would be matched by the subexpression alone.)
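The lookahead and lookbehind examples from the table can be tried directly. This sketch uses Python's `re` module as an assumed stand-in for .NET. One syntax difference matters: Python spells named groups (?P<name>...) instead of (?<name>...). Balancing groups have no counterpart in `re` and are not shown. The input strings are invented for the example.

```python
import re

# (?= ) positive lookahead: the digit is required but not consumed.
before_digit = re.findall(r"\w+(?=\d)", "abc1 def")

# (?! ) negative lookahead: words that do not begin with "un".
not_un = re.findall(r"\b(?!un)\w+\b", "undo redo unfold fold")

# (?<= ) positive lookbehind: 99 only when it follows 19.
after_19 = [m.start() for m in re.finditer(r"(?<=19)99", "1999 2099 199")]

# (?<! ) negative lookbehind: numbers not preceded by a minus sign.
unsigned = re.findall(r"(?<!-)\b\d+", "5 -7 12")

# ( ) captures and (?: ) does not; group 0 is the whole match.
m = re.match(r"(?:ab)+(c)", "ababc")
whole, captured = m.group(0), m.groups()

# Named capture (Python spelling of the (?<name> ) construct).
date = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", "2003-10")
year, month = date.group("year"), date.group("month")
```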

Named captures are numbered sequentially, based on the left-to-right order of the opening parenthesis (like unnamed captures), but numbering of named captures starts after all unnamed captures have been counted. For instance, the pattern ((?<One>abc)\d+)?(?<Two>xyz)(.*) produces the following capturing groups by number and name. (The first capture (number 0) always refers to the entire pattern).

Number    Name                Pattern
0         0 (default name)    ((?<One>abc)\d+)?(?<Two>xyz)(.*)
1         1 (default name)    ((?<One>abc)\d+)
2         2 (default name)    (.*)
3         One                 (?<One>abc)
4         Two                 (?<Two>xyz)

Backreference Constructs

The following table lists optional parameters that add backreference modifiers to a regular expression.

Backreference construct

Definition

\number

Backreference. For example, (\w)\1 finds doubled word characters.

\k<name>

Named backreference. For example, (?<char>\w)\k<char> finds doubled word characters. The expression (?<43>\w)\43 does the same. You can use single quotes instead of angle brackets; for example, \k'char'.

Note the ambiguity between octal escape codes and \number backreferences that use the same notation. See Backreferences for details on how the regular expression engine resolves the ambiguity.
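The "doubled word characters" example is quick to verify. The sketch below assumes Python's `re` module rather than .NET: the numbered form \1 is identical, while the named form is written (?P<char>...) and (?P=char) in Python instead of (?<char>...) and \k<char>.

```python
import re

word = "bookkeeper"

# Numbered backreference: findall returns the captured character of each pair.
doubled_chars = re.findall(r"(\w)\1", word)

# Named backreference (Python spelling); finditer gives the full matches.
doubled_pairs = [
    m.group() for m in re.finditer(r"(?P<char>\w)(?P=char)", word)
]
```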

Miscellaneous Constructs

The following table lists subexpressions that modify a regular expression.

Construct

Definition

(?imnsx-imnsx)

Sets or disables options such as case insensitivity to be turned on or off in the middle of a pattern. For information on specific options, see Regular Expression Options. Option changes are effective until the end of the enclosing group. See also the information on the grouping construct (?imnsx-imnsx: ), which is a cleaner form.

(?# )

Inline comment inserted within a regular expression. The comment terminates at the first closing parenthesis character.

# [to end of line]

X-mode comment. The comment begins at an unescaped # and continues to the end of the line. (Note that the x option or the RegexOptions.IgnorePatternWhitespace enumerated option must be activated for this kind of comment to be recognized.)
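Both comment styles, plus the scoped option group, can be sketched in Python's `re` module, used here as an assumed stand-in for .NET: `re.VERBOSE` corresponds to the x option (IgnorePatternWhitespace). The date-like pattern is made up for the example.

```python
import re

# (?# ) inline comments end at the first closing parenthesis.
inline_commented = re.compile(
    r"\d{4}(?# four-digit year)-\d{2}(?# two-digit month)"
)

# X-mode: unescaped white space is ignored and # starts a comment.
verbose = re.compile(
    r"""
    \d{4}   # four-digit year
    -       # literal hyphen
    \d{2}   # two-digit month
    [ ]     # white space inside a character class is kept
    \#\d+   # an escaped number sign matches a literal one
    """,
    re.VERBOSE,
)

inline_ok = inline_commented.fullmatch("2003-10") is not None
verbose_ok = verbose.fullmatch("2003-10 #7") is not None

# Scoped option group: case-insensitivity applies only inside (?i: ).
scoped_hit = re.fullmatch(r"(?i:regex) Rules", "REGEX Rules") is not None
scoped_miss = re.fullmatch(r"(?i:regex) Rules", "REGEX rules") is None
```

Recent Python versions only accept the unscoped (?i) form at the very start of a pattern, so the scoped (?i: ) group is the form shown for a mid-pattern change.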
