Find certain colons in string using Regex

I’m trying to find colons in a given string so that I can split the string at each one for preprocessing. A colon should be handled according to the following conditions:

  • Match if it is preceded or followed by a word, e.g. A Book: Chapter 1 or A Book :Chapter 1
  • Do not match if it is part of an emoticon, e.g. :( or ): or :/ or :-)
  • Do not match if it is part of a time, e.g. 16:00

I’ve come up with this regex:

(\:)(?=\w)|(?<=\w)(\:)

which satisfies conditions 1 & 2 but still fails on condition 3, as it matches the colon in the string representation of a time. How do I fix this?

Edit: it has to be a single regex if possible.

Solution:

You can use

(:\b|\b:)(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b)

See the regex demo. Details:

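  • (:\b|\b:) – Group 1: a : that is either immediately followed by a word char (:\b) or immediately preceded by one (\b:)
  • (?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b) – a negative lookahead that fails the match if the : is preceded by exactly one or two digits (themselves preceded by a word boundary) and is followed by one or two digits and then a word boundary, i.e. if the : sits inside something that looks like a time such as 9:30 or 16:00.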
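As a rough illustration, here is how the pattern might be used in Python to do the split. This is only a sketch: the helper names colon_positions and split_on_colons are made up for this example, and it assumes Python's re module, which accepts the two lookbehinds because each has a fixed width.

```python
import re

# The pattern from the answer, unchanged.
COLON = re.compile(r"(:\b|\b:)(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b)")


def colon_positions(text):
    """Return the index of every colon the pattern accepts."""
    return [m.start() for m in COLON.finditer(text)]


def split_on_colons(text):
    """Split text at each accepted colon and trim surrounding whitespace.

    finditer is used instead of re.split because re.split would also
    return the captured colons as separate list items.
    """
    parts, last = [], 0
    for m in COLON.finditer(text):
        parts.append(text[last:m.start()].strip())
        last = m.end()
    parts.append(text[last:].strip())
    return parts


if __name__ == "__main__":
    for sample in ["A Book: Chapter 1", "A Book :Chapter 1",
                   "Meet at 16:00 :)", "Note: starts at 9:30 :/"]:
        print(repr(sample), "->", split_on_colons(sample))
```

Strings that contain only a time or an emoticon colon come back as a single unsplit item.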

Note that :\b is equivalent to :(?=\w) and \b: is equivalent to (?<=\w):, because : is itself a non-word character.

If you need to get the same capturing groups as in your original pattern, replace (:\b|\b:) with (?:(:)\b|\b(:)).
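To see what the two-group variant reports, a small Python sketch may help. The function name colon_groups is hypothetical; the pattern is the one from the answer with the suggested replacement applied. Group 1 is set when the colon is followed by a word char, group 2 when it is only preceded by one.

```python
import re

# Variant with two capturing groups, mirroring the original pattern's groups.
GROUPED = re.compile(
    r"(?:(:)\b|\b(:))(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b)"
)


def colon_groups(text):
    """Return a (group 1, group 2) tuple for every accepted colon.

    The group that did not participate in the match is None.
    """
    return [m.groups() for m in GROUPED.finditer(text)]


if __name__ == "__main__":
    for sample in ["A Book :Chapter 1", "A Book: Chapter 1", "16:00"]:
        print(repr(sample), "->", colon_groups(sample))
```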
