# Split String at First Occurrence of an Integer using R

Note: I have already read "Split string at first occurrence of an integer in a string"; however, my request is different because I would like to use R.

Suppose I have the following example data frame:

``````
df <- data.frame(name_and_address =
                   c("Mr. Smith12 Some street",
                     "Mr. Jones345 Another street",
                     "Mr. Anderson6 A different street"))
df
##                   name_and_address
## 1          Mr. Smith12 Some street
## 2      Mr. Jones345 Another street
## 3 Mr. Anderson6 A different street
``````

I would like to split the string at the first occurrence of an integer. Notice that the integers are of varying length.

The desired output would look like the following:

``````
[[1]]
[1] "Mr. Smith"
[2] "12 Some street"

[[2]]
[1] "Mr. Jones"
[2] "345 Another street"

[[3]]
[1] "Mr. Anderson"
[2] "6 A different street"
``````

I have tried the following, but I cannot get the regular expression right:

``````
# Attempt 1 (Does not work)
library(data.table)
tstrsplit(df,'(?=\\d+)', perl=TRUE, type.convert=TRUE)

# Attempt 2 (Does not work)
library(stringr)
str_split(fha_ltc, "\\d+")
``````

### Solution:

You can use tidyr::extract:

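``````
library(tidyr)
# Group 1 and group 2 of the regex become two new columns.
# The column names "name" and "address" are assumed here.
df <- df %>%
  extract(name_and_address, c("name", "address"), "(\\D*)(\\d.*)")
## => df
##           name              address
## 1    Mr. Smith       12 Some street
## 2    Mr. Jones   345 Another street
## 3 Mr. Anderson 6 A different street
``````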

The (\D*)(\d.*) regex matches the following:

• (\D*) – Group 1: zero or more non-digit characters
• (\d.*) – Group 2: a digit followed by zero or more characters, as many as possible.
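
To see what the two groups capture without loading any package, the same pattern can be tried with base R's regexec() and regmatches(). This is only an illustrative sketch, not part of the original answer; the variable names x, m and parts are made up for the example.

```r
x <- c("Mr. Smith12 Some street",
       "Mr. Jones345 Another street",
       "Mr. Anderson6 A different street")

# regexec() records where the full match and each capture group start;
# regmatches() then pulls out the matched substrings.
m <- regmatches(x, regexec("(\\D*)(\\d.*)", x, perl = TRUE))

# Each list element is c(full match, group 1, group 2); drop the full match.
parts <- lapply(m, function(g) g[-1])
parts[[1]]
## [1] "Mr. Smith"      "12 Some street"
```

Group 1 stops at the first digit because \D cannot match it, and group 2 takes everything from that digit to the end of the string.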

Another solution with stringr::str_split is also possible:

``````
library(stringr)
# Assumes df still holds the original name_and_address column
str_split(df$name_and_address, "(?=\\d)", n = 2)
## => [[1]]
## [1] "Mr. Smith"      "12 Some street"

## [[2]]
## [1] "Mr. Jones"          "345 Another street"

## [[3]]
## [1] "Mr. Anderson"         "6 A different street"
``````

The (?=\d) positive lookahead matches the position right before a digit without consuming the digit, and n=2 tells stringr::str_split to return at most 2 pieces, so only the first such position is used as a split point.
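
If stringr is not available, the same idea (find the position of the first digit, then cut the string there) can be sketched in base R with regexpr() and substr(). The helper name split_at_first_digit is hypothetical and not from the original answer, and returning a string with no digit unsplit is a choice made for this sketch.

```r
# Hypothetical helper: split each string immediately before its first digit.
split_at_first_digit <- function(x) {
  pos <- regexpr("[0-9]", x)  # position of the first digit, -1 if there is none
  lapply(seq_along(x), function(i) {
    if (pos[i] == -1) return(x[i])  # no digit: return the string unsplit
    c(substr(x[i], 1, pos[i] - 1),
      substr(x[i], pos[i], nchar(x[i])))
  })
}

split_at_first_digit("Mr. Jones345 Another street")
## [[1]]
## [1] "Mr. Jones"          "345 Another street"
```

Because regexpr() reports only the first match, no equivalent of n=2 is needed here.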
