# Biopython Sequence Operations (Python)

## Biopython Sequence operations

### 1. Creating a Seq object

```python
from Bio.Seq import Seq

myseq = Seq("AGTACACTCA")
print(myseq)        # AGTACACTCA
print(type(myseq))  # <class 'Bio.Seq.Seq'>
```

Note: a Seq object is not the same as a standard Python string.

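### 2. Methods supported by Seq objects

2.1 Like standard Python strings, Seq objects support iteration over characters, length calculation, indexing, slicing, and concatenation, as well as .count() for substring searches, .join(), and case conversion.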

```python
# Iterate over the sequence with the index of each letter
for i, letter in enumerate(myseq):
    print(i, letter)

print(len(myseq))            # 10
print(myseq[0])              # A
print(myseq[0::2])           # ATCCC (every second letter)
print(myseq[::-1])           # ACTCACATGA (reversed)
print(myseq + Seq("AGTAA"))  # AGTACACTCAAGTAA
print(myseq.count("CA"))     # 2 (non-overlapping occurrences)
print(myseq.lower())         # agtacactca
```

```
0 A
1 G
2 T
3 A
4 C
5 A
6 C
7 T
8 C
9 A
10
A
ATCCC
ACTCACATGA
AGTACACTCAAGTAA
2
agtacactca
```

2.2 Computing the GC content of a Seq object

```python
# Older API (superseded by gc_fraction in Biopython 1.80): GC() returns a percentage
from Bio.SeqUtils import GC
print(GC(myseq))  # 40.0
```

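```python
"""
In Biopython 1.80 and later, the GC-content function GC is replaced by gc_fraction:
Bio.SeqUtils.gc_fraction(seq, ambiguous='remove')

Ambiguous nucleotides are those other than ATCGSW (S is G or C, and W is A or T).
gc_fraction returns a fraction between 0 and 1, not a percentage.
"""
from Bio.SeqUtils import gc_fraction
print(gc_fraction(myseq))  # 0.4
```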
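To make the `ambiguous='remove'` rule described above concrete, here is a minimal plain-Python sketch of the same calculation. The name `gc_fraction_sketch` is hypothetical and this is not Biopython's implementation; it only assumes the rule as stated above: count G, C and S, and divide by the number of A, T, C, G, S and W letters.

```python
def gc_fraction_sketch(seq):
    """Fraction of G/C/S among the letters ATCGSW (illustrative only)."""
    letters = str(seq).upper()
    gc = sum(letters.count(x) for x in "GCS")
    total = sum(letters.count(x) for x in "ATCGSW")
    # Returning 0.0 when nothing can be counted is an assumption of this sketch
    return gc / total if total else 0.0


print(gc_fraction_sketch("AGTACACTCA"))   # 4 of the 10 letters are G or C
print(gc_fraction_sketch("AGTACACTCAN"))  # N is dropped before counting
```

Because other letters are removed from both the numerator and the denominator, adding an `N` to the sequence should leave the result unchanged.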

2.3 Converting a Seq object to a string

```python
myseq1 = Seq("CACTCA")
print(str(myseq1))
# %s formatting calls str() on the Seq, giving a minimal FASTA-style record
print(">name\n%s\n" % myseq1)
```

```
CACTCA
>name
CACTCA
```

2.4 Getting the complement of a nucleotide Seq object

```python
myseq2 = Seq("CGATAA")
print(myseq2.complement())          # GCTATT
print(myseq2.reverse_complement())  # TTATCG
```
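What these two methods compute can be sketched in plain Python with a translation table. The function names below are hypothetical, and the sketch covers only the four unambiguous DNA letters; Biopython itself also handles ambiguous IUPAC codes, which this version leaves unchanged.

```python
# Watson-Crick pairs for unambiguous DNA, upper and lower case
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement_sketch(dna):
    """Swap each base for its pairing partner, keeping the order."""
    return str(dna).translate(_DNA_COMPLEMENT)


def reverse_complement_sketch(dna):
    """Complement the sequence, then read it in the opposite direction."""
    return complement_sketch(dna)[::-1]


print(complement_sketch("CGATAA"))          # GCTATT
print(reverse_complement_sketch("CGATAA"))  # TTATCG
```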

2.5 Transcription

```python
coding_seq = Seq("GCAATCGAT")
template_seq = coding_seq.reverse_complement()
print(template_seq)  # ATCGATTGC

messenger_seq = coding_seq.transcribe()  # transcription: coding strand to mRNA
print(messenger_seq)  # GCAAUCGAU

back_messenger_seq = messenger_seq.back_transcribe()  # back-transcription: mRNA to DNA
print(back_messenger_seq)  # GCAATCGAT
```
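As the output above suggests, `transcribe()` works on the coding strand, so the mRNA is the coding sequence with T replaced by U. A minimal plain-Python sketch of that substitution follows; the function names are hypothetical and this is not Biopython's code.

```python
def transcribe_sketch(coding_dna):
    """Coding-strand DNA to mRNA: replace T with U, preserving case."""
    return str(coding_dna).replace("T", "U").replace("t", "u")


def back_transcribe_sketch(mrna):
    """mRNA back to coding-strand DNA: replace U with T, preserving case."""
    return str(mrna).replace("U", "T").replace("u", "t")


coding = "GCAATCGAT"
mrna = transcribe_sketch(coding)
print(mrna)                          # GCAAUCGAU
print(back_transcribe_sketch(mrna))  # GCAATCGAT
```

To start from the template strand instead, take its reverse complement first to recover the coding strand, then transcribe.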