1.6 A Class for Biological Sequences

🧬 Building a MySeq class made up of functions that process sequences

# Standard genetic code dictionary for translating a codon into an amino acid

def translate_codon(cod):
    
    tc = {"GCA":"A", "GCC":"A", "GCG":"A", "GCT":"A",
          "TGT":"C", "TGC":"C",
          "GAT":"D", "GAC":"D",
          "GAA":"E", "GAG":"E",
          "TTT":"F", "TTC":"F",
          "GGT":"G", "GGC":"G", "GGA":"G", "GGG":"G",
          "CAT":"H", "CAC":"H",
          "ATA":"I", "ATT":"I", "ATC":"I",
          "AAA":"K", "AAG":"K",
          "TTA":"L", "TTG":"L", "CTT":"L", "CTC":"L", "CTA":"L", "CTG":"L",
          "ATG":"M", 
          "AAT":"N", "AAC":"N",
          "CCT":"P", "CCC":"P", "CCA":"P", "CCG":"P",
          "CAA":"Q", "CAG":"Q",
          "CGT":"R", "CGC":"R", "CGA":"R", "CGG":"R", "AGA":"R", "AGG":"R",
          "TCT":"S", "TCC":"S", "TCA":"S", "TCG":"S", "AGT":"S", "AGC":"S",
          "ACT":"T", "ACC":"T", "ACA":"T", "ACG":"T",
          "GTT":"V", "GTC":"V", "GTA":"V", "GTG":"V",
          "TGG":"W",
          "TAT":"Y", "TAC":"Y",
          "TAA":"_", "TAG":"_", "TGA":"_"}
          
    # dict.get returns None when the codon is not in the table
    return tc.get(cod)

# Class for biological sequences

class MySeq:

    # Attributes of MySeq: seq and seq_type (default: DNA)
    def __init__(self, seq, seq_type = "DNA"):         
        self.seq = seq.upper()
        self.seq_type = seq_type
    
    # Return the length of seq
    def __len__(self):                                           
        return len(self.seq)
    
    # Return the n-th element of seq
    def __getitem__(self, n):                                   
        return self.seq[n]
    
    # Slice seq (only Python 2 calls __getslice__; Python 3 passes slices to __getitem__)
    def __getslice__(self, i, j):                               
        return self.seq[i:j]
    
    # Return seq as a string
    def __str__(self):                                          
        return self.seq
    
    # Return the seq_type of seq
    def get_seq_biotype(self):                                      
        return self.seq_type
    
    # Print information about seq: the sequence and its biotype
    def show_info_seq(self):                                     
        print("Sequence: " + self.seq + " biotype: " + self.seq_type)
        
    # Allowed characters for each sequence type
    def alphabet(self):
        if(self.seq_type == "DNA"): return "ACGT"
        elif (self.seq_type == "RNA"): return "ACGU"
        elif (self.seq_type == "PROTEIN"): return "ACDEFGHIKLMNPQRSTVWY"
        else: return None

    # Validate the sequence
    def validate(self):
        alp = self.alphabet()                             # allowed characters for this seq_type
        res = True
        i = 0
        while i < len(self.seq) and res:
            if self.seq[i] not in alp: res = False        # a character outside the alphabet makes the sequence invalid
            else: i += 1                                  # otherwise move on to the next character
        return res
    
    # Transcription: convert a DNA sequence into an RNA sequence
    def transcription(self):
        if(self.seq_type == "DNA"):
            return MySeq(self.seq.replace("T","U"), "RNA")  # replace T with U and tag the result as RNA
        else:
            return None
    
    # Compute the reverse complement of a DNA sequence
    def reverse_comp(self):
        if(self.seq_type != "DNA"): return None
        comp = ""
        for c in self.seq:
            if (c == "A"): comp = "T" + comp
            elif (c == "T"): comp = "A" + comp
            elif (c == "C"): comp = "G" + comp
            elif (c == "G"): comp = "C" + comp
        return MySeq(comp, "DNA") 
    
    # Translation: build the protein sequence
    def translate(self, iniPos = 0):
        if(self.seq_type != "DNA"): return None
        seq_aa = ""
        for pos in range(iniPos, len(self.seq)-2, 3):
            cod = self.seq[pos:pos+3]
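            # translate_codon() is defined outside the class; module-level functions are callable from methods.
            # It returns None for a codon outside the table (e.g. an unvalidated sequence),
            # in which case this concatenation raises TypeError.
            seq_aa += translate_codon(cod)
        return MySeq(seq_aa, "PROTEIN")         # the result is tagged as PROTEIN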
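
A note on slicing, since the listing above defines `__getslice__`: Python 3 never calls that method and instead hands a `slice` object to `__getitem__`. The following is a minimal sketch of that behavior; `SliceDemo` is a hypothetical stand-in for MySeq, reduced to the indexing part.

```python
class SliceDemo:
    """Hypothetical stand-in for MySeq, reduced to indexing and slicing."""

    def __init__(self, seq):
        self.seq = seq.upper()

    def __getitem__(self, n):
        # n is an int for obj[0] and a slice object for obj[1:4];
        # str accepts both, so one method covers indexing and slicing.
        return self.seq[n]


demo = SliceDemo("atggga")
first = demo[0]         # int index
window = demo[1:4]      # slice object passed to __getitem__
print(first, window)    # prints: A TGG
```

Because MySeq's `__getitem__` already forwards its argument to `self.seq`, slicing a MySeq object works in Python 3 even though `__getslice__` is never used.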

1.6.1. Checking Whether a Sequence Is Valid


🧬 The validate( ) method of the MySeq class

s1 = MySeq("ATGGGATCGTAGTCGTACTAGCTAGCTGATGGTACTCGATAGTCTACGTAGCTAGTGGTACTGGATGGTACTCAGTAACAT")
s2 = MySeq("MKVVLSVQERSVVSLL", "PROTEIN")
print(s1.validate(), s2.validate())    
>> True True

For an invalid sequence, it returns False, as shown below.

s3 = MySeq("GTYSAFADASDBASDAF")
print(s3.validate())
>> False
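
The same check can be written more compactly with `all()`. This is only a sketch, not the book's code: `ALPHABETS` and `is_valid` are hypothetical names, and treating an unknown sequence type as invalid is an assumption (the class above would raise an error in that case).

```python
# Allowed characters per sequence type, mirroring MySeq.alphabet()
ALPHABETS = {
    "DNA": "ACGT",
    "RNA": "ACGU",
    "PROTEIN": "ACDEFGHIKLMNPQRSTVWY",
}


def is_valid(seq, seq_type="DNA"):
    """Return True if every character of seq belongs to the type's alphabet."""
    alphabet = ALPHABETS.get(seq_type)
    if alphabet is None:
        return False  # assumption: an unknown type counts as invalid
    # all() stops at the first character outside the alphabet
    return all(c in alphabet for c in seq.upper())


print(is_valid("ATGGGATCG"))                    # True
print(is_valid("MKVVLSVQERSVVSLL", "PROTEIN"))  # True
print(is_valid("GTYSAFADASDBASDAF"))            # False
```

Like the while loop in validate( ), `all()` short-circuits, so it does no extra work once an invalid character is found.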

1.6.2. Transcription / Sequence Info


🧬 The transcription( ) and show_info_seq( ) methods of the MySeq class

s1_rna = s1.transcription()
s1_rna.show_info_seq()
>> Sequence: AUGGGAUCGUAGUCGUACUAGCUAGCUGAUGGUACUCGAUAGUCUACGUAGCUAGUGGUACUGGAUGGUACUCAGUAACAU biotype: RNA

1.6.3. Reverse Complement


🧬 The reverse_comp( ) method of the MySeq class

s1_reverse = s1.reverse_comp()
s1_reverse.show_info_seq()
>> Sequence: ATGTTACTGAGTACCATCCAGTACCACTAGCTACGTAGACTATCGAGTACCATCAGCTAGCTAGTACGACTACGATCCCAT biotype: DNA
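
As an alternative to the character-by-character loop in reverse_comp( ), the complement can be built with a translation table and the result reversed with a slice. This is a sketch under that assumption; `COMPLEMENT` and `reverse_complement` are hypothetical names, not part of the MySeq class.

```python
# Translation table mapping each base to its complement
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGGGATCGTAG"))  # CTACGATCCCAT
```

One difference in behavior: the loop in reverse_comp( ) silently drops characters other than A, C, G and T, while `str.translate` leaves them in place, so it is safest to validate the sequence first.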

1.6.4. Translation


🧬 The translate( ) method of the MySeq class

s1_prot_2 = s1.translate()
s1_prot_2.show_info_seq() 
>> Sequence: MGS_SY_LADGTR_ST_LVVLDGTQ_H biotype: PROTEIN

🧬 If it is unclear whether the given sequence is the template strand or the non-template strand, translate the reverse complement as well and compare.

# Translate the reverse complement
s1_prot = s1_reverse.translate()
s1_prot.show_info_seq() 
>> Sequence: MLLSTIQYH_LRRLSSTIS_LVRLRSH biotype: PROTEIN
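
Translating both strands can also be sketched as standalone functions. The version below builds all 64 codons of the standard genetic code from a compact string instead of typing the dictionary by hand, which makes a missing entry harder to overlook. The names `CODON_TABLE`, `translate` and `translate_both_strands` are hypothetical, and `_` marks a stop codon as in the outputs above.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO = "FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna, ini_pos=0):
    """Translate codon by codon, starting at ini_pos; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[p:p + 3]]
                   for p in range(ini_pos, len(dna) - 2, 3))


def translate_both_strands(dna):
    """Return the translation of the given strand and of its reverse complement."""
    rev = dna.translate(COMPLEMENT)[::-1]
    return translate(dna), translate(rev)


forward, reverse = translate_both_strands("ATGGGATCGTAG")
print(forward)  # MGS_
print(reverse)  # LRSH
```

The `ini_pos` argument plays the same role as `iniPos` in translate( ): passing 1 or 2 shifts the reading frame by that many bases.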

🧬 With this post I have finished reviewing all of Chapter 1. These basic sequence-handling routines will feed into other tasks later on, and I think getting the first steps right is what lets everything after them go smoothly. None of the algorithms were grand, but several of the ideas look usable in areas beyond biology. I should come back and reread this from time to time while studying!


πŸ’‘ Bioinformatics Algorithms(μ—μ΄μ½˜μΆœνŒ, 2020)λ₯Ό κ³΅λΆ€ν•˜κ³  개인 ν•™μŠ΅μš©μœΌλ‘œ μ •λ¦¬ν•œ μžλ£Œμž…λ‹ˆλ‹€.