1.2 Transcription and Reverse Complement

1.2.1. Checking whether a DNA sequence is valid


🧬 This function already appeared in an earlier post, but it is used throughout, so it is repeated here.

def validate_dna(dna_seq):
    # Case-insensitive check: every character must be A, T, G, or C.
    seqm = dna_seq.upper()
    valid = seqm.count("A") + seqm.count("T") + seqm.count("G") + seqm.count("C")
    return valid == len(seqm)
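
As an aside, the same check can be written as a set comparison. This is only an alternative sketch, not the version from the book, and the name validate_dna_set is made up for this note. Like validate_dna(), it returns True for an empty string.

```python
def validate_dna_set(dna_seq):
    # Hypothetical alternative to validate_dna(): True when every
    # character (case-insensitive) is one of A, C, G, T.
    return set(dna_seq.upper()) <= set("ACGT")


print(validate_dna_set("atgcATGC"))  # True
print(validate_dna_set("ATGN"))      # False
```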

1.2.2. Function that builds the RNA sequence transcribed from an input DNA sequence


🧬 Define the transcription() function
🧬 Use an assert statement to call validate_dna() and check the sequence: if it is not valid, an AssertionError is raised with the message Invalid DNA sequence
🧬 Replace T with U to produce the transcribed RNA sequence

def transcription(dna_seq):
    assert validate_dna(dna_seq), "Invalid DNA sequence"
    return dna_seq.upper().replace("T","U")

print(transcription("ATGGGATCGTAGTCGTACTAGCTAGCTGATGGTACTCGATAGTCTACGTAGCTAGTGGTACTGGATGGTACTCAGTAACAT"))
>> AUGGGAUCGUAGUCGUACUAGCUAGCUGAUGGUACUCGAUAGUCUACGUAGCUAGUGGUACUGGAUGGUACUCAGUAACAU
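
To see what the assert does with bad input, here is a small self-contained sketch that repeats the two functions above and feeds in a sequence containing a non-DNA character. Note that Python skips assert statements when run with the -O option, so in that mode the check would not fire.

```python
def validate_dna(dna_seq):
    seqm = dna_seq.upper()
    valid = seqm.count("A") + seqm.count("T") + seqm.count("G") + seqm.count("C")
    return valid == len(seqm)


def transcription(dna_seq):
    assert validate_dna(dna_seq), "Invalid DNA sequence"
    return dna_seq.upper().replace("T", "U")


try:
    transcription("ATGXGA")  # X is not a DNA base
except AssertionError as err:
    # The assert message becomes the exception text.
    print("AssertionError:", err)
```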

1.2.3. Reverse complement of a DNA sequence


🧬 Define the reverse_complement() function
🧬 Use an assert statement to call validate_dna() and check the sequence: if it is not valid, an AssertionError is raised with the message Invalid DNA sequence
🧬 Start from an empty string comp and add the complementary bases to it: because we want the reverse complement, each new complementary base is prepended to the existing comp
🧬 Useful when we do not know whether the given DNA sequence is the template strand or the non-template strand and both cases have to be considered

def reverse_complement(dna_seq):
    assert validate_dna(dna_seq), "Invalid DNA sequence"
    comp = ""
    for c in dna_seq.upper():
        if c == "A":
            comp = "T" + comp
        elif c == "T":
            comp = "A" + comp
        elif c == "G":
            comp = "C" + comp
        elif c == "C":
            comp = "G" + comp
    return comp

print(reverse_complement("ATGGGATCGTAGTCGTACTAGCTAGCTGATGGTACTCGATAGTCTACGTAGCTAGTGGTACTGGATGGTACTCAGTAACAT"))
>> ATGTTACTGAGTACCATCCAGTACCACTAGCTACGTAGACTATCGAGTACCATCAGCTAGCTAGTACGACTACGATCCCAT
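
For comparison, the same result can be obtained with Python's built-in str.maketrans() and str.translate(), followed by a slice that reverses the string. This is a minimal sketch of an alternative, not the book's code; the name reverse_complement_fast is an assumption, and validation is left out to keep it short.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement_fast(dna_seq):
    # Complement every base, then reverse the whole string.
    return dna_seq.upper().translate(_COMPLEMENT)[::-1]


print(reverse_complement_fast("AAGT"))  # ACTT
```

Prepending to a string inside a loop builds a new string on every step, so the table-based version is usually the more comfortable choice for long sequences.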

1.3 Translation

1.3.1. Standard genetic code dictionary for translating codons into amino acids


🧬 Define the translate_codon() function
🧬 Stop codons are represented by _
🧬 None is returned for an invalid codon

def translate_codon(cod):
    
    tc = {"GCT":"A", "GCC":"A", "GCA":"A", "GCG":"A",
          "TGT":"C", "TGC":"C",
          "GAT":"D", "GAC":"D",
          "GAA":"E", "GAG":"E",
          "TTT":"F", "TTC":"F",
          "GGT":"G", "GGC":"G", "GGA":"G", "GGG":"G",
          "CAT":"H", "CAC":"H",
          "ATA":"I", "ATT":"I", "ATC":"I",
          "AAA":"K", "AAG":"K",
          "TTA":"L", "TTG":"L", "CTT":"L", "CTC":"L", "CTA":"L", "CTG":"L",
          "ATG":"M", 
          "AAT":"N", "AAC":"N",
          "CCT":"P", "CCC":"P", "CCA":"P", "CCG":"P",
          "CAA":"Q", "CAG":"Q",
          "CGT":"R", "CGC":"R", "CGA":"R", "CGG":"R", "AGA":"R", "AGG":"R",
          "TCT":"S", "TCC":"S", "TCA":"S", "TCG":"S", "AGT":"S", "AGC":"S",
          "ACT":"T", "ACC":"T", "ACA":"T", "ACG":"T",
          "GTT":"V", "GTC":"V", "GTA":"V", "GTG":"V",
          "TGG":"W",
          "TAT":"Y", "TAC":"Y",
          "TAA":"_", "TAG":"_", "TGA":"_"}
          
    if cod in tc: return tc[cod]
    else: return None
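
A hand-typed table with 64 entries is easy to get subtly wrong, so a quick sanity check can help. The sketch below lists every triplet over A, C, G, T that is absent from a given dictionary. The helper name missing_codons and the toy table are made up for illustration; in practice the tc dictionary would be passed in.

```python
from itertools import product


def missing_codons(table):
    # All 64 possible triplets over the DNA alphabet.
    all_codons = ("".join(p) for p in product("ACGT", repeat=3))
    return [cod for cod in all_codons if cod not in table]


toy_table = {"ATG": "M", "TGG": "W"}   # deliberately incomplete
print(len(missing_codons(toy_table)))  # 62
```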

1.3.2. Translating a DNA sequence into an amino acid sequence


🧬 Define the translate_seq() function
🧬 Use an assert statement to call validate_dna() and check the sequence: if it is not valid, an AssertionError is raised with the message Invalid DNA sequence
🧬 ini_pos argument: the position at which translation starts
🧬 Starting at ini_pos, read the sequence three bases at a time and translate each codon into an amino acid with translate_codon()

⭐⭐Template strand / non-template strand⭐⭐

💊 Translation of mRNA proceeds in the 5' to 3' direction
💊 If the DNA sequence we pass in is the non-template strand (5'-3'), it is identical, apart from T/U, to the mRNA transcribed from the template strand (3'-5'), so it can be translated directly into the amino acid sequence
💊 Conversely, if it is the template strand, the amino acid sequence we want is the one encoded by the non-template strand, so we have to translate the reverse complement of the template strand (its complement, read 5' to 3')
💊 So when it is unclear whether a sequence is the template strand or the non-template strand, translate its reverse complement as well.
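
The relationship between the two strands can be checked with a few lines of code. This is a minimal sketch under the assumption that both strands are written 5' to 3'; the helper mrna_from_template and the short example sequence are made up for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna_seq):
    return dna_seq.upper().translate(_COMPLEMENT)[::-1]


def mrna_from_template(template):
    # The template is read 3'->5' while the complementary mRNA is built
    # 5'->3', which amounts to a reverse complement with T replaced by U.
    return reverse_complement(template).replace("T", "U")


coding = "ATGGGATCGTAG"                # non-template strand, 5'->3'
template = reverse_complement(coding)  # template strand, also written 5'->3'

# The mRNA equals the non-template strand except for T/U.
print(mrna_from_template(template) == coding.replace("T", "U"))  # True
```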

def translate_seq(dna_seq, ini_pos = 0):
    assert validate_dna(dna_seq), "Invalid DNA sequence"
    seqm = dna_seq.upper()
    seq_aa = ""
    
    for pos in range(ini_pos, len(seqm)-2, 3):
        cod = seqm[pos:pos+3]
        seq_aa += translate_codon(cod)
    return seq_aa

print(translate_seq("ATGGGATCGTAGTCGTACTAGCTAGCTGATGGTACTCGATAGTCTACGTAGCTAGTGGTACTGGATGGTACTCAGTAACAT"))
# reverse complement
print(translate_seq(reverse_complement("ATGGGATCGTAGTCGTACTAGCTAGCTGATGGTACTCGATAGTCTACGTAGCTAGTGGTACTGGATGGTACTCAGTAACAT")))
>> 
MGS_SY_LADGTR_ST_LVVLDGTQ_H
MLLSTIQYH_LRRLSSTIS_LVRLRSH
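
The effect of ini_pos is easiest to see by translating the same sequence from positions 0, 1, and 2. The sketch below is self-contained, so instead of the long dictionary it builds the standard code from a 64-character string in T, C, A, G order; that construction and the names CODON_TABLE and translate_from are assumptions made for this example, not the book's code. Validation is omitted for brevity.

```python
BASES = "TCAG"
# Amino acids for TTT, TTC, TTA, TTG, TCT, ... GGG; _ marks a stop codon.
AMINO = "FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_from(dna_seq, ini_pos=0):
    seqm = dna_seq.upper()
    # Same stepping as translate_seq(): whole codons only, from ini_pos.
    return "".join(
        CODON_TABLE[seqm[pos:pos + 3]]
        for pos in range(ini_pos, len(seqm) - 2, 3)
    )


for start in range(3):
    print(start, translate_from("ATGGCATAA", start))
# 0 MA_
# 1 WH
# 2 GI
```

Trailing bases that do not fill a whole codon are simply dropped, which is why the shifted translations are one residue shorter here.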

1.3.3. Proportion of each codon that encodes a given amino acid in a DNA sequence


🧬 Define the codon_usage() function
🧬 Use an assert statement to call validate_dna() and check the sequence: if it is not valid, an AssertionError is raised with the message Invalid DNA sequence
🧬 Whenever a codon that encodes the given amino acid is found, add it to the dictionary
🧬 Each time one is added, increment total by 1 to get the total count of the given amino acid
🧬 Return the codons that encode the amino acid with their proportions, together with the total count

def codon_usage(dna_seq, aa):
    assert validate_dna(dna_seq), "Invalid DNA sequence"
    seqm = dna_seq.upper()
    dic = {}
    total = 0
    for i in range(0, len(seqm)-2, 3):
        cod = seqm[i:i+3]
        if translate_codon(cod) == aa:
            if cod in dic:
                dic[cod] += 1
            else:
                dic[cod] = 1
            total += 1
    if total > 0:
        for k in dic:
            dic[k] /= total
    return (dic, total)

print(codon_usage("atagataactcgcatagc", "S"))
>> 
({'TCG': 0.5, 'AGC': 0.5}, 2)
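
The counting step can also be done with collections.Counter from the standard library. The version below is a hedged sketch rather than the book's implementation: the name codon_usage_counter is made up, validation is omitted, and the codon table is built from a compact 64-character string (T, C, A, G order) only so that the example runs on its own.

```python
from collections import Counter

BASES = "TCAG"
AMINO = "FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_usage_counter(dna_seq, aa):
    seqm = dna_seq.upper()
    codons = (seqm[i:i + 3] for i in range(0, len(seqm) - 2, 3))
    # Count only the codons that translate to the requested amino acid.
    counts = Counter(cod for cod in codons if CODON_TABLE.get(cod) == aa)
    total = sum(counts.values())
    # When total is 0, counts is empty and no division takes place.
    return ({cod: n / total for cod, n in counts.items()}, total)


print(codon_usage_counter("atagataactcgcatagc", "S"))
# ({'TCG': 0.5, 'AGC': 0.5}, 2)
```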

💡 These notes were written up for personal study while working through Bioinformatics Algorithms (에이콘출판 / Acorn Publishing, 2020).
