1.5 Basic processing of biological sequences

🧬 Using the functions built in the earlier posts, let's summarize the basic processing a DNA sequence should go through once we receive it.

1.5.1. Summary of basic processing


🧬 The code below assumes that all the functions implemented in the earlier posts live in a file named sequences.py.

🧬 1. validate_dna( ) : checks whether seq is a valid DNA sequence
🧬 2. transcription( ) : generates the RNA sequence transcribed from the DNA sequence
🧬 3. reverse_complement( ) : generates the reverse complement
🧬 4. gc_content( ) : computes the proportion of G and C bases in the DNA sequence
🧬 5. translate_seq( ) : translates the DNA sequence into an amino acid sequence
🧬 6. all_orfs_ord( ) : extracts the proteins encoded in the ORFs of the DNA sequence, ordered by decreasing size

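# Assumes sequences.py (from the earlier posts) is importable from this directory.
from sequences import (validate_dna, transcription, reverse_complement,
                       gc_content, translate_seq, all_orfs_ord)

seq = input("Insert DNA sequence: ")
# Only process the input if it is a valid DNA sequence.
if validate_dna(seq):
    print("Valid sequence")
    print("Transcription: ", transcription(seq))
    print("Reverse complement: ", reverse_complement(seq))
    print("GC content: ", gc_content(seq))
    print("Direct translation: ", translate_seq(seq))
    print("All proteins in ORFs(decreasing size): ", all_orfs_ord(seq))
else:
    print("DNA sequence is not valid")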
>>
Insert DNA sequence: ATGGGATCGTAGTCGTACTAGCTAGCTGATGGTACTCGATAGTCTACGTAGCTAGTGGTACTGGATGGTACTCAGTAACAT
Valid sequence
Transcription:  AUGGGAUCGUAGUCGUACUAGCUAGCUGAUGGUACUCGAUAGUCUACGUAGCUAGUGGUACUGGAUGGUACUCAGUAACAU
Reverse complement:  ATGTTACTGAGTACCATCCAGTACCACTAGCTACGTAGACTATCGAGTACCATCAGCTAGCTAGTACGACTACGATCCCAT
GC content:  0.4567901234567901
Direct translation:  MGS_SY_LADGTR_ST_LVVLDGTQ_H
All proteins in ORFs(decreasing size):  ['MLLSTIQYH', 'MVLDSLRS', 'MGS']
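
🧬 For readers who do not have sequences.py from the earlier posts at hand, the four simplest helpers might look roughly like the sketch below. Only the function names come from the list above. The bodies are assumptions written for illustration, including the choice to treat an empty string as invalid and to ignore letter case.

```python
# Hypothetical stand-ins for four sequences.py helpers (assumed implementations,
# not the code from the earlier posts).

def validate_dna(seq):
    """Return True if seq is non-empty and contains only A, C, G, T."""
    return len(seq) > 0 and set(seq.upper()) <= set("ACGT")

def transcription(seq):
    """Return the RNA version of a DNA sequence (every T replaced by U)."""
    return seq.upper().replace("T", "U")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = str.maketrans("ACGT", "TGCA")
    return seq.upper().translate(complement)[::-1]

def gc_content(seq):
    """Return the fraction of bases that are G or C (seq must be non-empty)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

dna = "ATGGGATCGTAGTCG"
print(validate_dna(dna))
print(transcription(dna))
print(reverse_complement(dna))
print(gc_content(dna))
```

translate_seq( ) and all_orfs_ord( ) need a codon table, so they are left out of this sketch.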

1.5.2. Reading and writing files


🧬 read_seq_from_file( ) : opens the given file in read mode and reads its contents, which may span several lines, into a single line by replacing each \n with an empty string

🧬 A file named DNA sequence read.txt was prepared in advance with three lines of DNA sequence.

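def read_seq_from_file(filename):
    """Read a sequence spread over several lines and return it as one string."""
    # The with block closes the file even if reading raises an exception.
    with open(filename, "r") as fh:
        # Drop the newline at the end of each line and join the pieces.
        return "".join(line.replace("\n", "") for line in fh)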

print(read_seq_from_file('DNA sequence read.txt'))
>> ATGGGATCGTAGTCGTACTAGCTAGCTGATGGTACTCGATAGTCTACGTAGCTAGTGGTACTGGATGGTACTCAGTAACAT

The output confirms that the sequence in DNA sequence read.txt was read in as a single line.

🧬 write_seq_to_file( ) : opens the given file in write mode, creating it if it does not exist, and writes the content to the text file

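def write_seq_to_file(seq, filename):
    """Write seq to filename, creating the file or overwriting its contents."""
    # The with block closes the file even if writing raises an exception.
    with open(filename, "w") as fh:
        fh.write(seq)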

write_seq_to_file("ATGGGATCGTAGTCGTACTAGCTAGCTGATGGTACTCGATAGTCTACGTAGCTAGTGGTACTGGATGGTACTCAGTAACAT", 'DNA sequence write.txt')

Check that the sequence has been written to DNA sequence write.txt.
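
🧬 The two functions can also be checked together with a small round trip. The sketch below writes a three-line sequence to a temporary file and reads it back as one line. The file name example.txt and the short sequence are made up for the example, and the two functions are repeated so that the block runs on its own.

```python
import os
import tempfile

def write_seq_to_file(seq, filename):
    """Write seq to filename, overwriting any existing content."""
    with open(filename, "w") as fh:
        fh.write(seq)

def read_seq_from_file(filename):
    """Return the file's lines joined into one string, newlines removed."""
    with open(filename, "r") as fh:
        return "".join(line.replace("\n", "") for line in fh)

# Work inside a temporary directory so nothing is left behind on disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")  # hypothetical file name
    write_seq_to_file("ATGGGATCG\nTAGTCGTAC\nTAGCTAGCT\n", path)
    joined = read_seq_from_file(path)

print(joined)       # the three lines joined into a single string
print(len(joined))  # 27 (three lines of nine bases each)
```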

1.5.3. Final basic processing of DNA


🧬 The DNA sequence stored in a .txt file is read with read_seq_from_file( ).
🧬 What we ultimately want from the DNA sequence is the protein it expresses, so the final step is to write the protein sequences to files.
🧬 all_orfs_ord( ) is called to collect, across all reading frames, only the proteins delimited by a start codon and a stop codon.
🧬 write_seq_to_file( ) writes each protein sequence returned by all_orfs_ord( ) to a file named orf-i.txt, where i is the position of the protein in the list.

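# Assumes read_seq_from_file and write_seq_to_file have been added to sequences.py.
from sequences import (validate_dna, transcription, reverse_complement,
                       gc_content, translate_seq, all_orfs_ord,
                       read_seq_from_file, write_seq_to_file)

fname = input("Insert input filename: ")
seq = read_seq_from_file(fname)
if validate_dna(seq):
    print("Valid sequence")
    print("Transcription: ", transcription(seq))
    print("Reverse complement: ", reverse_complement(seq))
    print("GC content: ", gc_content(seq))
    print("Direct translation: ", translate_seq(seq))
    # Write each protein to its own file: orf-1.txt, orf-2.txt, ...
    orfs = all_orfs_ord(seq)
    for i, orf in enumerate(orfs, start=1):
        write_seq_to_file(orf, "orf-" + str(i) + ".txt")
else:
    print("DNA sequence is not valid")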
>>
Insert input filename: DNA sequence read.txt
Valid sequence
Transcription:  AUGGGAUCGUAGUCGUACUAGCUAGCUGAUGGUACUCGAUAGUCUACGUAGCUAGUGGUACUGGAUGGUACUCAGUAACAU
Reverse complement:  ATGTTACTGAGTACCATCCAGTACCACTAGCTACGTAGACTATCGAGTACCATCAGCTAGCTAGTACGACTACGATCCCAT
GC content:  0.4567901234567901
Direct translation:  MGS_SY_LADGTR_ST_LVVLDGTQ_H

Running the script creates the orf-i.txt files, one per protein.

Each file contains one protein sequence, and the files are numbered in order of decreasing protein size.
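
🧬 The file-writing step can be reproduced in isolation. The sketch below takes the protein list printed in section 1.5.1 and writes it to orf-1.txt, orf-2.txt, and orf-3.txt inside a temporary directory. The helper write_orfs( ) is a hypothetical name introduced here, and pathlib is used in place of write_seq_to_file( ) only to keep the block short.

```python
import tempfile
from pathlib import Path

def write_orfs(proteins, out_dir):
    """Write each protein to out_dir/orf-i.txt (i starts at 1); return the paths."""
    paths = []
    for i, protein in enumerate(proteins, start=1):
        path = Path(out_dir) / f"orf-{i}.txt"
        path.write_text(protein)
        paths.append(path)
    return paths

# Output of all_orfs_ord( ) shown in section 1.5.1, already sorted by size.
proteins = ["MLLSTIQYH", "MVLDSLRS", "MGS"]

with tempfile.TemporaryDirectory() as tmp:
    written = write_orfs(proteins, tmp)
    # Read the files back before the temporary directory is deleted.
    listing = [(p.name, p.read_text()) for p in written]

for name, content in listing:
    print(name, content)
```

Because the list is already sorted, the file number doubles as the size rank of the protein.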

1.5.4. Summary


🧬 With the functions above we can type in a sequence directly, but we also learned how to read one from a file. Genetic sequences in particular are very long, so reading them from a file is more convenient than typing them in by hand.

🧬 Once a DNA sequence is obtained this way, we first check its validity and generate the transcribed RNA sequence. We then compute its GC content and, when it is hard to tell whether the sequence is the template strand or the non-template strand, also generate its reverse complement before translating the DNA into amino acid sequences. Finally, extracting the proteins from those amino acid sequences completes most of the preprocessing we can do on a DNA sequence.
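
🧬 The translation and protein-extraction steps summarized above can be sketched as follows. This is an assumed re-implementation for illustration, not the code from the earlier posts: it uses the standard genetic code with "_" for stop codons (as in the outputs above), scans the three frames of the sequence and the three frames of its reverse complement, and keeps only stretches that start at M and reach a stop codon. It expects an already validated DNA sequence.

```python
# Assumed re-implementation for illustration; not the earlier posts' code.
BASES = "TCAG"
AMINO = "FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code built from the TCAG ordering; "_" marks a stop codon.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def reverse_complement(seq):
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate_seq(seq, start=0):
    """Translate codon by codon from position start; a trailing partial codon is ignored."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[p:p + 3]]
                   for p in range(start, len(seq) - 2, 3))

def proteins_in_frame(aa_seq):
    """Return every stretch that starts at M and ends at a stop codon."""
    open_proteins, found = [], []
    for aa in aa_seq:
        if aa == "_":
            found.extend(open_proteins)
            open_proteins = []
        else:
            if aa == "M":
                open_proteins.append("")
            open_proteins = [p + aa for p in open_proteins]
    return found  # stretches that never reach a stop codon are dropped

def all_orfs_ord(seq):
    """Proteins from all six reading frames, longest first."""
    strands = (seq, reverse_complement(seq))
    frames = [translate_seq(s, start) for s in strands for start in range(3)]
    proteins = [p for frame in frames for p in proteins_in_frame(frame)]
    return sorted(proteins, key=len, reverse=True)

seq = ("ATGGGATCGTAGTCGTACTAGCTAGCTGATGGTACTCGATAGTCTACGTAGC"
       "TAGTGGTACTGGATGGTACTCAGTAACAT")
print(translate_seq(seq))  # MGS_SY_LADGTR_ST_LVVLDGTQ_H
print(all_orfs_ord(seq))   # ['MLLSTIQYH', 'MVLDSLRS', 'MGS']
```

On the example sequence this sketch reproduces the direct translation and the protein list shown in section 1.5.1.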


💡 These notes were compiled for personal study while working through Bioinformatics Algorithms (에이콘출판, 2020).
