Trimming a number of bases from both ends of the sequences in a FASTQ file with Python
Author: Internet
001.
root@PC1:/home/test# ls
a.fastq  test.py
root@PC1:/home/test# cat test.py     ## test program
#!/usr/bin/python

in_file = open("a.fastq", "r")
out_file = open("result.txt", "w")

dict1 = {}      ## maps each read header to the remaining three lines of its record
idx = 0
for i in in_file:
    idx += 1
    i = i.strip()
    if idx % 4 == 1:        ## line 1 of a record: the header, used as the key
        key = i
        dict1[key] = []
    elif idx % 4 == 2:      ## line 2 of a record: the sequence
        dict1[key].append(i[1:-3])     ## trim the first base and the last three bases of the sequence
    else:                   ## lines 3 and 4 ("+" and quality) are stored unchanged
        dict1[key].append(i)

for i,j in dict1.items():
    out_file.write(i + "\n")
    for k in j:
        out_file.write(k + "\n")

in_file.close()
out_file.close()
root@PC1:/home/test# cat a.fastq     ## test fastq file
@DJB775P1:248:D0MDGACXX:7:1202:12362:49613
TGCTTACTCTGCGTTGATACCACTGCTTAGATCGGAAGAGCACACGTCTGAA
+
JJJJJIIJJJJJJHIHHHGHFFFFFFCEEEEEDBD?DDDDDDBDDDABDDCA
@DJB775P1:248:D0MDGACXX:7:1202:12782:49716
CTCTGCGTTGATACCACTGCTTACTCTGCGTTGATACCACTGCTTAGATCGG
+
IIIIIIIIIIIIIIIHHHHHHFFFFFFEECCCCBCECCCCCCCCCCCCCCCC
root@PC1:/home/test# python test.py     ## run the program
root@PC1:/home/test# ls
a.fastq  result.txt  test.py
root@PC1:/home/test# cat result.txt     ## program output
@DJB775P1:248:D0MDGACXX:7:1202:12362:49613
GCTTACTCTGCGTTGATACCACTGCTTAGATCGGAAGAGCACACGTCT
+
JJJJJIIJJJJJJHIHHHGHFFFFFFCEEEEEDBD?DDDDDDBDDDABDDCA
@DJB775P1:248:D0MDGACXX:7:1202:12782:49716
TCTGCGTTGATACCACTGCTTACTCTGCGTTGATACCACTGCTTAGAT
+
IIIIIIIIIIIIIIIHHHHHHFFFFFFEECCCCBCECCCCCCCCCCCCCCCC
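Note that test.py writes the quality line unchanged, so in result.txt each read has 48 bases but still 52 quality characters, which downstream tools that expect matching lengths may reject. It also keys records by header in a dictionary, so the whole file is held in memory and two reads with the same header would overwrite each other. The following is a minimal sketch of one possible variant, not the original author's method: it streams one four-line record at a time and applies the same slice to both the sequence and the quality string. The names trim_record and trim_fastq are hypothetical, and the sketch assumes a plain four-line-per-record FASTQ file with no blank lines.

```python
from itertools import islice


def trim_record(seq, qual, head=1, tail=3):
    """Cut `head` characters from the start and `tail` from the end of both strings.

    Hypothetical helper: the same slice is applied to the sequence and the
    quality string so that their lengths stay equal.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    # Compute an explicit end index: seq[head:-tail] would return "" when tail == 0.
    end = max(head, len(seq) - tail)
    return seq[head:end], qual[head:end]


def trim_fastq(in_handle, out_handle, head=1, tail=3):
    """Read four-line FASTQ records from in_handle and write trimmed records to out_handle."""
    while True:
        # Take the next four lines; an empty list means end of input.
        record = [line.rstrip("\n") for line in islice(in_handle, 4)]
        if not record:
            break
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        header, seq, plus, qual = record
        seq, qual = trim_record(seq, qual, head, tail)
        out_handle.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
```

To apply it to the test file above, one could open both files with `with open("a.fastq") as fin, open("result.fastq", "w") as fout:` and call `trim_fastq(fin, fout, head=1, tail=3)`; the output name result.fastq is only an example. Because records are written as soon as they are read, input order is preserved and memory use does not grow with file size.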
Reference: https://www.jianshu.com/p/5ee54bea4cb0
Source: https://www.cnblogs.com/liujiaxin2018/p/16588571.html