
Nested loops iterating over a single file

2019-11-19 09:59:46 Author: Internet

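I want to delete some specific lines from a file.
The part I want to delete is enclosed between two lines (which should be deleted as well), named STARTING_LINE and CLOSING_LINE. If there is no closing line before the end of the file, the operation should stop.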

Example:

...blabla...
[Start] <-- # STARTING_LINE
This is the body that I want to delete
[End] <-- # CLOSING_LINE
...blabla...

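I came up with three different ways to achieve the same goal (plus the one provided in tdelaney's answer below), but I would like to know which one is best. Note that I am not asking for subjective opinions: I want to know whether there are concrete reasons to choose one approach over another.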

1. Many if conditions (a single for loop):

def delete_lines(filename):
    with open(filename, 'r+') as my_file:
        text = ''
        found_start = False
        found_end = False

        for line in my_file:
            if not found_start and line.strip() == STARTING_LINE.strip():
                found_start = True
            elif found_start and not found_end:
                if line.strip() == CLOSING_LINE.strip():
                    found_end = True
                continue
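            else:
                text += line

        # Reached EOF inside the block: no closing line, so stop without writing
        if found_start and not found_end:
            print("There is no closing line. Stopping")
            return False

        # Go to the top and write the new text
        my_file.seek(0)
        my_file.truncate()
        my_file.write(text)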
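The same single-pass idea can be written as an explicit state machine over any iterable of lines, which makes it testable without touching a file. This is a minimal sketch: `remove_block_flags` and the `'before'`/`'inside'`/`'after'` state names are hypothetical, and the markers are passed as arguments instead of the STARTING_LINE/CLOSING_LINE globals. Like approach 1, it removes only the first block, and it collects kept lines in a list and joins them once instead of growing a string with `+=`.

```python
import io


def remove_block_flags(lines, start, end):
    """Single-loop variant: a small state machine instead of nested loops.

    States: 'before' (copying), 'inside' (skipping), 'after' (copying again).
    Only the first start..end block is removed, as in approach 1.
    Returns None if the start marker is seen but no closing marker follows.
    """
    state = 'before'
    kept = []  # join once at the end instead of repeated str +=
    for line in lines:
        if state == 'before' and line.strip() == start.strip():
            state = 'inside'
        elif state == 'inside':
            if line.strip() == end.strip():
                state = 'after'
        else:
            kept.append(line)
    if state == 'inside':  # EOF reached while skipping: no closing marker
        return None
    return ''.join(kept)


# An in-memory "file" standing in for the real one (hypothetical content).
sample = io.StringIO("a\n[Start]\nbody\n[End]\nb\n")
result = remove_block_flags(sample, "[Start]", "[End]")
```

Because the function only iterates, an open file object can be passed in place of the `StringIO` instance.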

2. A nested for loop over the open file:

def delete_lines(filename):
    with open(filename, 'r+') as my_file:
        text = ''
        for line in my_file:
            if line.strip() == STARTING_LINE.strip():
                # Skip lines until we reach the closing line
                # Note: the next `for` loop iterates on the following lines, not
                # on the entire my_file (i.e. it is not starting from the first
                # line). This will allow us to avoid manually handling the
                # StopIteration exception.
                found_end = False
                for function_line in my_file:
                    if function_line.strip() == CLOSING_LINE.strip():
                        print("stop")
                        found_end = True
                        break
                if not found_end:
                    print("There is no closing line. Stopping")
                    return False
            else:
                text += line

        # Go to the top and write the new text
        my_file.seek(0)
        my_file.truncate()
        my_file.write(text)
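The `found_end` flag in approach 2 can be replaced by Python's `for ... else`: the `else` clause runs only when the inner loop finishes without hitting `break`, i.e. when the input ends before a closing line. The sketch below (hypothetical name `remove_blocks`, markers passed as arguments, operating on any iterable of lines rather than an open file) also illustrates the point made in the comments: both loops draw from the same iterator, so the inner loop resumes right after the start marker.

```python
def remove_blocks(lines, start, end):
    """Nested-loop variant using for...else instead of a found_end flag.

    Every start..end block is removed (markers included).
    Returns None if a start marker has no matching closing marker.
    """
    kept = []
    it = iter(lines)  # a file object is already its own iterator
    for line in it:
        if line.strip() != start.strip():
            kept.append(line)
            continue
        for inner in it:  # resumes from the line after the start marker
            if inner.strip() == end.strip():
                break  # closing marker found: the else clause is skipped
        else:
            return None  # iterator exhausted without a closing marker
    return ''.join(kept)
```

For example, `remove_blocks(["a\n", "[Start]\n", "x\n", "[End]\n", "b\n"], "[Start]", "[End]")` returns `"a\nb\n"`.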

3. while True and next() (with a StopIteration exception):

def delete_lines(filename):
    with open(filename, 'r+') as my_file:
        text = ''
        for line in my_file:
            if line.strip() == STARTING_LINE.strip():
                # Skip lines until we reach the closing line
                while True:
                    try:
                        line = next(my_file)
                        if line.strip() == CLOSING_LINE.strip():
                            print("stop")
                            break
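                    except StopIteration:
                        # The file ended without a closing line. Leave here:
                        # an exhausted file raises StopIteration on every
                        # further next() call, so otherwise the while True
                        # loop would never terminate.
                        print("There is no closing line. Stopping")
                        return False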
            else:
                text += line

        # Go to the top and write the new text
        my_file.seek(0)
        my_file.truncate()
        my_file.write(text)
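An exhausted iterator raises StopIteration on every further `next()` call, so in approach 3 the `except` branch has to leave the loop (for example by returning); otherwise `while True` never ends. The two-argument form `next(iterator, default)` returns the default instead of raising, which removes the try/except altogether. A minimal sketch, with `skip_to_closing` as a hypothetical helper name:

```python
def skip_to_closing(it, end):
    """Advance the iterator `it` past the closing marker `end`.

    Returns True if the marker was found, False if the iterator ran out.
    next(it, None) returns None on exhaustion instead of raising
    StopIteration; lines read from a file are never None, so the
    sentinel cannot be confused with real data.
    """
    while True:
        line = next(it, None)
        if line is None:  # iterator exhausted: no closing marker
            return False
        if line.strip() == end.strip():
            return True


lines = iter(["body\n", "[End]\n", "after\n"])
found = skip_to_closing(lines, "[End]")  # True
remaining = list(lines)                  # ['after\n']
```

Inside approach 3, the whole `while True`/`try` block would then reduce to `if not skip_to_closing(my_file, CLOSING_LINE): return False`.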

4. itertools (from tdelaney's answer):

import itertools

def delete_lines_iter(filename):
    with open(filename, 'r+') as wrfile:
        with open(filename, 'r') as rdfile:
            # write everything before startline
            wrfile.writelines(itertools.takewhile(lambda l: l.strip() != STARTING_LINE.strip(), rdfile))
            # drop everything before stopline.. and the stopline itself
            try:
                next(itertools.dropwhile(lambda l: l.strip() != CLOSING_LINE.strip(), rdfile))
            except StopIteration:
                pass
            # include everything after
            wrfile.writelines(rdfile)
        wrfile.truncate()
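Approach 4 relies on a detail of `itertools.takewhile`: to discover that the predicate fails, it has to pull the start line out of the underlying iterator, so that line is consumed and never written. Likewise, `next(dropwhile(...))` consumes everything up to and including the closing line. A small in-memory illustration, where a list iterator stands in for `rdfile` and the marker strings are the ones from the example:

```python
import itertools

lines = iter(["keep1\n", "[Start]\n", "body\n", "[End]\n", "keep2\n"])

# takewhile stops at the first line failing the predicate, but it has
# already pulled that line (the start marker) out of the iterator.
before = list(itertools.takewhile(lambda l: l.strip() != "[Start]", lines))

# dropwhile skips lines until the closing marker; next() returns that marker,
# consuming it too. The default None avoids StopIteration when it is missing.
closing = next(itertools.dropwhile(lambda l: l.strip() != "[End]", lines), None)

# Whatever is left is the text after the block.
after = list(lines)
# before == ['keep1\n'], closing == '[End]\n', after == ['keep2\n']
```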

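These four implementations seem to produce the same result, at least for a file containing a single block delimited by both marker lines. So…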

Question: Which one should I use? Which one is the most Pythonic? Which one is the most efficient?

Is there a better solution?
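One commonly used alternative, not among the approaches above, is to stream the kept lines into a temporary file in the same directory and then swap it in with `os.replace`. The whole file never has to be held in memory, and the original stays untouched if the closing line is missing. A minimal sketch under those assumptions; `delete_block_to_tempfile` and `copy_without_blocks` are hypothetical names, and the markers are passed as arguments:

```python
import os
import tempfile


def copy_without_blocks(src, dst, start, end):
    """Copy lines from src to dst, skipping every start..end block.

    Returns False as soon as a start marker has no closing marker.
    """
    for line in src:
        if line.strip() != start.strip():
            dst.write(line)
            continue
        for inner in src:  # skip the block body
            if inner.strip() == end.strip():
                break
        else:
            return False
    return True


def delete_block_to_tempfile(filename, start, end):
    """Rewrite `filename` without its blocks; return False if unchanged."""
    directory = os.path.dirname(os.path.abspath(filename))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    replaced = False
    try:
        # fdopen first so the descriptor is closed even if open() fails
        with os.fdopen(fd, 'w') as dst, open(filename) as src:
            found_all = copy_without_blocks(src, dst, start, end)
        if found_all:
            os.replace(tmp_path, filename)  # atomic on the same filesystem
            replaced = True
        return found_all
    finally:
        if not replaced:  # missing closing line or an error: discard temp file
            os.remove(tmp_path)
```

Note that `mkstemp` creates the file readable and writable only by its owner, so if the original's permissions matter they would need to be copied over (for example with `shutil.copymode`) before the replace.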

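Edit: I tried to evaluate the methods on a large file using timeit. To use the same file in every iteration, I removed the writing part from each function; this means the evaluation mainly measures the reading (and file-opening) work.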

import timeit

t_if = timeit.Timer("delete_lines_if('test.txt')", "from __main__ import delete_lines_if")
t_for = timeit.Timer("delete_lines_for('test.txt')", "from __main__ import delete_lines_for")
t_while = timeit.Timer("delete_lines_while('test.txt')", "from __main__ import delete_lines_while")
t_iter = timeit.Timer("delete_lines_iter('test.txt')", "from __main__ import delete_lines_iter")

print(t_if.repeat(3, 4000))
print(t_for.repeat(3, 4000))
print(t_while.repeat(3, 4000))
print(t_iter.repeat(3, 4000))

Results (seconds for 4000 calls, three repeats each):

# Using IF statements:
[13.85873354100022, 13.858520206999856, 13.851908310999988]
# Using nested FOR:
[13.22578497800032, 13.178281234999758, 13.155530822999935]
# Using while:
[13.254994718000034, 13.193942980999964, 13.20395484699975]
# Using itertools:
[10.547019549000197, 10.506679693000024, 10.512742852999963]
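To reproduce this kind of comparison, here is a sketch that generates a test file and times a function with `timeit.repeat`. It assumes Python 3.6+ (f-strings, plus the `globals=` argument added in 3.5, which replaces the `from __main__ import ...` setup string). `make_test_file` and `read_only_pass` are hypothetical stand-ins, the latter playing the role of a `delete_lines_*` variant with the writing part removed, and the file size is arbitrary. Following the `timeit` documentation's advice, it reports the minimum of the repeats rather than a mean.

```python
import os
import tempfile
import timeit


def make_test_file(path, n_lines=100_000):
    """Write filler lines with one marked block in the middle."""
    half = n_lines // 2
    with open(path, 'w') as f:
        f.writelines(f"filler {i}\n" for i in range(half))
        f.write("[Start]\nThis is the body that I want to delete\n[End]\n")
        f.writelines(f"filler {i}\n" for i in range(half, n_lines))


def read_only_pass(path):
    """Stand-in for a delete_lines_* variant with the writing part removed."""
    with open(path) as f:
        for _ in f:
            pass


path = os.path.join(tempfile.mkdtemp(), "test.txt")
make_test_file(path)
timings = timeit.repeat("read_only_pass(path)", globals=globals(), repeat=3, number=10)
best = min(timings)  # the lowest value is the least disturbed by other processes
print(f"best of 3: {best:.4f} s for 10 calls")
```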

Solution:

You can use itertools to make it elegant. I would be interested in a timing comparison.

import itertools
def delete_lines(filename):
    with open(filename, 'r+') as wrfile:
        with open(filename, 'r') as rdfile:
            # write everything before startline
            wrfile.writelines(itertools.takewhile(lambda l: l.strip() != STARTING_LINE.strip(), rdfile))
            # drop everything before stopline.. and the stopline itself
            # (next() raises StopIteration if the file has no stopline left)
            try:
                next(itertools.dropwhile(lambda l: l.strip() != CLOSING_LINE.strip(), rdfile))
            except StopIteration:
                pass
            # include everything after
            wrfile.writelines(rdfile)
        wrfile.truncate()

Tags: performance, while-loop, if-statement, for-loop, python
Source: https://codeday.me/bug/20191119/2035150.html