Python does a lot of memory magic under the hood, and in many cases that magic is inefficient.

Even a simple task like reading a file in chunks becomes inefficient if you don't take a few things into account.

Let’s say we have this code:


def read_binary():
    with open("my_file.bin", "rb") as source:
        content = source.read(1024 * 10000)  # read ~10 MB into memory

        # Slicing a bytes object copies everything after the first 1024
        # bytes into a brand-new object. This is the inefficient part.
        content_to_write = content[1024:]

    with open("output.bin", "wb") as target:
        target.write(content_to_write)

# Call the function
read_binary()

The read() call allocates roughly 10 MB, and the slice content[1024:] then copies almost all of it, everything after the first 1024 bytes, into a second bytes object. Remember the Python magic? Here it is :)
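A quick way to see the extra allocation is sys.getsizeof on a bytes object of the same size. The buffer below is synthetic and just stands in for the file content:

import sys

data = bytes(1024 * 10000)   # ~10 MB of zeros, standing in for the file content
tail = data[1024:]           # slicing bytes allocates a brand-new object

print(sys.getsizeof(data))   # roughly 10 MB
print(sys.getsizeof(tail))   # roughly 10 MB again, only 1024 bytes smaller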

We can use zero-copy to improve these kinds of tasks.

Zero-copy is a family of techniques for moving data, for example from disk or the network into memory, without creating redundant intermediate copies. That makes it fast and memory-efficient.

In Python, you can get zero-copy slicing of bytes-like objects with memoryview.
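Before rewriting the example, here is a tiny sketch (independent of the file code) showing that a memoryview slice is only a window into the original buffer; a bytearray is used because it is mutable:

buf = bytearray(b"hello world")
view = memoryview(buf)[6:]   # no copy: the view points into buf's own buffer

view[0:5] = b"PYTHN"         # writing through the view...
print(buf)                   # bytearray(b'hello PYTHN') -- ...changes the original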

Here is the same code rewritten to use zero-copy:

def read_binary():
    with open("my_file.bin", "rb") as source:
        content = source.read(1024 * 10000)  # read ~10 MB into memory

        # Zero-copy: the memoryview slice is just a view into `content`,
        # so no bytes are duplicated.
        content_to_write = memoryview(content)[1024:]

    with open("output.bin", "wb") as target:
        target.write(content_to_write)

read_binary()

In this case, the read() call still allocates roughly 10 MB, but that single buffer is reused: no data is copied into content_to_write. The memoryview slice only holds a reference into the same buffer.
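If you want to measure the difference yourself, tracemalloc shows the peak allocation of each approach (again with a synthetic ~10 MB buffer instead of the real file):

import tracemalloc

def copy_slice(data):
    return data[1024:]               # allocates a second ~10 MB bytes object

def view_slice(data):
    return memoryview(data)[1024:]   # only a tiny memoryview object is created

data = bytes(1024 * 10000)           # synthetic ~10 MB buffer

for fn in (copy_slice, view_slice):
    tracemalloc.start()
    fn(data)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(fn.__name__, peak)         # copy_slice peaks ~10 MB higher than view_slice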