Efficient memory usage in Python with Zero-copy
Python does a lot of magic with memory under the hood, and in many cases it does so inefficiently.
Even a simple task like reading a file in chunks can be inefficient if you don't take a few things into account.
Let’s say we have this code:
def read_binary():
    with open("my_file.bin", "rb") as source:
        content = source.read(1024 * 10000)
        # Slicing a bytes object copies the data: this allocates a second
        # buffer holding everything after the first 1024 bytes. Inefficient.
        content_to_write = content[1024:]
    with open("output.bin", "wb") as target:
        target.write(content_to_write)

# Call the function
read_binary()
The read operation allocates about 10 MB, and the slice that builds content_to_write copies everything after the first 1024 bytes into a brand-new buffer, so for a moment you hold nearly twice the data in memory. Remember the Python magic? Here it is :)
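A quick way to see the extra allocation is tracemalloc from the standard library. The snippet below is a minimal sketch: it builds the 10 MB buffer in memory instead of reading it from a file, and the exact numbers will vary by platform.

import tracemalloc

tracemalloc.start()

content = bytes(1024 * 10000)      # stand-in for the ~10 MB read from disk
content_to_write = content[1024:]  # bytes slicing copies the data

_, peak = tracemalloc.get_traced_memory()
print(f"peak: {peak / 1024 / 1024:.1f} MiB")  # roughly 20 MiB: both buffers alive at once
tracemalloc.stop()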
We can use zero-copy to improve these kinds of tasks.
Zero-copy refers to techniques that move data from disk or the network into place without making redundant copies along the way; at the operating-system level, the hardware can even transfer the bytes directly, without the CPU copying them at all. That makes it fast and memory-efficient.
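At the operating-system level, Python exposes this idea through socket.sendfile (backed by os.sendfile where available), which lets the kernel stream a file to a socket without copying it through user space. A rough sketch, assuming a connected sock and a hypothetical file payload.bin:

import socket

def send_file_zero_copy(sock: socket.socket, path: str) -> int:
    # The kernel pushes the file straight to the socket; on platforms without
    # os.sendfile, Python silently falls back to a read/send loop.
    with open(path, "rb") as f:
        return sock.sendfile(f)  # returns the number of bytes sent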
In Python, you can get zero-copy slicing with memoryview.
Here is the same code rewritten to use zero-copy:
def read_binary():
    content_to_write = None
    with open("my_file.bin", "rb") as source:
        content = source.read(1024 * 10000)
        # Zero-copy here
        content_to_write = memoryview(content)[1024:]
    with open("output.bin", "wb") as target:
        target.write(content_to_write)

read_binary()
In this case, the read operation allocates about 10 MB and that single buffer is reused, because no data is copied into content_to_write. Instead, the memoryview is only a reference to a slice of the original buffer.
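To convince yourself that the view really shares the original buffer rather than owning a copy, you can try it against a mutable bytearray; this is just an illustrative sketch:

data = bytearray(b"0123456789")
view = memoryview(data)[4:]   # no copy: a window onto the same buffer

data[4:6] = b"XY"             # change the original buffer in place
print(bytes(view[:2]))        # b'XY' -- the view sees the change

view.release()                # let the bytearray be resized again later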