Memory Management
Memory Management
del df # does not release memory to the OS
gc.collect() # still does not release memory
- Python objects have high-water mark. It is expensive to pull memory from OS so Python interpreter reserves it for future use so in
htop
it looks like memory is being used. - So in theory when you continue working in the python process you should be able to reclaim this memory with other python objects.
- Yes but unfortunately mostly no... In practice data is fragmented and is not usable unless you close the process (close the interactive terminal)
- Often you will have a data leak or fragmentation, meaning the objects are still in memory but not accessible to you. They are scattered around and not usable.
Hacks
import os
import psutil
def usage():
process = psutil.Process(os.getpid())
return process.memory_info()[0] / float(2**20)
def huge_intermediate_calc(something):
...
huge_df = pd.DataFrame(...)
...
return some_aggregate
import multiprocessing
result = multiprocessing.Pool(1).map(huge_intermediate_calc, [something_])[0]
with multiprocessing.Pool(1) as pool:
result = pool.map(huge_intermediate_calc, [something])
# However in a ipython environment (like jupyter notebook) I found that you need to .close() and .join() or .terminate() the pool to get rid of the spawned process.
# Tested example:
def func():
df = pd.read_parquet(TRAIN_DATA_PATH)
return df
import concurrent.futures
# max_workers = number of CPUs (threads on your machine, e.g 2 threads per core with8 cores is 16)
print("Number of logical CPUs:", os.cpu_count())
with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
start_usage = usage()
print('Start: ', start_usage)
result = executor.submit(func,).result()
print('End: ', usage())
# close process
executor.shutdown(wait=True)
print(usage())
"Then the function is executed at a different process. When that process completes, the OS retakes all the resources it used. Python, pandas, the garbage collector"
- no one can do anything to stop that.
Some notes and definitions of the aboe tricks
- Each process is a separate Python interpreter on your local machine (not remote machines).
- ProcessPoolExecutor will not work in the interactive interpreter docs
Tracing Python Memory
- python library to trace memory allocations tracemalloc
del and Garbage Collection
"Objects are never explicitly destroyed; however when they become unreachable hey may be garbage-collected."
1 == (1) # returns true
x and (x) usually mean the same thing
del x # this is a statement not a function
del(x) # would do the same thing
(1,) # tuple
del
deletes the reference, not the object
l1 = [1,2] # l1 reference to the object [1,2]
l2 = l1 # l2 reference to the object
del l1 # deletes reference l1
print(l2) # [1,2]
# example of a life of an object
import weakref
def bye():
print('... goodbye my lover')
ender = weakref.finalize(s1, bye)
ender.alive # True
del s1
ender.alive # True
s2 = '' # ... goodbye my lover
ender.alive # False
Objects may be deleted by the garbage collector once they become unreachable! In CPython When the reference count of an object reaches zero, the garbage collector disposes of it.
== vs is
==
compares values,is
compares if it referencing to the same object
l1 = [1,2]
l2 = l1[::] # make copy
l2 == l1 # true
l1 is l2 # false
l3 = [1,2]
l3 == l2 # true
Garbage Collector
reference-counting - when it becomes zero it is unreachable and collected.
CPython implementation detail: CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references.
- simple assignment does not create copies - references to the same object
- function parameters are passed as aliases, which means the function may change any mutable object received as an argument. Need to make local copy to prevent this.
- Using mutable objects as default values for function parameters is dangerous because if parameters are changed in-place, the default value is changed!