Deep into Python

Overview

Why is Python slow?

  1. Dynamically Typed (interpreted lang) vs Statically Typed (compiled lang)
    • Every Python value is a PyObject.
    • To determine a value's type at runtime, the interpreter has to inspect its PyObject_HEAD (which holds the type pointer and reference count) and dispatch operations through that type.
  2. Compiled languages benefit from powerful compile-time optimizations
    • Boxing every value as a PyObject can lead to inefficient memory access.
    • A Python list does not store its elements contiguously: it holds pointers to objects that may be scattered across memory.
    • NumPy, by contrast, stores its data in one contiguous block of memory, similar to a C array, which makes memory access far more efficient (see the sketch below).
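
A quick sketch of the difference (assuming NumPy is installed); the exact numbers vary by machine, but the gap between a list of boxed objects and a contiguous array is usually large:

    import sys
    import timeit
    import numpy as np

    n = 1_000_000
    py_list = list(range(n))     # a list of pointers to boxed int objects
    np_array = np.arange(n)      # one contiguous buffer of raw 64-bit ints

    # Each list element is a full PyObject (~28 bytes for a small int),
    # on top of the 8-byte pointer stored in the list itself.
    print(sys.getsizeof(py_list[0]))   # size of one boxed int
    print(np_array.itemsize)           # bytes per element in the array

    # Summing the list forces the interpreter to follow a pointer and unbox
    # every element; NumPy iterates over the contiguous buffer in C.
    print(timeit.timeit(lambda: sum(py_list), number=10))
    print(timeit.timeit(lambda: np_array.sum(), number=10))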

How does Python manage memory?

  1. Python uses ‘reference counting’ to manage memory: every time an object is referenced by another, its refcount is incremented, and every time a reference is dropped, it is decremented. When the refcount reaches 0, Python deallocates the memory for that object. (See the first sketch after this list.)

  2. This mechanism cannot handle ‘reference cycles’: the refcount never reaches 0 when objects refer to each other, or when an object refers to itself (e.g. self.me = self). To handle this case, Python also runs a Garbage Collector (GC). Python uses ‘generational garbage collection’, which manages objects in 3 generations: 0, 1 and 2, where 0 is the youngest and 2 is the oldest. Each generation has a threshold and a count. When a new object is allocated, the count of generation 0 is incremented; when a generation’s count exceeds its threshold, GC runs on that generation. Surviving objects move to the next generation, and the count of the next generation is incremented by one, so each older generation is collected whenever its own count exceeds its threshold. This design is based on the ‘generational hypothesis’: young objects are much more likely to die than old objects.

  3. The official Python documentation notes that if developers are sure their code creates no reference cycles, the GC may be disabled (see the second sketch below).
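
A small sketch of reference counting and a reference cycle, using only the standard library; the exact numbers can vary slightly between CPython versions, and sys.getrefcount always reports one extra reference because the function’s own argument also counts:

    import gc
    import sys

    class Node:
        pass

    a = Node()
    print(sys.getrefcount(a))   # 2: the name 'a' plus the call's own argument

    b = a                       # another reference -> refcount goes up
    print(sys.getrefcount(a))   # 3

    del b                       # reference dropped -> refcount goes down
    print(sys.getrefcount(a))   # 2

    a.me = a                    # a reference cycle: the object refers to itself
    del a                       # refcount alone never reaches 0 here...
    print(gc.collect())         # ...but the cyclic GC finds and frees it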
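
And a look at the three generations and their thresholds; the default values can differ between CPython versions, so treat the numbers as illustrative:

    import gc

    print(gc.get_threshold())   # e.g. (700, 10, 10): thresholds for gen 0, 1, 2
    print(gc.get_count())       # current counts per generation

    junk = [[] for _ in range(1000)]   # allocations bump the gen-0 count
    print(gc.get_count())              # (and may already have triggered a collection)

    # As the documentation notes, the collector can be turned off entirely
    # when the code is known to be free of reference cycles.
    gc.disable()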

Dismissing Python Garbage Collection at Instagram

The problem: while the Django application was running with multiple child processes, the amount of shared memory dropped to about 2/3 of its original size, meaning the child processes were no longer sharing memory effectively and overall memory use became inefficient.

  1. The team investigated to find the cause. First, they suspected the interaction between the refcount in Python objects and the Copy-on-Write (CoW) mechanism in Linux. CoW copies a page into a child process only when that page is modified, which is what lets child processes share memory efficiently. Python, however, stores the refcount inside every PyObject, so merely reading an object writes to its page; in practice this turns Copy-on-Write into ‘Copy-on-Read’. The team tried disabling the refcount mechanism in CPython, but this approach turned out to be wrong: there was no way to verify that the change was working properly.

  2. The team dug further into Copy-on-Write (CoW) and realized that CoW shows up as page faults in the child processes. To find out what was triggering CoW, they monitored page-fault events and found that gc.collect() was the main source of page faults. They concluded that the GC thresholds were too low, which caused frequent object shuffling. (Here, ‘shuffling’ means moving surviving objects to the next generation, which updates the next/prev pointers in each object’s GC linked-list node and therefore writes to its page.) So GC was disabled, but third-party libraries re-enabled it by calling gc.enable(). To get around this, they simply set the GC threshold to 0 with gc.set_threshold(0), which disables automatic collection regardless of the enabled flag (see the first sketch after this list).

  3. However, this change made server restarts slower: from under 10 seconds to over 60 seconds. Although the event was hard to reproduce, they found it happened when there was no free memory and disk utilization was at 100%, which pointed to the final GC pass that Python runs while shutting down. The team tried commenting out the Py_Finalize call in uWSGI, but that was not acceptable because their application relied on cleanup hooks (atexit). They finally gave up on Python’s final cleanup altogether by adding atexit.register(os._exit, 0), which terminates the process immediately on shutdown and skips CPython’s final cleanup. This was safe because on restart the old process is simply killed and a new one takes over, so cleaning up the old process was unnecessary (see the second sketch after this list).
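
A minimal sketch of the threshold trick from item 2, not Instagram’s actual code; zeroing the generation-0 threshold turns off automatic collection even if some library later calls gc.enable():

    import gc

    # gc.disable() can be undone by any library that calls gc.enable(),
    # but a zero threshold for generation 0 disables automatic collection
    # regardless of the enabled/disabled flag.
    gc.set_threshold(0)
    print(gc.get_threshold())   # (0, 10, 10): no automatic collection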
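
And a sketch of the shutdown workaround from item 3 (again a sketch, not their exact bootstrap code); atexit callbacks run in last-registered, first-run order, so where this line sits relative to the application’s other atexit hooks determines which of them still get a chance to run:

    import atexit
    import os

    # os._exit terminates the process immediately, without running the
    # remaining atexit handlers or performing CPython's final cleanup
    # (including the last GC pass over all surviving objects).
    atexit.register(os._exit, 0)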

Finally, by dismissing GC they reduced page faults (and the cache misses that come with them), which improved their memory usage by about 10% by maximizing the amount of shared memory.



References
  1. https://medium.com/dmsfordsm/garbage-collection-in-python-777916fd3189
  2. http://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/
  3. https://dc7303.github.io/python/2019/08/06/python-memory/
  4. https://www.memorymanagement.org/glossary/g.html#term-generational-hypothesis
  5. https://blog.winterjung.dev/2018/02/18/python-gc
  6. https://instagram-engineering.com/dismissing-python-garbage-collection-at-instagram-4dca40b29172#.koitdzt7n