How Python Manages Memory: A Simple Introduction

Uncover the Secrets of Python's Memory Management System

The Basics

Python manages memory using a private heap space. When a program runs, Python first allocates a small portion of memory for itself. This is where all the program's data structures are stored. Python uses references and garbage collection to manage this memory.

Whenever an object is created, Python allocates memory for it. The size of the memory block depends on the object's type. If the object is no longer needed, Python keeps the memory allocated for a short time to see if the object will be used again. If it isn't, Python frees the memory.

Python's garbage collector finds and removes objects that are no longer referenced from memory. It looks for objects that the program can't access anymore and frees up that memory. This means you don't have to manually free the memory used by objects in Python.

The Unsung Heroes of Python's Memory Management Saga

Memory allocation and deallocation in Python are handled automatically by the Python interpreter, meaning that the developer does not need to explicitly allocate or deallocate memory. This is part of the dynamic nature of Python and is designed to make programming with the language simpler and more approachable.

The garbage collector(GC) in Python is responsible for finding and removing unreferenced objects from memory. It periodically checks memory for objects that are no longer referenced by the program, and releases the memory allocated for those objects. This mechanism ensures that memory is not wasted and that performance is not affected by memory leaks. Keep in mind that whenever GC runs it pauses your current program execution.

Key Components of Memory Management

  1. Object Allocation: When you create an object, Python allocates memory for it in the heap.

  2. Memory Pools: Python uses memory pools to manage small objects efficiently. Small objects of the same size are allocated in memory blocks, allowing for quicker memory operations.

  3. Reference Counting: Python keeps track of the number of references to each object in memory. When the reference count drops to zero, the memory for the object is eligible for garbage collection.

  4. Garbage Collection: Python’s garbage collector reclaims memory that is no longer in use, freeing up memory to avoid memory leaks.

Counting 1..2..3,

Python employs reference counting as its memory management mechanism. This process enables Python to keep track of the number of references or pointers to an object. Whenever the reference count reaches zero, the object is automatically deleted by the Python interpreter, freeing up the memory used by that object.

This approach is an essential feature that allows Python to efficiently manage memory without the need for developers to worry about manual memory allocation, a common practice in other programming languages. In essence, Python's memory management system is self-sufficient, making the language a popular choice among developers.

As an example here is a code snippet written in Python illustrating reference counting in action:

import sys

a = [1, 2, 3]

print(sys.getrefcount(a))  # Output 2

# Reference count of 'a'
b = a # Create another reference to the same object

print(sys.getrefcount(a)) # Reference count increases, Output 3

del b # Remove one reference

print(sys.getrefcount(a)) # Reference count decreases, Output 2

Things ain’t that simple…

As you learned about the reference counting mechanism, you might be thinking, "Oh, it's so simple and clever for managing memory!"

However, there is one problem that cannot be solved by reference counting alone. Here is an example...

class Person:
    def __init__(self, name):
        self.name = name
        self.friend = None  # Will store reference to another Person

def show_circular_reference():
    # Create two people who are friends with each other
    alice = Person("Alice")
    bob = Person("Bob")

    # Create circular reference
    alice.friend = bob
    bob.friend = alice

    print(f"Alice's friend is: {alice.friend.name}")
    print(f"Bob's friend is: {bob.friend.name}")

    # Remove our references to the objects
    del alice
    del bob

    # These objects still exist in memory due to circular reference

This is the problem of circular references. Circular References: When two objects reference each other, they create a circular reference. In this situation, the reference count does not drop to zero, which can cause a memory leak if Python relied only on reference counting.

Python, being a well-designed language, handles this issue effectively. So, how does Python solve this mystery?

Who is alive & Who is dead?

Let me explain how Python's garbage collector handles circular references. Think of it like a detective solving a mystery - it needs to figure out which objects are truly "dead" even though they appear to be alive.

Python's garbage collector, specifically the cyclic garbage collector, uses an algorithm called "mark-and-sweep" that works in three main phases:

Phase 1: Collection First, the collector creates a list of all objects it needs to track. Python only tracks containers (like lists, dictionaries, custom objects) that could potentially create cycles. It doesn't track simple objects like integers or strings since they can't create circular references. Imagine this like making a list of all rooms in a building that might be connected to each other.

class Node:
    def __init__(self):
        self.next = None

# These objects are tracked by the collector
node1 = Node()
node2 = Node()
node1.next = node2
node2.next = node1

# This integer isn't tracked since it can't form cycles
x = 42

Phase 2: Finding Root Objects Next, the collector identifies all objects that are definitely alive - these are called "root objects." These are objects that your program can still reach directly through variables in your current scope or through the system. Think of this like finding all rooms that have doors to the outside world.

def demonstrate_roots():
    # This node is a root object - we can reach it directly
    root_node = Node()

    # These nodes form a cycle but aren't connected to any root
    orphan1 = Node()
    orphan2 = Node()
    orphan1.next = orphan2
    orphan2.next = orphan1

    # Remove our reference to the orphan nodes
    del orphan1
    del orphan2
    # Now these nodes form a cycle but can't be reached from any root

Phase 3: Mark and Sweep - This is where the real detective work happens. The collector:

  1. Marks all objects reachable from root objects as "alive"

  2. Looks at all the tracked objects it found in Phase 1

  3. Any tracked objects that weren't marked "alive" must be garbage.

Here's how this works in practice:

import gc

class Node:
    def __init__(self, name):
        self.name = name
        self.next = None

def show_collection_process():
    # Create a root node we can reach
    root = Node("root")

    # Create a cycle that will become garbage
    temp1 = Node("temp1")
    temp2 = Node("temp2")
    temp1.next = temp2
    temp2.next = temp1

    # Remove our references to the cycle
    del temp1
    del temp2

    # At this point:
    # - root is marked alive (reachable)
    # - temp1 and temp2 form a cycle but aren't marked alive

    collected = gc.collect()
    print(f"Collected {collected} objects")

show_collection_process()

Think about it like a game of tag. The garbage collector starts at all the root objects and "tags" everything it can touch. Anything that didn't get tagged must be unreachable, even if those untagged objects can reach each other.

They die at young age!

So now you understand how Python handles the case of circular references, but this is not all. The collector (gc) has some clever optimizations too.

For example:

  • It divides objects into three generations (0, 1, and 2) based on how many collections they've survived.

  • It collects younger generations more frequently than older ones.

  • It uses various flags and state bits to avoid having to rescan objects unnecessarily.

This generational approach is like having three different age groups: 0 is like a newborn baby that gets inspected frequently, 1 is like a teenager that gets checked occasionally, and 2 is an adult man that rarely needs inspection.

So objects with generation 0 are garbage collected more frequently than the other two generations.

A practical implication of this system is that while reference counting handles most memory management immediately, circular references might stick around until the next garbage collection cycle. This is why you might want to explicitly break circular references in some cases.

Here is simple code snippet to demonstrate how collection process works, you can run this code by yourself to see the magic

class CircularRef:
    def __init__(self, name):
        self.name = name
        self.ref = None

    def __repr__(self):
        return f"CircularRef({self.name})"

def watch_collection_process():
    # Disable automatic collection to see the process clearly
    gc.disable()

    # Create some circular references
    objects = []
    for i in range(3):
        obj1 = CircularRef(f"obj{i}a")
        obj2 = CircularRef(f"obj{i}b")
        obj1.ref = obj2
        obj2.ref = obj1
        objects.append((obj1, obj2))

    print("Before collection:", gc.get_count())

    # Now remove our references
    for obj1, obj2 in objects:
        print(f"Breaking reference to {obj1} and {obj2}")

    objects.clear()

    # Run collection and see what happens
    collected = gc.collect()
    print(f"Collection found {collected} objects")

    gc.enable()  # Re-enable automatic collection

watch_collection_process()