Python Lists- how it is created, stored in memory, and how inbuilt methods work โ€” including internal implementation details

by lochan2014 | May 30, 2025 | Python | 0 comments

In Python, a list is a mutable, ordered collection of items. Let’s break down how it is created, stored in memory, and how inbuilt methods work โ€” including internal implementation details.


๐Ÿ”น 1. Creating a List

my_list = [1, 2, 3, 4]
  • This creates a list of 4 integers.
  • Lists can contain elements of mixed data types:
mixed = [1, 'hello', 3.14, [10, 20]]

๐Ÿ”น 2. How Python List is Stored in Memory

Python lists are implemented as dynamic arrays (not linked lists like in some languages).

โœ… Internals:

  • A list is an array of pointers (references) to objects.
  • When you create a list like [1, 2, 3], Python stores references to the integer objects, not the values directly.

Letโ€™s dig in and demystify how Python manages integer objects and where the actual โ€œinteger valueโ€ lives. ๐Ÿš€


๐ŸŽฏ The Key Idea

When you do:

my_list = [1, 2, 3]

โœ… Python doesnโ€™t store the integer values directly in the list.
โœ… Instead, it stores references (pointers) to integer objects in memory.


๐Ÿ”Ž Where are the integer objects themselves?

โœ… The integer objects (like 1, 2, 3) live in the heap memory (dynamically allocated memory managed by the Python runtime).
โœ… Each of them is an instance of the int type (in CPython, theyโ€™re PyLongObject).

โœ… The list itself is an object in memory (with its own structure), which has an array of pointers to these integer objects.


๐Ÿ’ก Visualizing it:

Letโ€™s think of it as:

my_list โ†’  [  ref1,  ref2,  ref3 ]
             |       |       |
             v       v       v
           int(1)  int(2)  int(3)
  • my_list has slots to store references.
  • Each reference points to an integer object (allocated in the heap).

โšก๏ธ So where is the integerโ€™s โ€œvalueโ€ itself?

โœ… Inside the int object in the heap, thereโ€™s an internal field (like ob_digit in CPython) that holds the actual number.

For example, in CPython:

struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1]; // The actual number stored here!
};

โœ… So the actual 1, 2, 3 digits are inside the int object in memory.


๐Ÿ”ฅ Quick Summary

โœ… When you do [1, 2, 3], Python:
1๏ธโƒฃ Allocates a list object (array of references).
2๏ธโƒฃ Each slot in the list points to an integer object (int instance).
3๏ธโƒฃ Each int object stores the actual numeric value inside its own memory layout (like ob_digit).


โœ… Memory Allocation:

import sys
lst = []
print(sys.getsizeof(lst))  # โ†’ 56 bytes on 64-bit Python
lst.append(1)
print(sys.getsizeof(lst))  # โ†’ Grows when resized

Python over-allocates memory to avoid resizing every time you append. The overall strategy is based on amortized constant time for append().

๐Ÿ“Œ C Code Insight (CPython):

Internally, a list is a PyListObject:

typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item;
    Py_ssize_t allocated;
} PyListObject;
  • ob_item is a pointer to an array of pointers (each pointing to a Python object).
  • allocated is the size of the allocated memory (can be more than the actual number of items).

๐Ÿ”น 3. How Python Inbuilt List Methods Work

โœ… append()

my_list.append(10)
  • Adds an element to the end of the list.
  • Internally checks if capacity is enough; if not, resizes the array.
  • Resize typically increases size by about 1.125ร— current size.
static int
list_append(PyListObject *self, PyObject *object)
{
    if (self->ob_size == self->allocated) {
        list_resize(self, newsize);  // increase capacity
    }
    Py_INCREF(object); // Increase reference count
    self->ob_item[self->ob_size++] = object;
}

โœ… pop()

my_list.pop()
  • Removes the last element.
  • Decreases size, but does not shrink the allocated space immediately.

โœ… insert(index, value)

my_list.insert(2, 100)
  • Internally shifts elements to the right to make room at the given index.
  • O(n) time complexity because of the shifting.

โœ… remove(value)

my_list.remove(3)
  • Finds the first occurrence of the value and deletes it (O(n)).
  • Internally uses PySequence_Contains() to search and then memmove to shift items.

๐Ÿ”น 4. Example: Custom Implementation Mimicking List

Hereโ€™s a Python version of a basic dynamic array to help understand:

class MyList:
    def __init__(self):
        self.capacity = 4
        self.length = 0
        self.array = [None] * self.capacity

    def append(self, item):
        if self.length == self.capacity:
            self._resize()
        self.array[self.length] = item
        self.length += 1

    def _resize(self):
        self.capacity *= 2
        new_array = [None] * self.capacity
        for i in range(self.length):
            new_array[i] = self.array[i]
        self.array = new_array

    def __getitem__(self, index):
        if 0 <= index < self.length:
            return self.array[index]
        raise IndexError("Index out of bounds")

    def __str__(self):
        return str([self.array[i] for i in range(self.length)])

# Usage
ml = MyList()
ml.append(10)
ml.append(20)
ml.append(30)
print(ml)  # โ†’ [10, 20, 30]

๐Ÿ”น 5. Time Complexity of Common List Operations

OperationAverage Time
appendO(1)
pop (end)O(1)
insertO(n)
removeO(n)
indexO(n)
iterationO(n)

How size or length function of list work internally or implemetation code?

Great question! Let’s explore how the len() function works for a Python list internally, especially in CPython (the standard Python implementation), and what it actually does under the hood.


๐Ÿ”น 1. Using len() in Python

my_list = [1, 2, 3, 4]
print(len(my_list))  # Output: 4

Looks simple โ€” but internally it uses a very efficient method that does not iterate through the list.


๐Ÿ”น 2. How len() Works Under the Hood

When you call len(my_list), Python internally does something like:

my_list.__len__()

This is implemented via a C function called PyObject_Length() in CPython.


๐Ÿ”น 3. CPython Source Code Path

The actual C implementation can be summarized as:

โœ… len(obj) calls:

Py_ssize_t PyObject_Length(PyObject *o) {
    if (o == NULL)
        return -1;

    PyTypeObject *type = Py_TYPE(o);

    if (type->tp_as_sequence && type->tp_as_sequence->sq_length) {
        return type->tp_as_sequence->sq_length(o);
    }

    if (type->tp_as_mapping && type->tp_as_mapping->mp_length) {
        return type->tp_as_mapping->mp_length(o);
    }

    return -1;
}

โœ… For a list object, this points to:

static Py_ssize_t
list_length(PyListObject *a) {
    return Py_SIZE(a);  // Very fast โ€” just returns the size field
}

๐Ÿ”น 4. What is Py_SIZE()?

This is a macro defined in CPython:

#define Py_SIZE(ob) (((PyVarObject*)(ob))->ob_size)

So basically:

  • Each Python object has a header (PyVarObject) with a field ob_size that stores the number of elements.
  • For list, this ob_size is updated every time you append, remove, or change the list.

โœ… So calling len(my_list) is O(1) โ€” constant time.


๐Ÿ”น 5. What Does This Mean for You?

  • You never need to manually count elements.
  • len() is instantaneous, no matter how big the list is.

๐Ÿ”น 6. Replicating This in a Custom Class

If you want to mimic this in a custom class:

class MyList:
    def __init__(self):
        self.data = []
        self.length = 0

    def append(self, item):
        self.data.append(item)
        self.length += 1

    def __len__(self):
        return self.length

# Usage
ml = MyList()
ml.append(10)
ml.append(20)
print(len(ml))  # Output: 2

Excellent question! Let’s dive deep into how sorting works internally on a Python list, what algorithm it uses, and walk through pseudo-code to understand it.


๐Ÿ”น 1. How to Sort in Python

arr = [5, 2, 9, 1]
arr.sort()               # In-place sort
sorted_arr = sorted(arr) # Returns a new sorted list

Both use the same underlying C function, just with a different approach (in-place vs. copy).


๐Ÿ”น 2. What Algorithm Does Python Use to Sort?

โœ… Python uses a sorting algorithm called Timsort โ€” a hybrid algorithm derived from:

  • Merge Sort (stable)
  • Insertion Sort (efficient on small or partially sorted data)

๐Ÿ”น 3. Why Timsort?

  • Optimized for real-world data
  • Detects runs (already sorted sublists) and merges them efficiently
  • Stable: maintains relative order of equal elements
  • Time complexity:
    • Best case: O(n) (if already sorted)
    • Average/Worst case: O(n log n)

๐Ÿ”น 4. How Timsort Works (Simplified Overview)

  1. Find runs (contiguous increasing or decreasing sequences)
  2. If run is too short (< 32), extend using insertion sort
  3. Merge runs using merge sort logic while maintaining certain invariants

๐Ÿ”น 5. Timsort Pseudocode (Simplified)

function timsort(array):
    MIN_RUN = 32
    n = length(array)

    # Step 1: Sort small pieces with insertion sort
    for i = 0 to n in steps of MIN_RUN:
        insertion_sort(array, i, min(i + MIN_RUN - 1, n-1))

    # Step 2: Merge sorted runs
    size = MIN_RUN
    while size < n:
        for start = 0 to n in steps of 2*size:
            mid = start + size - 1
            end = min((start + 2*size - 1), (n-1))

            if mid < end:
                merge(array, start, mid, end)
        size *= 2

insertion_sort(arr, left, right)

for i from left+1 to right:
    key = arr[i]
    j = i - 1
    while j >= left and arr[j] > key:
        arr[j + 1] = arr[j]
        j -= 1
    arr[j + 1] = key

merge(arr, l, m, r)

  • Standard merge function from merge sort

๐Ÿ”น 6. CPython Internal File

Pythonโ€™s list sort is implemented in:

CPython/Objects/listobject.c

Main function:

list.sort() โ†’ listsort_impl()

Core sorting engine (Timsort) is in:

CPython/Objects/listobject.c โ†’ timsort.h

๐Ÿ”น 7. Python Behavior with Custom Keys

arr = ["apple", "Banana", "cherry"]
arr.sort(key=str.lower)
  • Timsort also works with a key function by precomputing a decorated version: decorated = [(key(x), x) for x in arr] decorated.sort() arr = [x for (key, x) in decorated]

๐Ÿ”น 8. Why Not Quicksort?

  • Quicksort is not stable
  • Bad worst-case performance (O(nยฒ))
  • Timsort is more predictable and optimized for Python use cases

Would you like to see a real working code implementation of Timsort in Python?

How n = length(array) happens internally for above?

Great follow-up! When you write:

n = len(array)

or in your Timsort pseudo-code:

n = length(array)

Python uses the built-in len() function to get the number of elements. Hereโ€™s how it works internally, especially in the context of CPython, the standard Python interpreter.


๐Ÿ”น 1. User Perspective

arr = [5, 2, 9, 1]
n = len(arr)  # returns 4

Seems simple โ€” but under the hood, len(arr) is not counting the items one by one. Instead, it accesses a stored size field inside the list object directly.


๐Ÿ”น 2. How len() Works Internally (CPython)

Python objects (like lists) are implemented as C structs. Lists are of type PyListObject.

Hereโ€™s the simplified C structure:

typedef struct {
    PyObject_VAR_HEAD       // contains ob_size
    PyObject **ob_item;     // pointer to the array of items
    Py_ssize_t allocated;   // capacity (allocated space)
} PyListObject;

โœ… ob_size is the actual number of elements in the list.

So len(arr) is equivalent to reading this field:

#define Py_SIZE(ob) (((PyVarObject*)(ob))->ob_size)

โœ… The len() function in C:

Py_ssize_t PyObject_Length(PyObject *o) {
    if (o == NULL)
        return -1;

    if (PyList_Check(o))
        return Py_SIZE(o);  // O(1)
    
    // for other objects, look up tp_as_sequence, etc.
}

๐Ÿ”น 3. What Happens in Timsortโ€™s n = len(array)?

In Python:

n = len(array)

Internally, it is:

  1. Calls PyObject_Length(array)
  2. For list objects: returns Py_SIZE(array) directly (O(1))

โœ… This means even in large lists with millions of elements, len() is instant โ€” it never loops through the list.


๐Ÿ”น 4. If You Create a Custom Object

You must implement the __len__() method to make len(obj) work:

class MyArray:
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

arr = MyArray([1, 2, 3])
print(len(arr))  # 3

Summary:

AspectHow it works
SpeedO(1), constant time
Where storedob_size field in C structure
CPython implPy_SIZE() macro (direct access)
Custom classMust define __len__()

Great catch โ€” letโ€™s explain the full path of how len() works in Python for any object, including custom classes and built-in types.


๐Ÿ”น 1. What does this line mean?

// for other objects, look up tp_as_sequence, etc.

In CPython, every object has a type (PyTypeObject), and that type defines how functions like len(), +, indexing, etc., behave.

So when you call:

len(obj)

Python does not always know how to compute the length unless:

  • The object is a built-in like list, tuple, dict (which define tp_as_sequence or tp_as_mapping)
  • OR it defines the __len__() method in a custom class

๐Ÿ”น 2. CPython Internal: PyObject_Length() (Simplified)

Hereโ€™s the actual C logic behind len():

Py_ssize_t PyObject_Length(PyObject *o) {
    PyTypeObject *type = Py_TYPE(o);

    if (type->tp_as_sequence && type->tp_as_sequence->sq_length) {
        return type->tp_as_sequence->sq_length(o);
    }
    if (type->tp_as_mapping && type->tp_as_mapping->mp_length) {
        return type->tp_as_mapping->mp_length(o);
    }

    // If __len__ is defined in Python class, use generic fallback
    return _PyObject_LengthHint(o, -1);
}

๐Ÿ”น 3. What Are tp_as_sequence and tp_as_mapping?

These are C structs in CPython that represent behavior for:

  • Sequences (like list, tuple, str)
  • Mappings (like dict)

Sequence Type Struct (Example):

typedef struct {
    lenfunc sq_length;       // len(obj)
    binaryfunc sq_concat;    // obj1 + obj2
    ssizeargfunc sq_item;    // obj[i]
    ...
} PySequenceMethods;

Mapping Type Struct (Example):

typedef struct {
    lenfunc mp_length;       // len(obj)
    binaryfunc mp_subscript; // obj[key]
    ...
} PyMappingMethods;

๐Ÿ”น 4. Custom Object with __len__()

When Python doesnโ€™t find tp_as_sequence or tp_as_mapping, it tries calling __len__() using Python-level attribute lookup:

class MyCustom:
    def __len__(self):
        return 42

x = MyCustom()
print(len(x))  # 42

CPython calls the equivalent of:

PyObject_CallMethod(o, "__len__", NULL);

Which internally resolves and calls the Python __len__ method.


๐Ÿ”น 5. Fallback Mechanism

If the object doesn’t support length (list, dict, or __len__()), Python raises:

TypeError: object of type 'X' has no len()

Summary Table:

Object TypeHow len() is computed
List/Tuple/Strtp_as_sequence->sq_length (fast, direct)
Dicttp_as_mapping->mp_length
Custom ClassCalls __len__() method
No supportRaises TypeError: object of type 'X' has no len()

new_array = [None] * self.capacity,,, what is significance of [None ] Here

Great question! Let’s break down this common Python pattern:

new_array = [None] * self.capacity

๐Ÿ”น What this line does

It creates a new list of size self.capacity, where every element is initialized to None.

Example:

capacity = 5
new_array = [None] * capacity
print(new_array)  # Output: [None, None, None, None, None]

๐Ÿ”น Significance of None

None is a special singleton in Python that means “no value” or “empty”.

Using [None] * capacity helps in preallocating space in a list-based data structure like:

  • A dynamic array
  • A manual implementation of a stack, queue, or buffer
  • Mimicking lower-level behavior like in C arrays

It acts as a placeholder, so that positions are available to fill later.


๐Ÿ”น Why not just use []?

Using [] (empty list) starts with no space:

arr = []             # length is 0
arr.append(1)        # slow if done many times

Using [None] * n is useful when:

  • You know the maximum capacity ahead of time
  • You want fast random access (like setting arr[2] = 10)
  • You are implementing array resizing logic like Pythonโ€™s own list does

๐Ÿ”น Example in Dynamic Array Implementation

Hereโ€™s a simplified dynamic array class:

class DynamicArray:
    def __init__(self):
        self.capacity = 4
        self.length = 0
        self.array = [None] * self.capacity

    def append(self, item):
        if self.length == self.capacity:
            self._resize()
        self.array[self.length] = item
        self.length += 1

    def _resize(self):
        self.capacity *= 2
        new_array = [None] * self.capacity
        for i in range(self.length):
            new_array[i] = self.array[i]
        self.array = new_array

Here, [None] * self.capacity pre-allocates space efficiently.


๐Ÿ”น Summary

PartMeaning
NonePlaceholder for “no value yet”
[None] * capacityCreates a fixed-size list of empty slots
Used inDynamic array, buffer, queue, stack, etc.
AdvantageEfficient pre-allocation and random access

self.capacity *= 2
new_array = [None] * self.capacity.. why we are increasing capacity by double even if we are adding a single list item?

Fantastic question โ€” this goes into the heart of how dynamic arrays (like Python’s list) manage memory efficiently.


๐Ÿ”น Why do we double the capacity?

Even if weโ€™re only adding one item, we double the capacity to make future appends much faster.

Without doubling:

  • Each .append() would require allocating a new list and copying all elements.
  • Thatโ€™s O(n) per append โ†’ very inefficient if you do many appends.

With doubling:

  • Resizing happens rarely.
  • Most appends take O(1) time โ€” called amortized constant time.

๐Ÿ”น Real Example

Letโ€™s simulate it:

capacity = 1
arr = [None] * capacity
length = 0

# Suppose we insert items one by one:
for i in range(10):
    if length == capacity:
        capacity *= 2
        new_arr = [None] * capacity
        # copy old elements (O(n))
        for j in range(length):
            new_arr[j] = arr[j]
        arr = new_arr
    arr[length] = i
    length += 1

Even though we added 10 items, resizing only happened logโ‚‚10 โ‰ˆ 4 times.


๐Ÿ”น Visual Explanation (Capacity Doubling)

Append #CapacityResize Needed?New Capacity
11Yes2
22Yes4
34No4
54Yes8
98Yes16

๐Ÿ”น Why not increase by 1, or 10?

StrategyProblem
Increase by 1Frequent resizing โ†’ O(nยฒ) total time
Increase by 10Wastes space if you add few elements
Double (ร—2)Balance: few resizes, good space usage

๐Ÿ”น Python List Internals (CPython)

Python lists are over-allocated behind the scenes. They follow a more nuanced strategy than just ร—2, but it’s very similar in spirit.

In CPython (source: listobject.c):

new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6) + newsize;

Which behaves approximately like 1.125ร— to 2ร— increase based on size.


๐Ÿ”น Summary

ConceptReason
Double capacityAvoid frequent, expensive resizing
Amortized O(1)Most appends are cheap, only occasional O(n) copies
Optimized tradeoffBetween memory usage and append performance

suppose i have to sort a python list of 6000 items how sorting will work here, will it use merge and insertion sort both together

Excellent question โ€” and yes, you’re right to think both merge sort and insertion sort are involved. Letโ€™s walk through how Python sorts a list of 6000 items internally using Timsort, which is a hybrid algorithm.


๐Ÿ”น Recap: What sorting algorithm does Python use?

Python uses Timsort, a combination of:

  • Insertion Sort for small chunks (runs)
  • Merge Sort for combining those sorted chunks

It’s optimized for real-world data that often contains partially ordered sequences.


๐Ÿ”น What happens when you sort a 6000-item list?

Hereโ€™s a step-by-step breakdown of how sorted(list_of_6000) or list.sort() works internally:


โœ… Step 1: Identify Runs

  • Timsort first scans the list to find ascending or descending runs (sequences that are already sorted).
  • These runs are sorted individually (using insertion sort if they are small โ€” typically < 64 elements).

Example: [10, 20, 30, 25, 15, 40] โ†’ two runs: [10, 20, 30] and [25, 15] (will reverse it to [15, 25])


โœ… Step 2: Use Insertion Sort for Small Runs

  • For runs smaller than 64, Timsort uses binary insertion sort (efficient for short or nearly-sorted data).
Run size โ‰ˆ 32โ€“64 โ†’ Use Insertion Sort

โœ… Step 3: Merge Runs Using Merge Sort

  • Once multiple runs are sorted individually, Timsort merges them in a smart sequence (based on stack of runs and specific rules to maintain balance).
  • Merging is stable (keeps equal elements in original order).

๐Ÿ”น So, yes โ€” for 6000 elements:

PhaseAlgorithm UsedHow Many Times?
Initial sortingInsertion SortDozens (on small runs)
Final sortingMerge SortFor merging the runs

๐Ÿ”น Visualization (Simplified for 6000 items)

  1. ๐Ÿ” Scan: Split list into ~100 runs of ~60 items
  2. โœ๏ธ Sort each run with insertion sort
  3. ๐Ÿ”„ Merge them using merge sort logic in optimal order

๐Ÿ”น Timsort: Why Hybrid?

Insertion SortMerge SortTimsort Combo
Fast on smallGood for bigBest of both
Simple logicStable mergingReal-world performance is superb
Cache friendlyDivide & conquerAvoids unnecessary comparisons

๐Ÿ”น Bonus: Want to See Pseudo-code?

Hereโ€™s a stripped-down view of Timsort for 6000 items:

def timsort(arr):
    MIN_RUN = 64
    n = len(arr)

    # Step 1: Break list into runs and sort them using insertion sort
    for start in range(0, n, MIN_RUN):
        end = min(start + MIN_RUN - 1, n - 1)
        insertion_sort(arr, start, end)

    # Step 2: Merge runs using merge sort strategy
    size = MIN_RUN
    while size < n:
        for left in range(0, n, 2 * size):
            mid = min(n - 1, left + size - 1)
            right = min((left + 2 * size - 1), (n - 1))
            merge(arr, left, mid, right)
        size *= 2

โœ… Summary

AspectExplanation
List size6000 elements
Initial sortingSplit into small runs (32โ€“64 elements), insertion sort
Merging phaseStable merge (like merge sort)
Sorting usedYes: Both insertion + merge sort used in Timsort
PerformanceO(n log n) worst case, but fast on real-world data

Would you like to run a trace on an actual Python list of 6000 items with some partially sorted regions to see how many runs it finds?

Written By

undefined

Related Posts

Submit a Comment

Your email address will not be published. Required fields are marked *