Pandas DataFrames mutate inside functions

This post covers a common pandas behavior: DataFrames are mutable, so a function can modify the caller’s object even if it never returns that object or the caller ignores the return value. The behavior is documented in the copy vs view guide and the DataFrame API.

At a glance

  • A DataFrame is mutable; in-place ops modify the original object.
  • The bug looks like “nothing was returned, but things changed.”
  • Fix it by copying, returning, or making mutation explicit.

Minimal repro (runnable)

Run this as a plain script. The function changes raw, even if you ignore the return value.

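import pandas as pd

def clean_prices(df):
    # Column assignment writes into the caller's DataFrame.
    df["price"] = df["price"].astype(float)
    # inplace=True drops rows from that same object.
    df.dropna(inplace=True)
    return df

raw = pd.DataFrame({"price": ["10", None, "20"]})
_ = clean_prices(raw)  # return value deliberately ignored

print(raw)  # raw has changed anyway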

Expected output:

   price
0   10.0
2   20.0

Why it happens

A DataFrame is a mutable object, and Python passes arguments by object reference: inside the function, df is another name for the caller's object, not a copy. Any in-place operation (dropna(inplace=True), assignment to a column, rename(inplace=True)) therefore changes that same object in memory. Pandas warns about this in the copy vs view section.
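A minimal sketch of the point above, assuming nothing beyond pandas itself. The helper names received_id and add_column are hypothetical and exist only for illustration. It checks that the function receives the very same object, then shows a column assignment inside the function appearing on the caller's DataFrame.

```python
import pandas as pd

def received_id(df):
    # id() of whatever object the function was handed (hypothetical helper)
    return id(df)

def add_column(df):
    # Column assignment writes into the object the caller passed in
    df["checked"] = True

raw = pd.DataFrame({"price": [10.0, 20.0]})

same_object = received_id(raw) == id(raw)
add_column(raw)
columns_after = list(raw.columns)

print(same_object)    # True
print(columns_after)  # ['price', 'checked']
```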

Safer patterns (pick one)

1) Return a new object

Make a copy and return the new DataFrame.

def clean_prices(df):
    out = df.copy()
    out["price"] = out["price"].astype(float)
    out = out.dropna()
    return out
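To see that the copying version leaves the input alone, here is a short usage sketch. It reuses the function above with the data from the repro; the variable names cleaned, raw_rows, and cleaned_rows are illustrative.

```python
import pandas as pd

def clean_prices(df):
    out = df.copy()                            # work on a copy, not the input
    out["price"] = out["price"].astype(float)
    out = out.dropna()                         # returns a new object
    return out

raw = pd.DataFrame({"price": ["10", None, "20"]})
cleaned = clean_prices(raw)

raw_rows = len(raw)          # input still has all three rows
cleaned_rows = len(cleaned)  # missing row dropped only in the result

print(raw_rows, cleaned_rows)  # 3 2
```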

2) Mutate on purpose

If you want in‑place behavior, make that explicit.

def clean_prices_inplace(df):
    df["price"] = df["price"].astype(float)
    df.dropna(inplace=True)
    return None
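One way to read the explicit return None: it follows the same convention as list.sort(), where a mutating call hands back nothing. The sketch below, using the repro data, shows what a caller would observe; a mistaken reassignment such as raw = clean_prices_inplace(raw) would leave raw as None and fail quickly rather than silently.

```python
import pandas as pd

def clean_prices_inplace(df):
    df["price"] = df["price"].astype(float)
    df.dropna(inplace=True)
    return None

raw = pd.DataFrame({"price": ["10", None, "20"]})
result = clean_prices_inplace(raw)

print(result)    # None
print(len(raw))  # 2
```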

3) Make intent visible in the name

Callers should know what the function does.

def clean_prices_inplace(df):
    ...
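One possible way to apply the naming convention is to offer both variants side by side, with the copying version built on the in-place one. This pairing is an assumption for illustration, not something the post prescribes, and the function bodies are borrowed from the earlier examples.

```python
import pandas as pd

def clean_prices_inplace(df):
    """Mutates df and returns None."""
    df["price"] = df["price"].astype(float)
    df.dropna(inplace=True)

def clean_prices(df):
    """Leaves df untouched and returns a cleaned copy."""
    out = df.copy()
    clean_prices_inplace(out)  # mutation is confined to the copy
    return out

raw = pd.DataFrame({"price": ["10", None, "20"]})

cleaned = clean_prices(raw)
rows_after_copying_call = len(raw)   # still 3

clean_prices_inplace(raw)
rows_after_inplace_call = len(raw)   # now 2
```

With this layout the name alone tells the caller which contract applies, and the cleaning logic lives in one place.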

Practical checklist

  • Avoid inplace=True unless you truly want to mutate the input.
  • If you mutate, name it *_inplace.
  • If you return a new object, copy the input first and modify only the copy.
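The checklist can be backed by a small test helper. The sketch below is one possible approach: assert_does_not_mutate, drop_missing, and drop_missing_inplace are hypothetical names introduced here, and the comparison relies on pandas.testing.assert_frame_equal. It snapshots the input, calls the function, and fails if the input changed.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

def assert_does_not_mutate(func, df):
    # Hypothetical helper: compare the input before and after the call.
    snapshot = df.copy(deep=True)
    func(df)
    assert_frame_equal(df, snapshot)

def drop_missing(df):
    return df.dropna()       # returns a new object

def drop_missing_inplace(df):
    df.dropna(inplace=True)  # mutates the input

frame = pd.DataFrame({"price": [10.0, float("nan"), 20.0]})

assert_does_not_mutate(drop_missing, frame)  # passes silently

try:
    assert_does_not_mutate(drop_missing_inplace, frame)
    caught = False
except AssertionError:
    caught = True  # the in-place version changed the input's shape

print(caught)  # True
```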