Data Structures for TAs

30 min read Python 3.10+

Lists, Tuples, and When to Use Each

Lists and tuples are Python's core sequence types. Both store ordered collections, but they serve different purposes in TA code.

Lists - Mutable Collections

Lists are your go-to for collections that change: asset queues, selected objects, file paths to process.

# A list of assets to export
export_queue = ["hero_sword", "shield_01", "potion_health"]

# Add items
export_queue.append("armor_plate")
export_queue.insert(0, "hero_body")  # insert at the beginning

# Remove items
export_queue.remove("shield_01")   # remove by value
last_item = export_queue.pop()     # remove and return the last item

# Sort
export_queue.sort()                # alphabetical (in-place)
export_queue.sort(reverse=True)    # reverse alphabetical

# Filter with a condition
all_assets = ["hero_sword", "hero_body", "env_tree_01", "env_rock_03", "hero_shield"]
hero_assets = [a for a in all_assets if a.startswith("hero_")]
print(hero_assets)  # ['hero_sword', 'hero_body', 'hero_shield']

# Slicing
first_three = all_assets[:3]
last_two = all_assets[-2:]
every_other = all_assets[::2]

print(f"Queue: {export_queue}")
print(f"Hero assets: {hero_assets}")
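
One detail from the sort example: `list.sort()` reorders in place and returns `None`, while the built-in `sorted()` returns a new list and leaves the original alone. Both accept a `key` function. The sketch below uses illustrative names with inconsistent casing to show why a case-insensitive key is often what you want for asset lists:

```python
# Illustrative asset names with inconsistent casing
queue = ["hero_sword", "Shield_01", "armor_plate"]

# sorted() returns a NEW list and leaves the original untouched
ordered = sorted(queue)
print(ordered)  # ['Shield_01', 'armor_plate', 'hero_sword'] (uppercase sorts first)
print(queue)    # ['hero_sword', 'Shield_01', 'armor_plate']

# key= controls the comparison: str.lower gives a case-insensitive sort
ordered_ci = sorted(queue, key=str.lower)
print(ordered_ci)  # ['armor_plate', 'hero_sword', 'Shield_01']

# list.sort() reorders in place and returns None, so don't assign its result
result = queue.sort(key=str.lower)
print(result)  # None
print(queue)   # ['armor_plate', 'hero_sword', 'Shield_01']
```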

Tuples - Immutable Collections

Tuples are for data that should not change after creation: coordinates, color values, version info.

# Tuples for fixed data
position = (3.14, 0.0, -7.5)      # XYZ position
color_red = (1.0, 0.0, 0.0, 1.0)  # RGBA color
resolution = (1920, 1080)          # width, height
version = (2, 3, 1)                # major, minor, patch

# Unpacking - extract values into named variables
x, y, z = position
width, height = resolution
major, minor, patch = version

print(f"Position: x={x}, y={y}, z={z}")
print(f"Resolution: {width}x{height}")
print(f"Version: {major}.{minor}.{patch}")

# Tuples as dict keys (lists can't be dict keys!)
# Useful for caching results keyed by fixed values such as a resolution
texture_cache = {}
texture_cache[(512, 512)] = "low_res_atlas.png"
texture_cache[(2048, 2048)] = "high_res_atlas.png"
texture_cache[(4096, 4096)] = "ultra_res_atlas.png"

# Return multiple values from functions (implicitly a tuple)
def get_bounding_box(vertices):
    """Calculate axis-aligned bounding box from vertex positions."""
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    zs = [v[2] for v in vertices]
    min_point = (min(xs), min(ys), min(zs))
    max_point = (max(xs), max(ys), max(zs))
    return min_point, max_point

verts = [(0, 0, 0), (1, 2, 3), (-1, 5, 2)]
bbox_min, bbox_max = get_bounding_box(verts)
print(f"Bounding box: {bbox_min} to {bbox_max}")
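
The "lists can't be dict keys" comment comes down to hashability: tuples are immutable, so Python can hash them, while lists cannot be hashed. A quick check with illustrative values (the flag variables exist only to make the outcome visible):

```python
position = (3.14, 0.0, -7.5)

# Tuples are immutable: item assignment raises TypeError
try:
    position[0] = 10.0
    tuple_changed = True
except TypeError as exc:
    tuple_changed = False
    print(f"Can't modify a tuple: {exc}")

# To "change" a tuple, build a new one from pieces of the old one
moved = (10.0,) + position[1:]
print(moved)  # (10.0, 0.0, -7.5)

# Tuples work as dict keys; lists raise TypeError (unhashable type)
cache = {(512, 512): "low_res_atlas.png"}
try:
    cache[[512, 512]] = "broken.png"
    list_key_ok = True
except TypeError as exc:
    list_key_ok = False
    print(f"Lists can't be keys: {exc}")

# A tuple is only hashable if everything inside it is hashable
try:
    hash((1, [2, 3]))
    nested_ok = True
except TypeError:
    nested_ok = False
    print("A tuple containing a list can't be hashed")
```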

When to Use Which

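Use a list when the collection will change (items added, removed, or reordered). Use a tuple when the data is fixed (coordinates, colors, return values). Tuples are also slightly smaller in memory and slightly faster to create, and because they are immutable they are hashable whenever their contents are, which is what lets them serve as dict keys. Choosing a tuple is Python's way of saying "this data won't change."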
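
The memory difference can be checked with `sys.getsizeof`, which reports the size of the container object itself (not the items it references). Exact byte counts vary by Python version and platform, so this CPython-oriented sketch prints them instead of hard-coding numbers:

```python
import sys

coords_list = [3.14, 0.0, -7.5]
coords_tuple = (3.14, 0.0, -7.5)

list_size = sys.getsizeof(coords_list)
tuple_size = sys.getsizeof(coords_tuple)

# A list carries extra bookkeeping so it can grow; a tuple's size is fixed at creation
print(f"list:  {list_size} bytes")
print(f"tuple: {tuple_size} bytes")
print(f"tuple is smaller: {tuple_size < list_size}")
```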

Dictionaries for Asset Metadata

Dictionaries are arguably the most important data structure for TAs. They map keys to values and are perfect for representing asset metadata, settings, and any structured data.

# Asset metadata as a dictionary
asset = {
    "name": "hero_sword",
    "type": "weapon",
    "poly_count": 15432,
    "lods": [15432, 7200, 2100],
    "materials": ["sword_blade", "sword_handle", "sword_gem"],
    "bounding_box": {
        "min": (-0.3, 0.0, -0.1),
        "max": (0.3, 1.2, 0.1),
    },
    "export_ready": True,
}

# Access values
print(asset["name"])           # hero_sword
print(asset["lods"][0])        # 15432 (LOD0 polycount)
print(asset["bounding_box"]["max"])  # (0.3, 1.2, 0.1)

# Safe access with .get() - returns None (or a default) instead of raising KeyError
artist = asset.get("artist", "unknown")
print(f"Artist: {artist}")  # Artist: unknown

# Update / add fields
asset["artist"] = "Jane"
asset["version"] = 3
asset["export_ready"] = False

# Check membership
if "materials" in asset:
    print(f"Materials: {', '.join(asset['materials'])}")

# Iterate
print("\nAll metadata:")
for key, value in asset.items():
    print(f"  {key}: {value}")
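
The difference between indexing and `.get()` matters in pipeline code, where metadata is often incomplete. A related pattern is layering per-asset overrides on top of defaults with the `|` merge operator (Python 3.9+), where keys on the right win. The setting names below are illustrative, not from any particular exporter:

```python
asset = {"name": "hero_sword", "poly_count": 15432}

# Direct indexing raises KeyError for a missing key
try:
    artist = asset["artist"]
except KeyError as exc:
    print(f"Missing key: {exc}")  # Missing key: 'artist'
    artist = "unknown"

# .get() returns a default instead
print(asset.get("artist", "unknown"))  # unknown

# Merge: right-hand values override left-hand ones; neither input is modified
export_defaults = {"scale": 1.0, "up_axis": "y", "embed_textures": False}
asset_overrides = {"up_axis": "z", "embed_textures": True}
settings = export_defaults | asset_overrides
print(settings)  # {'scale': 1.0, 'up_axis': 'z', 'embed_textures': True}
```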

Nested Dictionaries - Scene Graph Example

"""Represent a simple scene hierarchy using nested dicts."""

scene = {
    "characters": {
        "hero": {
            "mesh": "hero_body_v003.fbx",
            "rig": "hero_rig_v005.ma",
            "textures": {
                "albedo": "hero_albedo_4k.png",
                "normal": "hero_normal_4k.png",
                "roughness": "hero_roughness_4k.png",
            },
        },
        "villain": {
            "mesh": "villain_body_v002.fbx",
            "rig": "villain_rig_v003.ma",
            "textures": {
                "albedo": "villain_albedo_4k.png",
                "normal": "villain_normal_4k.png",
            },
        },
    },
    "environment": {
        "terrain": {"mesh": "terrain_v001.fbx"},
        "props": {
            "barrel": {"mesh": "barrel_v002.fbx", "instances": 15},
            "crate": {"mesh": "crate_v001.fbx", "instances": 8},
        },
    },
}

def collect_all_files(data, files=None):
    """Recursively collect all file paths from a nested structure.

    Any string containing a "." is treated as a file path, whether it
    appears as a dict value or inside a list/tuple.
    """
    if files is None:
        files = []

    if isinstance(data, str):
        if "." in data:
            files.append(data)
    elif isinstance(data, dict):
        for value in data.values():
            collect_all_files(value, files)
    elif isinstance(data, (list, tuple)):
        for item in data:
            collect_all_files(item, files)

    return files

all_files = collect_all_files(scene)
print(f"Scene contains {len(all_files)} files:")
for f in sorted(all_files):
    print(f"  {f}")
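
Deep lookups like `scene["characters"]["hero"]["textures"]["albedo"]` raise `KeyError` if any level is missing. A small helper can walk a key path and fall back to a default; `get_nested` below is a hypothetical helper written for this example, not a standard-library function:

```python
def get_nested(data, *keys, default=None):
    """Follow a sequence of keys through nested dicts (hypothetical helper).

    Returns `default` as soon as a key is missing or a level isn't a dict.
    """
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

# A trimmed-down version of the scene dict above
scene = {
    "characters": {
        "hero": {"textures": {"albedo": "hero_albedo_4k.png"}},
        "villain": {"textures": {"albedo": "villain_albedo_4k.png"}},
    }
}

print(get_nested(scene, "characters", "hero", "textures", "albedo"))
# hero_albedo_4k.png
print(get_nested(scene, "characters", "villain", "textures", "roughness", default="MISSING"))
# MISSING
```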

Sets for Unique Collections

Sets store unique values with fast lookups. They're perfect for deduplication, membership testing, and finding differences between collections.

# Deduplicate a list of texture references
texture_refs = [
    "hero_albedo.png", "hero_normal.png", "hero_albedo.png",
    "env_ground.png", "hero_normal.png", "env_sky.hdr",
    "env_ground.png", "hero_roughness.png",
]

unique_textures = set(texture_refs)
print(f"Total references: {len(texture_refs)}")
print(f"Unique textures: {len(unique_textures)}")

# Set operations - compare what's on disk vs what's referenced
textures_on_disk = {"hero_albedo.png", "hero_normal.png", "hero_roughness.png",
                    "hero_emissive.png", "env_ground.png", "env_sky.hdr"}

textures_referenced = {"hero_albedo.png", "hero_normal.png", "hero_roughness.png",
                       "env_ground.png", "env_sky.hdr", "villain_albedo.png"}

# What's referenced but missing from disk?
missing = textures_referenced - textures_on_disk
print(f"\nMissing textures: {missing}")
# {'villain_albedo.png'}

# What's on disk but not referenced (orphaned)?
orphaned = textures_on_disk - textures_referenced
print(f"Orphaned textures: {orphaned}")
# {'hero_emissive.png'}

# What do both sets share?
in_use = textures_on_disk & textures_referenced
print(f"In use: {in_use}")

# Everything combined
all_textures = textures_on_disk | textures_referenced
print(f"All known textures: {len(all_textures)}")
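
Two more set operations come up often in validation scripts: `<=` (is-subset) answers "is everything required actually present?", and `^` (symmetric difference) lists everything that appears on only one side. Using the same illustrative texture names, plus a hypothetical `required_hero` set:

```python
textures_on_disk = {"hero_albedo.png", "hero_normal.png", "hero_roughness.png",
                    "hero_emissive.png", "env_ground.png", "env_sky.hdr"}
textures_referenced = {"hero_albedo.png", "hero_normal.png", "hero_roughness.png",
                       "env_ground.png", "env_sky.hdr", "villain_albedo.png"}

# Subset test: are all required hero textures on disk?
required_hero = {"hero_albedo.png", "hero_normal.png"}
print(required_hero <= textures_on_disk)  # True

# Symmetric difference: in one set or the other, but not both
mismatched = textures_on_disk ^ textures_referenced
print(sorted(mismatched))  # ['hero_emissive.png', 'villain_albedo.png']

# The method forms accept any iterable, not just sets
print(required_hero.issubset(["hero_albedo.png", "hero_normal.png", "env_sky.hdr"]))  # True
```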

Sets Are Fast

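Checking whether an item is in a set takes O(1) time on average, no matter how large the set, because sets are built on hash tables. Checking whether an item is in a list is O(n), because Python compares against each element in turn. For large asset collections this difference is massive, so prefer a set whenever you only care about membership. The trade-offs: sets are unordered, and they can only hold hashable values such as strings, numbers, and tuples.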
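
A practical consequence: if you test membership many times against the same collection, convert it to a set once, up front. This sketch times both approaches with `timeit` on synthetic file names; absolute timings depend on the machine, so it prints them rather than claiming specific numbers:

```python
import timeit

# Synthetic data: 10,000 "files on disk" and 200 lookups (half of them missing)
on_disk_list = [f"tex_{i:05d}.png" for i in range(10_000)]
on_disk_set = set(on_disk_list)  # build the set ONCE
lookups = [f"tex_{i:05d}.png" for i in range(0, 20_000, 100)]

def missing_from(names, available):
    """Return the names that are not in `available`."""
    return [n for n in names if n not in available]

# Both give the same answer...
assert missing_from(lookups, on_disk_list) == missing_from(lookups, on_disk_set)

# ...but the list version may scan all 10,000 items for each lookup
list_time = timeit.timeit(lambda: missing_from(lookups, on_disk_list), number=3)
set_time = timeit.timeit(lambda: missing_from(lookups, on_disk_set), number=3)
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```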

List Comprehensions and Generators

Comprehensions are Python's concise syntax for creating new collections from existing ones. They're everywhere in TA code.

import os

# List comprehension basics
# [expression for item in iterable if condition]

# Get all FBX files in a directory
all_files = os.listdir(".")
fbx_files = [f for f in all_files if f.endswith(".fbx")]

# Transform asset names
raw_names = ["Hero_Sword", "Hero Shield", "POTION_health"]
clean_names = [name.lower().replace(" ", "_") for name in raw_names]
print(clean_names)  # ['hero_sword', 'hero_shield', 'potion_health']

# Build export paths
assets = ["hero_sword", "shield_01", "potion_health"]
export_paths = [f"/export/{asset}_v001.fbx" for asset in assets]

# Dict comprehension - build a lookup from a list
poly_data = [("hero_sword", 15432), ("shield_01", 8200), ("potion", 3100)]
poly_lookup = {name: count for name, count in poly_data}
print(poly_lookup["hero_sword"])  # 15432

# Set comprehension - unique extensions in a directory
extensions = {os.path.splitext(f)[1].lower() for f in all_files if os.path.isfile(f)}
print(f"File types: {extensions}")

# Nested comprehension - flatten LOD lists
assets_with_lods = {
    "hero_sword": [15432, 7200, 2100],
    "shield_01": [8200, 4000, 1200],
}
all_poly_counts = [count for lods in assets_with_lods.values() for count in lods]
print(f"All LOD poly counts: {all_poly_counts}")
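
A point that often trips people up: an `if` at the end of a comprehension filters items out, while an `x if cond else y` expression at the start transforms every item. The poly budget here is an illustrative value, not a recommendation:

```python
poly_counts = {"hero_sword": 15432, "shield_01": 8200, "villain_armor": 52000}
BUDGET = 20_000  # illustrative per-asset budget

# Trailing `if` FILTERS: only over-budget names are kept
over_budget = [name for name, polys in poly_counts.items() if polys > BUDGET]
print(over_budget)  # ['villain_armor']

# Leading `x if cond else y` TRANSFORMS: every item gets a label
status = [f"{name}: {'OVER' if polys > BUDGET else 'ok'}" for name, polys in poly_counts.items()]
print(status)  # ['hero_sword: ok', 'shield_01: ok', 'villain_armor: OVER']

# The equivalent explicit loop for the filtering version
over_budget_loop = []
for name, polys in poly_counts.items():
    if polys > BUDGET:
        over_budget_loop.append(name)
```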

Generators - Memory-Efficient Processing

import os
from pathlib import Path

def find_textures(root_dir, extensions=None):
    """Generator that yields texture file paths one at a time.

    Unlike returning a list, this never loads all paths into memory.
    Essential when scanning directories with millions of files.
    """
    if extensions is None:
        extensions = {".png", ".jpg", ".jpeg", ".tga", ".exr", ".tiff"}

    for dirpath, dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            if Path(filename).suffix.lower() in extensions:
                yield os.path.join(dirpath, filename)

# Usage - process one file at a time (low memory)
texture_count = 0
total_size = 0

for texture_path in find_textures("/projects/my_game/assets"):
    texture_count += 1
    total_size += os.path.getsize(texture_path)

print(f"Found {texture_count} textures, total {total_size / (1024**3):.1f} GB")

# Generator expressions (like comprehensions, but lazy)
# Use () instead of []
sizes = (os.path.getsize(p) for p in find_textures("/projects/my_game/assets"))
total = sum(sizes)  # processes one at a time, never builds a full list
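
Generator expressions pair well with `any()` and `all()`, which stop as soon as the answer is known. To keep it runnable anywhere, this sketch uses an in-memory list of illustrative texture names and sizes instead of a real directory, and records which items were actually checked:

```python
checked = []

def is_too_large(name, size_mb):
    """Record each check so we can see how far iteration got."""
    checked.append(name)
    return size_mb > 64  # illustrative size limit in MB

textures = [("hero_albedo.png", 16), ("hero_normal.png", 128),
            ("env_ground.png", 32), ("env_sky.hdr", 96)]

# any() stops at the first True, so the generator is never fully consumed
has_oversized = any(is_too_large(name, size) for name, size in textures)
print(has_oversized)  # True
print(checked)        # ['hero_albedo.png', 'hero_normal.png']
```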

Generators Are One-Shot

A generator can only be iterated once. After that, it's exhausted. If you need to iterate multiple times, either convert to a list with list() or call the generator function again.
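
Here is what that looks like in practice, with a hypothetical `texture_names()` generator standing in for `find_textures()`:

```python
def texture_names():
    """Hypothetical stand-in for a directory-scanning generator."""
    yield "hero_albedo.png"
    yield "hero_normal.png"
    yield "env_sky.hdr"

gen = texture_names()
first_pass = list(gen)
second_pass = list(gen)  # already exhausted: nothing left to yield
print(len(first_pass), len(second_pass))  # 3 0

# Option 1: materialize once, then iterate the list as often as needed
names = list(texture_names())
print(len(names), len(names))  # 3 3

# Option 2: call the generator function again for a fresh generator
fresh_count = sum(1 for _ in texture_names())
print(fresh_count)  # 3
```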

Named Tuples and Dataclasses

When a plain dict isn't structured enough and a full class is overkill, named tuples and dataclasses hit the sweet spot.

Named Tuples - Lightweight Read-Only Records

from typing import NamedTuple

class Vector3(NamedTuple):
    """A 3D vector with named components."""
    x: float
    y: float
    z: float

class TextureInfo(NamedTuple):
    """Metadata for a texture file."""
    name: str
    width: int
    height: int
    channels: int = 3  # default value
    format: str = "png"

# Create instances
pos = Vector3(3.14, 0.0, -7.5)
normal = Vector3(0.0, 1.0, 0.0)

# Access by name OR by index
print(f"X: {pos.x}, Y: {pos.y}, Z: {pos.z}")
print(f"First component: {pos[0]}")

# Unpack just like regular tuples
x, y, z = pos

# Use in calculations
def dot_product(a: Vector3, b: Vector3) -> float:
    return a.x * b.x + a.y * b.y + a.z * b.z

print(f"Dot product: {dot_product(pos, normal)}")  # 0.0

# TextureInfo with defaults
tex = TextureInfo("hero_albedo", 4096, 4096)
print(f"{tex.name}: {tex.width}x{tex.height}, {tex.channels}ch, {tex.format}")
# hero_albedo: 4096x4096, 3ch, png
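
Because named tuples are immutable, you "change" one by creating a modified copy with `_replace()`. The leading underscore is part of the documented named-tuple API (it only avoids clashing with your field names), and `_asdict()` is handy for serialization. The `TextureInfo` fields mirror the example above:

```python
from typing import NamedTuple

class TextureInfo(NamedTuple):
    name: str
    width: int
    height: int
    channels: int = 3
    format: str = "png"

tex = TextureInfo("hero_albedo", 4096, 4096)

# _replace returns a NEW instance; the original is unchanged
downsized = tex._replace(width=1024, height=1024)
print(downsized)  # TextureInfo(name='hero_albedo', width=1024, height=1024, channels=3, format='png')
print(tex.width)  # 4096

# Attribute assignment fails because named tuples are immutable
try:
    tex.width = 2048
    tex_immutable = False
except AttributeError as exc:
    tex_immutable = True
    print(f"Immutable: {exc}")

# _asdict gives a plain dict, e.g. for json.dumps
print(tex._asdict())
# {'name': 'hero_albedo', 'width': 4096, 'height': 4096, 'channels': 3, 'format': 'png'}
print(TextureInfo._fields)  # ('name', 'width', 'height', 'channels', 'format')
```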

Dataclasses - Mutable Structured Data

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Asset:
    """Represents a game asset with full metadata."""
    name: str
    asset_type: str
    poly_count: int
    lods: list[int] = field(default_factory=list)
    textures: list[str] = field(default_factory=list)
    tags: set[str] = field(default_factory=set)
    artist: Optional[str] = None
    export_ready: bool = False

    @property
    def lod_count(self) -> int:
        return len(self.lods)

    @property
    def total_polys(self) -> int:
        return sum(self.lods) if self.lods else self.poly_count

    def validate(self) -> list[str]:
        """Return a list of validation errors (empty = valid)."""
        errors = []
        if not self.name:
            errors.append("Asset name is required")
        if self.poly_count <= 0:
            errors.append("Poly count must be positive")
        if not self.textures:
            errors.append("At least one texture is required")
        if self.name != self.name.lower():
            errors.append("Asset name must be lowercase")
        return errors

# Create an asset
sword = Asset(
    name="hero_sword",
    asset_type="weapon",
    poly_count=15432,
    lods=[15432, 7200, 2100],
    textures=["hero_sword_albedo.png", "hero_sword_normal.png"],
    tags={"weapon", "melee", "hero"},
    artist="Jane",
    export_ready=True,
)

print(sword)
print(f"LODs: {sword.lod_count}")
print(f"Total polys across LODs: {sword.total_polys:,}")

# Validate
errors = sword.validate()
if errors:
    print(f"Validation errors: {errors}")
else:
    print("Asset is valid")

# Mutable - update existing fields directly
sword.artist = "Alex"
sword.export_ready = False
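
The `field(default_factory=list)` calls in `Asset` are not just style. A plain mutable default like `lods: list = []` would be shared by every instance, so the `dataclasses` module rejects it with a `ValueError` when the class is defined. A small sketch with illustrative class names:

```python
from dataclasses import dataclass, field

try:
    @dataclass
    class BadAsset:
        name: str
        lods: list = []  # a mutable default like this is rejected
    bad_rejected = False
except ValueError as exc:
    bad_rejected = True
    print(f"Rejected: {exc}")

@dataclass
class GoodAsset:
    name: str
    lods: list[int] = field(default_factory=list)  # fresh list per instance

a = GoodAsset("hero_sword")
b = GoodAsset("shield_01")
a.lods.append(15432)
print(a.lods, b.lods)  # [15432] []
```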

NamedTuple vs Dataclass

NamedTuple: immutable and lightweight, and hashable (so usable as a dict key) as long as its fields are hashable. Best for fixed records like coordinates, color values, and return types. Dataclass: mutable by default (pass frozen=True for read-only instances), supports methods and properties, and behaves more like a traditional class. Best for entities that have behavior: assets, scene objects, tool configurations.
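
The dict-key difference is easy to verify. With the default `eq=True`, a regular `@dataclass` sets `__hash__` to `None`, so its instances are unhashable; `frozen=True` makes instances read-only and hashable. A sketch with illustrative class names:

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import NamedTuple

class Resolution(NamedTuple):
    width: int
    height: int

@dataclass
class MutableRes:
    width: int
    height: int

@dataclass(frozen=True)
class FrozenRes:
    width: int
    height: int

# NamedTuple: hashable, works as a dict key
atlas = {Resolution(2048, 2048): "high_res_atlas.png"}
print(atlas[Resolution(2048, 2048)])  # high_res_atlas.png

# Plain dataclass: unhashable
try:
    hash(MutableRes(2048, 2048))
    mutable_hashable = True
except TypeError as exc:
    mutable_hashable = False
    print(f"Unhashable: {exc}")

# Frozen dataclass: hashable, but assignment raises FrozenInstanceError
frozen_atlas = {FrozenRes(2048, 2048): "high_res_atlas.png"}
try:
    FrozenRes(512, 512).width = 1024
    frozen_writable = True
except FrozenInstanceError:
    frozen_writable = False
    print("FrozenRes is read-only")
```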

Practical Example: Building an Asset Registry

Let's combine everything into a real, production-style tool: an asset registry that tracks, queries, and reports on all assets in a project.

"""asset_registry.py - A complete asset registry for a game project.

Demonstrates: dataclasses, dicts, sets, list comprehensions,
generators, file I/O, and JSON serialization.
"""
import json
import logging
from dataclasses import dataclass, field, asdict
from typing import Optional
from pathlib import Path

logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
log = logging.getLogger(__name__)

@dataclass
class Asset:
    """A single asset in the registry."""
    name: str
    asset_type: str
    poly_count: int
    lods: list[int] = field(default_factory=list)
    textures: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)
    artist: Optional[str] = None
    version: int = 1
    export_ready: bool = False

class AssetRegistry:
    """Manages a collection of assets with querying and reporting."""

    def __init__(self):
        self._assets: dict[str, Asset] = {}
        self._tag_index: dict[str, set[str]] = {}

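    def register(self, asset: Asset) -> None:
        """Add or update an asset in the registry."""
        # If this replaces an existing asset, remove its old tags from the index
        # so find_by_tag() doesn't return it for tags it no longer has
        previous = self._assets.get(asset.name)
        if previous is not None:
            for tag in previous.tags:
                self._tag_index.get(tag, set()).discard(asset.name)

        self._assets[asset.name] = asset

        # Update tag index for fast lookups
        for tag in asset.tags:
            self._tag_index.setdefault(tag, set()).add(asset.name)

        log.info(f"Registered: {asset.name} (v{asset.version})")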

    def get(self, name: str) -> Optional[Asset]:
        """Get an asset by name."""
        return self._assets.get(name)

    def find_by_tag(self, tag: str) -> list[Asset]:
        """Find all assets with a given tag, sorted by name."""
        names = self._tag_index.get(tag, set())
        # Sets are unordered, so sort for stable, repeatable results
        return [self._assets[n] for n in sorted(names) if n in self._assets]

    def find_by_type(self, asset_type: str) -> list[Asset]:
        """Find all assets of a given type."""
        return [a for a in self._assets.values() if a.asset_type == asset_type]

    def find_oversized(self, max_polys: int) -> list[Asset]:
        """Find assets that exceed a polycount budget."""
        return [a for a in self._assets.values() if a.poly_count > max_polys]

    def all_textures(self) -> set[str]:
        """Get the set of all unique textures across all assets."""
        return {tex for asset in self._assets.values() for tex in asset.textures}

    @property
    def total_poly_count(self) -> int:
        """Sum of the LOD0 poly counts across all assets."""
        return sum(a.poly_count for a in self._assets.values())

    def generate_report(self) -> str:
        """Generate a text summary report."""
        lines = [
            "Asset Registry Report",
            "=" * 60,
            f"Total assets: {len(self._assets)}",
            f"Total polys (LOD0): {self.total_poly_count:,}",
            f"Unique textures: {len(self.all_textures())}",
            f"Export-ready: {sum(1 for a in self._assets.values() if a.export_ready)}",
            "",
            f"{'Name':<25s} {'Type':<12s} {'Polys':>8s} {'LODs':>4s} {'Ready':>5s}",
            "-" * 60,
        ]

        for asset in sorted(self._assets.values(), key=lambda a: a.name):
            ready = "yes" if asset.export_ready else "no"
            lines.append(
                f"{asset.name:<25s} {asset.asset_type:<12s} "
                f"{asset.poly_count:>8,} {len(asset.lods):>4d} {ready:>5s}"
            )

        # Type breakdown
        types = {}
        for a in self._assets.values():
            types[a.asset_type] = types.get(a.asset_type, 0) + 1

        lines.append("")
        lines.append("By type:")
        for asset_type, count in sorted(types.items()):
            lines.append(f"  {asset_type}: {count}")

        return "\n".join(lines)

    def save(self, filepath: str) -> None:
        """Save the registry to a JSON file."""
        data = {
            "assets": [asdict(a) for a in self._assets.values()]
        }
        Path(filepath).write_text(json.dumps(data, indent=2), encoding="utf-8")
        log.info(f"Registry saved to {filepath}")

    def load(self, filepath: str) -> None:
        """Load the registry from a JSON file."""
        raw = json.loads(Path(filepath).read_text(encoding="utf-8"))
        for item in raw["assets"]:
            asset = Asset(**item)
            self.register(asset)
        log.info(f"Loaded {len(raw['assets'])} assets from {filepath}")

# --- Usage ---
if __name__ == "__main__":
    registry = AssetRegistry()

    # Register some assets
    registry.register(Asset(
        name="hero_sword",
        asset_type="weapon",
        poly_count=15432,
        lods=[15432, 7200, 2100],
        textures=["hero_sword_albedo.png", "hero_sword_normal.png",
                   "hero_sword_roughness.png"],
        tags=["weapon", "melee", "hero"],
        artist="Jane",
        version=3,
        export_ready=True,
    ))

    registry.register(Asset(
        name="health_potion",
        asset_type="consumable",
        poly_count=3200,
        lods=[3200, 800],
        textures=["health_potion_albedo.png", "health_potion_emissive.png"],
        tags=["consumable", "item", "glowing"],
        artist="Mike",
        version=2,
        export_ready=True,
    ))

    registry.register(Asset(
        name="villain_armor",
        asset_type="armor",
        poly_count=52000,
        lods=[52000, 25000, 8000, 2500],
        textures=["villain_armor_albedo.png", "villain_armor_normal.png",
                   "villain_armor_roughness.png", "villain_armor_detail.png"],
        tags=["armor", "villain", "boss"],
        artist="Jane",
        version=1,
        export_ready=False,
    ))

    registry.register(Asset(
        name="env_barrel",
        asset_type="prop",
        poly_count=1200,
        lods=[1200, 400],
        textures=["env_barrel_albedo.png"],
        tags=["prop", "environment", "destructible"],
        artist="Alex",
        version=5,
        export_ready=True,
    ))

    # Print the full report
    print(registry.generate_report())

    # Queries
    print("\n--- Weapons ---")
    for a in registry.find_by_type("weapon"):
        print(f"  {a.name}: {a.poly_count:,} polys")

    print("\n--- Tagged 'hero' ---")
    for a in registry.find_by_tag("hero"):
        print(f"  {a.name}")

    print("\n--- Oversized (>20k polys) ---")
    for a in registry.find_oversized(20000):
        print(f"  {a.name}: {a.poly_count:,} polys")

    print(f"\n--- All textures ({len(registry.all_textures())}) ---")
    for tex in sorted(registry.all_textures()):
        print(f"  {tex}")

    # Save and reload
    registry.save("asset_registry.json")

    new_registry = AssetRegistry()
    new_registry.load("asset_registry.json")
    print(f"\nReloaded {len(new_registry._assets)} assets from disk.")
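
Notice that the registry's `Asset` stores `tags` as `list[str]`, while the earlier dataclass used `set[str]`. That matters for `save()`: JSON has no set type, so `json.dumps` raises `TypeError` on a set. If you prefer sets in memory, one option (sketched here with illustrative data) is to convert at the boundary, sorting so the saved file stays stable between runs:

```python
import json

tags = {"weapon", "melee", "hero"}

# json.dumps can't encode a set directly
try:
    json.dumps({"tags": tags})
    set_serializable = True
except TypeError as exc:
    set_serializable = False
    print(f"Can't serialize: {exc}")

# Convert to a sorted list on save...
text = json.dumps({"tags": sorted(tags)})
print(text)  # {"tags": ["hero", "melee", "weapon"]}

# ...and back to a set on load
loaded = set(json.loads(text)["tags"])
print(loaded == tags)  # True
```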

From Script to Production Tool

This registry pattern is the foundation of real production asset management systems. Extend it with a database backend (SQLite, available through Python's built-in sqlite3 module, is great for single-user tools), add a Qt UI with PySide2 or PySide6 (whichever your DCC version ships), and integrate it with your DCC tool's export pipeline - and you've built a genuinely useful studio tool.
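
As a starting point for the SQLite idea, here is a minimal sketch using the standard-library `sqlite3` module. The table layout and the `open_db`/`save_asset`/`find_oversized` functions are hypothetical choices for illustration, not part of the registry above; list fields are stored as JSON text, and the default in-memory database means it runs without touching disk:

```python
import json
import sqlite3

def open_db(path=":memory:"):
    """Open (or create) a database with a hypothetical `assets` table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS assets (
               name TEXT PRIMARY KEY,
               asset_type TEXT NOT NULL,
               poly_count INTEGER NOT NULL,
               tags TEXT NOT NULL DEFAULT '[]'
           )"""
    )
    return conn

def save_asset(conn, name, asset_type, poly_count, tags):
    """Insert or replace one asset; tags are stored as a JSON array."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO assets VALUES (?, ?, ?, ?)",
            (name, asset_type, poly_count, json.dumps(sorted(tags))),
        )

def find_oversized(conn, max_polys):
    """Return (name, poly_count) rows over budget, largest first."""
    rows = conn.execute(
        "SELECT name, poly_count FROM assets WHERE poly_count > ? ORDER BY poly_count DESC",
        (max_polys,),
    )
    return rows.fetchall()

conn = open_db()
save_asset(conn, "hero_sword", "weapon", 15432, ["weapon", "hero"])
save_asset(conn, "villain_armor", "armor", 52000, ["armor", "boss"])
oversized = find_oversized(conn, 20000)
print(oversized)  # [('villain_armor', 52000)]
conn.close()
```

Using `?` placeholders rather than f-strings lets sqlite3 handle quoting, which keeps asset names with odd characters from breaking the query.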

Next Steps

You now have a solid grasp of Python's core data structures and how they apply to technical art. Here's where to continue:

Maya Scripting - Use these data structures to manage scene data, rigs, and materials inside Maya.
Houdini - Apply Python data handling to procedural workflows and HDA parameter management.
Projects - Put everything together with hands-on projects that exercise all these skills.

Challenge: Extend the Registry

Try adding these features to the asset registry: (1) Search by artist name, (2) a delete method that also cleans up the tag index, (3) a CSV export option, and (4) duplicate detection based on texture overlap between assets. These are real features that production systems need.