Optional / possibly-missing dict keys – codebase scan

Scanned for obj[key]-style access where the key might be missing (potential KeyError). Grouped by category. Use .get(key, default) or check key in obj before subscript when the key is optional.


Category 1: Movie dict – keys that may be absent

Source: get_movie() / get_movie_metadata() / DynamoDB movies table. Movies can be in different states (just created, uploading, processing, zip not built yet, etc.).

Location

Key

Risk

Notes

odb.py ~1486

LAST_FRAME_TRACKED

FIXED

Now uses .get(LAST_FRAME_TRACKED, None).

odb.py 1356

COURSE_ID

Low

Set at movie create; could be missing only if DB is inconsistent.

odb.py 1361

MOVIE_DATA_URN

FIXED

movie_data_urn_for_movie_id() now uses .get(MOVIE_DATA_URN).

odb.py 1367

MOVIE_ZIPFILE_URN

FIXED

movie_zipfile_urn_for_movie_id() now uses .get(MOVIE_ZIPFILE_URN).

odb_movie_data.py 108–110

MOVIE_ZIPFILE_URN, MOVIE_DATA_URN

FIXED

get_movie_data() uses .get(); no try/except needed.

odb_movie_data.py 131, 145

VERSION

FIXED

set_movie_data() uses movie.get(VERSION, 0).

odb_movie_data.py 186

MOVIE_ZIPFILE_URN

FIXED

purge_movie_zipfile() uses urn = movie.get(...) then delete_object(urn).

flask_api.py 449

MOVIE_ZIPFILE_URN

Medium

After if movie_metadata.get(MOVIE_ZIPFILE_URN, None); then we do movie_metadata[MOVIE_ZIPFILE_URN] – safe if logic is correct.

Recommendations: All high-priority movie URN/version fixes above are implemented.


Category 2: User dict – keys that may be absent

Source: get_user() / DynamoDB users table. Some attributes are set only when the user is an admin or has a primary course.

Location

Key

Risk

Notes

flask_api.py 314

'primary_course_id'

High

New user or user without primary course → KeyError when creating a movie.

odb.py 1268

PRIMARY_COURSE_ID

High

course_id = user[PRIMARY_COURSE_ID] in list_movies; can be missing.

apikey.py 176

'primary_course_id'

Medium

user_dict['primary_course_id']; may be None for some users (apikey sets it to None in one branch).

odb.py 517, 618, 959, 966, 992, 1002, 1104–1105, 1130, 1172, 1245, 1343, 1576–1577, 1593

COURSES, ADMIN_FOR_COURSES, USER_ID, etc.

Low–Medium

Schema and add-course logic usually set these; missing only on legacy or partial records.

Recommendations:

  • flask_api.api_new_movie: use user.get('primary_course_id') (or PRIMARY_COURSE_ID) and return 400 if missing (e.g. “Set a primary course before creating a movie”).

  • odb.list_movies: use user.get(PRIMARY_COURSE_ID) and handle None (e.g. no movies for that user, or fetch by another rule).

  • apikey.py: use user_dict.get('primary_course_id') and handle None where needed.


Category 3: Course dict – keys from DynamoDB

Source: get_course() / DynamoDB courses table. Schema defines expected attributes.

Location

Key

Risk

Notes

odb.py (multiple)

COURSE_ID, COURSE_NAME, ADMINS_FOR_COURSE

Low

Part of core course schema; only risky if table is corrupted or schema changed.

No change suggested unless you add stricter typing (e.g. TypedDict).


Category 4: DynamoDB / boto3 response shapes

Source: response from query(), scan(), get_item(), etc.

Location

Key

Risk

Notes

odb.py 845

response['LastEvaluatedKey']

Low

Only used when 'LastEvaluatedKey' in response (line 843).

Various

response['Items']

Low

Standard DynamoDB contract; use response.get('Items', []) if you want to be defensive.

Various

e.response['Error']['Code']

Low

boto3 ClientError contract; documented structure.

Optional hardening: use response.get('Items', []) and response.get('LastEvaluatedKey') where pagination is used.


Category 5: Item/frame from query or batch

Source: Items from response['Items'], or from get_movie_frame(), etc.

Location

Key

Risk

Notes

odb.py 734–735

movie_frame[MOVIE_ID], movie_frame[FRAME_NUMBER]

Low

From movie_frames table; primary key attributes.

odb.py 1424, 1435–1437

frame[FRAME_NUMBER], frame[MOVIE_ID], frame[FRAME_URN]

Low

Same table, expected attributes.

odb.py 506, 1201

item[API_KEY], item[USER_ID]

Low

From api_keys / users tables; key attributes.

Generally safe; only change if you add optional attributes to those tables.


Category 6: Newly created / in-memory dicts

Source: ret = {'error': False}, create_new_movie return value, etc.

Location

Key

Risk

Notes

flask_api.py 313–328

ret[MOVIE_ID], user['primary_course_id']

High (user)

ret is built in this function; user is from get_user() – use .get('primary_course_id') as above.


Summary – high‑value fixes

  1. Movie (done): movie_data_urn_for_movie_id, movie_zipfile_urn_for_movie_id, get_movie_data, set_movie_data, purge_movie_zipfile, and Lambda lambda_tracking_env.get_movie_data now use .get() for URNs and VERSION.

  2. User (pending): flask_api new-movie path – use user.get('primary_course_id') and handle None.

  3. User (pending): odb.list_movies – use user.get(PRIMARY_COURSE_ID) and handle None.

  4. User (pending): apikey – use user_dict.get('primary_course_id') where appropriate.

Optional: add a TypedDict (or similar) for movie/user so Pyright can warn on missing optional keys when you use [] instead of .get().


Pydantic types for User and Movie – scope evaluation

Current state: schema.py already defines Pydantic User and Movie models. They are used for validation on write (put_user, put_movie use User(**user) / Movie(**moviedict)). get_user and get_movie return plain dicts (DynamoDB items, with fix_movie() applied for movies in some code paths).

What “moving to Pydantic” would mean: Have get_user() and get_movie() return User and Movie instances instead of dicts, and type hints accordingly. That would give attribute access, optional-field safety, and better static checking.

Extent of change

Area

Effort

Notes

Return type of get_movie / get_user

Small

In odb.py, have get_movie return Movie(**fix_movie(item)) and get_user return User(**fix_user(item)) (see below). Handle ValidationError for bad DB data (log and re-raise or return None per policy).

fix_user

Small

There is fix_movie / fix_movie_prop_value for DynamoDB→Python types (Decimal→int, etc.). You’d add a similar fix_user (and possibly fix_course) so DynamoDB output fits the Pydantic model.

Call sites using dict subscript

None

Pydantic BaseModel supports obj['key'] via __getitem__, so existing movie[MOVIE_ID], user[USER_ID] etc. keep working.

Call sites using .get()

Medium

Models don’t have .get(key, default). There are ~15+ places that do movie.get(...) or user.get(...) (in odb.py, odb_movie_data.py, resize.py, lambda_tracking_env.py, flask_api.py, etc.). Each would need to become getattr(obj, attr, default) or a small helper (e.g. get_model_attr(m, KEY, default)), or you keep a .get()-like method on the model (e.g. a custom method or model_dump().get()).

Lambda / vendored app

Medium

Same .get() and type changes in lambda-resize and any vendored copies of app code.

Tests

Low–medium

Tests that build dicts and pass to code expecting User/Movie may need to use User(...) / Movie(...) or .model_dump() where a dict is required.

Schema optionality

Small

User currently has primary_course_id: str. For legacy users or “no course yet”, that would need to be str | None and call sites that assume “always set” would need to handle None (as in the Category 2 fixes).

Recommendation

  • Short term: Keep returning dicts from get_movie/get_user; keep validating on write with existing Pydantic models. The optional-key fixes (e.g. .get() for primary_course_id, URNs, version) are the main win and are done or documented.

  • If you move to Pydantic return types:

    1. Add fix_user() (and optionally fix_course()) for DynamoDB payloads.

    2. Change get_movie/get_user to return Movie/User, with ValidationError handling.

    3. Replace every movie.get(KEY, default) / user.get(KEY, default) with a single pattern: e.g. getattr(m, KEY, default) or a helper so Pyright and readers see one convention. Expect on the order of 20–30 call sites across app and lambda.

    4. Make User.primary_course_id optional (str \| None) if that reflects real data; then handle None in the few places that use it.

Overall: moderate effort (roughly half a day to a day of focused refactor plus tests), with the main cost being the .get() call sites and keeping lambda/vendored code in sync.