DynamoDB and Plant Tracer

The Plant Tracer webapp uses AWS DynamoDB to store:

  • The user list

  • The course list

  • The enrollment join table (which users are in which courses)

  • The movie list

  • The per-frame trackpoint annotations

  • API keys and audit logs

Originally this was stored in a MySQL database. We migrated to DynamoDB for cost — most uses of Plant Tracer can fit within the DynamoDB free tier, while the cost for running MySQL is upwards of $50/month on AWS.

Each DynamoDB table is identified by an account and a table name. All table names share a common prefix controlled by the DYNAMODB_TABLE_PREFIX environment variable (e.g. demo-). For local development you can use the AWS DynamoDB local (downloadable version). We recommend using the version downloaded as a JAR file.

The canonical table definitions are in src/app/schema.py. All attribute-name constants are defined at the top of src/app/odb.py. Tables are created by odbmaint.create_tables() and dropped by odbmaint.drop_tables() — used by make make-local-demo and the test fixtures.

Table Summary

All table names below are shown without the prefix. With DYNAMODB_TABLE_PREFIX=demo- the users table is named demo-users, etc.

Table

Purpose

Partition Key

Sort Key

users

One record per registered user

user_id

unique_emails

Enforces email address uniqueness across users

email

api_keys

One record per issued API key (a user may have several)

api_key

courses

One record per course

course_id

course_users

Enrollment join table — which users are in which courses

course_id

user_id

movies

One record per uploaded movie

movie_id

movie_frames

Per-frame trackpoint annotations

movie_id

frame_number

logs

Audit log entries

log_id

Table Details

users

One record per registered user.

Attribute

Type

Description

user_id

String (PK)

Unique identifier, prefixed u for regular users, ud for admins

email

String

User’s email address (unique, enforced via unique_emails)

user_name

String

Display name; may be blank

created

Integer

Unix epoch seconds at registration

enabled

Integer (0/1)

Whether the account is active

primary_course_id

String

The course the user registered through; used as default context

primary_course_name

String

Denormalized name of the primary course

courses

List of strings

All courses the user is enrolled in

admin_for_courses

List of strings

Courses for which the user has admin privileges

api_keys

One record per issued API key. A user may hold multiple keys (e.g. after re-sending a login link). The key is sent as a cookie or POST parameter; the server validates it on every request.

Attribute

Type

Description

api_key

String (PK)

The key value, a random hex string

user_id

String

Owner of the key

first_used_at

Integer

Unix epoch seconds of first use (i.e. first login)

last_used_at

Integer

Unix epoch seconds of most recent use

enabled

Integer (0/1)

Whether the key is still valid

GSI: user_id_idx on user_id. Used by DDBO.get_user_login_times() to aggregate first/last login times across all of a user’s keys without a table scan.

courses

Attribute

Type

Description

course_id

String (PK)

Unique identifier for the course

course_name

String

Human-readable course name

course_key

String

Registration passphrase that students use to self-enroll

admins_for_course

List of strings

user_id values of all admins for this course

max_enrollment

Integer

Maximum number of students allowed to self-register (default 50)

course_users

A lightweight join table that records which users are enrolled in which courses. Each item has only the two key attributes: course_id (partition key) and user_id (sort key).

Querying course_users by course_id returns all enrolled user IDs efficiently — this is used by course_enrollments(course_id) in odb.py.

Note

delete_user() does not currently remove course_users rows for the deleted user, which can leave stale enrollment records. list_users_courses() handles this defensively by catching InvalidUser_Id and logging a warning. See issue #968 for the planned fix.

movies

One record per uploaded movie. Key attributes:

movie_id, title, description, user_id, course_id, published (0/1; defaults to 1 on creation), deleted (0/1), status, total_frames, fps, width, height, movie_data_urn (S3 URN of the MP4), movie_zipfile_urn, first_frame_urn, last_frame_tracked, research_use (0/1/None; None = not yet answered), credit_by_name (0/1/None; None = not yet answered), attribution_name, rotation (0/90/180/270 degrees).

See src/app/schema.py Movie class for the full schema and constraints.

movie_frames

Per-frame trackpoint storage. Keyed by (movie_id, frame_number).

Each record’s trackpoints attribute is a list of objects with fields x, y, label, frame_number, status, and err (all defined in the Trackpoint class in schema.py).

Data Consistency Notes

  • Email uniqueness is enforced by a transactional write to both users and unique_emails at registration time.

  • Course admin list (admins_for_course on the course record) is kept in sync with admin_for_courses on the user record by add_course_admin() and remove_course_admin().

  • Enrollment (course_users rows) is added by register_email() but is not removed by delete_user(). See issue #968.

  • Login times (first_used_at, last_used_at) live on api_keys, not users. list_users_courses() aggregates them per user via the user_id_idx GSI.

Schema and Naming Changes

Some naming changes were made for clarity or to avoid conflicts with DynamoDB’s reserved words.

Old name

New name

Reason

name

user_name

name is a DynamoDB reserved word