Zircon

April 2025

Motivation

For three years of my undergraduate career, my daily commute has eaten up nearly two hours of my day—time that could’ve been spent productively. My professors recorded their lectures, but since they weren’t downloadable, spending that much data to review didn’t feel worthwhile. Even if I wanted to use my data, the Canvas interface made it difficult since it was designed for computer screens. I had access to valuable learning resources, but they weren’t designed for the way I wanted to study.

I decided to change that. Zircon is a tool that not only enables lecture downloads but also makes them more accessible in multiple formats. It generates detailed markdown notes, provides concise summaries for quick review, and, for those who prefer a more engaging experience, overlays lectures onto “brainrot” videos—fast-paced gameplay like Minecraft or Subway Surfers. Whether someone has two hours or two minutes, this tool ensures they can absorb material in a way that fits their time and attention span.

Zircon Brainrot — Example of brainrot content using Minecraft

Overview

Users interact with Zircon through a Google Chrome extension that scans Canvas, Unite, and Kaltura pages for recorded lectures stored on the university’s media provider. When a lecture is detected, the extension adds three new buttons: HD Download, SD Download, and Zircon Analysis.

Extension Canvas — The augmented view after using Zircon

Clicking the Zircon Analysis button opens a processing page where users can submit a request to generate notes and a summary. If they prefer a more engaging format, they can also choose a video style for brainrot.

Extension Process — What users see to process a video and submit a job

Once submitted, the backend analyzes the lecture content, and if a video is selected, it processes the lecture into an attention-grabbing video. After a short while, users receive an email notifying them that their content is ready.

Email Sent — The email sent to the user after a video has processed

The backend takes care of the heavy lifting—analyzing the transcript, generating the audio, rendering the video, and sending an email once everything is ready. All users need to do is check their email on the go to access their processed lecture whenever it’s convenient.

Implementation

Zircon is a complex project—there’s no way around it. A lot of moving parts and technologies come together to make it work, and building it has been a deep dive into Golang, AWS networking, AWS processing, and microservice design patterns. Along the way, I’ve learned more than I ever expected about designing scalable, efficient systems.

There’s a lot to unpack, and the sections below break it all down. Use the table of contents to navigate and explore the areas that interest you most.

Codebase

Zircon is structured into four main components: Extension, Frontend, Backend, and Cloud.

Extension – Acts as the user’s entry point, parses the lecture hosting tools to gather information to pass to the backend.
Frontend – Provides a clean, intuitive interface for viewing generated notes and summaries.
Backend – Handles the core processing, including lecture analysis, content generation, rendering, and state management.
Cloud – Orchestrates deployment, ensuring resources can be efficiently provisioned, scaled, and maintained.

Extension

Zircon’s extension is built using pure JavaScript, CSS, and HTML. By leveraging a browser extension, Zircon can inject JavaScript and CSS directly into existing webpages, seamlessly integrating its functionality.

One of the biggest challenges in developing the extension was working within the constraints of the browser environment. Unlike traditional client-server models, extensions have limited capabilities—OAuth authentication, persistent storage, and other common backend features are either restricted or require creative workarounds.

Frontend

Zircon’s frontend is built using Next.js and MDX, following a design similar to the structure of this website. In fact, much of the code is borrowed, making this one of the more straightforward components to develop.

Backend

Zircon’s backend is built in Go and follows a microservice design pattern. It makes extensive use of the AWS SDK and AWS SAM to interact with various AWS resources and infrastructure patterns. One of the biggest challenges was ensuring the codebase remained clean, modular, and easily extendable as new features were added.

For task processing, Zircon leverages Asynq, enabling multiple worker nodes to join the processing architecture seamlessly. This ensures efficient and scalable workload distribution. To further enhance scalability and portability, the entire system is containerized using Docker, making it easy to deploy and manage across different environments.

Cloud

Zircon’s cloud deployments are orchestrated using Terraform on AWS, ensuring that infrastructure can be reliably provisioned and maintained. While this portion didn’t involve writing application code, it presented its own set of challenges—mainly in coupling components and ensuring backend resources received the correct input.

Content Generation

Zircon generates content by first downloading the lecture transcript from the Kaltura Provider and uses LLMs to summarize the material and create structured Markdown notes. Once the summary is generated, it is sent to LemonFox, which converts it into audio and a subtitle JSON file that timestamps when each word is spoken and when it ends.

The backend then processes this subtitle data, converting it into a SubStation Alpha file. This step follows a logic similar to the Text Justification LeetCode problem, ensuring that subtitles are formatted cleanly and displayed in sync with the generated video. Ensuring that this subtitle file is correct is incredibly important to the entire process. Errors, no matter how slight, can negatively impact the generation of the video drastically.

FFMpeg

Zircon uses FFMpeg 6.1.2 to render videos, leveraging a complex filter to handle multiple processing tasks efficiently. This includes:

Burning in subtitles to ensure they are permanently embedded in the video.
Applying a watermark for branding.
Compressing the video to reduce bandwidth usage and storage costs.
Synchronizing all media components, tying together the background video, TTS-generated audio, and subtitles into a single cohesive output.

The code below outlines the full FFMpeg process used to generate the final video:

cmd := exec.Command(
   "ffmpeg",
   "-y",
   "-stream_loop",
   "-1",
   "-ss",
   fmt.Sprintf("00:%02d:%02d", minuteOffset, secondsOffset),
   "-i",
   backgroundVideo,
   "-i",
   filepath.Base(aacFp.Name()),
   "-i",
   logoPng,
   "-filter_complex",
   fmt.Sprintf("ass='%s'[subs];[2]format=rgba,colorchannelmixer=aa=0.3[logo];[subs][logo]overlay=main_w-overlay_w-10:10[output];", filepath.Base(subtitlesFp.Name())),
   "-map",
   "[output]",
   "-map",
   "1:a",
   "-c:v",
   "libx264",
   "-c:a",
   "copy",
   "-crf",
   "30",
   "-shortest",
   "output.mp4",
)

-y: Overwrites the video if it already exists. Theoretically may occur but it’s near-zero probability.
-stream_loop -1: Loop the background video indefinitely.
-ss 00:XX:XX: Start the video at a random offset to create the illusion of variety of content.
-i backgroundVideo: Input at index 0 is our background video (Minecraft, Subway Surfers, etc…).
-i aacName: Input at index 1 is our text to speech audio.
-i logoPng: Input at index 2 is our PNG logo.
-filter_complex …: The complex filter applied, explained in greater detail below.
-map [output]: Defines the content generated by the filter as the video output for the command.
-map 1:a: Maps input 1 as the audio track of the generated video.
-c:v libx264: Defines the encoding codex of the video as libx264.
-c:a copy: Copies the codex of the audio which is okay as we are not modifying the content. This speeds up processing.
-crf 30: Sets the constant rate factor as 30 which allows for lossy encoding to reduce the size of the output.
-shortest: Defines the output’s length as the shortest of the inputs (usually the audio).
output.mp4: The output file should be named “output.mp4”

The filter_complex flag defines a processing pipeline to modify and edit the visuals of the video. Broken down, it can be written as shown below:

ass='subtitle.ass'[subs];
[2]format=rgba,colorchannelmixer=aa=0.3[logo];
[subs][logo]overlay=main_w-overlay_w-10:10[output];

Our first command takes the subtitle.ass SubSystem Alpha file and overlays it over the video generating an output labeled subs
The second command takes our second input (the logo) and reduces its opacity to 30% of the original renaming it as logo
The third command takes our subtitled video and overlays our logo on top of it generating our final product output

FFMpeg Process — A visual description of the filter in [tag/index]:[description] format

Asynq

To handle job processing Zircon takes advantage of a Go library called Asynq . Asynq utilizes a Redis instance to schedule jobs across multiple servers and workers within servers. This allows us to queue workloads to be handled eventually by worker nodes. In addition, this tool also provides priority queues such that more important requests can be processed earlier than lower priority requests. Although not implemented in Zircon, should one have a paid tier or similar you could easily prioritize those users by placing their jobs in a higher queue.

AWS Architecture

Building a tool like Zircon requires more than just processing power—it requires scalability, efficiency, and security. This architecture follows a serverless, asynchronous model, ensuring that tasks happen seamlessly without the need for dedicated servers.

The diagram below illustrates how these components interact to create a fully automated, distributed system. In the following sections, I’ll break down the setup into the core components: networking, security, compute, and storage.

The overarching AWS Architecture encompassing most resources

Networking

One of the more novel challenges that arose when transitioning the tool from running locally to the cloud was the networking infrastructure. Ensuring that users are able to access your services but that these services are also able to efficiently communicate with one another is a very different paradigm than what I was used to. In particular, the introduction of a Virtual Private Cloud (VPC) that stores certian resources which can’t normally be reached without configuring routing rules added a layer of complexity that I was not expecting.

The diagram below emphasizes the elements involved in ensuring operational networking. My following analysis will give a better idea of what each service does and why it’s useful to Zircon.

AWS Networking Architecture — The networking resources used in the project

Route 53

For users to first even access the service it’s best that we provide a domain name that they can use. Simply put, trying to connect to a random ip address would just not be useful, let alone reliable. Route 53 provides a stable endpoint for users even if resources change in the backend. Moreover, the domain name also helps provide authenticity for the application hosting entries for SSL, DKIM, and DMARC validation.

Cloudfront

Zircon’s Cloudfront service serves two roles. Firstly, it serves assets from the S3 bucket to users. Secondly, it routes any API traffic that comes in to the API gateway. The main benefit of this tool is its ability to cache.

For S3 assets, storing frequently accessed resources can reduce the time for retrieval. Moreover, the bandwidth cost of Cloudfront is much cheaper so we are not penalized if a resource is requested more than once in short succession. For API Requests, common shared ones such as the health or exists endpoints can benefit from caching as they do not need frequent updates.

In earlier versions of Zircon users would get items through a webserver which would pull from S3. While this does work in the short term, it does not scale well when bandwidth becomes a limiting factor for instances.

API Gateway

The API Gateway serves as the router for incoming traffic to the service. It firstly provides a singular spot to handle API configuration like permitted headers, CORS policies, allowed origins, etc. In addition, it also integrated with our Cognito authorizers to ensure that security was not an issue handled by our backend code but rather by managed services. Lastly, using this tool ensured that a user’s request is processed by the right function.

VPC

One of the core components of Zircon’s infrastructure is the Virtual Private Cloud (VPC). This system allows for us to account for more complex interactions and interfacing without the noise of the outside internet impacting us. Notably, the last three tools worked by calling or invoking one another, however, in our case we need to have tools within the space to send data to one another. The VPC allows us to define these interactions through public subnets, private subnets, routing tables, nat gateways, internet gateways, and VPC gateways.

Public Subnet

The public subnet can be identified by the green box in the VPC. This area is designed to provide instances inside an IP address alongside access to an internet gateway to access the internet. In earlier versions of Zircon, prior to having a serverless infrastructure our frontend was stored in this area. This was so that it would have access to the internet and be able to communicate with users. However, since switching to a serverless model where some Lambdas operate outside of the VPC this is no longer used for such a purpose. This does not mean it is not useful though. It’s critical to have for an operational NAT Gateway.

Private Subnet

Similar to the public subnet the private subnet serves to group resources within the VPC highlighted by the blue box. Typically instances in this area are given private IP addresses meaning they can be referenced within the VPC but not from outside. This typically means that they cannot access the internet without additional configuration which we will discuss shortly. Nonetheless, this lack of internet access typically also means these resources are safer since external tools can’t typically interface with them. With our processing cluster inside here this is intentional so that malicious actors cannot interfere with our compute resources.

NAT Gateway

Network Address Translation (NAT) Gateway is a useful managed service which allows our elements within the Private Subnet to access the internet. This gateway ensures that while traffic can be established outwards, any traffic entering cannot be routed to an element. Simply put, my resources in the private subnet can access APIs, Services, etc outside of the VPC but any actors cannot access my resources going the other way. One might ask why not place items into a public subnet as opposed to using a NAT Gateway. The short answer is that if I was to put my resources in the public subnet that would expose them to the internet while the NAT Gateway prevents them from doing that. By far, this is the most expensive part of Zircon, however, it is not necessary if you are willing to sacrifice some security/best-practices for cost reduction.

Public Subnet Alternative

NAT Gateways are the most expensive crux of this architecture. Most other resources can be provisioned at the free tier if you have minimal usage (~5 users). If you’re in this situation, you can avoid using the NAT gateway by placing the EC2 instances in a public subnet. While this is not best practice as your nodes are potentially accessible, it is substantially cheaper if you have 1 or 2 compute nodes. A NAT Gateway costs approximately 0.045 cents per hour versus an IP address at 0.01 cents per hour or free if in the free tier.

VPC Endpoint Alternative

If your cluster is spanned across multiple private subnets in different availablity zones it may be better to use VPC endpoints in configuration. Since our cluster ONLY accesses AWS resources, there is no need for pure internet access through the internet gateway. One can provision ~6 VPC endpoints to ECS, ECR, and SES. These cost approximately 0.01 cents per hour. So one can either pay 6 cents per hour for usability across multiple availability zones or 4.5 cents per hour per availability zone for a NAT gateway. Hence, if using over 1 availability zone, it is better to utilize this architecture.

VPC Endpoints

Sometimes in our operation we need to access AWS Resources. Instead of having to go to the internet gateway, route over traffic, and end back in a AWS Datacenter one can use VPC Gateways to keep traffic within the datacenter. This not only speeds up requests but also reduces the bandwidth cost by not penalizing for outbound traffic. In Zircon’s case these are used to access the DynamoDB database and S3 Blob Storage.

Routing Tables

While having all these resources is amazing, being able to ensure that requests go to the right place is acomplished using Routing Tables. There are two in affect, a public routing table and a private routing table each with their own unique tasks and responsibilities.

Private Routing Table: handles traffic exiting the Private Subnet and sends them to the appropriate areas.
- S3 Access: Our cluster instances need to access to store generated videos. Any requests to the S3 bucket should be routed to the right VPC Gateway.
- DynamoDB Access: Our cluster instances need to ensure that our database is up to date for any generations, send any requests to the proper VPC Gateway.
- Other: Our cluster also needs to access other AWS resources which don’t have gateways such as the Elastic Compute Registry (ECR), Elastic Compute Service (ECS), and Simple Email Service (SES). Route this traffic to the NAT Gateway.
Public Routing Table:
- All Traffic: Any traffic that comes to the Public Routing Table should be sent to the Internet Gateway.

SES

Informing our users of a completed video generation is done using AWS Simple Email Service (SES). Instead of configuring SMTP the SES service handles all of that. This tool ensures that generated mail is routed to the right recipient without any issues or hiccups.

Security

Security is a fundamental aspect of Zircon’s architecture, ensuring that user data remains protected, access is controlled, and communications are encrypted. The system integrates multiple layers of security using AWS services to mitigate risks and enforce best practices. These include: certificate manager, cognito, JWT tokens, IAM, and security groups.

Together, these security measures create a defense, safeguarding Zircon’s infrastructure while maintaining the user experience. The following sections will explore how each of these elements contributes to Zircon’s overall security model.

AWS Security Architecture — The security resources used in the project

Certificate Manager

Having secure HTTP traffic is a cornerstone for most modern projects and is often taken for granted. Encorporating SSL ensures that users are sending encrypted information between them and the server to prevent bad actors from comprimising sensitive data. Certificate manager creates these signed certificates and also entries for our domain name such that clients can properly validate their connection. Moreover, a nice benefit of having Certificate Manager is that SSL renewals are done automatically and do not need additional code or effort to keep this maintained.

Cognito

Keeping track of user accounts and ensuring that sensitive data is stored securely is handled by Cognito. Earlier versions of Zircon handled authentication on its own by creating Google OAUTH urls, handling callbacks, and generating tokens. However, while this did work, Cognito generates user pools such that other AWS resources can use this information. Moreover, handling JWTs is complex. Having infrastructure to validate, reissue refresh tokens, and keep track of user information is quite involved.

Cognito is utilized by the API gateway to validate tokens and restrict user access to APIs. In addition, cognito can also invoke lambdas. While not shown in the diagram this cognito has a pre-sign-up and post-confirmation downstream lambda.

resource "aws_cognito_user_pool" "zircon_user_pool" {
  name = "zircon-user-pool"
  lambda_config {
    pre_sign_up       = aws_lambda_function.pre_signup_lambda.arn
    post_confirmation = aws_lambda_function.post_signup_lambda.arn
  }
  lifecycle {
    prevent_destroy = true
  }
}

The pre-signup lambda ensures that the registering user is valid with a @umn.edu domain name. If this is not the case, the user is prevented from signing up.

if request.Request.UserAttributes["email"] == "" {
   return request, fmt.Errorf("email is empty")
}
if !strings.HasSuffix(request.Request.UserAttributes["email"], organization) {
   return request, fmt.Errorf("email is not part of the organization")
}

The post-confirmation lambda is only triggered when a new user registers which creates an entry for them in our database. This allows them to begin queuing jobs on the service.

err := psus.dynamoClient.CreateUserIfNotExists(request.CognitoEventUserPoolsHeader.UserName)
if err != nil {
   ...
}

JWT Tokens

Due to our serverless nature, having authentication based on session is not practical. For example, if you authenticated with Server 1 but your requests shifted to Server 2 you would need to reauthenticate as your validity only existed between you and Server 1. This is not ideal for a user experience. To resolve this, Cognito generates JWT tokens.

JWT tokens are stored by the client and contain a header, payload, and signature. The signature is signed by a secret key which the user does not know. If the header or payload is modified, the signature won’t match and the request would be rejected. As long as all of our servers have this private key they can validate and reject tokens. The original implementation of Zircon did this manually, however, now Cognito handles it entirely.

IAM

With so many resources, ensuring that each component only has the minimal access needed to complete their job is preferred. For example, we would not want to give a health lambda access to our user pool if there is no reason to do so. What if it gets comprimised? An attacker could execute malicious actions on user information. Identity Access Management (IAM) allows us to configure this least-needed access policy.

Good examples of this include cloudfront only having read access to the S3 bucket, users should not be able to overwrite or upload their own content into the system. API gateway has read-only access to cognito to validate JWT tokens, it cannot actually access user data. The cluster EC2s have access to write to the S3 bucket and to write to the DynamoDB database but only in specific areas/tables. All of this ensures that resources do not overstep their boundaries and are limited to the minimum of what they need to operate.

Security Groups

Lastly we have security groups. These operate similarly to IAM policies, however, they govern which connections can and cannot be made. This predominantly applies to the elasticache instance in the VPC which has a configuration to only allow network traffic from the cluster, the health lambda, or the queue lambda. Any traffic outside of these instances is blocked.

Compute

Zircon uses a combination of serverless architecture for handling user requests and a more dedicate compute section for handling longer running jobs like video processing. This structure ensures that critical low-SLO elements remain scalable and responsive while less critical elements focus on being cost-effective. Structuring the codebase as such provides substantial benefits and an adaptable infrastrcture.

This balance is achieved by using lambda, event based invocations, ECS & ECR, cluster management, and autoscaling groups.

AWS Compute Architecture — The compute resources used in the project

Lambda

To handle quick or asynchronous events lambdas are spun up. These are ephemeral blocks of code that are only run when invoked. The current setup has the following lambdas: exists, submit job, tts generation, queue, and health. When called, they initialize their connections to AWS resources or APIs. They run to handle the event request and eventually terminate. Since lambdas are often small and ephemeral it is easy to spin up more or less based on demand.

Events

Lambdas are invoked by events. In Zircon’s case there are API Gateway events and DynamoDB events. API Gateway events occur when an endpoint is sucessfully called, this passes in the request information and the lambda is tasked to process it. Lambdas for exists, submit job, and health are invoked using this event method.

DynamoDB events occur when a specific update occurs on a table. Lambdas for tts generation and queue are invoked in this method. To better utilize the DynamoDB event I introduced filters such that events only trigger if specific conditions are met. For example, tts generation only occurs if a modification is made to an entry, and that modification changes the audio-available item from false to true. In queue’s lambda, the event only gets triggered if a new video entry is created.

Using events with lambdas allows us to handle different occurences within our backend both synchronously and asynchronously.

ECS & ECR

In order to manage our cluster two resources are utilized, Elastic Compute Service (ECS) and Elastic Compute Registry (ECR). ECR is where docker images for the consumer are placed, this is what handles the actual video generation. ECS oversees the deployment of these containers on compute instances such as EC2. Doing so enables us to quickly spin up and down containers as per our workload need.

One might ask why a cluster is needed for video generation as opposed to using a Lambda. This is a fair question, lambdas would be a worthwhile usecase here. However, videos can occasionally take greater than 15 minutes to render which exceeds the runtime duration of a lambda. In this case having a cluster allows for us to compute for longer than 15 minutes, typically at a cheaper rate.

Cluster Management

Cluster management is handled within ECS, this involves defining services and tasks. Doing so lets us specify how much many instances of a specific task we want operational, what each task contains, and the permissions that each task recieves.

Autoscaling Groups

Zircon uses an EC2 autoscaling group to provide compute resources for tasks. This spins up or down EC2 instances based on the need of the service. Other scaling groups like Fargate could be used which abstracts away EC2 selection. In this situation I chose EC2 as Social Coding had grant funding which could be applied to this specific use case. If one wanted to handle compute using Fargate changing this would not be difficult outside of changing the aws_ecs_capacity_provider.ecs-consumer-capacity-provider terraform resource.

Storage

Lastly, we have our storage solutions. These systems help us maintain state in the backend and keep track of our files or other information. This involves the usage of DynamoDB, Elasticache, and S3.

DynamoDB

For this project we need a database that can scale horizontally and store large amounts of data in a short amount of time. DynamoDB is a perfect contender for this. Most of our data does not hold relational information and as such can take advantage of NoSQL paradigms. Dynamo handles storing our user profile information, video metadata, and video generation requests.

Elasticache

In order to maintain a queue for our operations Elasticache runs a Redis cluster which Asynq connects to. This allows clusters instances to simultaniously be aware of the tasks in processing.

S3

To store data that does not have structure such as files and other assets we utilize S3. This storage keeps the rendered videos, audio, and subtitle files.

AWS Storage Architecture — The storage resources used in the project

Acknowledgments

While I developed Zircon, this project wouldn’t have been possible without the support of many people along the way. I’d like to give this time to thank them for their efforts and time.

A special thanks to Alan Hagedorn . When I visited Seattle during my time at Robinhood, he introduced me to a video generation pipeline for rendering Reddit content. That conversation planted the seed for Zircon and was instrumental in helping me understand SubStation Alpha subtitle files and other key aspects of the video generation process. His insights and support played a crucial role in bringing this project to life.
Timothy Tu for providing valuable feedback throughout the development of the application. His input helped shape key aspects of Zircon,
I’d like to express my gratitude to David Ji , who was an incredible mentor during my internship at Robinhood. His guidance in understanding AWS and cloud infrastructure was invaluable, and I don’t think I would have been nearly as well-equipped without his support. I also appreciate him taking the time to help me revise the content of this writeup.
The development of Zircon wouldn’t have been possible without everyone who helped test it and check for edge cases. Thank you to Eshaan Jay , Tim Ekblad , Aamna Bashir , and Henry Olig .
Lastly, I’d like to thank the officer board at Social Coding who helped provide the compute fleet for hosting this application.