Currently working at Reservoir Labs as an Automatic Parallelization and Compiler Research Engineer. Interested in high-performance computing, programming languages, compilers, formal verification, novel computation architectures, and distributed systems. I treasure those moments of flow when you are working on a difficult problem with the right team.
• Introduced position iteration spaces to complement the standard coordinate iteration spaces
• Generalized standard loop transformations to sparse loops
• Constructed a CUDA GPU backend that allows generating high-performance CUDA kernels for all sparse tensor algebra expressions and formats supported by the Sparse Tensor Algebra Compiler (TACO)
• Built an open-source implementation (7k+ lines of code) that was merged into TACO
• Gave two hour-long presentations to NVIDIA Research and a poster session at the "Workshop on Compiler Techniques for Sparse Tensor Algebra"
• Submitted a thesis and a paper under review for publication
• Developed a tool to fingerprint a user-specified WebSocket protocol given an incomplete TCP packet capture
• Designed and built a framework used in production to allow traders to easily develop scripts to hedge positions
• Improved the performance of a high-performance distributed component and extended it with additional features
• Investigated persistent kernels for RNNs by building and comparing 6 different approaches
• Worked with a client to show a 100x throughput improvement by using GPUs instead of CPUs for a real-time ASR task
• Created complex optimizations at the thread, warp, block, and stream level
• Utilized advanced features of CUDA, such as cooperative groups, tensor cores, and warp-level primitives
• Achieved 3x the throughput of the cuDNN implementation for batch size 1 inference
• Gave two hour-long presentations to a total of 50+ engineers and presented at a company-wide poster session
• Built several projects in C and Assembly to run on a massively-parallel approximate-arithmetic SIMD mesh
• Developed a framework to run neural networks and perform real-time ImageNet classification at 0.04 W/fps
• Designed and implemented an algorithm to parallelize neural network training for speech recognition
• Built a genetic programming framework which included manipulating genome trees in Assembly
• Created a real-time optical flow computer vision demo that ran at 50 FPS and only used 0.25W
• Prototyped augmented reality interactions in Unity and C#, reporting to the VP of UX
• Implemented computer vision algorithms in C#
• Built an Android app based on an existing iOS app, including infinite scrolling, socket-based messaging, push notifications, and offline caching
• Developed 12 published Android apps between the ages of 14 and 17
• Generated $60k+ in revenue from app sales, advertising, and in-app purchases
• Open Mic+ has 4 million downloads and was featured on XDA and LifeHacker
• Commandr has 1.5 million downloads and was featured on CNET, XDA, and LifeHacker
• Commandr was selected for Android Authority’s 10 Best Android Apps of 2014
Relevant Coursework: Distributed Systems, Computer Systems Security, Multicore Programming, Operating Systems, and Principles of Computer Systems
Paper under review for publication: Preprint
Relevant Coursework: Performance Engineering of Software Systems, Computer System Engineering, Computer Vision, Computation Structures, Design and Analysis of Algorithms, Introduction to Neuroscience, Artificial Intelligence, Linear Algebra, Mathematics for Computer Science, and Introduction to Probability
Activities: VR@MIT (VP of Community and Lead Organizer of MIT IAP class), intramural ice hockey, intramural badminton, intramural soccer, and intramural table tennis
Winner of Binance Decentralized Exchange Competition $60k prize
Centralized exchanges rely on trusting that their owners will take the proper security precautions. N Chainz is a decentralized cryptocurrency exchange. Specifically, N Chainz’s features include block generation, limit orders, and the ability to trade a base token with another token. We use a novel multi-chain consensus structure to increase performance and scalability. Built with Nicholas Egan and Lizzie Wei.
KeyChain is a trustless authentication system that stores username-to-public-key mappings on the Ethereum blockchain. KeyChain makes asymmetric cryptography usable for normal users by providing a "web of trust" recovery system where users can recover lost or compromised private keys without a third party. Built with Sarah Wooders and Michael Shumikhin.
Created a dataset of 373,521 images and trained a captioning neural network to allow for fine-tuned captioning of clothing images (performing substantially better than existing models). We then created methods to take either an image or a caption and produce an embedding vector, which allowed for more accurate nearby searches as well as manipulations. Built with Sarah Wooders.
Built in high school. Uses speech recognition to listen for "Okay Google" in the background and launch Google Now. Launched months before Google added this feature themselves.
Built in high school. Uses an accessibility service to intercept commands to Google Now and run custom commands.
As shown in the Black Mirror episode “Nosedive”, online reputation systems have the potential to be extremely powerful and dangerous. We believe that online reputation systems are inevitable. However, we believe that if such a system is ultimately going to exist, it should be decentralized rather than controlled by private corporations or governments. We wanted to see what such a system would look like and how it would operate, so we developed a distributed reputation app using Android, the Ethereum blockchain, and the EigenTrust algorithm. Built with Sarah Wooders and Michael Shumikhin.
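For readers unfamiliar with EigenTrust, the core idea can be sketched as a damped power iteration over normalized local trust scores. This is a minimal illustration, not the app's actual code: the function name `eigentrust`, the damping factor `alpha`, and the example trust matrix are all assumptions made for the sketch.

```python
def eigentrust(local_trust, pretrusted, alpha=0.15, tol=1e-9, max_iter=1000):
    """Compute global trust scores from pairwise local trust (illustrative sketch).

    local_trust[i][j] >= 0 is how much peer i trusts peer j.
    pretrusted is a probability distribution over peers (sums to 1).
    alpha is an assumed damping weight toward the pre-trusted distribution.
    """
    n = len(local_trust)
    # Normalize each row so a peer's outgoing trust sums to 1.
    # A peer with no opinions falls back to the pre-trusted distribution.
    c = []
    for row in local_trust:
        total = sum(row)
        c.append([v / total for v in row] if total > 0 else list(pretrusted))

    t = list(pretrusted)
    for _ in range(max_iter):
        # Each peer's new score is the trust-weighted sum of the scores of
        # the peers that vouch for it, blended with the pre-trusted prior.
        new_t = [
            (1 - alpha) * sum(c[i][j] * t[i] for i in range(n)) + alpha * pretrusted[j]
            for j in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(new_t, t)) < tol:
            return new_t
        t = new_t
    return t


# Hypothetical example: peers 0 and 1 both rate peer 2 highly.
scores = eigentrust(
    local_trust=[[0, 1, 4], [1, 0, 4], [1, 1, 0]],
    pretrusted=[1 / 3, 1 / 3, 1 / 3],
)
```

In this example peer 2 ends up with the highest global score, because both other peers place most of their trust in it.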
Facebook Global Hackathon Finalist and Top 8, Best Facebook Hack YHacks at Yale University
Once you've talked to a person for a certain amount of time, SocialEyes recognizes and stores their face on the Android app. It groups those faces by person, and you then have the ability to connect their face to their Facebook account. From then on, whenever you encounter that person, SocialEyes tracks and recognizes the face and brings up his or her information in a HUD environment. SocialEyes can even tell you the person's heart rate as you are talking to them. Built with Logan Engstrom, Michael Shumikhin, and Logan Taylor.
2nd Place in Crowd Vote and Best AR Hack TreeHacks at Stanford University
We hacked Android to access the screen, broadcast it live, and control the touchscreen via code! We used Firebase as a real-time update mechanism for storing our data, which proved to be challenging as Unity does not have a library for Firebase. We hacked together our own Unity Firebase library in order to turn the vision into reality. Using certain Linux commands, we were able to output the screen's buffer into a PNG file, which we read back, resized, and uploaded to Firebase as a Base64 string. We relayed touch coordinates and simulated taps on the touchscreen via Linux commands that simulate touchscreen input. Built with Mohammad Adib and Andrew Nguyen.
2nd Place and Best Microsoft Hack DubHacks at University of Washington
Problem: GPS navigation is super useful even when you know how to get from point A to point B because it will route you away from traffic. But it is a pain to have to start the GPS every time you get in the car. It also drains your battery and forces you to listen to directions you already know.
Solution: Use GPS, Wi-Fi, and accelerometer data to detect when you are driving. Then perform analysis on the possible routes (using the Bing Maps API) to determine if you are attempting to drive home and if you are taking the optimal route. Once it detects you are driving, it will periodically check if your preferred route is still the fastest. If there is a better route, it will let you know and ask if you wish to start navigation. Built with Joseph Zhong.
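The periodic "is my usual route still the fastest?" check could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `better_route_index` and the `min_saving_minutes` threshold are hypothetical, and the travel-time estimates are plain numbers standing in for whatever a routing service would return.

```python
def better_route_index(preferred_minutes, alternative_minutes, min_saving_minutes=2.0):
    """Return the index of a meaningfully faster alternative route, or None.

    preferred_minutes: current estimated travel time on the usual route.
    alternative_minutes: estimated travel times for the other candidate routes.
    min_saving_minutes: assumed threshold so tiny differences do not trigger a prompt.
    """
    if not alternative_minutes:
        return None
    # Pick the fastest alternative.
    best = min(range(len(alternative_minutes)), key=alternative_minutes.__getitem__)
    # Only suggest it when the saving is large enough to be worth interrupting the driver.
    if preferred_minutes - alternative_minutes[best] >= min_saving_minutes:
        return best
    return None


# Hypothetical estimates: the usual route takes 30 minutes, alternatives take 24 and 27.
suggestion = better_route_index(30.0, [24.0, 27.0])
```

A threshold of this kind keeps the app quiet when the usual route is only marginally slower, which matches the goal of not hearing directions you already know.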