My Experience Interning at Amazon
September–December 2024
Before the Internship
The Opportunity
This fall, I took a leap and left school for a semester to spend 12 weeks in Seattle, WA, interning with Amazon. The journey began in February when I initially interviewed for a summer internship. Although I received an offer, I requested to postpone it to the fall semester since I had already committed to a summer internship at Pinterest. Amazon declined my request, stating the offer was only valid for summer, so I turned it down, unwilling to renege on my Pinterest commitment. To my surprise, in early April, I received an unexpected email: "Congrats, you got the Amazon internship for Fall 2024."
Overcoming Hesitations
I was ecstatic to have the opportunity to experience a co-op during the school year. Despite my excitement, a twinge of anxiety lingered about moving to Seattle, especially since my family lives in Oakland, and I study at UC Berkeley. This would be my first time living away from home in a city known for its rainy fall weather. I had asked if the internship could be in the Bay Area, but the answer was no. Knowing it would push me out of my comfort zone, I accepted the challenge. After completing my summer internship at Pinterest and learning valuable lessons, I packed my bags on September 26, 2024, and flew to Seattle.
Settling in Seattle
I found a micro-studio apartment in Capitol Hill, just a 15-minute walk from the Amazon office, for $925/month. It was a great deal, offering all the basics: a mini fridge, microwave, toilet, shower, folding table and chair, twin bed, utilities, and Wi-Fi. Arriving in Seattle on a Thursday, I used the weekend to settle in and explore my new surroundings.
One of the first challenges I encountered was the absence of familiar stores like Target, Ikea, or Costco downtown. Instead, I discovered a local chain called QFC, owned by Kroger. It had groceries and essentials like sheets, towels, and dishware that I needed for my apartment.
Exploring the City
I ventured out to the iconic Pike Place Market, which buzzed with energy. Though I didn’t purchase anything, I enjoyed soaking in the lively atmosphere, admiring waterfront views, and browsing the handmade crafts by local artists. I also visited the original Starbucks, which had a line extending out the door.
Curious about Seattle's history, I took an underground tunnel tour. I learned about the city's frequent flooding in its early days, which caused sewage to back up into the streets and bay. To solve this, the streets and sidewalks were raised by 10 feet, improving the drainage system. While the historical tidbits were fascinating, the actual tour under the raised sidewalks was underwhelming.
Discovering the Parks
Before starting my internship, I explored the Japanese Tea Garden near Washington Park Arboretum. The park became my daily running spot, and the tea garden offered a serene and unique experience. It featured bonsai trees, a koi fish pond, and a variety of beautiful plants with striking foliage and blossoms. If you’re a plant enthusiast, it’s an excellent place for inspiration.
Inside Amazon
First Day and Team Introduction
My internship at Amazon started on a Monday morning. I met my manager, Pavan, in the building lobby at 9:30 AM. Amazon had shipped my laptop to my home in Oakland, but it arrived after I had already flown to Seattle, so I had to pick up a new one from the IT station. After getting the laptop, I met Pavan, who introduced me to my mentor, Sacchi, and the rest of the team: Yiming, Sahithya, Vishesh, Salma, Shuli, and Nymish.
The Automated Profitability Management (APM) Team
I was part of a division within Amazon's Retail Unit called Automated Profitability Management (APM). In simple terms, Amazon has internal vendor managers who handle relationships with Nike, Apple, and other brands that sell their products on Amazon’s retail website. Amazon being Amazon, these vendor managers negotiate to ensure Amazon gets the best prices and deals. For example, they may negotiate product prices or clawbacks—money owed to Amazon if a product fails to meet performance expectations.
The core of these negotiations lies in identifying "opportunities." These opportunities can include factors like raw material price changes, currency fluctuations, inflation rates, and competitor pricing. Essentially, they are any pieces of information that can justify terms in negotiations with vendors.
My Role: Supporting Vendor Management Tools
My team helped build and maintain the internal tools that vendor managers use to manage these negotiations and opportunities. These tools include dashboards, data storage systems, and algorithms.
For my internship projects, I focused on two main tasks: adding new functionality to help vendor managers construct negotiations more effectively and upgrading a database to support more complex read operations. This upgrade would make it possible to develop new product features and improve the tools’ capabilities.
Technical Projects
Project 1: Designing an API for Negotiation Opportunities
My first project involved designing and developing an API that allowed vendor managers to filter opportunities and generate a negotiation. Opportunities are the key justifications behind the terms in a negotiation, but, surprisingly, vendor managers previously had no control over them. This meant they often worked with outdated or irrelevant opportunities, such as those from a year ago or ones tied to different ASINs (Amazon Standard Identification Numbers). This lack of control was a long-standing issue, and my project aimed to address it by giving vendor managers the ability to filter and select the opportunities relevant to their current negotiations.
Technical Details: Backend Development and Design Decisions
I focused on the backend aspect of the feature, essentially functioning as an "API plumber." My job was to write the code that connected to the frontend interface, where vendor managers could select filters from dropdowns and input other criteria. After receiving this data, my API processed it to create a negotiation that was then displayed on the vendor manager's dashboard.
Amazon uses its own framework for building APIs called Coral, which is similar to Express.js or Java Spring Boot but tailored specifically for Amazon’s needs. It allows seamless integration of various services and, for my project, enabled the frontend website to pass opportunity filter data to my backend. Setting up a new Coral service, however, was a bit complex, which added some challenges to the project.
Choosing the Right Architecture
I had two potential design options for implementing the API. The first was to place my code in a package that contained logically related code but had no Coral service; this would require setting up a new Coral service, which was a more involved process. The second option was to reuse an existing package within my team that already had a Coral service, even though its code wasn’t directly related to the task at hand. There, I would write new code to accept opportunity filters from the frontend and pass them to the other package (the one with related code but no Coral service) via Amazon Simple Queue Service (SQS). A message on that queue would trigger an AWS Lambda function to process the filters and leverage the existing code from the other package.
After presenting both options to my team, we decided on the second approach: reusing the existing Coral service and implementing the Simple Queue Service with the Lambda function. This decision had trade-offs. On the downside, it made the flow asynchronous, meaning I couldn’t return a direct response from the API; instead, the vendor manager would be notified once the negotiation was created. On the positive side, this approach was lighter and faster to implement, since it let us skip the complexity of setting up a new Coral service. It also reduced code bloat and let us quickly build a prototype we could present to the vendor managers for feedback. If they liked it, we could build a more permanent solution with a dedicated Coral service and synchronous responses.
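To make the asynchronous flow concrete, here is a minimal sketch in Python of what the producer and the SQS-triggered Lambda might look like. The queue URL, payload fields, and create_negotiation helper are all placeholders I invented for illustration, not the team's actual code.

```python
import json
import boto3

sqs = boto3.client("sqs")

# Hypothetical queue URL; the real queue is internal to the team.
QUEUE_URL = "https://sqs.us-west-2.amazonaws.com/123456789012/opportunity-filters"

def enqueue_filters(vendor_id: str, filters: dict) -> None:
    """Called from the existing Coral service: push the vendor manager's filters onto SQS."""
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"vendorId": vendor_id, "filters": filters}),
    )

def create_negotiation(vendor_id: str, filters: dict) -> None:
    # Stand-in for the other package's logic that matches opportunities
    # and assembles the negotiation.
    print(f"building negotiation for {vendor_id} with {filters}")

def lambda_handler(event, context):
    """SQS-triggered Lambda: consume filter messages and build negotiations asynchronously."""
    for record in event["Records"]:
        payload = json.loads(record["body"])
        create_negotiation(payload["vendorId"], payload["filters"])
```

Because the work happens off the request path, the vendor manager gets notified later rather than receiving the negotiation in the API response, which is exactly the trade-off we accepted.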
Development and Results
Once the architecture was decided, development was relatively straightforward. I used the opportunity filters to quickly retrieve matching opportunities from an Elasticsearch engine. I then integrated with other internal services to construct the negotiation based on the filtered opportunities.
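Conceptually, translating the vendor manager's filters into an Elasticsearch query looked something like the sketch below (assuming the 8.x elasticsearch-py client; the endpoint, index, and field names are placeholders rather than the real mapping).

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://internal-opportunity-search:9200")  # placeholder endpoint

def find_opportunities(vendor_code: str, category: str, created_after: str):
    """Translate dropdown filters into a bool query against the opportunity index."""
    query = {
        "bool": {
            "filter": [
                {"term": {"vendorCode": vendor_code}},
                {"term": {"category": category}},
                {"range": {"createdAt": {"gte": created_after}}},
            ]
        }
    }
    response = es.search(index="opportunities", query=query, size=100)
    return [hit["_source"] for hit in response["hits"]["hits"]]
```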
I completed the development and unit testing of the project before reaching the midpoint of my internship. The feedback I received from my team was positive, and I was ready to present the prototype to the vendor managers for their input.
Project 2: Upgrading the Database for Complex Searches
In my description of Project 1, I mentioned using an Elasticsearch database for finding matching opportunities. While that approach worked well, it only covered about a quarter of the opportunities. The rest were stored in a DynamoDB database. However, DynamoDB, being a non-relational database, lacked the ability to perform the complex searches required for my API. Therefore, my second project focused on upgrading the DynamoDB database to one that could support more complex reads, both for my API and future projects on our team’s roadmap.
Choosing the Right Database
This project was more technically challenging and allowed me to learn a lot. The first step was deciding what type of database to use. I considered non-relational, relational, and analytical engines.
Non-Relational Databases
On the non-relational side, options like DynamoDB and MongoDB are popular. These databases store data in flexible formats like JSON, so objects don't need a predefined structure. That flexibility comes at a cost, however: complex searches are difficult. DynamoDB, for example, operates essentially as a key-value store; it supports reads, writes, and updates by opportunity ID but not advanced searching. MongoDB offers somewhat richer querying, but its limit on the number of indexes per collection still restricts complex queries. For our team's use cases, these non-relational options weren’t sufficient.
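For context, here is roughly what key-based access looks like in DynamoDB with boto3; the table and attribute names are placeholders. Anything beyond fetching by key degenerates into a scan.

```python
import boto3
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Opportunities")  # placeholder table name

# Fast and simple: fetch one item by its partition key.
item = table.get_item(Key={"opportunityId": "OPP-12345"}).get("Item")

# Anything richer (e.g. "all opportunities for a vendor created this year")
# falls back to a scan with filter expressions, which still reads every item
# in the table and is far too slow and costly for interactive filtering.
results = table.scan(
    FilterExpression=Attr("vendorCode").eq("NIKE") & Attr("year").eq(2024)
)["Items"]
```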
Relational Databases
Relational databases, such as MySQL, PostgreSQL, Oracle, and IBM Db2, store data in tables with rows and columns. The objects are mapped to rows, and relationships can be created between different tables. While relational databases excel in handling structured data and complex queries, they do have some drawbacks. The rigid schema requires every object to be mapped to a row, and fields that aren't used in every data object are often set to null. Additionally, traditional relational databases have historically had storage limitations. However, cloud services like AWS have significantly increased these limits. For example, Amazon Aurora can handle up to 128 TB of data, which is far more than our team's current storage needs of less than 1 TB.
Relational databases also excel at transactional processing (OLTP), which is crucial for workloads with frequent updates, inserts, and searches. By creating indexes on various fields, they can support efficient searches, including the opportunity filtering that vendor managers need. The trade-off is that every additional index consumes storage and slows down writes, since each insert or update must also maintain the indexes.

I also found meaningful differences between relational engines. PostgreSQL treats rows as immutable: every update creates a new row version, and the associated indexes must be updated to point to it. This approach helps with concurrency and versioning but adds overhead. MySQL, by contrast, updates rows in place, which is more efficient because the row is modified directly without creating a new version. Despite that, my team chose PostgreSQL over MySQL because it supports more advanced features, such as indexing JSON data, which was critical for transitioning from DynamoDB.
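To make the JSON-indexing point concrete, here is a minimal sketch of the kind of layout PostgreSQL allows, using psycopg2. The table, column names, and connection string are my own illustrations, not the team's actual schema.

```python
import psycopg2

conn = psycopg2.connect("dbname=apm user=apm_test")  # placeholder connection string
cur = conn.cursor()

# Frequently filtered fields get their own indexed columns, while the flexible
# parts of an opportunity document live in a JSONB column.
cur.execute("""
    CREATE TABLE IF NOT EXISTS opportunity (
        opportunity_id  TEXT PRIMARY KEY,
        vendor_code     TEXT NOT NULL,
        estimated_value NUMERIC,
        created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
        details         JSONB
    );
    CREATE INDEX IF NOT EXISTS idx_opportunity_vendor ON opportunity (vendor_code);
    -- A GIN index makes it possible to filter on keys inside the JSONB document.
    CREATE INDEX IF NOT EXISTS idx_opportunity_details ON opportunity USING GIN (details);
""")
conn.commit()

# Containment query served by the GIN index.
cur.execute("SELECT opportunity_id FROM opportunity WHERE details @> %s::jsonb",
            ('{"currency": "USD"}',))
```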
Analytical Engines
On the analytical side, online analytical processing (OLAP) engines like Elasticsearch, Redshift, and Trino offer powerful search and aggregation capabilities. These engines typically sit on top of, or are fed data from, other storage systems such as DynamoDB, PostgreSQL, or Amazon S3. They can handle complex searching, aggregation, and calculations, but they are far more compute- and storage-intensive than traditional databases. They also don’t support transactional processing, so writes such as inserts and updates still require a separate data store. Furthermore, it can take hours for these engines to reflect updates, which made them unsuitable for our project needs.
Final Decision: AWS Aurora PostgreSQL
After evaluating all the options, I recommended a relational database, specifically AWS Aurora PostgreSQL. Non-relational databases couldn’t meet our complex searching and filtering requirements, and analytical engines were overkill and too expensive. PostgreSQL offered sufficient storage, could support our transactional workloads, and allowed us to index JSON fields—essential for transitioning from DynamoDB. Aurora PostgreSQL is Amazon’s version of PostgreSQL, tuned to run more efficiently in the AWS environment while maintaining full compatibility with standard PostgreSQL.
After getting approval from my team, I began performance testing to ensure the database could handle both our current and future planned use cases.
Introduction to Performance Testing with Aurora PostgreSQL
Conducting a performance test was a completely new experience for me. To get started, I first outlined my testing goals. I wanted to evaluate whether Aurora PostgreSQL could support our current DynamoDB workload, which includes operations such as reading, inserting, and updating opportunities by opportunity ID. Additionally, I aimed to test for future workloads, like finding opportunities that match specific fields (e.g., x, y, z) or executing aggregate queries, such as identifying the top N vendors with opportunities that can generate the most revenue for Amazon. Furthermore, I sought to test the database's reliability under high concurrency, where multiple users read and write to the database simultaneously, as well as to explore potential deadlock scenarios.
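For example, the "top N vendors" aggregate could be expressed roughly like the query below, written against the simplified, hypothetical opportunity table from my earlier sketch (the real column names differ).

```python
# The kind of aggregate read we wanted the new database to handle well:
# "top N vendors by total opportunity value" (column names are illustrative).
TOP_VENDORS_SQL = """
    SELECT vendor_code, SUM(estimated_value) AS total_value
    FROM opportunity
    GROUP BY vendor_code
    ORDER BY total_value DESC
    LIMIT %s;
"""

def top_vendors(conn, n: int = 10):
    """Run the aggregate against an open psycopg2 connection."""
    with conn.cursor() as cur:
        cur.execute(TOP_VENDORS_SQL, (n,))
        return cur.fetchall()
```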
Analyzing the Existing DynamoDB Workload
To begin, I examined our existing DynamoDB databases. They contained approximately 30 million opportunities in total and supported sub-20-millisecond latency for insert, update, and read operations. This gave me a performance benchmark, which I used as a reference point when designing the PostgreSQL performance tests.
Designing the PostgreSQL Schema
The next step was to decide on the schema and structure of the PostgreSQL tables. I needed to define the number of columns, field types, indexes, and other design details. I aimed to populate these PostgreSQL tables with roughly 30 million opportunities to replicate the existing workload.
Initially, I tried to design the schema to closely mirror the structure of the data in DynamoDB. In DynamoDB, the opportunities were stored as JSON objects, with each object broken down into sub-objects representing various parts of the opportunity. Based on this, I created an opportunity table in PostgreSQL with a superset of all the fields across all opportunities. I also represented the sub-objects as custom PostgreSQL composite types.
Challenges with the Initial Schema Design
This initial schema resulted in an unusual database structure involving arrays of composite types. I ran into limitations, however, especially when I attempted to index nearly every field. For instance, I discovered that indexing fields inside composite types stored in arrays was not possible.
Research and Re-evaluation of the Schema
To resolve these issues, I conducted further research to explore alternative approaches. During my search, I came across the concept of database normalization, which organizes data into multiple tables and creates relationships between them. This technique allows you to break down complex data structures into simpler components, making them easier to manage and index.
Normalization and Improved Database Efficiency
I realized that my initial schema design was highly inefficient and not recommended. With the principles of database normalization in mind, I restructured the schema by separating the composite types from the main opportunity table. I created two tables: one for the main opportunity data and another for the composite types, which were previously stored in arrays. By associating each composite type with a corresponding row in the main table, I could now index these composite types, significantly improving the efficiency of searches for matching opportunities.
This normalization approach not only optimized database performance but also increased flexibility for potential future schema changes. It was a valuable learning experience, and the revised schema has allowed for better performance, reliability, and scalability in Aurora PostgreSQL.
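Sticking with the simplified schema from my earlier sketch, the normalized layout looked conceptually like this. The child table name (opportunity_line_item) and its columns are hypothetical stand-ins for the real sub-objects, which are internal.

```python
import psycopg2

conn = psycopg2.connect("dbname=apm user=apm_test")  # placeholder connection string
cur = conn.cursor()

# Normalized layout: the array-of-composite-type fields from the original design
# become rows in a child table that references the parent opportunity.
cur.execute("""
    CREATE TABLE IF NOT EXISTS opportunity_line_item (
        line_item_id   BIGSERIAL PRIMARY KEY,
        opportunity_id TEXT NOT NULL REFERENCES opportunity (opportunity_id),
        item_type      TEXT NOT NULL,
        amount         NUMERIC,
        attributes     JSONB
    );
    -- Fields that used to hide inside array elements are now plain columns
    -- and can be indexed directly.
    CREATE INDEX IF NOT EXISTS idx_line_item_opportunity
        ON opportunity_line_item (opportunity_id);
    CREATE INDEX IF NOT EXISTS idx_line_item_type
        ON opportunity_line_item (item_type);
""")
conn.commit()
```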
Populating the Database with 30 Million Opportunities
The next challenge was populating the PostgreSQL database with 30 million opportunities. Initially, I considered using real opportunities from our existing DynamoDB database. However, this would require reading from DynamoDB, deserializing the data, and then writing it to PostgreSQL. The deserialization process was slowing me down, so my teammate suggested using fake data since the performance test didn't require the actual data. I decided to take their advice and began using fake data.
Performance Issues on Local Machine
I started by running a Python script on my local computer to populate the database. However, my computer could only insert about 1 million fake opportunities per hour. I attempted to run the script overnight and over the weekend, but my machine frequently timed out, losing connection to the database. As a result, several days passed with minimal progress.
Leveraging EC2 for Better Performance
Eventually, I realized that an EC2 instance, with more computing power and uninterrupted runtime, could significantly speed up the process. I set one up, and the difference was remarkable. I was able to run 400 workers in parallel, populating 30 million fake opportunities in about an hour instead of the roughly 30 hours it would have taken on my local machine.
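The population script was structured roughly like the sketch below. The connection string, schema, batch size, and data generator are simplified placeholders, and the fake-data generation itself evolved further, as described later on.

```python
from multiprocessing import Pool
import random

import psycopg2
from psycopg2.extras import execute_values

TOTAL_ROWS = 30_000_000
WORKERS = 400                      # ran on a large EC2 instance, not a laptop
BATCH_SIZE = 1_000
DSN = "dbname=apm user=apm_test"   # placeholder Aurora connection string

def fake_opportunity():
    # Simplified stand-in generator; the data-quality fixes are described below.
    return (f"OPP-{random.randint(0, 10**9)}",
            f"VENDOR-{random.randint(0, 5_000)}",
            random.uniform(100, 1_000_000))

def populate(worker_id: int) -> None:
    conn = psycopg2.connect(DSN)
    rows = TOTAL_ROWS // WORKERS
    with conn, conn.cursor() as cur:
        for _ in range(rows // BATCH_SIZE):
            batch = [fake_opportunity() for _ in range(BATCH_SIZE)]
            execute_values(
                cur,
                "INSERT INTO opportunity (opportunity_id, vendor_code, estimated_value) "
                "VALUES %s ON CONFLICT DO NOTHING",
                batch,
            )
    conn.close()

if __name__ == "__main__":
    with Pool(WORKERS) as pool:
        pool.map(populate, range(WORKERS))
```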
Setting Up the Tests
With the fake data successfully populated in the database, I proceeded to write Python scripts for the three tests I planned to run. The first script simulated simple inserts, gets, and updates by opportunity ID, mimicking the existing workload in DynamoDB. The second script tested more complex operations, such as filtering and aggregation, to support use cases like Project 1, which involved using opportunity filters to find matching opportunities. The third test aimed to simulate deadlock scenarios, simply to observe how the system handled them.
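As a sketch, the first script looked something like this. The read/update/insert mix here is arbitrary rather than the team's exact ratios, and the schema is the simplified one from my earlier examples.

```python
import random
import psycopg2

DSN = "dbname=apm user=apm_test"  # placeholder connection string

def id_workload(known_ids, iterations=10_000):
    """Test 1: DynamoDB-style point reads, updates, and inserts keyed by opportunity ID."""
    conn = psycopg2.connect(DSN)
    with conn, conn.cursor() as cur:
        for _ in range(iterations):
            roll = random.random()
            opp_id = random.choice(known_ids)
            if roll < 0.6:        # mostly point reads
                cur.execute("SELECT * FROM opportunity WHERE opportunity_id = %s",
                            (opp_id,))
                cur.fetchone()
            elif roll < 0.9:      # some updates
                cur.execute("UPDATE opportunity SET estimated_value = %s "
                            "WHERE opportunity_id = %s",
                            (random.uniform(100, 1_000_000), opp_id))
            else:                 # occasional inserts
                cur.execute("INSERT INTO opportunity "
                            "(opportunity_id, vendor_code, estimated_value) "
                            "VALUES (%s, %s, %s) ON CONFLICT DO NOTHING",
                            (f"OPP-{random.randint(0, 10**9)}",
                             f"VENDOR-{random.randint(0, 5_000)}",
                             random.uniform(100, 1_000_000)))
            conn.commit()
    conn.close()
```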
Running the Tests
I ran each test with several readers and writers operating in parallel, as well as with dozens of readers and writers in high-concurrency scenarios. This allowed me to compare how the system performed under low and high concurrency conditions. To analyze the results, I used AWS Performance Insights. This tool provided valuable data, including the load on the database CPU, write latency, read latency, read and write I/O operations per second, number of connections to the database, deadlocks, rolled-back transactions, cache hit rate, and much more.
Promising Results
The test results were extremely promising. In fact, they were even a bit too good to be true. Write and read latencies were consistently under a millisecond—faster than DynamoDB. The cache hit rate was 100%, and no deadlocks occurred. These results indicated that the PostgreSQL database was performing exceptionally well under the test conditions.
Addressing Concerns with Data Generation and Randomness
After sharing the initial performance results with my team, everyone was pleased, but one teammate raised a valid concern. He questioned the high cache hit rate and pointed out that the fake data I was generating might not be random enough, leading to a limited number of unique opportunities. I reviewed my approach and discovered that the Python library I was using, Faker, generated fake data by selecting values from a limited set—like names from a list of only 100 options.
To improve the randomness of my data, I decided to generate it manually using combinations of strings concatenated with random numbers. I used my teammate’s knowledge of our actual dataset, such as the number of unique brand codes, to guide the random number range and better align the fake data with the characteristics of the real data. This update helped ensure that the data more closely resembled the variety we would expect in production.
Solving the Random Seed Issue for Parallel Workers
While making this change, I noticed another issue with my parallel workers: they were all using the same random seed, so they generated identical "random" data. Switching to Python’s random library made the problem obvious, and I assigned a unique seed to each worker. This ensured each worker generated different data, preventing overlap and increasing the variability of the generated opportunities.
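Conceptually, the fixed generator looked like this. The cardinalities and field names are placeholders rather than our real numbers; the important points are the prefix-plus-random-number values and the per-worker seed.

```python
import random
from multiprocessing import Pool

# Approximate cardinalities of the real dataset (the numbers here are placeholders)
# guide the random ranges so the fake data has production-like variety.
NUM_BRAND_CODES = 5_000
NUM_VENDOR_CODES = 3_000

def make_opportunity():
    return {
        "opportunityId": f"OPP-{random.getrandbits(63)}",
        "brandCode": f"BRAND-{random.randrange(NUM_BRAND_CODES)}",
        "vendorCode": f"VENDOR-{random.randrange(NUM_VENDOR_CODES)}",
        "estimatedValue": random.uniform(100, 1_000_000),
    }

def worker(worker_id: int):
    # Each worker seeds its own generator. With a shared seed (or a forked,
    # inherited random state), every process produced the exact same "random" rows.
    random.seed(worker_id)
    return [make_opportunity() for _ in range(1_000)]

if __name__ == "__main__":
    with Pool(8) as pool:
        batches = pool.map(worker, range(8))
```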
Deadlock Testing: Adjusting Worker Behavior
Next, I revisited my deadlock test, where parallel workers were updating and reading from two opportunities. Initially, all workers updated opportunity ID A before ID B, which meant they acquired locks in the same order, so deadlocks could never occur. To create a real deadlock scenario, I adjusted the worker behavior so that some workers acquired A’s lock first while others acquired B’s lock first. This change produced a true deadlock: each worker held one lock while waiting for the other worker to release the second.
Once I implemented this change, deadlocks appeared instantly during the tests. I learned that when a deadlock occurs, PostgreSQL detects it and rolls back one of the transactions, forcing that worker to abort its operation. This was a key insight for improving the accuracy of my tests.
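Here is a stripped-down version of the scenario, using two threads, placeholder IDs, and the simplified schema from my earlier sketches. The short sleep only widens the race window so the collision is easy to reproduce.

```python
import threading
import time

import psycopg2
from psycopg2 import errors

DSN = "dbname=apm user=apm_test"  # placeholder connection string

def update_pair(first_id: str, second_id: str) -> None:
    """Update two opportunities in one transaction, locking first_id, then second_id."""
    conn = psycopg2.connect(DSN)
    try:
        with conn, conn.cursor() as cur:
            cur.execute("UPDATE opportunity SET estimated_value = estimated_value + 1 "
                        "WHERE opportunity_id = %s", (first_id,))
            time.sleep(0.1)  # widen the race window so the two workers collide
            cur.execute("UPDATE opportunity SET estimated_value = estimated_value + 1 "
                        "WHERE opportunity_id = %s", (second_id,))
    except errors.DeadlockDetected:
        # PostgreSQL picks one transaction as the victim and rolls it back.
        print("deadlock detected; transaction rolled back")
    finally:
        conn.close()

# Opposite lock orders: one worker holds A and waits for B, the other holds B
# and waits for A, so the deadlock detector aborts one of them.
t1 = threading.Thread(target=update_pair, args=("OPP-A", "OPP-B"))
t2 = threading.Thread(target=update_pair, args=("OPP-B", "OPP-A"))
t1.start(); t2.start(); t1.join(); t2.join()
```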
Re-evaluating Latency Measurements
The sub-millisecond latency I had seen in AWS Performance Insights seemed too good to be true, so I dug deeper into the latency metrics. Upon reviewing the information, I found that AWS was reporting the latency per I/O operation, not the total latency per query. An I/O operation is a single interaction with a page of memory on the disk, and each SQL query could involve hundreds or thousands of I/O operations. To get a more accurate picture of the latency, I decided to measure the query latency directly from the script itself, rather than relying on AWS Performance Insights. Although this measurement included network delay, it provided a more realistic view of the user experience, especially from the perspective of a frontend user interacting with the database.
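Measuring on the client side came down to wrapping each query with a timer and reporting percentiles, roughly like the sketch below (function names are my own, not part of any library).

```python
import time
import statistics

def timed(cur, sql, params=()):
    """Measure end-to-end query latency from the client, network delay included."""
    start = time.perf_counter()
    cur.execute(sql, params)
    if cur.description:          # only SELECT-style statements return rows
        cur.fetchall()
    return (time.perf_counter() - start) * 1000  # milliseconds

def report(samples):
    """Print p50/p99/mean latency for a list of per-query timings in milliseconds."""
    samples = sorted(samples)
    p50 = samples[len(samples) // 2]
    p99 = samples[min(int(len(samples) * 0.99), len(samples) - 1)]
    print(f"p50={p50:.1f} ms  p99={p99:.1f} ms  mean={statistics.mean(samples):.1f} ms")
```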
Running the Updated Performance Tests
With the updates in place, I ran all three performance tests again. This time, the results showed a much more realistic latency range. For updates, reads, and inserts by ID, 99% of operations completed in 0.5 to 1.5 seconds—much higher than the sub-millisecond figures I had originally seen. The complex reads and filtering queries performed similarly, which was a promising result. However, aggregation queries could take up to 8 seconds, which was unacceptable for our use case.
Optimizing Aggregate Queries
To address the issue with aggregate queries, I explored potential solutions. Despite experimenting with additional indexing and restructuring SQL queries, performance remained unsatisfactory. I then came across a recommendation to create an additional aggregate table that stored vendor companies and the total amount of money Amazon could make from negotiations with them. I implemented this by adding PostgreSQL trigger functions that updated the aggregate table every time an opportunity was inserted, updated, or deleted. This optimization resulted in sub-second response times for the main aggregate query we needed to support.
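Below is a sketch of a trigger-maintained aggregate of that kind, written against the simplified schema from my earlier examples rather than the team's real tables; the table and function names are illustrative.

```python
import psycopg2

conn = psycopg2.connect("dbname=apm user=apm_test")  # placeholder connection string
cur = conn.cursor()

# Keep a per-vendor total up to date on every insert/update/delete so the
# "top N vendors" query becomes a cheap indexed read instead of a full scan.
cur.execute("""
    CREATE TABLE IF NOT EXISTS vendor_totals (
        vendor_code TEXT PRIMARY KEY,
        total_value NUMERIC NOT NULL DEFAULT 0
    );

    CREATE OR REPLACE FUNCTION refresh_vendor_total() RETURNS trigger AS $$
    BEGIN
        IF TG_OP IN ('INSERT', 'UPDATE') THEN
            INSERT INTO vendor_totals (vendor_code, total_value)
            VALUES (NEW.vendor_code, NEW.estimated_value)
            ON CONFLICT (vendor_code)
            DO UPDATE SET total_value = vendor_totals.total_value + NEW.estimated_value;
        END IF;
        IF TG_OP IN ('UPDATE', 'DELETE') THEN
            UPDATE vendor_totals
            SET total_value = total_value - OLD.estimated_value
            WHERE vendor_code = OLD.vendor_code;
        END IF;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    DROP TRIGGER IF EXISTS trg_vendor_totals ON opportunity;
    CREATE TRIGGER trg_vendor_totals
        AFTER INSERT OR UPDATE OR DELETE ON opportunity
        FOR EACH ROW EXECUTE FUNCTION refresh_vendor_total();
""")
conn.commit()

# The expensive aggregation becomes a small ordered read.
cur.execute("SELECT vendor_code, total_value FROM vendor_totals "
            "ORDER BY total_value DESC LIMIT 10")
```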
Final Thoughts on Deadlocks and Realistic Patterns
Finally, my team and I discussed the possibility of deadlocks in production. While we had successfully forced deadlocks during testing, we concluded that, in reality, our read and update patterns would not trigger deadlocks under normal circumstances. Therefore, while deadlocks are possible, they were unlikely to occur in our actual workload.
Finalizing the Schema and Moving Forward
After completing the performance testing, we found that the read and write latency in PostgreSQL was higher than DynamoDB’s, but still acceptable for our use case. With this in mind, my team approved the final schema design. This approval allowed me to move on to the next task: serializing and deserializing opportunities between the domain model used by our codebases and the data storage model in PostgreSQL.
Wrapping Up the Project
I tackled the serialization and deserialization task in my final two weeks of the internship. Despite the tight timeline, I managed to complete this portion just in time, which was especially satisfying since the last week also included preparing for my demo presentation and writing my self-review.
Handoff to the Team
The next steps for the database upgrade will be handled by my teammate, Shuli. Shuli will focus on writing a script to migrate data from DynamoDB into the new relational PostgreSQL database. This marks the end of my contributions to the project, and I’m proud of the progress we made during the internship.
Living in Capitol Hill, Seattle: Expectations vs Reality
When I first moved to Seattle, I assumed that Capitol Hill was a wealthy neighborhood. I felt lucky to find an apartment unit for $925. While $925 is still a good deal, I quickly learned that the area presented a more complex reality than I had imagined. Capitol Hill had its fair share of challenges, with a significant presence of drugs and unhoused people. This made for some unpleasant experiences, like the nights when I couldn’t sleep because of someone having a mental health crisis and yelling in the street, people fighting or arguing, or even singing at 4 a.m.
I later discovered that the wealthier residents of Capitol Hill lived in a different part of the neighborhood, about a 20-minute walk away from where I was. It was a stark contrast to the area I was in, to say the least.
Unexpected Positives in Seattle
Despite the drawbacks of living in Capitol Hill, there were also some unexpected positives. The rain wasn’t as much of an issue as I had feared. It mostly rained at night or wasn’t heavy enough during the day to soak me through. Wearing a rain jacket and boots, and bringing an umbrella when needed, made it manageable. Grocery prices also seemed more reasonable than in the Bay Area, which was a welcome surprise.
One of the highlights of my time in Seattle occurred when I went for a run around the lake. I stumbled across the workshop of a famous titanium bike builder—a name I knew well as a bike enthusiast and titanium bike owner. I had forgotten that he was based in Seattle. That day, I had the chance to meet Bill Davidson, the man behind Davidson Bicycles, and Christopher Wahl of Mischief Bicycles. I even got a tour of the factory, which turned out to be much smaller than I expected—just the two bike builders and one person working at the front in a space a quarter of the size of a traditional Trader Joe’s.
Right across the street was Brooks’ flagship running store, the main reason I had run around the lake that day: I wanted to try on some new running shoes. Meeting these legends of the bike world and getting the factory tour turned out to be one of the most memorable experiences of my time in Seattle.
Final Thoughts on My Amazon Internship Experience
Amazon is an enormous company, and the main downside I felt was how small I seemed in comparison. Using the internal PhoneTool, I could see the hierarchy above me, with countless managers in the chain. Before joining, I thought I might have a chance to run into Jeff Bezos or CEO Andy Jassy, but once I arrived, I realized the scale of the place: Amazon has 50+ office buildings in Seattle alone and hundreds of thousands of corporate employees, and my manager, who had been with the company for 14 years, had never met Jeff Bezos.
Despite the company's size, my team created a welcoming and supportive environment with a strong culture. My first mentor, Sacchi, was extremely kind and helpful during code reviews and design discussions. One of the highlights of my day was walking to the café with him for our daily beverage and having great conversations. Sacchi had to leave for his wedding in India towards the end of my internship, but his mentorship made my time at Amazon much more positive.
My second mentor, Vishesh, was knowledgeable and pushed me to conduct a second round of performance tests, uncovering flaws in the first round. He, too, was supportive and insightful. Everyone on my team, including my manager, was friendly, smart, and balanced—personally and professionally.
Amazon's Leadership Principles
A distinctive feature of Amazon is its leadership principles, which all employees are expected to embody. These principles play a key role in performance reviews and technical project decisions. For example, principles like "Customer Obsession" and "Bias for Action" guided my decision to reuse an existing Coral service in my first project. This allowed us to move faster and receive quicker feedback, even though it wasn’t the long-term solution we wanted. The leadership principles gave me something concrete to strive for and improve on, and they worked well in practice.
AWS Services and Project Ownership
I’m grateful for the knowledge I gained working with various AWS services, including DynamoDB, EC2 instances, SQS, Lambda, Aurora RDS, and PostgreSQL. These experiences provided the hands-on learning I hoped for and will be useful in future roles and personal projects.
Moreover, I appreciated the opportunity to take ownership of the database upgrade project. I had to make critical decisions based on my research and performance tests, and that experience will give me the confidence to take on larger, more technically complex projects in the future.
The Return to Office Policy
At the time of writing, Amazon had announced a 5-day return-to-office policy, which some employees were frustrated with due to personal commitments like childcare or long commutes. However, for interns and new employees like myself, this policy was beneficial. The "swivel chair" experience—being able to turn in my chair and chat with my teammates—was invaluable for both my productivity and enjoyment. Having spent the previous summer in a fully remote internship, I found being in the office far more productive. I didn’t have to wait for Slack messages or feel guilty about sending too many, and I enjoyed spending time with my teammates rather than working alone every day. I understand, though, if seasoned professionals prefer more flexibility to spend time with family.
Conclusion
Amazon offers an excellent internship program and a strong environment for software developers. The company’s scale and experience provide well-established systems, such as the internal Sage tool for quickly debugging issues, ensuring a smooth workflow with few blockers. I believe you’ll experience steady career growth, skill development, and a good work-life balance at Amazon. Overall, my internship was a positive experience, and I’m grateful for the opportunity.