a year of counting fish

the shape of real work

Aug 22, 2024

When I was a freshman I joined a plasma physics lab. Plasma physics is beautiful stuff, magnetic flux tubes and reconnection and turbulence, which I came to appreciate over the two years that I did this research. But I first became interested because if you figure out those pretty things, then you can write down equations to plug into a simulation, and that simulation can give you a better idea of how plasma behaves in a fusion reactor. Then, of course, you build the perfect fusion reactor and you save the world.

That isn’t how it turned out, though. I started the project—and college—thinking I was a practical, applications-driven person, and that there were just a few big problems in the world, and I would pick one and work on it. Two years later I felt pulled apart in every direction. If I cared about applicability so much, why were my favorite classes invariably math classes where I could forget about the real world and sink into pure abstraction, where what I cared about most was finding a beautiful representation of some deep truth? If I cared about theoretical work instead, why wasn’t I any good at it, why did I feel a resistance to hunkering down and getting good at it—like I was holding out for a better ambition—and why did I feel this itch to build something real?

I thought that with my plasma research I had landed at an intersection of “pure” work and “real” work, but it turned out not to feel like either. To get any good at theoretical plasma physics you have to know a lot of things that I didn’t, or else be willing to put in time that I didn’t. And we’re a long way off from fusion power, even farther from commercial-scale fusion. The remaining challenges mostly lie in engineering, not esoteric plasma physics phenomena.

The more I explored, the more the very notion of “real work” became confused. Twiddling with dimensions in SolidWorks for engineering teams felt even bleaker than reading opaque plasma physics papers. One policy class and then another showed me that I liked learning about policy, but politics felt depressing and unproductive.

In my mind theory and application existed on two sides of an impenetrable divide, one that had been a hairline crack before college but now yawned wider by the day. In every field I turned to, the same question reared its head: how do scientific discoveries leave journals and enter the real world? How should you think about the real world in your research? How does anyone ever get anything done? When I asked the adults I knew who were scientists and engineers and policymakers, they all said “that’s a really good question!” and not one of them answered it. Would I not find, at MIT, a single anecdote from a single person who felt deep down that they could do not only good work, but real work for the world? [1]

It’s been a bit over a year since that spring. Now, I have an anecdote.

I started counting fish totally by accident. I had been accepted to this climate and sustainability research program. You could continue a previous sustainability-related project, or they would provide you with a list of options to pick from. I filled out my application making it clear that I wanted to do a nuclear engineering project. Fusion may not have felt worth working on anymore, but there’s a lot more to nuclear science, like materials or fission energy.

When they sent me the list of projects to choose from, I scrolled through the entire twenty-page document and saw that none were even remotely physics- or nuclear science-related. But amidst the disappointment there was this salmon thing. For some reason I kept coming back to this project description and reading it with glee, even though I told myself that I definitely did not have the prerequisite machine learning knowledge and that logically it was better to drop out of the program and find a solid physics project. Then I’d at least have something relevant on my resume to apply to grad schools or jobs with. But a small part of myself, something left over from childhood, still had ambitions towards marine biology and oceanography and ecology and was utterly enchanted by the idea of working on salmon.

That August stood on the final cusp of the era when the best you could coax from DALL-E was an oversaturated bundle of shapes, right when it started seeming like ML was something I needed to learn about. That’s how I justified my decision: working with ML models would be useful in my physics career too.

Here’s the idea of the project. Salmon spawn in rivers and swim out to the ocean to spend their adult lives before returning to their home rivers to spawn the next generation of salmon. This is really important to the ecosystems surrounding their home rivers: spawning salmon provide food, and their bodies fertilize the rivers when they die. But their populations have waned with habitat loss and warming waters.

To measure the efficacy of any conservation measure, we need an accurate method for measuring population. Currently people put sonar cameras in rivers and count the number of salmon swimming upstream or downstream when they return to spawn. This is very time- and labor-intensive, and it costs a lot to hire the technicians who do this counting, meaning that only a few rivers employ this sonar technology. Six years ago, some scientists including my PI asked, what if we apply computer vision to this problem? That’s the project I joined.

Though I work in a computer vision lab, most of what we talk about in group meetings is how to benchmark data, improve annotation experience, construct good metrics, and much less of the theoretical ML I’d thought would make up the project backbone. In fact, most of the work looks like this: given a problem, what data do you need to solve it, and how do you construct that dataset? How do you account for—a big problem in ecology—long-tailed distributions, where many classes of interest are sighted rarely, where 40% of your dataset is gazelles and 40% is zebras and the other 20% are a gazillion other species you want to identify correctly? How do you take a model which works well on data from one domain and develop some way of preprocessing or fine-tuning or unsupervised learning to help generalize the model to a new domain? Given an existing model, what kinds of questions can you ask and what strategies can you use to probe exactly what it is and isn’t understanding?

In group meetings for this salmon project we get even more concrete. A dam is being removed on the Klamath River this fall and they want to use our model to help count fish populations, before and after. Each Zoom call is a frenzy of discussion: we could give them our model as a web tool, but cloud GPU time is expensive; we could send it to them packaged in a Jetson (a little Linux box with NVIDIA GPUs), but it would be difficult to debug if anything went wrong during, say, a software update; we could buy them iPads and have them download an app, but then we’re getting into iOS app development. “Maybe Apple would sponsor us,” one professor muses. “They could have this slogan, ‘Apple saves the fish!’”

They’re not being facetious; iOS app development isn’t outside of the realm of possibility! My grad students are building a raytracer to generate synthetic fish data and modifying this open-source annotation software. They build web apps to display multimodal data in sync, to query our database and collect clips more easily. I thought I might feel out of place as a physics and math student who, at the start of the project, had only used programming for plasma physics research before and barely knew what a class was—but everyone in the lab was picking up skills on the fly to cobble together applications, and I was no exception. (I’ve now built three web apps in my time here.)

At the end of June, I flew to Seattle for a conference our group had organized. The idea was to bring together all the people who were involved in this salmon counting project: my mentors at MIT, the other half of the group at Caltech, and for the first time, the people actually sitting watching the sonar video feed while fish swam by. In the conference hall, we set up our model on a desktop and tested it on people’s hard drives of sonar clips. One guy had sorted his clips into Easy, Medium, and Hard. Our model failed miserably on the Hard ones, somehow predicting over a thousand fish swimming by because the water was so turbid that the same twenty fish kept flashing in and out. My head felt fuzzy (my flight had been delayed by 12 hours overnight, and in the end I got just three hours of sleep), so I drifted out of conversations to refill my paper cup with coffee, conversations with an eclectic mix of ecologists and computer scientists: this short girl with dark hair and sure eyes who simulated fish populations in rivers, this tall skinny guy who’d pulled our code from GitHub and run it beforehand and was waving around a see-through red water bottle with an I Love Fish! sticker on it, this stocky man in a baseball cap who complained about how NOAA wouldn’t fund the digitization of the decades of salmon counts they had yellowing in drawers.

The next morning we drove to the Skagit River and stooped over inside an RV to watch sonar clips of fish (and a beaver!) swimming by while technicians explained how they found these clips in the video feed. Some guys came laughing into the RV and showed us iPhone videos of fish corpses lined up along the bank during spawning season, whose insides were so battered and rotted away from the strenuous process of swimming up the river that, while cataloguing them, the guys would stick knives into the bursting red salmon sides and watch pink liquid stream out—“we call those ones yogurt bags,” one of them giggled while we all made retching noises.

I could go on and on about how beautiful and green everything was and how the whole place felt alive and how everyone cared about it being alive. The more people I met in Seattle who were deeply invested in the wellbeing of the river and ocean ecosystems, the less I cared for purity. Did it matter if my work was not pure, a calculation confined to a scientific journal waiting for an engineer to dig it up in fifty years to improve the efficiency of a fusion plant by a fraction of a percent—or even, improbably, a discovery that would help advance human knowledge for centuries to come?

You know, there is care and beauty and impact in that kind of work, too. As a freshman wholly unused to the kinds of problems I’d encounter in the real world, my mistake had been to conflate “real” work with physical, engineering-type work; then, as a sophomore affected by the intellectual prestige climb, I conflated “real” work with whichever type of work felt hardest and most technical. But you can feel dissatisfied whether you’re building a spaceship or reading math papers. What matters is finding something to bridge the crevasse you can’t help but see. Just like other people can’t help but want to code or make music or, I guess, think about topology, there’s this gaping chasm between theory and application that I know I would be sneaking desperate glances at forever if I don’t put my soul into bridging it. In Seattle I peered into the chasm once more and, for the first time, caught a glimpse of the bottom. Learning from different people, collaborating with my mentors, driving a project from start to finish to the hands of people who would use its results—to me that’s the real work: real not as in physical or technical or even legitimate, but as in the work that makes my life feel richer for each moment I spend on it.

It felt fitting that my first-ever visit to Seattle was in the context of this salmon project. Wherever we walked, salmon iconography appeared in sculpted totem poles and colorful shop faces. By the time I left my heart felt very full. My mentor and another PhD student waved us goodbye as I stepped into the bus back to the airport with another member of my group who was flying out that night. “I’m glad you’ll be working here full-time this summer,” he said, sitting next to me. The air outside filled with orange as the sun sank below the skyline. “There is so much for us to do.”

[1] to be clear, plenty of people at MIT were and are doing really cool work. but at the time I was so confused and desperate that it didn’t feel like it; and for sure as an underclassman the number of people I knew who were vocally doubtful that they could find something fulfilling and/or impactful way outnumbered the number of people I knew who were vocal about believing in their work.
