Domain Knowledge: An Essential First Step to Data Analysis Projects.

Derya Gumustel
Mar 13, 2021

By far, the coolest job I had in college was working as a part-time data analyst with a member of the University of Washington’s eScience Institute. As an undergraduate, I majored in oceanography, and my degree required courses that taught me how to use Python to analyze environmental data. My first Python class wasn’t great, but the code appealed to me so much that I signed up for a second class, and then a third, and even decided to attend a hack week! The classes were a good start, but the data analyst position I found at eScience gave me my first opportunity to apply my programming knowledge to look for oceanographic concepts in real-world data. Needless to say, I absolutely loved it, and soon afterward I decided to pursue a career in data science for environmental good.

Fast forward to today, and I’m enrolled in a remote data science intensive program offered by General Assembly. Again, I’m thriving in an environment where I can apply my Python skills to real-world data projects. However, instead of doing projects about the oceans, I’m now analyzing data on topics that I have no background in. For instance, our first project involved analyzing statewide trends in SAT and ACT participation in the United States. I haven’t taken either of these exams, though I’ve been a student long enough to know that standardized tests are not fun. I quickly found that I lacked something called “domain knowledge,” defined below:

“Domain knowledge is knowledge of a specific, specialized discipline or field, in contrast to general knowledge, or domain-independent knowledge.” (Wikipedia, https://en.wikipedia.org/wiki/Domain_knowledge)

For this project, lacking domain knowledge ultimately meant that I didn’t have any questions prepared when I started my exploratory data analysis (EDA). Identifying the questions that you hope to answer is vital to the start of an analytical project. Unfortunately, I didn’t realize this until I had already spent three days creating data visualizations only to step back and ask myself, ‘Wait, what questions do these visualizations even answer?’

Honestly? That felt pretty bad.

So how do you gain a little domain knowledge and identify some good questions to give your EDA purpose? If your project is entirely your own and the goals and audience are up to you, then you might try reading about your topic through the lens of your existing opinions or biases. This might help you identify a personal attachment to the topic, which can fuel your questions and analysis and lead to some fun discoveries. Or, if your project was assigned to you with a particular question, or if it’s for a predetermined audience, you can look up articles about those specifics.

In the example of my SAT project, I could have read about whether standardized testing should be kept or abolished, or found some research on the ways in which the SAT is unfair to certain demographics (aligning with my bias). Or, I could have read about the College Board, which administers the SAT, looked into why some students might prefer the ACT over the SAT, and investigated how we could encourage more students to take the SAT (presenting to a predetermined audience, the College Board). Even reading some high-level opinion pieces can provide inspiration for a data analysis project, and maybe I would have found a great goal to pursue if I had looked through some of the interesting analyses that were already out there. Dry, objective sources like research papers or Wikipedia pages are great when you have a specific question in mind, but can be difficult to navigate otherwise. Personally, I find that I’m best able to stay motivated and on-target throughout a project by finding ways to get invested and engaged in the topic at hand.
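One way to put this into practice is to write your questions down before plotting anything, and to pair each one with the small piece of analysis that would answer it. Here is a minimal sketch of that habit in plain Python. Everything in it is an assumption for illustration: the state names and participation rates are made-up placeholders (not real SAT or ACT figures), and the function names are hypothetical.

```python
from statistics import mean

# Placeholder rows, NOT real figures:
# (state, SAT participation rate, ACT participation rate)
rows = [
    ("State A", 0.90, 0.20),
    ("State B", 0.10, 0.95),
    ("State C", 0.55, 0.50),
]


def sat_dominant_states(rows):
    """Return the states where SAT participation exceeds ACT participation."""
    return [state for state, sat, act in rows if sat > act]


def average_gap(rows):
    """Return the mean absolute gap between SAT and ACT participation."""
    return mean(abs(sat - act) for _, sat, act in rows)


# Each question is written down first and paired with the function
# that answers it, so every result has a stated purpose.
questions = {
    "Which states favor the SAT over the ACT?": sat_dominant_states,
    "How far apart are the two participation rates, on average?": average_gap,
}

for question, answer in questions.items():
    print(question, "->", answer(rows))
```

If a chart or summary doesn’t map back to one of the questions in the dictionary, that is a hint that either the question list or the chart needs rethinking.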

In the end, I’m fortunate to have only spent a week on my SAT project and not a moment longer. It was much better to learn that lesson quickly instead of losing more time to it. My next project has already begun, and my first steps are to read, read, read. I’m sure my analysis will go just a little more smoothly this time around. I hope yours will, too!

Derya Gumustel

Oceanographer turned data scientist, doing my part in science communication.