Throughout my PhD journey in Belgium, I learned a subtle yet very important trait in research, particularly in Computer Architecture. After reading many papers, talking to my advisor, and discussing with friends in the lab and new people at conferences, I realized that there is a quality that should be trained, or nurtured if it is already there. It is called Critical Thinking. To be more specific, I will bring examples from my own field so that you can better understand my points.
When a researcher works on a Computer Architecture topic (say, interconnects in multi-chip GPU systems), they should keep one important thing in mind: how to relate architectural insights to system-level decisions. This is exactly what separates someone who reads architecture papers from someone who thinks architecturally. The goal of this article is to make this mindset more conscious so you can harness it deliberately in your future research and writing.
When reading any paper, you can usually start with this set of questions:
1- What is the problem this paper is pointing at?
2- Why do we have such a problem?
3- Where is the root cause of the problem?
4- When does this problem happen?
To map these questions into a Computer Architecture context, we can ask:
1- What architectural weakness is being claimed?
2- What does this weakness stem from?
3- Through what micro-architectural mechanism does it manifest?
The next question that usually comes after this is:
1- Why do we call it a problem? Does it hurt something? Does it hurt IPC? Power? Or something else?
This step is exactly the point where you bind architectural insight to system-level behavior. Parameters such as IPC, throughput, latency, power, and area are the ones we (computer architects) like to optimize.
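To make this binding concrete, here is a minimal sketch that turns an IPC change into execution time and energy, using the standard relation time = instructions / (IPC × clock frequency). All numbers (instruction count, clock, IPC, and power values) are hypothetical and chosen only for illustration; they do not come from any real design.

```python
def execution_time_s(instructions: int, ipc: float, freq_hz: float) -> float:
    """Execution time from the relation: time = instructions / (IPC * frequency)."""
    return instructions / (ipc * freq_hz)


def energy_j(avg_power_w: float, time_s: float) -> float:
    """Energy is average power multiplied by execution time."""
    return avg_power_w * time_s


# Hypothetical workload and clock, for illustration only.
INSTRUCTIONS = 2_000_000_000
FREQ_HZ = 1.5e9

# Hypothetical baseline: IPC 1.0 at 10 W. Hypothetical proposal: IPC 1.25 at 12 W.
baseline_t = execution_time_s(INSTRUCTIONS, ipc=1.0, freq_hz=FREQ_HZ)
proposal_t = execution_time_s(INSTRUCTIONS, ipc=1.25, freq_hz=FREQ_HZ)

speedup = baseline_t / proposal_t
baseline_e = energy_j(10.0, baseline_t)
proposal_e = energy_j(12.0, proposal_t)

print(f"speedup: {speedup:.2f}x")
print(f"energy:  {baseline_e:.2f} J -> {proposal_e:.2f} J")
```

With these made-up numbers, the proposal draws 20% more power but still uses less energy, because it finishes sooner. Looking at IPC alone or at power alone would hide that.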
Now that you have found the problem and its impact on the system-level parameters, it's time to propose your idea. Note that when you propose a new architectural change, you are responsible for answering the following questions:
1- What architectural claim is being made? For example, each SM has an L1 cache as well as shared memory. I propose a unified memory that can be dynamically partitioned between L1 and shared memory depending on the demand of the workload.
2- What system-level behavior does this affect? I expect it to increase the overall IPC of the system.
3- What workload conditions are required for this to matter? I am targeting workloads that rely on shared data among CTAs of the same SM.
4- What are the unspoken trade-offs? How does this dynamic allocation mechanism affect the area of the chip? What about power?
5- How does this proposal interact with adjacent layers? Does it need help from the compiler (the layer above)? Or does it require particular consideration in terms of fabrication?
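As a way to exercise questions 1 and 3 on the unified-memory example, here is a toy sketch of one possible partitioning policy. Everything in it is an assumption made for illustration: the function name `partition_unified_memory`, the 128 KB capacity, the 16 KB granule, and the minimum L1 reservation are hypothetical and do not describe any real GPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    shared_kb: int  # capacity given to shared memory
    l1_kb: int      # capacity left for the L1 cache


def partition_unified_memory(total_kb: int,
                             shared_per_cta_kb: int,
                             resident_ctas: int,
                             granule_kb: int = 16,
                             min_l1_kb: int = 16) -> Partition:
    """Hypothetical policy: serve shared-memory demand first, give the rest to L1."""
    demand_kb = shared_per_cta_kb * resident_ctas
    # Round the demand up to the next multiple of the allocation granule.
    shared_kb = -(-demand_kb // granule_kb) * granule_kb
    # Always keep a minimum L1 slice. If this cap applies, the demand is not
    # fully met, so fewer CTAs could be resident than requested.
    shared_kb = min(shared_kb, total_kb - min_l1_kb)
    return Partition(shared_kb=shared_kb, l1_kb=total_kb - shared_kb)


# Three hypothetical workloads on a 128 KB unified memory, 4 CTAs per SM.
moderate = partition_unified_memory(128, shared_per_cta_kb=12, resident_ctas=4)
shared_heavy = partition_unified_memory(128, shared_per_cta_kb=40, resident_ctas=4)
cache_only = partition_unified_memory(128, shared_per_cta_kb=0, resident_ctas=4)

for name, p in [("moderate", moderate),
                ("shared-heavy", shared_heavy),
                ("cache-only", cache_only)]:
    print(f"{name:12s} shared={p.shared_kb:3d} KB  L1={p.l1_kb:3d} KB")
```

Even a toy like this surfaces the remaining questions: the shared-heavy case hits the cap, so the policy itself decides how many CTAs fit, and the repartitioning logic is extra hardware whose area and power cost would still have to be accounted for.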