Conclusion
As organizations shift from manual testing to automation, testing solutions are becoming more agile and scalable than ever. However, running multiple tests in parallel remains one of the most problematic areas in automation testing. Most businesses still follow sequential testing methods for quality assurance, which costs more time, effort, and resources.
Parallel testing is a convenient and faster way to deliver applications without compromising their quality. It also saves the money you would otherwise spend on developing and maintaining a full-scale test environment.
What is Parallel Decoding in Computer Architecture?
A scalar processor has to decode only a single instruction in each cycle as shown in the figure. In addition, a pipelined processor has to check for dependencies to decide whether this instruction can be issued or not. In comparison, a superscalar processor has to perform a much more complex task.
As shown in the figure, it has to decode multiple instructions, say four, in a single clock cycle. It also needs to check for dependencies from two perspectives. First, whether the instructions to be issued depend on the instructions currently in execution. Second, whether there are dependencies among the instructions that are candidates for the next issue.
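The two checks can be sketched in a few lines of Python. This is only an illustrative model, not how any particular processor implements issue logic: the `Instr` record, the register names, and the in-order issue policy (stop at the first blocked instruction) are all assumptions made for the example, and only register dependencies (read-after-write and write-after-write) are considered.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical decoded instruction: one destination, several sources."""
    name: str
    dest: str
    srcs: Tuple[str, ...]


def depends_on(later: Instr, earlier: Instr) -> bool:
    """True if `later` reads (RAW) or overwrites (WAW) the register
    that `earlier` writes."""
    return earlier.dest in later.srcs or earlier.dest == later.dest


def issuable(window: List[Instr], in_flight: List[Instr]) -> List[Instr]:
    """Return the leading instructions of `window` that can issue together.

    Assumes in-order issue: the first blocked instruction stops the scan.
    """
    issued: List[Instr] = []
    for cand in window:
        # Check 1: dependency on an instruction currently in execution.
        blocked_by_executing = any(depends_on(cand, e) for e in in_flight)
        # Check 2: dependency on an earlier candidate in the same window.
        blocked_by_window = any(depends_on(cand, prev) for prev in issued)
        if blocked_by_executing or blocked_by_window:
            break
        issued.append(cand)
    return issued


in_flight = [Instr("load", "r1", ("r9",))]
window = [
    Instr("add", "r2", ("r3", "r4")),
    Instr("sub", "r5", ("r2", "r6")),  # reads r2, written by "add"
    Instr("or", "r7", ("r3", "r4")),
]
print([i.name for i in issuable(window, in_flight)])
```

Here only `add` issues: `sub` needs the result of `add`, and under the assumed in-order policy `or` has to wait behind it. A scalar processor would run only the first check, on a single instruction.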
Since a superscalar processor has more execution units (EUs) than a scalar one, the number of instructions in execution is far higher than in the scalar case. This means that more comparisons have to be performed in the course of dependency checks. The decode-issue path of a superscalar processor is therefore much more critical for achieving a high clock frequency than that of a scalar processor.
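To get a feel for how quickly the number of comparisons grows, the following back-of-the-envelope count may help. It is a simplification under stated assumptions: every instruction has two source registers and one destination, only source-against-destination (read-after-write) comparisons are counted, and the numbers of instructions in execution (4 and 8) are made up for illustration.

```python
def count_comparisons(issue_width: int, in_flight: int, srcs_per_instr: int = 2) -> int:
    """Count register comparisons needed for one issue cycle.

    Only source-vs-destination (RAW) comparisons are counted; this is a
    simplified model, not a description of any real issue logic.
    """
    # Each source of each candidate against each destination in execution.
    against_executing = issue_width * srcs_per_instr * in_flight
    # Each source of a candidate against the destination of every earlier
    # candidate in the same issue window: width * (width - 1) / 2 pairs.
    among_candidates = srcs_per_instr * issue_width * (issue_width - 1) // 2
    return against_executing + among_candidates


scalar = count_comparisons(issue_width=1, in_flight=4)
superscalar = count_comparisons(issue_width=4, in_flight=8)
print(scalar, superscalar)  # prints "8 76"
```

Under these assumptions, going from a single-issue processor with four instructions in execution to a four-issue processor with eight raises the count from 8 to 76, and all of these comparisons have to complete within the decode-issue path.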
Superscalar processors therefore tend to use two, three, or even more pipeline cycles for decoding and issuing instructions. For instance, the PowerPC 601, PowerPC 604, and UltraSparc need 2 cycles, while the α 21064 requires 3 cycles and the PentiumPro needs as many as 4.5 cycles. One way to cope with this problem is predecoding.
Predecoding shifts a part of the decoding task up into the loading phase of the on-chip instruction cache (I-cache) as shown in the figure.
Here, while the I-cache is being loaded, a dedicated unit, called the predecode unit, performs partial decoding and appends several decode bits to each instruction. In the case of RISC processors, for example, 4-7 bits are usually attached, which indicate:
the instruction class
the type of resources required for execution, and
in some processors, even the fact that the branch target address has already been calculated during predecoding, as in the Hal PM1 or the UltraSparc.
The number of predecode bits used is shown in the table.
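A toy model can make the idea of predecoding more concrete. Everything in the sketch below is hypothetical: the 32-bit instruction format with a 6-bit opcode in the top bits, the opcode values, the four instruction classes, and the 5-bit tag layout are invented for illustration and do not correspond to any of the processors named above.

```python
from enum import IntEnum
from typing import List, Tuple


class InstrClass(IntEnum):
    ALU = 0
    LOAD_STORE = 1
    BRANCH = 2
    FLOAT = 3


# Hypothetical opcode-to-class mapping (opcode = top 6 bits of the word).
OPCODE_CLASS = {
    0x00: InstrClass.ALU,
    0x10: InstrClass.BRANCH,
    0x20: InstrClass.LOAD_STORE,
    0x30: InstrClass.FLOAT,
}


def predecode(word: int) -> int:
    """Return a 5-bit predecode tag for a 32-bit instruction word.

    Assumed tag layout: bit 4 = branch target already calculated,
    bits 3-2 = required execution unit, bits 1-0 = instruction class.
    """
    opcode = (word >> 26) & 0x3F
    cls = OPCODE_CLASS.get(opcode, InstrClass.ALU)
    unit = int(cls)  # toy model: exactly one unit type per class
    target_ready = 1 if cls == InstrClass.BRANCH else 0
    return (target_ready << 4) | (unit << 2) | int(cls)


def fill_icache_line(words: List[int]) -> List[Tuple[int, int]]:
    """Model an I-cache fill: store each word together with its tag."""
    return [(w, predecode(w)) for w in words]


def instr_class(tag: int) -> InstrClass:
    """What the decoder reads later instead of re-decoding the opcode."""
    return InstrClass(tag & 0b11)


line = fill_icache_line([0x00000000, 0x40000000, 0x80000000])
print([format(tag, "05b") for _, tag in line])
```

The work done in `predecode` happens once, when the line is loaded into the I-cache. On every later fetch the decoder only has to look at the attached bits, which is what shortens the decode-issue path.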