Wednesday, October 30, 2013

Advanced Computer Architecture- IIIrd Internal QB



Unit-3

1) What are the basic compiler techniques for exposing ILP? Explain briefly.

2) List the steps to unroll the code and schedule.

3) What are the techniques used to reduce branch costs? Explain both the static and dynamic branch prediction techniques used for this purpose.

4) Explain the dynamic branch prediction state diagram.
4) What is the drawback of 1-bit dynamic branch prediction method? Clearly state, how it is overcome in 2-bit prediction. Give the state transition diagram of 2-bit predictor.
4) What is dynamic prediction? Draw the state transition diagram for the 2-bit prediction scheme.
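As a study aid for the questions above, here is a minimal sketch of a 2-bit saturating-counter predictor. The class and the loop example are my own illustration, not part of the question bank; the point is that a strongly-biased state must mispredict twice before the prediction flips, which is exactly what the 1-bit scheme lacks.

```python
# Hypothetical sketch of a 2-bit saturating-counter branch predictor.
# States: 0 = strongly not taken, 1 = weakly not taken,
#         2 = weakly taken,      3 = strongly taken.

class TwoBitPredictor:
    def __init__(self, state=0):
        self.state = state

    def predict(self):
        return self.state >= 2          # True means "predict taken"

    def update(self, taken):
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)

# A loop branch that is taken 9 times and then falls through once:
p = TwoBitPredictor(state=3)
mispredicts = 0
for taken in [True] * 9 + [False]:
    if p.predict() != taken:
        mispredicts += 1
    p.update(taken)
print(mispredicts)   # only the single loop-exit branch is mispredicted
```

Note that after the loop exit the counter sits at "weakly taken", so the next entry into the loop is still predicted correctly — a 1-bit predictor would mispredict both the exit and the re-entry.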

5) What are correlating predictors? Explain with examples.
  
Unit-6

1) How are virtual memory and virtual machines protected? Explain.

2) Explain the six basic cache optimization techniques.
2) Explain the six basic optimizations.

2) Explain the types of basic cache optimization.
2) Explain in brief, the types of basic cache optimization.

2) Briefly explain four basic cache optimization methods.

3) What are the techniques for fast address translation? Explain.

4) With a neat diagram, explain the hypothetical memory hierarchy.

5) Explain the block replacement strategies used when a cache miss occurs, with examples.

6) With a diagram, explain the organization of the data cache in the Opteron microprocessor.

7) Assume we have a computer where the clocks per instruction (CPI) is 1.0 when all memory accesses hit in the cache. The only data accesses are loads and stores, and these total 50% of the instructions. If the miss penalty is 25 clock cycles and the miss rate is 2%, how much faster would the computer be if all instructions were cache hits?
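For reference, a worked sketch of the arithmetic in the question above, under the standard assumption that every instruction makes one memory access for its fetch plus 0.5 data accesses on average:

```python
# Worked sketch (assumption: 1 instruction-fetch access + 0.5 data
# accesses per instruction; these numbers are not stated in the question).
cpi_ideal = 1.0
accesses_per_instr = 1.0 + 0.5
miss_rate = 0.02
miss_penalty = 25                 # clock cycles

stall_cycles = accesses_per_instr * miss_rate * miss_penalty   # 0.75
cpi_real = cpi_ideal + stall_cycles                            # 1.75

speedup = cpi_real / cpi_ideal
print(f"All-hits computer is {speedup:.2f}x faster")           # 1.75x
```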

8) Assume that the hit time of a two-way set-associative first-level data cache is 1.1 times faster than a four-way set-associative cache of the same size. The miss rate falls from 0.049 to 0.044 for an 8 KB data cache. Assume a hit is 1 clock cycle and that the cache is the critical path for the clock. Assume the miss penalty is 10 clock cycles to the L2 cache for the two-way set-associative cache, and that the L2 cache does not miss. Which has the faster average memory access time?
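A worked sketch of the question above, under one common reading: the two-way hit takes 1 clock cycle, the four-way hit is stretched to 1.1 cycles by the slower clock, and the 10-cycle miss penalty is used for both. These interpretive choices are mine; check them against the lecture treatment.

```python
# AMAT = hit time + miss rate * miss penalty (all in clock cycles).
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

amat_2way = amat(1.0, 0.049, 10)   # 1.49 cycles
amat_4way = amat(1.1, 0.044, 10)   # 1.54 cycles
print(amat_2way, amat_4way)        # the two-way cache wins on AMAT
```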

9) Suppose you measure a new DDR3 DIMM to transfer at 16000 MB/sec. What do you think its name will be? What is the clock rate of that DIMM? What is your guess of the name of DRAMs used in that DIMM?
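The naming arithmetic behind the question above, assuming a standard 64-bit (8-byte) wide DIMM: 16000 MB/sec means 2000 mega-transfers/sec, and since DDR transfers on both clock edges the clock runs at half that rate. A sketch:

```python
# Worked sketch (assumption: standard 64-bit-wide DDR3 DIMM).
bandwidth_mb_s = 16000
bytes_per_transfer = 8                                    # 64-bit data bus
transfers_per_sec = bandwidth_mb_s / bytes_per_transfer   # 2000 MT/s
clock_mhz = transfers_per_sec / 2                         # DDR: 2 transfers/clock

dimm_name = f"PC3-{bandwidth_mb_s}"                       # PC3-16000
dram_name = f"DDR3-{int(transfers_per_sec)}"              # DDR3-2000
print(dimm_name, clock_mhz, dram_name)
```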

10) Given the data below, what is the impact of second-level cache associativity on its miss penalty?
• Hit time L2 for direct mapped = 10 clock cycles
• Two-way set associativity increases hit time by 0.1 clock cycles, to 10.1 clock cycles
• Local miss rate L2 for direct mapped = 25%
• Local miss rate L2 for two-way set associative = 20%
• Miss penalty L2 = 200 clock cycles
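The data above plug into one formula: the miss penalty seen by L1 is the L2 hit time plus the L2 local miss rate times the L2 miss penalty. A worked sketch:

```python
# L1 miss penalty = L2 hit time + L2 local miss rate * L2 miss penalty.
def l1_miss_penalty(l2_hit, l2_local_miss_rate, l2_penalty):
    return l2_hit + l2_local_miss_rate * l2_penalty

direct  = l1_miss_penalty(10.0, 0.25, 200)   # 60.0 cycles
two_way = l1_miss_penalty(10.1, 0.20, 200)   # 50.1 cycles
print(direct, two_way)   # two-way associativity cuts the penalty ~10 cycles
```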

Monday, October 7, 2013

Question Bank for Second Internal- ACA



Question Bank for Second Internals-Advanced Computer Architecture 
On suggestions from students, I am striking out two more questions so that there will be fewer questions to read. Please see which ones.
Unit-3
1) Explain Tomasulo’s algorithm, sketching the basic structure of a MIPS floating point unit.
2) Explain how Tomasulo’s algorithm can be extended to support speculation.
3) With a neat diagram, give the basic structure of Tomasulo based MIPS FP unit and explain the various fields of reservation stations.

4) For the following instructions, using dynamic scheduling, show the status of the ROB and reservation stations when only MUL.D is ready to commit and the two L.D instructions have committed.
L.D  F6,32(R2)
L.D  F2,44(R3)
MUL.D  F0,F2,F4
SUB.D F8,F2,F6
DIV.D F10,F0,F6
ADD.D F6,F8,F2
Also show the type of hazards between instructions.
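For the hazard part of the question, a small sketch that mechanically classifies the RAW, WAR, and WAW hazards among the six instructions. The pairwise scan is my own illustration; the register operands are taken from the question.

```python
# Classify data hazards among the six instructions of question 4.
# Each entry: (opcode, destination register, source registers).
instrs = [
    ("L.D",   "F6",  ["R2"]),
    ("L.D",   "F2",  ["R3"]),
    ("MUL.D", "F0",  ["F2", "F4"]),
    ("SUB.D", "F8",  ["F2", "F6"]),
    ("DIV.D", "F10", ["F0", "F6"]),
    ("ADD.D", "F6",  ["F8", "F2"]),
]

hazards = []
for i, (op_i, dst_i, srcs_i) in enumerate(instrs):
    for j in range(i + 1, len(instrs)):
        op_j, dst_j, srcs_j = instrs[j]
        if dst_i in srcs_j:          # later instruction reads an earlier result
            hazards.append(("RAW", dst_i, i, j))
        if dst_j in srcs_i:          # later instruction overwrites an earlier source
            hazards.append(("WAR", dst_j, i, j))
        if dst_i == dst_j:           # two writes to the same register
            hazards.append(("WAW", dst_i, i, j))

for kind, reg, i, j in hazards:
    print(f"{kind} on {reg}: {instrs[i][0]} (#{i+1}) -> {instrs[j][0]} (#{j+1})")
```

Note the WAR and WAW hazards on F6 involving ADD.D: these are name dependences that Tomasulo's algorithm removes by register renaming through the reservation stations.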

Unit-5
1) Explain the directory based coherence for a distributed memory multiprocessor system.
1) Explain the directory based cache coherence for a distributed memory multiprocessor system along with state transition diagram.
1) Explain in detail, the distributed shared memory and directory based coherence.

2) Explain any two hardware primitives to implement synchronization with example.
2) List and explain any three hardware primitives to implement synchronization.

3) Explain the symmetric shared memory architecture, in detail.

4) Explain performance of symmetric shared memory multiprocessors.

5) Explain the taxonomy of parallel architectures (Flynn's classification).

6) Explain basic schemes for enforcing coherence.
6) Explain the basic schemes for enforcing coherence in a shared memory multiprocessor system.
6) What is multiprocessor cache coherence? List two approaches to cache coherence protocols. Give the state diagram for the write-invalidate, write-back cache coherence protocol. Explain the three states of a block.

7) Suppose we have an application running on a 32-processor multiprocessor that takes 200 ns to handle a reference to a remote memory. For this application, assume that all references except those involving communication hit in the local memory hierarchy, which is slightly optimistic. Processors are stalled on a remote request, and the processor clock rate is 2 GHz. If the base CPI (assuming that all references hit in the cache) is 0.5, how much faster is the multiprocessor if there is no communication versus if 0.2% of the instructions involve a remote communication reference?
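A worked sketch of the question above: convert the remote latency to clock cycles, add its stall contribution to the base CPI, and compare.

```python
# Worked sketch of the remote-communication CPI question.
clock_ghz = 2.0
remote_ns = 200.0
remote_cycles = remote_ns * clock_ghz        # 400 cycles per remote reference

base_cpi = 0.5
remote_fraction = 0.002                      # 0.2% of instructions
cpi_with_comm = base_cpi + remote_fraction * remote_cycles   # 1.3

speedup = cpi_with_comm / base_cpi
print(f"No-communication case is {speedup:.1f}x faster")     # 2.6x
```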
8) With a neat diagram, explain the basic structure of centralized shared-memory and distributed-memory multiprocessors.
9) Explain snooping, with respect to cache-coherence protocols.

Unit-6
1) How are virtual memory and virtual machines protected? Explain.

2) Assume that the hit time of a two-way set-associative first-level data cache is 1.1 times faster than a four-way set-associative cache of the same size. The miss rate falls from 0.049 to 0.044 for an 8 KB data cache. Assume a hit is 1 clock cycle and that the cache is the critical path for the clock. Assume the miss penalty is 10 clock cycles to the L2 cache for the two-way set-associative cache, and that the L2 cache does not miss. Which has the faster average memory access time?

3) Suppose you measure a new DDR3 DIMM to transfer at 16000 MB/sec. What do you think its name will be? What is the clock rate of that DIMM? What is your guess of the name of DRAMs used in that DIMM?

4) Explain the six basic cache optimization techniques.
4) Explain the six basic optimizations.
4) Briefly explain four basic cache optimization methods.
4) Explain in brief, the types of basic cache optimization.
4) Explain the types of basic cache optimization.

5) Given the data below, what is the impact of second-level cache associativity on its miss penalty?
• Hit time L2 for direct mapped = 10 clock cycles
• Two-way set associativity increases hit time by 0.1 clock cycles, to 10.1 clock cycles
• Local miss rate L2 for direct mapped = 25%
• Local miss rate L2 for two-way set associative = 20%
• Miss penalty L2 = 200 clock cycles

6) What are the techniques for fast address translation? Explain.

7) With a neat diagram, explain the hypothetical memory hierarchy.

8) Explain the block replacement strategies used when a cache miss occurs, with examples.

9) With a diagram, explain the organization of the data cache in the Opteron microprocessor.

10) Assume we have a computer where the clocks per instruction (CPI) is 1.0 when all memory accesses hit in the cache. The only data accesses are loads and stores, and these total 50% of the instructions. If the miss penalty is 25 clock cycles and the miss rate is 2%, how much faster would the computer be if all instructions were cache hits?