Dec,2012
1)a) List and explain four important technologies which have led to the improvements in computer system.
b) Give a brief explanation about trends in power in integrated circuits and cost.
2) a) Explain the pipeline hazards, in detail.
b) Show how a loop is unrolled so that there are four copies of the loop body, assuming R1-R2 (that is, the size of the array) is initially a multiple of 32, which means that the number of loop iterations is a multiple of 4. Eliminate any obvious redundant computations and do not reuse any of the registers.
3 a) What is dynamic prediction? Draw the state transition diagram for 2-bit prediction scheme?
b) What is the basic compiler technique for exposing ILP?
c) How to overcome the data hazards with dynamic scheduling?
4 a) How to exploit ILP, using multiple issues and dynamic scheduling?
b) What is the basic concept of VLIW approach?
5 a) Explain the symmetric shared memory architecture, in detail.
b) Explain in detail, the distributed shared memory and directory based coherence.
6 a) How to protect virtual memory and virtual machines?
b) Assume that the hit time of a two-way set-associative first-level data cache is 1.1 times faster than a four-way set-associative cache of the same size. The miss rate falls from 0.049 to 0.044 for an 8 KB data cache. Assume a hit is 1 clock cycle and that the cache is the critical path for the clock. Assume the miss penalty is 10 clock cycles to the L2 cache for the two-way set-associative cache, and that the L2 cache does not miss. Which has the faster average memory access time?
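One way to approach the question above is to compute the average memory access time (hit time + miss rate × miss penalty) for both caches in a common unit. The sketch below is not an official solution. It assumes that all times are measured in clock cycles of the two-way cache, that the four-way cache stretches the hit time to 1.1 of those cycles, and that the miss penalty to L2 is the same absolute time for both caches. The variable names are illustrative only.

```python
# Assumption: every time below is in clock cycles of the two-way cache.
hit_two_way = 1.0
hit_four_way = 1.1 * hit_two_way   # four-way hit is 1.1 times slower
miss_penalty = 10.0                # same absolute L2 latency assumed for both
miss_rate_two_way = 0.049
miss_rate_four_way = 0.044

# AMAT = hit time + miss rate * miss penalty
amat_two_way = hit_two_way + miss_rate_two_way * miss_penalty
amat_four_way = hit_four_way + miss_rate_four_way * miss_penalty

print(f"two-way AMAT:  {amat_two_way:.2f} cycles")
print(f"four-way AMAT: {amat_four_way:.2f} cycles")
```

Under these assumptions the two-way cache comes out ahead, because the lower miss rate of the four-way cache does not make up for its longer hit time.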
c) Suppose you measure a new DDR3 DIMM to transfer at 16000 MB/sec. What do you think its name will be? What is the clock rate of that DIMM? What is your guess of the name of DRAMs used in that DIMM?
7 a) Describe eleven advanced optimizations for cache performance.
b) What is memory technology and optimization?
8 a) How to enhance the loop level parallelism?
b) What hardware support is available for exposing parallelism?
______________________0000____________
June,2012
1 a) Define computer architecture. Illustrate the seven dimensions of an ISA.
b) Explain in brief measuring, reporting and summarizing performance of computer system.
c) Assume a disk subsystem with the following components and MTTF:
10 disks, each rated at 1,000,000-hour MTTF.
1 SCSI controller, 500,000-hour MTTF.
1 power supply, 200,000-hour MTTF.
1 fan, 200,000-hour MTTF.
1 SCSI cable, 1,000,000-hour MTTF.
Using the simplifying assumptions that the lifetimes are exponentially distributed and that failures are independent, compute the MTTF of the system as a whole.
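With exponentially distributed lifetimes and independent failures, the failure rate of the system is the sum of the component failure rates, and the system MTTF is the reciprocal of that sum. A minimal sketch of the arithmetic, with illustrative variable names, could look like this:

```python
# (count, MTTF in hours) for each component listed in the question
components = [
    (10, 1_000_000),  # disks
    (1, 500_000),     # SCSI controller
    (1, 200_000),     # power supply
    (1, 200_000),     # fan
    (1, 1_000_000),   # SCSI cable
]

# Failure rates add when failures are independent and exponentially distributed.
system_failure_rate = sum(count / mttf for count, mttf in components)
system_mttf_hours = 1.0 / system_failure_rate

print(f"system failure rate: {system_failure_rate * 1_000_000:.0f} per million hours")
print(f"system MTTF: {system_mttf_hours:.0f} hours")
```

The total works out to 23 failures per million hours, which gives a system MTTF of roughly 43,500 hours, or a little under five years.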
2 a) Explain how pipeline is implemented in MIPS.
b) Explain different techniques in reducing pipeline branch penalties.
c) What are the major hurdles in pipelining? Explain briefly.
d) Consider an unpipelined RISC processor. Assume that it has a 1 ns clock cycle and that it uses 4 cycles for ALU operations and branches and 5 cycles for memory operations. Assume that the relative frequencies of these operations are 40%, 20% and 40% respectively. Suppose that due to clock skew and setup, pipelining the processor adds 0.2 ns of overhead to the clock. Ignoring any latency impact, how much speedup in the instruction execution rate will we gain from a pipeline?
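The speedup here is the average unpipelined instruction time divided by the pipelined clock period. The sketch below assumes an ideal pipeline that completes one instruction per (stretched) clock cycle. That assumption is implied by "ignoring any latency impact" but is not stated outright in the question.

```python
clock_ns = 1.0
overhead_ns = 0.2

# (relative frequency, cycles) for ALU operations, branches, memory operations
instruction_mix = [(0.40, 4), (0.20, 4), (0.40, 5)]

# Average time per instruction without pipelining
avg_cycles = sum(freq * cycles for freq, cycles in instruction_mix)
unpipelined_ns = clock_ns * avg_cycles

# Assumed ideal pipeline: one instruction per clock, clock stretched by overhead
pipelined_ns = clock_ns + overhead_ns

speedup = unpipelined_ns / pipelined_ns
print(f"unpipelined: {unpipelined_ns:.1f} ns, pipelined: {pipelined_ns:.1f} ns")
print(f"speedup: {speedup:.2f}")
```

Under this assumption the average unpipelined instruction takes 4.4 ns, so the speedup is about 3.7.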
3 a) What are the basic compiler techniques for exposing ILP? Explain briefly.
b) Explain Tomasulo’s algorithm, sketching the basic structure of a MIPS floating point unit.
c) Explain true data dependence, name dependence and control dependence with an example code fragment.
4 a) Explain exploiting ILP using dynamic scheduling, multiple issue and speculation.
b) Explain Pentium 4 pipeline supporting multiple issue with speculation.
c) Suppose we have a VLIW that could issue two memory references, two FP operations and one integer operation or branch in every clock cycle, show an unrolled version of the loop x(i)=x(i)+s, for such a processor. Unroll as many times as necessary to eliminate any stalls. Ignore delayed branches.
MIPS Code
Loop: L.D F0,0(R1);
ADD.D F4,F0,F2;
S.D F4,0(R1);
DADDUI R1,R1,#-8;
BNE R1,R2,Loop
5 a) Explain basic schemes for enforcing coherence.
b) Explain performance of symmetric shared memory multiprocessors.
c) Suppose we have an application running on a 32-processor multiprocessor, which has a 200 ns time to handle references to a remote memory. For this application, assume that all the references except those involving communication hit in the local memory hierarchy, which is slightly optimistic. Processors are stalled on a remote request, and the processor clock rate is 2 GHz. If the base CPI (assuming that all references hit in the cache) is 0.5, how much faster is the multiprocessor if there is no communication versus if 0.2% of the instructions involve a remote communication reference?
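A sketch of the calculation for the question above: convert the remote latency into clock cycles, add the resulting stall cycles to the base CPI, and compare the two CPIs. The variable names are illustrative.

```python
clock_rate_ghz = 2.0
remote_latency_ns = 200.0
base_cpi = 0.5
remote_fraction = 0.002  # 0.2% of instructions make a remote reference

cycle_time_ns = 1.0 / clock_rate_ghz               # 0.5 ns per cycle at 2 GHz
remote_cost_cycles = remote_latency_ns / cycle_time_ns

# Effective CPI = base CPI + remote request rate * remote request cost
effective_cpi = base_cpi + remote_fraction * remote_cost_cycles
slowdown = effective_cpi / base_cpi

print(f"remote request cost: {remote_cost_cycles:.0f} cycles")
print(f"effective CPI: {effective_cpi:.1f}")
print(f"all-local version is {slowdown:.1f} times faster")
```

The remote reference costs 400 cycles, so even a 0.2% communication rate raises the CPI from 0.5 to 1.3, making the communication-free case 2.6 times faster.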
6 a) Explain the six basic cache optimization techniques.
b) Given the data below, what is the impact of second level cache associativity on its miss penalty?
· Hit time L2 for direct mapped=10 clock cycles
· Two way set associativity increases hit time by 0.1 clock cycles to 10.1 clock cycles.
· Local miss rate L2 for direct mapped = 25%
· Local miss rate L2 for two-way set associative= 20%
· Miss penalty L2= 200 clock cycles.
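The quantity asked for is the miss penalty seen by the first-level cache, which equals the L2 hit time plus the L2 local miss rate times the L2 miss penalty. A minimal sketch using the figures listed above (the helper name `l1_miss_penalty` is hypothetical):

```python
def l1_miss_penalty(l2_hit_time, l2_local_miss_rate, l2_miss_penalty):
    """Miss penalty of L1 = hit time of L2 + local miss rate of L2 * miss penalty of L2."""
    return l2_hit_time + l2_local_miss_rate * l2_miss_penalty

direct_mapped = l1_miss_penalty(10.0, 0.25, 200.0)
two_way = l1_miss_penalty(10.1, 0.20, 200.0)

print(f"direct-mapped L2: {direct_mapped:.1f} clock cycles")
print(f"two-way L2:       {two_way:.1f} clock cycles")
```

With these numbers the direct-mapped L2 gives a first-level miss penalty of 60.0 cycles and the two-way L2 gives 50.1 cycles, so the higher associativity wins despite its slightly longer hit time. If L2 accesses must be synchronized to whole clock cycles, the 10.1-cycle hit time would have to be rounded, which this sketch does not model.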
c) What are the techniques for fast address translation? Explain.
7 a) Explain any 3 advanced cache optimization techniques.
b) Explain memory technology and optimizations.
c) Assume that the hit time of a two-way set associative first level data cache is 1.1 times faster than a four-way set associative cache of the same size. The miss rate falls from 0.049 to 0.044 for an 8 KB data cache. Assume a hit is 1 clock cycle and that the cache is the critical path for the clock. Assume that the miss penalty is 10 clock cycles to the L2 cache for the two-way set associative cache, and that the L2 cache does not miss. Which has the faster average memory access time?
8 a) Explain detecting and enhancing loop level parallelism for VLIW.
b) Explain Intel-IA 64 architecture with a neat diagram.
c) Explain hardware support for exposing parallelism for VLIW and EPIC.
------------0000-----------------
Dec,2011
1 a) Define the computer architecture. Explain the response time, throughput, elapsed time and processor clock.
b) Briefly explain Amdahl's law.
c) Two code sequences for a particular machine are considered by a compiler designer.
Instruction class    CPI for this instruction class
A                    1
B                    2
C                    3
The compiler designer considers 2 code sequences that require the following instruction counts for a particular high-level language statement:
Code sequence    Instruction counts for instruction class
                 A     B     C
1                20    10    20
2                40    10    10
i) Which code sequence executes the most instructions?
ii) What is the CPI for each sequence?
iii) Which will be faster?
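The three parts above follow from the instruction counts and per-class CPIs: total cycles are the sum of count × CPI over the classes, and the sequence CPI is total cycles divided by total instructions. A short sketch, with illustrative names and assuming both sequences run at the same clock rate:

```python
cpi_per_class = {"A": 1, "B": 2, "C": 3}

sequences = {
    1: {"A": 20, "B": 10, "C": 20},
    2: {"A": 40, "B": 10, "C": 10},
}

results = {}
for name, counts in sequences.items():
    instructions = sum(counts.values())
    cycles = sum(counts[cls] * cpi_per_class[cls] for cls in counts)
    results[name] = (instructions, cycles, cycles / instructions)
    print(f"sequence {name}: {instructions} instructions, "
          f"{cycles} cycles, CPI = {cycles / instructions:.1f}")
```

Sequence 2 executes more instructions (60 versus 50) but needs fewer clock cycles (90 versus 100), so it is the faster one. Its CPI is 1.5 against 2.0 for sequence 1.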
2 a) What are the major hurdles in pipelining? Illustrate the data hazard, briefly.
b) With a neat block diagram, explain how an instruction can be executed in 4 or 5 clock cycles in MIPS data path, without the pipeline register.
3 a) List the steps to unroll the code and schedule.
b) Explain how Tomasulo’s algorithm can be extended to support speculation.
c) Explain the dynamic branch prediction state diagram.
4 a) Explain the basic VLIW approach. List its drawbacks.
b) With a neat diagram, explain the steps involved in handling an instruction, with a branch target buffer. Also evaluate how well it works.
5 a) Explain the different taxonomy of parallel architecture.
b) With a neat diagram, explain the basic structure of a centralized shared – memory and distributed – memory multiprocessor.
c) Explain the snooping, with respect to cache – coherence protocols.
6 a) Explain the six basic optimizations.
b) With a neat diagram, explain the hypothetical memory hierarchy.
7 a) Explain the DRAM technology. How do you improve memory performance inside a DRAM chip?
b) Explain the compiler optimizations to reduce miss rate.
8 a) Find all the true dependences, output dependences and anti-dependences and eliminate the output and antidependences by renaming, in the code given below:
for(i=1;i<=100;i=i+1) {
y[i]=x[i]/c; /* s1 */
x[i]=x[i]+c; /* s2 */
z[i]=y[i]+c; /* s3 */
y[i]=c-y[i]; /* s4 */
}
b) Write short notes on:
i) The Itanium 2 processor
ii) IA-64 register model.
--------00000--------
June-July 2011
1 a) Explain with a learning curve, how the cost of processor varies with time along with factors influencing the cost.
b) Find the number of dies per 200cm wafer of circular shape that is used to cut die that is 1.5 cm side and compare the number of dies produced on the same wafer if the die is 1.25 cm.
c) Define Amdahl's law. Derive an expression for CPU time as a function of instruction count, clocks per instruction and clock cycle time.
2 a) What are the major hazards in a pipeline? Explain data hazard and methods to minimize data hazard with example.
b) Consider the following calculations: x= y + z ; a= b * c. Assume the calculations are done using registers. Show, using 5 stage pipeline, how many clock pulses are required for direct operations. By recording with stalls show how many clock pulses are required and saving in the number of clock pulses to solve data hazard.
3 a) What are data dependencies? Explain name dependences with example between two instructions.
b) What are correlating predictors? Explain with examples.
c) For the following instructions, using dynamic scheduling show the status of R.O.B, Reservation station when only MUL.D is ready to commit and two L.D committed.
L.D F6,32(R2)
L.D F2,44(R3)
MUL.D F0,F2,F4
SUB.D F8,F2,F6
DIV.D F10,F0,F6
ADD.D F6,F8,F2
Also show the type of hazards between instructions.
4 a) Explain the basic VLIW approach for exploiting ILP, using multiple issues.
b) What are the key issues in implementing advanced speculation techniques? Explain in detail.
c) Write a note on value predictors.
5 a) Explain the directory based cache coherence for a distributed memory multiprocessor system along with state transition diagram.
b) Explain any two hardware primitives to implement synchronization with example.
6 a) Explain block replacement strategies to replace a block, with example when a cache miss occurs.
b) Explain the types of basic cache optimization.
c) With a diagram, explain the organization of the data cache in the Opteron microprocessor.
7 a) Explain the following advanced optimization of cache:
i) Compiler optimizations to reduce miss rate.
ii) Merging write buffer to reduce miss penalty.
iii) Non blocking caches to increase cache bandwidth.
b) Explain in detail the architecture support for protecting processes from each other via virtual machines.
c) Explain internal organization of 64 MB DRAM.
8 a) Explain in detail the hardware support for preserving exception behaviour during speculation.
b) Explain the architecture of the Intel IA-64 processor and also the prediction and speculation support provided.
---------0000---------
Dec 2010
1 a) List and explain four important technologies, which have led to the improvements in computer system.
b) The given data presents the power consumption of several computer system components:
Component     Product               Performance    Power
Processor     Sun Niagara 8-core    1.2 GHz        72-79 W
DRAM          Kingston 1GB          184-pin        3.7 W
Hard drive    Diamond Max           7200 rpm       7.9 W read, 4.0 W idle
i) Assuming the maximum load for each component, a power supply efficiency of 70%, what wattage must the server’s power supply deliver to a system with a Sun Niagara 8-core chip, 2GB 184-pin Kingston DRAM and 7200 rpm hard drives?
ii) How much power will the 7200 rpm disk drive consume, if it is idle roughly 40% of the time?
iii) Assume that for the same set of requests, a 5400 rpm disk will require twice as much time to read data as a 10800 rpm disk. What percentage of time would the 5400 rpm disk drive be idle to perform the same transaction as in part (ii)?
c) We will run two applications on dual Pentium processor, but the resource requirements are not the same. The first application needs 80% of the resources, and the other only 20% of the resources.
i) Given that 40% of the first application is parallelizable, how much speed up will we achieve with that application, if run in isolation?
ii) Given that 99% of the second application is parallelizable, how much speed up will this application observe, if run in isolation?
iii) Given that 40% of the first application is parallelizable, how much overall system speedup would you observe, if we parallelized it?
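All three parts of the question above can be worked with Amdahl's law, speedup = 1 / ((1 - f) + f / s). The sketch below assumes that the parallelizable fraction speeds up by a factor of 2 on the dual processor. For part iii it further assumes that the first application accounts for 80% of the total work and that only that application is sped up. The question does not spell out either interpretation, so treat them as assumptions.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction of the work is sped up by a given factor."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

processors = 2  # dual-processor system

# i) first application alone, 40% parallelizable
speedup_app1 = amdahl_speedup(0.40, processors)

# ii) second application alone, 99% parallelizable
speedup_app2 = amdahl_speedup(0.99, processors)

# iii) whole system when only the first application (80% of the work) is parallelized
overall_app1 = amdahl_speedup(0.80, speedup_app1)

print(f"i)   {speedup_app1:.2f}")
print(f"ii)  {speedup_app2:.2f}")
print(f"iii) {overall_app1:.2f}")
```

Under these assumptions the results are about 1.25, 1.98 and 1.19 respectively.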
2 a) List pipeline hazards. Explain any one in detail.
b) List and explain five different ways of classifying exception in a computer system.
c) An unpipelined machine has 10ns clock cycle and it uses four cycles for ALU operations and branches, five cycles for memory operations. Assume that relative frequencies of these operations are 40%, 20% and 40% respectively. Suppose due to clock skew and setup, pipelining the machine adds 1ns overhead to the clock. Find the speed up from pipelining.
3 a) Show how the below loop would look on MIPS 5-stage pipeline, under the following situations. Find the number of cycles per iteration, for each case. Assume the latencies for integer and floating point operations, as given in the prescribed text book.
Loop: L.D F0,0(R1)
ADD.D F4,F0,F2
S.D F4,0(R1)
DADDUI R1,R1,#-8
BNE R1,R2, Loop
i) Without scheduling and without loop unrolling.
ii) With scheduling and without loop unrolling.
iii) With loop unrolling four times and without scheduling.
iv) With loop unrolling four times and with scheduling.
b) What is the drawback of 1-bit dynamic branch prediction method? Clearly state, how it is overcome in 2-bit prediction. Give the state transition diagram of 2-bit predictor.
4 a) Explain the salient features of VLIW processor.
b) Explain branch-target buffer.
c) Write a short note on value predictors.
5 a) What is multiprocessor cache coherence? List two approaches to cache coherence protocol. Give the state diagram for write-invalidate write-back cache coherence protocol. Explain the three states of a block.
b) List and explain any three hardware primitives to implement synchronization.
6 a) Assume we have a computer where CPI is 1.0 when all memory accesses hit in the cache. The only data accesses are loads and stores, and these total 50% of the instructions. If the miss penalty is 25 cycles and miss rate is 2%, how much faster would the computer be, if all instructions were cache hits?
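A sketch of the arithmetic for the question above. It assumes that every instruction makes one memory access for the instruction fetch, plus 0.5 data accesses on average, and that the 2% miss rate applies to both kinds of access. The question does not state this explicitly, so it is labeled here as an assumption.

```python
base_cpi = 1.0
data_accesses_per_instruction = 0.5
miss_rate = 0.02
miss_penalty_cycles = 25

# Assumption: one instruction fetch per instruction, plus the data accesses.
accesses_per_instruction = 1.0 + data_accesses_per_instruction

stall_cycles_per_instruction = accesses_per_instruction * miss_rate * miss_penalty_cycles
real_cpi = base_cpi + stall_cycles_per_instruction

# Instruction count and clock rate are unchanged, so the speedup is the CPI ratio.
speedup_all_hits = real_cpi / base_cpi

print(f"CPI with misses: {real_cpi:.2f}")
print(f"all-hit machine is {speedup_all_hits:.2f} times faster")
```

With these figures the memory stalls add 0.75 cycles per instruction, so the machine with no misses would be 1.75 times faster.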
b) Briefly explain four basic cache optimization methods.
7 a) List and explain three C’s model that sorts all cache misses.
b) Explain the optimization methods mentioned below:
i) Trace cache to reduce hit time
ii) Non-blocking cache to increase cache bandwidth
iii) Multi banked cache to increase cache bandwidth.
c) Briefly explain how memory protection is enforced via virtual memory.
8 a) Consider the loop below:
for (i=1; i<=100;i=i+1) {
A[i]=A[i]+B[i]; /* S1 */
B[i+1]=C[i]+D[i]; /* S2 */
}
What are the dependences between S1 and S2? Is this loop parallel? If not, show how to make it parallel.
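In the loop above, S1 in iteration i+1 reads the B[i+1] that S2 wrote in iteration i, so there is a loop-carried dependence from S2 to S1. There is no dependence from S1 to S2, so the statements can be reordered and the loop shifted by one iteration. The Python sketch below is only a way to check that idea. It models the arrays as lists indexed from 1 (index 0 unused) with arbitrary integer test data, and compares the original loop with one possible transformed loop in which each iteration depends only on itself.

```python
def original_loop(a, b, c, d):
    a, b = a[:], b[:]
    for i in range(1, 101):
        a[i] = a[i] + b[i]        # S1: reads B[i] written by the previous iteration
        b[i + 1] = c[i] + d[i]    # S2
    return a, b

def transformed_loop(a, b, c, d):
    a, b = a[:], b[:]
    a[1] = a[1] + b[1]                    # first S1, peeled out of the loop
    for i in range(1, 100):
        b[i + 1] = c[i] + d[i]            # S2 of iteration i
        a[i + 1] = a[i + 1] + b[i + 1]    # S1 of iteration i+1, same-iteration use only
    b[101] = c[100] + d[100]              # last S2, peeled out of the loop
    return a, b

# Arbitrary integer test data; indices 0..101 so that B[101] exists.
n = 102
a0 = [3 * k for k in range(n)]
b0 = [k + 7 for k in range(n)]
c0 = [2 * k + 1 for k in range(n)]
d0 = [k * k for k in range(n)]

same = original_loop(a0, b0, c0, d0) == transformed_loop(a0, b0, c0, d0)
print("results match:", same)
```

In the transformed version the dependence between the two statements stays inside a single iteration, so the iterations can run in parallel as long as the two statements within each iteration stay in order.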
b) Explain Intel IA-64 architecture.
-----------00000-------------
June 2010
1 a) Define computer architecture. Illustrate the seven dimensions of an ISA.
b) What is dependability? Explain two main measures of dependability.
c) Given the following measurements:
Frequency of FP operations = 25%
Average CPI of FP operations = 4.0
Average CPI of other instructions = 1.33
Frequency of FPSQR = 2%
CPI of FPSQR = 20
Assume that the two design alternatives are to decrease the CPI of FPSQR to 2 or to decrease the average CPI of all FP operations to 2.5. Compare the two design alternatives using the processor performance equations.
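Since the instruction count and clock rate stay the same for both alternatives, comparing them reduces to comparing overall CPI. A sketch of that comparison, assuming the "other instructions" make up the remaining 75% of the mix and that FPSQR is counted within the 25% of FP operations:

```python
freq_fp = 0.25
cpi_fp = 4.0
freq_other = 1.0 - freq_fp   # assumption: everything that is not FP
cpi_other = 1.33
freq_fpsqr = 0.02
cpi_fpsqr = 20.0

cpi_original = freq_fp * cpi_fp + freq_other * cpi_other

# Alternative 1: reduce the CPI of FPSQR from 20 to 2
cpi_new_fpsqr = cpi_original - freq_fpsqr * (cpi_fpsqr - 2.0)

# Alternative 2: reduce the average CPI of all FP operations to 2.5
cpi_new_fp = freq_fp * 2.5 + freq_other * cpi_other

print(f"original CPI:        {cpi_original:.4f}")
print(f"CPI with new FPSQR:  {cpi_new_fpsqr:.4f}")
print(f"CPI with new FP:     {cpi_new_fp:.4f}")
print(f"speedup, new FP:     {cpi_original / cpi_new_fp:.2f}")
```

With these inputs, improving all FP operations yields the slightly lower CPI (about 1.62 versus 1.64), for a speedup of roughly 1.23 over the original design.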
2 a) With a neat diagram, explain the classic five-stage pipeline for a RISC processor.
b) What are the major hurdles of pipelining? Illustrate the branch hazards in detail.
3 a) What are the techniques used to reduce branch costs? Explain both static and dynamic branch prediction used for same.
b) With a neat diagram, give the basic structure of Tomasulo based MIPS FP unit and explain the various field of reservation stations.
4 a) Explain the basic VLIW approach for exploiting ILP, using multiple issues.
b) What are the key issues in implementing advanced speculation techniques? Explain them in detail.
5 a) Explain the basic schemes for enforcing coherence in a shared memory multiprocessor system.
b) Explain the directory based coherence for a distributed memory multiprocessor system.
6 a) Assume we have a computer where the clocks per instruction (CPI) is 1.0 when all memory accesses hit in the cache. The only data accesses are loads and stores and these total 50% of the instructions. If the miss penalty is 25 clock cycles and the miss rate is 2%, how much faster would the computer be if all instructions were cache hits?
b) Explain in brief, the types of basic cache optimization.
7 a) Which are the major categories of advanced optimizations of cache performance? Explain any one in detail.
b) Explain in detail, the architecture support for protecting processes from each other via virtual memory.
8 a) Explain in detail, architecture support for preserving exception behavior during speculation.
b) Explain the prediction and speculation support provided in IA64.
___________________________________________________________________________________
ACA-Unitwise Question Bank(Download Doc)