Wednesday, October 30, 2013

Advanced Computer Architecture- IIIrd Internal QB



Unit-3

1) What are the basic compiler techniques for exposing ILP? Explain briefly.

2) List the steps to unroll the code and schedule.

3) What are the techniques used to reduce branch costs? Explain both static and dynamic branch prediction used for same.

4) Explain the dynamic branch prediction state diagram.
4) What is the drawback of 1-bit dynamic branch prediction method? Clearly state, how it is overcome in 2-bit prediction. Give the state transition diagram of 2-bit predictor.
4) What is dynamic prediction? Draw the state transition diagram for 2-bit prediction scheme?
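For revision, the difference between the 1-bit and 2-bit schemes can be checked with a small simulation. This is only a sketch: it models the predictor as a saturating counter (one common form of the 2-bit scheme), assumes the counter starts in the strongest "taken" state, and uses a made-up loop branch that is taken 9 times and then falls through. The function name and the test pattern are illustrative, not from the textbook.

```python
def count_mispredictions(outcomes, bits):
    """Count mispredictions of an n-bit saturating-counter branch predictor.

    outcomes: iterable of booleans (True = branch taken).
    bits: 1 or 2. The counter starts in the strongest "taken" state (assumed).
    """
    max_state = (1 << bits) - 1
    threshold = (max_state + 1) // 2  # predict taken when state >= threshold
    state = max_state
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= threshold
        if predicted_taken != taken:
            misses += 1
        # Move one step toward the observed outcome, saturating at both ends.
        state = min(state + 1, max_state) if taken else max(state - 1, 0)
    return misses


# Hypothetical loop branch: taken 9 times, then not taken; the loop runs 3 times.
pattern = ([True] * 9 + [False]) * 3
one_bit_misses = count_mispredictions(pattern, bits=1)
two_bit_misses = count_mispredictions(pattern, bits=2)
print(one_bit_misses, two_bit_misses)
```

With this pattern the 1-bit predictor mispredicts twice per loop execution after the first (once on exit and once on re-entry), while the 2-bit predictor mispredicts only on the exit.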

5) What are correlating predictors? Explain with examples.
  
Unit-6

1) How to protect virtual memory and virtual machines?

2) Explain the six basic cache optimization techniques.
2) Explain the six basic optimizations.

2) Explain the types of basic cache optimization.
2) Explain in brief, the types of basic cache optimization.

2) Briefly explain four basic cache optimization methods.

3) What are the techniques for fast address translation? Explain.

4) With a neat diagram, explain the hypothetical memory hierarchy.

5) Explain block replacement strategies to replace a block, with example when a cache miss occurs.

6) With a diagram, explain the organization of the data cache in the Opteron microprocessor.

7) Assume we have a computer where CPI is 1.0 when all memory accesses hit in the cache. The only data accesses are loads and stores, and these total 50% of the instructions. If the miss penalty is 25 cycles and the miss rate is 2%, how much faster would the computer be if all instructions were cache hits?
7) Assume we have a computer where the clocks per instruction (CPI) is 1.0 when all memory accesses hit in the cache. The only data accesses are loads and stores, and these total 50% of the instructions. If the miss penalty is 25 clock cycles and the miss rate is 2%, how much faster would the computer be if all instructions were cache hits?
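A rough way to check the arithmetic for question 7 is sketched below. It assumes every instruction makes one instruction fetch plus 0.5 data accesses on average, and that the 2% miss rate applies to all of those accesses. The function and variable names are illustrative.

```python
def cpi_with_misses(base_cpi, data_refs_per_instr, miss_rate, miss_penalty):
    """CPI including memory stall cycles.

    Assumes one instruction fetch per instruction plus the given number of
    data references, all sharing the same miss rate and miss penalty.
    """
    accesses_per_instr = 1.0 + data_refs_per_instr
    stall_cycles_per_instr = accesses_per_instr * miss_rate * miss_penalty
    return base_cpi + stall_cycles_per_instr


ideal_cpi = 1.0
real_cpi = cpi_with_misses(ideal_cpi, 0.5, 0.02, 25)
speedup_if_all_hits = real_cpi / ideal_cpi
print(real_cpi, speedup_if_all_hits)
```

Under these assumptions the stall term is 1.5 x 0.02 x 25 = 0.75 cycles per instruction, so the all-hit machine is about 1.75 times faster.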

8) Assume that the hit time of a two-way set-associative first-level data cache is 1.1 times faster than a four-way set-associative cache of the same size. The miss rate falls from 0.049 to 0.044 for an 8 KB data cache. Assume a hit is 1 clock cycle and that the cache is the critical path for the clock. Assume the miss penalty is 10 clock cycles to the L2 cache for the two-way set-associative cache and that the L2 cache does not miss. Which has the faster average memory access time?
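The comparison in question 8 can be sketched with the average memory access time formula, AMAT = hit time + miss rate x miss penalty. The sketch below measures everything in two-way clock cycles and assumes the miss penalty takes the same elapsed time in both designs (10 two-way cycles). Rounding the penalty to whole cycles of the slower clock shifts the four-way figure slightly but does not change which design is faster.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same unit as hit_time."""
    return hit_time + miss_rate * miss_penalty


# All times are in two-way clock cycles (assumed common unit).
amat_two_way = amat(hit_time=1.0, miss_rate=0.049, miss_penalty=10)
amat_four_way = amat(hit_time=1.1, miss_rate=0.044, miss_penalty=10)
faster = "two-way" if amat_two_way < amat_four_way else "four-way"
print(amat_two_way, amat_four_way, faster)
```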

9) Suppose you measure a new DDR3 DIMM to transfer at 16000 MB/sec. What do you think its name will be? What is the clock rate of that DIMM? What is your guess of the name of DRAMs used in that DIMM?
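The naming arithmetic for question 9 can be sketched as follows. It assumes the textbook convention that a DIMM is named after its peak bandwidth in MB/sec and a DDR DRAM after its transfer rate in millions of transfers per second, and it assumes a standard 8-byte-wide DIMM with two transfers per clock. Variable names are illustrative.

```python
bandwidth_mb_per_s = 16000
dimm_width_bytes = 8      # assumption: 64-bit-wide DIMM
transfers_per_clock = 2   # double data rate: both clock edges carry data

mega_transfers_per_s = bandwidth_mb_per_s // dimm_width_bytes
clock_rate_mhz = mega_transfers_per_s // transfers_per_clock

dimm_name = f"PC{bandwidth_mb_per_s}"
dram_name = f"DDR3-{mega_transfers_per_s}"
print(dimm_name, clock_rate_mhz, dram_name)
```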

10) Given the data below, what is the impact of second-level cache associativity on its miss penalty?
• Hit time L2 for direct mapped = 10 clock cycles
• Two-way set associativity increases hit time by 0.1 clock cycles to 10.1 clock cycles.
• Local miss rate L2 for direct mapped = 25%
• Local miss rate L2 for two-way set associative = 20%
• Miss penalty L2 = 200 clock cycles.
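For question 10, the first-level miss penalty is the L2 hit time plus the L2 local miss rate times the L2 miss penalty. A minimal sketch of that calculation, with illustrative names:

```python
def l1_miss_penalty(l2_hit_time, l2_local_miss_rate, l2_miss_penalty):
    """Average cycles to service an L1 miss from the L2 cache or beyond."""
    return l2_hit_time + l2_local_miss_rate * l2_miss_penalty


direct_mapped = l1_miss_penalty(10.0, 0.25, 200)
two_way = l1_miss_penalty(10.1, 0.20, 200)
print(direct_mapped, two_way)
```

In practice the L2 hit time would be a whole number of clock cycles, so the two-way figure would round to about 50 or 51 cycles. Either way it beats the direct-mapped case.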

Monday, October 7, 2013

Question Bank for Second Internal- ACA



At the students' suggestion, I am removing two more questions so that there are fewer questions to read. I am striking out two more questions; please see which ones.
Unit-3
1) Explain Tomasulo’s algorithm, sketching the basic structure of a MIPS floating point unit.
2) Explain how Tomasulo’s algorithm can be extended to support speculation.
3) With a neat diagram, give the basic structure of Tomasulo based MIPS FP unit and explain the various fields of reservation stations.

4) For the following instructions, using dynamic scheduling, show the status of the reorder buffer (ROB) and the reservation stations when only MUL.D is ready to commit and the two L.D instructions have committed.
L.D     F6,32(R2)
L.D     F2,44(R3)
MUL.D   F0,F2,F4
SUB.D   F8,F2,F6
DIV.D   F10,F0,F6
ADD.D   F6,F8,F2
Also show the types of hazards between the instructions.

Unit-5
1) Explain the directory based coherence for a distributed memory multiprocessor system.
1) Explain the directory based cache coherence for a distributed memory multiprocessor system along with state transition diagram.
1) Explain in detail, the distributed shared memory and directory based coherence.

2) Explain any two hardware primitives to implement synchronization with example.
2) List and explain any three hardware primitives to implement synchronization.

3) Explain the symmetric shared memory architecture, in detail.

4) Explain performance of symmetric shared memory multiprocessors.

5) Explain the taxonomy of parallel architectures (Flynn's classification).

6) Explain basic schemes for enforcing coherence.
6) Explain the basic schemes for enforcing coherence in a shared memory multiprocessor system.
6) What is multiprocessor cache coherence? List two approaches to cache coherence protocols. Give the state diagram for the write-invalidate, write-back cache coherence protocol. Explain the three states of a block.

7) Suppose we have an application running on a 32-processor multiprocessor, which has a 200 ns time to handle references to a remote memory. For this application, assume that all the references except those involving communication hit in the local memory hierarchy, which is slightly optimistic. Processors are stalled on a remote request, and the processor clock rate is 2 GHz. If the base CPI (assuming that all references hit in the cache) is 0.5, how much faster is the multiprocessor if there is no communication versus if 0.2% of the instructions involve a remote communication reference?
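A quick sketch of the arithmetic for question 7: convert the remote access time into clock cycles, add the stall cycles to the base CPI, and compare. Variable names are illustrative.

```python
clock_rate_ghz = 2.0
remote_access_ns = 200.0
base_cpi = 0.5
remote_fraction = 0.002  # 0.2% of instructions make a remote reference

cycle_time_ns = 1.0 / clock_rate_ghz
remote_cost_cycles = remote_access_ns / cycle_time_ns
effective_cpi = base_cpi + remote_fraction * remote_cost_cycles
slowdown = effective_cpi / base_cpi
print(remote_cost_cycles, effective_cpi, slowdown)
```

So the remote reference costs 400 cycles, the effective CPI becomes 1.3, and the machine with no communication is about 2.6 times faster.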
8) With a neat diagram, explain the basic structure of a centralized shared-memory and a distributed-memory multiprocessor.
9) Explain snooping, with respect to cache-coherence protocols.

Unit-6
1) How to protect virtual memory and virtual machines?

2) Assume that the hit time of a two-way set-associative first-level data cache is 1.1 times faster than a four-way set-associative cache of the same size. The miss rate falls from 0.049 to 0.044 for an 8 KB data cache. Assume a hit is 1 clock cycle and that the cache is the critical path for the clock. Assume the miss penalty is 10 clock cycles to the L2 cache for the two-way set-associative cache, and that the L2 cache does not miss. Which has the faster average memory access time?

3) Suppose you measure a new DDR3 DIMM to transfer at 16000 MB/sec. What do you think its name will be? What is the clock rate of that DIMM? What is your guess of the name of DRAMs used in that DIMM?

4) Explain the six basic cache optimization techniques.
4) Explain the six basic optimizations.
4) Briefly explain four basic cache optimization methods.
4) Explain in brief, the types of basic cache optimization.
4) Explain the types of basic cache optimization.

5) Given the data below, what is the impact of second-level cache associativity on its miss penalty?
             Hit time L2 for direct mapped = 10 clock cycles
             Two-way set associativity increases hit time by 0.1 clock cycles to 10.1 clock cycles.
             Local miss rate L2 for direct mapped = 25%
             Local miss rate L2 for two-way set associative = 20%
             Miss penalty L2 = 200 clock cycles.

6) What are the techniques for fast address translation? Explain.

7) With a neat diagram, explain the hypothetical memory hierarchy.

8) Explain block replacement strategies to replace a block, with example when a cache miss occurs.

9) With a diagram, explain the organization of the data cache in the Opteron microprocessor.

10) Assume we have a computer where CPI is 1.0 when all memory accesses hit in the cache. The only data accesses are loads and stores, and these total 50% of the instructions. If the miss penalty is 25 cycles and the miss rate is 2%, how much faster would the computer be if all instructions were cache hits?
10) Assume we have a computer where the clocks per instruction (CPI) is 1.0 when all memory accesses hit in the cache. The only data accesses are loads and stores, and these total 50% of the instructions. If the miss penalty is 25 clock cycles and the miss rate is 2%, how much faster would the computer be if all instructions were cache hits?

Monday, September 2, 2013

Advanced Computer Architecture-Ist Internal




I have tried to remove duplicates as much as possible since my last post. There are now 30 unique questions, of which about 11 are numerical problems and the rest are descriptive. The pattern will be 1a, 1b, 1c ... 4a, 4b.
Unit-I
1) List and explain four important technologies which have led to improvements in computer systems.
2) Give a brief explanation about trends in power in integrated circuits and cost.
3) Define computer architecture. Illustrate the seven dimensions of an ISA.
3) Define the computer architecture. Explain the response time, throughput, elapsed time and processor clock.
4) Explain in brief measuring, reporting and summarizing performance of computer system.
5) Assume a disk subsystem with the following components and MTTF:
10 disks, each rated at 1,000,000-hour MTTF.
1 SCSI controller, 500,000-hour MTTF.
1 power supply, 200,000-hour MTTF.
1 fan, 200,000-hour MTTF.
1 SCSI cable, 1,000,000-hour MTTF.
Using the simplifying assumptions that the lifetimes are exponentially distributed and that failures are independent,  compute the MTTF of the system as a whole.
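Under the stated assumptions (exponential lifetimes, independent failures), the system failure rate is the sum of the component failure rates, and the system MTTF is its reciprocal. A minimal sketch, with an illustrative data layout:

```python
# (count, MTTF in hours) for each component type in question 5.
components = [
    (10, 1_000_000),  # disks
    (1, 500_000),     # SCSI controller
    (1, 200_000),     # power supply
    (1, 200_000),     # fan
    (1, 1_000_000),   # SCSI cable
]

# Failure rates add when failures are independent and exponentially distributed.
system_failure_rate = sum(count / mttf for count, mttf in components)
system_mttf_hours = 1.0 / system_failure_rate
print(system_failure_rate, system_mttf_hours)
```

The rates sum to 23 failures per million hours, giving a system MTTF of roughly 43,500 hours, a little under 5 years.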

6) Briefly explain Amdahl's law.
6) Define Amdahl's law. Derive an expression for CPU time as a function of instruction count, clocks per instruction and clock cycle time.
7) Two code sequences for a particular machine are considered by a compiler designer.
Instruction class    CPI for this instruction class
A                    1
B                    2
C                    3
The compiler designer considers two code sequences that require the following instruction counts for a particular high-level language statement:
Code sequence    Instruction counts for instruction class
                 A     B     C
1                20    10    20
2                40    10    10

i)             Which code sequence executes the most instructions?
ii)            What is the CPI for each sequence?
iii)           Which will be faster?
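The three parts of question 7 follow from the instruction count, the total clock cycles (the sum of count x CPI per class), and their ratio. A short sketch with illustrative names:

```python
CLASS_CPI = {"A": 1, "B": 2, "C": 3}
SEQUENCES = {
    1: {"A": 20, "B": 10, "C": 20},
    2: {"A": 40, "B": 10, "C": 10},
}


def summarize(counts):
    """Return (instruction count, total clock cycles, average CPI)."""
    instructions = sum(counts.values())
    cycles = sum(CLASS_CPI[cls] * n for cls, n in counts.items())
    return instructions, cycles, cycles / instructions


results = {seq: summarize(counts) for seq, counts in SEQUENCES.items()}
for seq, (instructions, cycles, cpi) in results.items():
    print(seq, instructions, cycles, cpi)
```

Sequence 2 executes more instructions (60 versus 50) but needs fewer clock cycles (90 versus 100), so it is the faster one on the same clock.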

8) Explain with a learning curve, how the cost of processor varies with time along with factors influencing the cost.
9) Find the number of dies per 200 mm (20 cm) circular wafer for a die that is 1.5 cm on a side, and compare it with the number of dies produced on the same wafer if the die is 1.25 cm on a side.
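A sketch for question 9 using the usual dies-per-wafer estimate: wafer area divided by die area, minus a correction for partial dies along the edge. It assumes the wafer diameter is meant to be 20 cm (200 mm) and that the dies are square. The function name is illustrative.

```python
import math


def dies_per_wafer(wafer_diameter_cm, die_side_cm):
    """Estimate of whole dies on a circular wafer (square dies assumed)."""
    die_area = die_side_cm ** 2
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    # Edge correction: circumference divided by the die diagonal.
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area)
    return wafer_area / die_area - edge_loss


# Assumption: 20 cm (200 mm) wafer diameter.
large_die_count = dies_per_wafer(20, 1.5)
small_die_count = dies_per_wafer(20, 1.25)
print(large_die_count, small_die_count)
```

Under that assumption the estimate is about 110 dies for the 1.5 cm die and about 165 for the 1.25 cm die.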

10) The given data presents the power consumption of several computer system components:
Component     Product               Performance    Power
Processor     Sun Niagara 8-core    1.2 GHz        72-79 W
DRAM          Kingston 1GB          184-pin        3.7 W
Hard drive    Diamond Max           7200 rpm       7.9 W read, 4.0 W idle
i)             Assuming the maximum load for each component and a power supply efficiency of 70%, what wattage must the server's power supply deliver to a system with a Sun Niagara 8-core chip, 2 GB of 184-pin Kingston DRAM and 7200 rpm hard drives?
ii)            How much power will the 7200 rpm disk drive consume if it is idle roughly 40% of the time?
iii)           Assume that for the same set of requests, a 5400 rpm disk will require twice as much time to read data as a 10800 rpm disk. What percentage of the time would the 5400 rpm disk drive be idle to perform the same transactions as in part (ii)?

11) We will run two applications on a dual Pentium processor, but the resource requirements are not the same. The first application needs 80% of the resources, and the other only 20% of the resources.
i) Given that 40% of the first application is parallelizable, how much speed up will we achieve with that application, if run in isolation?
ii) Given that 99% of the second application is parallelizable, how much speed up will this application observe, if run in isolation?
iii) Given that 40% of the first application is parallelizable, how much overall system speedup would you observe, if we parallelized it?
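Question 11 is a direct application of Amdahl's law with two processors. The sketch below assumes the parallel part speeds up by exactly 2 and, for part (iii), that only the first application (80% of the resources) is parallelized. Names are illustrative.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law: speedup when only parallel_fraction benefits."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


speedup_app1 = amdahl_speedup(0.40, 2)  # part (i)
speedup_app2 = amdahl_speedup(0.99, 2)  # part (ii)

# Part (iii): the first application is 80% of the system load; the other 20%
# is left unchanged (assumed).
overall_speedup = 1.0 / (0.20 + 0.80 / speedup_app1)
print(speedup_app1, speedup_app2, overall_speedup)
```

This gives roughly 1.25, 1.98 and 1.19 for the three parts.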

12) What is dependability? Explain two main measures of dependability.
13) Given the following measurements:
                Frequency of FP operations = 25%
                Average CPI of FP operations = 4.0
                Average CPI of other instructions = 1.33
                Frequency of FPSQR = 2%
                CPI of FPSQR = 20
Assume that the two design alternatives are to decrease the CPI of FPSQR to 2 or to decrease the average CPI of all FP operations to 2.5. Compare the two design alternatives using the processor performance equation.
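Since the instruction count and clock rate are unchanged, the two alternatives in question 13 can be compared by CPI alone. A sketch of that comparison, with illustrative variable names (the textbook rounds the original CPI to 2.0, whereas this keeps the unrounded 1.9975):

```python
freq_fp, cpi_fp = 0.25, 4.0
freq_other, cpi_other = 0.75, 1.33
freq_fpsqr, cpi_fpsqr = 0.02, 20.0

cpi_original = freq_fp * cpi_fp + freq_other * cpi_other

# Alternative 1: FPSQR CPI drops from 20 to 2.
cpi_new_fpsqr = cpi_original - freq_fpsqr * (cpi_fpsqr - 2.0)

# Alternative 2: average CPI of all FP operations drops to 2.5.
cpi_new_fp = freq_fp * 2.5 + freq_other * cpi_other

speedup_fpsqr = cpi_original / cpi_new_fpsqr
speedup_fp = cpi_original / cpi_new_fp
print(cpi_new_fpsqr, cpi_new_fp, speedup_fpsqr, speedup_fp)
```

Improving all FP operations gives the slightly lower CPI (about 1.62 versus 1.64), so it is the marginally better alternative, with a speedup of about 1.23.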

Unit-II
14) With a neat diagram, explain the classic five-stage pipeline for a RISC processor.
15) What are the major hurdles of pipelining? Illustrate the branch hazards in detail.
15) What are the major hurdles in pipelining? Illustrate the data hazard, briefly.
16) List pipeline hazards. Explain any one in detail.
16) Explain the pipeline hazards, in detail.
16) What are the major hazards in a pipeline? Explain data hazard and methods to minimize data hazard with example.
17) List and explain five different ways of classifying exception in a computer system.
18) An unpipelined machine has 10ns clock cycle and it uses four cycles for ALU operations and branches, five cycles for memory operations. Assume that relative frequencies of these operations are 40%, 20% and 40% respectively. Suppose due to clock skew and setup, pipelining the  machine adds 1ns overhead to the clock. Find the speed up from pipelining.

19) Consider the following calculations: x = y + z; a = b * c. Assume the calculations are done using registers. Show, using a 5-stage pipeline, how many clock pulses are required for direct operations. By reordering, show how many clock pulses are required with stalls and the saving in the number of clock pulses obtained by resolving the data hazard.

20) Show how the loop is unrolled so that there are four copies of the loop body, assuming R1-R2 (that is, the size of the array) is initially a multiple of 32, which means that the number of loop iterations is a multiple of 4. Eliminate any obviously redundant computations and do not reuse any of the registers.

21) Explain how pipeline is implemented in MIPS.
21) With a neat block diagram, explain how an instruction can be executed in 4 or 5 clock cycles in MIPS data path, without the pipeline register.

22) Explain different techniques in reducing pipeline branch penalties.

23) Consider the unpipelined RISC processor. Assume that it has a 1 ns clock cycle and that it uses 4 cycles for ALU operations and branches and 5 cycles for memory operations. Assume that the relative frequencies of these operations are 40%, 20% and 40% respectively. Suppose that due to clock skew and setup, pipelining the processor adds 0.2 ns of overhead to the clock. Ignoring any latency impact, how much speedup in the instruction execution rate will we gain from a pipeline?
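Question 23 compares the average unpipelined instruction time with the pipelined clock period. The sketch below assumes an ideal pipeline that completes one instruction per (slower) clock. The same function covers question 18 by changing the clock and overhead values. Names are illustrative.

```python
def pipeline_speedup(clock_ns, overhead_ns, mix):
    """Speedup of an ideal pipeline over the unpipelined machine.

    mix: list of (relative frequency, cycles) pairs for each operation type.
    Assumes the pipeline finishes one instruction per clock (CPI of 1).
    """
    average_cycles = sum(freq * cycles for freq, cycles in mix)
    unpipelined_time = clock_ns * average_cycles
    pipelined_time = clock_ns + overhead_ns
    return unpipelined_time / pipelined_time


# ALU 40% at 4 cycles, branches 20% at 4 cycles, memory 40% at 5 cycles.
mix = [(0.4, 4), (0.2, 4), (0.4, 5)]
speedup_q23 = pipeline_speedup(1.0, 0.2, mix)   # 1 ns clock, 0.2 ns overhead
speedup_q18 = pipeline_speedup(10.0, 1.0, mix)  # 10 ns clock, 1 ns overhead
print(speedup_q23, speedup_q18)
```

The average unpipelined instruction takes 4.4 cycles, so the speedups come out to about 3.7 for question 23 and 4.0 for question 18.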



Unit-III
24) What are the basic compiler techniques for exposing ILP? Explain briefly.
25) Explain true data dependence, name dependence and control dependence with an example code fragment.
25) What are data dependencies? Explain name dependences with example between two instructions.

26) List the steps to unroll the code and schedule.

27) What are the techniques used to reduce branch costs? Explain both static and dynamic branch prediction used for same.
28) Explain the dynamic branch prediction state diagram.
28) What is the drawback of 1-bit dynamic branch prediction method? Clearly state, how it is overcome in 2-bit prediction. Give the state transition diagram of 2-bit predictor.
28) What is dynamic prediction? Draw the state transition diagram for 2-bit prediction scheme?


29) What are correlating predictors? Explain with examples.
  
30) Show how the below loop would look on MIPS 5-stage pipeline, under the following situations. Find the number of cycles per iteration, for each case. Assume the latencies for integer and floating point operations, as given in the prescribed text book.
Loop:   L.D      F0,0(R1)
        ADD.D    F4,F0,F2
        S.D      F4,0(R1)
        DADDUI   R1,R1,#-8
        BNE      R1,R2,Loop
i)             Without scheduling and without loop unrolling.
ii)            With scheduling and without loop unrolling.
iii)           With loop unrolling four times and without scheduling.
iv)           With loop unrolling four times and with scheduling.

                                            ~~~~x0x~~~~



Monday, March 11, 2013

Computer Graphics And Visualization-Ist internal Question Bank


Computer Graphics and Visualization (10CS65) - Ist Internal Question Bank

 (E-Easy, I-Important, D-Little Difficult (I don't expect anyone to attempt))

1)    List out four major areas of applications of computer graphics and explain each one of them.(E,I)

 
2)    Draw the five major elements of a graphics system and explain.(E,I)

 
3)    Describe a pin-hole camera and derive the size of images generated. Compare this with the human visual system.(E,I)

 
4)    Explain the Synthetic-Camera model with diagrams. What do the following terms mean?

i) Projectors     ii) Center of Projection    iii) Projection Plane   iv) Clipping Window (E,I)

 
5)    Draw the application programmer's model of a graphics system. What functions are supported by the pen plotter model and the raster-based model? (E,I)
 

6)    a) Explain the modeling-rendering paradigm.(D)

b) Draw and Explain the Display Processor Architecture.(E)

 
7)    a) Describe the four major steps in a Graphics Pipeline.(I)

b) What are Programmable Pipelines? (D)

 
8)    Give an algorithm to generate the 2D Sierpinski gasket by plotting points.(E)

 
9)    Write an OpenGL C program to generate the 2D Sierpinski gasket by recursive subdivision of triangles.(I)
 
10) Give the seven major groups of OpenGL Graphics Functions and explain each group.(E,I)

 
11) Write about the OpenGL libraries and their organization.(E,I)

 
12) Draw and explain a simplified OpenGL pipeline.(E)

 
13) How do you draw basic geometric primitives (Points, Lines, Polylines, Polygons, Triangles, Quadrilaterals, Strips and Fans) in OpenGL? (E,I)

 
14) a) What are the three properties a polygon must have in order to be displayed correctly? (E,I)

b) What are the attributes of points, line segments, polygons and stroke text? (D)

 
15) Describe Stroke text and Raster text. Describe functions(APIs) to draw text in each case.(E,I)

 
16) What is Three Color Theory? Differentiate between the Additive Color and Subtractive Color models.(I)

 
17) a) How do you specify Color using RGB values? (I)

b) How do you specify Indexed Color using a Color Lookup Table? How do you set the entries in a Color table? (I)

 
18) a) How do you specify a viewing volume for Orthographic projections? (E)

b) How do you specify a clipping rectangle for Two-Dimensional Viewing? (E)

     
19) a) Describe the control functions used to control the appearance of a window on a display.(D)

b) How do you specify a viewport? (E)

c) How do you register a function display for callback? What are the events that trigger the display callback? (E)
 

20) Write an OpenGL program to draw a three-dimensional Sierpinski gasket inside a tetrahedron using recursive subdivision.(E,I)

 
21) How do you plot an implicit function such as x^2 + y^2 - 1 = 0 using the marching squares technique? Describe at least one of the 16 cases.(DDD)

 
22) Write a short note on categorization of logical input devices.(I)
 

23) Briefly describe the three input modes with diagrams.(I)
 

24) What are display lists? How do you define and execute them in OpenGL? How can you create multiple lists and call them with a single function call? (D,I)