Intel Multicore Lab

Equipment Details

Intel Software Development Tools

Intel Integrated Performance Primitives

  • Highly optimized collection of functions for digital data processing.

Intel Threading Building Blocks

  • Supports development of task-based applications that are scalable, reliable and parallel, with fewer lines of code.

Intel VTune Amplifier

  • Used for analysing performance of serial and parallel applications.

Intel Cluster Toolkit

  • Used to develop, analyse and optimize the performance of applications that run in a cluster environment.

Intel MPI Library

  • High-performance message passing library for developing applications that can run on multiple clusters.

Intel Trace Analyzer and Collector

  • Helps in understanding the performance of MPI applications.

Eucalyptus cloud computing tool

  • Used for building AWS-compatible private and hybrid clouds.

  • Pools together existing virtualized infrastructure to create cloud resources for compute, network and storage.
  • Can dynamically scale up or down depending on application workloads.

MPI programming

  • Used for parallel programming.

  • Implements software-level parallelism.

  • Functions used:

MPI_Send()

MPI_Recv()

SESC (SuperESCalar Simulator)

  • Multi-processor simulator used for modeling caches and out-of-order pipelines.

  • Capable of simulating static and dynamic instructions.

M5 Sim

  • Event-driven simulation tool.

  • Enables users to simulate a multi-core environment.

  • Models CPU cores and caches as objects.

List of M.Tech Projects in INTEL MULTICORE LAB

 

S. No. | Roll No. | Name | Title of the Project | Month & Year
1 | CSA 0514 | G. Pravinth | A Cache-Aware Scheduling Scheme for Real Time Tasks on Multicore Platforms | Dec-06
2 | CSA 0513 | V. SenthilKumar | Parallelization Methodology for Multicore Architecture Simulation | Dec-06
3 | CSA 0515 | M. Sivaram | A Cycle Accurate ISS for a Dynamically Reconfigurable Processor Architecture | Dec-06
4 | CSA 0501 | Sunitha P. George | A Generic Dual-Core Architecture | Dec-06
5 | CSA 0513 | V. SenthilKumar | Parallelization and Power Evaluation Methodology for Multicore Architecture Simulation | May-07
6 | 206107001 | B R Prasad | Interconnection In Multicore Architecture Design | May-09
7 | 206108019 | Srinivas Reddy A | Study of Ear Segmentation For Implementing Face Recognition | Dec-09
8 | 206108002 | Atul Baban Chavan | A Proposal Of Thread Scheduler Framework For Multi-core Platform (Phase-I) | Dec-09
9 | 206108021 | Dhawaleswar Rao | Study on The Performance of Some Web Caching Replacement Algorithms | Dec-09
10 | 206108002 | Atul Baban Chavan | A Proposal Of Thread Scheduler Framework For Multi-core Platform(Phase-II) | May-10
11 | 206109020 | Hathiram Banoth | Study of Performance Issues on Multicore Architecture | Dec-10
12 | 206109026 | Pavan Kumar Paruchuri | Study of Cache-Aware Real-Time Schedulers In Multicore Platforms | Dec-10
13 | 206109026 | Pavan Kumar Paruchuri | Global Scheduling Algorithms For Small To Medium Multicore Platforms | May-11
14 | 206109020 | Hathiram Banoth | Studies of Cache Performance Evaluation of Multicore Architectures For The ISAs | May-11
15 | 206111035 | Amit Kumar Singh | Techniques For Better Utilization Of Shared Caches In Multicore Architectures | Dec-12
16 | 206111035 | Amit Kumar Singh | Study on performance of cache coherence protocols for multicore architectures | May-13
17 | 206111008 | Tanmoy Kundu | Studies On The Impact Of Memory Management On Process Scheduling In The Context Of Multicore Architecture | Dec-12
18 | 206111008 | Tanmoy Kundu | Implementation of scheduling schemes to mitigate shared resource contention in multicore architecture | May-13
19 | 206112029 | Prabhin | BIOS Design for Thunderbolt in Next Generation Intel Platforms | Dec-13
20 | 206112029 | Prabhin | BIOS Design for Thunderbolt in Next Generation Intel Platforms | May-14
21 | 206113017 | S. Vinod kumar | A proposed schema for efficient Packet classification in network Processors | July-14
22 | 206113034 | K. Hemalatha | Memory aware task scheduling for real Time operating systems | July-14
23 | 206113017 | S Vinod Kumar | Implementing ravel and gruu feature for ims 3gpp release-11 | Dec-14
24 | 206113034 | K. Hemalatha | Adaptive bitrate transcoding for power efficient video streaming in mobile devices | Dec-14
25 | 206113017 | S Vinod Kumar | Architecture for roaming user scenario for voice over ims with local breakout | May-15
26 | 206113034 | K. Hemalatha | Pose estimation technique using modified posit method for mobile devices | May-15
27 | 206114015 | Sangeetha Vikraman | Performance comparison for Reconfigurable and partial reconfigurable SOC | Dec-15
28 | 206114015 | Sangeetha Vikraman | performance enhancement of low power mode in 3g firmware | May-15
29 | 206114015 | Sangeetha Vikraman | Implementing digrf driver host test with multiple excecution context | May-16
30 | 206115007 | Sreedeep C | Prevention of Side Channel Attacks using Hardware | May-16
31 | 206115013 | Siraj P S | Hardware Security using Bloom Filters | May-16
32 | 206115008 | Pranita Solanke | Implementation of MESI Cache Coherence Protocol using Snoop Filtering Technique | May-16
33 | 206115007 | Sreedeep C | Dynamic Partial Re-configuration of Image Processing Blocks on FPGA | Dec-16
34 | 206115013 | Siraj P S | Improving the Performance of H.265 Video Encoding using CPU+GPU Systems | Dec-16
35 | 206115008 | Pranita Solanke | Efficient Hardware Implementation of Multi-Modular Exponentiation in RSA Algorithms | Dec-16

Ongoing and Completed Projects in INTEL LAB 2014-2016

Energy efficient modular exponentiation for PKC (Completed)                                                                                                                          

Modular exponentiation and modular multiplication are two fundamental operations in various cryptographic applications, and hence the performance of public-key cryptographic algorithms is strongly influenced by the efficient implementation of these operations. Reducing the number of modular multiplications and the time required for each modular multiplication helps in developing efficient modular exponentiation algorithms. This work proposes an energy efficient modular exponentiation algorithm based on bit forwarding techniques. In particular, two algorithms, Bit Forwarding 1-bit (BFW1) and Bit Forwarding 2-bits (BFW2), which are modifications of the existing binary exponentiation algorithm, have been developed. Hardware realizations of the proposed algorithms have been evaluated in terms of throughput, power and energy. Results show increased throughput of the order of 11.02% and 15.13%, a reduction in power of 1.93% and 6.35%, and energy saving of the order of 1.9% and 6.35% for the BFW1 and BFW2 algorithms respectively. The FPGA realization used Xilinx ISE 14.2 on a Virtex-5 evaluation board together with the Icarus Verilog simulation and synthesis tool; the ASIC design was synthesized using Cadence.

©2016 Elsevier B.V. All rights reserved. (http://www.sciencedirect.com/science/article/pii/S0020019016301715)                                                                                                                                                                               
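
As a point of reference for the abstract above, the following is a minimal sketch of the baseline binary (square-and-multiply) exponentiation algorithm that BFW1 and BFW2 are said to modify. It is not the bit forwarding method itself, whose details are not given here. The function name binary_modexp and the multiplication counter are illustrative assumptions, added only to show why reducing the number of modular multiplications matters.

```python
def binary_modexp(base, exponent, modulus):
    """Left-to-right binary (square-and-multiply) modular exponentiation.

    Returns (result, multiplications), where multiplications counts the
    modular squarings and multiplications performed (illustrative metric).
    """
    if modulus <= 0 or exponent < 0:
        raise ValueError("modulus must be positive and exponent non-negative")
    result = 1 % modulus
    multiplications = 0
    for bit in bin(exponent)[2:]:               # scan exponent bits, MSB first
        result = (result * result) % modulus    # one squaring per exponent bit
        multiplications += 1
        if bit == "1":
            result = (result * base) % modulus  # extra multiply only for 1 bits
            multiplications += 1
    return result, multiplications


value, count = binary_modexp(4, 13, 497)
print(value, count)
```

The cost grows with the bit length of the exponent plus the number of 1 bits in it, which is the quantity any optimized variant tries to reduce.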

Usability aware Resource saving in handheld devices (Completed)

The emergence of new operating systems and applications for mobile phones and tablets has made power optimization necessary. Storage space has become another matter of concern, as new operating systems have started supporting video codecs and formats originally meant for desktop applications, without compression or conversion. The proposed approach identifies the region of interest in a video by combining feature extraction with natural statistics for dynamic analysis of the scene. The portion outside the region of interest in the original video is depreciated in order to increase the redundancy of pixel values in a frame. OpenCV is used to implement the saliency map. The approach is expected to reduce power consumption, file size and average CPU usage on handheld devices.

Adaptive Bitrate Transcoding for Power Efficient Video Streaming in Mobile Devices (Completed)

Video applications are an important part of mobile devices. Battery capacity is increasing by at most 10% per year, which is not sufficient for upcoming applications and operating systems. The power consumed by a video application depends on factors such as network load and signal quality, and it can be optimized through heuristics-based streaming. The work exploits adaptive bitrate streaming to determine the optimum bitrate for the available bandwidth. Selecting the optimum bitrate ensures high quality video delivery as well as optimum power consumption on the device. MPEG-DASH has been used to implement switching between bitrates under fluctuating bandwidth, using JavaScript, HTML and CSS on the Android 4.0.4 operating system. The four bitrates selected for encoding are close to the mean value available for streaming. The proposed method is expected to yield low-power, high-quality video streaming.
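
The bitrate selection step described above can be sketched as follows. This is a minimal illustration of throughput-based selection, not the project's actual heuristic: the function name select_bitrate, the four-step ladder values and the 0.8 safety factor are all hypothetical.

```python
def select_bitrate(available_kbps, ladder_kbps, safety_factor=0.8):
    """Pick the highest encoded bitrate that fits within the bandwidth budget.

    safety_factor (an assumed value) leaves headroom for bandwidth fluctuation.
    Falls back to the lowest bitrate when nothing fits.
    """
    if not ladder_kbps:
        raise ValueError("bitrate ladder must not be empty")
    budget = available_kbps * safety_factor
    candidates = [rate for rate in sorted(ladder_kbps) if rate <= budget]
    return candidates[-1] if candidates else min(ladder_kbps)


LADDER = [400, 800, 1200, 1600]   # hypothetical four-step ladder, kbit/s
for bandwidth in [2500, 1300, 700, 300]:
    print(bandwidth, "->", select_bitrate(bandwidth, LADDER))
```

A DASH client would typically re-run such a selection for each segment as its bandwidth estimate changes.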

Packet classification in network processor (Completed)

Packet classification is an essential function in applications such as routers, switches and firewalls. Because of their performance and scalability limitations, current packet classification solutions are insufficient to address growing network bandwidth and the increasing number of new applications. Efficient techniques therefore need to be implemented in software as well as hardware. The proposed work aims to reduce memory space and increase performance. An efficient hash technique reduces the memory needed for the predefined rule set stored in RAM. To improve performance, the rule sets are grouped using a clustering method, and simulated annealing is applied for further optimization.

Security in Reconfigurable Computing

The need for highly parallel computation and reduced heat production has opened the way for FPGA co-processors. Traditional processors are based on a fetch-and-execute model, whereas an FPGA implements the program as a circuit on the device. This makes FPGAs highly parallel and lowers their clock frequency requirement, which in turn reduces the heat produced. Existing processors are limited in how much parallelism and speedup they can achieve without producing more heat. In the cloud, where computational requirements are growing day by day, FPGAs can act as accelerators. The efficient hardware implementation of algorithms, and the security of those implementations when deployed on FPGAs in the cloud, are major issues. When FPGAs are used in data centers, speed is a more important factor than resource usage. Hardware designs deployed in a data center face security threats such as hardware Trojans and cloning. Among other applications of reconfigurable computing, evolvable hardware based on evolutionary algorithms also comes into the picture. With a large amount of resources available, reconfigurable hardware can open a new era of designing hardware, changing already built hardware and providing security as needed. Whenever an application is found to be degrading the overall system, application-specific hardware can be designed and the application moved from the main CPU to the FPGA.

A study on High performance hybrid cache design for multi-core architecture using optimal cache partitioning techniques

Designers are responsible for selecting the appropriate cache according to the requirements of the system. The cache design space is large, as many variables affect the system’s behavior and performance. These include the total size of the cache, its associativity, the size of each cache line, the policy by which lines are placed or replaced inside the cache array, and the placement of the cache in the architecture and its distance from the processing cores. The increasing number of transistors per chip widens this range of options, as it is now possible to bring bigger caches closer to the processor or to introduce multi-level cache hierarchies on chip. A straightforward way to increase the cache’s effectiveness, and thus improve overall performance, would be to increase the cache’s size and associativity. The cache would then hold more data blocks and reduce conflicts between data lines that map to the same cache line. This approach is primarily limited by the available area on the chip. It also makes the cache slower and more power hungry, which could ultimately have a negative effect on the system. Clearly, the effectiveness of this solution is limited, and designers have been searching for alternatives. The memory subsystem is an essential part of such architectures. We focus on the problem of cache partitioning for energy optimization on multicore architectures (MCAs), and propose a hybrid cache architecture for optimizing partition-sharing. The architecture shows that the problem of partition-sharing is reducible to the problem of partitioning. The technique uses dynamic programming to optimize partitioning for overall miss ratio and for two different kinds of fairness. The hybrid cache architecture contains SRAM banks, STT-RAM banks, and STT-RAM/SRAM or other hybrid banks for chip multiprocessors.
The proposed optimization-based hybrid architecture can significantly improve on equal partitioning, but not on free-for-all sharing. Each optimization result is obtained from a very large solution space covering the different ways to share the cache along with the block placement and replacement policies.
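
The dynamic programming step mentioned above can be illustrated with a minimal sketch that divides cache ways among cores to minimize total misses. It covers only the overall-miss objective; the fairness variants and the hybrid SRAM/STT-RAM banks are not modeled. The function name partition_cache and the miss curves below are hypothetical, not data from the study.

```python
def partition_cache(miss_curves, total_ways):
    """Return (min_total_misses, allocation) for the given miss curves.

    miss_curves[i][w] is the (assumed, profiled) miss count of core i when it
    owns w cache ways, for w = 0 .. total_ways.
    """
    INF = float("inf")
    n = len(miss_curves)
    # best[i][w]: fewest misses using the first i cores and exactly w ways
    best = [[INF] * (total_ways + 1) for _ in range(n + 1)]
    choice = [[0] * (total_ways + 1) for _ in range(n + 1)]
    best[0][0] = 0
    for i in range(1, n + 1):
        curve = miss_curves[i - 1]
        for w in range(total_ways + 1):
            for give in range(w + 1):           # ways handed to core i-1
                cost = best[i - 1][w - give] + curve[give]
                if cost < best[i][w]:
                    best[i][w] = cost
                    choice[i][w] = give
    # Walk back through the recorded choices to recover the allocation.
    allocation = []
    w = total_ways
    for i in range(n, 0, -1):
        allocation.append(choice[i][w])
        w -= choice[i][w]
    allocation.reverse()
    return best[n][total_ways], allocation


# Hypothetical miss curves for two cores sharing a 4-way cache.
curves = [[100, 50, 45, 43, 42],   # core 0 gains little beyond 1 way
          [100, 70, 40, 20, 10]]   # core 1 keeps benefiting from more ways
print(partition_cache(curves, 4))
```

With these curves an equal 2/2 split costs 85 misses, while the optimized 1/3 split costs 70, which is the kind of gain over equal partitioning the paragraph refers to.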

Multi-core Cache Coherence and Related Issues

Multi-core processor architecture has become dominant in today's computing environment. Multi-core processors are used everywhere, from personal mobile devices to the large-scale servers used in high-performance computing environments such as the cloud. The underlying computing capacity can be utilized by writing code with parallel programming constructs such as threads and OpenMP directives. These are some of the effective ways to properly utilize shared memory architectures. Shared memory architectures, however, face the problem of cache coherence.

There are two basic schemes that deal with the problem of cache coherence: the snoopy bus protocol and the directory protocol. The snoopy bus protocol is easy to implement but does not scale beyond a certain number of cores. The directory protocol is complex to implement but scales well with the number of cores. A survey of cache coherence schemes indicates the need for a scheme that scales appropriately with the number of cores while also saving energy and performing well. Recent studies show that hybrid protocols, which combine characteristics of both the snoopy bus and directory protocols with different key techniques, achieve good performance with low energy consumption.
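
To make the snoopy bus scheme concrete, here is a toy sketch of an MSI-style snoopy protocol for a single cache line. MSI is chosen here only as a simple, well-known example; the class name SnoopyBus and the message counter are illustrative assumptions, and real protocols (MESI, snoop filtering, directories) carry more states and messages.

```python
class SnoopyBus:
    """Toy MSI snoopy protocol for one cache line shared by several cores."""

    def __init__(self, num_cores):
        self.state = ["I"] * num_cores   # per-core line state: M, S or I
        self.bus_messages = 0            # broadcasts observed by every cache

    def read(self, core):
        if self.state[core] != "I":
            return                       # read hit: no bus traffic
        self.bus_messages += 1           # read miss is broadcast on the bus
        for other, st in enumerate(self.state):
            if other != core and st == "M":
                self.state[other] = "S"  # owner supplies data and downgrades
        self.state[core] = "S"

    def write(self, core):
        if self.state[core] == "M":
            return                       # write hit on a modified copy
        self.bus_messages += 1           # write intent is broadcast on the bus
        for other in range(len(self.state)):
            if other != core:
                self.state[other] = "I"  # all other copies are invalidated
        self.state[core] = "M"


bus = SnoopyBus(3)
bus.read(0)
bus.read(1)
bus.write(0)
bus.read(2)
print(bus.state, bus.bus_messages)
```

Every miss or write upgrade is seen by all caches, so bus traffic grows with the core count. This is the scaling limit noted above, and the one a directory avoids by tracking sharers explicitly.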

 

List of Students

 

S.No | Name | Roll No | Category | Year of Admission | Lab
1. | Satyanarayana | 406112003 | PhD, Full Time TEQIP | 2012 | INTEL
2. | P.S.Tamilzharasan | 406912001 | PhD, Part Time | 2012 | INTEL
3. | R. Sangeetha | 406913002 | PhD, Part Time | 2013 | INTEL
4. | Manjith B C | 406114001 | PhD, QIP, Full Time | 2014 | INTEL
5. | Praveen Kumar Yadav | 306112001 | M.S, Full Time | 2012 | INTEL
6. | Anand Prem Kumar V | 306113001 | M.S, Full Time | 2013 | INTEL