Intel Multicore Lab

Equipment Details

Intel Software Development Tools

Intel Integrated Performance Primitives

A collection of highly optimized functions for digital data processing.

Intel Threading Building Blocks

Supports development of task-based parallel applications that are scalable and reliable, with fewer lines of code.

Intel VTune Amplifier

Used for analysing performance of serial and parallel applications.

Intel Cluster Toolkit

Used to develop, analyse and optimize the performance of applications that run in a cluster environment.

Intel MPI Library

A high-performance message-passing library for developing applications that can run on multiple clusters.

Intel Trace Analyzer and Collector

Helps in understanding the performance of MPI applications.

Eucalyptus cloud computing tool

Used for building AWS-compatible private and hybrid clouds.
Pools together existing virtualized infrastructure to create cloud resources for compute, network and storage.
Can be dynamically scaled up or down depending on application workloads.

MPI programming

Used for parallel programming
Implements software level parallelism
Functions used:

MPI_Send()

MPI_Recv()

SESC (SuperESCalar Simulator)

Multi-processor simulator used for modeling caches and out-of-order pipelines.
Capable of simulating static and dynamic instructions.

M5 Sim

It is an event-driven simulation tool.
Enables users to simulate a multi-core environment.
It models CPU cores and caches as objects.
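
The event-driven, object-based style described above can be illustrated with a minimal sketch. This is not M5 code: the class names (EventQueue, Cache, Core), the latencies and the address traces are all hypothetical, chosen only to show how objects schedule events on a shared queue.

```python
import heapq

class EventQueue:
    """Global event queue: events fire in order of simulated time."""
    def __init__(self):
        self._heap = []
        self._seq = 0      # tie-breaker so equal-time events fire in FIFO order
        self.now = 0       # current simulated time, in cycles

    def schedule(self, delay, action):
        heapq.heappush(self._heap, (self.now + delay, self._seq, action))
        self._seq += 1

    def run(self):
        while self._heap:
            self.now, _, action = heapq.heappop(self._heap)
            action()

class Cache:
    """Shared cache object with assumed (hypothetical) hit and miss latencies."""
    def __init__(self, eq, hit_latency=2, miss_latency=20):
        self.eq = eq
        self.hit_latency = hit_latency
        self.miss_latency = miss_latency
        self.lines = set()

    def access(self, addr, on_done):
        if addr in self.lines:
            self.eq.schedule(self.hit_latency, on_done)
        else:
            # Simplification: the line is marked present as soon as the miss is issued.
            self.lines.add(addr)
            self.eq.schedule(self.miss_latency, on_done)

class Core:
    """Core object that issues one memory access at a time from a trace."""
    def __init__(self, eq, cache, trace):
        self.eq = eq
        self.cache = cache
        self.pending = list(trace)
        self.finish_time = None

    def start(self):
        self._issue()

    def _issue(self):
        if not self.pending:
            self.finish_time = self.eq.now
            return
        addr = self.pending.pop(0)
        self.cache.access(addr, self._issue)   # callback fires when the access completes

def simulate():
    eq = EventQueue()
    cache = Cache(eq)
    cores = [Core(eq, cache, [0x10, 0x10, 0x20]), Core(eq, cache, [0x10, 0x30])]
    for core in cores:
        core.start()
    eq.run()
    return [core.finish_time for core in cores]

print(simulate())
```

Each core finishes at the simulated cycle when its last access completes; nothing advances between events, which is what makes the approach event-driven rather than cycle-by-cycle.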

List of M.Tech Projects in INTEL MULTICORE LAB

S. No.	Roll No.	Name	Title of the Project	Month & Year
1	CSA 0514	G. Pravinth	A Cache-Aware Scheduling Scheme for Real Time Tasks on Multicore Platforms	Dec-06
2	CSA 0513	V. SenthilKumar	Parallelization Methodology for Multicore Architecture Simulation	Dec-06
3	CSA 0515	M. Sivaram	A Cycle Accurate ISS for a Dynamically Reconfigurable Processor Architecture	Dec-06
4	CSA 0501	Sunitha P. George	A Generic Dual-Core Architecture	Dec-06
5	CSA 0513	V. SenthilKumar	Parallelization and Power Evaluation Methodology for Multicore Architecture Simulation	May-07
6	206107001	B R Prasad	Interconnection In Multicore Architecture Design	May-09
7	206108019	Srinivas Reddy A	Study of Ear Segmentation For Implementing Face Recognition	Dec-09
8	206108002	Atul Baban Chavan	A Proposal Of Thread Scheduler Framework For Multi-core Platform (Phase-I)	Dec-09
9	206108021	Dhawaleswar Rao	Study on The Performance of Some Web Caching Replacement Algorithms	Dec-09
10	206108002	Atul Baban Chavan	A Proposal Of Thread Scheduler Framework For Multi-core Platform(Phase-II)	May-10
11	206109020	Hathiram Banoth	Study of Performance Issues on Multicore Architecture	Dec-10
12	206109026	Pavan Kumar Paruchuri	Study of Cache-Aware Real-Time Schedulers In Multicore Platforms	Dec-10
13	206109026	Pavan Kumar Paruchuri	Global Scheduling Algorithms For Small To Medium Multicore Platforms	May-11
14	206109020	Hathiram Banoth	Studies of Cache Performance Evaluation of Multicore Architectures For The ISAs	May-11
15	206111035	Amit Kumar Singh	Techniques For Better Utilization Of Shared Caches In Multicore Architectures	Dec-12
16	206111035	Amit Kumar Singh	Study on performance of cache coherence protocols for multicore architectures	May-13
17	206111008	Tanmoy Kundu	Studies On The Impact Of Memory Management On Process Scheduling In The Context Of Multicore Architecture	Dec-12
18	206111008	Tanmoy Kundu	Implementation of scheduling schemes to mitigate shared resource contention in multicore architecture	May-13
19	206112029	Prabhin	BIOS Design for Thunderbolt in Next Generation Intel Platforms	Dec-13
20	206112029	Prabhin	BIOS Design for Thunderbolt in Next Generation Intel Platforms	May-14
21	206113017	S. Vinod Kumar	A Proposed Schema for Efficient Packet Classification in Network Processors	July-14
22	206113034	K. Hemalatha	Memory Aware Task Scheduling for Real Time Operating Systems	July-14
23	206113017	S. Vinod Kumar	Implementing RAVEL and GRUU Feature for IMS 3GPP Release-11	Dec-14
24	206113034	K. Hemalatha	Adaptive Bitrate Transcoding for Power Efficient Video Streaming in Mobile Devices	Dec-14
25	206113017	S. Vinod Kumar	Architecture for Roaming User Scenario for Voice over IMS with Local Breakout	May-15
26	206113034	K. Hemalatha	Pose Estimation Technique Using Modified POSIT Method for Mobile Devices	May-15
27	206114015	Sangeetha Vikraman	Performance Comparison for Reconfigurable and Partial Reconfigurable SoC	Dec-15
28	206114015	Sangeetha Vikraman	Performance Enhancement of Low Power Mode in 3G Firmware	May-15
29	206114015	Sangeetha Vikraman	Implementing DigRF Driver Host Test with Multiple Execution Context	May-16
30	206115007	Sreedeep C	Prevention of Side Channel Attacks using Hardware	May-16
31	206115013	Siraj P S	Hardware Security using Bloom Filters	May-16
32	206115008	Pranita Solanke	Implementation of MESI Cache Coherence Protocol using Snoop Filtering Technique	May-16
33	206115007	Sreedeep C	Dynamic Partial Re-configuration of Image Processing Blocks on FPGA	Dec-16
34	206115013	Siraj P S	Improving the Performance of H.265 Video Encoding using CPU+GPU Systems	Dec-16
35	206115008	Pranita Solanke	Efficient Hardware Implementation of Multi-Modular Exponentiation in RSA Algorithms	Dec-16

Ongoing and Completed Projects in INTEL LAB 2014-2016

Energy efficient modular exponentiation for PKC (Completed)

Modular exponentiation and modular multiplication are two fundamental operations in various cryptographic applications, and hence the performance of public-key cryptographic algorithms is strongly influenced by the efficient implementation of these operations. Reducing the number of modular multiplications and the time required for each modular multiplication helps in developing efficient modular exponentiation algorithms. This work proposes an energy-efficient modular exponentiation algorithm based on bit forwarding techniques. In particular, two algorithms, Bit Forwarding 1-bit (BFW1) and Bit Forwarding 2-bits (BFW2), which are modifications of the existing binary exponentiation algorithm, have been developed. Hardware realizations of the proposed algorithms have been evaluated in terms of throughput, power and energy. Results show throughput increases of 11.02% and 15.13%, power reductions of 1.93% and 6.35%, and energy savings of 1.9% and 6.35% for the BFW1 and BFW2 algorithms respectively. For the FPGA realization, Xilinx ISE 14.2 on a Virtex-5 evaluation board and the Icarus Verilog simulation and synthesis tool were used; the ASIC design was synthesized using Cadence tools.
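
For reference, the baseline binary (square-and-multiply) exponentiation algorithm that BFW1 and BFW2 modify can be sketched as follows. This is only the standard left-to-right binary method, not the bit forwarding algorithms themselves; the function name and the multiplication counter are illustrative additions.

```python
def binary_modexp(base, exponent, modulus):
    """Left-to-right square-and-multiply.

    Returns (base**exponent mod modulus, number of modular multiplications).
    """
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1 % modulus
    mults = 0
    for bit in bin(exponent)[2:]:            # scan exponent bits, most significant first
        result = (result * result) % modulus  # one squaring per bit
        mults += 1
        if bit == "1":
            result = (result * base) % modulus  # extra multiplication for each 1 bit
            mults += 1
    return result, mults

value, count = binary_modexp(4, 13, 497)
print(value, count)
```

The counter makes the cost model visible: one squaring per exponent bit plus one multiplication per set bit, which is the quantity that reduced-multiplication variants aim to lower.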

Usability aware Resource saving in handheld devices (Completed)

The emergence of new operating systems and applications for mobile phones and tablets has created a need for power optimization. Storage space has become another concern, as new operating systems have started supporting video codecs and formats originally meant for desktop applications without compression or conversion. The proposed approach identifies the region of interest in a video by combining feature extraction with natural statistics for dynamic analysis of the scene. The portion of the original video outside the region of interest is degraded in order to increase the redundancy of pixel values within a frame. OpenCV is used to implement the saliency map. This approach is expected to reduce power consumption, file size and average CPU usage on handheld devices.

Adaptive Bitrate Transcoding for Power Efficient Video Streaming in Mobile Devices (Completed)

Video applications are an important part of mobile devices. Battery capacity is increasing by at most 10% per year, which is not sufficient for upcoming applications and operating systems. The power consumed by a video application depends on factors such as network load and signal quality, and it can be optimized through heuristics-based streaming. The work exploits adaptive bitrate streaming to determine the optimum bitrate for the available bandwidth. Selecting the optimum bitrate ensures high-quality video delivery as well as optimum power consumption on the device. MPEG-DASH has been used to implement switching between bitrates under fluctuating bandwidth, using JavaScript, HTML and CSS on the Android 4.0.4 operating system. The four bitrates selected for encoding are close to the mean value available for streaming. The proposed method is expected to deliver high-quality video streaming with low power consumption.
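
The bitrate switching idea can be sketched with a simple rule: pick the highest encoded bitrate that fits within the measured bandwidth. This is a minimal illustration, not the project's heuristic; the function name, the four-step bitrate ladder and the 0.8 safety factor are assumptions made for the example.

```python
def select_bitrate(available_kbps, ladder_kbps, safety=0.8):
    """Return the highest bitrate in the ladder that fits the bandwidth budget.

    The safety factor (an assumed value) leaves headroom for bandwidth fluctuation.
    Falls back to the lowest bitrate when nothing fits.
    """
    budget = available_kbps * safety
    candidates = [rate for rate in sorted(ladder_kbps) if rate <= budget]
    return candidates[-1] if candidates else min(ladder_kbps)

# Hypothetical ladder of four encoded bitrates, in kbit/s.
ladder = [400, 800, 1200, 1600]
for bandwidth in (300, 1100, 1700, 5000):
    print(bandwidth, select_bitrate(bandwidth, ladder))
```

In a DASH player the same decision would be re-evaluated for each segment as new bandwidth measurements arrive.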

Packet classification in network processor (Completed)

Packet classification is an essential function in applications such as routers, switches and firewalls. Because of their performance and scalability limitations, current packet classification solutions cannot keep up with growing network bandwidth and the increasing number of new applications. Efficient techniques therefore need to be implemented in software as well as hardware. The proposed work aims to reduce memory usage and increase performance. An efficient hashing technique reduces the memory needed for the predefined rule set stored in RAM. To improve performance, the rules are grouped using a clustering method, and simulated annealing is applied for further optimization.

Security in Reconfigurable Computing

The need for highly parallel computation with reduced heat production has led to the use of FPGA co-processors. Traditional processors are based on the fetch-and-execute model, whereas an FPGA implements the program as a circuit on the device. This makes FPGAs highly parallel and lowers the required clock frequency, which in turn reduces the heat produced. Existing processors are limited in the amount of parallelism and speedup they can achieve without increasing heat production. In the cloud, where computational requirements are growing day by day, FPGAs can act as accelerators. Efficient hardware-based implementation of algorithms, and the security issues that arise when deploying them on FPGAs in the cloud, are the major concerns. When FPGAs are used in data centers, speed rather than resource usage is the main factor to be considered. Hardware designs deployed in a data center face security threats such as hardware Trojans and cloning. Among other applications of reconfigurable computing, evolvable hardware based on evolutionary algorithms also comes into the picture. With the large amount of resources now available, reconfigurable hardware opens a new era of designing hardware, modifying already-built hardware and providing security as needed. Whenever an application is found to be degrading the overall system, application-specific hardware can be designed and the application moved from the main CPU to the FPGA.

A study on High performance hybrid cache design for multi-core architecture using optimal cache partitioning techniques

Designers are responsible for selecting the appropriate cache according to the requirements of the system. The cache design space is large, as many variables can affect the system's behavior and performance. These include the total size of the cache, its associativity, the size of each cache line, the policy according to which lines are placed or replaced inside the cache array, and the actual placement of the cache in the architecture and its distance from the processing cores. The increasing number of transistors per chip widens this range of options, as it is now possible to bring bigger caches closer to the processor or to introduce multi-level cache hierarchies on chip. A straightforward way to increase the cache's effectiveness, and thus improve overall performance, would be to increase the cache's size and associativity. The cache would then be able to hold more data blocks and reduce conflicts between data lines that map to the same cache line. This approach is primarily limited by the available area on the chip. Additionally, it makes the cache slower and more power hungry, which could ultimately have a negative effect on the system. Clearly, the effectiveness of this solution is limited, and designers have been searching for alternatives. The memory subsystem is an essential part of multicore architectures (MCAs). We focus on the problem of cache partitioning for energy optimization on MCAs, and propose a hybrid cache architecture for optimizing partition-sharing. The architecture shows that the problem of partition-sharing is reducible to the problem of partitioning. The technique uses dynamic programming to optimize partitioning for overall miss ratio, and for two different kinds of fairness. The hybrid cache architecture contains SRAM banks, STT-RAM banks, and STT-RAM/SRAM or any other hybrid banks for chip multiprocessors.
The proposed optimization-based hybrid architecture can significantly improve on equal partitioning, but not on free-for-all sharing. Each optimization result is obtained from a very large solution space covering the different ways to share the cache, along with the block placement and replacement policies.
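
The dynamic programming step for miss-ratio optimization can be illustrated with a small sketch. It assumes each core has a known miss curve (misses as a function of the number of cache ways it receives); the function name and the example curves are hypothetical, and the fairness objectives and hybrid bank placement are not modeled here.

```python
def partition_cache(miss_curves, total_ways):
    """Allocate cache ways to cores so that the total miss count is minimal.

    miss_curves[i][k] is the (assumed, pre-measured) miss count of core i
    when it is given k ways. Returns (allocation per core, total misses).
    """
    n = len(miss_curves)
    inf = float("inf")
    # best[i][w]: minimum misses for the first i cores using exactly w ways.
    best = [[inf] * (total_ways + 1) for _ in range(n + 1)]
    choice = [[0] * (total_ways + 1) for _ in range(n + 1)]
    best[0][0] = 0
    for i in range(1, n + 1):
        curve = miss_curves[i - 1]
        for w in range(total_ways + 1):
            for k in range(min(w, len(curve) - 1) + 1):
                cand = best[i - 1][w - k] + curve[k]
                if cand < best[i][w]:
                    best[i][w] = cand
                    choice[i][w] = k
    # Pick the best number of ways actually used, then backtrack the choices.
    w = min(range(total_ways + 1), key=lambda x: best[n][x])
    total = best[n][w]
    allocation = []
    for i in range(n, 0, -1):
        k = choice[i][w]
        allocation.append(k)
        w -= k
    allocation.reverse()
    return allocation, total

# Hypothetical miss curves for two cores sharing a 4-way cache.
curves = [
    [100, 60, 40, 30, 28],   # core 0: keeps benefiting from more ways
    [80, 30, 25, 24, 24],    # core 1: saturates after one way
]
print(partition_cache(curves, 4))
```

With these example curves the optimized split is uneven, and it yields fewer total misses than giving each core two ways, which is the sense in which an optimized partition improves on equal partitioning.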

Multi-core Cache Coherence and Related Issues

Multi-core processor architecture has become dominant in today's computing environment. Multi-core processors are used in every walk of life, from personal mobile devices to large-scale servers in high-performance computing environments such as the cloud. The underlying computing capacity can be utilized by writing code with parallel programming constructs such as threads and OpenMP clauses. These are some of the effective ways to properly utilize shared memory architectures. Shared memory architectures, however, face the problem of cache coherence.

There are two basic schemes that deal with the problem of cache coherence, viz. the snoopy bus protocol and the directory protocol. The snoopy bus protocol is easy to implement but does not scale beyond a certain number of cores. The directory protocol is complex to implement but scales well with the number of cores. A survey of cache coherence schemes shows that there is a need for a scheme that scales appropriately with the number of cores while also saving energy and delivering good performance. Recent studies show that hybrid protocols, which combine characteristics of both the snoopy bus and directory protocols using different key techniques, achieve good performance with low energy consumption.
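
To make the snoopy bus idea concrete, the following sketch tracks the state of a single cache line across several caches using MESI states (MESI is used here only as a familiar example of a snooping protocol). The class name, the single-line scope and the bus message counter are simplifying assumptions; data transfer and write-back are not modeled.

```python
from enum import Enum

class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"

class SnoopyBus:
    """State of one cache line in each cache, kept coherent by bus broadcasts."""
    def __init__(self, n_caches):
        self.states = [State.I] * n_caches
        self.bus_messages = 0   # every broadcast is seen (snooped) by all caches

    def read(self, cid):
        if self.states[cid] is not State.I:
            return              # hit in M, E or S: no bus traffic
        self.bus_messages += 1  # read miss: broadcast a bus read
        holders = [i for i, s in enumerate(self.states)
                   if i != cid and s is not State.I]
        for i in holders:
            self.states[i] = State.S   # M and E copies downgrade to Shared
        self.states[cid] = State.S if holders else State.E

    def write(self, cid):
        state = self.states[cid]
        if state is State.M:
            return              # already the exclusive owner
        if state is State.E:
            self.states[cid] = State.M   # silent upgrade, no bus traffic
            return
        self.bus_messages += 1  # from S or I: broadcast an invalidation
        for i in range(len(self.states)):
            if i != cid:
                self.states[i] = State.I
        self.states[cid] = State.M

bus = SnoopyBus(3)
bus.read(0)    # cache 0 gets the line in Exclusive
bus.read(1)    # both copies become Shared
bus.write(1)   # cache 1 becomes Modified, cache 0 is invalidated
bus.read(2)    # cache 1 downgrades, caches 1 and 2 are Shared
print([s.name for s in bus.states], bus.bus_messages)
```

Because every miss or upgrade is broadcast to all caches, bus traffic grows with the number of cores, which is the scaling limit noted above; a directory protocol replaces the broadcast with a lookup of which caches hold the line.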

List of Students

S.No	Name	Roll No	Category	Year of Admission	Lab
1.	Satyanarayana	406112003	PhD, Full Time TEQIP	2012	INTEL
2.	P.S.Tamilzharasan	406912001	PhD, Part Time	2012	INTEL
3.	R. Sangeetha	406913002	PhD, Part Time	2013	INTEL
4.	Manjith B C	406114001	PhD, QIP, Full Time	2014	INTEL
5.	Praveen Kumar Yadav	306112001	M.S, Full Time	2012	INTEL
6.	Anand Prem Kumar V	306113001	M.S, Full Time	2013	INTEL
