Rise Lab NITT
Equipment Details
The gem5 Simulator System
Bluespec
- A high-level functional hardware description programming language.
- It leads to shorter, more abstract, and verifiable (provably correct) source code.
- More than 50% improvements compared to conventional methods of design
Altera Modelsim
Xilinx ISE Design Suite
- Xilinx ISE (Integrated Synthesis Environment) is a software tool produced by Xilinx for synthesis and analysis of HDL designs. It enables developers to synthesize their designs, perform timing analysis, examine RTL diagrams, simulate a design's reaction to different stimuli, and configure the target device with the programmer.
- It is a design environment for FPGA products from Xilinx. It is tightly coupled to the architecture of those chips and cannot be used with FPGA products from other vendors.
- It is primarily used for circuit synthesis and design, while the ModelSim logic simulator is used for system-level testing.
- Other components shipped with Xilinx ISE include the Embedded Development Kit (EDK), a Software Development Kit (SDK) and ChipScope Pro.
CACTI (Cache Access Cycle Time Indicator)
Hardware Components
List of M.Tech Projects in the RISE Lab
| 1 | 206110022 | Srinivas V.V. | Enhancing QoS For Efficient Video Streaming In A Typical Cloud Environment | Dec-11 |
| 2 | 206110018 | Chaitanya Vagga | Block Scheduling For Multicore Platform | Dec-11 |
| 3 | 206110022 | Srinivas V.V. | Performance Evaluation of Stream Log Collection Using HADOOP Distributed File System | May-12 |
| 4 | 206110018 | Chaitanya Vagga | Deadlock Free Scheduling For Distributed Systems | May-12 |
| 5 | 206112028 | V.V.Varadhan | Hardware Assisted Scheduler for Multicore Architecture | Dec-13 |
| 6 | 206112001 | Divya Patel | Automation of Power and Performance Validation Flow of Server Processor | May-14 |
| 7 | 206112028 | V.V.Varadhan | A Prototype of Multi-core Cryptographic Processor with Hardware Scheduler | May-14 |
| 8 | 206113004 | T. Vidya | RABED: Design of a reconfigurable associativity and block size embedded dynamic data cache | July-14 |
| 9 | 206113009 | Ankam Koti Veera Kumar | Implementation of parallel Irregular and Recursive Algorithms using OpenMP | July-14 |
| 10 | 206113004 | T. Vidya | Design of an interconnect topology for multi-cores and scale-out workloads | Dec-14 |
| 11 | 206113009 | Ankam Koti Veera Kumar | Design and implementation of Lightstor protocol controller | Dec-14 |
| 12 | 206113004 | T. Vidya | Scaling a NoC topology and implementation of routing algorithms | May-15 |
| 13 | 206113009 | Ankam Koti Veera Kumar | Parallel Implementation of ACO with local search for Multi-Dimensional Knapsack Problem | May-15 |
| 14 | 206114023 | S. Hemanthkumar | Mobile application for voice Messaging | July-15 |
| 15 | 206114026 | Revathi Uddaraju | Implementation of MSI Cache coherence Protocol using Bluespec in NoC (Cache) | July-15 |
| 16 | 206114004 | Prakash Borkar | Implementation of MSI Cache coherence Protocol using Bluespec in NoC (Directory) | July-15 |
| 17 | 206114023 | S. Hemanthkumar | Improving performance of H.264 video encoding on CPU+GPU systems | Dec-15 |
| 18 | 206114004 | Prakash Borkar | Dynamic Cache Reconfiguration for Improved Performance | Dec-15 |
| 19 | 206114026 | Revathi Uddaraju | Design of fault tolerance framework for faults that occur in SoC | Dec-15 |
| 20 | 206114023 | S. Hemanthkumar | Improving Performance of Text data Compression on CPU+GPU Systems | May-15 |
| 21 | 206114004 | Prakash Borkar | Design of dynamic reconfigurable cache to improve energy efficiency | May-15 |
| 22 | 206114026 | Revathi Uddaraju | Fault tolerance system for information and time redundancy | May-15 |
| 23 | 206115023 | M Karthikeyan | CUDA Implementation of Speech Processing Algorithms | May-16 |
| 24 | 206115023 | M Karthikeyan | Scaling Existing Lock Based Applications using Adaptive Lock Elision | Dec-16 |
Ongoing Projects in RISE Lab 2014-2017
Performance Analysis of Deep Architectures
Manycore architectures include a large number of processing elements to improve performance while staying within power constraints. Accelerating workloads on heterogeneous manycore computing elements involves a large amount of memory copying, computation and thread management. Applications of manycore architectures range from desktop computers to warehouse-scale computers. Deep learning applications exploit the full capability of manycore architectures such as GPUs to cope with their computational complexity. Visual understanding and speech processing are two major applications of deep learning, and both require efficient deep architecture models to train the system for prediction and classification tasks. Hybrid deep architecture models comprising convolutional neural networks and recurrent neural networks are used to achieve good accuracy in computer vision tasks.
Designing a trustworthy system in programmable SOC
An embedded system is an electronic system in which software is embedded in computer hardware. Embedded systems are an ever-growing technology used throughout modern life: in consumer electronics, education, telecommunication, home appliances, transportation, industry, medicine and the military. An embedded system may be programmable or non-programmable, depending on the application.
The complexity of real-time systems and the pressure on time to market are increasing tremendously because of demands for speed and adaptability. A single-processor SoC is no longer sufficient for such designs, so multiprocessor SoCs have become essential. A framework has been proposed to manage resources, reduce power and design a trustworthy system in a programmable SoC for the many components that are plugged into the FPGA. As chips increase in complexity, trustworthy processing of sensitive information becomes increasingly difficult to achieve because of extensive on-chip resource sharing and the lack of corresponding protection mechanisms. A Physical Unclonable Function (PUF) is a function with certain desirable properties: it should be easy to make but “impossible” to duplicate. A PUF is basically a variability-aware circuit that detects the mismatch in circuit components caused by manufacturing process variation. If a PUF circuit is instantiated on several different chips, each instantiation is expected to produce a unique response when supplied with the same challenge. Designing a security primitive like a PUF poses multiple challenges: achieving reliability, cryptographic strength, resistance to attacks, low power consumption, small area and easy system-level integration.
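The uniqueness property described above can be illustrated with a toy software model. The sketch below is not the lab's design: it assumes a simplified additive-delay arbiter PUF, in which each chip is represented by random stage weights standing in for manufacturing process variation. All names (`make_chip`, `response`, `inter_chip_distance`, `STAGES`) are hypothetical.

```python
import random

STAGES = 64  # hypothetical number of delay stages per PUF


def make_chip(seed):
    """Model one chip: random per-stage delay mismatches (process variation)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(STAGES + 1)]


def features(challenge):
    """Parity feature vector used by the additive arbiter-PUF model."""
    phi = [1.0] * (len(challenge) + 1)
    # phi[i] is the product of (1 - 2*c_j) for all j >= i; the last entry stays 1.
    for i in range(len(challenge) - 1, -1, -1):
        phi[i] = phi[i + 1] * (1 - 2 * challenge[i])
    return phi


def response(chip, challenge):
    """One response bit: the sign of the accumulated delay difference."""
    delta = sum(w * p for w, p in zip(chip, features(challenge)))
    return 1 if delta > 0 else 0


def inter_chip_distance(chip_a, chip_b, n_challenges=1000, seed=0):
    """Fraction of random challenges on which two chips answer differently."""
    rng = random.Random(seed)
    differing = 0
    for _ in range(n_challenges):
        challenge = [rng.randint(0, 1) for _ in range(STAGES)]
        differing += response(chip_a, challenge) != response(chip_b, challenge)
    return differing / n_challenges


if __name__ == "__main__":
    a, b = make_chip(1), make_chip(2)
    print("same chip:      ", inter_chip_distance(a, a))
    print("different chips:", inter_chip_distance(a, b))
```

In this model the same chip always reproduces its own responses, while two different chips should disagree on roughly half of the challenges. A real PUF also has to cope with noise, which this sketch leaves out.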
Enhancing Performance of On-Chip Cache for Multicore Architecture
Modern embedded systems have to be designed to keep pace with rapid advances in speed and technology. A multi-core processor is a single computing component with two or more independent processing units called “cores”, which are the units that read and execute program instructions. The instructions are ordinary CPU instructions such as add, move data, and branch, but the multiple cores can run multiple instructions at the same time, increasing overall speed for programs amenable to parallel computing. The most important concern in the design of low-power embedded applications is reducing the energy consumed by on-chip processor caches, as the on-chip cache consumes approximately 40% of the power fed to the processor. The size of the on-chip cache increases as feature sizes shrink, which in turn increases overall energy consumption. Multilevel cache memories such as L1, L2, L3 and L4 are introduced to minimize the consumption of energy. Cache energy consumption is therefore an important area of focus, and it has several contributing parts. It can be optimized by optimizing cache access, dynamically partitioning caches, reconfiguring caches and predicting tag access. Each of these areas contributes significantly to cache energy consumption, so optimizing any one of them reduces it.
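To make the idea of cache reconfiguration concrete, the following minimal sketch compares two configurations of the same capacity on one address trace. It assumes an LRU set-associative cache and a made-up trace. The function name `simulate` and all parameter values are hypothetical and do not describe the lab's cache design.

```python
from collections import OrderedDict


def simulate(trace, num_sets, ways, block_size=32):
    """Return the number of hits for an LRU set-associative cache."""
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in trace:
        block = addr // block_size
        index, tag = block % num_sets, block // num_sets
        lines = sets[index]
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)         # mark as most recently used
        else:
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True
    return hits


if __name__ == "__main__":
    # Two addresses that map to the same set in a 1 KB cache (assumed trace).
    trace = [0, 1024] * 4
    print(simulate(trace, num_sets=32, ways=1))  # direct-mapped: prints 0
    print(simulate(trace, num_sets=16, ways=2))  # 2-way: prints 6
```

With this trace the direct-mapped configuration misses on every access, while the 2-way configuration of equal capacity hits on everything after the first two accesses. An energy study would additionally weight hits and misses by per-access energy figures, which a tool such as CACTI can estimate.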
Design Space Exploration for Architectural Synthesis
System designers develop models in high-level languages such as C or C++, which offer higher levels of abstraction. This makes it easy to verify the model and its functionality, and it also makes the code reusable. To develop the corresponding hardware, hardware designers must analyze the high-level code and select suitable hardware for it. An important challenge in converting a high-level design to an equivalent hardware design is the large number of design possibilities that need to be considered. The design space usually involves multiple metrics of interest, such as timing, resource usage, energy usage and cost, and multiple design parameters, such as the number and type of processing cores, the sizes and organization of memories, the interconnect, and the scheduling and arbitration policies. The relation between design choices on the one hand and metrics of interest on the other is often very difficult to establish because of aspects such as concurrency, dynamic application behavior, and resource sharing. No single modeling approach or analysis tool can cope with all the challenges of modern hardware design. The architecture should be optimized to achieve the best trade-offs among the selected metrics of interest. This selection process is called Design Space Exploration, and it is iterative in nature. It covers a vast set of design choices and relies largely on the decisions of the architect. The number of design constructs for a particular design is huge, and the problem complexity grows exponentially with it. Design Space Exploration involves optimizing the design by selecting components that minimize or maximize the metrics of interest as needed. This optimization problem has multiple, often conflicting, objectives that need to be achieved during design. For example, the optimization problem may need to minimize power consumption under an execution time constraint, or vice versa.
What is needed is an exploration algorithm that matches the quality of a complete exploration of the design space within a practical execution time.
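The constrained optimization described above (for example, minimizing power under an execution time constraint) can be sketched with a brute-force search over a tiny design space. Everything in this sketch is an assumption made for illustration: the parameter lists, the toy cost models `exec_time` and `power`, and the function names are hypothetical, and exhaustive search is exactly the approach that stops being practical for realistic design spaces.

```python
from itertools import product

# Hypothetical design parameters and toy cost models (illustration only).
CORES = [1, 2, 4, 8]
CACHE_KB = [16, 32, 64]


def exec_time(cores, cache_kb):
    return 100.0 / cores + 200.0 / cache_kb  # arbitrary time units


def power(cores, cache_kb):
    return 2.0 * cores + 0.05 * cache_kb     # arbitrary power units


def explore(time_limit):
    """Exhaustively pick the lowest-power design that meets the time limit."""
    feasible = [(power(c, k), c, k)
                for c, k in product(CORES, CACHE_KB)
                if exec_time(c, k) <= time_limit]
    return min(feasible) if feasible else None


def pareto_front():
    """Designs that no other design beats in both time and power."""
    points = [(exec_time(c, k), power(c, k), c, k)
              for c, k in product(CORES, CACHE_KB)]
    front = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and
                        (q[0] < p[0] or q[1] < p[1]) for q in points)
        if not dominated:
            front.append(p)
    return front


if __name__ == "__main__":
    print("best under time limit 40:", explore(40))
    for t, p, c, k in sorted(pareto_front()):
        print(f"cores={c} cache={k}KB time={t:.2f} power={p:.2f}")
```

Under these toy models, the lowest-power design meeting a time limit of 40 uses 4 cores and a 16 KB cache. The Pareto front lists the trade-off points between which an architect would still have to choose.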
Implementation of RAVEL & GRUU Feature for IMS 3GPP Release-11
The IP Multimedia Subsystem (IMS) is a standardized 3GPP architectural framework for providing access-independent services to users. IMS carries all data in the packet-switched (PS) domain while maintaining service interoperability with the circuit-switched (CS) domain. IMS uses the Session Initiation Protocol (SIP) as the base for all its communication and signaling. The Globally Routable User Agent URI (GRUU) feature allows a specific UE instance to be uniquely identified in situations where multiple contacts are registered under the same public user identity. Provision therefore exists for both unicasting and multicasting of requests. GRUUs are created for universally unique identification of users.
Design and Implementation of Lightstor Protocol Controller
Lightstor is a new interface specification that allows host software to communicate with a non-volatile memory subsystem. The interface is optimized for client solid-state drives attached over a RapidIO fabric. It aims to build a comprehensive storage and backup system whose size and bandwidth can scale without limit. The Lightstor interface provides optimized command submission and completion paths. It also supports parallel operation through multiple I/O command queues. While other routable fabrics such as InfiniBand, or quasi-fabrics such as PCIe, can be used, the benefits of Lightstor are best demonstrated when RapidIO is used.
Design of a Multi-Core Interconnect for Scale-Out Workloads
Scale-out workloads are applications that are typically executed in a cloud environment and exhibit a high level of request-level parallelism. Such workloads benefit from processor organizations with very high core counts (on the order of hundreds to thousands), since multiple requests can be serviced simultaneously by threads running on these cores. These workloads have large instruction footprints, operate on large datasets with limited reuse, and have minimal coherence activity because little data is shared. Since most of the instructions will reside in the Last Level Cache (LLC) and will be actively shared by all the cores, reducing the latency to fetch a block of instruction words will improve the performance of these workloads and thereby the performance of the system as a whole. The focus of the current work is to minimize this latency through appropriate design of the network that interconnects the cores. In the paper "NOC-Out: Microarchitecting a Scale-Out Processor", Pejman Lotfi-Kamran et al. advocate separating the LLC tiles from the core slice and placing them in a separate part of the chip to reduce LLC access latency. The current work takes this approach, and a new network topology connecting cores, LLC slices and routers has been designed. In this design, four cores and an LLC slice connect to a router, forming a star topology, and the routers form a 2D flattened butterfly topology. The current design targets 8 cores and has been implemented in the Bluespec SystemVerilog HDL (Hardware Description Language). It has been synthesized using Xilinx Vivado 2013.2, targeting the Zynq-7000 product family of FPGA boards. The design has been tested for different amounts of offered traffic, and the average latency and throughput of the interconnection network have been calculated for a uniform random traffic pattern.
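The hop-count advantage of the topology described above can be sketched as follows. This is an illustrative model, not the Bluespec implementation: it assumes routers laid out on a grid, with each router in a 2D flattened butterfly linked directly to every router in its row and column. The 4x4 grid, the 2D mesh used for comparison and all function names are hypothetical.

```python
CORES_PER_ROUTER = 4  # from the text: four cores (and one LLC slice) per router


def router_of(core_id):
    """Router that a core attaches to in the star arrangement."""
    return core_id // CORES_PER_ROUTER


def coords(router_id, cols):
    """(row, column) of a router on a grid with `cols` columns."""
    return divmod(router_id, cols)


def fbfly_hops(src, dst, cols):
    """Router-to-router hops in a 2D flattened butterfly.

    Routers are fully connected within each row and each column,
    so every differing coordinate costs exactly one hop.
    """
    (r1, c1), (r2, c2) = coords(src, cols), coords(dst, cols)
    return (r1 != r2) + (c1 != c2)


def mesh_hops(src, dst, cols):
    """Router-to-router hops in a 2D mesh (Manhattan distance)."""
    (r1, c1), (r2, c2) = coords(src, cols), coords(dst, cols)
    return abs(r1 - r2) + abs(c1 - c2)


def average_hops(hop_fn, rows, cols):
    """Average hop count over all ordered pairs of distinct routers."""
    n = rows * cols
    pairs = [(s, d) for s in range(n) for d in range(n) if s != d]
    return sum(hop_fn(s, d, cols) for s, d in pairs) / len(pairs)


if __name__ == "__main__":
    print("flattened butterfly:", average_hops(fbfly_hops, 4, 4))
    print("mesh:               ", average_hops(mesh_hops, 4, 4))
```

In this model no pair of routers is more than two hops apart in the flattened butterfly, whatever the grid size. The price is more links per router, which is the kind of trade-off a latency and throughput evaluation has to weigh.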
List of Students
| S.No | Name | Roll No | Category | Year of Admission | Lab |
| 1. | Jobin Jose | 406913051 | PhD, Part Time | 2013 | RISE |
| 2. | Shameedha Begum | 406913001 | PhD, Part Time | 2013 | RISE |
| 3. | J. Kokila | 406114002 | PhD, DIETY Scheme, MHRD, Full Time | 2014 | RISE |
| 4. | B. Krishna Priya | 406114055 | PhD, Full Time, Institute | 2015 | RISE |