Conference Papers
X-Attack: Remote Activation of Satisfiability Don’t-Care Hardware Trojans on Shared FPGAs. 2020-08-31. The International Conference on Field-Programmable Logic and Applications (FPL), August 31 - September 4, 2020.
Are Cloud FPGAs Really Vulnerable to Power Analysis Attacks? 2020-03-09. Design, Automation and Test in Europe (DATE), Grenoble, France, March 9-13, 2020.
Optimus Prime: Accelerating Data Transformation in Servers. 2020. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, March 16–20, 2020. p. 1203-1216. DOI : 10.1145/3373376.3378501.
A machine learning approach for power gating the FPGA routing network. 2019-12-11. 2019 International Conference on Field-Programmable Technology (ICFPT), Tianjin, China, December 9-13, 2019. p. 10-18.
FPGA-Assisted Deterministic Routing for FPGAs. 2019-05-20. 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, May 20-24, 2019. p. 155-162. DOI : 10.1109/IPDPSW.2019.00034.
RPCValet: NI-Driven Tail-Aware Balancing of µs-Scale RPCs. 2019-04-15. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '19, Providence, Rhode Island, USA, April 13-17, 2019. p. 35-48. DOI : 10.1145/3297858.3304070.
Timing Violation Induced Faults in Multi-Tenant FPGAs. 2019-03-25. Design, Automation & Test in Europe Conference & Exhibition (DATE), Florence, Italy, Mar 25-29, 2019. p. 1745-1750.
Linebacker: Preserving Victim Cache Lines in Idle Register Files of GPUs. 2019-01-01. 46th International Symposium on Computer Architecture (ISCA), Phoenix, AZ, Jun 22-26, 2019. p. 183-196. DOI : 10.1145/3307650.3322222.
SMoTherSpectre: Exploiting Speculative Execution through Port Contention. 2019. The 26th ACM Conference on Computer and Communications Security - ACM CCS 2019, London, UK, November 11-15, 2019.
Stretch: Balancing QoS and Throughput for Colocated Server Workloads on SMT Cores. 2019-01-01. 25th IEEE International Symposium on High Performance Computer Architecture (HPCA), Washington, DC, Feb 16-20, 2019. p. 15-27. DOI : 10.1109/HPCA.2019.00024.
Training DNNs with Hybrid Block Floating Point. 2018-12-04. NeurIPS 2018 - Neural Information Processing Systems, Montréal, Canada, December 2-8, 2018.
Towards Commoditizing Simulations of System Models Using Recurrent Neural Networks. 2018-01-01. IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Aalborg, Denmark, Oct 29-31, 2018.
LTRF: Enabling High-Capacity Register Files for GPUs via Hardware/Software Cooperative Register Prefetching. 2018. Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '18, Williamsburg, VA, USA, March 24th – March 28th, 2018. p. 489-502. DOI : 10.1145/3173162.3173211.
Parallel FPGA routing: Survey and challenges. 2017-01-01. 2017 27th International Conference on Field Programmable Logic and Applications (FPL), Ghent, Belgium, September 4-8, 2017. p. 1-8. DOI : 10.23919/FPL.2017.8056782.
Near-Memory Address Translation. 2017. 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), Portland, OR, Sep 09-13, 2017. p. 303-317. DOI : 10.1109/Pact.2017.56.
Unlocking Energy. 2016. 2016 USENIX Annual Technical Conference, Denver, Colorado, USA, June 22-24, 2016. p. 393-406.
An Analysis of Load Imbalance in Scale-out Data Serving. 2016. ACM SIGMETRICS, Antibes Juan-Les-Pins, France, June 14-18, 2016.
Towards Near-Threshold Server Processors. 2016. Design, Automation and Test in Europe Conference (DATE '16), Dresden, Germany, March 14-18, 2016. p. 7-12.
Sort vs. Hash Join Revisited for Near-Memory Execution. 2015. 5th Workshop on Architectures and Systems for Big Data (ASBD 2015), Portland, Oregon, USA, June 13, 2015.
Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache. 2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, UK, December 13-17, 2014.
BuMP: Bulk Memory Access Prediction and Streaming. 2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, December 13-17, 2014.
FADE: A Programmable Filtering Accelerator for Instruction-Grain Monitoring. 2014. 20th IEEE International Symposium On High Performance Computer Architecture (HPCA-2014), Orlando, Florida, USA, February 15-19, 2014.
Die-Stacked DRAM Caches for Servers: Hit Ratio, Latency, or Bandwidth? Have It All with Footprint Cache. 2013. 40th International Symposium on Computer Architecture, Tel-Aviv, Israel, June 23-27, 2013.
From A to E: Analyzing TPC’s OLTP Benchmarks -- The obsolete, the ubiquitous, the unexplored. 2013. 16th International Conference on Extending Database Technology, Genoa, Italy, March 18-22, 2013. p. 17-28.
Dark Silicon Accelerators for Database Indexing. 2012. 1st Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.
CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers. 2012. 6th International Symposium on Networks-on-Chip, Lyngby, Denmark, May 9-11, 2012.
Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware. 2012. Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, London, UK, March 3-7, 2012.
Reliability in the Dark Silicon Era. 2011. 17th IEEE International On-Line Testing Symposium (IOLTS), Athens, Greece, Jul 13-15, 2011. p. V-V.
Proactive Instruction Fetch. 2011. 44th Annual IEEE/ACM Symposium on Microarchitecture (MICRO 2011), Porto Alegre, Brazil, December 3-7.
CCNoC: On-Chip Interconnects for Cache-Coherent Manycore Server Chips. 2011. Workshop on Energy-Efficient Design (WEED 2011), San Jose, California, USA, June 5, 2011.
Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors. 2009. p. 195-201.
Spatio-Temporal Memory Streaming. 2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 69-80.
Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches. 2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 184-195.
ReSim, a Trace-Driven, Reconfigurable ILP Processor Simulator. 2009. DATE 2009, Nice, France, April 20-24, 2009.
A Rate-based Prefiltering Approach to BLAST Acceleration. 2008. International Conference on Field Programmable Logic and Applications (FPL), Heidelberg, Germany, September 08-10, 2008.
Stall Power Reduction in Pipelined Architecture Processors. 2008. 21st International Conference on VLSI Design, Hyderabad, January 4-8, 2008. p. 541-546. DOI : 10.1109/VLSI.2008.34.
BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs to Avoid Congestion. 2008. Design, Automation and Test in Europe, 2008. DATE '08, Munich, March 10-14, 2008. p. 1408-1413. DOI : 10.1109/DATE.2008.4484871.
A Complexity-Effective Architecture for Accelerating Full-System Multiprocessor Simulations Using FPGAs. 2008. The 16th International ACM/SIGDA Symposium on Field Programmable Gate Arrays (FPGA), Monterey, CA, February.
A UML Based System Level Failure Rate Assessment Technique for SoC Designs. 2007. 25th IEEE VLSI Test Symposium, Berkeley, May 6-10, 2007. p. 243-248. DOI : 10.1109/VTS.2007.9.
An Analysis of Database System Performance on Chip Multiprocessors. 2007. Athens, Greece, July.
PROTOFLEX: FPGA-accelerated hybrid functional simulator. 2007. Long Beach, CA, March. DOI : 10.1109/IPDPS.2007.370516.
To Share or Not To Share? 2007. 33rd International Conference on Very Large Data Bases, Vienna, Austria, September. p. 351-362.
Mechanisms for store-wait-free multiprocessors. 2007. San Diego, CA, June. p. 266-277.
Database Servers on Chip Multiprocessors: Limitations and Opportunities. 2007. Asilomar, CA, January.
ProtoFlex: Co-simulation for Component-wise FPGA Emulator Development. 2006. Austin, TX, February.
The Granularity of Soft-Error Containment in Shared-Memory Multiprocessors. 2006. Urbana-Champaign, IL, April.
Simulation sampling with live-points. 2006. Austin, TX, March. p. 2-12.
Spatial Memory Streaming. 2006. Boston, MA, June 17-21, 2006. p. 252-263.
TED+: A Data Structure for Microprocessor Verification. 2005. Asia and South Pacific Design Automation Conference, Shanghai, January 18-21, 2005. p. 567-572. DOI : 10.1109/ASPDAC.2005.1466228.
Accelerating Database Operations Using a Network Processor. 2005. Baltimore, USA, June.
Temporal Streaming of Shared Memory. 2005. Madison, WI, June 4-8, 2005. p. 222-233.
An Evaluation of Stratified Sampling of Microarchitecture Simulations. 2004. Munich, Germany, June.
Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth. 2004. Boston, MA, October.
Efficient resource sharing in concurrent error detecting superscalar microarchitectures. 2004. Portland, OR, December. p. 257-268.
Accurate and complexity-effective spatial pattern prediction. 2004. Madrid, Spain, February. p. 276-287.
Performance and Energy Trade-Offs of Bitline Isolation in Nanoscale CMOS Caches. 2003. San Diego, CA, June.
Gated Precharge: Using Temporal Locality of Subarrays to Save Deep-Submicron Cache Energy. 2002. Anchorage, AK, May.
Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery. 2001. 34th Annual IEEE/ACM International Symposium on Microarchitecture, Austin, Texas, December 1-5, 2001. p. 214-224.
Reducing set-associative cache energy via way-prediction and selective direct-mapping. 2001. Monterrey, Mexico, January 20-24, 2001. p. 54-65.
JETTY: Filtering snoops for reduced energy consumption in SMP servers. 2001. Monterrey, Mexico, January. p. 85-96.
Reference idempotency analysis: A framework for optimizing speculative execution. 2001. Snowbird, Utah, USA, June. p. 2-11.
Low-Overhead and High-Performance Implementations of Sequential Consistency. 2000. Vancouver, BC, June.
Address partitioning in DSM clusters with parallel coherence controllers. 2000. Philadelphia, PA, October. p. 47-56.
Comparing the effectiveness of fine-grain memory caching against page migration/replication in reducing traffic in DSM clusters. 2000. Bar Harbor, ME, July. p. 79-88.
Wisconsin Wind Tunnel II: A Fast and Portable Parallel Architecture Simulator. 1997. Denver, CO, June.
Reactive NUMA: A design for unifying S-COMA and CC-NUMA. 1997. Denver, CO, June. p. 229-240.
Coherent network interfaces for fine-grain communication. 1996. Philadelphia, PA, May. p. 247-258.
Cost/performance of a parallel computer simulator. 1994. Edinburgh, Scotland, July. p. 173-182.
Application-specific protocols for user-level shared memory. 1994. Supercomputing '94, Washington D.C., USA, November 14-18. p. 380-389. DOI : 10.1109/SUPERC.1994.344301.
Kernel support for the Wisconsin Wind Tunnel. 1993. San Diego, CA, January. p. 73-89.
Component Labeling Algorithms on an Intel iPSC/2 Hypercube. 1990. Charleston, SC, April. p. 159-164.