hardware-optimization
an archive of posts in this category
Date | Post |
---|---|
Aug 06, 2023 | FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning |
Mar 28, 2023 | FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness |
Feb 28, 2023 | Applying TensorRT on Jetson Nano |
Jul 13, 2022 | Quantization and inference speed |
Jul 12, 2022 | Applying TensorRT to PyTorch |
Jul 11, 2022 | Applying Quantization to PyTorch |