Replies: 2 comments
-
Hi! I'm in a similar boat. I think that would make sense. The main selling points of TE are specialised kernels and FP8 support, right? Since the only computation in the embedding layer is indexing, I'm not sure TE would have any fancy kernel for it. As for FP8 support, the two aspects are storage and computation. I don't think TE supports FP8 storage, and again there isn't much computation going on. Perhaps I'm missing some advantage of TE modules?
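To illustrate what I mean about the embedding layer only doing indexing, here's a rough sketch in plain PyTorch (the sizes are just placeholders): the module forward is the same as indexing rows of the weight matrix, so there is no GEMM for a specialised kernel to accelerate.

```python
import torch

# Illustrative sizes only, not from the thread.
vocab_size, hidden_size = 32000, 1024
embedding = torch.nn.Embedding(vocab_size, hidden_size)

token_ids = torch.randint(0, vocab_size, (2, 8))  # (batch, seq_len)

out_module = embedding(token_ids)        # module forward
out_index = embedding.weight[token_ids]  # plain row lookup, same values

assert torch.equal(out_module, out_index)
```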
-
There are a few challenges with FP8 embedding layers:
-
I want to port my end-to-end transformer implementation to TE, but I'm missing the embedding layer; everything else is available here (Linear, Softmax, MHA, MLP). Is this expected, or is it out of scope? Should I use raw PyTorch embedding layers for token and position embeddings?
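Concretely, is something like the following the intended pattern? This is only a rough sketch of what I have in mind, assuming an FP8-capable GPU; the layer sizes, module choices, and recipe are placeholders, not anything taken from the TE docs.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling

# Illustrative sizes only.
vocab_size, max_seq_len, hidden = 32000, 2048, 1024

# Embeddings stay as plain PyTorch modules: the forward pass is just a lookup.
tok_emb = torch.nn.Embedding(vocab_size, hidden).cuda()
pos_emb = torch.nn.Embedding(max_seq_len, hidden).cuda()

# Compute-heavy parts use TE modules (created on CUDA by default).
proj = te.Linear(hidden, hidden, bias=True)
mlp = te.LayerNormMLP(hidden, 4 * hidden)

token_ids = torch.randint(0, vocab_size, (4, 128), device="cuda")  # (batch, seq)
positions = torch.arange(128, device="cuda").unsqueeze(0)

# Embedding lookups happen outside the FP8 region.
x = tok_emb(token_ids) + pos_emb(positions)

# FP8 is only enabled for the TE modules (requires FP8-capable hardware).
fp8_recipe = DelayedScaling()
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    x = proj(x)
    x = mlp(x)
```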