So, I’ve heard that ML manipulates tokens and specifically for the English corpora they take place of words. If we want model to be polite and not to speak uncomfortable language we can remove certain words from the internal array where all tokens and their associative data are stored, for example “fuck”.
You can.
With OpenAI for instance, you can modify the probability of a token to be output by setting its logit bias, as described here: https://platform.openai.com/docs/api-reference/chat/create#logit_bias
By setting it to -100 or +100 you can effectively ban or force it.