Based on my testing, the HTML5 pattern attribute supports Unicode code points in exactly the same way that JavaScript does (and does not):
- It only supports `\u` notation for Unicode code points, so `\u00a1` will match '¡'.
- Because these define characters, you can use them in character ranges like `[\u00a1-\uffff]`.
- `.` will match Unicode characters as well.
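To illustrate the points above, here is a small sketch in plain JavaScript, which behaved the same as the pattern attribute in my testing. The variable names and sample strings are just for illustration.

```javascript
// A \u escape denotes a single character, so it works alone or as a range endpoint.
const invertedExclamation = /^\u00a1$/;
const nonAsciiRange = /^[\u00a1-\uffff]+$/;

console.log(invertedExclamation.test("¡")); // true
console.log(nonAsciiRange.test("¡日本"));    // true: every character is in the range
console.log(nonAsciiRange.test("¡Hola"));   // false: "H" (U+0048) is below \u00a1
console.log(/^.$/.test("¡"));               // true: "." matches a non-ASCII character
```

In a pattern attribute you would write only the expression itself, without the slashes or the `^`/`$` anchors, since the whole value has to match anyway.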
You don't really specify how you want to pre-validate, so I can't help much more than that, but by looking up the Unicode code points of the characters you care about, you should be able to work out what you need in your regex.
Keep in mind that pattern validation is fairly basic overall and isn't universally supported. I recommend progressive enhancement: add some JavaScript on top of the pattern value (you can even reuse the regex more or less as-is).
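A minimal sketch of that kind of enhancement might look like the following. The function names (`compilePattern`, `matchesPattern`, `enhanceForm`) are hypothetical, and the behaviour on a failed match (block submission, focus the field) is just one possible choice. The regex is compiled without flags here; browsers may apply additional flags to the pattern attribute.

```javascript
// The pattern attribute must match the whole value, so wrap it in anchors.
function compilePattern(pattern) {
  return new RegExp("^(?:" + pattern + ")$");
}

// True when the value is empty (that case is left to `required`) or matches.
function matchesPattern(pattern, value) {
  if (value === "") return true;
  try {
    return compilePattern(pattern).test(value);
  } catch (e) {
    return true; // an invalid pattern is ignored, as browsers do
  }
}

// Hypothetical wiring: re-check every input[pattern] when the form is submitted.
function enhanceForm(form) {
  form.addEventListener("submit", function (event) {
    const inputs = form.querySelectorAll("input[pattern]");
    for (const input of inputs) {
      if (!matchesPattern(input.getAttribute("pattern"), input.value)) {
        event.preventDefault();
        input.focus();
        return;
      }
    }
  });
}
```

Reading the regex from the attribute keeps a single source of truth, so the markup and the script cannot drift apart.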
As always, never trust user input. It doesn't take a genius to make a request to your form endpoint and pass more or less whatever data they like, so your server-side validation should be the more explicit of the two. Your client-side validation can be more generous, depending on whether false positives or false negatives are more problematic for your use case.
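As a rough sketch of what "more explicit" could mean on the server, assuming a JavaScript backend: check the type, normalize, bound the length, and only then apply an allowlist. The function name `validateName`, the length limit, and the allowed character set are all illustrative assumptions, not something taken from the question.

```javascript
// Hypothetical server-side check for a single form field.
// MAX_LENGTH and ALLOWED are illustrative assumptions; adjust to your data.
const MAX_LENGTH = 64;
const ALLOWED = /^[A-Za-z\u00a1-\uffff][A-Za-z \u00a1-\uffff'-]*$/;

function validateName(raw) {
  // Request bodies can contain anything, so check the type first.
  if (typeof raw !== "string") {
    return { ok: false, reason: "not a string" };
  }
  // Normalize so composed and decomposed forms are treated alike.
  const value = raw.normalize("NFC").trim();
  if (value.length === 0 || value.length > MAX_LENGTH) {
    return { ok: false, reason: "bad length" };
  }
  if (!ALLOWED.test(value)) {
    return { ok: false, reason: "disallowed character" };
  }
  return { ok: true, value };
}
```

The client-side pattern can stay looser than this, since the server makes the final decision.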