How to Extract a Twitter Profile URL (But Not Status URL) with a Regex
I recently added support to my site for replacing Twitter profile URLs with @-usernames, so that a note reads correctly when it is syndicated to Twitter. However, because this lives in the site's theme, it depends entirely on what is used to render my site. This is less than ideal: if I were to switch themes in the future, it would need to be reimplemented.
On Thursday, I noticed that the regex I'd been using hadn't quite worked:
#WiTNotts is kicking off with the wonderful @anna_hax and https://twitter.com/CarolSaysThings (https://t.co/OuhIIDsBjO) pic.twitter.com/erJejo6Un3
— Jamie Tanna | www.jvt.me (@JamieTanna) February 6, 2020
In this case the URL wasn't caught because the regex didn't handle a URL at the end of a line. With both of these reasons in mind, I set out to rewrite it.
My goal was to match only profile URLs, e.g. https://twitter.com/JamieTanna, not status URLs such as https://twitter.com/JamieTanna/status/1225494506558164992.
The regex I've come up with is:
(https:\/\/twitter\.com\/(?![a-zA-Z0-9_]+\/)([a-zA-Z0-9_]+))
In my case, I've implemented this with a negative lookahead, which rejects the whole match if the username is followed by a /, as that would indicate it's a status URL.
You can see it in action at regexr.com/4tsfr.
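For anyone who would rather try this outside of regexr, here is a minimal Python sketch of the same idea. It uses the pattern from this post with the dot in twitter.com escaped, and the helper name replace_profile_urls is hypothetical, made up for illustration rather than taken from my site's code.

```python
import re

# Profile-URL pattern: the negative lookahead rejects any username that is
# immediately followed by "/", which is what a status URL looks like.
# The dot in "twitter.com" is escaped so it only matches a literal dot.
PROFILE_URL = re.compile(
    r"(https:\/\/twitter\.com\/(?![a-zA-Z0-9_]+\/)([a-zA-Z0-9_]+))"
)


def replace_profile_urls(text: str) -> str:
    """Swap each Twitter profile URL for @username (hypothetical helper)."""
    # Group 2 captures just the username part of the URL.
    return PROFILE_URL.sub(r"@\2", text)


note = "kicking off with https://twitter.com/CarolSaysThings"
status = "https://twitter.com/JamieTanna/status/1225494506558164992"

print(replace_profile_urls(note))    # profile URL at end of line is replaced
print(replace_profile_urls(status))  # status URL is left untouched
```

Because the lookahead is checked before the username is captured, a status URL produces no match at all, rather than a partial match on the username.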