Udacity AI for Trading - Project 6: Analyzing Stock Sentiment from Twits (Sentiment Analysis) needs HTML parsing. The examples come from the actual course exercise.
Future improvements
I could identify each /texttext/ pattern, capture it as a group, and parse each one if necessary ← this may have some value in the future, but I am not sure about it
I could identify whether the text resulted from a search pattern ?s=SSS (SSS ← Some Stock Symbol), that is, whether or not the user was looking for something about a specific stock, as a future improvement to training
I could identify the key=9* pattern, so I know the information was retrieved by a user logged into a site and not from a general search ← perhaps I should give more weight to such messages
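The three ideas above could be sketched roughly as follows. This is only an illustration, not part of the course exercise: the function name url_features and the returned field names are hypothetical, and the key=9* idea is simplified here to "a key parameter is present in the query string", which is an assumption about what that pattern means.

```python
from urllib.parse import urlsplit, parse_qs


def url_features(url):
    """Extract a few illustrative features from a single URL (hypothetical helper)."""
    # Lowercase first, mirroring the text.lower() step applied before matching.
    parts = urlsplit(url.lower())
    query = parse_qs(parts.query)
    return {
        # Every non-empty segment of the path, e.g. /quote/news/ -> ["quote", "news"].
        "path_segments": [seg for seg in parts.path.split("/") if seg],
        # The symbol searched for via ?s=SSS, or None when there is no s parameter.
        "searched_symbol": query.get("s", [None])[0],
        # Assumption: any key=... parameter marks a logged-in user.
        "has_key": "key" in query,
    }


features = url_features("https://finance.example.com/quote/news?s=AAPL&key=9123")
print(features)
```

Whether these features actually help the sentiment model would still have to be checked during training.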
Note: because I call text.lower() BEFORE processing this pattern, I can improve performance by simply leaving CAPITAL letters out of the character classes. So you can use something like:
patt_url = r"https?:\/\/(www\.)?[-a-z0-9@:%._\+~#=]{1,256}\.[a-z0-9()]{1,6}(\?s=|\/)[a-z0-9()-.?_=&;#\/]{1,256}"
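A minimal sketch of how the pattern might be applied, assuming the goal is to remove URLs from a message before tokenizing it. The function name strip_urls and the placeholder argument are hypothetical. The pattern is written as a raw string so that Python does not try to interpret the backslashes itself.

```python
import re

# The lowercase-only URL pattern from the note, compiled once for reuse.
PATT_URL = re.compile(
    r"https?:\/\/(www\.)?[-a-z0-9@:%._\+~#=]{1,256}\.[a-z0-9()]{1,6}"
    r"(\?s=|\/)[a-z0-9()-.?_=&;#\/]{1,256}"
)


def strip_urls(text, placeholder=" "):
    """Lowercase the message, replace matching URLs, and collapse whitespace."""
    lowered = text.lower()  # must happen BEFORE matching: the pattern has no A-Z
    cleaned = PATT_URL.sub(placeholder, lowered)
    return " ".join(cleaned.split())


print(strip_urls("Check $AAPL https://www.Example.com/News/Apple-Q3 now"))
```

Note that the pattern requires either ?s= or a / after the domain, followed by at least one more character, so a bare address such as https://example.com is left in the text untouched.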