An architecture for Malay Tweet normalization
Authors:
Highlights:
• To observe the features of Malay Tweets, three distinct corpus-based analyses are carried out.
• A rule-based architecture is developed from the results of these analyses.
• The architecture consists of seven distinct modules arranged in a pipeline.
• Experimental results indicate high accuracy in terms of BLEU score.
• The architecture outperforms an SMT-like (statistical machine translation) normalization approach.
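The highlights describe a rule-based normalizer built as a pipeline of modules, each passing its output to the next. The seven modules themselves are not listed in this record, so the sketch below is only a hypothetical illustration of the pipeline structure: the two modules (`reduce_repeats`, `lexicon_lookup`) and the toy lexicon `TOY_LEXICON` are assumptions made for illustration, not the authors' components.

```python
import re
from typing import Callable, Dict, List

# A module takes a list of tokens and returns a (possibly rewritten) list of tokens.
Module = Callable[[List[str]], List[str]]


def tokenize(text: str) -> List[str]:
    # Plain whitespace tokenization; a real tweet normalizer would need
    # tweet-aware handling of hashtags, mentions, URLs and emoticons.
    return text.split()


def reduce_repeats(tokens: List[str]) -> List[str]:
    # Hypothetical rule: collapse any run of 3+ identical characters
    # to a single character (e.g. "sukaaaa" -> "suka").
    return [re.sub(r"(.)\1{2,}", r"\1", t) for t in tokens]


# Hypothetical toy lexicon mapping noisy spellings to standard Malay forms.
TOY_LEXICON: Dict[str, str] = {"x": "tidak", "yg": "yang", "sy": "saya"}


def lexicon_lookup(tokens: List[str]) -> List[str]:
    # Hypothetical rule: replace a token if its lowercase form is in the lexicon.
    return [TOY_LEXICON.get(t.lower(), t) for t in tokens]


def run_pipeline(text: str, modules: List[Module]) -> str:
    # Apply each module in order; the output of one is the input of the next.
    tokens = tokenize(text)
    for module in modules:
        tokens = module(tokens)
    return " ".join(tokens)


if __name__ == "__main__":
    print(run_pipeline("sy x sukaaaa", [reduce_repeats, lexicon_lookup]))
```

Running the example prints `saya tidak suka`. Keeping each rule in its own module, as this sketch does, makes it easy to add, remove, or reorder steps and to measure each step's contribution separately.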
Keywords: Malay, Twitter, Text normalization, Noisy text
Publication history: Received 10 October 2013, Revised 25 April 2014, Accepted 28 April 2014, Available online 24 May 2014.
Paper URL: https://doi.org/10.1016/j.ipm.2014.04.009