PSU and Hitachi Jointly Develop Sentiment Analysis Engine for Thai Language
Achieves sentiment analysis of spoken language expressions unique to social media, based on a Thai language database containing around 100 million words
Tokyo, February 20, 2019 – Prince of Songkla University Phuket Campus (“PSU”; a national university whose headquarters are located in Hat Yai, Songkhla Province, Kingdom of Thailand, with five campuses in Southern Thailand) and Hitachi, Ltd. (TSE: 6501, “Hitachi”) have commenced a program of joint research in the field of Thai natural language processing*1. As the first phase of this research, both parties have jointly developed an AI-driven prototype sentiment analysis engine capable of classifying documents written in the Thai language into positive, negative, and neutral sentiment categories.
The engine analyses expressions using a sentiment dictionary refined with approximately 100 million words of Thai-language data gathered from various social media. Thai has numerous specialised spoken-language expressions and is known for being more difficult to process than other languages, but the engine can perform high-precision sentiment analysis that supports the spoken Thai expressions used on social media.
PSU and Hitachi will work towards practical application of the engine, jointly evaluating the prototype on real-time data posted on Facebook, Twitter and other social media, with plans to implement it as part of Hitachi’s Sentiment Analysis Service*2 in April 2019.
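As a rough illustration of what dictionary-based classification into positive, negative and neutral categories can look like, the following Python sketch sums word polarities from a lexicon. It is a minimal sketch under stated assumptions, not the PSU/Hitachi engine: the four-entry `TOY_LEXICON`, the `classify` function and the assumption that text has already been split into word tokens are all hypothetical.

```python
# Hypothetical toy lexicon: token -> polarity (+1 positive, -1 negative).
# The real dictionary described in this release is far larger and is not public here.
TOY_LEXICON = {
    "ดี": 1,      # good
    "อร่อย": 1,   # delicious
    "แย่": -1,    # bad
    "ช้า": -1,    # slow
}


def classify(tokens, lexicon=TOY_LEXICON):
    """Classify a list of word tokens by summing their lexicon polarities."""
    # Tokens missing from the lexicon contribute 0 to the score.
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Input is assumed to be pre-tokenised; Thai is written without spaces
    # between words, so a separate word-segmentation step would be needed.
    print(classify(["อาหาร", "อร่อย", "มาก"]))
    print(classify(["บริการ", "ช้า"]))
```

A plain polarity sum ignores negation and context, which is one reason a production engine would need more than a lexicon lookup.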
In Thailand, mobile device use is spreading rapidly, and social media penetration is high across the country. Thailand has a population of around 69 million, of whom approximately 50 million are Facebook users and 12 million are Twitter users*3. There is demand for products and services based on the messaging patterns unique to Thai social media users. However, the Thai language has numerous unique spoken-language expressions, and many users make frequent use of non-standard spellings, newly created onomatopoeic words and emoticons, which can be difficult to process. Because of these challenges, a large amount of data pre-processing is required to compensate for inconsistencies in how expressions are written.
PSU has closely analysed around 100 million words of Thai-language data gathered from various social media and used it to construct a large-scale, highly accurate Thai sentiment dictionary. This work has made PSU one of the leading research institutions in the field of Thai language processing.
In October 2018, Hitachi began offering its Sentiment Analysis Service, a technology capable of classifying customer voices (gathered from Japanese-language media, conversation records and various other sources) into around 1,300 topics, feelings and intentions, and visualising the results.
The engine is the first product of the collaboration between PSU and Hitachi. It draws on the results of PSU’s research and on the system architecture design expertise and data processing technologies that Hitachi has developed through its large-scale systems development activities. By combining hybrid noise-removal functionality (which uses machine learning together with several other technologies) with PSU’s large-scale, highly accurate sentiment dictionary, the engine is able to analyse Thai language on social media while handling a diverse range of unique expressions and spellings.
PSU and Hitachi will continue working together to validate the performance of the engine on real-time data and to further refine analysis accuracy (such as by providing support for seven-stage sentiment evaluation*4), and aim to commence service provision through Hitachi in April 2019. Beyond that, both parties will continue joint development with the aim of further enhancing the analysis engine, such as by adding sentiment judgements that take context into account.
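To make the pre-processing problem concrete, the sketch below shows one simple normalisation step: collapsing characters that are repeated for emphasis and stripping URLs before any dictionary lookup. This is an illustrative assumption about what such pre-processing might include, not a description of the actual pipeline; the `normalise` function and both regular expressions are hypothetical.

```python
import re

# Hypothetical rules, chosen only for illustration.
URL_RE = re.compile(r"https?://\S+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # the same character three or more times in a row


def normalise(text: str) -> str:
    """Strip URLs and collapse emphatic character repetition."""
    text = URL_RE.sub(" ", text)
    # A run such as "กกกกก" is reduced to a single "ก".
    text = REPEAT_RE.sub(r"\1", text)
    # Collapse any whitespace left behind by the removals.
    return " ".join(text.split())


if __name__ == "__main__":
    print(normalise("ดีมากกกกก"))
```

A rule this blunt has obvious limits: it would also shorten deliberate repetitions such as runs of the digit 5, which Thai users commonly type to indicate laughter, so a real system would need more careful, likely learned, handling.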
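Seven-stage evaluation, as defined in note *4, uses three positive stages, three negative stages and one neutral stage. One possible way to picture this is as a mapping from a continuous sentiment score to an integer from -3 to +3. The sketch below assumes a score in the range -1.0 to 1.0 and uses hypothetical thresholds; neither the score range nor the cut-off values come from PSU or Hitachi.

```python
def seven_stage(score: float) -> int:
    """Map a score in [-1.0, 1.0] to a stage from -3 (most negative) to +3 (most positive)."""
    # Clamp out-of-range scores so the mapping is always defined.
    score = max(-1.0, min(1.0, score))
    magnitude = abs(score)
    # Hypothetical thresholds: a narrow neutral band, then three equal-width bands.
    if magnitude < 0.1:
        return 0
    if magnitude < 0.4:
        level = 1
    elif magnitude < 0.7:
        level = 2
    else:
        level = 3
    return level if score > 0 else -level


if __name__ == "__main__":
    for s in (-0.9, -0.2, 0.0, 0.5, 1.0):
        print(s, seven_stage(s))
```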
*1 Computer processing of language that is typically used for everyday communication (i.e. natural language).
*3 As of January 2019. Source: StatCounter “Social Media Stats Thailand”
*4 Seven-stage evaluation consists of 3 positive stages, 3 negative stages and one neutral stage, for a total of seven stages.
About Hitachi’s Sentiment Analysis Service
The sentiment analysis engine described in this release will be offered as the Thai language version of Hitachi’s Sentiment Analysis Service from April 2019. Using Lumada Center Southeast Asia (established at Thailand’s Amata City Chonburi Industrial Estate) as a local hub for introducing the service, Hitachi Asia (Thailand) Co., Ltd. will take a central role in deploying and offering the service to a wide range of customers, including both local branches of Japan-based companies and local Thai companies, in areas such as automotive manufacturing, hotels, hospitals, banks and public institutions.
About Hitachi, Ltd.
Hitachi, Ltd. (TSE: 6501), headquartered in Tokyo, Japan, delivers innovations that answer society’s challenges, combining its operational technology, information technology, and products/systems. The company’s consolidated revenues for fiscal 2017 (ended March 31, 2018) totaled 9,368.6 billion yen ($88.4 billion). The Hitachi Group is an innovation partner for the IoT era, and it has approximately 307,000 employees worldwide. Through collaborative creation with customers, Hitachi is deploying Social Innovation Business using digital technologies in a broad range of sectors, including Power/Energy, Industry/Distribution/Water, Urban Development, and Finance/Social Infrastructure/Healthcare. For more information on Hitachi, please visit the company’s website at http://www.hitachi.com.