At SenticNet, we are working on several projects spanning from fundamental knowledge representation problems to applications of commonsense reasoning in contexts such as big social data analysis and human-computer interaction. Each project is led by a single member of the Sentic Team but, in fact, all projects are highly interdependent and interconnected, as we are all driven by the same vision. Some of the current projects include:
• Multimodal Affective Computing Initiative (Lead Investigator: Soujanya Poria)
• SeNTU (Lead Investigator: Iti Chaturvedi)
• PrimeNet (Lead Investigator: Rajiv Bajpai)
• Big Social Data Analysis (Lead Investigator: Sandro Cavallari)
• Natural Language based Financial Forecasting (Lead Investigator: Frank Xing)
• Smart Cities (Lead Investigator: Yukun Ma)
• One Belt, One Road, One Sentiment? (Lead Investigator: Haiyun Peng)
• PONdER (Lead Investigator: Ranjan Satapathy)
• Dialogue Systems (Lead Investigator: Tom Young)
• Mood of Singapore (Lead Investigator: Claudia Guerreiro)
• Twittener (Lead Investigator: Owen Fernando)
• Singlish SenticNet (Lead Investigator: Danyuan Ho)
MULTIMODAL AFFECTIVE COMPUTING INITIATIVE
This initiative was born from a collaboration between NTU and CMU (Prof Louis-Philippe Morency's group) and gravitates around techniques, tools, and applications of affective computing, an emerging field of research that aims to enable intelligent systems to recognize, feel, infer and interpret human emotions. It is an interdisciplinary field that spans from computer science to psychology, and from social science to cognitive science. Though sentiment analysis and emotion recognition are two distinct research topics, they are conjoined under the field of affective computing. Over the past two decades or so, AI researchers have been attempting to endow machines with cognitive capabilities to recognize, interpret and express emotions and sentiments. All such efforts can be attributed to affective computing research.
Video opinions provide multimodal data in terms of vocal and visual modalities. The vocal modulations of opinions and facial expressions in the visual data, along with textual data, can provide important cues to better identify the true affective states of the opinion holder. Thus, a combination of text and video data can help create a better emotion and sentiment analysis model. These videos often contain comparisons of products from competing brands, the pros and cons of product specifications, etc., which can aid prospective buyers in making an informed decision. The aim of multi-sensor data fusion is to increase the accuracy and reliability of estimates. Many applications, e.g., navigation tools, have already demonstrated the potential of data fusion. This shows the importance and feasibility of developing a multimodal framework that can cope with all three sensing modalities (text, audio, and video) in human-centric environments.
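One simple way to picture data fusion across the three modalities is decision-level (late) fusion, where each modality produces its own polarity estimate and the estimates are then combined. The sketch below is illustrative only: the `ModalityEstimate` type, the confidence weights, and the example scores are assumptions made for this example and do not describe the fusion technique actually used in the initiative.

```python
from dataclasses import dataclass


@dataclass
class ModalityEstimate:
    """Polarity estimate from one modality (all values are illustrative)."""
    name: str
    polarity: float    # in [-1.0, 1.0]; negative means negative sentiment
    confidence: float  # non-negative weight; 0 means "no usable signal"


def fuse_polarity(estimates):
    """Confidence-weighted average of per-modality polarity scores."""
    total_weight = sum(e.confidence for e in estimates)
    if total_weight == 0:
        return 0.0  # no modality produced a usable signal: stay neutral
    return sum(e.polarity * e.confidence for e in estimates) / total_weight


# Hypothetical case: mildly positive words, but negative tone and expression.
estimates = [
    ModalityEstimate("text", polarity=0.2, confidence=0.5),
    ModalityEstimate("audio", polarity=-0.6, confidence=0.8),
    ModalityEstimate("video", polarity=-0.4, confidence=0.7),
]
fused = fuse_polarity(estimates)
print(f"fused polarity: {fused:+.2f}")
```

In this toy case the audio and visual cues outweigh the text, so the fused estimate is negative even though the transcript alone looks mildly positive.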
SENTU
SeNTU is a sentiment analysis project based at NTU that involves several students and team members. The main goal of the project is to explore how different paradigms, e.g., machine learning, linguistics, and knowledge representation, can improve the performance of sentiment analysis. The framework consists of an ensemble of supervised and unsupervised approaches. In a pre-processing phase, text is normalized and tokenized, and POS tags are identified. Later, dictionaries are applied to recognize and process emoticons as well as other microtext items such as acronyms and abbreviations. Next, a semantic parser is used to deconstruct natural language text into concepts, and linguistic patterns are used together with SenticNet to infer polarity from sentences. If no match is found in SenticNet or no pattern is triggered, machine learning is used.
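The control flow described above (microtext normalization, knowledge-based polarity inference, machine learning as a fallback) can be sketched as follows. This is a minimal illustration under stated assumptions: `TOY_LEXICON` is a hypothetical stand-in for SenticNet, the single negation rule stands in for the linguistic patterns, and `fallback_classifier` is a placeholder rather than a trained model.

```python
import re

# Hypothetical stand-in for SenticNet: concept -> polarity in [-1, 1].
TOY_LEXICON = {"good": 0.7, "great": 0.8, "bad": -0.7}
# Hypothetical microtext dictionary (emoticons, abbreviations).
MICROTEXT = {":)": "good", ":(": "bad", "gr8": "great"}
NEGATORS = {"not", "never", "no"}


def normalize(text):
    """Lowercase, expand microtext, strip punctuation, and tokenize."""
    tokens = []
    for raw in text.lower().split():
        token = MICROTEXT.get(raw, raw)            # expand emoticons/abbreviations
        token = re.sub(r"[^a-z0-9_']", "", token)  # drop leftover punctuation
        if token:
            tokens.append(token)
    return tokens


def knowledge_based_polarity(tokens):
    """Return the mean lexicon polarity, or None if nothing matches."""
    scores = []
    for i, token in enumerate(tokens):
        if token in TOY_LEXICON:
            score = TOY_LEXICON[token]
            # Toy linguistic pattern: a negator right before a concept flips it.
            if i > 0 and tokens[i - 1] in NEGATORS:
                score = -score
            scores.append(score)
    return sum(scores) / len(scores) if scores else None


def fallback_classifier(tokens):
    """Placeholder for a trained model; here it always predicts neutral."""
    return 0.0


def classify(text):
    """Try the knowledge-based route first, then fall back to the classifier."""
    tokens = normalize(text)
    polarity = knowledge_based_polarity(tokens)
    if polarity is not None:
        return polarity, "knowledge"
    return fallback_classifier(tokens), "machine-learning"


print(classify("The food was not good :("))
print(classify("Arrived on Tuesday"))
```

The second element of the returned pair records which route produced the answer, which makes it easy to see how often the fallback is triggered.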
An early version of the framework participated in one of the sentiment analysis tasks of the 2015 edition of the International Workshop on Semantic Evaluation (SemEval). Today, the framework includes many more NLP modules, such as subjectivity detection, to filter out neutral content; temporal tagging, for time expression analysis and recognition; named-entity recognition, to locate and classify named entities into pre-defined categories; personality recognition, for distinguishing between different personality types of users; sarcasm detection, to detect and handle sarcasm in opinions; and aspect extraction, for enabling aspect-based sentiment analysis.
PRIMENET
This collaboration with A*STAR IHPC (Dr Kenneth Kwok's group) aims to set out a framework for a scalable knowledge base which allows for efficient processing, to meet the demands for commonsense and common knowledge to support intelligent machine performance in real-world tasks. On-going efforts to codify world knowledge into a machine-readable knowledge base (KB) include general knowledge as in Google’s Freebase and Knowledge Graph, Microsoft’s Satori and Probase, YAGO and DBpedia. Much of this knowledge is represented as RDF triples (subject, predicate, object). Each KB tends to have its own ontology of entity types (objects, subjects) and relations (predicates): for example, Freebase has 1500 entity classes and 35,000 relations, while YAGO has 350,000 entity types and 100 relations. On the other hand, other KBs focus on commonsense knowledge, which goes beyond factual knowledge and includes tacit and procedural knowledge that is typically omitted from written texts. As such, commonsense KBs tend to be crowd-sourced (OpenMind ConceptNet) or manually curated (Cyc).
PrimeNet is a new commonsense KB that attempts to improve generalization performance by leveraging the idea of conceptual primitives, identified through hierarchical clustering and dimensionality reduction. Generalization helps improve query hit rate by associating instances of concepts with related higher-level concepts. For example, verb concepts such as eat, slurp, and munch could be related to a conceptual primitive INGEST, and noun concepts like pasta, noodles, and steak to an ontological parent FOOD. When a query such as “eat pasta” or “slurp noodles” is encountered and such concepts are not present in the KB, they could then be handled through the generalized concept INGEST FOOD, instead of throwing a not-found error. However, there are limitations to the automatic discovery of conceptual primitives, which we hope to address in our current work by augmenting the effort with a hand-crafted approach.
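The fallback from a specific query to a generalized concept can be sketched as a two-step lookup. The mappings, the toy KB, and the facts stored in it are hypothetical and only mirror the INGEST FOOD example above; they are not taken from PrimeNet itself.

```python
# Hypothetical mappings from concepts to primitives / ontological parents.
VERB_PRIMITIVE = {"eat": "INGEST", "slurp": "INGEST", "munch": "INGEST"}
NOUN_PARENT = {"pasta": "FOOD", "noodles": "FOOD", "steak": "FOOD"}

# Toy KB: facts stored for specific concepts and for generalized ones.
# The fact strings are invented purely for illustration.
KB = {
    ("eat", "steak"): "satisfies hunger",
    ("INGEST", "FOOD"): "provides nourishment",
}


def query(verb, noun):
    """Look up (verb, noun); fall back to the generalized concept if absent.

    Returns (fact, matched_key), or (None, None) when nothing matches.
    """
    if (verb, noun) in KB:
        return KB[(verb, noun)], (verb, noun)
    general = (VERB_PRIMITIVE.get(verb), NOUN_PARENT.get(noun))
    if None not in general and general in KB:
        return KB[general], general
    return None, None


print(query("eat", "steak"))      # exact hit on the specific concept
print(query("slurp", "noodles"))  # handled via the generalized concept
print(query("drive", "car"))      # no primitive known: genuinely not found
```

Returning the matched key alongside the fact lets a caller tell an exact hit from a generalized one, which matters because generalized answers are less specific.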
BIG SOCIAL DATA ANALYSIS
Big social data analysis is about processing big volumes of online social data coming from different sources in various formats and at different paces. Big social data analysis represents a holistic approach to the study of interaction between web users. Early works focused on measuring either the intensity of such interaction (e.g., social network analysis) or the content of it (e.g., sentiment analysis). This project aims to concomitantly collect, aggregate, and process both content and intensity of online social interaction. In particular, we use sentic computing for the former, and community embeddings for the latter.
Most existing graph embedding methods focus on nodes: they aim to output a vector representation for each node in the graph such that two nodes that are "close" on the graph are also close in the low-dimensional space. Despite the success of embedding individual nodes for graph analytics, we noticed that the important concept of embedding communities (i.e., groups of nodes) was missing. Embedding communities is useful not only for supporting various community-level applications, but also for helping preserve community structure in graph embeddings. In fact, we see community embeddings as providing a higher-order proximity to define node closeness, whereas most of the popular graph embedding methods focus on first-order and/or second-order proximities. To learn community embeddings, we hinge upon the insight that community embeddings and node embeddings reinforce each other.
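The idea that community embeddings and node embeddings reinforce each other can be illustrated with a deliberately simplified loop: communities are summarized from their member nodes, and nodes are then pulled towards their community. In this sketch a community embedding is just the centroid of its members and the update is a single linear step; both are assumptions chosen for brevity and are not the model used in the project.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def community_embeddings(node_vecs, membership):
    """Step 1: derive each community embedding from its member nodes."""
    groups = {}
    for node, community in membership.items():
        groups.setdefault(community, []).append(node_vecs[node])
    return {c: centroid(vs) for c, vs in groups.items()}


def pull_towards_communities(node_vecs, membership, comm_vecs, rate=0.5):
    """Step 2: move each node a fraction `rate` towards its community."""
    updated = {}
    for node, vec in node_vecs.items():
        target = comm_vecs[membership[node]]
        updated[node] = [x + rate * (t - x) for x, t in zip(vec, target)]
    return updated


# Toy 2-D node embeddings and a hypothetical community assignment.
node_vecs = {"a": [0.0, 0.0], "b": [2.0, 0.0], "c": [10.0, 10.0], "d": [12.0, 10.0]}
membership = {"a": 0, "b": 0, "c": 1, "d": 1}

comm_vecs = community_embeddings(node_vecs, membership)
refined = pull_towards_communities(node_vecs, membership, comm_vecs)
print(comm_vecs)
print(refined)
```

After one round, nodes in the same community sit closer together than before, which is the sense in which the community-level view feeds back into the node-level one. Alternating the two steps would repeat this effect.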
NATURAL LANGUAGE BASED FINANCIAL FORECASTING
This project in collaboration with Prof Roy Welsch from MIT Sloan School of Management focuses on natural language based financial forecasting (NLFF). NLP has become increasingly powerful due to data availability and various techniques developed in the past decade. This increasing capability makes it possible to capture sentiments more accurately and semantics in a more nuanced way. Naturally, many applications are starting to seek improvements by adopting cutting-edge NLP techniques. Financial forecasting is no exception.
As a result, articles that leverage NLP techniques to predict financial markets are fast accumulating, gradually establishing the research field of NLFF, or from the application perspective, stock market prediction. This project clarifies the scope of NLFF research by ordering and structuring techniques and applications from related work. It also aims to increase the understanding of progress and hotspots in NLFF, and bring about discussions across many different disciplines.
SMART CITIES
Fine-grained sentiment analysis, such as aspect-based sentiment analysis and target-dependent sentiment analysis, has become an important task for NLP. We are developing a novel solution to targeted aspect-based sentiment analysis, a task that combines the challenges of aspect-based sentiment analysis and target-dependent sentiment analysis, with a specific focus on urban neighborhoods. Analyzing people’s opinions and sentiments towards a set of fine-grained aspects of targeted locations (e.g., streets or districts) attracts considerable attention from both academia and industry because it facilitates a number of applications in the context of smart cities.
We explicitly address three technical difficulties faced by this task: firstly, a target location might have multiple instances in a sentence, some of which may be objective (neutral) and, hence, should be ignored by the classifier; secondly, the sentiment of a target might be expressed via the global structure of a sentence, for example, via comparison with another target; lastly, external sentiment knowledge should be incorporated in the end-to-end training of a deep neural network for sentence-level sentiment classification.
ONE BELT, ONE ROAD, ONE SENTIMENT?
One Belt, One Road, One Sentiment? is a project in collaboration with Prof Andrea Nanetti, from NTU ADM, which aims to visualize the sentiment of the world towards President Xi's One Belt One Road initiative. Such an initiative is a $4-trillion development strategy and framework that focuses on connectivity and cooperation among 60 countries primarily between China and the rest of Eurasia and consists of two main components: the land-based Silk Road Economic Belt and ocean-going Maritime Silk Road.
One Belt, One Road, One Sentiment? aims to collect and analyze the reactions of the different countries involved in President Xi's initiative in real time. Many economies, in fact, are affected by the initiative, which has been welcomed by some countries but opposed by others, e.g., the supporters of trading arrangements such as the Trans-Pacific Partnership and the Transatlantic Trade and Investment Partnership. The project employs SenticNet technologies to collect and analyze news and social media in many different languages from across the globe and, hence, visualize the sentiment of the world towards the One Belt One Road initiative in real time in a 3D dome.
PONDER
PONdER (Public Opinion of Nuclear Energy) aims to collect, aggregate, and analyze opinions towards nuclear energy across Singapore, Malaysia, Indonesia, Thailand, and Vietnam. Understanding how the public perceives nuclear energy in the region enables policymakers to make informed national policies and decisions pertaining to nuclear energy, as well as shape communication strategies to inform the public about nuclear energy. More importantly, there is a dearth of research on public opinion of nuclear energy in South-East Asia (SEA). Given the distinct cultural and political landscape of SEA, it is pertinent to understand the socio-cultural factors that may influence public attitude towards nuclear energy. The results allow us to distill information packages that summarize public opinion of nuclear energy and that are beneficial for key stakeholders, including policymakers, universities, industry players, and the community.
We use sentic computing to gauge public opinion towards nuclear energy in different languages. In particular, we focus on two sub-topics of sentiment analysis (besides, of course, polarity detection): subjectivity detection and aspect extraction. Subjectivity detection can be a useful tool for governmental agencies to find out which topics are particularly heated or controversial and, hence, act accordingly to prevent popular discontent. Aspect extraction is also a very important task in this context as opinions on nuclear energy are often expressed on multiple opinion targets. For example, “nuclear energy is good for the environment but safety is a great concern” is a subjective tweet about nuclear energy with “environment” and “safety” as two aspects. The proposed framework opens new avenues in helping government agencies to decide and act according to the need of the hour about management, planning and logistics related to nuclear power plants and other facilities across the nuclear fuel cycle.
DIALOGUE SYSTEMS
Building dialogue agents that can converse naturally with humans is one of the most challenging yet intriguing problems of artificial intelligence. Most recent methods have focused on training conversational models by solely relying on a large number of message-response pairs mined from social media, where the dialogue modeling problem is defined as learning a function for mapping messages to responses.
In this project, we investigate the impact of providing commonsense knowledge about the concepts covered in the message. By using such knowledge as additional input, we transform the traditional dialogue modeling problem into a new one that maps message-commonsense tuples to responses. Our model represents the first attempt at integrating a large commonsense knowledge base into conversational models and shows considerable improvement over traditional models on an open-domain dialogue dataset.
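The shift from message-only input to message-commonsense tuples can be sketched as a small input-building step. The `COMMONSENSE` dictionary below is a hypothetical, hand-written stand-in for a large commonsense knowledge base, and the response model that would consume the tuple is deliberately left out.

```python
import re

# Hypothetical commonsense assertions: concept -> list of (relation, concept).
COMMONSENSE = {
    "coffee": [("UsedFor", "staying awake"), ("IsA", "drink")],
    "exam": [("Causes", "stress")],
}


def retrieve_commonsense(message):
    """Collect assertions for every known concept mentioned in the message."""
    assertions = []
    seen = set()
    for token in re.findall(r"[a-z]+", message.lower()):
        if token in COMMONSENSE and token not in seen:
            seen.add(token)  # avoid repeating assertions for repeated words
            for relation, other in COMMONSENSE[token]:
                assertions.append((token, relation, other))
    return assertions


def build_model_input(message):
    """Pair the message with its retrieved commonsense assertions."""
    return message, retrieve_commonsense(message)


message, knowledge = build_model_input("I need coffee before my exam.")
print(message)
for assertion in knowledge:
    print(assertion)
```

A message-only model would see just the first element of the tuple; here the model input also carries the assertions linked to "coffee" and "exam".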
MOOD OF SINGAPORE
Mood of Singapore is a project in collaboration with Prof Vibeke Sorensen, Chair of NTU ADM, which aims to visualize the emotions of the Little Red Dot in real time. Singapore-geolocated social data is collected and processed by SenticNet technologies and visualized according to the emotion-color mapping of the Hourglass of Emotions. The results of such mapping are displayed through an interactive sculpture, a dynamic architectural installation that has as its center-piece a large ‘arch’ or ‘doorway’ that emits colored light and animates in reflection of the live emotions expressed by people based in Singapore communicating through networks such as Twitter. It rethinks the term ‘public art’ in the context of social transmodal transmedia. The ‘arch’ or ‘doorway’ is iconic and references developmental transformation, the metaphoric passing from one state to another, of growth and change that is analogous to the transformative effect that communications technologies have upon our collective human condition.
The arch also signifies human transformation of the environment, today both physical and digital, as this iconic form has been used across different cultures in Singapore. This ‘doorway’ is reflected within a wall-mirrored room where it is repeated into a tunnel-like shape, an infinity of doorways that exist as an endless cycle, or echo, of past and future in space and time, and collapsing into the eternal present. A wooden pathway traverses the room through the doorway, and connects the two mirrored walls and thus creates the ‘infinite’ pathway for the audience to walk upon. The arch sculpture is made of 30 building blocks of ‘digi-tiles’. Consisting of crushed recycled glass and custom electronics, they emit colored light, and the colors and shapes change in real time based on an analysis of semantics and sentics emanating from sources such as Twitter from all over Singapore. The current mood of the people of Singapore, through color and motion, thus becomes an immersive presence, a dynamic rainbow that bathes us in light. One installation of Mood of Singapore is currently operational at the NTU Experimental Medicine Building.
TWITTENER
Twitter is a popular social media service, as it allows users to get rapid and concise information, and to follow the latest online trends and topics. This project aims to improve user experience by proposing an alternative way to interact with Twitter, by allowing users to listen to interesting tweets, instead of the conventional way of reading them. This will allow users to get up to date with tweets from publicly-listed, categorized Twitter accounts, without needing to pay full attention to their screens.
Twittener could be useful for people with physical disabilities or visual impairments, the elderly, and people who multitask. A web application has been developed to allow users to listen to tweets. Additionally, a console application periodically crawls Twitter, retrieving tweets and converting them from text to speech. Together, these two applications make up the Twittener system.
SINGLISH SENTICNET
Singlish, or Singapore Colloquial English, refers to an English-based creole language spoken in Singapore. A product of colonial implantation, English as spoken by the British was gradually 'nativized' into Singlish through extensive borrowing from and mixing with other languages in the linguistic environment of Singapore. The end product of this language contact is Singlish, an English variety that shows a high degree of influence from other local languages such as Hokkien, Cantonese, Mandarin, Malay and Tamil. Singlish SenticNet (available both in RDF/XML format and as an API) is a concept-level resource for sentiment analysis in Singlish that provides the semantics and sentics (denotative and connotative information) associated with more than 5000 words and multiword expressions. These concepts are crowdsourced (e.g., through games) and encoded redundantly at three levels, namely: as a semantic network, as a matrix and as a vector space. Each representation is useful for a different kind of reasoning: the semantic network specifies the relationships between concepts and, hence, it is useful for tasks such as question answering; the matrix allows for the inference of new pieces of commonsense knowledge based on shared semantic features; finally, the vector space is a powerful tool for analogical reasoning.
The vector space model, in particular, emulates how the human mind constructs intelligible meanings by continuously compressing over vital relations. The compression principles aim to transform diffuse and distended conceptual structures into more focused versions, so that they become more congenial for human understanding. To this end, principal component analysis is applied to the matrix representation of Singlish commonsense knowledge. In particular, truncated singular value decomposition has been preferred to other dimensionality reduction techniques for its simplicity, relatively low computational cost, and compactness. It is particularly suitable for measuring the cross-correlations between affective commonsense concepts, as it uses an orthogonal transformation to convert the set of possibly correlated commonsense features associated with each concept into a set of values of uncorrelated variables.
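The truncated singular value decomposition step can be sketched with NumPy on a tiny concept-by-feature matrix. Everything in the example is an assumption made for illustration: the four concepts, the four feature names, the matrix values, and the choice of k = 2 are invented and are not taken from Singlish SenticNet.

```python
import numpy as np

# Toy concept-by-feature matrix; rows, columns, and values are illustrative.
concepts = ["shiok", "sedap", "sian", "jialat"]
features = ["pleasant", "about_food", "unpleasant", "draining"]
matrix = np.array([
    [1.0, 0.5, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])


def truncated_svd(a, k):
    """Keep only the k largest singular values and their singular vectors."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


def cosine(x, y):
    """Cosine similarity between two non-zero vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


u_k, s_k, vt_k = truncated_svd(matrix, k=2)
concept_vectors = u_k * s_k  # each row: one concept in the reduced space
vec = dict(zip(concepts, concept_vectors))

# Concepts sharing features end up pointing in similar directions,
# while concepts with no shared features stay close to orthogonal.
print("shiok vs sedap:", round(cosine(vec["shiok"], vec["sedap"]), 3))
print("shiok vs sian: ", round(cosine(vec["shiok"], vec["sian"]), 3))
```

Comparing concepts by cosine similarity in the reduced space is one common way to use such vectors for analogical reasoning; the appropriate number of retained dimensions depends on the data and is not specified here.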