Data Warehouse Design to Support Social Media Analysis in a Big Data Environment
The volume of data generated and stored by social media has increased over the last decade. Analyzing and understanding this kind of data can offer relevant information in different contexts and can assist researchers and companies in the decision-making process. However, the data are voluminous and scattered, come from different sources in different formats, and are created rapidly. These facts make knowledge extraction difficult, turning it into a complex and costly process. The scientific contribution of this paper is the development of a social media data integration model based on a data warehouse, which reduces the computational costs of data analysis and supports the application of techniques to discover useful knowledge. Unlike the existing literature, we cover two social media platforms, Facebook and Twitter. We also propose a model for the acquisition, transformation and loading of data, which can enable the extraction of useful knowledge in a context where the human capability of understanding is exceeded. The results showed that the proposed data warehouse improves the quality of the results of data mining algorithms compared to related works, while also reducing execution time.
Carlos Roberto Valêncio et al. / Journal of Computer Science 2020, 16 (2): 126-136 DOI: 10.3844/jcssp.2020.126.136
This paper presented a normalized data warehouse (DW) schema for modeling social media data from two different platforms: Facebook and Twitter. The normalized DW avoids storing redundant data, which can reduce the execution time of data mining (DM) algorithms. The ETL stage was described and four DM algorithms were applied to validate the model. Our experiments showed that the model can efficiently assist the analyst in the decision-making process, while executing faster than the related works. In addition, the DW focuses on opinion analysis, which means that we are not concerned with the content of the posts or with other kinds of analysis. Our main study considered the quantitative attributes of the publications and the classification of the comments into positive, negative and neutral. As future work, we recommend the use of a NoSQL database to handle the scarcity and the excess of attributes. It would also be interesting to adapt the DW to include content from other social networks, even though related works commonly deal with only one.
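As a rough illustration of the idea summarized above, the following minimal sketch builds a small normalized schema in an in-memory SQLite database and counts comments by sentiment for each platform. Every table name, column name and sample row here is a hypothetical assumption made for illustration; it is not the schema, the data or the ETL process from the paper. The quantitative attributes `likes` and `shares` are likewise assumed examples.

```python
import sqlite3

# Hypothetical normalized schema: names are illustrative assumptions,
# not the schema published in the paper.
SCHEMA = """
CREATE TABLE platform (
    platform_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE sentiment (
    sentiment_id INTEGER PRIMARY KEY,
    label        TEXT NOT NULL UNIQUE
);
CREATE TABLE post (
    post_id     INTEGER PRIMARY KEY,
    platform_id INTEGER NOT NULL REFERENCES platform(platform_id),
    likes       INTEGER NOT NULL DEFAULT 0,  -- assumed quantitative attribute
    shares      INTEGER NOT NULL DEFAULT 0   -- assumed quantitative attribute
);
CREATE TABLE comment (
    comment_id   INTEGER PRIMARY KEY,
    post_id      INTEGER NOT NULL REFERENCES post(post_id),
    sentiment_id INTEGER NOT NULL REFERENCES sentiment(sentiment_id)
);
"""


def build_warehouse():
    """Create the in-memory database and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Platform and sentiment labels are stored once and referenced by key,
    # so they are not repeated on every post or comment row.
    conn.executemany("INSERT INTO platform VALUES (?, ?)",
                     [(1, "Facebook"), (2, "Twitter")])
    conn.executemany("INSERT INTO sentiment VALUES (?, ?)",
                     [(1, "positive"), (2, "negative"), (3, "neutral")])
    conn.executemany("INSERT INTO post VALUES (?, ?, ?, ?)",
                     [(10, 1, 25, 3), (20, 2, 7, 1)])
    conn.executemany("INSERT INTO comment VALUES (?, ?, ?)",
                     [(100, 10, 1), (101, 10, 1), (102, 10, 2), (103, 20, 3)])
    conn.commit()
    return conn


def sentiment_counts(conn):
    """Return (platform, sentiment, number of comments) rows, sorted."""
    query = """
        SELECT p.name, s.label, COUNT(*)
        FROM comment AS c
        JOIN post AS po ON po.post_id = c.post_id
        JOIN platform AS p ON p.platform_id = po.platform_id
        JOIN sentiment AS s ON s.sentiment_id = c.sentiment_id
        GROUP BY p.name, s.label
        ORDER BY p.name, s.label
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in sentiment_counts(build_warehouse()):
        print(row)
```

Keeping platform and sentiment in their own tables is one way to avoid storing the same label on every row, which is the kind of redundancy a normalized design is meant to remove. An actual warehouse would need further entities (authors, dates, and so on) that this sketch leaves out.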
American Journal of Computer Science and Engineering Survey (AJCSES) is a peer review open access journal publishing the state of the art research in computer science and engineering survey.