Real-Time Data and Smart Cities

Bryan Schafroth

2018/03/20

Real-Time Data and Smart Cities

    The Internet of Things (IoT) is the concept that the number of devices connected to the Internet will continue to grow. Today the Internet mostly connects computers and mobile devices. The expectation is that IoT will become more significant with the increasing use of sensors, actuators, and embedded devices. Currently, the Internet connects devices through wired and wireless technologies. Devices can also communicate over a local network and then transfer the data to the Internet. According to Mahdavinejad et al. (2017), the purpose of this technology is to make our activities and experiences easier and more enriching.

    Under the umbrella of the IoT is the concept of smart cities. Sta (2017) identifies the smart city environment as containing city infrastructure, city transport, and people in the area. The smart city of the future will operate sustainably on only the resources needed. The data collected from sensors that monitor the city infrastructure and environment will reveal insights, and real-time analytics can make the smart city operationally efficient. Grover and Walia (2015) indicate Philadelphia has been using sensors to monitor refuse containers throughout the city. The sensors transmit a signal when the bins are full and ready for pickup. The monitoring system reduces labor and fuel because city employees no longer drive to and empty bins that do not need emptying. In one year of implementing the system, Philadelphia saved $1 million (p. 1). Grover and Walia present another example from Ontario, Canada. The electric utility installed 4.8 million smart electric meters that generated information used to distribute power throughout Ontario. The system enabled the efficient use of resources for power generation and distribution, and the utility realized savings of CAD 1.6 billion (p. 5). The IoT will enable a smart city to become a resource-efficient, autonomously functioning urban system. A smart city will operate on the massive scale of data that its sensors generate, and IoT will allow transmission of the data for collection, monitoring, analysis, and decision making.

    In 2015, there were 2.4 quintillion bytes of data generated every day by an estimated nine billion sensor devices (Puschmann, Barnaghi, & Tafazolli, 2017, p. 1). The cost of IoT sensors has become a reasonable expense for cities to start implementing the technology. Bibri (2017) said there are many different types of attached sensors in a smart city (p. 235).

    Bibri and Krogstie (2017) classify sensors by what they monitor and measure: location, light, sound, temperature, heat, electrical, pressure, speed, motion, chemical, and air (p. 15). Sensors are producing an increasingly varied amount of data at an accelerated rate (Zhang, Chen, Chen, & Chen, 2016). According to Nathali Silva, Khan, and Han (2017), one million smart meters in a city that record data every 15 minutes will produce 2920 terabytes of data in one year (p. 4). In contrast, when the one million meters record information every hour, the total data accumulated in one year is 730 terabytes (p. 4). This example demonstrates that data volume grows in direct proportion to the sampling rate (four times as many readings produce four times as much data), and it is not feasible to record that much data in real-time. The smart city becomes “smart” only when all the real-time sensor resources operate in unison. Real-time data processing is not yet fully capable at the scale needed for a large smart city (Mohanty, Chipili, & Kougianos, 2016). Real-time processing of large quantities of data is critical to the future potential of smart cities and is a significant challenge to overcome. This literature review compiles what has been written about real-time streaming big data in the context of the smart city and analyzes real-time data processing research and the limitations in the current knowledge.
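    The smart-meter figures above can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes every reading has the same fixed size (an assumption not stated in the source) and derives that size from the hourly figure of 730 terabytes. The function names are hypothetical.

```python
# Back-of-the-envelope check of the smart-meter volumes quoted above.
# Assumption (not in the source): every reading has the same fixed size.

METERS = 1_000_000
DAYS_PER_YEAR = 365
TB = 10**12  # decimal terabyte, in bytes


def readings_per_year(interval_minutes: int) -> int:
    """Number of readings one meter records in a year at a given interval."""
    return DAYS_PER_YEAR * 24 * 60 // interval_minutes


def yearly_volume_tb(interval_minutes: int, bytes_per_reading: float) -> float:
    """Total yearly volume in terabytes across all meters."""
    total = METERS * readings_per_year(interval_minutes) * bytes_per_reading
    return total / TB


# Per-reading size implied by the hourly figure (730 TB per year).
implied_bytes = 730 * TB / (METERS * readings_per_year(60))

hourly_tb = yearly_volume_tb(60, implied_bytes)
quarter_hourly_tb = yearly_volume_tb(15, implied_bytes)
print(round(implied_bytes), round(hourly_tb), round(quarter_hourly_tb))
```

    Under that assumption each reading works out to roughly 83 kilobytes, and sampling four times as often yields four times the volume (2920 terabytes), which matches the figures cited.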

Research Problem

    Full-scale smart cities will generate large amounts of data. Currently, there are no technologies capable of processing and analyzing the complexity of data generated in a large-scale smart city. Babar and Arif (2017) say the massive increase in real-time sensor data of a smart city cannot be processed and analyzed using the procedures applied to static datasets today. Sta (2017) also says to consider the ambiguities and imperfections that arise when generating multi-source data in real-time. One of the many challenges to overcome in smart city research is discovering ways to process flawed and inconsistent data streams for the best real-time data analysis. As Nathali Silva, Khan, and Han (2017) mention, current data processing and analytic platforms are incapable of handling the real-time processing demands of a smart city. Rathore et al. (2017a) stated it would be challenging to maintain the current methods used to collect, aggregate, and analyze data in a high-velocity smart city environment. Therefore, it is critical to find new ways to process real-time data.

    There is no straightforward way to overcome the complex challenges of how smart cities will process mass quantities of sensor data. There are several smart city testbeds in the world. However, the smart city concept is still in its infancy. According to Babar and Arif (2017), there is no straightforward method for changing over to a capable real-time system, so a smart city cannot be created in a short time. One of the issues is timing. Big data takes longer to process and analyze, yet certain circumstances, like hazards or life-threatening situations, require immediately actionable results from the data stream. Nathali Silva, Khan, and Han (2017) state that the smart city is still in its early development stages. There is a need to process large volumes of real-time data and act on it. Ali et al. (2017) believe real-time streaming data analysis is still relatively new, and few solutions exist. However, the demand is there for the fast-moving and varied exchange of information between sensors and processing technology. Lu, Wang, Wu, and Qiu (2017) explain that in smart cities, the traffic surveillance systems need to operate in real-time with low latency. Data patterns must be detected accurately to prevent traffic problems.

Relevance

    This literature review identifies prior research and studies regarding real-time data processing and analysis applied to the smart city context. It describes the methods, models, and platforms used in smart city data research. The goal is to find current case studies or experiments with real-time processing at the smart city scale and to establish what the literature currently offers for real-time analytics of streaming data and its application toward smart cities. There are now experiments that model a smart city’s massive scale of data in simulated real-time. This literature review provides an overview of the real-time analytics approaches reported for smart cities and is an entry point for further research on the topic. The following sections describe numerous studies performed by researchers who address real-time data in the smart city concept, with a brief introduction and summary of each article.

Literature Review

    Malek et al. (2017) designed a real-time data experiment in the context of the healthcare environment. The authors utilized Apache Storm and found they were able to stream a small dataset. The data was monitored and processed successfully. In subsequent experiments, the researchers said there is a need to integrate more sensor data and test additional algorithm modeling. They would like to see more sensor data to simulate the more significant network processes of a smart city. Further research would also address applications in smart buildings, a component of the smart city.

    Rathore, Paul, Ahmad, Anisetti, and Jeon (2017b) conducted an experiment related to healthcare. The authors developed a small-scale intelligent care system that streamed and analyzed data. They showed correlations in the streaming data generated from body sensors that measured blood pressure, pulse, diabetes signs, and skin temperature. Hadoop with MapReduce was used to process sensor data from multiple devices. The system incorporated the smart building because a more extensive computing system there could collect and manage the streaming data. The collected data were exported and analyzed with statistical algorithms to make decisions. The outcomes could be alerting the patient to take a prescription, calling a physician, calling an ambulance, or notifying the police department. The authors confirmed the findings by reprocessing the data and checking that the outcomes were identified correctly (p. 3). The authors concluded the system performed as intended and was successful in sending out alerts. More tests were needed to improve the capabilities that support the complex statistical analysis. This research could have future benefits by applying the model to other smart city infrastructure, such as traffic monitoring.

    Akbar, Khan, Carrez, and Moessner (2017) propose a solution for traffic events. The study uses real pre-recorded data generated by the city of Madrid, Spain. The analytic model is a machine learning algorithm that predicted early warnings of pending traffic events with 96% accuracy (p. 1571). The experiment simulated near real-time using complex event processing (CEP) to identify meaningful events. The authors said the architecture was reliable and effective at predicting traffic events. Because the system used machine learning applications in Python SciKit-Learn, the model could run on larger datasets. However, running the larger datasets was not cost-efficient because of the hardware resources needed. A fully operational smart city could easily have more than 100,000 sensors producing data that measures vehicles per hour and average traffic speeds (pp. 1576, 1580). This study used data from under 1,000 sensors from Madrid’s test program. The authors believed they could apply the model to other smart city applications such as supply chain logistics. There is still room to develop the earlier prediction capabilities to administer solutions quickly.

    Kumar et al. (2016) produced an experiment that used four datasets from the smart city test program in Melbourne, Australia. The datasets were created from nine sensors using 10-minute sampling intervals for 72 days (p. 37). The environmental sensing network included measurements of light, humidity, and temperature. The experiment used processing models called visual assessment of tendency (VAT) and improved visual assessment of tendency (iVAT) (p. 3). The VAT/iVAT models use heat maps to represent clusters of data points. Visual anomaly detection for streaming data worked in several tests. The experiment used a time-series machine learning technique, starting with a sliding window to make smaller chunks of data for the clustering analysis. The observed results from the data stream were cluster formations that demonstrated the ability of the models to detect unusual patterns called anomalies (p. 29). The authors noted that the VAT model is limited to a maximum of 5,000 visualized data points (p. 38). Constraints on the software platform and hardware capabilities ended the experiment. Hence, more powerful computing hardware is needed, and the authors proposed a scalable version of the software that runs in incremental steps so as not to overload the computation all at once. The experiment was conducted in a closed system that simulated the datasets as real-time data. More tests are needed to get to a smart city scale.
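    The two ideas in this paragraph, chunking a stream with a sliding window and reordering a dissimilarity matrix so that clusters appear as blocks, can be sketched in a few lines. This is not the authors' implementation. It follows the commonly described VAT reordering (a Prim-like nearest-neighbor ordering), and the sample points and function names are made up for illustration.

```python
import math


def sliding_windows(series, size, step):
    """Split a series into overlapping chunks of length `size`."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, step)]


def dissimilarity_matrix(points):
    """Pairwise Euclidean distances between equal-length vectors."""
    return [[math.dist(p, q) for q in points] for p in points]


def vat_order(dist):
    """Return a VAT-style ordering of the objects described by `dist`."""
    n = len(dist)
    # Start from one end of the largest pairwise dissimilarity.
    start = max(range(n), key=lambda i: max(dist[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        # Add the unvisited object closest to any already ordered object.
        nxt = min(sorted(remaining),
                  key=lambda j: min(dist[i][j] for i in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical one-dimensional readings: two tight groups and one outlier.
points = [(1.0,), (10.0,), (1.2,), (10.3,), (0.9,), (50.0,)]
dist = dissimilarity_matrix(points)
order = vat_order(dist)
# Reordered matrix: similar objects become adjacent rows and columns.
reordered = [[dist[i][j] for j in order] for i in order]
print(order)
```

    Drawing the reordered matrix as a heat map would show two dark blocks along the diagonal for the two groups and an isolated last row for the outlier, which is the kind of visual cue the paragraph describes.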

    Aly, Elmogy, and Barakat (2015) cite a study by other researchers who used a MapReduce model. The study analyzed the ability to mine data and valuable information related to an IoT simulation. The authors said the traditional way to mine data is with the Apriori algorithm (p. 309). Data produced by smart city sensors was not suitable for the Apriori model, and operating efficiency dropped because the memory in the system filled up. Exhausting memory limits what the hardware can do, and adding more hardware to meet the demand is not cost-effective. The authors concluded by proposing a more efficient model that uses fewer memory resources when streaming data.

    Grover and Walia (2015) present a case study to review a big data platform named City-Data and Analytics Platform (CiDAP) (pp. 8-9). CiDAP is used to analyze the data from a smart city test site in Santander, Spain. Santander has sensors deployed and recording real data. CiDAP can utilize historical, near real-time, and real-time sensor data (p. 9) and was built specifically for Santander. The authors explain that the 1,112 road sensors send data every 60 seconds and, in three months, send 50 GB of data (p. 9). During an emergency response, however, the road sensors communicate directly and in real-time with sensors on the ambulances, helping them navigate the fastest route through the city. The CiDAP platform collects the data and communicates with city applications (CityModel APIs) (p. 9). Apache Spark processes the data and can also manage more intensive data analysis (p. 9). This system is not an entirely real-time system because it uses historical data for analysis. The researchers said the platform is scalable and can be integrated into other smart city infrastructure.

    Li, Ota, and Dong (2018) experimented using a deep learning concept called a convolutional neural network (CNN), which is an image classification algorithm (p. 96). The experiment's analytics are based on deep learning and use edge computing to limit network latency by moving the pre-processing hardware closer to the sensors. Edge computing sits beneath the cloud network and reduces the amount of data transferred on the network (p. 98). Using the edge network optimizes the performance of the algorithm. The experiment uses images of dogs and cats to simulate a video feed, and the CNN algorithm identifies the objects in the video. This application can apply to traffic surveillance video streams. Limitations were found in the edge network because the CNN deep learning algorithm runs in multiple layers, ten in this experiment (p. 99). Each layer improves accuracy. However, more layers increase the computing overhead of the edge server, and the edge server reached its limit in this case. The authors recommend better scheduling of learning tasks in the server so the CNN can run deeper layers more efficiently. The authors conclude that an edge computing network allows more deep learning tasks to run. This experiment did not use real-world data streams. However, the authors propose future research that applies real-time data to this deep learning and edge computing framework.

    Malek et al. (2017) conducted a smart building experiment. The experiment locates people in the building who are wearing a monitoring device that measures heart rate and blood oxygen levels. Inside the building, there are environmental monitoring sensors that measure the concentration of gases in the air. The participants are exposed for one hour to CO2, a bioeffluent (an organic air pollutant expelled from humans or animals) (p. 432). The sensor data is streamed in real-time. “Kaa” is the IoT application that performs real-time data collection (p. 433) and enables the connection of devices. The data is processed in Apache Storm, measuring the real-time levels of carbon dioxide concentrations in the air. The authors were able to see a real-time relationship between the air pollution levels and the occupants’ sensor measurements. The study was considered a successful small-scale prototype capable of real-time processing and monitoring. It is a starting point for a smart building application that can measure air pollution and automatically turn on the building’s ventilation system to exhaust the gases and bring in fresh air, which is energy efficient since the ventilation would only run when needed. The experiment was preliminary and has created a useful framework for future IoT studies.

    Nathali Silva, Khan, and Han (2017) based an experiment on a smart city environment. The traffic sensor data, parking space data, water demand data, and city pollution data used in this experiment are real historical data derived from two test cities. The datasets contained raw unstructured data. The authors use the Kalman filter algorithm, also known as linear quadratic estimation (p. 6). The algorithm runs in real-time and filters the data stream, which removes the noise from the data. Noise can lead to reduced accuracy in the analysis. The output of the cleaned data is an estimated variable based on a previous state (p. 6). The authors deemed the algorithm very effective for this experiment because it reduces the processing time in subsequent steps. The Kalman algorithm did the pre-process filtering before sending the data to the Hadoop framework. The data is rapidly processed and streamed through two Hadoop clusters to measure the processing time and speed of data, which simulates a real data stream. The authors discovered that the larger the dataset, the longer it takes to process the data. Because they streamed the data chronologically, the system predicted pending traffic congestion from the traffic dataset. When streaming the city parking dataset, the model created real-time alerts showing where open parking spots were available. The third dataset came from Surrey, British Columbia, and supported a water usage analysis that could detect when there would be an upswing in water consumption. In a practical application, the event detection can alert the water department to increases in water consumption so it can take precautionary or corrective action such as increasing flows through water treatment plants. The fourth dataset came from Aarhus, Denmark, and contained air pollution levels over a chronological timeline. The authors’ analytics model could predict, against a threshold value, when pollution was about to increase. The smart city application could use the predicted spike in air pollution to send an automated alert, and the detected patterns could be used to change the flow of traffic in affected areas to allow the air quality to stabilize. The results showed the platform could successfully predict events within a preset value threshold. The experiment demonstrated the applicability of the system using historical data, and it appeared adaptable to a real-time data environment.
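    The noise-filtering step described above can be illustrated with a scalar Kalman filter. The sketch below is a minimal example and not the authors' code. It assumes the true value changes slowly, and the variance parameters, function name, and sample readings are hypothetical choices made for illustration.

```python
def kalman_1d(measurements, process_var=1e-3, measurement_var=0.5):
    """Filter a noisy scalar stream, assuming a slowly changing true value."""
    estimate = measurements[0]
    error_var = 1.0  # initial uncertainty of the estimate (assumed)
    filtered = [estimate]
    for z in measurements[1:]:
        # Predict: the state model is "value stays the same",
        # so only the uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= (1.0 - gain)
        filtered.append(estimate)
    return filtered


# Hypothetical noisy sensor readings around a true value near 10.
readings = [10.4, 9.7, 10.2, 9.5, 10.6, 9.9, 10.3, 9.6, 10.1, 9.8]
smoothed = kalman_1d(readings)
print([round(v, 2) for v in smoothed])
```

    Each output value is an estimate built from the previous state and the newest measurement, so the filter can run on a stream one reading at a time, which is why it suits a pre-processing stage ahead of heavier analytics.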

    Puschmann et al. (2017) conducted an experiment based on smart city data from the test city of Aarhus, Denmark. The investigation looks at how different data streams can show correlation. The first dataset was traffic data from one hundred sensors that measured average speed and counted the vehicles passing at five-minute intervals (p. 8). The second dataset was measurements of wind speed and temperature taken at twenty-minute intervals for two months (p. 8). The authors were interested in finding the correlation and meaning between the two datasets. They believe that by finding a functioning framework, they could further apply the working model to other smart city situations. The authors took different data structures and used a semantic model to interpret the various patterns. Then the datasets could be analyzed together, looking at correlations and co-occurrence patterns (p. 2). Symbolic Aggregate approximation (SAX) transforms the numeric data into text data (p. 2). A latent topic model (latent Dirichlet allocation, LDA) then processes the data; LDA is a natural language processing model useful for finding the hidden structures and relationships in text-based data (p. 2). The authors said the model they developed could collect streaming data and predict the patterns of correlation. The model can provide proactive insight into real-time streams for any IoT data source (p. 2). The work done in this experiment was to develop and train a model that would analyze streaming data and make predictions in near real-time. Real-time predictions would be useful for preventing congested traffic. The experiment was promising. The authors suggest testing in future studies with more than two datasets. Testing with multiple data streams will further reinforce the method, which has wider-reaching applications across many large and varied data streams within the smart city. The authors were careful to publish everything used in this experiment so that other researchers can reproduce it and contribute further to this experimental concept.
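    The numeric-to-text step can be illustrated with a small SAX sketch. This is a simplified, generic version of the technique rather than the authors' published code. It assumes an alphabet of four letters, a series length divisible by the number of segments, and hypothetical traffic speeds as input.

```python
import statistics

# Breakpoints that cut a standard normal distribution into four
# equally likely regions (alphabet size 4).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def sax(series, segments):
    """Convert a numeric series to a SAX word with `segments` letters."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    # z-normalize; a flat series maps to all zeros.
    z = [(x - mean) / std if std else 0.0 for x in series]
    # Piecewise aggregate approximation: mean of each equal-length segment.
    size = len(z) // segments
    paa = [statistics.fmean(z[i * size:(i + 1) * size]) for i in range(segments)]
    # Map each segment mean to a letter by counting breakpoints below it.
    return "".join(ALPHABET[sum(v > b for b in BREAKPOINTS)] for v in paa)


# Hypothetical average speeds: free flow, congestion, partial recovery.
speeds = [60, 62, 58, 61, 30, 28, 25, 27, 45, 47, 44, 46]
print(sax(speeds, 3))  # prints "dac"
```

    Once each stream is reduced to short words like this, text-oriented models such as LDA can treat the words from different sensors as a shared vocabulary, which is the role SAX plays in the pipeline described above.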

    Ranjan, Thakker, Haller, and Buyya (2017) discuss the state of IoT and big data decision making processes. The authors say that examining the data for patterns and correlations is a suitable methodology for smart city data analysis. They argue that some of the current technologies like cloud computing, batch and stream processors (Hadoop & Spark), and NoSQL databases do not work well with exploratory data analysis, data browsing and visualization, and knowledge graph search engines (p. 495). The authors suggest researchers look at semantic statistical models to address the challenges of exploring large datasets. The article focuses on the effectiveness of using semantic context (transforming numeric data into text data for analytics). It is possible to deploy the semantic models throughout a smart city analytics system. The purpose is to combine multiple sensor streams effectively so the assimilated information can be used in the analysis. The authors state that more cities need to leverage the streaming data produced by the increasing number of installed sensors. The benefit of leveraging streaming data is to solve city problems, save money, and become more efficient by using fewer resources. The authors outlined the significant challenges of combining a multitude of smart city sensor streams in real-time. Context-aware communication between sensors is one area to investigate. The authors made clear that current analytics and data processing frameworks would not fulfill the task of real-time streaming data processing in a smart city.

    According to Rathore et al. (2017a), it is a challenge to integrate all the numerous sensors found in a smart city environment. The problem is due to the enormous data volume produced by the city sensor network. Linking all the data and transmitting it to one place for processing (p. 3) is also an area of concern. The authors experimented in the Hadoop ecosystem with Spark and Giraph (p. 10). The experiment’s focus was on real smart city data from Aarhus, Denmark; Surrey, British Columbia; and Madrid, Spain (pp. 6-7). Each experiment simulated the realistic streaming velocity seen in the urban infrastructure. The purpose of the simulation was to decrease the processing time and increase the throughput of the data. Big data produced from sensors in a smart city needs impeccable real-time processing (p. 9). The authors produced the desired real-time behavior by replaying the datasets chronologically based on the timestamp attribute in each dataset. They noted that the larger the data quantities became (data attributes, number of timestamp intervals, duration of test data, etc.), the more time it took to process the data, as shown by the increase in latency. The test method performed adequately, using the existing datasets to achieve near real-time results. The authors concluded that the five-node Hadoop cluster with Spark and Giraph was scalable and capable in a real-time environment. However, live data streams are needed to build upon this experiment. The method still needs further testing because the examination only simulated real-time data in a controlled test environment.
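    Replaying a static dataset in timestamp order, as described above, is a common way to simulate a live stream. The sketch below shows the general idea only and is not the authors' setup. The record layout, the `speedup` parameter, and the injectable `sleep` function are assumptions made for illustration.

```python
import time
from datetime import datetime


def replay(records, speedup=60.0, sleep=time.sleep):
    """Yield records in timestamp order, pausing to mimic arrival gaps."""
    ordered = sorted(records, key=lambda r: r["timestamp"])
    previous = None
    for record in ordered:
        if previous is not None:
            # Wait for the original gap between readings, scaled down.
            gap = (record["timestamp"] - previous).total_seconds()
            sleep(gap / speedup)
        previous = record["timestamp"]
        yield record


# Hypothetical records, deliberately out of order.
records = [
    {"timestamp": datetime(2017, 1, 1, 8, 10), "sensor": "c"},
    {"timestamp": datetime(2017, 1, 1, 8, 0), "sensor": "a"},
    {"timestamp": datetime(2017, 1, 1, 8, 5), "sensor": "b"},
]
pauses = []  # record the pauses instead of actually sleeping
replayed = [r["sensor"] for r in replay(records, speedup=300.0, sleep=pauses.append)]
print(replayed, pauses)
```

    Passing a different `sleep` function keeps the example fast to run; with the default `time.sleep`, the generator would pace the records so that downstream processing sees them at a scaled version of their original velocity.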

    Tang et al. (2017) focused their experiment on the networking aspect of IoT and smart cities. The concept they explored is called “Fog Computing” (p. 2141). Fog Computing is related to the Edge Computing discussed earlier in this review: the data pre-processing takes place closer to the sensors’ physical location. Fog Computing reduces the latency incurred when sending large amounts of data over the Internet to cloud servers. Low latency is vital for real-time analytics because it gives the data analytics system the ability to act quickly on detected data events and anomalies. One of the problems described is that data in a central cloud network can be loaded and analyzed with sophisticated algorithms, but the real-time component is lacking due to slower response time over the network. Fog Computing puts the computational work at the site of data collection, and data is processed quickly under this networking model, making it suitable for real-time applications in smart cities. Fog Computing has the advantage for smart city monitoring because it can support large amounts of data from sensors without using up network bandwidth and the energy resources needed for communications. Fog Computing supports real-time capabilities because the data is processed close to its source, which positions it to respond to changes in a smart city environment even where bandwidth is limited.
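    The bandwidth argument can be made concrete with a small sketch of pre-processing at a fog node. This is an illustration of the general idea, not the architecture Tang et al. evaluated. The function name, the summary fields, and the alert threshold are hypothetical.

```python
from statistics import fmean


def summarize_at_edge(readings, threshold):
    """Reduce a batch of raw readings to one summary plus any alerts."""
    summary = {
        "count": len(readings),
        "mean": fmean(readings),
        "max": max(readings),
    }
    # Readings above the threshold are flagged locally, without a
    # round trip to the cloud.
    alerts = [r for r in readings if r > threshold]
    return summary, alerts


# Hypothetical batch collected by one fog node between uploads.
batch = [41.0, 43.0, 40.0, 90.0, 42.0]
summary, alerts = summarize_at_edge(batch, threshold=80.0)
print(summary, alerts)
```

    Only the three-field summary and the single alert need to travel to the cloud instead of all five raw readings, and the alert is available at the node as soon as the batch is processed.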

    Zhang et al. (2016) conducted a study to address the concept of IoT and smart cities. One of the issues is the interpretation of large volumes of data generated in a smart city. The smart city has sensors in the surrounding environment, buildings, and infrastructure. With a massive amount of data streamed from sensors, there will need to be ways to incorporate the knowledge back into the system. Though several test cities use various technologies today, the fully realized smart city is still a futuristic concept. Future research will help to create a functional smart city. The authors recognize that the challenges for smart cities of the future will be in dealing with the rapid growth of big data and in processing the vast amount of data from various data streams cohesively. The authors propose a semantic framework using machine learning algorithms. The semantic framework addresses the challenge of heterogeneous data by making it semantically uniform for faster analysis. The experiment used Hadoop Spark for streaming data. Two case studies had good preliminary results. The system the authors proposed was efficient in doing real-time computations. The experiment demonstrated the possibility of scaling the system with large data volume while measuring the variable latency. As the data volume increased, the latency stayed low enough to be acceptable for near real-time or real-time streaming. The data used in one case study was pollution emissions from the test city Hangzhou, China (p. 9). The data was sampled at intervals of one hour three times per day. A knowledge graph can determine that the intervals recorded data during rush hour. It was essential to have a high number of data records (sensor data), which increases the accuracy of the semantic model. The model predicted future pollution. The second experiment used traffic data in Hangzhou. The model identified potential traffic patterns in the city by mining the data and showed when traffic was increasing and decreasing in specific areas of the city (p. 9). The scalability test revealed that a single-node Spark cluster had a slower execution time than the 70,000-node case. The conclusion was that scaling data volume up will work for other data streams. The authors never discussed the semantic model details but said the model demonstrated high efficiency (p. 10). The authors did mention limitations from missing data records (incomplete data), such as those caused by broken or malfunctioning sensors. Data does not get generated when sensors stop functioning, and this reduces the amount of information for the semantic model. Missing data makes some calculations less accurate. Future tests are needed. The future needs will be more consistent data volume, real-world applications, the addition of historical data for better understanding of the semantic model, and a bigger distributed network to process larger data streams.

Conclusion

    Real-time data processing in smart cities is a significant challenge to overcome, as the literature review has illustrated. The literature offered many different examples of how to achieve accurate predictions using real-time data. However, only one of the research teams, in China, had access to streaming data. The remaining experiments simulated real-time data by streaming static datasets. The Apache Hadoop framework was the most popular system used to simulate streaming data. It was common to use machine learning algorithms for data analysis, and many studies used semantic models on heterogeneous data. There are smart city testbeds that have implemented some real-time analytics on a small scale. The datasets created by the testbed cities were publicly available and used by most researchers. All the researchers had recommendations for further applicability of their analytic models.

    Future research should include the use of live data streams with many more varied sources of data and should be conducted over low-latency networks. The experiments reviewed used only one to four sources of data together. Future experiments should use as many data sources as possible for a broader synthesis that considers all types of data and shows relationships between different sources. More researchers need to have direct access to implemented real-time sensor systems, although allowing access could raise security and privacy issues. Real-time data analysis will depend on low-latency networking. Future experiments should build fog and edge networks in to narrow down latency issues with a variety of data sources.

    There is no straightforward way to overcome the complex challenges of how smart cities will process mass quantities of sensor data. Managing the rapid increase in data is a challenge, and according to the research, the systems are already having a hard time synthesizing the data. The infrastructure is not ready for data that increases at an accelerated rate. Handling it would be like drinking water from a firehose. The research also supports the problem by showing limitations in the current knowledge base. Smart cities are in their infancy regarding networks, data flow, technology, and security in IoT systems. Currently, there is no efficient model to stream the data, and multiple systems that do not communicate with each other make it difficult to process and manage the data and to demonstrate its potential usefulness. To better understand the potential of IoT, continued research is needed.

References

Akbar, A., Khan, A., Carrez, F., & Moessner, K. (2017). Predictive analytics for complex IoT data streams. IEEE Internet of Things Journal, 4(5), 1571-1582. doi:10.1109/JIOT.2017.2712672

Ali, M. I., Ono, N., Kaysar, M., Shamszaman, Z. U., Pham, T. L., Gao, F., Griffin, K., & Mileo, A. (2017). Real-time data analytics and event detection for IoT-enabled communication systems. Web Semantics: Science, Services and Agents on the World Wide Web, 42, 19-37. doi:10.1016/j.websem.2016.07.001

Aly, H., Elmogy, M., & Barakat, S. (2015). Big data on Internet of Things: Applications, architecture, technologies, techniques, and future directions. International Journal Computer Science Engineering, 4, 300-313. Retrieved from https://www.researchgate.net/publication/288749349_Big_Data_on_Internet_of_Things_Applications_Architecture_Technologies_Techniques_and_Future_Directions

Babar, M., & Arif, F. (2017). Smart urban planning using big data analytics to contend with the interoperability in Internet of Things. Future Generation Computer Systems, 77, 65-76. doi:10.1016/j.future.2017.07.029

Bibri, S. E. (2017). The IoT for smart sustainable cities of the future: An analytical framework for sensor-based big data applications for environmental sustainability. Sustainable Cities and Society, 38, 230-253. doi:10.1016/j.scs.2017.12.034

Bibri, S. E., & Krogstie, J. (2017). The core enabling technologies of big data analytics and context-aware computing for smart sustainable cities: A review and synthesis. Journal of Big Data, 4(1), 38, 1-49. doi:10.1186/s40537-017-0091-6

Grover, A., & Walia, N. (2015). Big data in smart cities. Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada. 1-20. Retrieved from https://www.researchgate.net/publication/313074360_Big_Data_in_Smart_Cities

Kumar, D., Bezdek, J. C., Rajasegarar, S., Palaniswami, M., Leckie, C., Chan, J., & Gubbi, J. (2016). Adaptive cluster tendency visualization and anomaly detection for streaming data. ACM Transactions on Knowledge Discovery from Data, 11(2), 1-40. doi:10.1145/2997656

Li, H., Ota, K., & Dong, M. (2018). Learning IoT in edge: Deep learning for the Internet of Things with edge computing. IEEE Network, 32(1), 96-101. doi:10.1109/MNET.2018.1700202

Lu, Z., Wang, N., Wu, J., & Qiu, M. (2017). IoTDeM: An IoT big data-oriented MapReduce performance prediction extended model in multiple edge clouds. Journal of Parallel and Distributed Computing, 1-12. doi:10.1016/j.jpdc.2017.11.001

Mahdavinejad, M. S., Rezvan, M., Barekatain, M., Adibi, P., Barnaghi, P., & Sheth, A. P. (2017). Machine learning for Internet of Things data analysis: A survey. Digital Communications and Networks, 1-56. doi:10.1016/j.dcan.2017.10.002

Malek, Y. N., Kharbouch, A., El Khoukhi, H., Bakhouya, M., De Florio, V., El Ouadghiri, D., … & Blondia, C. (2017). On the use of IoT and big data technologies for real-time monitoring and data processing. Procedia Computer Science, 113, 429-434. doi:10.1016/j.procs.2017.08.281

Mohanty, S. P., Choppali, U., & Kougianos, E. (2016). Everything you wanted to know about smart cities: The Internet of Things is the backbone. IEEE Consumer Electronics Magazine, 5(3), 60-70. doi:10.1109/MCE.2016.2556879

Nathali Silva, B., Khan, M., & Han, K. (2017). Big data analytics embedded smart city architecture for performance enhancement through real-time data processing and decision-making. Wireless Communications and Mobile Computing, 2017, 1-13. doi:10.1155/2017/9429676

Puschmann, D., Barnaghi, P., & Tafazolli, R. (2017). Using LDA to uncover the underlying structures and relations in smart city data streams. IEEE Systems Journal, 1-12. doi:10.1109/JSYST.2017.2723818

Ranjan, R., Thakker, D., Haller, A., & Buyya, R. (2017). A note on exploration of IoT generated big data using semantics. Future Generation Computer Systems, 72, 495-498. doi:10.1016/j.future.2017.06.032

Rathore, M. M., Paul, A., Hong, W. H., Seo, H., Awan, I., & Saeed, S. (2017a). Exploiting IoT and big data analytics: Defining smart digital city using real-time urban data. Sustainable Cities and Society, 1-12. doi:10.1016/j.scs.2017.12.022

Rathore, M. M., Paul, A., Ahmad, A., Anisetti, M., & Jeon, G. (2017b). Hadoop-based intelligent care system (HICS): Analytical approach for big data in IoT. ACM Transactions on Internet Technology (TOIT), 18(1), 1-24. doi:10.1145/3108936

Sta, H. B. (2017). Quality and the efficiency of data in “Smart-Cities”. Future Generation Computer Systems, 74, 409-416. doi:10.1016/j.future.2016.12.021

Tang, B., Chen, Z., Hefferman, G., Pei, S., Wei, T., He, H., & Yang, Q. (2017). Incorporating intelligence in fog computing for big data analysis in smart cities. IEEE Transactions on Industrial Informatics, 13(5), 2140-2150. doi:10.1109/TII.2017.2679740

Zhang, N., Chen, H., Chen, X., & Chen, J. (2016). Semantic framework of Internet of things for smart cities: Case studies. Sensors, 16(9), 1-13. doi:10.3390/s16091501