Petabytes of high-dimensional data are now acquired by diverse sensing modalities and sources. Mining and cross-correlating these heterogeneous data sources can unlock new insights, open up new market opportunities, and lead to innovative products. In the smart cities industry, for example, conventional structured datasets can be cross-correlated with private big data sources, such as web search engine data, commercial vendors’ datasets, mobile positioning data, and social media data (Twitter, YouTube), as well as with open data. Leveraging these vast corpora of data can help to cope effectively with new challenges.

Data sources we have worked with:

  • Social Media Data: Twitter, Periscope, Reddit, Tumblr, HealthMap.
  • IoT Data: air pollution data; diverse sensor data (temperature, pressure, light) acquired by InterNET Ltd., RO; visual sensor network data (collected by Xetal nv, BE) for behavioral analytics.
  • Big visual data: visual data mining and analytics for BAFTA, UK.
  • RSS Feeds and web APIs.
  • Multimodal imaging: visual macrophotography, infrared macrophotography, infrared reflectography and X-radiography, THz imaging, millimeter wave imaging.
  • Text analytics: word embeddings, cross-modal deep learning.

Toolkit:

  • Harvesters and pre-processing modules for high-dimensional data gleaned from diverse online sources;
  • Distributed computing machinery and state-of-the-art big data tools (Hadoop, Python, JSON, C++, Tableau).
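
As an illustration of what a harvester with a pre-processing step might look like, here is a minimal Python sketch for an RSS feed. It is not the toolkit's actual code: the function name `harvest_rss`, the record fields, and the inlined sample feed are all assumptions made for the example. It parses RSS 2.0 text, trims whitespace, drops duplicate items by link, and emits JSON records.

```python
import json
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical sample feed, inlined so the sketch runs without network access.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example air quality feed</title>
    <item>
      <title>  PM10 reading, station A </title>
      <link>http://example.org/a</link>
      <pubDate>Mon, 06 Mar 2017 10:00:00 +0000</pubDate>
    </item>
    <item>
      <title>PM10 reading, station A</title>
      <link>http://example.org/a</link>
      <pubDate>Mon, 06 Mar 2017 10:00:00 +0000</pubDate>
    </item>
    <item>
      <title>NO2 reading, station B</title>
      <link>http://example.org/b</link>
      <pubDate>Mon, 06 Mar 2017 11:30:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def harvest_rss(xml_text):
    """Parse RSS 2.0 text into a list of normalized, de-duplicated records."""
    root = ET.fromstring(xml_text)
    records, seen = [], set()
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        # Skip items without a link and items already seen (duplicates).
        if not link or link in seen:
            continue
        seen.add(link)
        # Collapse runs of whitespace in the title.
        title = " ".join((item.findtext("title") or "").split())
        # Convert the RFC 822 date used by RSS into ISO 8601.
        pub = item.findtext("pubDate")
        published = parsedate_to_datetime(pub.strip()).isoformat() if pub else None
        records.append({"title": title, "link": link, "published": published})
    return records


records = harvest_rss(SAMPLE_RSS)
print(json.dumps(records, indent=2))
```

In a real deployment the feed text would come from an HTTP request rather than an inlined string, and the JSON records would be handed on to the distributed processing stage.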