Difference between revisions of "Training Projects"
[https://drive.google.com/file/d/1fc5M3Znn8OaPQ2E-7k-nBbDrTbz9xpf8/view?usp=sharing Dataset]

[https://drive.google.com/file/d/1iLAoQLJlQcShkUkff4N--o_m2KADXA7W/view?usp=sharing Download Instructions]

[https://drive.google.com/file/d/1uZv4f1HfqFhXVqhY5U3T_mfAtOytO8tt/view?usp=sharing Steps to do EDA]

[https://drive.google.com/file/d/1NG80DcJL01gJgjLJAV0JOE5gFxSjnsbT/view?usp=sharing Link for Notebook]
Revision as of 15:30, 8 June 2021
DeepSense has compiled a few datasets for students and others interested in the ocean and AI, so that they can complete AI projects independently. We hope participants will learn about a specific type of ocean-related data and work through a concrete AI project from start to finish. Participants are expected to work on the project alone, but we have provided some guidance, including notebooks, data, outputs, and models to try to improve upon.
We have found that the data cleaning step can take a long time, so our hope is that these datasets will be reasonably clean, allowing the participants to explore ocean AI.
1. Object Detection
We used the Google Open Images database to obtain approximately 650 images of starfish. The images were already separated into train, test, and validation sets. The metadata linked below covers only the starfish images, not the entire dataset, and includes the coordinates of the bounding boxes around the starfish.
This contains the files:
If you want to download other categories of images from the Open Images database, you can do so by following the instructions here:
After you have the datasets, you can download and install YOLOv4 using the following instructions:
If you want to run this on Google Colab, check out the following wiki: Darknet Wiki. At the top there is a link to a Colab notebook and a video tutorial.
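The bounding boxes in the metadata usually need to be converted before training with Darknet. The sketch below assumes the metadata stores each box as normalized corner coordinates (XMin, XMax, YMin, YMax in the range 0 to 1), as the Open Images annotation files do, and converts one box to the `class x_center y_center width height` line that YOLO label files use. The function name, the class id of 0 for starfish, and the sample box are illustrative assumptions, not values from the linked files.

```python
def open_images_to_yolo(x_min, x_max, y_min, y_max, class_id=0):
    """Convert one normalized corner-style box to a YOLO label line.

    Assumes all four coordinates are already normalized to [0, 1].
    class_id=0 is a hypothetical id for the single 'starfish' class.
    """
    x_center = (x_min + x_max) / 2.0
    y_center = (y_min + y_max) / 2.0
    width = x_max - x_min
    height = y_max - y_min
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical box, not taken from the starfish metadata.
label_line = open_images_to_yolo(x_min=0.25, x_max=0.75, y_min=0.5, y_max=1.0)
print(label_line)  # 0 0.500000 0.750000 0.500000 0.500000
```

Each image gets one text file with one such line per starfish. If the metadata turns out to store pixel coordinates instead, divide by the image width and height first.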
2. Regression
The buoy collects environmental measurements including wind speed and direction, surface temperature, current speed, wave height, and peak wave period. These wind and wave data are used to decide whether conditions allow the safe transfer of pilots and passage of vessels, which require a minimum depth of water that may not be met if the waves are too large. The Red Island Shoal Buoy is currently under maintenance. An extended period without accurate environmental measurements would significantly impair the ability to guide vessels safely. In this project, we try to predict the measurements of the buoy under maintenance from the values reported by other operational buoys, so that the authorities can continue to allow the safe passage of vessels.
The task is to predict the values of one buoy using the parameters of another. We use the datasets of the Mouth of Placentia Bay Buoy, the Pilot Boarding Station / Red Island Shoal Buoy, and Placentia Bay: Ragged Islands – KLUMI (Land station), all located in Newfoundland and Labrador.
Datasets
Pilot Boarding Station / Red Island Shoal Buoy
Placentia Bay: Ragged Islands – KLUMI (Land station)
The dataset available here runs until April 19, 2021. You can get the latest data from SmartAtlantic.
Instructions for cleaning the dataset, merging the files, and training the ML models are available at the link below:
Instructions for Cleaning/Merging/Training
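As a rough illustration of the clean, merge, and train flow, the sketch below aligns readings from two stations on their timestamps, drops missing values, and fits a one-variable least-squares line. All readings, variable names, and the choice of simple linear regression are assumptions made for illustration; the linked instructions and notebooks may use different features and models.

```python
# Hypothetical wave-height readings (metres) keyed by timestamp.
active_buoy = {
    "2021-04-01 00:00": 1.0,
    "2021-04-01 01:00": 2.0,
    "2021-04-01 02:00": 3.0,
    "2021-04-01 03:00": None,  # missing reading, dropped during cleaning
    "2021-04-01 04:00": 4.0,   # no matching timestamp in target_buoy
}
target_buoy = {
    "2021-04-01 00:00": 1.5,
    "2021-04-01 01:00": 2.5,
    "2021-04-01 02:00": 3.5,
    "2021-04-01 03:00": 4.0,
    "2021-04-01 05:00": 6.0,   # no matching timestamp in active_buoy
}


def merge_on_timestamp(source, target):
    """Keep only timestamps present in both series with no missing value."""
    pairs = []
    for ts in sorted(source.keys() & target.keys()):
        x, y = source[ts], target[ts]
        if x is not None and y is not None:
            pairs.append((x, y))
    return pairs


def fit_line(pairs):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


pairs = merge_on_timestamp(active_buoy, target_buoy)
slope, intercept = fit_line(pairs)


def predict(x):
    """Estimate the target buoy's reading from the active buoy's reading."""
    return slope * x + intercept


print(pairs)
print(predict(4.0))
```

With real data, the same inner join on timestamp is typically done with a dataframe library, and the model would use several input measurements rather than one.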
We implemented the code on IBM Watson Cloud and encourage you to use it as well to gain experience with cloud platforms. The link below provides instructions for using IBM Watson Cloud. The Lite version is free and provides 25 GB of storage, which is enough for this project.
Instructions for using IBM Watson Cloud
We have created notebooks with the code for your reference at the link below.
REFERENCES
A MACHINE LEARNING REDUNDANCY MODEL FOR THE HERRING COVE SMART BUOY
3. NLP
The NLP project is about sentiment analysis on climate change. We used the dataset available on data.world (link provided below) and applied BERT to perform the sentiment analysis. BERT has become a standard model for Natural Language Processing (NLP). It achieved state-of-the-art results on eleven NLP tasks, and it is used for text classification, sentiment analysis, sequence labeling, question answering, and more.
Instructions for using Google Colab and downloading the dataset
Steps to do Sentiment Analysis
4. EDA
Exploratory Data Analysis (EDA) is an approach to analyzing or investigating a dataset using statistical graphics and other data visualization methods. The analysis may include handling missing values, maximizing insight into the dataset and discovering patterns, extracting important variables, detecting outliers and anomalies, finding interesting relations among the variables, testing hypotheses, and checking assumptions. Drawing reliable conclusions from a massive quantity of data by just glancing over it is very difficult or almost impossible; instead, you have to look at it carefully through an analytical lens.
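Two of the steps listed above, handling missing values and detecting outliers, can be sketched with the Python standard library alone. The column of temperatures below is made up for illustration and does not come from the linked dataset; the 1.5 × IQR rule and median imputation are common conventions assumed here, not necessarily the methods used in the linked notebook.

```python
from statistics import median, quantiles

# Hypothetical column of surface temperatures (°C); None marks a missing value.
temps = [4.1, 4.3, None, 4.0, 4.2, 19.5, 4.4, None, 4.1, 4.3]

# Step 1: count the missing values and separate out the observed ones.
missing_count = sum(1 for t in temps if t is None)
observed = [t for t in temps if t is not None]

# Step 2: flag outliers with the 1.5 * IQR rule.
q1, _, q3 = quantiles(observed, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [t for t in observed if t < lower or t > upper]

# Step 3: fill the gaps with the median, which the outlier barely affects.
fill_value = median(observed)
filled = [fill_value if t is None else t for t in temps]

print(missing_count, outliers, fill_value)
```

The median is used for filling because a single extreme reading such as 19.5 would pull the mean well away from the typical values.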