Training Projects

From DeepSense Docs
Revision as of 13:02, 17 June 2021 by Bgeetika (talk | contribs) (5. Text Cleaning)

DeepSense has compiled a few datasets for students and others interested in the ocean and AI, so that they have the opportunity to complete AI projects independently. We hope participants will learn about a specific type of ocean-related data and work through a concrete AI project. Participants are expected to work on the project alone, but we have provided some guidance, including notebooks, data, outputs, and models to try to improve upon.

We have found that the data cleaning step can take a long time, so we hope these datasets are reasonably clean, allowing participants to focus on exploring ocean AI.

1. Object Detection

We used the Google Open Images database to obtain approximately 650 images of starfish. The images were already separated into train, test, and validation sets. The metadata linked below covers only the starfish images, not the entire dataset. It includes the coordinates of the bounding boxes around the starfish.

Google Drive Directory

This contains the files:

Starfish Dataset

Training metadata

Test metadata

Validation metadata

If you want to download other categories of images from the Open Images database, you can do so by following the instructions here:

Download Instructions

After you have the datasets, you can download and install YOLO v4 using the following instructions:

Installation Instructions

Configuration Instructions

Metadata Conversion Script

If you want to run this on Google Colab, check out the following wiki: Darknet Wiki. At the top there is a link to a Colab notebook and a video tutorial.
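The metadata conversion step can be illustrated with a minimal sketch. It assumes the metadata follows the usual Open Images layout, with normalized corner coordinates in columns named ImageID, XMin, XMax, YMin, and YMax, and converts each box to the YOLO label format (class ID, box center, width, and height, all normalized). The sample rows and the function names are made up for illustration; this is not the linked Metadata Conversion Script, and the column names should be checked against the actual files.

```python
import csv
import io


def to_yolo_line(class_id, x_min, x_max, y_min, y_max):
    """Convert one normalized corner-style box to a YOLO label line."""
    x_center = (x_min + x_max) / 2.0
    y_center = (y_min + y_max) / 2.0
    width = x_max - x_min
    height = y_max - y_min
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def convert_rows(csv_text, class_id=0):
    """Group YOLO label lines by image ID (one label file per image)."""
    labels = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        line = to_yolo_line(
            class_id,
            float(row["XMin"]),
            float(row["XMax"]),
            float(row["YMin"]),
            float(row["YMax"]),
        )
        labels.setdefault(row["ImageID"], []).append(line)
    return labels


# Made-up sample rows; the column names are an assumption about the metadata.
sample = """ImageID,XMin,XMax,YMin,YMax
img001,0.25,0.75,0.5,1.0
img001,0.0,0.5,0.0,0.5
"""

labels = convert_rows(sample)
print(labels["img001"][0])  # 0 0.500000 0.750000 0.500000 0.500000
```

With a single category (starfish), every box gets class ID 0. Each image's lines would then be written to a text file with the same base name as the image.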

2. Regression

The buoy collects environmental measurements including wind speed and direction, surface temperature, current speed, wave height, and peak wave period. These wind and wave data are used to decide whether conditions allow the safe transfer of pilots and the passage of vessels, which require a minimum depth of water that may not be met if the waves are too large. The Red Island Shoal Buoy is currently under maintenance. An extended period without accurate environmental measurements would significantly impair the ability to guide vessels safely. In this project, we try to predict the environmental measurements of the buoy under maintenance from the values of other operational buoys, so that the authorities can continue to allow the safe passage of vessels.

The task is to predict the values of one buoy using the parameters of another. We use the datasets of the Mouth of Placentia Bay Buoy, the Pilot Boarding Station / Red Island Shoal Buoy, and Placentia Bay: Ragged Islands – KLUMI (Land station), all located in Newfoundland and Labrador.

Datasets

Mouth of Placentia Bay Buoy

Pilot Boarding Station / Red Island Shoal Buoy

Placentia Bay: Ragged Islands – KLUMI (Land station)

The dataset available here runs until April 19, 2021. You can get the latest data from SmartAtlantic.

Download Instructions

Data Dictionary

Visual Representation of data

Buoy Location


You can find instructions for cleaning the dataset, merging the files, and training the ML models at the link below:

Instructions for Cleaning/Merging/Training
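As a rough illustration of the merge-then-train idea, the sketch below aligns two buoys' readings on shared timestamps and fits a one-variable least-squares line that predicts the target buoy's value from the reference buoy's value. The timestamps, the wave heights, and the function names are invented for illustration. The linked instructions and notebooks may use different features and models, so treat this as a sketch under those assumptions rather than the project's method.

```python
def align_by_timestamp(reference, target):
    """Return paired values for the timestamps present in both series."""
    shared = sorted(set(reference) & set(target))
    return [reference[t] for t in shared], [target[t] for t in shared]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up wave heights in metres, keyed by timestamp (not real buoy data).
reference = {
    "2021-04-01T00:00": 1.0,
    "2021-04-01T01:00": 2.0,
    "2021-04-01T02:00": 3.0,
    "2021-04-01T03:00": 4.0,  # the target buoy has no reading at this time
}
target = {
    "2021-04-01T00:00": 0.7,
    "2021-04-01T01:00": 1.2,
    "2021-04-01T02:00": 1.7,
    "2021-04-01T04:00": 2.5,  # the reference buoy has no reading at this time
}

xs, ys = align_by_timestamp(reference, target)
slope, intercept = fit_line(xs, ys)

# Estimate the target buoy's value for a reference reading of 4.0 m.
predicted = slope * 4.0 + intercept
print(round(slope, 3), round(intercept, 3), round(predicted, 3))
```

Only timestamps present in both series are used for fitting, which mirrors an inner join on the time column when merging the buoy files.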


We implemented the code on IBM Watson Cloud and encourage you to use it to gain experience with cloud platforms. The link below provides instructions for using IBM Watson Cloud. The Lite version is free and provides 25GB of storage, which is enough for this project.

Instructions for using IBM Watson Cloud


We have created notebooks with the code for your reference at the link below.

Links for Notebooks


REFERENCES

A MACHINE LEARNING REDUNDANCY MODEL FOR THE HERRING COVE SMART BUOY

3. Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) is an approach to analyzing or investigating a dataset using statistical graphics and other data visualization methods. The analysis may include handling missing values, maximizing insight into the dataset, discovering patterns, extracting important variables, detecting outliers and anomalies, finding interesting relations among the variables, testing hypotheses, and checking assumptions. Drawing reliable conclusions from a massive quantity of data by just glancing over it is very difficult or almost impossible; instead, you have to look at it carefully through an analytical lens.
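Two of the checks listed above, counting missing values and flagging outliers, can be sketched with the standard library alone. The readings are made up, the function name is hypothetical, and the 1.5 × IQR rule used here is one common convention for outliers, not necessarily the one used in the linked notebook.

```python
import statistics


def summarize(values):
    """Count missing entries and flag outliers with the 1.5 * IQR rule."""
    present = [v for v in values if v is not None]
    q1, median, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low = q1 - 1.5 * iqr
    high = q3 + 1.5 * iqr
    return {
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": median,
        "outliers": [v for v in present if v < low or v > high],
    }


# Made-up sensor readings; None marks a missing measurement.
readings = [4.1, 3.9, None, 4.3, 4.0, 19.5, 4.2, None, 3.8]

summary = summarize(readings)
print(summary["missing"], summary["outliers"])  # 2 [19.5]
```

The single extreme reading pulls the mean well above the median, which is a simple sign that the column deserves a closer look before modelling.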

Dataset

Download Instructions

Steps to do EDA

Link for Notebook

4. Natural Language Processing (NLP)

The NLP project performs sentiment analysis on climate change. We used the dataset available on data.world (link provided below) and applied BERT to perform the sentiment analysis. BERT has become a standard in Natural Language Processing (NLP). When it was introduced, it achieved state-of-the-art results on eleven NLP tasks, and it is now used for text classification, sentiment analysis, sequence labeling, question answering, and many more.

Dataset

Instructions for using Google Colab and downloading the dataset

Steps to do Sentiment Analysis

Link to Notebook

5. Text Cleaning

The purpose of this project is to show you how to clean text before using it for NLP problems. In this project, I am using the tweets posted by users on World Oceans Day, which was on June 8, 2021. I created a Python script to pull the tweets and then showed how to clean the raw data. Notebooks are attached for your reference.
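To give a sense of what cleaning a raw tweet can involve, here is a minimal sketch using only the standard library. The specific steps (decoding HTML entities, dropping the retweet marker, URLs, and mentions, lowercasing, and removing punctuation and emoji) are typical choices assumed for illustration, and the sample tweet is invented. The linked notebook may apply a different set of steps.

```python
import html
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def clean_tweet(text):
    """Normalize a raw tweet into lowercase, space-separated words."""
    text = html.unescape(text)         # "&amp;" becomes "&"
    text = re.sub(r"^RT\s+", "", text)  # drop a leading retweet marker
    text = URL_RE.sub(" ", text)       # remove links
    text = MENTION_RE.sub(" ", text)   # remove @user mentions
    text = text.replace("#", " ")      # keep the hashtag word, drop the symbol
    text = text.lower()
    text = NON_ALNUM_RE.sub(" ", text)  # remove punctuation and emoji
    return " ".join(text.split())      # collapse repeated whitespace


# Invented example tweet, not taken from the dataset.
raw = "RT @OceanFan: Happy #WorldOceanDay! Protect our seas &amp; reefs 🌊 https://example.org/abc"

cleaned = clean_tweet(raw)
print(cleaned)  # happy worldoceanday protect our seas reefs
```

Whether to keep hashtag words, numbers, or emoji depends on the downstream task, so the order and choice of steps here should be adjusted to the model being trained.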

Steps for Tweets Extraction

Steps for Text Cleaning

Link for Notebook

Dataset

You can use the dataset attached here to practice data cleaning. If you want to see how to pull tweets from Twitter, follow the instructions given in the Steps for Tweets Extraction and Steps for Text Cleaning links.