Training Projects
We have found that the data cleaning step can take a long time, so our hope is that these datasets will be reasonably clean, allowing the participants to explore ocean AI.
== '''Beginner Level''' ==

== 1. Python Basics ==

Before exploring the AI training projects, we recommend going through the Python libraries required for machine learning projects, which will help you understand the other projects more effectively. This section introduces the essential Python libraries for AI projects, and the attached notebook shows their practical use.

[https://drive.google.com/file/d/1rP5rrB2SwBl5Be2QmNArHzSSiQ_2USdB/view?usp=sharing Dataset]

[https://drive.google.com/file/d/108zO3LZwuRBSxRjLUBXUNRcXKc2DSdPt/view?usp=sharing Link for Notebook]

[https://drive.google.com/file/d/18eLxX91h4N_8936bJBkWuazF51UHub6V/view?usp=sharing Introduction to Python Libraries]

== '''Intermediate Level''' ==

== 2. Exploratory Data Analysis (EDA) ==

Exploratory Data Analysis (EDA) is an approach to analyzing or investigating a dataset using statistical graphics and other data visualization methods. The analysis may include handling missing values, maximizing insight into the data and discovering patterns, extracting important variables, detecting outliers and anomalies, finding interesting relations among the variables, testing hypotheses, and checking assumptions. Drawing reliable conclusions from a large quantity of data by just glancing over it is very difficult, if not impossible; instead, you have to examine it carefully through an analytical lens.

[https://drive.google.com/file/d/1fc5M3Znn8OaPQ2E-7k-nBbDrTbz9xpf8/view?usp=sharing Dataset]

[https://drive.google.com/file/d/1iLAoQLJlQcShkUkff4N--o_m2KADXA7W/view?usp=sharing Download Instructions]

[https://drive.google.com/file/d/1uZv4f1HfqFhXVqhY5U3T_mfAtOytO8tt/view?usp=sharing Steps to do EDA]

[https://drive.google.com/file/d/1NG80DcJL01gJgjLJAV0JOE5gFxSjnsbT/view?usp=sharing Link for Notebook]

== 3. Text Cleaning ==
The purpose of this project is to show how to clean text before using it for NLP problems. It uses tweets posted on World Ocean Day, June 8, 2021. I created a Python script to pull the tweets and then showed how to clean the raw data. Notebooks are attached for your reference.

[https://drive.google.com/file/d/1dxCTeF60NAFNPgvyx6dgllGKuiq-fdoH/view?usp=sharing Steps for Tweets Extraction]

[https://drive.google.com/file/d/1n_O3lQ9LP-f9vI3-kLz33MRQi9lzt2x8/view?usp=sharing Steps for Text Cleaning]

[https://drive.google.com/file/d/15PzwZ2UdIpB7XELHVyTB4r6rAdrBfGjg/view?usp=sharing Link for Notebook]

[https://drive.google.com/file/d/1uvdtV0PnU78vzdxKVKHtn7UCQVzdMSes/view?usp=sharing Dataset]

You can practice data cleaning on the same dataset, which is attached here. If you want to see how to pull tweets from Twitter, follow the instructions in the Steps for Tweets Extraction and Steps for Text Cleaning links.

== '''Advanced Level''' ==

== 4. Object Detection ==
We used the Google Open Images database to obtain approximately 650 images of starfish. The images were already separated into train, test and validation sets. The metadata linked below covers only the starfish images, not the entire dataset, and includes the coordinates of bounding boxes around the starfish.
If you want to run this on Google Colab, check out the following wiki: [https://github.com/AlexeyAB/darknet/wiki Darknet Wiki]. At the top of that page are links to a Colab notebook and a video tutorial.
== 5. Regression ==
The buoy collects environmental measurements including wind speed and direction, surface temperature, current speed, wave height, and peak wave period. These wind and wave data are used to decide whether conditions allow the safe transfer of pilots and passage of vessels, since vessels require a minimum depth of water, which may not be available if the waves are too large. The Red Shoal Buoy is currently under maintenance, and going without accurate environmental measurements for that period would significantly impair the safe guidance of vessels. In this project, we try to predict the measurements of the buoy under maintenance from the values recorded by other operational buoys, so that the authorities can continue to allow the safe passage of vessels.
[https://www.thejot.net/article-preview/?show_article_preview=1193 A MACHINE LEARNING REDUNDANCY MODEL FOR THE HERRING COVE SMART BUOY]
== 6. Natural Language Processing (NLP) ==
The NLP project performs sentiment analysis on climate change. We used the dataset available on data.world (link provided below) and applied BERT to do the sentiment analysis. BERT has become a standard model for Natural Language Processing (NLP): when it was published, it achieved new state-of-the-art results on eleven NLP tasks, including text classification, sentiment analysis, sequence labeling and question answering.
[https://colab.research.google.com/drive/18r3qvyJhNJ4gkNAB9dENGUQUBM2L-P4s?usp=sharing Link to Notebook]
== 7. Time Series ==

A time series is a sequence of data points ordered in time. Time is usually the independent variable, and the goal is usually to forecast future values; when the points are plotted, one axis is always time. This project covers analyzing and plotting time series data and building a machine learning model for it. It was done on IBM Watson Cloud, and instructions are given in the attached links. The free Lite plan provides 25 GB of storage, which is enough for this project.
[https://docs.google.com/document/d/1t1HBDLnDKzd1trzE07tXEcuB_hJpiziu/edit?usp=sharing&ouid=117309982983716033255&rtpof=true&sd=true Instructions for setting up an account on IBM Watson Cloud]

[https://docs.google.com/document/d/1wIPmktFBhH01dY5AQrGQbg-aYJor3r_V/edit?usp=sharing&ouid=117309982983716033255&rtpof=true&sd=true Steps to download dataset]

[https://drive.google.com/file/d/1rYcX5DbCLV7gEjiCUX3TyiqaH5khMwci/view?usp=sharing Dataset]

[https://docs.google.com/document/d/1Uvh6cZIckDVNYKbPLmNbjW2jARiLg5tw/edit?usp=sharing&ouid=117309982983716033255&rtpof=true&sd=true Steps to handle Time Series data]

[https://docs.google.com/document/d/13FzlitRQ1b_SL3jZsQWkvcLNYf1zf30U/edit?usp=sharing&ouid=117309982983716033255&rtpof=true&sd=true Link to notebook]
</div> <!-- autonum -->
Latest revision as of 15:20, 26 August 2021
DeepSense has compiled a few datasets for students and others interested in the ocean and AI, so they can have the opportunity to complete AI projects independently. We hope participants can learn about a specific type of ocean-related data and gain hands-on experience with a concrete AI project. Participants are expected to work on the projects alone, but we have provided some guidance, including notebooks, data, outputs and models to try to improve upon.
We have found that the data cleaning step can take a long time, so our hope is that these datasets will be reasonably clean, allowing the participants to explore ocean AI.
Beginner Level
1. Python Basics
Before exploring the AI training projects, we recommend going through the Python libraries required for machine learning projects, which will help you understand the other projects more effectively. This section introduces the essential Python libraries for AI projects, and the attached notebook shows their practical use.
Introduction to Python Libraries
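As a small taste of what such an introduction typically covers, the sketch below assumes the libraries in question include NumPy and pandas (this section does not name them). The column names and readings are invented for illustration and are not taken from the attached dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly readings from one buoy (values invented for illustration).
readings = pd.DataFrame(
    {
        "hour": [0, 1, 2, 3],
        "wave_height_m": [1.2, 1.5, np.nan, 1.1],
        "water_temp_c": [8.4, 8.6, 8.5, 8.3],
    }
)

# NumPy: vectorised arithmetic on a whole column at once (Celsius to Fahrenheit).
temps_f = readings["water_temp_c"].to_numpy() * 9 / 5 + 32

# pandas: summary statistics skip missing values (NaN) by default.
mean_wave = readings["wave_height_m"].mean()

# pandas: boolean filtering keeps only rows matching a condition (NaN rows never match).
calm_hours = readings.loc[readings["wave_height_m"] < 1.3, "hour"].tolist()

print(temps_f.round(2))
print(round(mean_wave, 3))
print(calm_hours)
```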
Intermediate Level
2. Exploratory Data Analysis (EDA)
Exploratory Data Analysis (EDA) is an approach to analyzing or investigating a dataset using statistical graphics and other data visualization methods. The analysis may include handling missing values, maximizing insight into the data and discovering patterns, extracting important variables, detecting outliers and anomalies, finding interesting relations among the variables, testing hypotheses, and checking assumptions. Drawing reliable conclusions from a large quantity of data by just glancing over it is very difficult, if not impossible; instead, you have to examine it carefully through an analytical lens.
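A few of the checks listed above (missing values, outliers, relations among variables) can be sketched with pandas. This is a minimal illustration under assumptions: the helper name `quick_eda`, the column names, the toy values and the 1.5 x IQR outlier rule (Tukey's rule) are illustrative choices, not necessarily what the attached notebook uses.

```python
import pandas as pd


def quick_eda(df: pd.DataFrame) -> dict:
    """Return a few basic EDA summaries for the numeric columns of df."""
    numeric = df.select_dtypes("number")

    # Missing values: number of NaNs per column.
    missing = numeric.isna().sum().to_dict()

    # Outliers: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]; NaNs are never flagged.
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    outside = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
    outliers = outside.sum().to_dict()

    # Relations among variables: pairwise Pearson correlation.
    corr = numeric.corr()

    return {"missing": missing, "outliers": outliers, "corr": corr}


# Toy data invented for illustration.
sample = pd.DataFrame(
    {
        "wind_speed": [5.0, 6.0, 5.5, 6.5, 40.0, None],
        "wave_height": [1.0, 1.2, 1.1, 1.3, 3.0, 1.2],
    }
)
summary = quick_eda(sample)
print(summary["missing"])
print(summary["outliers"])
print(summary["corr"].round(2))
```

Flagged values are candidates for a closer look rather than automatic deletions; a 40 m/s wind reading could be a sensor fault or a real storm.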
3. Text Cleaning
The purpose of this project is to show how to clean text before using it for NLP problems. It uses tweets posted on World Ocean Day, June 8, 2021. I created a Python script to pull the tweets and then showed how to clean the raw data. Notebooks are attached for your reference.
You can practice data cleaning on the same dataset, which is attached here. If you want to see how to pull tweets from Twitter, follow the instructions in the Steps for Tweets Extraction and Steps for Text Cleaning links.
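A minimal sketch of common tweet-cleaning steps, using only Python's standard library. The specific rules (lowercasing, removing URLs, @mentions, a leading "RT", the "#" symbol and non-letter characters) and the helper name `clean_tweet` are assumptions chosen for illustration; the attached notebook may apply different or additional steps such as stop-word removal.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")
SPACES_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Normalise a raw tweet into lowercase words separated by single spaces."""
    text = text.lower()
    text = URL_RE.sub(" ", text)          # drop links
    text = MENTION_RE.sub(" ", text)      # drop @user mentions
    text = text.replace("#", " ")         # keep hashtag words, drop the symbol
    text = re.sub(r"^rt\b", " ", text)    # drop a leading retweet marker
    text = NON_LETTER_RE.sub(" ", text)   # drop digits, punctuation and emoji
    return SPACES_RE.sub(" ", text).strip()


raw = "RT @oceanfan: Happy #WorldOceanDay!! 🌊 Protect our seas https://t.co/abc123"
print(clean_tweet(raw))
```

Keeping the hashtag text while dropping the "#" preserves words like "worldoceanday" that often carry most of a tweet's topic.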
Advanced Level
4. Object Detection
We used the Google Open Images database to obtain approximately 650 images of starfish. The images were already separated into train, test and validation sets. The metadata linked below covers only the starfish images, not the entire dataset, and includes the coordinates of bounding boxes around the starfish.
This contains the files:
If you want to download other categories of images from the Open Images database, follow the instructions here:
After you have the datasets, you can download and install YOLO v4 using the following instructions:
If you want to run this on Google Colab, check out the following wiki: Darknet Wiki. At the top of that page are links to a Colab notebook and a video tutorial.
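YOLO v4 (Darknet) expects one text file per image, with one line per box in the form `<class_id> <x_center> <y_center> <width> <height>`, all relative to the image size. The sketch below converts corner-style bounding boxes into that format. It assumes the starfish metadata uses the Open Images column names `ImageID`, `XMin`, `XMax`, `YMin`, `YMax` with coordinates already normalised to [0, 1]; check the actual CSV headers first. The helper name `to_yolo_lines` and class id 0 for starfish (a single-class setup) are also assumptions.

```python
import csv
import io
from collections import defaultdict


def to_yolo_lines(csv_text: str, class_id: int = 0) -> dict:
    """Group boxes by image and convert corner coordinates to YOLO centre/size format."""
    labels = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        x_min, x_max = float(row["XMin"]), float(row["XMax"])
        y_min, y_max = float(row["YMin"]), float(row["YMax"])
        x_c = (x_min + x_max) / 2   # box centre, horizontal
        y_c = (y_min + y_max) / 2   # box centre, vertical
        w = x_max - x_min           # box width
        h = y_max - y_min           # box height
        labels[row["ImageID"]].append(f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    return dict(labels)


# Hypothetical two-box excerpt in the assumed Open Images layout.
sample_csv = """ImageID,XMin,XMax,YMin,YMax
img001,0.10,0.30,0.20,0.60
img001,0.50,0.90,0.40,0.80
"""
result = to_yolo_lines(sample_csv)
for image_id, lines in result.items():
    # Each list would be written to <image_id>.txt next to the image file.
    print(image_id, lines)
```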
5. Regression
The buoy collects environmental measurements including wind speed and direction, surface temperature, current speed, wave height, and peak wave period. These wind and wave data are used to decide whether conditions allow the safe transfer of pilots and passage of vessels, since vessels require a minimum depth of water, which may not be available if the waves are too large. The Red Shoal Buoy is currently under maintenance, and going without accurate environmental measurements for that period would significantly impair the safe guidance of vessels. In this project, we try to predict the measurements of the buoy under maintenance from the values recorded by other operational buoys, so that the authorities can continue to allow the safe passage of vessels.
The task is to predict the values of one buoy from the parameters measured by another. This project uses data from the Mouth of Placentia Bay Buoy, the Pilot Boarding Station / Red Island Shoal Buoy, and Placentia Bay: Ragged Islands – KLUMI (Land station), all located in Newfoundland and Labrador.
Datasets
Pilot Boarding Station / Red Island Shoal Buoy
Placentia Bay: Ragged Islands – KLUMI (Land station)
The datasets available here run until April 19, 2021. You can get the latest data from smartatlantic.
Instructions for cleaning the dataset, merging the files and training the ML models are available at the link below:
Instructions for Cleaning/Merging/Training
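The merge-then-train workflow can be sketched with pandas and NumPy: align two buoys' records by timestamp, then fit a model that predicts the target buoy's reading from a neighbouring buoy's. The helper names, the column names `time` and `wave_height`, the 30-minute matching tolerance and the toy numbers are all assumptions for illustration. A plain least-squares line stands in for whichever ML models the linked instructions actually train.

```python
import numpy as np
import pandas as pd


def merge_buoys(source: pd.DataFrame, target: pd.DataFrame) -> pd.DataFrame:
    """Pair each target record with the nearest-in-time source record (within 30 min)."""
    source = source.sort_values("time")   # merge_asof requires sorted keys
    target = target.sort_values("time")
    merged = pd.merge_asof(
        target, source, on="time", direction="nearest",
        tolerance=pd.Timedelta("30min"), suffixes=("_target", "_source"),
    )
    return merged.dropna()  # drop target rows with no source reading close enough


def fit_linear(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares for y = a*x + b; returns [a, b]."""
    design = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Toy data invented for illustration: the target buoy logs 10 minutes after the source.
times = pd.date_range("2021-04-01", periods=6, freq="60min")
source = pd.DataFrame({"time": times, "wave_height": [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]})
target = pd.DataFrame(
    {"time": times + pd.Timedelta("10min"), "wave_height": [0.9, 1.3, 1.7, 2.1, 2.5, 2.9]}
)

merged = merge_buoys(source, target)
a, b = fit_linear(merged["wave_height_source"].to_numpy(), merged["wave_height_target"].to_numpy())
print(round(a, 3), round(b, 3))
```

In practice you would hold out a later time period for testing, and compare models on it, rather than evaluating on the data used for fitting.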
We implemented the code on IBM Watson Cloud and encourage you to use it to gain experience with cloud computing. The link below gives instructions for using IBM Watson Cloud. The free Lite plan provides 25 GB of storage, which is enough for this project.
Instructions for using IBM Watson Cloud
Notebooks with the code are available for your reference at the link below.
REFERENCES
A MACHINE LEARNING REDUNDANCY MODEL FOR THE HERRING COVE SMART BUOY
6. Natural Language Processing (NLP)
The NLP project performs sentiment analysis on climate change. We used the dataset available on data.world (link provided below) and applied BERT to do the sentiment analysis. BERT has become a standard model for Natural Language Processing (NLP): when it was published, it achieved new state-of-the-art results on eleven NLP tasks, including text classification, sentiment analysis, sequence labeling and question answering.
Instructions for using Google Colab and downloading the dataset
Steps to do Sentiment Analysis
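A minimal sketch of how a BERT classifier is commonly set up with the Hugging Face `transformers` library, assuming the `bert-base-uncased` checkpoint (the notebook may use a different one). The helper names `encode_labels` and `build_bert` and the example labels are hypothetical; the labels are placeholders, not the actual label set of the data.world dataset. Only the label-encoding part runs on its own; `build_bert` needs `transformers`, PyTorch and network access to download weights.

```python
def encode_labels(labels):
    """Map raw sentiment labels to contiguous integer ids, as a classification head expects."""
    classes = sorted(set(labels))
    to_id = {label: i for i, label in enumerate(classes)}
    return [to_id[label] for label in labels], classes


def build_bert(num_labels: int, checkpoint: str = "bert-base-uncased"):
    """Load a BERT tokenizer and a sequence-classification model with num_labels outputs."""
    # Imported lazily so the rest of this sketch runs without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=num_labels)
    return tokenizer, model


# Placeholder tweets and labels, invented for illustration.
tweets = ["Climate change is real", "Not convinced at all", "New report out today"]
raw_labels = ["positive", "negative", "neutral"]
ids, classes = encode_labels(raw_labels)
print(ids, classes)

# With transformers installed and network access:
# tokenizer, model = build_bert(num_labels=len(classes))
# batch = tokenizer(tweets, padding=True, truncation=True, max_length=128, return_tensors="pt")
# logits = model(**batch).logits  # shape (len(tweets), len(classes))
```

The classification head on top of BERT starts out randomly initialised, so its predictions are meaningless until the model is fine-tuned on the labelled tweets.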
7. Time Series
A time series is a sequence of data points ordered in time. Time is usually the independent variable, and the goal is usually to forecast future values; when the points are plotted, one axis is always time. This project covers analyzing and plotting time series data and building a machine learning model for it. It was done on IBM Watson Cloud, and instructions are given in the attached links. The free Lite plan provides 25 GB of storage, which is enough for this project.
Instructions for setting up an account on IBM Watson Cloud
Steps to handle Time Series data
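Two steps shared by most time series forecasting workflows are turning the series into a supervised table of lag features and splitting it chronologically rather than randomly, so the model is never trained on data from after the period it is tested on. The sketch below assumes a regularly spaced pandas series. The helper names, the three-lag window, the 80/20 split and the toy readings are arbitrary illustrative choices, not settings from the attached notebook.

```python
import pandas as pd


def make_lag_features(series: pd.Series, n_lags: int = 3) -> pd.DataFrame:
    """Build a table where each row holds the current value and its previous n_lags values."""
    frame = pd.DataFrame({"target": series})
    for lag in range(1, n_lags + 1):
        frame[f"lag_{lag}"] = series.shift(lag)
    return frame.dropna()  # the first n_lags rows lack a full history


def chronological_split(frame: pd.DataFrame, train_fraction: float = 0.8):
    """Split by position in time: earlier rows train, later rows test (no shuffling)."""
    cut = int(len(frame) * train_fraction)
    return frame.iloc[:cut], frame.iloc[cut:]


# Hypothetical daily readings, invented for illustration.
index = pd.date_range("2021-01-01", periods=10, freq="D")
temps = pd.Series([3.0, 3.2, 3.1, 3.5, 3.8, 4.0, 4.1, 4.4, 4.3, 4.6], index=index)

features = make_lag_features(temps, n_lags=3)
train, test = chronological_split(features)
print(features.head(2))
print(len(train), len(test))
```

Any regression model can then be trained on the `lag_*` columns of `train` to predict `target`, and evaluated on `test`.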