
Fast.ai Course v3 Lesson 6 Notes

platform.ai: a tool that helps label unlabelled data for you. Tabular learner: the Rossmann Store Sales dataset. When dealing with time-series data, practice mostly does not use a recurrent neural network; an RNN is the right tool only when the sequence of time points is the ONLY information we have. In real-world cases, when we are given a time field, we can generate or look up more information about it, such as hour, date, day of week, week, month, etc.
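As a rough sketch of this kind of date feature engineering with pandas (the Date column here is made up; fastai ships add_datepart for the same purpose):

```python
import pandas as pd

# Minimal sketch (made-up dates): derive richer features from a raw
# date column, in the spirit of fastai's add_datepart.
df = pd.DataFrame({"Date": pd.to_datetime(["2015-07-30", "2015-07-31"])})
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["Day"] = df["Date"].dt.day
df["Dayofweek"] = df["Date"].dt.dayofweek      # Monday=0 .. Sunday=6
df["Week"] = df["Date"].dt.isocalendar().week
print(df)
```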

Fast.ai Course v3 Lesson 5 Notes

Backpropagation: calculate the loss between the output layer (the final activations) and the actual target values, then use the resulting loss to calculate the gradients with respect to the parameters and update the parameters: $\text{parameters} \mathrel{-}= \text{learning rate} \cdot \text{gradient of parameters}$. Fine-tuning example: ResNet-34. The final layer, i.e. the final weight matrix, of ResNet-34 has 1000 columns because each image can belong to one of 1000 different categories, i.e. the ImageNet classes.
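A toy PyTorch sketch of that update rule, assuming nothing about the actual lesson code:

```python
import torch

# Toy sketch of the update rule above, not fastai's actual training loop.
lr = 0.01                                     # learning rate
params = torch.randn(10, requires_grad=True)  # hypothetical parameters
loss = ((params - 1.0) ** 2).sum()            # toy loss against target 1.0
loss.backward()                               # backprop: compute gradients
with torch.no_grad():
    params -= lr * params.grad                # parameters -= lr * gradient
    params.grad.zero_()                       # reset gradients for next step
```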

Fast.ai Course v3 Lesson 4 Notes

Sentiment analysis (IMDB). Transfer learning in NLP. Difficulties in training an NLP model: the model has to learn how to speak a language and needs world knowledge. Start with a pre-trained model: a language model, i.e. a model that learns to predict the next word of a sentence. This lets us benefit from pre-training on a much bigger dataset, e.g. Wikipedia. No preset labels are needed: this is self-supervised learning (Yann LeCun); the labels still exist in this kind of problem, but they are not created by humans; instead, they are built into the dataset.
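A toy illustration of how those built-in labels arise for a next-word language model (the sentence is made up):

```python
# Toy illustration: a language model's labels are built into the data
# itself, as the input sequence shifted by one word.
tokens = ["the", "movie", "was", "great"]
inputs, targets = tokens[:-1], tokens[1:]
for x, y in zip(inputs, targets):
    print(f"given {x!r}, predict {y!r}")
```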

Fast.ai Course v3 Lesson 3 Notes

Multi-label prediction with the Planet Amazon dataset. The data block API. Classes involved: Dataset: an abstract class for retrieving an item by index and returning the length of the dataset. DataLoader: grabs data from a Dataset and feeds it to the model in mini-batches. DataBunch: binds the DataLoaders of the training dataset, the validation dataset, and optionally a test dataset, and is ready to be sent to the learner. General steps using the data block API to create a DataBunch:
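The excerpt cuts off before the steps themselves; in the meantime, here is a minimal PyTorch sketch of the Dataset/DataLoader contract described above (toy data, hypothetical class name), which a DataBunch then wraps:

```python
import torch
from torch.utils.data import Dataset, DataLoader

# Toy sketch of the Dataset/DataLoader contract (random data).
class ToyDataset(Dataset):
    def __init__(self, n=100):
        self.x = torch.randn(n, 3)
        self.y = torch.randint(0, 2, (n,))

    def __len__(self):               # the length of the Dataset
        return len(self.x)

    def __getitem__(self, idx):      # retrieve an item by index
        return self.x[idx], self.y[idx]

loader = DataLoader(ToyDataset(), batch_size=16, shuffle=True)
xb, yb = next(iter(loader))          # one mini-batch, ready for a model
```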

Fast.ai Course v3 Lesson 2 Notes

General notes regarding the previous lesson. No need to feel intimidated by all the good projects in lesson 1. Just open your imagination and start an interesting project! Keep going: code and experiment -> “The whole game” -> concepts -> lesson 2. Creating your own dataset from Google Images. Download images: after opening the image search results page, run the following JavaScript code in the JavaScript console, which you can open by pressing Ctrl+Shift+J on Windows/Linux or Cmd+Opt+J on Mac.
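If I recall the course notebook correctly, the snippet was roughly the following; note that it depends on Google Images' internal class names, which have changed since the course was recorded, so it may no longer work as-is:

```javascript
// Collect the source URLs of all displayed images into a downloadable CSV.
urls = Array.from(document.querySelectorAll('.rg_di .rg_meta'))
            .map(el => JSON.parse(el.textContent).ou);
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));
```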

Fast.ai Course v3 Lesson 1 Notes

Lesson 1 notes (41:48) In practice, use fit_one_cycle for “1 cycle policy” learning instead of fit. Some discussions of cyclical learning rates can be found here, here and here. (56:20) Advice on how to get the most out of the course: “Pick one project, do it really well, make it fantastic.” (1:06:19) Deep neural network architectures for image classification: ResNet-50 is good enough; take a look at DAWNBench.
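A hedged fastai v1 sketch of that idiom (the data path is a placeholder):

```python
from fastai.vision import *

# Sketch only: folder path and epoch count are placeholders.
path = Path("data/my_images")
data = ImageDataBunch.from_folder(path, valid_pct=0.2, size=224)
learn = cnn_learner(data, models.resnet50, metrics=accuracy)
learn.fit_one_cycle(4)   # 1 cycle policy instead of plain fit()
```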

Correlation analysis of ETFs using Python

Previously, we covered why and how to create a correlation matrix of the ETFs available in the Hong Kong market using Python. Now we can run some actual correlation analyses on these securities with the matrix we just created. There are two kinds of analyses I am going to demonstrate, which are actually quite similar: one is to find the n least correlated ETFs in the whole market; the other is to find the n ETFs least correlated with a given ticker.
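A minimal pandas sketch of both analyses, assuming corr is the square correlation-matrix DataFrame built in the previous post (the function names are mine):

```python
import numpy as np
import pandas as pd

# Hedged sketch: `corr` is assumed to be a square correlation matrix
# indexed by ticker; both function names are hypothetical.
def least_correlated_pairs(corr: pd.DataFrame, n: int = 5) -> pd.Series:
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)  # each pair once
    pairs = corr.where(mask).stack()                      # (t1, t2) -> coef
    return pairs.reindex(pairs.abs().sort_values().index).head(n)

def least_correlated_with(corr: pd.DataFrame, ticker: str, n: int = 5) -> pd.Series:
    others = corr[ticker].drop(ticker)                    # exclude itself
    return others.reindex(others.abs().sort_values().index).head(n)
```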

Finding correlation coefficients between ETFs with Python

Several months ago I finished reading the book The Intelligent Asset Allocator by William Bernstein. It is a really nice book if you want a solid grounding in portfolio theory, with worked examples, as well as guidance for building your own investment portfolio by allocating your assets across different classes. One of the main points of building an effective portfolio is to build it with uncorrelated, or in practice less correlated, assets. Whether two assets are correlated, or more precisely their level of correlation, is measured by the correlation coefficient, which ranges from -1 to +1.
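As a quick illustration, the coefficient can be computed from return series with pandas (the prices below are made up for two hypothetical ETFs):

```python
import pandas as pd

# Toy example with made-up prices for two hypothetical ETFs.
prices = pd.DataFrame({
    "ETF_A": [100.0, 101.0, 103.0, 102.0, 105.0],
    "ETF_B": [50.0, 50.5, 51.0, 52.0, 51.5],
})
returns = prices.pct_change().dropna()
corr = returns["ETF_A"].corr(returns["ETF_B"])  # Pearson, between -1 and +1
print(round(corr, 3))
```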

Next Word Prediction using Katz Backoff Model - Part 3: Prediction Model Implementation

Executive Summary The Capstone Project of the Johns Hopkins Data Science Specialization is to build an NLP application that predicts the next word of a user's text input. In Part 1, we analysed the training dataset and found some characteristics that can be exploited in the implementation. In Part 2, we discussed the Good-Turing smoothing estimate and the Katz backoff model that power our text prediction application.
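To convey only the control flow of backing off from higher- to lower-order n-grams, here is a drastically simplified Python sketch; the tables and the 0.4 weight are made up, and full Katz would instead use Good-Turing discounted probabilities with properly normalized backoff weights:

```python
# Drastically simplified backoff lookup (toy tables, hypothetical 0.4
# weight); closer to "stupid backoff" than to full Katz.
trigrams = {("one", "of", "the"): 0.4}
bigrams = {("of", "the"): 0.3}
unigrams = {"the": 0.1}

def score(w1, w2, w3):
    if (w1, w2, w3) in trigrams:
        return trigrams[(w1, w2, w3)]
    if (w2, w3) in bigrams:                    # back off to bigrams
        return 0.4 * bigrams[(w2, w3)]
    return 0.4 * 0.4 * unigrams.get(w3, 0.0)   # back off again to unigrams

print(score("one", "of", "the"), score("none", "of", "the"))
```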

Next Word Prediction using Katz Backoff Model - Part 2: N-gram model, Katz Backoff, and Good-Turing Discounting

Executive Summary The Capstone Project of the Johns Hopkins Data Science Specialization is to build an NLP application that predicts the next word of a user's text input. In Part 1, we analysed the data and found that a lot of uncommon words and word combinations (2- and 3-grams) can be removed from the corpora in order to reduce memory usage and speed up model building.
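A toy sketch of that pruning idea (the corpus and threshold are made up):

```python
from collections import Counter

# Toy sketch: count 2-grams and prune the uncommon ones to save memory.
tokens = "the cat sat on the mat the cat ate".split()
bigrams = Counter(zip(tokens, tokens[1:]))
min_count = 2                                # hypothetical threshold
pruned = {bg: c for bg, c in bigrams.items() if c >= min_count}
print(pruned)                                # {('the', 'cat'): 2}
```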