Implement a deep neural network model that learns to expand single-variable polynomials. The model input is the factorized sequence and the output is the predicted expanded sequence.
Example: `(7-3*z)*(-5*z-9)=15*z**2-8*z-63`
- `(7-3*z)*(-5*z-9)` is the factorized input
- `15*z**2-8*z-63` is the expanded target

For the expanded form, only the exact form provided is considered correct.
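
A minimal sketch of how one line of the dataset could be split into a source/target pair and tokenized. The names `split_pair` and `tokenize` and the token pattern are assumptions made for illustration; the repository's actual preprocessing may tokenize differently.

```python
import re

# Hypothetical token pattern: "**" comes first so it is not read as two "*"
# tokens, then names (variables, functions), integers, and single symbols.
TOKEN_PATTERN = re.compile(r"\*\*|[a-zA-Z]+|\d+|[()+\-*]")


def split_pair(line):
    """Split a 'factorized=expanded' line into (source, target)."""
    source, target = line.strip().split("=", 1)
    return source, target


def tokenize(expression):
    """Return the list of tokens found in an expression string."""
    return TOKEN_PATTERN.findall(expression)


source, target = split_pair("(7-3*z)*(-5*z-9)=15*z**2-8*z-63")
print(tokenize(source))
print(tokenize(target))
```

Treating `**` as a single token keeps the power operator distinct from multiplication, which matters when the target must match the provided form exactly.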
- The directory `./data` contains `train.txt`, `validation.txt` and `test.txt`
- The source and target sequence vocabularies are stored in the directory `./vocab`
- The trained model (`best_model.pt`) is in the directory `./model`
- All predictions made by the model on the test set are stored in the file `./output/predictions.txt`
- A summary of the model and its trainable parameters is stored in `network.txt`
- The classes for the transformer model are in `backbone.py` and `transformer.py`
- `data.py` splits the dataset randomly into train, validation and test sets based on the input split ratio (an already split dataset is provided in the repo)
- `train.py` trains the model using the defined configurations
- `test.py` runs the trained model on the test data to generate predictions and calculates the accuracy
- `text_EDA.ipynb` contains the preliminary exploratory data analysis of the dataset
- `requirements.txt` contains the dependencies

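
The random split described for `data.py` could look roughly like the sketch below. It is not the repository's code: the function name `split_dataset`, the `seed` parameter, and the choice to halve the remainder between validation and test are all assumptions.

```python
import random


def split_dataset(lines, split_ratio=0.8, seed=0):
    """Shuffle lines and split them into train, validation and test lists.

    Assumption: split_ratio is the train fraction and the remaining lines
    are divided evenly between validation and test.
    """
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = int(len(shuffled) * split_ratio)
    n_val = (len(shuffled) - n_train) // 2
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


examples = [f"example_{i}" for i in range(100)]
train, val, test = split_dataset(examples, split_ratio=0.8)
print(len(train), len(val), len(test))
```

Using a dedicated `random.Random(seed)` instance avoids touching the global random state, so the same seed always yields the same split.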
The model was trained on a single NVIDIA RTX 3090 GPU with CUDA 10.2 and torch == 1.11.0. (You might have to change the torch version depending on the GPU and CUDA version of your machine.)
- Set up a new conda virtual environment:
  `conda create --name <env_name> python=3.9.2`
- Activate the environment:
  `conda activate <env_name>`
- Install the dependencies (run this command in the `/Attention` directory):
  `pip install -r requirements.txt`

This solution uses the sacred library for logging, running, configuring and organizing the code.
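
Sacred's command-line `with` syntax takes `key=value` updates, where a dotted key such as `hyperparameters.n_iters` addresses a nested configuration entry. The sketch below illustrates that override behavior in plain Python. It is not Sacred's implementation, and the helper names and default values are hypothetical.

```python
import ast
import copy


def parse_value(text):
    """Interpret text as a Python literal, falling back to the raw string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_overrides(config, updates):
    """Return a copy of config with 'dotted.key=value' updates applied."""
    result = copy.deepcopy(config)
    for update in updates:
        key, raw_value = update.split("=", 1)
        *parents, leaf = key.split(".")
        node = result
        for name in parents:
            # Walk (or create) the nested dictionaries named by the dotted key.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return result


# Hypothetical defaults; the real ones are defined in the repository's scripts.
defaults = {"hyperparameters": {"n_iters": 5}, "device": "cuda"}
print(apply_overrides(defaults, ["hyperparameters.n_iters=20", 'device="cpu"']))
```

Sacred itself does considerably more than this (for example, it checks updates against the declared configuration), so treat the sketch only as a reading aid for the commands in this README.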
All the commands should be run only from the parent directory (i.e. /Attention)
- Split the data into train, validation and test sets:
  `python data.py with 'split_ratio=0.8'`
- Train the model (all configurations are defined in `train.py` and can also be passed from the command line as shown):
  `python train.py with 'hyperparameters.n_iters=20'`
- Evaluate the model on the test set (this will use the GPU if available):
  `python test.py`
- Evaluate the model on the test set (using the CPU only):
  `python test.py with 'device="cpu"'`

The model is evaluated on strict equality between the predicted target sequence and the ground truth target sequence of the test dataset. The model achieved an accuracy of 98.63% (trained for 20 epochs in 45 minutes on a single GPU).
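
The strict-equality metric can be sketched as follows. The function name `exact_match_accuracy` and the sample strings are illustrative assumptions, not the code in `test.py`.

```python
def exact_match_accuracy(predictions, targets):
    """Fraction of predictions that equal their target string exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(p.strip() == t.strip() for p, t in zip(predictions, targets))
    return correct / len(targets)


targets = ["15*z**2-8*z-63", "15*z**2-8*z-63"]
predictions = [
    "15*z**2-8*z-63",   # identical string: counted as correct
    "-8*z+15*z**2-63",  # same polynomial, different term order: counted as wrong
]
print(exact_match_accuracy(predictions, targets))
```

Because the comparison is on strings, a mathematically equivalent expansion written in a different order still counts as an error.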
For a more comprehensive description of the solution, parameter choices and loss plots, please refer to the file `Solution-Report.pdf`.