[Home]NeuralTargeting/TDNN

Robo Home | NeuralTargeting | Changes | Preferences | AllPages

Difference (from prior major revision) (author diff)

Changed: 356,358c356
--SSO

TD (Temporal difference) is a reinforcement learning concept, yes. The TD in TDNN is however not related to reinforcement learning but stands for Time Delay or Tap Delay. I.e. you just keep a backlog of previous inputs and feed them into the net each time. Your net will be lost without the capability to handle temporal information an TDNNs are the simplest form of nets that can handle it. -- Anon
--SSO

This class implements a Temporal Difference NN (like the one used in TD-Backgammon). It comes from an academic proyect by 3 students in Tel-Aviv (Chen Levkovich, Tali Hildeshaim and Orly Levkovich-Frank). More information about the project can be found in http://www.geocities.com/chen_levkovich/tdlearningproject.html

I just modified the class to make it compatible with robocode. Unfortunately is not commented (original source code wasn´t also).

I wasn't aware of this kind of networks till PEZ mentioned them... so thanks, PEZ :-)

-- Albert


package apv.TDNet;
import robocode.*;
import java.util.Random;
import java.util.Vector;
import java.io.*;
import java.util.zip.*;

public class NueralNet implements Serializable {
    protected Random rand;

    protected double m_OutputLayer;
    protected double[] m_InputLayer;
    protected double[] m_HiddenLayerOutputs;
    protected double[] m_HiddenLayerWeights;
    protected double[] m_OutputLayerWeights;
    protected double[] m_Sigma;
    protected double m_LastOutput;
    protected Vector m_Grads;
    protected int m_InputNum = 3;
    protected int m_NueronsNum = 3;
    protected int m_t;
    //protected int m_TimePeriod = 10;
	protected int m_TimePeriod = 150;
    protected double m_alpha = 0.1; // learning parameter	
    protected double[] m_LambdaPower;
    protected double m_Lambda = 0.1;
    protected double m_AlphaChangeRatio = 0.998;

    private void initNet() {
        prepareLambdaPower();
        rand = new Random(System.currentTimeMillis());
        m_OutputLayer = 0;
        m_InputLayer = new double[m_InputNum];
        m_HiddenLayerOutputs = new double[m_NueronsNum];
        m_HiddenLayerWeights = new double[m_NueronsNum * m_InputNum];
        m_OutputLayerWeights = new double[m_NueronsNum];
        m_Sigma = new double[m_NueronsNum];
        initWeights();
        m_t = 1;
        m_Grads = new Vector();
    }
    public NueralNet() {
        initNet();
    }
public NueralNet(int InputSize, int HiddenLayerSize) {
    m_InputNum = InputSize;
    m_NueronsNum = HiddenLayerSize;
    initNet();
}
    // compute the delta weights vector
    private void computeDeltaW(double reward, double Yt1, double Yt) {
        int i;
        int j;
        int k;
        double sigma = 0;
        double[] Grad;

        double dY;

        dY = m_alpha * (reward + Yt1 - Yt);

        // compute the delta for output layer weights

        for (i = 0; i < m_NueronsNum; i++) {
            if (m_Lambda > 0) {
                sigma = 0;
                for (k = 1; k <= m_t - 1; k++) {
                    Grad = (double[]) m_Grads.elementAt(k - 1);
                    sigma += m_LambdaPower[(m_t - 1) - k] * Grad[i];
                }
                m_OutputLayerWeights[i] += dY * sigma;
            } else {
                Grad = (double[]) m_Grads.elementAt((m_t - 1) - 1);
                m_OutputLayerWeights[i] += dY * Grad[i];
            }
        }
        for (i = 0; i < m_NueronsNum; i++) {
            for (j = 0; j < m_InputNum; j++) {
                if (m_Lambda > 0) {
                    sigma = 0;
                    for (k = 1; k <= m_t - 1; k++) {
                        Grad = ((double[]) m_Grads.elementAt(k - 1));
                        sigma += m_LambdaPower[(m_t - 1) - k] * Grad[i * m_InputNum + j + m_NueronsNum];
                    }
                    m_HiddenLayerWeights[i * m_InputNum + j] += sigma * dY;
                } else {
                    Grad = (double[]) m_Grads.elementAt((m_t - 1) - 1);
                    m_HiddenLayerWeights[i * m_InputNum
                        + j] += dY * Grad[i * m_InputNum
                        + j
                        + m_NueronsNum];
                }
            }
        }
    }
    private double[] computeGrad() {
        int i;
        int j;
        int sigma = 0;
        double[] Grad = new double[m_NueronsNum * m_InputNum + m_NueronsNum];

        for (i = 0; i < m_NueronsNum; i++) {
            Grad[i] = m_HiddenLayerOutputs[i] / m_NueronsNum;
        }

        for (i = 0; i < m_NueronsNum; i++) {
            for (j = 0; j < m_InputNum; j++) {
                Grad[i * m_InputNum + j + m_NueronsNum] =
                    dTangh(m_Sigma[i])
                        * m_OutputLayerWeights[i]
                        * m_InputLayer[j]
                        / (m_NueronsNum * m_InputNum);
            }
        }
        return Grad;
    }

    private void computeHiddenOutputs() {
        double sigma = 0;
        int i;
        int j;

        for (i = 0; i < m_NueronsNum; i++) {
            sigma = 0;
            for (j = 0; j < m_InputNum; j++) {
                sigma += m_HiddenLayerWeights[i * m_InputNum + j] * m_InputLayer[j];
            }
            m_Sigma[i] = sigma / m_InputNum;
            m_HiddenLayerOutputs[i] = tangh(sigma);
        }
    }
    // compute output for every layer in the net, including the final output
    private void computeNetOutput() {
        int i;
        double sigma = 0;

        for (i = 0; i < m_NueronsNum; i++) {
            sigma += m_OutputLayerWeights[i] * m_HiddenLayerOutputs[i];
        }
        m_OutputLayer = sigma / m_NueronsNum;
    }
    private double dTangh(double x) {
        double exp = Math.exp(x);
        double mexp = Math.exp(-x);
        double Ret = 4.0 / ((exp + mexp) * (exp + mexp));
        return Ret;
    }
    public void finishUpdateWeights(double Score) {
        computeDeltaW(Score, 0.0, m_OutputLayer);
		finishUpdateWeights();
    }
	public void finishUpdateWeights() {
        m_t = 1;
        m_alpha = m_alpha * m_AlphaChangeRatio;
        initOutputs();
    }
    /**
     * Insert the method's description here.
     * Creation date: (9/22/2001 11:58:28 AM)
     * @return double
     */
    public double getAlphaChangeRatio() {
        return m_AlphaChangeRatio;
    }
    public int getHiddenLayerSize() {
        return m_NueronsNum;
    }
    /**
     * Insert the method's description here.
     * Creation date: (9/22/2001 2:30:06 PM)
     * @return double
     */
    public double getLambda() {
        return m_Lambda;
    }
    public double getScore(double[] Input) {
        initInputLayer(Input);
        computeHiddenOutputs();
        computeNetOutput();
        return m_OutputLayer;
    }
    /**
     * Insert the method's description here.
     * Creation date: (9/22/2001 2:32:17 PM)
     * @return double
     */
    public double getAlpha() {
        return m_alpha;
    }
    // initialized input layer	
    private void initInputLayer(double[] Arr) {
        System.arraycopy(Arr, 0, m_InputLayer, 0, m_InputNum);
    }
    // initialized weights
    private void initWeights() {
        for (int i = 0; i < m_NueronsNum; i++) {
            m_OutputLayerWeights[i] = rand.nextDouble() * 2 - 1;
        }
        for (int i = 0; i < m_InputNum * m_NueronsNum; i++) {
            m_HiddenLayerWeights[i] = rand.nextDouble() * 2 - 1;
        }
    }
    public static NueralNet loadNet(String filename, AdvancedRobot bot) {
        NueralNet Net = null;

        if (filename != null) {
            try {
                //ObjectInputStream in = new ObjectInputStream(new FileInputStream(FileName));
                //Net = new NueralNet();

				ZipInputStream zipin = new ZipInputStream(new
				FileInputStream(bot.getDataFile(filename + ".zip")));
				zipin.getNextEntry();
				ObjectInputStream in = new ObjectInputStream(zipin);
				
				Net = new NueralNet();

                Net.m_OutputLayer = in.readDouble();
                Net.m_InputLayer = (double[]) in.readObject();
                Net.m_HiddenLayerOutputs = (double[]) in.readObject();
                Net.m_HiddenLayerWeights = (double[]) in.readObject();
                Net.m_OutputLayerWeights = (double[]) in.readObject();
                Net.m_Sigma = (double[]) in.readObject();
                Net.m_LastOutput = in.readDouble();
                Net.m_Grads = (Vector) in.readObject();
                Net.m_InputNum = in.readInt();
                Net.m_NueronsNum = in.readInt();
                Net.m_t = in.readInt();
                Net.m_TimePeriod = in.readInt();
                Net.m_alpha = in.readDouble();
                Net.m_LambdaPower = (double[]) in.readObject();
                Net.m_Lambda = in.readDouble();
                Net.m_AlphaChangeRatio = in.readDouble();
                in.close();
            } catch (Exception E) {
				Net = null;
            }
        }
        return Net;
    }
    private void prepareLambdaPower() {
        m_LambdaPower = new double[m_TimePeriod];
        int i;

        for (i = 0; i < m_TimePeriod; i++) {
            m_LambdaPower[i] = Math.pow(m_Lambda, i);
        }
    }

    public void saveNet(String filename, AdvancedRobot bot) {
        if (filename != null) {
            try {
                //ObjectOutputStream out =
                //   new ObjectOutputStream(new FileOutputStream(FileName, false));

				ZipOutputStream zipout = new ZipOutputStream(new RobocodeFileOutputStream(bot.getDataFile(filename + ".zip")));
				zipout.setLevel(9);
				zipout.putNextEntry(new ZipEntry(filename));
				ObjectOutputStream out = new ObjectOutputStream(zipout);

                out.writeDouble(m_OutputLayer);
                out.writeObject(m_InputLayer);
                out.writeObject(m_HiddenLayerOutputs);
                out.writeObject(m_HiddenLayerWeights);
                out.writeObject(m_OutputLayerWeights);
                out.writeObject(m_Sigma);
                out.writeDouble(m_LastOutput);
                out.writeObject(m_Grads);
                out.writeInt(m_InputNum);
                out.writeInt(m_NueronsNum);
                out.writeInt(m_t);
                out.writeInt(m_TimePeriod);
                out.writeDouble(m_alpha);
                out.writeObject(m_LambdaPower);
                out.writeDouble(m_Lambda);
                out.writeDouble(m_AlphaChangeRatio);

				out.flush();
				zipout.closeEntry();
				out.close();
       
            } catch (Exception E) {

            }
        }
    }
    // Saving Outputs for current network run
    private void saveOutputs() {
        m_LastOutput = m_OutputLayer;
        m_Grads.addElement(computeGrad());
    }

    private void initOutputs() {
        m_Grads.removeAllElements();
    }
    /**
     * Insert the method's description here.
     * Creation date: (9/22/2001 11:58:28 AM)
     * @param newAlphaChangeRatio double
     */
    public void setAlphaChangeRatio(double newAlphaChangeRatio) {
        m_AlphaChangeRatio = newAlphaChangeRatio;
    }

    public void setLambda(double newLambda) {
        m_Lambda = newLambda;
        prepareLambdaPower();
    }
    /**
     * Insert the method's description here.
     * Creation date: (9/22/2001 2:32:17 PM)
     * @param newStartAlpha double
     */
    public void setAlpha(double newStartAlpha) {
        m_alpha = newStartAlpha;
    }
    public void updateWeights(double reward, double[] Input) {
        getScore(Input);
        if (m_t > 1) {
            computeDeltaW(reward, m_OutputLayer, m_LastOutput);
        }
        getScore(Input);
        saveOutputs();
        m_t++;
    }
    private double tangh(double x) {
        double exp = Math.exp(x);
        double mexp = Math.exp(-x);
        double Ret = (exp - mexp) / (exp + mexp);
        return Ret;
    }
}


Has someone tried TD nets for anything in Robocode yet? How would you use it? -- PEZ

Err... it'll be usefull if there is a discription of what each function does -- nfwu

o.o I don't get any of it. Like NFWU said... -- Jonanin?

You can understand the most things if you know how NeuralNetwork´s work. I think i´ll try it in my Christmas-Holidays :) --Krabb

I'm not sure but... TD is kind of reinforcement learning. It uses sort of punishement and backpropagation like. The general problem in Robocodo related to NN is that Robocode feeds the data time based. For instace every new scan time or bullet speed. However BP, SOM, ect NN requries a training data set and it takes long time to train. (Reinforced learning is much more slower). I suggest to look backprop NN or any other with sequantial or online learning. Basicly it learns from each new data.

--SSO


Robo Home | NeuralTargeting | Changes | Preferences | AllPages
Edit text of this page | View other revisions
Last edited November 13, 2006 17:30 EST by c213-100-59-237.swipnet.se (diff)
Search: