Discussion:
Best parameters for FANN training?
Yaroslav Efremov
2009-01-22 14:37:22 UTC
Permalink
------------------------------------------------------------------------------
This SF.net email is sponsored by:
SourcForge Community
SourceForge wants to tell your story.
http://p.sf.net/sfu/sf-spreadtheword
Anatoly Yakovenko
2009-01-28 22:34:15 UTC
Permalink
you could try a genetic algorithm to guess the best inputs, but those
have a huge set of parameters of their own.

btw, i notice your correlation is pretty high. how are you scaling
your data? a common mistake is to find the scaling parameters over
all the data, including the part that's used for cross validation. that
leaks some properties of the distribution.
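To make the fix concrete, here is a minimal sketch in C (the helper names are mine, not part of the FANN API): estimate the scaling parameters from the training rows only, then apply that same transform everywhere, including to the CV rows.

```c
#include <stddef.h>

/* Scaling parameters estimated from TRAINING data only. */
typedef struct { double min, max; } scale_params;

/* Fit on the training set -- never on the combined train+CV set,
 * or the CV set leaks its range into the transform. */
scale_params fit_scale(const double *train, size_t n)
{
    scale_params p = { train[0], train[0] };
    for (size_t i = 1; i < n; i++) {
        if (train[i] < p.min) p.min = train[i];
        if (train[i] > p.max) p.max = train[i];
    }
    return p;
}

/* Apply the training-set transform to any row (train or CV),
 * mapping the training range onto [-2, +2]. */
double apply_scale(scale_params p, double x)
{
    return -2.0 + 4.0 * (x - p.min) / (p.max - p.min);
}
```

Note that a CV value outside the training range simply maps outside [-2, +2]; that is expected and harmless compared to letting the CV set influence the transform.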

Anatoly
Hi all,
I use FANN to predict time series of economic data. I have about 1,500,000
training samples, then 350,000 samples for (cross) validation, and no test data.
I have been doing this for the last 2 years and I have one network which gives
better results than the others. But I do not remember how I trained it :(.
Best NN: correlation = 0.57
Other NNs: correlation = 0.55
For me that difference is very important! Could you please suggest a way to
improve "generalization"?
I have 16 inputs; that is fixed.
1 hidden layer with 24 neurons.
1 output.
All inputs are scaled so that 90% of the data is within [-2 .. +2].
The output is scaled too; it is the future value of the time series, so it is
also within [-2 .. +2].
The activation function is a sigmoid with a small modification of mine to
produce results in [-2 .. +2]; the derivative is scaled by 2 to match.
data = read train data
CV = read CV data from separate file
max_iterations = 7000

fann_set_activation_function_hidden(ann, FANN_SIGMOID_SYMMETRIC);
fann_set_activation_function_output(ann, FANN_SIGMOID_SYMMETRIC);

for ( i = 0; i < max_iterations; i++ )
{
    /* one epoch over the training set (FANN accumulates the MSE) */
    fann_reset_MSE( ann );
    for ( j = 0; j < data->num_data; j++ )
        fann_train( ann, data->input[j], data->output[j] );
    fErr = fann_get_MSE( ann );
    if ( fErr < fMinError )
    {
        nMinErrorStep = i;
        fMinError = fErr;
        fann_save( ann, "best_test.net" );  /* save best network (train) */
    }

    /* calculate CV error */
    fann_reset_MSE( ann );
    for ( j = 0; j < cv_data->num_data; j++ )
        fann_test( ann, cv_data->input[j], cv_data->output[j] );
    fCVErr = fann_get_MSE( ann );
    if ( fCVErr < fCVMinError )
    {
        nCVMinErrorStep = i;
        fCVMinError = fCVErr;
        fMinSteepness = fSteepness;
        fann_save( ann, "cv.net" );         /* save best network (CV) */
    }

    /* decay the learning rate every 10 epochs */
    if ( i % 10 == 0 )
    {
        learning_rate *= fMul;
        fann_set_learning_rate( ann, learning_rate );
    }

    /* OPTIONAL: anneal the activation steepness
    if ( i > 200 )
    {
        fSteepness *= 0.99;
        fann_set_activation_steepness_hidden( ann, fSteepness );
        fann_set_activation_steepness_output( ann, fSteepness );
    }
    */
} /* end of training cycle */
// here I print statistics: minimum error, minimum CV error, CV step, and so on
fMinErr = 0.757854 step = 2629
fMinCVErr = 0.758024 step = 3881 steepness = 0.5
I know that my best network has steepness = 0.055, so the "OPTIONAL" code was
switched on.
With my algorithm, when I decrease the learning_rate during the training cycle
I get better results than with a fixed learning_rate.
It is very strange that during training the error on the training data starts
increasing from some step!!! I cannot understand that!
The CV error also increases, which is normal. But the minimum step for CV
is 3881, which is much later than the training error step 2629. Note, I am
using *= 0.96, so the learning_rate is exponentially decreasing!
There are too many parameters to vary :(
- Activation function: sigmoid, Gaussian, others.
- Steepness: should it be fixed or variable?
- Number of neurons in the hidden layer. Number of hidden layers. Recurrent
networks.
- How to scale input and output data.
- Which data to use as input parameters (model specific).
- Training algorithm. I am using "INCREMENTAL". I can also use Batch, RPROP or
QuickProp. Here is a link
http://scien.stanford.edu/class/ee368/projects2000/project2/node5.html#SECTION00032000000000000000
where the author states that the "conjugate gradient" method gives much better
results than the usual "backpropagation".
Or should I move to an entirely new method like "Bayesian learning"?
Please help me.
Thanks in advance,
Yaroslav
_______________________________________________
Fann-general mailing list
https://lists.sourceforge.net/lists/listinfo/fann-general
D. Cooper Stevenson
2009-01-28 23:33:50 UTC
Permalink
Hi Anatoly,
Post by Anatoly Yakovenko
btw, i notice your correlation is pretty high. how are you scaling
your data? a common mistake is to find the scaling parameters over
all the data, including the part that's used for cross validation. that
leaks some properties of the distribution.
Can you please elaborate on this a little? If I interpreted the above
correctly, I understand that scaling the train and test data as a single
unit is actually a bad idea. Am I correct in my interpretation?

This is important to me as I am changing my scaling algorithm to do just
that and I would rather not continue down a path that is likely to yield
incorrect results.

Would you please clarify?

Very Best,


-Coop
--
D. Cooper "Coop" Stevenson, Systems Engineer
Ph: 541.971.0366
Em: cooper-/kSeq5SmJ+***@public.gmane.org
Ww: http://cooper.stevenson.name
Anatoly Yakovenko
2009-01-29 00:03:22 UTC
Permalink
in finance it's really easy to see: if i could tell you that for the
next year the average price of some equity will be N, then a dead
simple and profitable algorithm would be to buy when the price is
below N and to sell when it's above.

depending on your fitness function, even leaking the variance or any
knowledge of the future distribution would pollute your results.
Post by D. Cooper Stevenson
Hi Anatoly,
Post by Anatoly Yakovenko
btw, i notice your correlation is pretty high. how are you scaling
your data? a common mistake is to find the scaling parameters over
all the data, including the part that's used for cross validation. that
leaks some properties of the distribution.
Can you please elaborate on this a little? If I interpreted the above
correctly, I understand that scaling the train and test data as a single
unit is actually a bad idea. Am I correct in my interpretation?
This is important to me as I am changing my scaling algorithm to do just
that and I would rather not continue down a path that is likely to yield
incorrect results.
Would you please clarify?
Very Best,
-Coop
--
D. Cooper "Coop" Stevenson, Systems Engineer
Ph: 541.971.0366
Ww: http://cooper.stevenson.name
D. Cooper Stevenson
2009-01-29 00:09:15 UTC
Permalink
Hi Anatoly,

I see exactly. Thank you for helping me see clearly and saving me a lot of
work.


Very Best,


-Coop
Post by Anatoly Yakovenko
in finance it's really easy to see: if i could tell you that for the
next year the average price of some equity will be N, then a dead
simple and profitable algorithm would be to buy when the price is
below N and to sell when it's above.
depending on your fitness function, even leaking the variance or any
knowledge of the future distribution would pollute your results.
Post by D. Cooper Stevenson
Hi Anatoly,
On Wed, Jan 28, 2009 at 2:34 PM, Anatoly Yakovenko <
Post by Anatoly Yakovenko
btw, i notice your correlation is pretty high. how are you scaling
your data? a common mistake is to find the scaling parameters over
all the data, including the part that's used for cross validation. that
leaks some properties of the distribution.
Can you please elaborate on this a little? If I interpreted the above
correctly, I understand that scaling the train and test data as a single
unit is actually a bad idea. Am I correct in my interpretation?
This is important to me as I am changing my scaling algorithm to do just
that and I would rather not continue down a path that is likely to yield
incorrect results.
Would you please clarify?
Very Best,
-Coop
--
D. Cooper "Coop" Stevenson, Systems Engineer
Ph: 541.971.0366
Ww: http://cooper.stevenson.name
--
D. Cooper "Coop" Stevenson, Systems Engineer
Ph: 541.971.0366
Em: cooper-/kSeq5SmJ+***@public.gmane.org
Ww: http://cooper.stevenson.name
Yaroslav Efremov
2009-01-29 21:55:01 UTC
Permalink
Hi,

One training run of my NN takes 24 hours, so I cannot use genetic
optimization here. I am trying to find the best parameters by hand.

As for the scaling, I do not set up any range for the final array, nor do I
use CV statistics. For example, my input data vary from -8 to +8, but most
of the data is within the [-2 .. +2] interval. I do not use the mean and
variance when scaling the data.



During the last few days I have been trying different training algorithms.
TRAIN_BATCH was worse for my problem than simple TRAIN_INCREMENTAL. It seems
TRAIN_INCREMENTAL adjusts the weights with each incoming sample, so it does
not take as long.

TRAIN_QUICKPROP was the worst one. It produces neuron weights around +20
and -20, which distorts the result: these weights give me the border values
-2 and +2 at the output, and the MSE and correlation statistics are not good.

I also tried TRAIN_RPROP. It is better than TRAIN_QUICKPROP, but it also
produces large weights, which affects the whole picture.

Now I am trying to load a neural network that was trained using
TRAIN_INCREMENTAL and then train it further with TRAIN_RPROP. I do not see
better results yet, but I need some time to train several networks.

What I like about TRAIN_RPROP is that the best step for the test set is very
close to the best step for the CV set. That may be a sign of generalization,
not overfitting. I will take some out-of-sample data to test the actual
performance.
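Since the CV minimum is what matters, a patience-based early stop could replace the fixed 7000 iterations; a sketch (the helper and its name are mine, not FANN API), to be checked once per epoch right after computing the CV error:

```c
/* Caller-owned early-stopping state: best CV error seen so far and
 * the number of checks since it last improved. */
typedef struct { double best; int since_best; } early_stop;

/* Returns 1 if training should stop: the CV error has not improved
 * for `patience` consecutive checks. */
int should_stop(early_stop *s, double cv_err, int patience)
{
    if (cv_err < s->best) {
        s->best = cv_err;
        s->since_best = 0;
    } else if (++s->since_best >= patience) {
        return 1;
    }
    return 0;
}
```

This would also cap each 24-hour run as soon as the CV curve turns upward, instead of training all the way to iteration 7000.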


Could you advise on the best parameters for the RPROP algorithm for final
tuning? Which activation functions should I try? Does anybody have
experience with Bayesian learning?

Thanks in advance,
Yaroslav
-----Original Message-----
Sent: Thursday, January 29, 2009 1:34 AM
Subject: Re: [Fann-general] Best parameters for FANN training?
you could try a genetic algorithm to guess the best inputs, but those
have a huge set of parameters of their own.
btw, i notice your correlation is pretty high. how are you scaling
your data? a common mistake is to find the scaling parameters over
all the data, including the part that's used for cross validation. that
leaks some properties of the distribution.
Anatoly
Anatoly Yakovenko
2009-01-30 19:27:16 UTC
Permalink
are your results reproducible? meaning that for the same inputs, with
different randomized weights, you get the same output? if not, then
you may just be seeing some lucky results due to how you initialize
your weights.

I would try to see what weights you are getting from your inputs, see
if it's repeatable, and remove the inputs that the NN tends not to use
much, to shrink down your problem.
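One way to act on that: rank each input by the total absolute weight of its outgoing connections and drop the low-ranked ones. A generic sketch (FANN itself exposes connections via fann_get_connection_array; the plain array below is just for illustration):

```c
#include <math.h>
#include <stddef.h>

/* Importance of one input = sum of |w| over its connections to the
 * hidden layer. `weights` is row-major: num_inputs x num_hidden. */
double input_importance(const double *weights, size_t num_hidden, size_t input)
{
    double total = 0.0;
    for (size_t h = 0; h < num_hidden; h++)
        total += fabs(weights[input * num_hidden + h]);
    return total;
}
```

If the ranking is stable across several runs with different random seeds, the consistently low-importance inputs are good candidates for removal.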

plot your MSE function for each run, and see which parameters work well
and at which stage. you can design the NN to switch to the better
configuration when that configuration will work best. it's really hard
to say which numbers work best; it all depends on your data.
Post by Yaroslav Efremov
As for the scaling, I do not set up any range for the final array, nor do I
use CV statistics. For example, my input data vary from -8 to +8, but most
of the data is within the [-2 .. +2] interval. I do not use the mean and
variance when scaling the data.
i hope 0 is not the average value.
Post by Yaroslav Efremov
During the last few days I have been trying different training algorithms.
TRAIN_BATCH was worse for my problem than simple TRAIN_INCREMENTAL. It seems
TRAIN_INCREMENTAL adjusts the weights with each incoming sample, so it does
not take as long.
batch trains everything at once. an NN is really time indifferent:
given a random set of weights, it shouldn't matter in what order you
train your inputs. this lecture, although on RBMs, is very applicable:



you "should" be able to arrive at the same result via batch or
incremental; it may just take more time. as far as which activation
function to use, think of it as a Fourier transform: periodic data is
modelled really well with periodic functions. for markets, maybe
Gaussian is the best, who knows. if you know the answer to that you
are 90% of the way there.
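The claim that batch and incremental training should reach the same answer can be illustrated on a toy one-weight model (a sketch, not FANN code): for the squared loss on y = w*x, both update rules below converge to the same least-squares weight, the batch rule taking one step per epoch and the incremental rule one step per sample.

```c
#include <stddef.h>

/* One batch epoch: accumulate the gradient over all samples,
 * then take a single averaged step. */
double batch_epoch(double w, const double *x, const double *y,
                   size_t n, double lr)
{
    double g = 0.0;
    for (size_t i = 0; i < n; i++)
        g += (w * x[i] - y[i]) * x[i];  /* d/dw of 0.5*(w*x - y)^2 */
    return w - lr * g / n;
}

/* One incremental epoch: step after every sample, in order. */
double incremental_epoch(double w, const double *x, const double *y,
                         size_t n, double lr)
{
    for (size_t i = 0; i < n; i++)
        w -= lr * (w * x[i] - y[i]) * x[i];
    return w;
}
```

On consistent data both rules settle on the same weight; they differ in the path taken and in how the learning rate interacts with each step, which is one reason incremental can feel faster in practice.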
