Cross Validation

Chris Spencer

2007-03-09 14:35:59 UTC

Is there any way to perform cross-validation during training with
either the backprop or cascade correlation methods? I don't see any
explicit methods for this in the docs. I see the short paragraph
"Avoid Over-Fitting" on the advanced usage page, but it doesn't
actually describe how to do this.

Regards,
Chris

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV

Josh Menke

2007-03-09 15:18:40 UTC

It's not built into fann, but you can do it yourself. With ANNs I usually
don't use cross-validation, just a single validation set. This is mostly
because of the size of the sets I use (millions).
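A single held-out validation set can be carved off before training. This is a minimal sketch on plain Python lists, not fann code. The name `holdout_split`, the 20% default, and the fixed seed are assumptions for the example. With fann you would then write each part to its own training file and load it with `fann_read_train_from_file`.

```python
import random


def holdout_split(samples, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and hold out a fraction for validation.

    Hypothetical helper. Returns (train, validation). The validation part is
    never trained on; it is only used to measure performance between epochs.
    """
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

With very large data sets a single split like this already gives a stable estimate, which is the trade-off described above.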

--Josh

_______________________________________________
Fann-general mailing list
https://lists.sourceforge.net/lists/listinfo/fann-general

--
Joshua Menke
Machine Learning Scientist
Trust and Safety Applied Research
ebay, Inc
josh-***@public.gmane.org

Chris Spencer

2007-03-09 15:46:40 UTC

But how do you do it? Do you just call train_on_data until test_data
gives a small enough MSE for your validation set?

Regards,
Chris

Josh Menke

2007-03-09 15:59:46 UTC

The basic idea is that you train for one epoch on all your training data,
and then test on your validation set. Then you monitor whichever performance
metric you care about: MSE, accuracy, cross-entropy, precision, recall, or
preferably cost, if you can.

If fann doesn't support your performance metric, then you need to write
either a test callback or your own test_on_data function.
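As one example of a metric fann does not report directly, here is a sketch of scoring a binary classifier's validation outputs yourself. It assumes a single output thresholded at 0.5 and made-up costs (1 per false positive, 5 per false negative). `evaluate` is a hypothetical helper, not part of fann.

```python
def evaluate(outputs, targets, threshold=0.5, fp_cost=1.0, fn_cost=5.0):
    """Score a binary classifier's outputs on the validation set.

    outputs: network outputs (e.g. values in [0, 1]); targets: 0 or 1.
    The threshold and the false-positive/false-negative costs are
    illustrative assumptions; choose values that fit your own problem.
    """
    tp = fp = tn = fn = 0
    for out, target in zip(outputs, targets):
        predicted = 1 if out >= threshold else 0
        if predicted == 1 and target == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif target == 1:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "cost": fp * fp_cost + fn * fn_cost,  # lower is better
    }
```

For outputs `[0.9, 0.2, 0.7, 0.4]` against targets `[1, 0, 0, 1]` this gives one of each outcome, so accuracy, precision, and recall are all 0.5 and the cost is 6.0.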

Then you stop training when your performance stops improving. How you
define that is also up to you. I do something like this: if there has been
no improvement in 100 epochs, I stop, and I randomly choose among the best
nets I've seen to break ties.
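The stopping rule above might look roughly like the following. It is a sketch, not Josh's code. `train_epoch` and `validation_error` are hypothetical callbacks; with fann's C API they would wrap `fann_train_epoch` on the training set and `fann_test_data` (or a custom metric) on the validation set. "The best nets" is read here as every net that ties for the best validation score. In C, keeping a snapshot of a net could be done with `fann_save`.

```python
import copy
import random


def train_with_early_stopping(net, train_epoch, validation_error,
                              patience=100, max_epochs=10000, seed=0):
    """Train until the validation error hasn't improved for `patience` epochs.

    train_epoch(net): hypothetical callback, one pass over the training set.
    validation_error(net): hypothetical callback, lower-is-better score on
    the validation set. Returns (chosen_net, best_score), where chosen_net
    is picked at random among snapshots tied for the best score.
    """
    if max_epochs < 1:
        raise ValueError("max_epochs must be at least 1")
    rng = random.Random(seed)
    best_score = float("inf")
    best_nets = []
    epochs_since_improvement = 0
    for _ in range(max_epochs):
        train_epoch(net)
        score = validation_error(net)
        if score < best_score:
            # New best: restart the list of snapshots and the patience count.
            best_score = score
            best_nets = [copy.deepcopy(net)]
            epochs_since_improvement = 0
        else:
            if score == best_score:
                best_nets.append(copy.deepcopy(net))  # tie: keep as candidate
            epochs_since_improvement += 1
            if epochs_since_improvement >= patience:
                break
    return rng.choice(best_nets), best_score
```

Counting a tie as "no improvement" keeps a long plateau from running forever; whether that suits your data is a judgment call.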

I've wanted to try something like a sign test to measure statistical
significance between epochs, but then I got into Bayesian methods, where you
don't need a validation set.
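For comparing the nets from two epochs, a sign test could work on per-example validation errors: count the examples where each net does better, drop ties, and ask how likely a split that lopsided would be under a fair coin. This is a standard exact two-sided sign test, sketched with a hypothetical `sign_test` helper; fann does not provide it.

```python
from math import comb


def sign_test(errors_a, errors_b):
    """Two-sided exact sign test on paired per-example errors.

    Returns (wins_a, wins_b, p_value). Ties are discarded, as is usual for
    the sign test. A small p-value suggests one net is genuinely better on
    the validation set rather than better by chance.
    """
    wins_a = sum(1 for a, b in zip(errors_a, errors_b) if a < b)
    wins_b = sum(1 for a, b in zip(errors_a, errors_b) if b < a)
    n = wins_a + wins_b
    if n == 0:
        return wins_a, wins_b, 1.0
    k = min(wins_a, wins_b)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins_a, wins_b, min(1.0, 2 * tail)
```

For example, if net A has the lower error on all 10 non-tied examples, the p-value is 2/1024, about 0.002.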

--Josh

Post by Chris Spencer
But how do you do it? Do you just call train_on_data until test_data
gives a small enough MSE for your validation set?
Regards,
Chris

Post by Josh Menke
It's not built in to fann, but you can do it yourself. With ANNs I don't
usually use cross-validation, but instead just a single validation set.

This

Post by Josh Menke
is mostly due to the size of the sets I use though (millions).
--Josh

-------------------------------------------------------------------------

Post by Josh Menke

Post by Chris Spencer
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to

Post by Josh Menke
your

Post by Chris Spencer
opinions on IT & business topics through brief surveys-and earn cash

http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV

Post by Josh Menke

Post by Chris Spencer
_______________________________________________
Fann-general mailing list
https://lists.sourceforge.net/lists/listinfo/fann-general

--
Joshua Menke
Machine Learning Scientist
Trust and Safety Applied Research
ebay, Inc

-------------------------------------------------------------------------

Post by Josh Menke
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share

your

Post by Josh Menke
opinions on IT & business topics through brief surveys-and earn cash

http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV

Post by Josh Menke
_______________________________________________
Fann-general mailing list
https://lists.sourceforge.net/lists/listinfo/fann-general

--
Joshua Menke
Machine Learning Scientist
Trust and Safety Applied Research
ebay, Inc
josh-***@public.gmane.org

Vincenzo Di Massa

2007-03-09 18:40:08 UTC

You can just use FANNTrainer; it does cross-validation.
Just look at the archive.
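I haven't checked how FANNTrainer implements it, so the following is only a generic k-fold cross-validation sketch, not FANNTrainer's code. The data is shuffled once and split into k folds; each fold serves once as the validation set while the rest is used for training, and the k scores are averaged. `k_fold_indices`, `cross_validate`, and the `train_and_score` callback are hypothetical names; `train_and_score` stands for building a fresh net, training it, and returning its validation score.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds.

    Every sample lands in exactly one validation fold; fold sizes differ by
    at most one when n_samples isn't divisible by k.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # round-robin assignment
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def cross_validate(samples, train_and_score, k=5, seed=0):
    """Average a validation score over k folds.

    train_and_score(train, validation) is a hypothetical placeholder: build
    a fresh net, train it on `train`, and return its score on `validation`.
    """
    scores = []
    for train_idx, val_idx in k_fold_indices(len(samples), k, seed):
        train = [samples[i] for i in train_idx]
        validation = [samples[i] for i in val_idx]
        scores.append(train_and_score(train, validation))
    return sum(scores) / len(scores)
```

Each fold trains a separate net, so k-fold costs roughly k times as much training as a single validation split, which matters with very large data sets.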

Vincenzo

Post by Josh Menke
The basic idea is you train once on all your data, and then test on your
validation set. Then you monitor whichever performance metric you care
about. MSE, accuracy, cross-entropy, precision, recall, preferably cost if
you can.
If fann doesn't support your performance metric, then you need to either
write a test call back, or your own test_on_data function.
Then, you stop training when your performance stops improving. How you
define that is also up to you. I do something like, if there has been no
improvement in 100 epochs, I randomly choose from the best nets I've seen
to tie-break.
I've wanted to try something like a sign test to measure statistical
significance between epochs, but then I got into Bayesian methods where you
don't need a validatoin set.
--Josh

Post by Chris Spencer
But how do you do it? Do you just call train_on_data until test_data
gives a small enough MSE for your validation set?
Regards,
Chris

Post by Josh Menke
It's not built in to fann, but you can do it yourself. With ANNs I
don't usually use cross-validation, but instead just a single
validation set.

This

Post by Josh Menke
is mostly due to the size of the sets I use though (millions).
--Josh

-------------------------------------------------------------------------

Post by Josh Menke

Post by Chris Spencer
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to

Post by Josh Menke
your

Post by Chris Spencer
opinions on IT & business topics through brief surveys-and earn cash

http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV

Post by Josh Menke

Post by Chris Spencer
_______________________________________________
Fann-general mailing list
https://lists.sourceforge.net/lists/listinfo/fann-general

--
Joshua Menke
Machine Learning Scientist
Trust and Safety Applied Research
ebay, Inc

-------------------------------------------------------------------------

Post by Josh Menke
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share

your

Post by Josh Menke
opinions on IT & business topics through brief surveys-and earn cash

http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV

Post by Josh Menke
_______________________________________________
Fann-general mailing list
https://lists.sourceforge.net/lists/listinfo/fann-general