Discussion:
Very large datasets
Poul-Erik Andreasen
2007-01-31 18:08:28 UTC
Hi

There has been some earlier discussion about datasets larger than 4 GB,
because of the addressing issues.
This should be solved by running FANN on a 64-bit machine (as far as I
can tell; correct me if I am wrong). But I would like to know whether
anyone has tried that with success. Otherwise I will have to try it :-)

Poul-Erik Andreasen.
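
For context on the 4 GB figure: a 32-bit pointer can address at most 2^32 bytes, so a 32-bit process cannot hold a larger data set in memory at once. The Python sketch below only works through that arithmetic. The function names are illustrative, the default of 4 bytes per value assumes a build that stores values as single-precision floats, and all per-row and allocator overhead is ignored.

```python
import struct


def pointer_bits() -> int:
    """Width of a native pointer in the running interpreter, in bits."""
    return struct.calcsize("P") * 8


def addressable_bytes(bits: int) -> int:
    """Upper bound on the bytes a pointer of the given width can address."""
    return 2 ** bits


def dataset_bytes(rows: int, inputs: int, outputs: int,
                  bytes_per_value: int = 4) -> int:
    """Raw size of the values alone (assumption: no overhead counted)."""
    return rows * (inputs + outputs) * bytes_per_value


def fits_in_address_space(rows: int, inputs: int, outputs: int,
                          bits: int, bytes_per_value: int = 4) -> bool:
    """True if the raw values fit below the addressing limit."""
    needed = dataset_bytes(rows, inputs, outputs, bytes_per_value)
    return needed < addressable_bytes(bits)
```

In practice a 32-bit process gets noticeably less than the full 4 GB for its own data, so the real limit is lower than this bound suggests.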





-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier.
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
Mark Knecht
2007-01-31 18:21:43 UTC
Hi,
I have a 64-bit Gentoo machine, but I'm probably not enough
of a programmer to do this on my own. (I haven't actually used FANN at
all yet.) If you have some code and a large data set, I'd be
happy to try building it and seeing whether it runs.

Contact me off-line if that's of interest.

Cheers,
Mark
_______________________________________________
Fann-general mailing list
https://lists.sourceforge.net/lists/listinfo/fann-general
Josh Menke
2007-01-31 18:30:55 UTC
As a quick workaround, I wrote a batch job in SAS that split an 8 GB data set
into manageable chunks, one at a time, and then ran my FANN code on each
split.

--Josh
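
A minimal sketch of the same chunking idea in Python (not the SAS job described above, which was not posted). It assumes the data is already in FANN's plain-text training format: a header line with the number of pairs, inputs and outputs, followed by one line of inputs and one line of outputs per pair. The function name and the chunk file naming are made up for illustration.

```python
from pathlib import Path


def split_fann_file(src, out_dir, pairs_per_chunk):
    """Stream a FANN-format training file into smaller files.

    Assumes every training pair occupies exactly two lines
    (inputs, then outputs) and that there are no blank lines.
    Returns the list of chunk paths that were written.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with src.open() as f:
        # Header: "<num_pairs> <num_inputs> <num_outputs>"
        total, n_in, n_out = (int(x) for x in f.readline().split())
        remaining = total
        index = 0
        while remaining > 0:
            count = min(pairs_per_chunk, remaining)
            path = out_dir / f"chunk_{index:04d}.data"
            with path.open("w") as out:
                # Each chunk gets its own header with its own pair count.
                out.write(f"{count} {n_in} {n_out}\n")
                for _ in range(count):
                    out.write(f.readline())  # input line
                    out.write(f.readline())  # output line
            chunk_paths.append(path)
            remaining -= count
            index += 1
    return chunk_paths
```

Because the file is read line by line, only one pair is in memory at a time, so the size of the source file does not matter for the split itself.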
--
Joshua Menke
josh-***@public.gmane.org
Conor Robinson
2007-01-31 19:21:48 UTC
If you're running an Intel chip: I used icc and recompiled, and it worked fine.
The real question is whether you need to run a data set that big. If
your inputs are very wide and/or sparse, you might run into real
problems with the 'curse of dimensionality'. It all depends on your data
set, but a random sample may be more practical. You could get better
results with x-fold cross-validation. I'm curious what kind of data
you're looking at. I don't think recompiling FANN for 64-bit should
cause you any trouble. Good luck.
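
A rough sketch of the two ideas mentioned here, in Python. The first function draws a uniform random sample from a stream that is too large to load, using reservoir sampling (Algorithm R); the second yields train/test index lists for x folds. Both function names are illustrative, and the fold split assumes the rows have already been shuffled.

```python
import random
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed=None) -> List[T]:
    """Uniform sample of up to k items from a stream, in one pass."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


def k_fold_splits(n_rows: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds.

    Rows are assigned to folds round-robin, so shuffle beforehand.
    """
    for fold in range(k):
        test = [i for i in range(n_rows) if i % k == fold]
        train = [i for i in range(n_rows) if i % k != fold]
        yield train, test
```

For a text file, passing the open file object as `items` samples whole lines without reading the file into memory.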
Josh Menke
2007-01-31 19:32:18 UTC
There are a couple of reasons in general for wanting a very large data set
if you can have one instead of sampling:

1. Like you mentioned, sparse data. But I also mean sparse as in the target
classes may have very few members compared to the whole population. For
example, a concept learning (2-class) problem where one class represents
99.99% of the population. In this case, if you want to both have enough data
to learn the concept AND automatically infer the prior distributions
correctly, then having a lot of data is an easy way to go.

2. If you have a very difficult problem, then research has shown that neural
networks have an uncanny ability to continue to improve accuracy by using
more and more data and larger and larger networks. A group out of ICSI at
Berkeley showed this a few years back for large-scale speaker-independent
phoneme recognition. They were using MASSIVE speech corpora and showed that
accuracy kept increasing at a rate worth the cost.

--Josh
--
Joshua Menke
Statistician, Machine Learning Scientist
TnS Detection Platforms
ebay, Inc
josh-***@public.gmane.org
Conor Robinson
2007-01-31 21:22:00 UTC
I agree with you that more data is better; however, I was not
aware of the study you mentioned in (2).

For the 2-class problem you mentioned, where one of your classes is
99.99% of the population (as in many problems such as tumor detection),
would you not duplicate your 0.01% class to 1:1 with your second class,
for example, in your training batch (thus really needing more than 4 GB
in some cases)? I try to keep my distributions as accurate as possible;
however, I find applying cost matrices and other methods after training
much less effective. No matter how large your data set gets, I don't
see the network becoming more effective unless you're changing your
ratio. As for your second point, I guess increasing the size of your
network would depend on your data; I often find that larger networks
become more prone to overfitting.
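
A minimal sketch of the 1:1 duplication described above, in Python. It keeps every original minority row and adds random duplicates until the two classes have equal counts. The function name and the in-memory list representation are assumptions for illustration; for a data set near the addressing limit the duplication would have to be done while streaming.

```python
import random


def oversample_to_balance(rows, labels, minority_label, seed=None):
    """Duplicate minority rows at random until classes are 1:1.

    Returns (rows, labels) as new shuffled lists. Every original
    minority row is kept at least once.
    """
    rng = random.Random(seed)
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    majority = [(r, y) for r, y in zip(rows, labels) if y != minority_label]
    if not minority:
        raise ValueError("no rows with the minority label")
    # Number of extra copies needed to match the majority count.
    extra = max(0, len(majority) - len(minority))
    duplicates = rng.choices(minority, k=extra)
    combined = majority + [(r, minority_label) for r in minority + duplicates]
    rng.shuffle(combined)
    out_rows = [r for r, _ in combined]
    out_labels = [y for _, y in combined]
    return out_rows, out_labels
```

Note that the balanced set is close to twice the size of the majority class, which is how a data set that fit under 4 GB can end up exceeding it.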

What are your thoughts on encoding very large 1-of-C categories for
neural nets? Even with a very large data set, you encounter sparse
areas that impede training. What types of intelligent compression
might be effective for 1-of-C?

Thanks for your thoughts.

Conor
Josh Menke
2007-02-06 17:51:11 UTC
Hi Conor,

I have some experience in these areas, but not a huge amount.

I haven't had a whole lot of luck with the 1:1 approach combined with
cost/prior matrices, especially in those 99.99% situations. I found that just
adding a ton more data worked a lot better than balancing and applying
multipliers afterwards.
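
For readers unfamiliar with the "multipliers afterwards" step: one common form is to rescale the network's output from the prior it was trained on back to the true prior using Bayes' rule. The Python sketch below shows that rescaling for the 2-class case. It is not necessarily the exact procedure used in the experiments described here, the function name is illustrative, and it assumes the network output approximates the posterior probability under the training distribution.

```python
def correct_for_prior(p_model, train_prior, true_prior):
    """Rescale P(positive | x) from the training prior to the true prior.

    p_model:     network output, read as P(positive | x) under the
                 (e.g. balanced) training distribution
    train_prior: fraction of positives in the training data
    true_prior:  fraction of positives in the real population
    """
    if not (0.0 < train_prior < 1.0 and 0.0 < true_prior < 1.0):
        raise ValueError("priors must lie strictly between 0 and 1")
    # Multiply each class's probability by the ratio of its priors,
    # then renormalise so the two classes sum to 1.
    pos = p_model * (true_prior / train_prior)
    neg = (1.0 - p_model) * ((1.0 - true_prior) / (1.0 - train_prior))
    return pos / (pos + neg)
```

With a balanced training set and a true positive rate of 0.01%, an uncertain output of 0.5 is pulled back down to 0.0001, which shows how strongly the correction leans on the network's outputs being well calibrated.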

I "sort-of" believe that what you really need is "enough" examples for each
class, whether it be 2-class or a large 1-of-C situation.

I have seen good results from using "a lot" of data and/or from
"balancing" the data in large, sparse 1-of-C situations.

But I have also seen 1:1 approaches with 2-classes and cost-matrices fail
miserably compared to just using a lot more data.

This tells me that if you want to use something other than the true
distribution, it's not a trivial thing to find which distribution you should
use.

So in short, the simplest approach is to have a lot of data, enough that you
represent even the less frequent classes. This way you can maintain the
prior in the data, and still learn each class. I think this works best and
is only time and space intensive.

Failing that, it's trial and error, which is "resource" intensive where YOU
the scientist are the resource. This is because you have to try and
"engineer" distributions of the classes that will fit your needs. This may
be "balanced" or it may be something between balanced and the true
distribution. But finding that "sweet spot" may be more costly in terms of
YOUR time, than just getting more data or making your training handle more
data that you already have.
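
The trial-and-error search described above can at least be made mechanical. The Python sketch below is one hypothetical way to set it up: `candidate_fractions` lists minority-class fractions spaced geometrically between the true prior and a balanced 0.5, and `undersample_to_fraction` thins the majority class to hit one of them. Both names, the geometric spacing, and the choice of undersampling rather than duplication are assumptions, not something proposed in the thread.

```python
import random


def candidate_fractions(true_fraction, steps):
    """Minority fractions from the true prior up to 0.5, geometric spacing."""
    if steps < 2:
        raise ValueError("need at least two steps")
    ratio = (0.5 / true_fraction) ** (1.0 / (steps - 1))
    return [true_fraction * ratio ** i for i in range(steps)]


def undersample_to_fraction(rows, labels, minority_label,
                            target_fraction, seed=None):
    """Drop majority rows at random until the minority class makes up
    roughly target_fraction of the result. All minority rows are kept.
    If the target is below the current fraction, nothing is dropped.
    """
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must be between 0 and 1")
    rng = random.Random(seed)
    pairs = list(zip(rows, labels))
    minority = [p for p in pairs if p[1] == minority_label]
    majority = [p for p in pairs if p[1] != minority_label]
    # Majority rows needed so that minority / total == target_fraction.
    keep = round(len(minority) * (1.0 - target_fraction) / target_fraction)
    keep = min(keep, len(majority))
    combined = minority + rng.sample(majority, keep)
    rng.shuffle(combined)
    return [r for r, _ in combined], [y for _, y in combined]
```

Each candidate fraction still needs its own training run and validation against data drawn from the true distribution, so this only automates the bookkeeping, not the cost.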

I hope I haven't just rambled too much.

--Josh
Conor Robinson
2007-02-06 18:55:00 UTC
Thanks Josh,

This seems to be a large area, along with encoding, that gets
glossed over in most papers and research: "we just did this, because
it worked". Encoding often doesn't get any mention at all.

I will have to run some more big-data tests. My positive class is about
0.001%, and I've had issues getting networks to "recognize" it without
1:1. The bias becomes 100% towards the negative class. I don't have
an unlimited amount of data, at best about 5 million rows, so I may be
able to pull a large enough set. I now understand exactly what you
were referring to when you referenced the data set size study
before.

My second issue (which exacerbates the first) is what to do with one
of my attributes, which has 400+ categories. I've been compressing the
400 bins down to about 50 using basic stats, moving the low-probability
categories into the first bin position and saving a key to translate
future bins. PCA makes no sense for a 1ofC scenario. If I can come up
with or find a better technique for 1ofC compression, a big data test
may be fruitful.
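
The binning described above could look roughly like the sketch below. This is one reading of the description, not the code actually used: the names `build_category_key` and `encode` are hypothetical, and "basic stats" is assumed here to mean plain frequency counts.

```python
from collections import Counter

def build_category_key(values, max_bins):
    """Map the most frequent categories to bins 1..max_bins-1.
    Bin 0 is reserved as the catch-all for rare categories."""
    counts = Counter(values)
    # Most frequent first; ties broken by category name for a stable key.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    frequent = [cat for cat, _ in ranked[:max_bins - 1]]
    return {cat: i + 1 for i, cat in enumerate(frequent)}

def encode(value, key, max_bins):
    """Return a 1-of-C vector of length max_bins.
    Rare or previously unseen categories fall into bin 0."""
    vec = [0.0] * max_bins
    vec[key.get(value, 0)] = 1.0
    return vec
```

The returned dictionary plays the role of the saved key: it is built once on the training data and reused to translate future rows.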

My optimal solution is a GA that selects different input combinations,
and selects again within each attribute; however, this becomes
extremely intensive.

Thanks again for your thoughts, don't worry about rambling, that's
about what I'm reduced to most days :)

Conor
Josh Menke
2007-02-06 19:10:31 UTC
Permalink
Hi Conor,

One common way to deal with a large number of categories is to map each
one based on how it associates with your target class.

If it's a two-class problem, for example, you can map each of the possible
categories to how likely that category is to belong to the target class.
This throws away a serious amount of information, but is sometimes enough.

If some categories are sparse, you can use Laplace or Beta-Binomial
Empirical Bayes (or even hierarchical Bayes if you're feeling ambitious) to
smooth them.
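
A minimal sketch of the two-class mapping with Laplace smoothing, for illustration only. The function name `smoothed_target_rates` and the `alpha` parameter are hypothetical; `alpha=1.0` is the classic add-one rule, which is the same as putting a uniform Beta(1, 1) prior on each category's rate. An Empirical Bayes variant would estimate the prior from the data instead of fixing it.

```python
from collections import defaultdict

def smoothed_target_rates(categories, targets, alpha=1.0):
    """Map each category to a smoothed estimate of P(target = 1 | category):
    (positives + alpha) / (count + 2 * alpha)."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for cat, y in zip(categories, targets):
        totals[cat] += 1
        positives[cat] += int(y)
    return {
        cat: (positives[cat] + alpha) / (totals[cat] + 2 * alpha)
        for cat in totals
    }
```

Each category then becomes a single numeric input instead of one input per category, and a category seen only once is pulled towards 0.5 rather than jumping to 0 or 1.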

--Josh
--
Joshua Menke
Statistician, Machine Learning Scientist
TnS Detection Platforms
ebay, Inc
josh-***@public.gmane.org
poulerik
2007-02-01 01:29:59 UTC
Permalink
Post by Conor Robinson
If you're running an Intel chip: I used icc and recompiled, and it worked fine.
The real question is, do you need to run a data set that big? If
your inputs are very wide and/or sparse you might run into real
problems, the 'curse of dimensionality'. It all depends on your data
set, but a random sample may be more practical. You could get better
results with x-fold validation. I'm curious as to what kind of data
you're looking at. I don't think recompiling fann for 64bit should
cause you any trouble, good luck.
I don't have a problem as such, but I remember someone here on the list
who did: in an earlier thread, someone was talking about 10G of data,
and the 64-bit solution was not mentioned then.
So I just wanted to bring it up to see if there is any experience.
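
As a rough back-of-the-envelope check on when the 4G limit matters, the sketch below estimates the in-memory size of a training set and compares it with the 2^32 bytes a 32-bit pointer can address. The helper names are hypothetical, and 4 bytes per value is an assumption (single-precision floats); a real 32-bit process can usually use less than the full 4 GiB, so this is an upper bound.

```python
ADDRESS_LIMIT_32BIT = 2 ** 32  # bytes addressable with a 32-bit pointer (4 GiB)

def training_set_bytes(rows, num_inputs, num_outputs, bytes_per_value=4):
    """Approximate raw size of the training data held in memory."""
    return rows * (num_inputs + num_outputs) * bytes_per_value

def fits_in_32bit(rows, num_inputs, num_outputs, bytes_per_value=4):
    """True if the raw data alone stays under the 32-bit address limit."""
    size = training_set_bytes(rows, num_inputs, num_outputs, bytes_per_value)
    return size < ADDRESS_LIMIT_32BIT
```

With illustrative numbers, 5 million rows with 400 inputs and 1 output come to about 8 GB, while the same rows with 50 inputs come to about 1 GB.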

And the fact that I have an AMD64 machine as my home computer makes me
curious.
Maybe Steffen will put the 2.0 source package for Debian up for download.
Then I will try to build a binary, for the fun of it.

Poul-Erik Andreasen






poulerik
2007-02-01 13:26:42 UTC
Permalink
I can now see that there is a rules file in each of the available
source packages.
I will try to build some 64-bit Debian binaries.



Poul-Erik Andreasen

poulerik
2007-02-05 17:37:32 UTC
Permalink
Post by poulerik
Post by poulerik
Post by Conor Robinson
If you're running an Intel chip, I used icc and recompiled; it worked fine.
The real question is, do you need to run a data set that big? If
your inputs are very wide and/or sparse you might run into real
problems, the 'curse of dimensionality'. It all depends on your data
set, but a random sample may be more practical. You could get better
results with x-fold validation. I'm curious as to what kind of data
you're looking at. I don't think recompiling fann for 64-bit should
cause you any trouble. Good luck.
I don't have any problem as such, but I remember someone here on the list
who did. In an earlier thread
someone was talking about 10G of data, and the 64-bit
solution was not mentioned then,
so I just wanted to bring it up to see if there is any experience.
And the fact that I have an AMD-64 machine as my home computer makes me
curious.
Maybe Steffen will put the 2.0 source package for Debian up for download.
Then I will try to build a binary, for the fun of it.
I can now see that there is a rules file in all of the available
source packages.
I will try to make some Debian 64-bit binaries.
Poul-Erik Andreasen
In case anyone has any doubt: fann compiles and works in an AMD-64
environment (except for the pyfann part), though
when compiling to build the Debian package it produces a lot of warnings
like the following.

fann_io.c:177: warning: format '%u' expects type 'unsigned int', but argument 3 has type 'long int'
fann_io.c:239: warning: format '%u' expects type 'unsigned int', but argument 3 has type 'long int'
fann_io.c:301: warning: format '%u' expects type 'unsigned int', but argument 3 has type 'long int'
fann_io.c: In function 'fann_create_from_fd':
fann_io.c:387: warning: dereferencing type-punned pointer will break strict-aliasing rules
fann_io.c:388: warning: dereferencing type-punned pointer will break strict-aliasing rules
fann_io.c:389: warning: dereferencing type-punned pointer will break strict-aliasing rules
fann_io.c:506: warning: dereferencing type-punned pointer will break strict-aliasing rules


These warnings do not occur with a normal make build.

The Debian package does, however, install and work (only a simple test so far).
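
For reference, here is a minimal sketch of the two kinds of warning quoted above and one way each is usually avoided. It is not taken from fann_io.c: the function names and the enum are hypothetical and only illustrate the pattern. On AMD-64 Linux a long is 64 bits while an unsigned int is 32 bits, so "%u" no longer matches a long argument. The type-punned pointer warning typically shows up when a pointer to one type is cast to a pointer to another type and then written through. A plausible reason the warnings appear only in the package build is that it passes different compiler flags (for example -O2 -Wall), but that is an assumption.

```c
#include <stdio.h>

/* Hypothetical enum, standing in for any enum field read from a file. */
enum net_type { NET_LAYER = 0, NET_SHORTCUT = 1 };

/* Format a long with a matching conversion specifier.
 * "%u" with a long argument triggers the format warning on LP64;
 * "%ld" matches the argument type on both 32-bit and 64-bit builds. */
int format_count(char *buf, size_t size, long count)
{
    return snprintf(buf, size, "%ld", count);
}

/* Parse an enum value without casting the enum pointer.
 * Writing through (unsigned int *)out is the pattern that draws the
 * strict-aliasing warning; reading into a temporary of the exact type
 * that "%u" expects and then converting avoids it. */
int parse_net_type(const char *text, enum net_type *out)
{
    unsigned int tmp;

    if (sscanf(text, "network_type=%u", &tmp) != 1)
        return -1;              /* line did not match the expected format */

    *out = (enum net_type)tmp;  /* explicit value conversion, no pointer cast */
    return 0;
}
```

Calling format_count(buf, sizeof buf, count) fills buf with the decimal value of count, and parse_net_type("network_type=1", &t) stores the parsed value in t and returns 0. Both compile without these two warnings under gcc -O2 -Wall.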

Poul-Erik Andreasen


