From: Antonio on
Dear all,

I have a dataset that contains individuals from different age brackets.

I want to create a program that selects the biggest possible group of individuals that obeys certain optimisation rules. For example, I want to determine, from my dataset, the largest possible group that obeys the following:

Target:
16-34 years old - 26%
35-54 years old - 30%
55+ years old - 15%
Male - 40%
Female - 60%

So I will have to exclude individuals from my dataset until I achieve the target above, whilst maximising the number of individuals left in the dataset.

Any ideas how to best do this optimisation?

Thank you very much in advance.
From: Matt J on
If you have the Optimization Toolbox, you could use bintprog()

http://www.mathworks.com/access/helpdesk/help/toolbox/optim/ug/bintprog.html
From: Antonio on
Hi,

Thank you very much for your reply.

Unfortunately, I only have the Statistics Toolbox. Is there any other way of doing it?

Many thanks,
Antonio


"Matt J " <mattjacREMOVE(a)THISieee.spam> wrote in message <hoask4$pd3$1(a)fred.mathworks.com>...
> If you have the Optimization Toolbox, you could use bintprog()
>
> http://www.mathworks.com/access/helpdesk/help/toolbox/optim/ug/bintprog.html
From: Matt J on
"Antonio" <ribeiro.carvalho(a)gmail.com> wrote in message <hoati7$ce2$1(a)fred.mathworks.com>...
> Hi,
>
> Thank you very much for your reply.
>
> Unfortunately, I only have the Statistics Toolbox. Is there any other way of doing it?
==========

For the example you've given, a closed-form solution is possible. If the example simplifies what you're really working with too much, I don't know whether this solution would still apply.

You have 3 age groups, indexed i=1,2,3, and 2 genders, indexed j=1,2.

Your job is to choose unknown integer variables x(i,j), each representing the number of people picked from your total population who belong to both age group i and gender j. The x(i,j) are to be chosen to maximize

N=sum(x(:)) %Eq. 1

subject to the following bounds for all i,j

0<=x(i,j)<=X(i,j), %Eqs. 2

where X(i,j) is the total number of people in age group i and gender j that exist in your total population.

From your 5 target percentages, you also obtain 5 linear equality constraints in the variables x(i,j) and N; for example, the first age target reads x(1,1)+x(1,2)=0.26*N. Together with Eq. 1, this gives 6 linear equalities in 7 unknowns (the six x(i,j) and N). Provided these equalities are independent, you can solve them to obtain a formula for the vector x(:) in terms of N. It will be of the form

x(:)=N*v %Eq. 3

where v is some vector. The solution to the problem is to choose the largest integer value of N, such that

(1) The vector x(:) as given by Eq. 3 is a vector of integers.

(2) The vector x(:) as given by Eq. 3 satisfies the bounds in Eqs. 2
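
As a rough illustration of conditions (1) and (2), here is a minimal sketch that needs no toolbox. Everything numeric in it is made up: the counts X, and the vector v, which is written as integer numerators num over a common denominator den. This particular v is just one assumed choice (the product of hypothetical age fractions 30/40/30% and gender fractions 40/60%); it is not derived from the targets in this thread.

```matlab
% Sketch only: X, num and den are hypothetical values for illustration.
% v = num/den is the vector from Eq. 3, stored as exact integer fractions.
den = 100;
num = [12 18;      % rows: age groups i=1,2,3
       16 24;      % cols: genders j=1,2
       12 18];     % row sums 30/40/30, column sums 40/60 (assumed targets)

X = [40 50;        % X(i,j): people available in age group i, gender j
     60 70;
     30 80];

% Condition (1): N*num/den must be an integer for every entry.
% That holds exactly when N is a multiple of den/gcd(den, g),
% where g is the gcd of all numerators.
g = 0;
for k = 1:numel(num)
    g = gcd(g, num(k));          % gcd(0,a) returns a, so this accumulates
end
step = den / gcd(den, g);

% Condition (2): N*num(k)/den <= X(k) for every entry with num(k) > 0.
pos = num(:) > 0;
ub  = floor(X(pos) * den ./ num(pos));

% Largest multiple of step that respects every upper bound.
Nmax = step * floor(min(ub) / step);
x    = Nmax * num / den;         % people to keep in each cell

fprintf('Nmax = %d\n', Nmax);
disp(x);
```

With these made-up numbers the script gives Nmax = 250 and x = [30 45; 40 60; 30 45], which stays within X in every cell. Keeping v as integer fractions avoids testing floating-point values for integrality.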