Searching and finding an element of vector<struct> for updating and deleting a specific item

Hello, I would like to ask you a question about finding an index of a record in a structure that is in a vector. The searching should be according to some parameters.


There is a structure together with a vector<structure>:

struct SPerson {
string name;
string address;
string birthcertificatenumber;
};
std::vector<SPerson> People;

Lets say that there are already some records:

Name: John, address: street 1, Birth certficate number: 801206/2393
Name: Jean, address: street 2, Birth certficate number: 701206/2393
Name: Matt, address: street 3, Birth certficate number: 601206/2393
Name: Peter, address: street 4, Birth certficate number: 501206/2393
Name: Pierre, address: street 5, Birth certficate number: 901206/2393
Name: Ian,address: street 6, Birth certficate number: 861206/2393
Name: Matrix, address: street 7, Birth certficate number: 781208/2393



Now I would like to delete and update a specific record. The first thing I have to do is to find the specific item.
The criteria of searching should be according to:
1) name and address
DeletePerson( "Jean", "street 2" ));
UpdateAddressOfPerson("Jean", "street 2", "new street"));
or
2) birth certificate number
DeletePerson ( "701206/2393" ) );
UpdateAddressOfPerson("701206/2393", "new street"));

Lets say that there are thousands and thousands records and I need to find it quicky - non-linearly for the purpose of deleting and updating.
I can neither use map nor set, it is a requirement of the task that I have to solve.
I would like to mention that it should work insensitevely, so it should ignore the size of letters, for example: John is jOhn is JoHn etc. Street 1 ..sTrEet 1...


Examples that I have now which works, but are slow:

Updating function:

bool UpdatePerson(const string & name, const string & address, const string & newaddress)
{
auto it = find_if(begin(People), end(People), [=] (SPerson const& f) {
return (strcasecmp(name.c_str(), f.name.c_str()) == 0) and (strcasecmp(address.c_str(), f.address.c_str()) == 0);
});
bool found = (it != end(People));
if (found == true)
{
it->address = newaddress;
return true;
}
return false;
}


Deleting function:

bool DeletePerson(const string & name, const string & address)
{
auto it = find_if(begin(People), end(People), [=] (SPerson const& f) {
return ((strcasecmp(name.c_str(), f.name.c_str()) == 0) and (strcasecmp(address.c_str(), f.address.c_str()) == 0));
});
bool found = (it != end(People));
if (found == true)
{
People.erase(it);
items--;
return true;
}
return false;

Last edited on
It can't be done in sublinear time using a vector. The requirement of a vector that all elements must be stored contiguously means that any auxiliary search structures must be updated (which takes O(n)) whenever an element is removed.
If you relax that requirement and store the SPersons non-contiguously (by newing each one independently), the process of search and removal can be done in O(log n) (worst case) using binary search structures, or in O(1) (average case) using hash tables.
You might be able to speed up the deletion process by swapping the element to delete with the last element and remove the last one with pop_back()
Factor out the code that finds a person by name and address.
closed account (48T7M4Gy)
It might be cheating but why not use a <map> which has a key value mapped to the corresponding vector index.
you can actually reserve a bunch of space in the vector and hash the data yourself into the vector by computing the index off the data and having a deleted flag for dead data. It can give you the best of hash map and vector in one, but it can also be a huge effort depending on the data at hand, and it only works well off 1 key data item, you can't do it (well, you can reduce the search considerably but you can't get O(1)) for searches of multiple columns.

Delete flag data works for slowly changing data. If you delete and add nonstop, it gets horrible fast. Once in a while, in off hours, you can run a cleanup to cull the deleted items and re-map everything for the next day to maintain efficiency.

you can maintain each column in a smaller data structure designed to search for that data item, and have pointers to connect the structures to each other to get the whole record.... this is like making your own database though.

-----------
so what I am saying is if you made some objects...
class address
{
int bc_index;
int name index;
string address;
}
and vectors of those...
you can hash address and produce an index into your address vector to store this.
Then if you look up address, that costs O(1). And it gives you the rest of the record, just grab from your BC number vector at the given index, and the name vector at the given index.

This is obviously a ton of work to get right. High speed code takes a lot of time, design, and work -- which is why so many programs are so slow. This is a really rough starting point. You could use 4 hash maps and tie them together. The implementation can vary, this is just a c++ terms way of explaining the design if you want O(1) lookups. Its a deeper look at Kemort's comment.

Last edited on
Topic archived. No new replies allowed.