15 мая 2023 года "Исходники.РУ" отмечают своё 23-летие!
Поздравляем всех причастных и неравнодушных с этим событием!
И огромное спасибо всем, кто был и остаётся с нами все эти годы!

Главная Форум Журнал Wiki DRKB Discuz!ML Помощь проекту


CObArray sorting

jeffg@iqpuucp.aa.iqsc.com
Monday, February 12, 1996


     I am adding about 14,000 items to a CObArray. It has to be sorted, so 
     I am using the InsertAt() function to insert it correctly. When the 
     function starts, it adds about 100 items/second, but as it draws near 
     the end, it does 1 item/second. Potentially, if a client had over 
     100,000 items, this would slow down so much it could take days to 
     complete.
     
     What I am looking for is a way to sort the CObArray in a faster 
     manner, or alternatively, use something else that allows sorting, yet 
     is able to maintain its input speed.



David W. Gillett -- DGILLETT@expertedge.com
Tuesday, February 13, 1996

[Mini-digest: 13 responses - some sort of record, I bet ;-]

  Insertion sort is an O(n^2) algorithm, a poor choice for large data 
volumes.  It works well when data shows up in ones and twos to be 
added to an already-sorted list, but it works poorly for sorting an 
entire collection and isn't really very suitable for arrays.

  A better choice is to use something like QuickSort; the qsort() 
library routine should be able to work.  [The Help I have handy is 
for MFC 2.5, and indicates that the array is stored in a way which 
would permit this.  Presumably this constitutes a documented 
commitment for subsequent versions.]

  You'll need to write a comparison routine for qsort(), and note 
that the array entries are actually pointers.  Any decent C 
programming FAQ will include extensive coverage of qsort().
  Off the top of my head, I believe QuickSort is probably no worse 
than O(n log n).  For small data sets, it may be slower than 
insertion sort, but as the dataset size grows the time for QuickSort 
doesn't grow as quickly as it does for insertion sort -- and that's 
what you want.

Dave

-----From: Tim Philip 

>From your description I am going to conclude that InsertAt() is probably
an O(n) function.  Although I'm not certain about that.  I think that
is worst case.

By sorting every time you are processing 14,000 x 0(n).  If you add all
the members to the array and then do a simple BubbleSort (it is only 
O(n^2) and I have a function that does it for you) you will only be doing
14,000^2.  Wait a second, these are both the same.  Damn.  Well then,
try a MergeSort ( O(n*log n) ) or a QuickSort ( O(n*log n) ).  QuickSort
is considered faster.  Damn, really thought I had something there.

Ok, here is another idea.  Declare an array of space big enough to hold
your data using malloc (use realloc if you need to grow, etc) and use
qsort() and bsearch() from the stdlib.h library.

Or (I love this stuff), use a binary tree instead of an array.  This will
massively improve your performance.  You could add your 14,000 array
elements into a binary tree and then do an inorder traversal of the
tree and add the elements into a list.

When you start talking about 100,000 items I think you have a problem
no matter what though.  I think you are far better to go with my first
idea and use a QuickSort to sort the entire list in one go.  That way
you can warn the user ( "this may take a while ..." ) and do it all in
one fell swoop.

Sorry, I had originally thought I had a simple solution.

-------
Tim Philip, B.Sc.           Randco Software Corporation
Consultant                  2530 Hanover Avenue   
                            Saskatoon, SK          Phone: +1-306-343-3380
philip@cs.usask.ca          Canada   S7J 1G1         Fax: +1-306-343-3341

-----From: "David Ohlssen" 

You don't say win16 or win32 or NT.   Listbox (32) does something 
rather neat with this - trace the code!

I did a test on NT3.5 some time ago, running on an 8mb (believe me)
486SX33 with the first 32bit VC version (yes it compiled on the 8mb
machine).   I pumped 250 000 random strings of 100 bytes each into a
list box control with it's sort flag set and it was going faster than
1 per sec at the end. Furthermore, when I dragged the thumb fast
from end to end, the list box body (20 lines) stayed in sync, no
matter how fast I moved the mouse.   That sold me on win32 (and MFC)!  
MS didn't pay me to say this, but are welcome :).
David O. Ohlssen
I  _____________________________________  I
I  | Davido@Commerce.CTech.ac.ZA       |  I
I  |    __  ____________               |  I
I  |   /  \/ Cape Town  \  _           |  I
I  |__/     South Africa \/ \__________|  I
I_________________________________________I

-----From: jerasmussen@falconholm.dk

How are you doing the search for the correct position to insert the new
items?? By linear search??

If that's the case, I *strongly* recommend that you use binary search
which search time will only grow *very slowly* compared to that of linear
search. This is possible since the items already in the array are sorted.

If you don't know how to do a binary search, please let me know :-)

Jens-Erik Rasmussen, Falconholm International A/S, Denmark

-----From: Steini 

CObArray is not the right tool to use because the only way to search is
sequentially.
You should use binary tree and in-order algorithm ( which sorts the data
automaticly ).
However, if I had to sort so many items I would use a fast database like
Btrieve to avoid blowing the heap.

Steini
 stein@ismennt.is                steini@sjal.is
 Steingrimur Jonsson             Computer Scientist
 http://rvik.ismennt.is/~stein

-----From: mikeblas@interserv.com

You don't mention what version of VC++ you're using.  That makes a big 
difference because the memory manager changed between VC++ 2.x and VC++ 4.0 
pretty substantially.  Oh, well--if you don't care, then I don't care.

Where is all time you're spending?  Is it actually in CObArray::InsertAt(), 
or is it in your own code actually trying to find where you should do your 
next CObArray::InsertAt()?

Did you call CObArray::SetSize()?  When?  To what size?  What do you pass for 
the nGrowBy parameter?
     
>     What I am looking for is a way to sort the CObArray in a faster 
>     manner, or alternatively, use something else that allows sorting, yet 
>     is able to maintain its input speed.

It really depends on how you're using the array.  Are you keeping it sorted 
to avoid duplicates?  Maybe you should just add the elements as you get them 
and sort the array all at once--there are dozens of algorithm books which 
explain the pros and cons of different sorting techniques.

.B ekiM
--
TCHAR szDisc[] = _T("These words are my own; I do not speak for Microsoft.");

-----From: Niels Ull Jacobsen 

You really need to read up on data structures. And you have to decide
what you need:

Are all the items inserted at the same time, or are they inserted a
little by little.  Are items ever deleted? Can you tell (before
creating the data structure) how much data you'll need to fit in?
What are the acceptable memory overhead?

Do you just need to access the items sequentially?  Or do you have to
find item N? How fast? (constant time, logarithmic time, linear time)?
Or is it enough to be able to find and remove the first element?

You have several options:

If you know all the data "beforehand", just insert them into the
array unsorted and sort them using an efficient in-place sorting
algorithm like quicksort. BTW, inserting them will be *much* quicker
if you first set the size of the array. Or at least set GrowBy
to a large number.

If you need to maintain a sorted structure, inserting and deleting
items at various times, *and* you need to find items relatively fast,
you'll have to use a balanced tree of some sort (red-black tree, 2-3
tree or something like that).

If you just need to access item N fast, you should probably use a hash
table (a CMap).

If you only need to access the first item and you must be able to add
items relatively fast, use a heap.

If you know most of the items beforehand, and you just need to
traverse the data sequentially, use a plain list. Sort it with
mergesort when you've added most of the items. Updating the list won't
be as fast, though.

But go to the library and get a book on data structures.

--
Niels Ull Jacobsen, Kruger A/S

Everything stated herein is THE OFFICIAL POLICY of the entire Kruger
group and should be taken as legally binding in every respect. Pigs
will grow wings and fly.

-----From: Ken Freeman 

CObArray is probably the least effecient class you could use.  When you
insert an item, it has to move all the later items in memory.  CObList would
be better, but with 14,000 items even that would be slow.

I would probably look for a b-tree class; I think Rogue Wave sells one.
Alternatively, you could keep using CObArray, *but* 1) always add objects
to the end; and 2) when they are all added use qsort to sort the array.
I'm just guessing, but I think a b-tree would be faster.

Ken
-----From: jerry_albro@tcpgate.ibi.com

Have you tried using SetSize with a large initial value and a large
growby factor? This would reduce the number of allocations needed,
which are very expensive. If you have tried this and still no joy,
then I would try using a simple array (not Cobarray) of pointers, and
qsort(). If needed, the array could transferred to a CObArray and the
original deleted.

Jerry Albro

-----From: Brad Wilson 

An array is a horrible thing to sort.  Every time you insert something in
the array, all the items after that have to be moved.  You should use
CObList instead.

The problem there, of course, if that you need the random access usefulness
of an array, you're still kinda out of luck.  In this case, find some way
to sort the objects _before_ you add them to the array, then add them in
the correct order.  This should speed things up quite a bit.

Good luck!
Brad

--
class CBradWilson : public CWorldWatchProgrammingTeam {
  public:
    void GetInetAddr  ( CString& s ) { s = "bradw@exptech.com";      }
    void GetE164Addr  ( CString& s ) { s = "+1 (810) 620-9803";      }
    void GetURL       ( CString& s ) { s = "http://www.exptech.com"; }
    void GetDisclaimer( CString& s ) { s = "All I say is fact :-p";  }
};

//  QOTW:  "Music nowadays is merely the art of executing difficulties and in
//          the end that which is only difficult ceases to please."

-----From: "Graham Stalker-Wilde" 

why are you using an array?
Inserting into an array is never going to be a good idea (it's an O(N)
operation
since every insert has to shift up all the elements above it)
Use CObList, (and then copy the list into an array when you're through
if you really need an array) The STL's set<> collection, is probably even
more efficient,
but your best bet, if this is a one-off operation is to add them in any old
order then sort them once and for all.

- Graham

-----From: David Stidolph 

If you can, set the size of the array up front.  What is happening is =
that as you add items in the middle a new chunk of memory is allocated =
for the new array and all the old contents are copied to the new memory =
(saving space for the new item) and the old memory is freed.  This also =
has the effect of requiring enough memory for both arrays at the same =
time.

If you cannot set the size of the array you might want to switch to a =
linked list (CObList).

Lastly, I believe there is a limit on the number of CObArray items (at =
least there was in 2.x) - 32,767 items.  Does anyone know if this limit =
has been lifted?
------------------------------------------------------------------------
David Stidolph

Gee that bastard smells. No wonder they call him Pooh. --Chris Robin.

-----From: Mats Manhav 

Hi jeffg, 

Take a look at the documentation for CObArray::SetSize()
> ...
> For efficiency, call SetSize to set the size of the array before using it.

> This prevents the need to reallocate and copy the array each time an item
is added.
> ...


It seems like you should not use the InsertAt for making sorting since it
will 
force reallocation each time, and along with that a copy of the complete
array.
Personally I think I would make this in some kind of binary tree instead. 
Maybe there are some MFC code somewhere which handles binary trees.

Mats
--
==========================================================================
Mats Mеnhav (Mats Manhav for 7-bit people)
email:manhav@connectum.skurup.se   WWW: http://connectum.skurup.se/~manhav
FAX:  (int) 46 (0) 414 243 05      Phone: (int) 46 (0) 414 243 05         
==========================================================================





| Вернуться в корень Архива |