Occlusion Culling; Optimization?

I have just finished a section of a voxel engine that only renders the visible faces of each cube, but it still runs at an unsatisfactory speed, which I would like to change.

The rough code is:
Every Loop {
    For each face, check if the block next to it is solid
    Yes - Don't render that face
    No  - Render it
}

Is there a plausible way to optimize this?
For example, an array entry for every single block that records which neighbours are solid and which faces to render, updated only when a block changes.
Although that solution seems VERY memory-consuming; is there a better way?
What rendering API do you use?

Well, suppose that each voxel corresponds to one cubic unit of your rendering API (i.e., each side of a voxel is 1 unit long). You could then define a solid voxel by specifying an array of 3 int values (integers suffice, since your 3D voxel grid is composed of equally sized cubes of 1 unit each, so no floating-point values are required). The three values give the number of units to traverse along each axis (X, Y and Z) to reach the voxel's position. So (0,0,0) would create a solid voxel at the origin, and (1,0,0) would create a solid voxel one unit along the +X axis and 0 units along Y and Z.

Incidentally, if you choose your voxel's "pivot point" (its local-coordinate-space origin) and position it properly, you can make this triple of ints correspond to the offset vector from the world origin required for rendering the voxel. The three integer values then give you the translation transformation needed to position your voxel in world space, which is actually all the information you need for rendering it, since your voxel grid requires no rotation or scale transformations.

So, based on all this, your solid voxel can be defined as a 1x1x1 cube that is transformed to world coordinates using the three integer values specified. You can read a file containing a list of three-int sets/arrays, each of which defines a solid voxel in world space, like this:

//solid_voxels_array.txt

5 5 5
4 3 20
5 8 908
45 7 89
12 4 55

etc.
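As a rough illustration (not a definitive implementation), parsing such a file in C++ could look like the following; the VoxelOffset struct and the loadSolidVoxels() name are just placeholders:

#include <fstream>
#include <vector>

//one solid voxel's integer offset from the world origin
struct VoxelOffset {
    int x, y, z;
};

//read every "x y z" triple from the file into memory once, at load time
std::vector<VoxelOffset> loadSolidVoxels(const char* path)
{
    std::vector<VoxelOffset> voxels;
    std::ifstream in(path);

    VoxelOffset v;
    while (in >> v.x >> v.y >> v.z)   //stops cleanly at the end of the file
        voxels.push_back(v);

    return voxels;
}

//usage: std::vector<VoxelOffset> solids = loadSolidVoxels("solid_voxels_array.txt");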





You write:

The rough code is:
Every Loop {
    For each face, check if the block next to it is solid
    Yes - Don't render that face
    No  - Render it
}

Your non-solid voxels should not be checked for at all, since they are non-renderable to begin with. Just read the list of three-int arrays, store it in memory, and render a 1x1x1 cube at each of the specified locations. As for renderable faces, enable back-face culling in the rendering API so that polygons facing away from the camera are not rendered. This is a hardware-accelerated process and is very fast and memory-efficient: you specify the cubes in world space, and the back-face culling stage discards the faces that point away from the camera before rasterization.
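In OpenGL, for example, this is a few lines of one-time state setup (a minimal sketch; the calls below are standard OpenGL):

// One-time setup: discard polygons whose winding marks them as facing away.
glEnable(GL_CULL_FACE);   // back-face culling is disabled by default
glCullFace(GL_BACK);      // cull the back faces (already the default setting)
glFrontFace(GL_CCW);      // counter-clockwise winding = front face (also the default)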

Also, make sure you use the hardware acceleration offered by your API (OpenGL, DirectX etc). In DirectX, use a static vertex buffer and index buffer in order to specify your re-usable solid voxel, and apply the world-space transformation for each solid voxel in your scene inside an .fx file in order to optimize speed (since the buffer can be static and not require updates, while the .fx code can calculate the world-view-projection transformations for each of your solid voxels).

This shouldn't be too unsatisfactory in performance, especially if you use integral offsets and avoid rotation and scale transformations. Still, it largely depends on your voxel resolution, i.e. the number of voxels specified in your voxel box/scene.

Also, make sure you pre-calculate the normals for your solid voxel cube (which should also be specified as triangles, so 12 triangles rather than 6 quads). Normals are required for the lighting calculations (among other things), so precalculating them means fewer runtime calculations (the difference is trivial for a cube consisting of 12 triangular faces, but still).
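For reference, the pre-calculated normals for an axis-aligned unit cube are just the six axis directions. A small sketch (the array name is made up, and each face's two triangles share that face's normal):

// One unit-length normal per cube face; each face is drawn as two triangles
// that reuse the same normal (axis-aligned unit cube assumed).
const float cubeFaceNormals[6][3] = {
    {  1.0f,  0.0f,  0.0f },   // +X face
    { -1.0f,  0.0f,  0.0f },   // -X face
    {  0.0f,  1.0f,  0.0f },   // +Y face
    {  0.0f, -1.0f,  0.0f },   // -Y face
    {  0.0f,  0.0f,  1.0f },   // +Z face
    {  0.0f,  0.0f, -1.0f }    // -Z face
};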

Hope this helps somewhat,

Ogoyant

EDIT: Also, it's good to apply a frame-limiter so that you don't render more than 60 frames per second.
That's the problem: with SFML and OpenGL (I use VSync) I cannot get even 50 fps with only a 50 by 60 by 70 chunk of cubes being rendered.

Why use Triangles over Quads?
If you use hardware-acceleration features and structure your program so that you do not perform calculations for the non-solid voxels, I think you'd get a good framerate. Make sure to use buffer objects that can store data in GPU memory, and define them as static (since your voxel does not have per-vertex animation).

Does SFML have back-face culling functionality? I haven't used it. If it just goes through OpenGL's state, note that back-face culling is actually disabled by default in OpenGL, so make sure to enable it; then you don't have to worry about specifying which faces to render. You specify whole objects, and just before the rendering operation the back-face culling stage omits faces whose normals point away from the camera.

Why use Triangles over Quads?


Any polygon with more than 3 sides is tessellated internally at render time and converted to triangles, because the GPU renders triangles only. It has partly to do with the fact that the vertices of a quad are not necessarily coplanar; that is, the quad is not necessarily flat even though it is supposed to be, which can affect the rendered result. A triangle, on the other hand, is necessarily flat, which also means its surface can be lit consistently. I haven't used OpenGL for some time, but I just remembered that it actually has a GL_QUADS primitive type (although I've heard it was going to be deprecated). DirectX supports triangles only and that became natural to me, so I didn't think of GL_QUADS at the time of my original post.

Even if triangulation is required for the GPU rendering process, it can be performed internally at run time by OpenGL, although submitting explicit triangles makes that run-time work lighter than submitting quads.

I feel like there should be a way to pass single vertices to a geometry shader and have it draw the cubes for you.

What other types of culling do you use? You should only be passing data to the GPU if you absolutely have to. Using frustum culling you can already avoid sending 3/4 of your viewable area (in terms of a circle of radius "viewDistance") to the GPU, assuming a 90° FOV.
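A deliberately loose version of that check could look like the sketch below (all the names here, camPos, camForward, viewDistance and the chunk's bounding sphere, are assumptions; a real frustum test against the six planes is tighter):

#include <cmath>

//Rough pre-test: reject chunks beyond the draw distance or well outside a
//60-degree half-angle cone, which comfortably contains a 90-degree FOV frustum,
//so nothing visible is ever rejected.
bool chunkMightBeVisible(const float camPos[3], const float camForward[3],
                         const float center[3], float radius, float viewDistance)
{
    const float to[3] = { center[0] - camPos[0],
                          center[1] - camPos[1],
                          center[2] - camPos[2] };
    const float dist = std::sqrt(to[0]*to[0] + to[1]*to[1] + to[2]*to[2]);

    if (dist - radius > viewDistance)   //entirely beyond the draw distance
        return false;
    if (dist <= radius)                 //camera is inside the chunk's bounding sphere
        return true;

    //cosine of the angle between the view direction and the chunk centre
    const float c = (to[0]*camForward[0] + to[1]*camForward[1] + to[2]*camForward[2]) / dist;

    //0.5 = cos(60 degrees); radius/dist adds slack for the chunk's own extent
    return c > 0.5f - radius / dist;
}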
BlackSheep wrote:
I feel like there should be a way to pass single vertices to a geometry shader and have it draw the cubes for you.
Yes, I guess that would be computationally equivalent to my suggestion regarding the offset-vector transform, i.e. drawing world-space-transformed instances of the original cube using the offset vector specified by the int array I mentioned earlier. A world-space vertex would also consist of 3 (XYZ) values, and an instance of the cube would be positioned and drawn at that location (probably using the vertex as the local-space origin for the cube).

Also: if your voxels are calculated once (that is, they are not animated and do not change over time), and you do not intend to move, rotate or scale them or their grid (which is not likely, but the performance difference still makes me mention it), you could get a considerable performance improvement by explicitly storing the voxels in their world-space positions once they have been calculated, and then using a static buffer to represent and render them (rather than instantiating solid voxels in world space on a per-frame basis). Still, that would be a limitation if you wanted to change the voxels in the grid over time, or even to transform the grid itself by moving, rotating or scaling it.

EDIT: If you want to move the grid as a whole but not change the voxels inside it, you could convert the voxels to a local coordinate system for the grid (perhaps with the origin at its center, or at one of its corner vertices) and store that in a static buffer. Move, rotate and scale transformations could then be applied to the whole grid at run time without changing the contents of the buffer. Also, like before, the grid's solid voxels would be explicitly stored (albeit in object space) and wouldn't have to be instantiated per frame.

I think my original post's suggestions should be sufficient to give both a good framerate and the ability for the grid to change its content dynamically, without the limitations imposed by the static-buffer suggestions I mentioned later, though.
My two cents:

Whether or not you're using an octree vastly changes how easy it is to implement some of these features. Regardless, some general optimizations would include:

- Rendering a single face for a large group of voxels. For example, if you have a flat wall made out of voxels, you don't need to render each individual voxel face for that wall; you can just draw one large quad and work your way toward the edges from there (see the sketch after this list). This would effectively take your 50x60x70 voxel set and reduce it to 12 triangles for rendering. Quite an optimization. Having an octree system in place would greatly help this in implementation.

- Occlusion culling is just a matter of ray-casting, the implementation of which also varies based on whether or not you're using an octree for your voxels. It's not an easy task, but if you're interested in explicitly implementing ray-casting then I suggest opening a new thread, since this seems to have turned into a general make-this-go-faster thread.

- Frustum culling is a bit easier, since it doesn't require ray-casting. This is a widely-employed technique, and there are plenty of articles on it. If you're interested in it, you can find information here:
http://www.lighthouse3d.com/tutorials/view-frustum-culling/
But you can always Google for more.
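
To illustrate the face-merging idea from the first point above, here is a much simplified sketch that only merges runs of exposed top (+Y) faces along a single row; real greedy meshing extends this to 2D rectangles and all six face directions. The solid() and emitTopQuad() helpers are hypothetical and would be provided by the engine:

//hypothetical helpers supplied by the engine:
bool solid(int x, int y, int z);                  //is the block at (x,y,z) filled?
void emitTopQuad(int x0, int x1, int y, int z);   //output one quad covering the +Y faces
                                                  //of blocks x0..x1 in row (y,z)

void mergeTopFacesInRow(int y, int z, int sizeX)
{
    int runStart = -1;                            //-1 means "no open run"
    for (int x = 0; x < sizeX; ++x)
    {
        //a top face is exposed when the block is solid and the block above it is not
        bool exposed = solid(x, y, z) && !solid(x, y + 1, z);

        if (exposed && runStart < 0)
            runStart = x;                         //start a new run of mergeable faces
        else if (!exposed && runStart >= 0)
        {
            emitTopQuad(runStart, x - 1, y, z);   //close the run with a single quad
            runStart = -1;
        }
    }
    if (runStart >= 0)                            //a run that reaches the end of the row
        emitTopQuad(runStart, sizeX - 1, y, z);
}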


Also, just want to point out that tessellating GL_QUADS isn't that much of a burden on the GPU. Every discussion I've seen always ends up with someone benchmarking and finding that performance between the two isn't that different. It's just a compatibility thing, since it was apparently dropped in OpenGL 3.1, but that's probably not a big deal to OP since they can still use versions older than 3.1 that still support it.

Also Also, if you haven't done this already, I recommend adding a wireframe rendering mode for debugging. You probably already have it, but this is just a friendly reminder.
I did end up remembering to include a wireframe rendering mode (however, that runs slower than the solid rendering because it's not optimized).

I have not got around to frustum culling because all of the cubes have been viewable on the screen at all times.

Is it better to use One Unit of 3d space for each voxel or say 10 units of 3d space for each voxel?


Rendering a single face for a large group of voxels.

How would you approach this if the voxels are textured and comprised of different materials?


Having an octree system in place would greatly help this in implementation.

Make my own? Or use one such as:
http://nomis80.org/code/octree.html

For some functions that I call many times would it be more efficient to use
inline functions instead?

Thanks for all the help; I'm asking so many questions because this is slightly different than a console window :)
Is it better to use One Unit of 3d space for each voxel or say 10 units of 3d space for each voxel?
In the end, the world-space units that are used to represent anything aren't significant if you're just changing the scale.

How would you approach this if the voxels are textured and comprised of different materials?
Group voxels that share the same texture, and then use GL_REPEAT instead of GL_CLAMP for your texture mapping so the texture can tile across the merged face. When it comes to the materials, it depends. What type of materials are we dealing with?
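For the texture side of that, a small sketch of the relevant (standard) OpenGL state; the texture handle name is made up:

// Let one texture tile across a quad that was merged from several voxel faces,
// instead of clamping at its edge.
glBindTexture(GL_TEXTURE_2D, voxelTexture);   // 'voxelTexture' is an assumed handle
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
// Texture coordinates greater than 1.0 on the merged quad then repeat the image per voxel.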

Make my own? Or use one such as:
http://nomis80.org/code/octree.html
In order to do the processing that you need, I suggest making your own. If you can find one that allows you to do ray-casting and lets you delve into the chunks that make up the octree, then go ahead and use that. Otherwise, making your own will give you more insight into how everything works.

Even if you already know how octrees work, I suggest taking a look here:
http://www.shamusyoung.com/twentysidedtale/?p=15751
Shamus tends to make great posts explaining how things work.

For some functions that I call many times would it be more efficient to use inline functions instead?
It may be more efficient, but I would expect the speed gains to be minimal.


Additionally, this guy's building a voxel-based world:
http://www.sea-of-memes.com/
Go ahead and skim through those blog posts to see if there's anything worthwhile. I think he releases a lot of source code, too. His first entry seems to have a lot of things that have been mentioned here, so you can see some implementations first-hand.

Finally, what sort of hardware are you running on?
I'm running an XPS 8300, fairly new.

For the record, I'm essentially going for an engineering / entertainment purpose voxel engine. (If you want to call it Minecraft-based, I'm trying to disassociate myself from that.)

I was trying to only calculate blocks that modify themselves (plants, other interactive blocks) and blocks modified by the user.

(If you want to call it Minecraft-based, I'm trying to disassociate myself from that)
I've heard that voxels are heavily used in non-game related fields, so I try not to associate the two.

For the record, I'm essentially going for an engineering / entertainment purpose voxel engine.

I was trying to only calculate blocks that modify themselves (plants, other interactive blocks) and blocks modified by the user.
In which case it doesn't seem that we'll need to go to the level of complexity that was being discussed by Ogoyant.

There's still the issue of:
When it comes to the materials, it depends. What type of materials are we dealing with?
To be honest, I suggest implementing the optimizations and worrying about the materials later. Unless the materials are crucial to the presentation, having an application that runs at a full 60FPS is preferable to one that runs at 20FPS and has material lighting. Once you have the optimizations in place, I would take a look at how rendering is happening and determine the best way to apply materials from there. You might end up having to implement them using framebuffers/shaders.

The last thing I'll mention is your use of glBegin/glEnd for rendering. How often are these two functions called?
I have heard of / seen non-game voxel engines; however, I was just looking for something to do, and every program I have written is useful in principle but something that, in real life, you would never even consider using.

EveryLoop {

    RenderFile.RenderWorld(all the view and translation parameters)

}

RenderWorld(as listed above) {

    for (x)
        for (y)
            for (z)
                renderblock(x, y, z);

}

In renderblock, the faces are drawn (and, unfortunately, the visible faces are calculated there too).
glBegin/glEnd are used for every block that is rendered; is that a performance issue?
Should all the blocks be rendered within one glBegin/glEnd?

All of the values are stored in an array.
(Which I was planning to shift over to a Boost MultiArray in the future, for customizable map sizes.)

The idea of optimization (for me at least) is to increase speed at the potential cost of size; there are (within normal limits) no problems with large executables.
glBegin/glEnd are used for every block that is rendered; is that a performance issue?
Should all the blocks be rendered within one glBegin/glEnd?
... ****. That is a decent performance bottleneck. Every glBegin/glEnd pair tends to involve a long wait cycle while that information is sent to the GPU for rendering. Instead, put the entire rendering of your voxels in its own glBegin/glEnd. Note that this will cause issues, since you can't change textures in the middle of a glBegin/glEnd. Eventually, you may need to group voxels based on their texture and render each group with its own glBegin/glEnd.
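A rough sketch of that grouping (the TextureGroup and Block types and the drawBlockFaces() helper are made-up names, not anything from your actual code):

#include <vector>

struct Block { float x, y, z; /* texture coords, face flags, etc. */ };   //assumed layout
struct TextureGroup { GLuint texture; std::vector<Block> blocks; };       //assumed grouping

void drawBlockFaces(const Block& b);   //hypothetical: glTexCoord2f/glVertex3f calls only

void drawGroups(const std::vector<TextureGroup>& groups)
{
    for (size_t g = 0; g < groups.size(); ++g)
    {
        glBindTexture(GL_TEXTURE_2D, groups[g].texture);   //texture changes stay outside glBegin
        glBegin(GL_TRIANGLES);
        for (size_t i = 0; i < groups[g].blocks.size(); ++i)
            drawBlockFaces(groups[g].blocks[i]);           //one glBegin/glEnd per texture group
        glEnd();
    }
}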
Also, just want to point out that tessellating GL_QUADS isn't that much of a burden on the GPU. Every discussion I've seen always ends up with someone benchmarking and finding that performance between the two isn't that different. It's just a compatibility thing, since it was apparently dropped in OpenGL 3.1, but that's probably not a big deal to OP since they can still use versions older than 3.1 that still support it.
Interesting to know, thanks NGen. This is a very interesting thread, lots of great observations and discussion since my last log-in.

Every glBegin/glEnd pair tends to involve a long wait cycle while that information is sent to the GPU for rendering. Instead, put the entire rendering of your voxels in its own glBegin/glEnd
That's quite important, and certainly a reason for reduced performance. Try to minimize the glBegin/glEnd calls and render as much of your content as possible within each one, since many successive calls can impede performance significantly. Doing some pre-render grouping of your voxels in dynamic arrays/vectors before calling glBegin/glEnd could be worthwhile, since the cost of the grouping would be offset by the fewer glBegin/glEnd calls.

All of the values are stored in an array.

That's also an issue, I think. In my opinion the values should (eventually) be stored in a buffer object to allow for a greater degree of hardware acceleration.
GL_QUADS actually runs faster than GL_TRIANGLES in the case where it is appropriate: lots of non-strippable planar quads. With GL_TRIANGLES you have to provide 6 vertices instead of 4. The main downside is that the tessellation order is unspecified, which interferes with Gouraud shading. If you support GL_QUADS but need the shading, you should use GL_TRIANGLES by default and only switch to GL_QUADS if you can somehow determine how they tessellate on that machine.

Vertex buffer objects are about 3 times faster than immediate mode on my machine. If you have a high-end graphics card, you should have no problem rendering a 50x60x70 chunk in immediate mode (but remember to optimize for low-end cards). This indicates that you have another problem.
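For reference, a minimal static-VBO sketch, assuming an OpenGL 1.5+ context and a worldVertices std::vector<float> of x,y,z positions built once at load time (fixed-function vertex arrays are used here to stay close to the style of the rest of the thread):

// One-time upload of the pre-built world-space triangle list (GL_STATIC_DRAW,
// since the data never changes after loading).
GLuint vbo = 0;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER,
             worldVertices.size() * sizeof(float),
             &worldVertices[0],
             GL_STATIC_DRAW);

// Every frame: just draw the buffer, no per-voxel work on the CPU.
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, 0);   // 3 floats per vertex, tightly packed, offset 0
glDrawArrays(GL_TRIANGLES, 0, GLsizei(worldVertices.size() / 3));
glDisableClientState(GL_VERTEX_ARRAY);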

Combining adjacent faces is nontrivial. It isn't likely that you will see lots of huge planar regions of just one texture. Filling an arbitrary tiled shape with as few rectangles as possible takes a lot of effort and needs to be done again every time that part of the world changes.

You probably don't want to do this Ogoyant's way. That method might be appropriate for true voxel simulations, but you're dealing with a special case. You have a large number of static, axis-aligned, uniformly sized voxels, and you need to render them appropriately to get acceptable frame rates.

In your case, it is also possible to do back face culling manually. If a chunk is at a diagonal from the camera on all 3 axes, you only have to render the 3 front faces. However, this means that the 6 faces are each stored in separate vertex buffers. The actual processing time to determine which faces to draw should be low, but the loss of batching on the GPU might hurt you.
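A conservative sketch of that per-axis test (chunkMin/chunkMax are the chunk's world-space bounds and the six drawFaces*() helpers each render one of the six per-face vertex buffers; all of these names are assumptions). When the camera lies strictly outside the chunk on an axis, only one of that axis' two face groups is drawn, which is where the savings come from:

// Per chunk, per frame: decide which face buffers could possibly be visible.
if (camPos[0] > chunkMin[0]) drawFacesPlusX();    // some +X faces may face the camera
if (camPos[0] < chunkMax[0]) drawFacesMinusX();   // some -X faces may face the camera
if (camPos[1] > chunkMin[1]) drawFacesPlusY();
if (camPos[1] < chunkMax[1]) drawFacesMinusY();
if (camPos[2] > chunkMin[2]) drawFacesPlusZ();
if (camPos[2] < chunkMax[2]) drawFacesMinusZ();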

Most importantly, try to process your voxels as rarely as possible. If your world doesn't change and is finite, you can process the entire world at load time and store it all in vertex buffers. At each frame, all you need to do is draw the buffers.
Wait, there is no way that this:
for (EverySingleTypeOfCube)
    glBegin
    for (x)
        for (y)
            for (z)
                if (cube == cubetobedrawn)
                    draw cube
    glEnd

Vs

for (x)
    for (y)
        for (z)
            glBegin
            draw cube
            glEnd

Is glBegin / end that big of a speed reducer?

Should I literally (or with another method) have a data structure for every single voxel, like:
struct {
    bool LeftFace;
    bool RightFace;

    bool etc;   //...one flag per face

} world[X][Y][Z];


SFML does have back-face culling... glCullFace( GL_BACK );
If you have a high-end graphics card, you should have no problem rendering a 50x60x70 chunk in immediate mode (but remember to optimize for low-end cards). This indicates that you have another problem.
Your statement seemed a bit weird to me (jumping from topic to topic), so this is for you to confirm and for everyone else to understand: Your point was that if OP has a powerful graphics card, then the issue isn't in the rendering. I was trying to determine if that was the issue by asking for his machine, but the XPS 8300 has a wide range of graphics cards, so I decided to just leave it there instead of guiding them through the process of figuring out what they have.


Combining adjacent faces is nontrivial. It isn't likely that you will see lots of huge planar regions of just one texture. Filling an arbitrary tiled shape with as few rectangles as possible takes a lot of effort and needs to be done again every time that part of the world changes.
I agree that it's non-trivial. But for this use case, the performance gains would be worth it, in contrast to your statement.

It doesn't seem like the use of the application is going to end in some extremely complex and esoteric geometry. From the sounds of it, he might end up with the bottom and sides of the voxel chunk being completely flat, which would definitely benefit from combining adjacent faces. Differences in texture may make the ability to combine less common, but I don't think it would make it uncommon.

Yes the calculation needs to be re-done with every change, but not on the entire octree, and it's not like it's going to be done every frame. The overhead will be negligible.



Is glBegin / end that big of a speed reducer?
Yes. How much of an impedance it causes depends on how you're using it, but there are plenty of articles out there regarding the overhead of rendering using glBegin/glEnd (also known as immediate-mode rendering).
http://stackoverflow.com/questions/6733934/what-does-immediate-mode-mean-in-opengl


In our discussion, I think we're doing more harm than good seeing how we all continue to throw our various suggestions at Pickle Gunner and then discuss each others' methods. Instead, can we all agree to come up with a top 3 and discuss from there?

Pickle, what's your programming and OpenGL experience?
OpenGL is fairly new to me; I have not worked with it for very long. Other than that, I am good with C++, to the extent that this and other projects require.

Sorry if I have to look up some (OK, most) of the OpenGL terms you're using; however, the concepts are understandable.

NVIDIA GeForce GT 545 Graphics card, by the way.

What would be the best way to go here in just a basic pseudocode outline?
Please don't "Filter" the difficulty, I got time to waste and things to learn.
Telion wrote:
GL_QUADS actually runs faster than GL_TRIANGLES in the case where it is appropriate: lots of non-strippable planar quads. With GL_TRIANGLES you have to provide 6 vertices instead of 4. The main downside is that the tessellation order is unspecified, which interferes with Gouraud shading. If you support GL_QUADS but need the shading, you should use GL_TRIANGLES by default and only switch to GL_QUADS if you can somehow determine how they tessellate on that machine.
Good point; for non-strippable quads you would indeed have to specify fewer vertices, even though the quad has to be tessellated internally prior to rendering.

NGen wrote:
In our discussion, I think we're doing more harm than good seeing how we all continue to throw our various suggestions at Pickle Gunner and then discuss each others' methods. Instead, can we all agree to come up with a top 3 and discuss from there?
I agree. For my part, I would prioritize the use of vertex buffers rather than arrays -- and also point out the importance of reducing glBegin/glEnd calls.

EDIT:

In pseudocode, I'd say that your program should:

-open the file containing your voxel data

-parse the content and store the data in a temporary array or vector

-analyze the data you have read and determine subsets of voxels that share attributes (as far as materials and textures are concerned). Do not copy these into a new array or reorganize the elements within the array; just use secondary arrays of indices to keep track of that. That is:

#include <vector>

//voxelData: your per-voxel record (position offset, type, etc.) read from the file
std::vector<voxelData> myVoxelVector;

//..
//initialize vector with voxel data read from file
//..

//temporary arrays of indices, each one representing a type of voxel that can be rendered in a single glBegin/glEnd call
std::vector<int> myVoxelIndexArray1;
std::vector<int> myVoxelIndexArray2;
//etc..

for(size_t i = 0; i < myVoxelVector.size(); i++)
{
	//determine content variants and store the indices
	//for each voxel of similar attributes in its
	//designated (temporary) array of indices

	if(/*voxel type is type_1*/)
	{
		myVoxelIndexArray1.push_back(i); //store the index in myVoxelIndexArray1
	}

	else if(/*voxel type is type_2*/)
	{
		myVoxelIndexArray2.push_back(i); //store the index in myVoxelIndexArray2
	}

	//etc..
}


- At this point, you can define buffer objects and store your data in those, then delete the vectors and arrays created for reading through the file content, and use the buffers for your rendering. Also, since your voxels are organized in types, you can batch-render those in a single glBegin/glEnd call without changing your buffers, textures, materials, or other settings.


(Using dynamically allocated arrays would be better than using STL vectors in some cases, since vectors allocate additional memory for prospective use, while your file's content is probably not going to change after loading; by the time you have parsed the file you know the exact number of elements you need, so you don't have to push_back() sequentially and can simply allocate a dynamic array of exactly that many elements to store the same content.)
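(As a small illustration of that point: if the element count is known after parsing, or counted in a first pass, std::vector can also reserve the exact amount up front, which avoids the incremental re-allocations without giving up the vector interface.)

std::vector<voxelData> myVoxelVector;
myVoxelVector.reserve(numberOfSolidVoxels); //'numberOfSolidVoxels' is an assumed count taken from the file
//push_back() now fills pre-allocated storage with no intermediate re-allocations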

Also, in case your solid voxels change per frame (i.e. the voxel-grid content is animated), the above would be very computationally intensive and it would be best to simply use a dynamic buffer in the first place. However, from what I understand so far, you do not have animation in the solid voxels, so you could certainly benefit from batch-rendering voxels with the same attributes using a single buffer and glBegin/glEnd call.
Is this any good: http://sites.google.com/site/letsmakeavoxelengine/home/chunk-optimizations ?

Wazzak