Various links to papers and articles that are interesting to read and experiment with

- General resources
- Maths
- VGA
- 2D Graphics
- 3D Graphics
- Books
- Optimizations
- i386+ Optimizations thinking
- Frameworks and libs
- To be classified

- Fast and compact sin/cos generator
- Optimized 3D math library in C
- Fast Log 2
- Fixed point maths
- Fixed point maths library
- Sub-pixel positioning (i.e., using fixed-point positions for slow movements)
- Sub-pixel drawing correction (good series of articles)
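
As a rough illustration of the sub-pixel positioning idea above, here is a minimal sketch in C, assuming a 16.16 fixed-point format. The names `fix16`, `Sprite`, `fix_from_int`, `fix_to_int` and `sprite_step` are made up for this example and do not come from any of the linked articles.

```c
#include <stdint.h>

/* 16.16 fixed point: high 16 bits = integer part, low 16 bits = fraction.
   All names here are hypothetical, chosen for this sketch only. */
typedef int32_t fix16;

#define FIX_SHIFT 16
#define FIX_ONE   ((fix16)1 << FIX_SHIFT)

static fix16 fix_from_int(int v)
{
    /* Multiply rather than shift: left-shifting a negative value is undefined in C. */
    return (fix16)v * FIX_ONE;
}

static int fix_to_int(fix16 v)
{
    /* Portable floor: C division truncates toward zero, so bias negative values. */
    return (int)((v < 0 ? v - (FIX_ONE - 1) : v) / FIX_ONE);
}

typedef struct {
    fix16 x, y;    /* position, in 16.16 pixels */
    fix16 vx, vy;  /* velocity, in 16.16 pixels per frame */
} Sprite;

/* Advance one frame. The fractional part accumulates silently, so a sprite
   moving at 0.25 pixel per frame only changes its drawn pixel every 4 frames. */
static void sprite_step(Sprite *s)
{
    s->x += s->vx;
    s->y += s->vy;
}
```

A sprite created with `vx = FIX_ONE / 4` would be drawn at `fix_to_int(s.x)` each frame and would advance by one whole pixel every fourth call to `sprite_step`.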

About knots:

About attractors & fractals:

- Fast Gaussian Blur Algorithm in C#
- A Fast Gaussian Blur implementation
- Circular & Radial Blur
- Real-Time Radial Blur
- Fastest Gaussian Blur
- Radial Blur & Rendering To A Texture
- Four Tricks for Fast Blurring in Software and Hardware

To study after I reimplement the techniques I used in milk shake and blublue

- Play with light and dark using ray casting and visibility polygons
- How to implement 2D raycasting light effect in GLSL

- Quaternions (4 parts)

- Faster 3D Game Graphics by Not Drawing What Is Not Seen
- Real-Time 3D Clipping
- World Class 3D Clipping

Michael Abrash's Graphics Programming Black Book:

- Fog's Software Optimization Resources (C++ vector library, subroutine libraries such as memcpy, tests with counters, cache misses, etc.)
- Double Blend Trick - how to scale 4 numbers, or lerp 8 numbers, in only 2 multiplies (useful for lerping between RGBA components)
- radix sort for floats
- Radix Sort Redux for floats
- Fast cross fade using MMX (32 BPP)
- SSE Optimization case study (matrix-Vector multiply)
- General MMX Optimization techniques
- MMX™ Technology Manuals and Application Notes - examples for 2D Graphics (fractal, sprite overlay), sound processing (echo)
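
To make the double blend trick listed above more concrete, here is a sketch in C of the general idea: two 8-bit channels are packed into one 32-bit word with 8 empty bits above each, so a single multiply scales both. This is an illustration under assumptions, not necessarily the linked article's exact code; the names `scale4` and `lerp4` are hypothetical, and the weight is assumed to be in the range 0..256.

```c
#include <assert.h>
#include <stdint.h>

/* Scale the four 8-bit channels of a packed pixel by scale/256
   using two multiplies instead of four. */
static uint32_t scale4(uint32_t px, uint32_t scale)
{
    assert(scale <= 256);
    uint32_t rb = px & 0x00FF00FFu;          /* channels 0 and 2 */
    uint32_t ag = (px >> 8) & 0x00FF00FFu;   /* channels 1 and 3 */
    rb = (rb * scale) >> 8;                  /* each lane: c * scale / 256 */
    ag = ag * scale;                         /* results already sit in bits 8-15 and 24-31 */
    return (rb & 0x00FF00FFu) | (ag & 0xFF00FF00u);
}

/* Lerp the four channels of a toward b (8 inputs) with two multiplies.
   Per channel this computes a + floor((b - a) * t / 256). Negative lane
   differences rely on unsigned wrap-around: the borrow taken from the upper
   lane is cancelled by the carry produced when a is added back. */
static uint32_t lerp4(uint32_t a, uint32_t b, uint32_t t)
{
    assert(t <= 256);
    uint32_t a_rb = a & 0x00FF00FFu, a_ag = (a >> 8) & 0x00FF00FFu;
    uint32_t b_rb = b & 0x00FF00FFu, b_ag = (b >> 8) & 0x00FF00FFu;
    uint32_t rb = a_rb + (((b_rb - a_rb) * t) >> 8);
    uint32_t ag = a_ag + (((b_ag - a_ag) * t) >> 8);
    return (rb & 0x00FF00FFu) | ((ag << 8) & 0xFF00FF00u);
}
```

With these definitions, `lerp4(a, b, 0)` returns `a` and `lerp4(a, b, 256)` returns `b`. The channel order (RGBA, ARGB, ...) does not matter, since every lane is treated the same way.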

- LOOP: prefer DEC/JNZ over LOOP on 386+
- unroll loops (MOV/ADD) rather than using REP STOS (not true on the 486): the advantage is that data need not be reorganized for string instructions. REP MOVS is still better, however.
- jump tables
- fixed point (16:16, 8:8, 12:4): the advantage of 16:16 or 8:8 is that the integer part comes for free
- alignment (2 bytes for 16 bits, 4 bytes for 32 bits): ALIGN 4 Buffer32 dd ? / ALIGN 2 Buffer16 dw ?
- avoid branching so that the prefetch queue is not emptied (true across the 80x86 family)
- minimize memory addressing, prefer using registers
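
The jump table item above is usually written in assembly, but the same idea can be sketched in C with an array of function pointers: the handler is selected by one indexed lookup instead of a chain of compare-and-branch instructions. Everything named here (`blend_fn`, `blend_table`, `blend_span`, the `MODE_*` constants) is hypothetical and only serves the illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t (*blend_fn)(uint8_t dst, uint8_t src);

static uint8_t blend_copy(uint8_t dst, uint8_t src) { (void)dst; return src; }

static uint8_t blend_add(uint8_t dst, uint8_t src)
{
    unsigned sum = (unsigned)dst + src;
    return (uint8_t)(sum > 255 ? 255 : sum);   /* saturate at 255 */
}

static uint8_t blend_sub(uint8_t dst, uint8_t src)
{
    return (uint8_t)(dst > src ? dst - src : 0); /* saturate at 0 */
}

static uint8_t blend_avg(uint8_t dst, uint8_t src)
{
    return (uint8_t)(((unsigned)dst + src) >> 1);
}

enum { MODE_COPY, MODE_ADD, MODE_SUB, MODE_AVG, MODE_COUNT };

/* The jump table: one entry per mode, indexed directly by the mode number. */
static const blend_fn blend_table[MODE_COUNT] = {
    blend_copy, blend_add, blend_sub, blend_avg
};

/* Blend n source pixels into dst. The table lookup happens once, outside
   the inner loop; unknown modes fall back to a plain copy. */
static void blend_span(uint8_t *dst, const uint8_t *src, size_t n, unsigned mode)
{
    blend_fn fn = blend_table[mode < MODE_COUNT ? mode : MODE_COPY];
    for (size_t i = 0; i < n; i++)
        dst[i] = fn(dst[i], src[i]);
}
```

Calling `blend_span(dst, src, n, MODE_ADD)` dispatches through the table once and then runs the same handler for the whole span.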