Giovanni's blog

Friday, May 15, 2026

Announcing Museum OS

It's best not to judge a book by its cover. Museum OS is a concept operating system that runs on a 128-bit kernel, because it might be useful one day*. It will feature other state-of-the-art technologies that are to be determined. It is also a timeless operating system, because it operates on general relativity. Therefore, it does not use network time but gravitational time dilation, that is, the difference in elapsed time between two events.

*For example, the address space limit of a 64-bit system is 2^64 bytes, or about 16 million terabytes (TB). The first person to find a need for 16 million TB and 1 byte will be the first person to get job offers and government grants to build a 128-bit memory address space. Consider qubits, which in quantum computing can process far more data than a conventional computer. Would such a system ever need a traditional von Neumann architecture (or any other architecture that accesses a memory system, even if it is not the most efficient way of solving a problem, e.g. because it might be the only way), or would the memory system never require more than 2^64 bytes?
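As a quick sanity check on the "16 million TB" figure, here is a minimal sketch that assumes a flat, byte-addressable 64-bit space. The variable names are mine, for illustration only.

```python
# Size of a flat 64-bit byte-addressable space, in binary and decimal terabytes.
ADDRESS_BITS = 64

total_bytes = 2 ** ADDRESS_BITS   # every distinct 64-bit address names one byte
tib = total_bytes // 2 ** 40      # binary terabytes (TiB, 2^40 bytes each)
tb = total_bytes / 10 ** 12       # decimal terabytes (TB, 10^12 bytes each)

print(f"{total_bytes:,} bytes")
print(f"{tib:,} TiB")
print(f"{tb:,.0f} TB")
```

In binary units the limit is about 16.8 million TiB, which is where the "16 million" comes from. In decimal terabytes it is closer to 18.4 million.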

Wednesday, April 22, 2026

You can't fool a microbiologist, Linux

Imagine if mainline Linux were a species. Everyone who used it had to upgrade their systems. If they were a prokaryote, they'd need to upgrade to a eukaryote. This is what a version history would look like:

1991 We added a nucleus (MMU). Since 0.01.

1995 64-bit address space support added on DEC Alpha (Linux 1.2). (Larger genome supported!)

1998 uclinux developed. Loss of function (MMU) + Gain of function (performance/lower latency)

2002 nommu (uclinux) integrated (Mitochondria!) into 2.5.46

2024 PREEMPT_RT (real-time Linux) merged into mainline: an improvement, but RTOSes are also offered on nommu systems.

2028 nommu scheduled for removal. (Prokaryotes can be prokaryotes again.)

203x 32-bit support removal. Sometime. (Divergent evolution).

More may be added as ideas come to me.

Halobacteria are phototrophic archaea that thrive on salt and light. If photoheterotrophic microbes can use light as an energy source, so can 32-bit and nommu Linux machines.

Sunday, April 19, 2026

Ubuntu 7.04 Feisty Fawn in 2007 was my first Linux OS. In 2026, I'm using antiX. Here's why.

Update (4-23-26): I've switched to Bodhi Linux on my Toshiba laptop.

Of Mice and Marsupials

Earlier this week, Linux 7.1 dropped 486 support. The move had been discussed for a long time, but I can't help but think of this diagram I drew a few months ago (I updated it slightly for Linux 7.1):

32-bit is effectively another species, with little code compatibility (according to others, apparently). It's the new 8-bit. Since my background was in microbiology, I tend to see code as a lot more mutually intelligible. But I can see why they want to drop 32-bit instructions. They're like sandbags on a hot air balloon. They need a little extra lift.

As chipmakers are getting back into memory production, EUV will be an opportunity to develop chips that embed memory in ways that are far more economical. Currently, DRAM is typically integrated on a different chiplet.

As I wrote in the Phoronix forum,

"My observation is that it would be far more economical for the foundry to integrate both DRAM and CPU on the same chip, but the leading edge's reservations are typically held by the highest bidders, which Nvidia recently overtook Apple in TSMC slots https://www.cnbc.com/2026/04/08/tsmc...ing-intel.html

"Nvidia has reserved the majority of TSMC’s leading CoWoS technology, and capacity is so heavily booked that TSMC has reportedly outsourced some steps to third-party companies that specialize in simpler parts of the process, such as ASE and Amkor."

"NVIDIA Alone Has TSMC’s Advanced Packaging Lines Booked for Several Years Ahead, Leaving Little Room for Competitors"

That really leaves only a few foundries that have already received the 1.4nm EUV machines, such as Intel, Samsung, and Rapidus to run these experiments. It doesn't seem like Intel is interested in testing my very expensive experimental idea, but they own all the IP to make it happen (Quark, ARM architectural license, eDRAM, HighNA EUV).

Curiously, Nvidia also owns a 386SX/IP license, called the M6117C https://www.nvidia.com/en-us/drivers/uli-m6117c/, but it's unclear if they'd be willing to manufacture one at 1.4nm with 16MB HBM3e RAM. They also have an ARM license, so really it's not an issue whether they pick one or the other. I like to think of my idea as an expensive CERN experiment- it's like particle physics, and very little immediate practicality."

1993- The original Pentium (P5) was on 800nm; the P54C followed in 1994 on 600nm

2011- Pentium Claremont was on 32nm. Intel was testing out its newest and most expensive lithography machine to produce a 17-year-old chip design

2026- Intel has begun piloting a 1.4nm EUV machine from ASML, but no information is available on whether they ever repeated the 2011-style test at 1.4nm (or really, on any other node).

If you insist on 64-bit, then start with the Pentium IV Northwood Prescott

The first 64-bit Pentium was the Prescott 2M in 2005:

"The Prescott Pentium 4 contains 125 million transistors and has a die area of 112 mm². It was fabricated in a 90 nm process with seven levels of copper interconnect. The process has features such as strained silicon transistors and low-κ carbon-doped silicon oxide (CDO) dielectric, which is also known as organosilicate glass (OSG). The Prescott was first fabricated at the D1C development fab and was later moved to F11X production fab."

"The only advantage the 3.73 GHz Pentium 4 Extreme Edition had over the 3.46 GHz Pentium 4 Extreme Edition was the ability to run 64-bit applications since all Gallatin-based Pentium 4 Extreme Edition processors lacked the Intel 64 (then known as EM64T) instruction set."

Many Linux distributions, such as Red Hat Enterprise Linux 10, no longer support the x86-64-v1 and x86-64-v2 microarchitecture levels:

"Red Hat will upgrade the instruction set architecture (ISA) baseline to the x86-64-v3 microarchitecture level in RHEL 10 and x86-64-v1 and x86-64-v2 x86-64 microarchitecture level of CPUs will be marked deprecated in RHEL 8 and RHEL 9 and unsupported in RHEL 10."

https://developers.redhat.com/articles/2024/01/02/exploring-x86-64-v3-red-hat-enterprise-linux-10.

"Compatibility impact

The x86-64-v3 level has been implemented first in Intel’s Haswell CPU generation (2013). AMD implemented x86-64-v3 support with the Excavator microarchitecture (2015). Intel’s Atom product line added x86-64-v3 support with the Gracemont microarchitecture (2021), but Intel has continued to release Atom CPUs without AVX support after that (Parker Ridge in 2022, and an Elkhart Lake variant in 2023)."
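To see which side of that line a given machine falls on, here is a rough sketch that reads the flags Linux reports in /proc/cpuinfo. The flag spellings are my assumption of how the kernel prints them ("abm" standing in for LZCNT and "xsave" for OSXSAVE), the function name is made up for illustration, and it only checks the features that v3 adds on top of v2.

```python
# Flags that distinguish x86-64-v3 from v2, spelled the way Linux is assumed
# to print them in /proc/cpuinfo ("abm" for LZCNT, "xsave" for OSXSAVE).
V3_FLAGS = {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}


def missing_v3_flags(cpuinfo_text: str) -> set[str]:
    """Return the v3 flags absent from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            present = set(line.split(":", 1)[1].split())
            return V3_FLAGS - present
    return set(V3_FLAGS)  # no flags line found, so nothing can be confirmed


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            missing = missing_v3_flags(f.read())
        print("x86-64-v3 capable" if not missing else f"missing: {sorted(missing)}")
    except OSError:
        print("/proc/cpuinfo not available on this system")
```

A Prescott-era Pentium 4 would report every one of these as missing, which is the whole point of the comparison.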

What this has to do with mainline Linux maintenance is the ever-increasing system requirements that limit which computers can run a modern OS. Sure, the 486 got nearly 37 years of support. But newer systems might not even get 5 years of support (just look at Windows 11 support for first generation Ryzen, not that anyone would want to use Windows 10 on Zen 1). And for a lot of users, a chip with an advanced instruction set is only needed if there is some obscure security benefit (which, admittedly, is always easy to claim, but not always needed for less private activities).

Mark Weiser wrote in his 1991 Scientific American essay on ubiquitous computing, "The Computer for the 21st Century,"

"Jim Morris of Carnegie-Mellon University has proposed an appealing general method for approaching these issues: build computer systems to have the same privacy safeguards as the real world, but no more, so that ethical conventions will apply regardless of setting. In the physical world, for example, burglars can break through a locked door, but they leave evidence in doing so. Computers built according to Morris's rule would not attempt to be utterly proof against crackers, but they would be impossible to enter without leaving the digital equivalent of fingerprints."

Perhaps so much effort has gone into preventing security breaches that such an approach might appear naive, impractical, or simply counterintuitive. There are systems that do employ some form of eidetic memory. However, I think such a system might run into caching issues and run out of storage fairly quickly. I also do not think it's a foolproof way to catch an intruder, although Bitcoin is able to track the visible part of its transactions (the ledger), but not always the recipients or shell companies.

Thus simple systems designed today should be fairly lightweight on a smaller node. A chip designed to rely on public wifi, mesh networking, or citizens band (CB) radio is obviously less secure, but less secure is not synonymous with lightweight. An experimental processor wouldn't be worth the time of an expensive foundry without a cryptoprocessor, or a way to encrypt the home folder (e.g. ZFS); that I understand. It was not so long ago that home telephone service also depended on party lines. Thus, while requesting a simple, unencrypted or lightly secured chip technology from the 1990s might sound impractical in today's age, the amount of individuality a personal computer offers has been far greater ever since the Xerox Alto was designed for a single person, compared to the time-sharing systems of the preceding decade.

When I drew this system-on-a-chip diagram in 2021, I didn't really put much thought into the video RAM, but one thing is still constant: 6uA/MHz is a solar-powerable CPU:

And this chip idea should put things into more context:

Sure, one could integrate an i915 graphics chipset with a Pentium IV Extreme Edition, but a certain food critic might immediately pooh pooh anything slightly stale because it doesn't have AVX-512 or beyond.

As I wrote in my 2025 paper, a Pentium IV might utilize 55 million transistors (the non 2M version):

Compared to the amount of transistors 16MB, 64MB, or 512MB of DRAM might utilize, (even without capacitors), 55 million (and 125m on the EE) is a drop in the bucket.
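To put rough numbers on that comparison, here is a back-of-the-envelope sketch. It assumes one access transistor per bit (a 1T1C DRAM cell) and ignores sense amplifiers, decoders, and other peripheral logic, so real counts would be somewhat higher. The function name is mine.

```python
# Rough transistor count for DRAM, assuming one access transistor per bit (1T1C)
# and ignoring sense amplifiers, decoders, and other peripheral logic.
PENTIUM4_TRANSISTORS = 55_000_000  # figure quoted in the post


def dram_transistors(megabytes: int) -> int:
    bits = megabytes * 1024 * 1024 * 8
    return bits  # one transistor per bit under the 1T1C assumption


for mb in (16, 64, 512):
    count = dram_transistors(mb)
    print(f"{mb:>3} MB DRAM: {count:,} transistors "
          f"({count / PENTIUM4_TRANSISTORS:.1f}x the CPU)")
```

Under that assumption, even 16MB of DRAM already needs more than twice as many transistors as the 55 million CPU, and 512MB needs over four billion.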

Whether or not 2D DRAM will improve yields and lower costs enough to make this a more trivial question remains to be seen, but for the time being the most common customers are not people like you and I:

If these product ideas are ever found valuable, they are never likely to admit it to me. But I am open to consulting work, and I prefer full-time imagineering, an occupation first coined by Alcoa in the 1940s.

Edit: I was going to write a supplement to this on MICE (money, ideology, compromise/coercion, and ego), but I wasn't sure if it had much relevance to this post. I suppose in a way, it could prevent or delay certain types of development, but it's not immediately clear how technology projects and ideas get stalled. Speculating on that is only for the most pessimistic and paranoid, which I am not (although if I really wanted to search deeper for answers, I'd probably find more conclusive evidence, which no one wants to be found with).

Side note:

In elementary school, I had a teacher who used to ask questions that would get few to no responses, kind of like "Bueller, Bueller?"

She'd mention that we were the "Peanut gallery." Back then, I imagined it was a silent audience like the Planters Mr. Peanut, who looked like people but were actually mute (like mannequins in Home Alone). But recently, I learned it's the opposite:

"Peanut gallery" is slang for a group of people offering unwanted, uninformed, or heckling criticism. It commonly refers to spectators, social media commenters, or observers who provide disruptive or sarcastic feedback. The term implies the commentary is insignificant, petty, or from an audience with lesser understanding.

Usage: Often used in the phrase "no remarks from the peanut gallery" to silence noisy critics or irrelevant advice.

The funny thing is, the phrase can be flipped upside down, by adding a question mark:

"no remarks from the peanut gallery?" Maybe that's how she phrased it.

In this case, I think the lack of comments, as on many blogs, makes this a journal with one reader. So if you do enjoy anything I wrote here, or have any comments, I would be happy to hear from you. Your discarded peanut shells are welcome too. It's good to break out of your shell anyways.

Thursday, April 2, 2026

20 years ago, the ARM996HS was released. Where is it now?

At a time when mobile devices were becoming even more mainstream, lowering power consumption became a key objective. However, as the Android and iPhone market soon revealed after 2007, power consumption took a distant third in priority when it came to having a competitive product. Performance remained the number one improvement, allowing an ever increasing amount of desktop-level apps on a smartphone, as the Symbian era came to a close. A couple years ago, while adding to my research collection of processors that were unique or rare, I stumbled upon the ARM996HS.

The ARM996HS: What was it?

It is a "clockless" processor in that it is not synchronous, so its 100MHz rating may refer to an average speed. Its benefit was offering one third of the power consumption of a similar processor, on just 90,000 gates. Other processors had been designed to accomplish this, albeit with far more circuitry: the 386SLC, which had around 800,000 transistors (compared to the original 386's 275,000, although comparing a 386 to an ARM was never really a fair comparison), and the AMD LX700, which was also a static 486 Geode processor produced by National Semiconductor (at the time). (Edit: The 386SLC was not actually static; the 80386EX was. It is a static processor used in exotic environments like space. The IBM 386SLC did introduce power management features & 8KB of cache, however.) I am curious whether TI has a 486 "license," considering its patent expired more than 17 years ago.

It had appeared in the industry news at the time, but quietly disappeared. Reasons given range from performance to licensing (multiple companies helped develop it), to soft errors/bugs, to economics: smartphones might not have used it, and it wasn't worth the R&D costs if the market was moving towards higher performance chips.

"The ARM996HS current peaks are reduced by a factor 2.5 to 40 percent of the ARM968E-S, the data showed. The ARM996HS has a gate count of 89,000 compared with the 88,000 of the ARM968E-S. Both are based on the ARM9 processor and the ARMv5Te instruction set.

The ARM996HS performance varies between about 50 percent of a 100-MHz clocked ARM968E-S at 1.0 volts and a temperature of 125 degrees centigrade and about 75 percent of the performance at nominal voltage of 1.2 volts and 25 degrees centigrade."

It's also possible it became integrated into smartphones as a co-processor (e.g. Intel Management Engine), but not something the user would have access to.

I was able to find a few presentations that had been distributed at the time.

Link 1

Link 2

Link 3

"Today, hundreds of millions of asynchronous circuits are produced every year,
and many of us may use it on a daily basis without being aware of it. As an example,
asynchronous circuits designed using Handshake Solutions' Timeless Design
Environment (TiDE) may be found in the vast majority of electronic (biometric) passports,
in in-vehicle networks like CAN and LIN, in MEMS-based sensors such as for measuring
tire pressure, in access-control systems, and in Near Field Communication devices such
as Nokia's 6131 NFC phone."

A couple papers were published on it:

ARM996HS: The first licensable, clockless 32-bit processor core
April 2007

Richard York

IEEE Micro 27(2):58-68, DOI: 10.1109/MM.2007.28

Architectural Design Issues in a Clockless 32-Bit Processor Using an Asynchronous HDL

Myeong-Hoon Oh, Young Woo Kim, Sanghoon Kwak, Chi-Hoon Shin, Sung-Nam Kim
First published: 01 June 2013
https://doi.org/10.4218/etrij.13.0112.0598

https://onlinelibrary.wiley.com/doi/epdf/10.4218/etrij.13.0112.0598

Was it a practical processor? Maybe it just hadn't found its niche for the consumer-facing market.

Edit 2: In 2025, a Rochester Institute of Technology student designed a 4-stage asynchronous RISC processor for his master's thesis.

Thursday, January 15, 2026

The NM10 Chipset, Backporting to the Pentium Era, and Makimoto's Wave

Today's post (I actually began writing this post on a Wednesday night at 9:51pm) is going to branch out from central processing units and cover two highly interrelated integration steps from the 2010 era, just before the release of Intel Graphics Technology. A couple years ago, in March 2024, I did a multi-distro test on my Sylvania netbook from 2011, which has a 1.66GHz Intel Atom N450, 1GB of RAM, the NM10 chipset, and an 8GB SATA-based SSD card. This era was a transition between the Northbridge-based video graphics controllers (after the AGP-based i740 was integrated into the Intel 810 in 1999) and Intel HD Graphics, which has been a series since 2010 (along with other products such as the Arc). The most common that I found in netbooks and entry-level PCs at the time were the GMA 900 (on the 915 chipset), the GMA 950, and the GMA 3150, the latter two of which had DirectX 9 support. I had briefly owned the Eee PC 701 4G for a few years shortly after launch, when it cost $200 for the 4GB model, and it included a Celeron M at 630MHz with the Intel GMA 900.

https://en.wikipedia.org/wiki/Intel_Graphics_Technology#History:

Before the introduction of Intel HD Graphics, Intel integrated graphics were built into the motherboard's northbridge, as part of the Intel's Hub Architecture. They were known as Intel Extreme Graphics and Intel GMA. As part of the Platform Controller Hub (PCH) design, the northbridge was eliminated and graphics processing was moved to the same die as the central processing unit (CPU).
The previous Intel integrated graphics solution, Intel GMA, had a reputation of lacking performance and features, and therefore was not considered to be a good choice for more demanding graphics applications, such as 3D gaming.

https://en.wikipedia.org/wiki/Intel_GMA:

The GMA line of GPUs replaces the earlier Intel Extreme Graphics, and the Intel740 line, the latter of which was a discrete unit in the form of AGP and PCI cards with technology that evolved from companies Real3D and Lockheed Martin. Later, Intel integrated the i740 core into the Intel 810 northbridge.
The original architecture of GMA systems supported only a few functions in hardware, and relied on the host CPU to handle at least some of the graphics pipeline, further decreasing performance. However, with the introduction of Intel's 4th generation of GMA architecture (GMA X3000) in 2006, many of the functions are now built into the hardware, providing an increase in performance. The 4th generation of GMA combines fixed function capabilities with a threaded array of programmable executions units, providing advantages to both graphics and video performance. Many of the advantages of the new GMA architecture come from the ability to flexibly switch as needed between executing graphics-related tasks or video-related tasks. While GMA performance has been widely criticized in the past as being too slow for computer games, sometimes being derogatorily nicknamed Intel 'GMD' (Graphics Media Decelerator) and being essentially referred to as the world's first "graphics decelerator" since the low-performing S3 ViRGE, the latest GMA generation should ease many of those concerns for the casual gamer.

A few days ago, I was reading a story in the latest industry news on HBM3e, comparing the density of this DRAM with SRAM on TSMC's N3E node:

With HBM3e, density is on the order of ~200 Mb of DRAM per mm², while at TSMC’s N3E node, 1 mm² of silicon can hold only ~38 Mb of SRAM.

While I had been thinking of ways to integrate 16MB system memory and a Pentium or ARM9 processor into a single die, preferably within 1mm^2 to consider the economic benefits of lowering manufacturing costs, one essential component I left out was the video graphics. If I had to be realistic in how a chipset would be designed, in retrospect, an integrated video chipset makes a lot of sense. And what would that entail? Before I get into that, I'll dive into the ISA card era, where I had built and replaced systems with video cards ranging from 1-2MB of RAM on the perpendicular slots for the 486-based Socket 3 and the Pentium-based Socket 5 (which I was more familiar with).

The ISA cards at the time were fairly large and wide, with exposed caps and microcontrollers (although not heavy like the current discrete flagship or even midrange GPUs) https://youtu.be/D3lcByH_CDI?t=608

One could upgrade the video either via PCI or ISA, but since there were only a couple PCI slots (not PCI-e), the unused ISA was better used for graphics than nothing, and the PCI slots were best reserved for modems, ethernet, and ATSC tuners. Although I might have had an ISA-based modem.

I also worked on a couple other machines at the time- one was an Opteron Pentium II or III, and another was a 486, so it might have been the latter that I installed an ISA card to upgrade its poor or non-existent video memory.

In the Packard Bell 3540's case, the OEM integrated a 1MB Cirrus Logic GD5430 on board, meaning on the motherboard itself, so it didn't require a discrete card to be included in the purchase (Best Buy, 1996). Obviously, this was before the integrated GMA era and HD/Iris series, so this kind of custom integration (non-discrete, but not on the same die) was not uncommon, since it was economical for the manufacturer and possibly even for Intel and AMD, who were not yet major graphics suppliers (although the latter would acquire ATI).

My first PCI-express based desktop PC was a Presler-based Dell XPS400 2.8GHz, with an ATI X300, a very lightweight but discrete card, in 2006. I later upgraded this to an Nvidia 7600GS, briefly an 8800GTS w/320MB VRAM, and then settled on the cheaper 512MB 9600GT with a smaller bus width.

In the netbook sector, it would be a couple years before I would purchase the GMA 900-based Eee PC 701 4G. I remember it being slow, and in retrospect, 4 gigabytes of storage was incredibly small, since it was only 4x larger than my 1GB drive in 1996, and this was 2008 or so. But it wasn't slow because of the processor so much as the software, and at the time, there weren't many Linux distros that had removed much of the cruft that had already started to accumulate. And while YouTube was accessible at the time, video playback was far smoother when downloading videos and playing them back with the only available decoders at the time (even the Windows 95 installation CD included some video clips, and they were designed for a 512KB-1MB video card). All processors require a right-sized codec to play, so there was no point in trying to use the CPU to decode MPEG-4/H.264 or heavily compressed VP8 (or especially H.265 or AV1). What it had was an MPEG-2 decoder and WMV playback, which was better than what preceded it (MJPEG or something, and AVI if you had the space).

And it worked well- it could play HD video, though typically the netbooks' resolutions were limited to 1024x600 or 720p, therefore there was no point in trying to run the Celeron to decode a heavier 1080 file.

The GMA 3150 on the Sylvania N450 netbook continued this trend on a 45nm process, but crucially, took a big step by integrating the formerly Northbridge-based video controller not just into the Platform Controller Hub, but into the same CPU die as the Atom N450, paired with the NM10 Express chipset.

https://en.wikipedia.org/wiki/Platform_Controller_Hub

From the Press Sheet:

"The Intel® Atom™ processor is based on Intel’s groundbreaking low-power Intel Atom microarchitecture and manufactured on Intel’s 45nm High-k Metal Gate technology."
Low power chipset
o Intel® NM10 Express Chipset
o Integration and 45nm manufacturing enables significantly smaller overall package size, improved performance, and lower power.
o Integrated Graphics and Memory Controller: Integrated Intel® Graphics Media Accelerator 3150 combined with the integrated memory controller provides enhanced performance and system responsiveness.
o Small Form Factor CPU Package: The new lead-free, halogen-free Micro-Flip Chip package is 70% smaller (22mm x 22mm) than a desktop CPU (37.5mm x 37.5mm), saving system board real estate in a much thinner and smaller industrial design, enabling small entry-level desktop form factors.

NM10 Datasheet (p.31)

One of the advantages of integrated chipsets is shorter interconnects, which means lower energy losses from signals travelling across the chipset's former bridges and busses. Even in the on-board Cirrus Logic GD5430 case, data in the 1MB of video RAM still needs to travel across the board to the socket where the CPU is housed, even if it is not making a perpendicular turn from an ISA card's slot to the motherboard.

So it was quite an innovation to place the GPU as close as possible to the CPU. In order to compete with laptops that offered discrete graphics, Intel included as much as 384MB of shared RAM to be able to support games that had higher VRAM requirements.

And so while many leading chipmakers like AMD followed with their own integrated solutions (I am typing on a machine with AMD Raphael graphics), the same technological innovation that helped improve performance, energy efficiency, and thermal management could also be transplanted to other sectors like the embedded microcontroller market:

The PIC32MZ with External DRAM (DA) Starter Kit. (📷: Microchip)

https://www.hackster.io/news/the-first-ever-microcontroller-with-an-integrated-gpu-e83e29a7a952

AI Overview (Prompt: microcontrollers with integrated video)

Yes, many modern microcontrollers (MCUs) integrate video capabilities, often featuring dedicated graphics controllers, GPUs, and sometimes even built-in memory (DRAM) to handle display tasks like driving LCDs and rendering GUIs, with popular examples including Microchip's PIC32MZ DA, STMicroelectronics' STM32 series, and some powerful Cortex-A-based System-on-Chips (SoCs) like TI's Sitara, blurring the line towards application processors for rich graphical interfaces

In fact, Intel also integrates eDRAM, something I had wondered about but never really confirmed:

"Intel also offers higher-performance variants under the Iris, Iris Pro, and Iris Plus brands, introduced beginning in 2013. These versions include features such as increased execution units and, in some models, embedded memory (eDRAM)."
The 128 MB of eDRAM in the Iris Pro GT3e is in the same package as the CPU, but on a separate die manufactured in a different process. Intel refers to this as a Level 4 cache, available to both CPU and GPU, naming it Crystalwell. The Linux drm/i915 driver is aware and capable of using this eDRAM since kernel version 3.12.

I didn't find any mention of eDRAM in the Xe Graphics series, but Kaby Lake/Amber Lake/Coffee Lake showed as much as 128MB of eDRAM as recently as 2017.

Knowing these three things, I wonder why more embedded system makers haven't branched out into designing educational netbooks, reusing the integrated graphics for basic systems. Likewise, why hasn't Intel integrated 1MB of eDRAM for dedicated video, and made it similar to a Cirrus Logic or Tseng Labs ET4000? I am not sure what the power consumption of HBM3e is, but its density is roughly 24MB/mm^2, far denser than SRAM on TSMC’s N3E node at around 3nm. Whether Intel's 18A or 14A will attempt to integrate eDRAM is another question, as the Iris GT3e may not have used the same process node for its eDRAM, which was co-packaged.

It would be interesting to see that 24MB allocated for embedded and netbook designs, so that 1mm^2 can fit CPU, memory, and video. Because surely more than 1-2MB VRAM isn't needed for a 16MB system that can run Windows 95 and 98, the latter of which had USB support...
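For reference, here is a small sketch of the unit conversion behind those density figures. It takes the quoted ~200 Mb/mm^2 (HBM3e DRAM) and ~38 Mb/mm^2 (N3E SRAM) at face value and simply divides by 8 to get megabytes. The function names are mine, and the result ignores everything a real layout would add (controllers, I/O, redundancy).

```python
# Convert the quoted densities from megabits to megabytes per mm^2,
# then estimate the silicon area a given memory size would need.
HBM3E_MBIT_PER_MM2 = 200   # "~200 Mb of DRAM per mm^2" (quoted figure)
SRAM_MBIT_PER_MM2 = 38     # "~38 Mb of SRAM" at N3E (quoted figure)


def mbytes_per_mm2(mbit_per_mm2: float) -> float:
    return mbit_per_mm2 / 8


def area_mm2(megabytes: float, mbit_per_mm2: float) -> float:
    return megabytes / mbytes_per_mm2(mbit_per_mm2)


print(mbytes_per_mm2(HBM3E_MBIT_PER_MM2))            # DRAM MB per mm^2
print(mbytes_per_mm2(SRAM_MBIT_PER_MM2))             # SRAM MB per mm^2
print(round(area_mm2(16, HBM3E_MBIT_PER_MM2), 2))    # mm^2 for 16MB of DRAM
print(round(area_mm2(16, SRAM_MBIT_PER_MM2), 2))     # mm^2 for 16MB of SRAM
```

Straight division gives 25MB/mm^2 for the DRAM, close to the 24MB figure I have been using, and under 5MB/mm^2 for SRAM. So 16MB of system memory fits inside 1mm^2 only at DRAM-like densities.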

I understand that the video accelerators were designed for the outputs of their era, in that they may have been optimized for a VGA cable, rather than a digital out like DSI or MiP. In fact, I don't know too much about how the 15-pin analog signals were designed for TVs and CRTs, rather than lower power LCD screens, other than that their interfaces are somewhat obsolete for the use-case I am seeking to hybridize. Something old and something new.

I have previously estimated the transistor count of microprocessors for certain die sizes, but I have spent less time examining the kerf width, which can be an important factor in determining the number of wafers needed to manufacture a given number of chips, along with the foundry's willingness to run a large batch and make many cuts. Plus, they'd probably charge more for a smaller batch than for one by Apple, who paid as little as $130 per 100mm^2 (so that is my benchmark comparison). From my notes:

"1mm^2 chip might allow, with 10um-99um kerf width, up to 81 chips in a 100mm^2 wafer die space.

https://semiengineering.com/laser-ablation-dicing-revolutionizes-ultra-thin-wafer-saws-beyond-the-capability-of-blade-dicing/

10um = 0.01 millimeters. 100um = 0.10 millimeters.

More than 100um kerfs would result in fewer than eighty-one 1mm^2 chips in 100mm^2." Potentially only 64.
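Here is a minimal sketch of the arithmetic in those notes. It assumes a 10mm x 10mm region and that every die takes up its own edge plus one kerf (a simple pitch model, not any foundry's actual dicing rules). The function name and defaults are mine.

```python
# Dies per square region when each die needs its own width plus one kerf.
# Integer micrometres avoid floating-point rounding at the boundaries.
def dies_in_square(region_mm: int = 10, die_mm: int = 1, kerf_um: int = 10) -> int:
    region_um = region_mm * 1000
    pitch_um = die_mm * 1000 + kerf_um   # die edge plus one saw street
    per_side = region_um // pitch_um
    return per_side * per_side


for kerf in (10, 99, 111, 112, 250):
    print(f"{kerf:>3} um kerf -> {dies_in_square(kerf_um=kerf)} dies")
```

Under this model the count stays at 81 (9 x 9) for kerfs from 10um up to about 111um, and drops to 64 (8 x 8) just past that, so the threshold sits a little above 100um.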

On the plus side, each wafer could produce 58,880 Pentiums with 4MB SRAM & 512KB-1MB of Video, or 20MB of System RAM using HBM3e

A single 300mm wafer could produce 58,880 Pentiums with 4MB of embedded SRAM.

From Poe:

The number of microchips (dies) obtainable from a 300 mm wafer depends on the die size, the usable wafer area, edge losses, and yield. Below is a concise method plus worked examples and typical adjustments.

Method (stepwise)

Wafer area = π × (radius)^2 = π × (150 mm)^2 ≈ 70,685 mm².
Theoretical maximum dies = floor(wafer area / die area). This treats dies as perfectly packable rectangles/circles and ignores edge truncation and scribe streets.
Account for edge losses: multiply theoretical maximum by a packing/utilization factor. For rectangular dies a typical utilization is 78–92% (smaller dies pack more efficiently); for large dies utilization can drop to 50–70%.
Account for yield: multiply by die yield (probability a die is functional after manufacturing). Die yield depends on defect density and die area; common simple model is Y = exp(–D × A) where D = defect density per mm², A = die area mm². Real yield also includes process maturity and redundancy.
Quick examples (using typical utilization 85% before yield):

Very small die (1 mm²)
Theoretical = 70,685 / 1 ≈ 70,685 dies
After utilization ≈ 70,685 × 0.85 ≈ 60,082 dies
With yield (D = 0.02 defects/mm²): Y ≈ exp(–0.02×1)=0.980 → usable ≈ 60,082 × 0.98 ≈ 58,880
Small die (10 mm²)
Theoretical = 70,685 / 10 ≈ 7,068 dies
After utilization ≈ 7,068 × 0.85 ≈ 6,008 dies
With yield (D = 0.02): Y ≈ exp(–0.02×10)=0.818 → usable ≈ 6,008 × 0.818 ≈ 4,914
Medium die (100 mm²)
Theoretical = 70,685 / 100 ≈ 706 dies
After utilization ≈ 706 × 0.85 ≈ 600 dies
With yield (D = 0.02): Y ≈ exp(–0.02×100)=0.135 → usable ≈ 600 × 0.135 ≈ 81
Large die (500 mm²)
Theoretical = 70,685 / 500 ≈ 141 dies
After utilization ≈ 141 × 0.80 ≈ 113 dies (lower utilization for large)
With yield (D = 0.02): Y ≈ exp(–0.02×500)=4.5e-5 → usable ≈ 0 (practically very few good dies; semiconductor fabs use redundancy/reticle strategies)

Alternate quick rule-of-thumb (no yield):

Very small dies (≤10 mm²): ~6k–60k usable per 300 mm wafer.
Medium dies (~50–150 mm²): ~1.2k–400 usable.
Large dies (>300 mm²): a few hundred or fewer; yield often dominant.
How to get a precise number for a specific chip type

Use the exact die outline area (including scribe streets per die if reticle tiling requires it).
Calculate theoretical dies = floor(70,685 / die area).
Use actual die-per-wafer calculators (online tools account for wafer edge trimming and rectangular packing).
Apply measured utilization from your foundry (or use 75–90% depending on die shape).
Apply a yield model using measured defect density D for that process node.

Summary

Start with 70,685 mm² wafer area.
Theoretical dies = wafer area ÷ die area.
Real dies ≈ theoretical × packing/utilization factor (≈0.75–0.9) and then × yield (exp(–D×A) or measured yield).
Example concrete figures above show how die size and defect density rapidly reduce usable die counts.
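
The steps summarized above can be sketched in a few lines of Python. This is only a rough illustration, assuming a 300 mm wafer, the simple Y = exp(–D × A) yield model, and the same example values (85% or 80% utilization, D = 0.02 defects/mm²); the function names are made up for this post and do not come from any foundry tool.

```python
import math

WAFER_DIAMETER_MM = 300.0
# Area of a full 300 mm circle, about 70,685 mm^2
WAFER_AREA_MM2 = math.pi * (WAFER_DIAMETER_MM / 2) ** 2


def theoretical_dies(die_area_mm2: float) -> int:
    """Upper bound: wafer area / die area, ignoring edges and scribe streets."""
    return math.floor(WAFER_AREA_MM2 / die_area_mm2)


def die_yield(defect_density_per_mm2: float, die_area_mm2: float) -> float:
    """Simple Poisson yield model Y = exp(-D * A)."""
    return math.exp(-defect_density_per_mm2 * die_area_mm2)


def usable_dies(die_area_mm2: float,
                utilization: float = 0.85,
                defect_density_per_mm2: float = 0.02) -> float:
    """Theoretical count x utilization x yield, left unrounded."""
    return (theoretical_dies(die_area_mm2)
            * utilization
            * die_yield(defect_density_per_mm2, die_area_mm2))


if __name__ == "__main__":
    # (die area in mm^2, assumed utilization) for the four examples
    for area, util in [(1, 0.85), (10, 0.85), (100, 0.85), (500, 0.80)]:
        print(f"{area:>4} mm^2: theoretical={theoretical_dies(area):>6}, "
              f"yield={die_yield(0.02, area):.3g}, "
              f"usable={usable_dies(area, util):.1f}")
```

The printed values can differ by a few dies from the hand-worked figures, because the script does not round the yield to three digits before multiplying.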

The single-core Atom N450 had 123,000,000 transistors, and the dual-core D510 had 176,000,000 (the D410 is a single-core part).

The 32-bit Atom (N270) had around 47,000,000 transistors. What explains this dramatic jump in transistors from the 3.3 million Pentium (and the shortly thereafter Lakemont at 6.6 million)? Cache. But that doesn't seem to be the only story.

Processor	Transistor count	Year	Designer	Process	Area	Transistor density (tr./mm²)
Atom (32-bit, large cache)	47,000,000	2008	Intel	45 nm	24 mm²	1,958,000
SPARC64 VII (64-bit, SIMD, large caches)	600,000,000	2008^[83]	Fujitsu	65 nm	445 mm²	1,348,000
Six-core Xeon 7400 (64-bit, SIMD, large caches)	1,900,000,000	2008	Intel	45 nm	503 mm²	3,777,000
Six-core Opteron 2400 (64-bit, SIMD, large caches)	904,000,000	2009	AMD	45 nm	346 mm²	2,613,000
SPARC64 VIIIfx (64-bit, SIMD, large caches)	760,000,000^[84]	2009	Fujitsu	45 nm	513 mm²	1,481,000
Atom (Pineview) 64-bit, 1-core, 512 kB L2 cache	123,000,000^[85]	2010	Intel	45 nm	66 mm²	1,864,000
Atom (Pineview) 64-bit, 2-core, 1 MB L2 cache	176,000,000^[86]	2010	Intel	45 nm	87 mm²	2,023,000

Perhaps the N450's 123,000,000 includes the GMA3150, as the difference between the single-core N450 and the dual-core D510 is 53 million. Therefore the 512KB cache is only part of the remaining 70,000,000 transistors.
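
As a quick sanity check of the arithmetic, here is a small Python sketch using only the transistor counts and die areas from the table rows above. The variable names are my own, and the split into "second core" and "remainder" is just the subtraction described in the text, not an official breakdown; per the table, the 53 million difference also covers the extra 512 kB of L2 on the dual-core part.

```python
# (name, transistors, die area in mm^2), copied from the table rows above
CHIPS = [
    ("Atom (32-bit)", 47_000_000, 24),
    ("Atom Pineview 1-core", 123_000_000, 66),
    ("Atom Pineview 2-core", 176_000_000, 87),
]


def density(transistors: int, area_mm2: float) -> float:
    """Transistors per square millimetre."""
    return transistors / area_mm2


# Difference between the dual-core and single-core Pineview parts
second_core = 176_000_000 - 123_000_000
# What is left of the single-core chip after subtracting that difference
remainder = 123_000_000 - second_core

if __name__ == "__main__":
    for name, transistors, area in CHIPS:
        print(f"{name}: {density(transistors, area):,.0f} transistors/mm^2")
    print(f"dual-core minus single-core: {second_core:,}")
    print(f"remainder of single-core chip: {remainder:,}")
```

Rounded to the nearest thousand, the computed densities match the last column of the table.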

Shrinking this design from 45 nm to a 3 nm node today might result in a 1000x energy improvement.

"On 2 March 2008, Intel announced a new single-core Atom Z5xx series processor (code-named Silverthorne), to be used in ultra-mobile PCs and mobile Internet devices (MIDs), which will supersede Stealey (A100 and A110). The processor has 47 million transistors on a 25 mm² die, allowing for extremely economical production at that time (~2500 chips on a single 300 mm diameter wafer)."

First generation power requirements

Although the Atom processor itself is relatively low-power for an x86 microprocessor, many chipsets commonly used with it dissipate significantly more power. For example, while the Atom N270 commonly used in netbooks through mid-2010 has a TDP rating of 2.5 W, an Intel Atom platform that uses the 945GSE Express chipset has a specified maximum TDP of 11.8 W, with the processor responsible for a relatively small portion of the total power dissipated.

An Atom Z500 processor's dual-thread performance is equivalent to its predecessor Stealey, but should outperform it on applications that can use simultaneous multithreading and SSE3.^[4] They run from 0.8 to 2.0 GHz and have a TDP rating between 0.65 and 2.4 W that can dip down to 0.01 W when idle.^[5] They feature 32 KB instruction L1 and 24 KB data L1 caches, 512 KB L2 cache and a 533 MT/s front-side bus. The processors are manufactured in 45 nm process.^[6]^[7] Poulsbo was used as System Controller Hub and the platform was called Menlow.

I don't recall any netbooks sold with the Z5xx series. They would have had excellent battery life, especially if paired with an NM10-like chipset, though that chipset only arrived later with the slightly more power-hungry Pineview:

Pineview microprocessor

On 21 December 2009, Intel announced the N450, D510 and D410 CPUs with integrated graphics.^[18] The new manufacturing process resulted in a 20% reduction in power consumption and a 60% smaller die size.^[19]^[20] The Intel GMA 3150, a 45 nm shrink of the GMA 3100 with no HD capabilities, is included as the on-die GPU. Netbooks using this new processor were released on 11 January 2010.^[19]^[21] The major new feature is longer battery life (10 or more hours for 6-cell systems).^[22]^[23]
This generation of the Atom was codenamed Pineview, which is used in the Pine Trail platform. Intel's Pine Trail-M platform utilizes an Atom processor (codenamed Pineview-M) and Platform Controller Hub (codenamed Tiger Point). The graphics and memory controller have moved into the processor, which is paired with the Tiger Point PCH. This creates a more power-efficient 2-chip platform rather than the 3-chip one used with previous-generation Atom chipsets.^[24]

So it may be apples to oranges when comparing a 123 million transistor integrated chipset to a Pentium with a 1MB IGP, but the Atom line aimed from the start to revive the principles of earlier, simpler designs:

"The Bonnell microarchitecture therefore represents a partial revival of the principles used in earlier Intel designs such as P5 and the i486, with the sole purpose of enhancing the performance per watt ratio. However, hyper-threading is implemented in an easy (i.e. low-power) way to employ the whole pipeline efficiently by avoiding the typical single thread dependencies.^[3]"

Since Intel's mobile phone products in the late 00's didn't pan out as planned, with the failed launch of Medfield, it's understandable that this market disappeared. But I feel it's not just historians who will examine this era for missed opportunities, and for room for a new approach, given what is known now.

I am not really sure an external memory controller would be needed on the most advanced nodes. While LPDDR2 packages can be purchased off the shelf, it would be far more interesting to see which company is first to attempt to integrate a fully desktop-like solution on an embedded-like chip.

If I Had a Trillion Dollars

First of all, $1m is not enough money to design a 22nm chip from scratch, let alone 1.8nm.

But even though a foundry today could produce 58,000 Pentium-class chips per wafer on 3nm, instead of 2,500 on 45nm, no company is rushing to manufacture them, even though it would only take 51,725 wafers to produce a global supply of 3 billion. Even Z500-like Atoms could fit in 1mm^2 with less eDRAM; it wouldn't be an impossible or unreasonable idea in the near future.
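
The wafer count is just a ceiling division, which can be checked with a short Python sketch. The helper name is made up for illustration, and the chips-per-wafer figures are the rough estimates quoted in this post, not foundry data.

```python
def wafers_needed(total_chips: int, chips_per_wafer: int) -> int:
    """Whole wafers required to reach total_chips (ceiling division)."""
    return -(-total_chips // chips_per_wafer)


if __name__ == "__main__":
    supply = 3_000_000_000
    print(wafers_needed(supply, 58_000))  # 3nm estimate: 51725 wafers
    print(wafers_needed(supply, 2_500))   # 45nm estimate: 1200000 wafers
```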

Second of all, $1b might not be enough money to design a 1.8 nm chip.

Waterworld was $75 million over budget. It's unclear how much Intel, TSMC, Samsung, and Rapidus are investing in 1.4 & 1.8nm, but it's over a billion. (Edit: since they're the foundries, the fabless companies pay the above amounts.)

I had previously made a diagram of a 1mm^2 chip concept (omitting many important busses and IPCs), but since I had left out the video processor at the time, I decided I needed to specify that.

The original design:

As a reference, SemiAnalysis in 2022 estimated the size of the A15 E-cores and the Apple M1 and M2 E-cores, likely modifications of ARM's stock cores due to Apple owning an architectural license.

This suggests there might be less space left over to integrate 8-16MB of eDRAM, but Apple's A15 cores and E-cores likely have a lot more transistors than the early Pentiums and, to be fair, were built on a 5nm process (the Apple A15 is not to be confused with the Cortex-A15). And transistors aren't everything, but they can sometimes approximate relative performance (when using factors of 10).

The Quark D2000 serves as a useful starting point. A lot of linux distros aren't even compatible with chips lacking an x87 floating point unit, such as the 386DX & 486SX (unlike the 486DX, the 386DX needed an 80387DX to run floating point operations). But linux can work on them (not that they are in mainline anymore).

Intel Quark D2000 Developer Board with Arduino Headers (source: Intel)

The floating point co-processor was removed from the Quark, presumably to lower the energy consumption. The 386 had around 275,000 transistors and the 486 around 1.1 million; some of that increase was due to the 8KB cache, although it's possible the Wikipedia figure uses the SX version rather than the DX with x87. But the Pentium had 3.3 million (and 16KB of cache), so the floating point unit is 2 million transistors at most, and likely under 500,000.
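
To see why "likely under 500,000" is plausible, here is a rough back-of-the-envelope sketch in Python. It assumes a classic 6-transistor SRAM cell for the cache data array only (no tags or control logic) and lumps every other 386-to-486 change into the leftover, so treat it as an upper-bound guess rather than a real die breakdown. The 6T figure is my assumption, not something from the sources quoted here.

```python
TRANSISTORS_386 = 275_000
TRANSISTORS_486 = 1_100_000       # figure quoted in the text
CACHE_BYTES_486 = 8 * 1024        # 8KB on-chip cache

# Assumption: 6 transistors per SRAM bit, data array only
SRAM_TRANSISTORS_PER_BIT = 6


def cache_transistors(cache_bytes: int,
                      per_bit: int = SRAM_TRANSISTORS_PER_BIT) -> int:
    """Transistors in the cache data array under the 6T assumption."""
    return cache_bytes * 8 * per_bit


growth = TRANSISTORS_486 - TRANSISTORS_386     # 386 -> 486 increase
cache = cache_transistors(CACHE_BYTES_486)     # cache data array
remaining = growth - cache                     # FPU plus everything else

if __name__ == "__main__":
    print(f"386 -> 486 growth: {growth:,}")
    print(f"8KB cache (6T):    {cache:,}")
    print(f"left for FPU etc.: {remaining:,}")
```

Under these assumptions roughly 430,000 transistors are left over for the FPU and all other changes combined, which is consistent with the guess above.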

A GMA3150 is probably too large for a Pentium, but the i740 is listed as having 3,500,000 transistors:

i740	3,500,000	1998	Intel, Real3D	Real3D	350 nm

Model: i740
Date Released: 1998
Interface: AGP/PCI
Shader Model: N/A
DirectX: 6
Manufacturing Process: ?
Core Clockspeed: 55MHz
Memory Clockspeed: 100MHz
Memory Bus: 64-bit
Transistors: 3.5 million

Intel i740 and GMA

"You're no doubt familiar with Intel's integrated GMA graphics that litter the low-cost landscape today, but did you know Intel also came out with a discrete 3D graphics chip? The year was 1998 and Intel had grand plans of competing in the 3D market, starting with the i740. Part of the reasoning behind the release was to help promote the AGP interface, and it was widely believed that Intel's financial status and manufacturing muscle would give the chip maker a substantial edge in competing with Nvidia and ATI.
Instead, poor sales and an underperforming product led Intel to abandon the discrete graphics market less than 18 months after it had entered, which also meant the i752 and i754 -- two followup GPUs -- would never see the light of day. And ten years after its launch, at least one site would look back at the i740 as one of "The Most Disappointing Graphics Chips in the Last Decade."
The original i740 design lives on, however, as it provided the basis for the much longer lasting GMA line, which still exists today. Moreover, Intel has on more than one occasion showed interest in re-entering the discrete graphics market, and its Larrabee architecture could see the light of day as early as this year.
Fun Fact: Sales of the i740 were so bad that some accused Intel of anticompetitive practices for allegedly selling its 740 graphics chips below cost to overseas videocard vendors in order to boost its market share."

Considering it typically had 4-8MB of SDR RAM, that would be more than enough to run Windows 98 or a lightweight linux from the era of Mandrake or SuSE 7.0, or even a later distro with modest requirements.

Fun note: The i752 and i754 cores were later used for the integrated graphics in the Intel 810 and 815 chipsets, respectively. Intel no longer hosts i752 drivers, and advises users of i752-based cards to use the 810 drivers.^[7]

Intel 810

The Intel 810 chipset was released by Intel in early 1999 with the code-name "Whitney"^[1] as a platform for the P6-based Socket 370 CPU series, including the Pentium III and Celeron processors. Some motherboard designs include Slot 1 for older Intel CPUs or a combination of both Socket 370 and Slot 1. It targeted the low-cost segment of the market, offering a robust platform for uniprocessor budget systems with integrated graphics. The 810 was Intel's first chipset design to incorporate a hub architecture which was claimed to have better I/O throughput^[2] and an integrated GPU, derived from the Intel740.^[3]

There are five variants of the 810:

810-L: microATX (4 PCI), no display cache, ATA33 hard disk interface.
810: microATX (4 PCI), no display cache, ATA33 and ATA66.
810-DC100: ATX (6 PCI), 4 MB display cache (AIMM), ATA33 and ATA66.
810E: added support for 133MHz FSB, Pentium III or Celeron "Coppermine-EB" Series CPU.
810E2: added support for Pentium III and Celeron CPUs with 130 nm "Tualatin" core, ATA100 and 4 USB 1.1 ports.

Intel 810 attempted to integrate as much functionality into the motherboard as possible. Features include:^[2]^[4]

66 and 100 MHz bus support
2 USB ports
An integrated graphics processor.
Based upon the Intel740 2D/3D accelerator (i752).
Optional dedicated video RAM cache or use of system RAM.
Hardware motion compensation for DVD playback.
Digital video output
AC'97 modem and audio

The hub design consisted of three chips, including the Graphics & Memory Controller Hub (GMCH), I/O Controller Hub (ICH), and the Firmware Hub (FWH). These components are connected by a separate 266 MB/s bus, double the previously typical 133 MB/s attachment via PCI-Bus. The added bandwidth was necessary because of increasing demands for data transfer between components.^[4]
The early GMCH (82810) chips (A2 stepping; S-spec numbers can be found on the fourth line of the chipset: SL35K, SL35X, SL3KK, SL3KL, Q790, Q789) could only support Celeron processors as they were unable to handle SSE instructions correctly.
810 supports asynchronous bus clock operation between the chipset and CPU (front side bus) and the system RAM. So, if the machine is equipped with a Celeron that uses only a 66 MHz bus, PC100 SDRAM can still be taken advantage of and will benefit the IGP.^[4]
Boards based on the chipset do not have an AGP expansion slot, leaving the user to make do with PCI for video card options. 810-based boards include an AMR expansion slot. Additionally, the integrated graphics does not support 32-bit graphics mode, forcing the user to downsample the 810's standard 24-bit mode to 16-bit in order to run most games or full screen DirectX/OpenGL programs; many games will automatically downsample the output to 16-bit upon loading, however others will simply exit with or without an error or even crash due to the 24-bit mode not being supported by the game. The onboard graphics' performance in games was also unsatisfactory, and many games of that time had to be run at low resolution and low detail levels to be playable.

So it's likely that the i810 chipset used a lot more transistors than the i740 to accommodate many I/O buses like USB, but it's not impossible that many of the core features of later GMA and HD revisions can be found in the earliest integrated 810 series and still fit into 1mm^2. The cards are over 25 years old, so I am curious if the patents have expired... I remember one of the first desktops that had a USB port: it was a Pentium II or III (it might have been a Pentium III), and it had ONE USB 1.1 port in the back of the PC. It was very useful for my portable MP3 player and digital camera, but it's amazing how times have changed: desktops now routinely include 4-10 USB ports across the back and front.

Northbridge


ARM Solutions (2012):

Microchip PIC24

The PIC24FJ256DA206 (Figure 1) 16-bit MCU features three graphics hardware accelerators to facilitate rendering of block copying, text and unpacking of compressed data and a color look-up table. The IC has 256 Kbytes of flash and 96 Kbytes of SRAM.

The chip also has an Enhanced Parallel Master Port (EPMP) for up to 16 Mbytes of external graphics RAM, if needed. This device has five timers, USB v2.0 On-The-Go, UART, SPI, and I2C I/O, 24 channels of 10-bit A/D conversion, and a real-time clock. It is said to be one of the lowest cost graphics solutions for QVGA and WQVGA displays.

PIC24 graphics solution

Fujitsu MB86R01

The Fujitsu MB86R01 ‘Jade’ SoC (Figure 2) has a 32-bit ARM926EJ-S CPU core with a high-performance graphics display controller core with a 320 MHz internal memory frequency and enough performance for display resolutions up to 1024 x 768. It also features six layers of overlay window displays, with an alpha plane and constant alpha value for each layer, and two separate video-capture units that support YUV, RGB, ITU656, and other formats.

Jade targets automotive graphics applications, but is equally suited to many other high performance applications. The device features a hierarchical bus system that isolates high performance functions, such as 3D graphics processing, from routine operations such as low speed I/O. The ARM core runs at 333 MHz and the graphics core at 166 MHz. The external memory controller supports 302 MHz DDR2.

The chip supports two video inputs (YUV/ITU656 or RGB) and enables both upscaling and downscaling of a video image. It also can support two unique displays, and has two CAN ports, A/D and D/A converters, IDE, USB, SPI, FlexRay, and a Media LB port.

Fujitsu’s MB86R01

Atmel SAM9G

The Atmel SAM9G10 has an ARM926EJ-S core with DSP extensions and Java acceleration running at up to 400 MHz. It features an advanced graphics LCD controller with 4-layer overlay and 2D acceleration (picture-in-picture, alpha-blending, scaling, rotation, color conversion) and a 10-bit A/D converter that supports 4- or 5-wire resistive touchscreen panels. The chip has a 64-Kbyte ROM, 32 Kbytes of high-speed SRAM, and a 32-bit external bus memory interface supporting DDR2, static memories, and has circuitry for MLC/SLC NAND flash with ECC up to 24 bits. The device has no flash, but 32 Kbytes of fast ROM and 16 Kbytes of SRAM.

The SAM9G graphics controller supports 1 to 24 bits/pixel with scaling up to 800 x 600 pixels and has a 384-byte asynchronous output FIFO. The chip’s 10-layer bus matrix coupled with 2 x 8 DMA channels and dedicated DMAs for the communication and interface peripherals ensure uninterrupted data transfers with minimal processor overhead.

Multiple communication interfaces include a soft modem supporting the Conexant SmartDAA line driver, HS USB, FS USB Host, a 10/100 Ethernet MAC, two HS SDCard/SDIO/MMC interfaces, USARTs, SPIs, I2S and TWIs.

Texas Instruments AM3358

The Texas Instruments Sitara AM3358 Cortex-A8-based SoC is aimed at portable navigation devices, hand-held gaming and educational devices, home and building automation equipment, and other devices that require portability or low power consumption. It features a touch screen controller user interface, a 3D graphics accelerator (20 million triangles per second), an LCD display controller and 7 mW standby power. The LCD controller consists of two independent controllers, the raster controller and the LCD interface display driver (LIDD) controller.

ARM graphics processing

Another thing you might consider is ARM Mali graphics. At present, this graphics processor is used in a number of smart phones and tablet computer SOCs, such as those for the Samsung Exynos 4212 phone and Galaxy Tab, but it is not yet available in a microcontroller – stay tuned. Obviously, one feature of Mali is very low power and, based on the graphics quality of the end products mentioned, its performance is good as well.

The integrated display controller is capable of directly driving almost any LCD display with an RGB or STN/CSTN interface, which includes a wide range of TFT, STN, and some OLED displays. These features effectively create a complete graphical subsystem that is fully integrated on the same chip as the MCU, driving up to 640 x 480 (VGA) display resolution.

So many options, so little time...

After reading about the several ARM options, the way Intel standardizes its chipsets across platforms (at least within a generation) seems a lot less convoluted. Understandably, licensing an IP core from ARM isn't the same as having to make a product for customers who want at least some semblance of a standardized solution.

That said, many of the Southbridge and Northbridge functions have been obsoleted by process shrinkage, which isn't to say it needs a lot more cruft. It's very likely that some rudimentary upgrades to the i810 would help: 32-bit graphics support, which would benefit compatibility, and VRAM that isn't shared, such as eDRAM (call it eVRAM to distinguish it from system memory) that is fast enough to act as 4MB of discrete video memory without using the precious system RAM.

A lot of the innovations of the Atom chipset were found in the Pentium M line, and transferred via the Stealey line. I was vaguely aware of the Pentium M in college: I had a roommate who had a desktop or laptop with one. I could never determine if it was faster than a Pentium 4 or a dual core, but I realized it had some efficiency improvements. I checked if there were any laptops that used the integrated GMA 915, and it seems like the Pentium M also used them:

The Pentium M processor, designed for laptops, did not support Hyper-Threading (HT); it prioritized power efficiency with features like SpeedStep, while HT was a hallmark of the desktop Pentium 4 (NetBurst architecture) for multitasking, though its benefits varied, with the later Core series vastly improving HT implementation. Hyper-Threading allows one physical core to act as two logical cores, improving throughput by using idle execution units, but early P4 versions sometimes saw slowdowns due to cache contention, a problem largely fixed in later Core processors.
Key Differences & Features:
Pentium M (Banias/Dothan): Focused on mobile efficiency, using the P6 architecture (like Pentium III) with a shorter pipeline, low power, and SpeedStep for dynamic clock scaling, but lacked HT.
Pentium 4 (NetBurst): Aimed for high clock speeds with very long pipelines, and introduced Hyper-Threading (SMT) to make one core appear as two, boosting parallel performance.
Hyper-Threading (HT): Duplicates instruction-tracking parts of a core, letting two threads run concurrently, filling execution units when one thread stalls.
Performance Impact: While great for heavily threaded tasks (video editing, rendering), early P4 HT could hurt single-threaded apps due to shared resources like the cache, causing "cache thrashing".
In essence: If you have an older mobile CPU with "Pentium M" in the name, it doesn't have HT; if it's a "Pentium 4" or later "Core" processor, it likely does (or did, before Intel temporarily removed it from P-cores in newer designs)

While the Northbridge became obsolete, a redesign of the graphics bus to the CPU would not really cause a Northbridge-like bus to disappear in a modern process node.

https://book.huihoo.com/pc-architecture/chapter22.htm

But on reading about improvements to the NM10, a modern Northbridge isn't going to be discrete:

The northbridge was replaced by the system agent introduced by the Intel Sandy Bridge microarchitecture in 2011, which essentially handles all previous Northbridge functions.^[10] Intel's Sandy Bridge processors feature full integration of northbridge functions onto the CPU chip, along with processor cores, memory controller, high speed PCI Express interface and integrated graphics processing unit (GPU). This was a further evolution of the Westmere architecture, which also featured a CPU and GPU in the same package.^[11]

What if there is a way to simplify all the IGP functions for a very lightweight framebuffer? It would contain many of the integrated functions, but for 256 colors instead of 24-bit, and with far less cache. In other words, a Sandy Bridge manufacturing process (32nm, GPU on same die) but using an Intel740-sized GPU (4MB).
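
To get a feel for what 4MB of video memory allows, here is a small Python sketch of the usual framebuffer size formula (width × height × bits per pixel ÷ 8). The resolutions are just example values I picked, and the calculation counts a single framebuffer only, with no textures, back buffer, or Z-buffer.

```python
VRAM_BYTES = 4 * 1024 * 1024  # the 4MB figure from the text


def framebuffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed for one uncompressed framebuffer."""
    return width * height * bits_per_pixel // 8


def fits_in_vram(width: int, height: int, bits_per_pixel: int,
                 vram_bytes: int = VRAM_BYTES) -> bool:
    """True if a single framebuffer fits in the given video memory."""
    return framebuffer_bytes(width, height, bits_per_pixel) <= vram_bytes


if __name__ == "__main__":
    # Example resolutions (assumed, not from any datasheet)
    for width, height in [(640, 480), (1024, 768), (1280, 1024)]:
        for bpp in (8, 16, 24, 32):
            size = framebuffer_bytes(width, height, bpp)
            verdict = "fits" if fits_in_vram(width, height, bpp) else "too big"
            print(f"{width}x{height} @ {bpp}-bit: {size:,} bytes ({verdict})")
```

At 256 colors (8-bit) even 1280x1024 needs only about 1.3 MB, which is the appeal of a lightweight framebuffer.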

It would not, however, need any PCI Express, as most of the bandwidth would be well under GB/s speeds:

https://en.wikipedia.org/wiki/Direct_Media_Interface

In computing, Direct Media Interface (DMI) is Intel's proprietary link between the northbridge (or CPU) and southbridge (e.g. Platform Controller Hub family) chipset on a computer motherboard.^[1] It was first used between the 9xx chipsets and the ICH6, released in 2004.^[2]^: 1 Previous Intel chipsets had used the Intel Hub Architecture to perform the same function, and server chipsets use a similar interface called Enterprise Southbridge Interface (ESI).^[3] While the "DMI" name dates back to ICH6, Intel mandates specific combinations of compatible devices, so the presence of a DMI does not guarantee by itself that a particular northbridge–southbridge combination is allowed.

DMI is essentially PCI Express, using multiple lanes and differential signaling to form a point-to-point link. Most implementations use a ×8 or ×4 link, while some mobile systems (e.g. 915GMS, 945GMS/GSE/GU and the Atom N450) use a ×2 link, halving the bandwidth. The original implementation provides 10 Gbit/s (1 GB/s) in each direction using a ×4 link. The DMI provides support for concurrent traffic and isochronous data transfer capabilities.^[2]^: 3^[4]
DMI replaced FSB (Front-Side Bus) which was eliminated in 2009.^[5]
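
The "10 Gbit/s (1 GB/s)" figure in the quote can be reproduced with a short Python sketch. It assumes first-generation PCI Express signaling, i.e. 2.5 GT/s per lane with 8b/10b encoding (8 payload bits per 10 transferred); the function name is made up for illustration.

```python
def dmi_bandwidth_gbytes(lanes: int,
                         gt_per_s: float = 2.5,
                         encoding_efficiency: float = 8 / 10) -> float:
    """Usable bandwidth per direction in GB/s for a PCIe-1.x-style link."""
    raw_gbit_per_s = lanes * gt_per_s                # e.g. x4 -> 10 Gbit/s raw
    payload_gbit_per_s = raw_gbit_per_s * encoding_efficiency
    return payload_gbit_per_s / 8                    # bits -> bytes


if __name__ == "__main__":
    print(dmi_bandwidth_gbytes(4))  # x4 link
    print(dmi_bandwidth_gbytes(2))  # x2 link, as on the Atom N450
```

Under those assumptions a ×4 link works out to about 1 GB/s per direction and a ×2 link to about half of that, matching the "halving the bandwidth" remark in the quote.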

So I present to you a Sandy Bridge-style IGP process, but for an i752 (a picture of the 810 was included since that was the first integrated Intel GPU chip that wasn't an AGP card), ported to 2-3nm:

RAM not drawn to scale (but it would likely flank the chips in 8MB modules and use most of the 1mm^2 die). This would most likely fit in 3nm or less if using HBM3e, while SRAM might only fit 4MB. While it might not all fit in 1mm^2, the next best option is 2mm x 2mm (4mm^2), which could still produce ~14,700 chips per 300mm wafer (Edit 1/21/26: 1mm x 2mm dies and <1mm^2 are also possible, resulting in potentially up to 29,000+ chips per wafer). (Edit 2/21/26: As you recall, I quoted $725m for a 2nm chip. That figure is likely the cost of a 100mm^2 Apple A18 chip, rather than a tiny 1mm^2 chip. Verification of logic on a chip 100x larger is likely to cost more, but many costs are nonrecoverable. It could still cost over $500 million to produce such a 1mm^2 chip, but likely less once the optimization of yields and processes from other chips can be recycled into much simpler designs, an effect known as Wright's Law.) For a list of videos that show operating systems (linux, windows, etc) running in less than 32MB RAM and around 32MHz as pictured above, see 386s @16-40MHz, and 486+586s here, here and here.
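
The per-wafer figures for the larger die sizes come from simply scaling the earlier ~58,880 estimate for a 1 mm² die by area. A minimal Python sketch of that scaling follows; it deliberately ignores that larger dies pack and yield slightly worse, so the results are optimistic, and the helper name is made up for illustration.

```python
# Usable 1 mm^2 dies per 300 mm wafer, from the earlier worked example
USABLE_DIES_1MM2 = 58_880


def scaled_die_count(width_mm: float, height_mm: float,
                     baseline: int = USABLE_DIES_1MM2) -> int:
    """Scale the 1 mm^2 estimate by die area (no extra yield or edge loss)."""
    return int(baseline / (width_mm * height_mm))


if __name__ == "__main__":
    print(scaled_die_count(2, 2))  # 2mm x 2mm die -> 14720
    print(scaled_die_count(1, 2))  # 1mm x 2mm die -> 29440
```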

Processor	Transistor count	Year	Designer	Process	Area	Transistor density (tr./mm²)
Pentium II Klamath (32-bit, 64-bit SIMD, caches)	7,500,000	1997	Intel	350 nm	195 mm²	39,000
AMD K6 (32-bit, caches)	8,800,000	1997	AMD	350 nm	162 mm²	54,000
F21 (21-bit; includes e.g. video)	15,000	1997^[50]	Offete Enterprises	?	?	?
Pentium II Deschutes (32-bit, large cache)	7,500,000	1998	Intel	250 nm	113 mm²	66,000
Hitachi SH-4 (32-bit, caches)^[61]	3,200,000^[62]	1998	Hitachi	250 nm	57.76 mm²	55,400
ARM 9TDMI (32-bit, no cache)	111,000^[29]	1999	Acorn	350 nm	4.8 mm²	23,100
Pentium III Katmai (32-bit, 128-bit SIMD, caches)	9,500,000	1999	Intel	250 nm	128 mm²	74,000
Pentium II Mobile Dixon (32-bit, caches)	27,400,000	1999	Intel	180 nm	180 mm²	152,000

I also realize that the i740 was more commonly paired with a Pentium II or III than a Pentium 1, and this isn't meant to strictly limit it to that generation. The idea is to maximize the 1mm^2 die space utilization with a chip that doesn't use many more transistors, and to pair it with the video processor that improves performance the most. While the i752 might be bottlenecked by a P54C, the ratio of CPU to GPU transistors is 3.3 million (P54C) to about 4 million for the i752/740, and that might confer a better performance per watt ratio than a heavier PII (which used up to 27 million transistors). Some adaptation of the P54C bus would be needed to ensure the i752 isn't bottlenecked too much; it would most likely require a major restructuring of the chip (considering this is the main topic of this article). But it would appear that a roughly 1:1 ratio of CPU to GPU transistor space might be more beneficial than a GPU-heavy processor.

Some reviews of the Intel 740: https://vintage3d.org/i740.php#sthash.TZcKXH40.dpbs

https://www.vogons.org/viewtopic.php?t=70454

https://www.youtube.com/watch?v=UROVoautAyM

I also thought about how this integration is one step of the Makimoto wave, but since it doesn't happen every generation, its integration is not obvious, especially for a backport of a 2010 process to a 1995 architecture.

A timeline illustrating the balance between standardization and customization in technology from 1967 to 2017, highlighting key developments.

A diagram illustrating the balance between customization and standardization, highlighting factors like cost effectiveness and differentiation.

"Today, we are moving into a period which could see a return to more custom architectures, driven primarily by low-power trends. Changes in processes technologies have also caused dynamic power to return to importance because the finFET resolves what had been a growing issue with leakage power. With a growing concern for dynamic power optimization, we may start to see more functions migrating away from general purpose processor solutions."