this post was submitted on 06 May 2025
1174 points (96.9% liked)
Programmer Humor
The client wants to drag and drop their own personalized Excel file, with no guaranteed formatting, column order, or data contract, to import their data into our system <3
Needs more AI to randomly guess what the columns might be
I love how this is a universal experience.
Jesus, this gave me war flashbacks.
Please, do elaborate. Let others feast on your suffering.
May I?
A controlling department wasn't granted any money for digitizing their workflow.
So these guys created their own solution(s!). Things like dedicated "user interfaces" loading data from tables created by hand. After years, these people realized that data formatting was quite the issue.
They started to put random rules into different tables:
Two empty lines: New Group Data Record. One empty line: New Subgroup Data Record.
Excel tables aggregating this data via hardcoded links.
A dedicated table to start calculations on parent tables.
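For anyone who has to read such a file, the blank-line convention above can be sketched roughly like this. It is a minimal sketch, assuming the rows arrive as plain text lines; `split_records` is a hypothetical helper name, not anything from the original setup.

```python
def split_records(lines):
    """Group rows into groups -> subgroups using the blank-line counts.

    Assumed convention (from the description above): two or more empty
    lines start a new group, one empty line starts a new subgroup.
    """
    groups = [[[]]]  # list of groups, each a list of subgroups (lists of rows)
    blanks = 0
    for line in lines:
        if not line.strip():
            blanks += 1
            continue
        if blanks >= 2:
            groups.append([[]])    # new group with a fresh subgroup
        elif blanks == 1:
            groups[-1].append([])  # new subgroup inside the current group
        blanks = 0
        groups[-1][-1].append(line.strip())
    # Drop empty subgroups and groups left over from leading blank lines
    cleaned = [[sub for sub in group if sub] for group in groups]
    return [group for group in cleaned if group]


records = split_records(["a", "b", "", "c", "", "", "d"])
```

Here `records` holds two groups: the first with subgroups `["a", "b"]` and `["c"]`, the second with the single subgroup `["d"]`. One stray empty line anywhere silently changes the structure, which is the fragility the comment is describing.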
They mutated data like this:
Load data from Excel files into one. Manually delete, add or change lines (or columns). Start a collection run from a dedicated Excel file, load the new Excel file data and replace the old Excel file data.
They had files where 'it was easier to read' when they pivoted the data. This was troublesome since some values were intermediate results: dropping one column could imply dropping another one as well.
All workflows required manual alignments along the way.
They were only able to process 10% of the data from a year within a year. Managing millions in cash.
Their data input came from different internal sources: programs written two decades ago, without any tests. We're talking VB, macros from host servers, and copy-pasta data from other internal programs.
And don't get me started on customer tables... They created a zip-code-encoded filesystem hierarchy where each customer's data (you guessed it, an Excel file) was renamed and then saved. In each of these directories there were randomly named files if something went wrong, so no actual file patterns to rely on.
I respect them.
They created a diagram for their tables with Word. Word! (Didn't know either: you can select the web view in the bottom right corner and you get an infinite canvas...) Madness.
Holy cow :O
I had a potential client, an accountant. They had their own, uh, system within a spreadsheet. They wanted me to program another system to send their spreadsheet output to our government's IRS. Did a little back-and-forth but could not convince them to drop the idea.
Strangely enough, we actually solved this problem with AI a few months back. We upload the Excel file to Gemini and have a prompt to extract the data we need in a specific JSON format. And it works surprisingly well.
How well? Bet your life on it well, or "fewer hallucinations than we would have guessed" well? I've considered and toyed around with OpenAI models for logging supply room check-offs in a JSON format, and it went better than I hoped but worse than I needed.
Really well. Temp turned down all the way, and Gemini has this new feature to run and execute code... Not function calling... It can write a small Python script, run it and return the output.
So our prompt explains the Excel spreadsheet, then tells it exactly the format we need it in, and then tells it to use Python and pandas to read in the CSV, clean it up and reshape it the way we need it to match what we expect, and voila.
So hallucinations are not really an issue with the data, as it's simply writing code which then deterministically processes and returns the data.
Edit to add more info: basically Gemini can create and run a lambda function on the fly. And if you're a coder you can really guide the prompt. E.g. "Load this into pandas. Then remove all the empty columns. Also remove the total rows. Now unpivot the data so the months are not columns but in separate rows with a column called month."
You get the idea.
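For readers wondering what such a generated script might look like, here is a minimal sketch of the steps in that example prompt. Everything in it is an assumption for illustration: the sample data, the column names, the `reshape` helper, and the guess that total rows are labelled "Total" in the first column. It is not the script Gemini actually produces.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the client's export
RAW = """region,Jan,Feb,notes
North,10,12,
South,7,9,
Total,17,21,
"""


def reshape(csv_text: str) -> pd.DataFrame:
    df = pd.read_csv(io.StringIO(csv_text))
    # Remove all the empty columns (columns with no values at all)
    df = df.dropna(axis="columns", how="all")
    # Remove the total rows (assumed to be labelled "Total" in the first column)
    key = df.columns[0]
    df = df[df[key].astype(str).str.strip().str.lower() != "total"]
    # Unpivot so the months become rows, with a column called "month"
    return df.melt(id_vars=[key], var_name="month", value_name="value")


tidy = reshape(RAW)
```

With the sample above, `tidy` has one row per region and month, with the columns `region`, `month` and `value`. The point made in the thread holds here too: once the script exists, the output is determined by the code and the input file, not by the model.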
Google thanks you for your data.
Holy smokes, how did I not think to turn the temperature down?! That's smart! Thanks for getting back to me!
The second one. It is always the second one.
It would still have to be in at least somewhat of a consistent format. Even a human would require that.
If they're just going to write the details however they feel on any particular day and then expect someone or something to be able to interpret that, they're going to have a bad time.
Do we have the same client?
Everyone has and is that client.
Yeah but when I'm like that it's justified
Or headers. Just unlabeled data in a CSV.
Yep, RIP