The original article (OpenAI 開源 GPT-3 系列模型時間已近) was written in Traditional Chinese. Below is the version translated by AI.
Previous Predictions
After internal meeting information from OpenAI leaked on May 29, 2023 (OpenAI's plans according to Sam Altman), I wrote an article on June 4 titled Discussing OpenAI's Open Sourcing of GPT-3 and Recent GPT-4 Quality Issues. In it, I predicted that GPT-3 would be open-sourced by the end of this year (2023) or early next year. Below are excerpts from that article:
His real aim is to suppress other open-source efforts, in the hope that everyone will go back to relying on OpenAI's models rather than developing their own, a trend that in the end even OpenAI can't stop.
When will it be released? When open source models begin to pose a threat to GPT-3.
When is that??? Given the pace of development of open source models, I predict that it will be within this year, and at the latest next year.
Recently, Llama 2 and Mistral have shown how far open-source development has come, and they are starting to challenge GPT-3.5's standing.
Current Information
Today, I came across OpenAI's latest posts on Frontier risk and preparedness and Preparedness Challenge. One section read:
Imagine we gave you unrestricted access to OpenAI's Whisper (transcription), Voice (text-to-speech), GPT-4V, and DALLE·3 models, and you were a malicious actor.
I read this as a signal that an open-source release is imminent.
(Oddly, OpenAI wrote DALLE·3 here, whereas it previously used the notation DALL·E 3.)
Bold Forecasts
I believe this is an early signal that OpenAI is gearing up to open-source its models and is currently collecting public objections to the move so that it can prepare counter-arguments. I boldly predict that OpenAI has officially taken steps towards open-sourcing. Here are my predictions:
OpenAI will announce the open-sourcing on November 6th at OpenAI DevDay.
Whether downloads will be officially available on the same day is still uncertain. I suspect the announcement will be made then, but the actual release will come later, at an opportune time (I am still predicting late this year to early next year).
What will be released is GPT-3, not GPT-3.5. The rationale is to roll out the technology in phases rather than revealing all the cards at once. This strategy applies incremental competitive pressure on rivals. Releasing versions in stages also generates more news and conversation, and it sets the stage for a future release of GPT-3.5.
I expect that it will include RLHF protections, demonstrating OpenAI's responsible approach.
Since GPT-3 was developed in 2020, there's a three-year gap between it and the currently popular new models. I speculate that the runtime environment will differ somewhat from that of recently open-sourced models. However, OpenAI might have already rewritten it to seamlessly integrate with the existing open-source ecosystem.
Because open-sourcing requires it, technical details will also be thoroughly disclosed. However, I speculate that the training datasets will still not be explicitly revealed.
I believe that OpenAI has also developed smaller models with fewer parameters while operating ChatGPT over the past year. I predict these will be released as well, offering a range of models with varying parameter sizes.
As soon as it’s open-sourced, numerous experts will adapt it to run on various platforms.
I also speculate that on the same day, the next generation of Whisper (transcription) will be announced, along with a new text-to-speech model.
The above is pure speculation; take it as a reference only.
The main part of this prediction, about open-sourcing, unfortunately turned out to be incorrect (though I still believe that day will come). Two other parts, however, were spot-on (though I haven't yet seen more information about the text-to-speech part):
> I also speculate that on the same day, the next generation of Whisper (transcription) will be announced, along with a new text-to-speech model.