During Google Cloud Next 2024 in Las Vegas, Google announced that Gemini 1.5 Pro would be available to all users. A public preview of Gemini 1.5 Pro is now available with a 1 million token context window. There is no longer a waitlist for the model, and there is no need to sign up.
When I tried accessing the Gemini 1.5 Pro model from a new Google account, the model was immediately available. All of this is free of charge.
However, you cannot yet use Gemini 1.5 Pro on the Gemini portal. For now, you must visit aistudio.google.com to access the model. It will be available on the Gemini portal after a few months of public preview. To use the model there, you will most likely need a subscription to Gemini Advanced.
Despite being a middle-tier model built on the MoE architecture, the Gemini 1.5 Pro model easily beats the Gemini 1.0 Ultra model. Gemini 1.5 Pro also demonstrated impressive capabilities in our comparison with GPT-4. You can expect Gemini 1.5 Pro to perform better than GPT-4 and Claude 3’s Opus model when it launches on the Gemini portal.
The new version of Gemini 1.5 Pro also supports audio files. The model can listen to uploaded audio files or videos directly, so you do not need to generate a transcript manually first. This can be incredibly useful for quickly pulling structured information out of audio meetings or discussions.
1) I provided Gemini with an audio of a conversation between a college student and a librarian. I asked Gemini to summarize the audio in a structured format that included all the important information mentioned in the audio. I also asked a follow-up question. pic.twitter.com/gNCIOTnPsR
— Dhruv (@dhruvvvvvvvvv_) April 9, 2024
With audio support added, Gemini 1.5 Pro is now a powerful multimodal model with a context length of 1 million tokens. I tested the audio processing capability of the Gemini 1.5 Pro model, and here is how it went.
How to Process Audio Files on Gemini 1.5 Pro
Head over to aistudio.google.com in a browser.
Select the “Gemini 1.5 Pro” model from the drop-down menu.
Upload the audio file by clicking the “Audio” menu in the top row. The supported audio file formats are FLAC, MIDI, MP3, M4A, OPUS, OGG, OGA, WAV, and MID.
The audio file will be processed and tokens will be consumed.
You can now ask questions about the audio. Gemini 1.5 Pro will find the relevant information in the recording and respond accordingly.
It generates transcripts in a structured format with labels for each speaker. In my testing, it did not hallucinate at all.
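If you would rather script these steps than click through AI Studio, the sketch below shows roughly what that could look like in Python. It is only an illustration under several assumptions: it uses the google-generativeai package (not mentioned in the steps above), the model name "gemini-1.5-pro-latest" may differ for your account or change over time, and the figure of about 32 tokens per second of audio comes from Google's documentation and should be treated as an estimate. The helper names (is_supported_audio, estimate_audio_tokens, summarize_audio) are hypothetical.

```python
from pathlib import Path

# Extensions taken from the supported formats listed in the upload step.
SUPPORTED_AUDIO_EXTENSIONS = {
    ".flac", ".midi", ".mp3", ".m4a", ".opus", ".ogg", ".oga", ".wav", ".mid",
}

# Assumption: roughly 32 tokens per second of audio. This is an estimate,
# not a guarantee, and the real cost may differ.
ASSUMED_TOKENS_PER_SECOND = 32


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is in the supported list."""
    return Path(path).suffix.lower() in SUPPORTED_AUDIO_EXTENSIONS


def estimate_audio_tokens(duration_seconds: float) -> int:
    """Rough token cost of an audio clip under the assumed rate."""
    return int(duration_seconds * ASSUMED_TOKENS_PER_SECOND)


def summarize_audio(path: str, prompt: str, api_key: str) -> str:
    """Upload an audio file and ask a question about it (hypothetical helper)."""
    if not is_supported_audio(path):
        raise ValueError(f"Unsupported audio format: {path}")

    # Imported lazily so the helpers above work without the SDK installed.
    # Assumption: the google-generativeai package is available.
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    audio_file = genai.upload_file(path=path)

    # Assumption: this model identifier may not match what your account offers.
    model = genai.GenerativeModel("gemini-1.5-pro-latest")
    response = model.generate_content([prompt, audio_file])
    return response.text
```

Under the assumed rate, a 30-minute recording would cost roughly estimate_audio_tokens(30 * 60) = 57,600 tokens, which is a small part of the 1 million token context window. Calling summarize_audio requires your own API key and network access, so treat it as a starting point rather than a tested integration.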
That is how Gemini 1.5 Pro processes audio files. The Google DeepMind team has developed a powerful model, and I am excited that it is available to the public free of charge. Feel free to give it a try and let us know what you think in the comments.